Preliminary Evidence of the Use of Generative AI in Health Care Clinical Services: Systematic Narrative Review


Introduction

Background

Generative artificial intelligence (GenAI) tools and applications automatically learn patterns and structures from text, images, sounds, animation, models, or other media inputs to generate new data with similar characteristics []. GenAI is used to search, write, and create models, computer code, and art forms without human assistance. GenAI has emerged rapidly in the current decade, reaching virtually every industry through products such as ChatGPT, Bing Chat, Bard, LLaMA, Stable Diffusion, Midjourney, and DALL-E [-]. Almost all industries share an optimistic vision, with significant investment in using GenAI to transform aspects of value chains [-]. However, similar to many other technology hypes, whether this optimism will translate to value outcomes or be a “fad or fashion” remains to be tested over time.

The adoption of GenAI in health care is emerging. Studies point to the use of GenAI in service interactions involving breast cancer diagnoses [], bariatric surgery [], cardiopulmonary resuscitation [], and breast cancer radiologic decision-making []. GenAI has the potential to transform health care by performing tasks at higher quality than humans, which may reduce human error in expert domains such as cancer detection [] and neurological clinical decisions []. The rise of GenAI is also referred to as the “second machine age” [], whereby “instead of machines performing mechanical work they are taking on cognitive work exclusively in the human domain” []. Although these instances are encouraging, how exactly GenAI helps in health care processes needs to be articulated and evaluated to provide an understanding of use and value linkages [,]. Thus, we asked the following research questions (RQs) in this study: (1) How is GenAI used across different aspects of health care services? (RQ 1) and (2) What is the preliminary evidence of GenAI use across health care services? (RQ 2).

It is essential to explore these 2 RQs for several reasons. Understanding GenAI’s use in health care services is necessary for realizing its potential benefits, addressing ethical concerns, and continually improving its applications to enhance patient care and the health care ecosystem. This impact spans different areas. For instance, GenAI can help analyze data to provide personalized treatment and tailor interventions. It has shown promise in improving diagnostic accuracy, particularly in the interpretation of images and scans. AI applications can enhance patient engagement by providing personalized health recommendations, reminders for medications, and real-time monitoring of vital signs. On the provider side, GenAI can save costs by streamlining administrative tasks and improving efficiency, early disease detection, and preventive care. Similarly, knowing the preliminary evidence of GenAI use across health care services is crucial for making informed decisions, ensuring regulatory compliance, building trust, guiding research initiatives, and addressing ethical considerations. This sets the stage for the responsible and effective integration of GenAI into the health care landscape.

The impact of GenAI in health care depends on various factors, including the specific application, quality of data used for training, ethical considerations, and regulatory framework in place. Continuous monitoring, evaluation, and responsible deployment are essential to maximize the positive impact and mitigate potential negative consequences. For instance, artificial intelligence (AI) assists pathologists in diagnosing diseases from pathology slides, leading to faster and more accurate diagnoses and improving patient outcomes []. Analysis of oncology literature, clinical trial data, and patient records can help oncologists identify personalized, evidence-based treatment options for patients with cancer, potentially improving treatment decisions []. AI has been applied to analyze medical images for conditions such as diabetic retinopathy, aiding in early detection and intervention []. AI analyzes clinical and molecular data to help physicians make more informed decisions about cancer treatment and steer them toward personalized and effective therapies [].

Concerns about using GenAI remain because of algorithmic bias in predictive models that causes discrimination, unequal distribution of health care resources, and exacerbated health disparities []. Data privacy and the need for clear guidelines on AI in health care remain gaps, with misuse already reported []. Misinterpretations or errors in algorithms can lead to incorrect diagnoses, specifically for image readings, which underscores the importance of human oversight in critical health care decisions []. Furthermore, implementing and maintaining AI systems can be costly, and overreliance on technology without sufficient human oversight may result in overlooking critical clinical nuances and potentially compromising patient care []. Therefore, it is essential to note that the impact of AI on health care is a dynamic and evolving field. Regular updates and scrutiny of the latest research and applications are necessary to understand the positive and negative aspects of GenAI in health care.

Using a literature scoping, review, and synthesis approach in this study, we evaluated the proportionate evidence of using GenAI to assist, guide, and automate clinical service functions. Technologies in general help standardize [], provide flexibility [], increase experience and satisfaction through relational benefits [], induce higher switching costs [], and enhance the overall quality [] and value [] of services. However, high technology may reduce personal touch, trust, and loyalty in service settings [-]. Complex technologies may introduce anxiety, confusion, and isolation [] or disconnection, disruption, and passivity stressors [] that can erode satisfaction, loyalty, and retention in service settings [,-]. Given the mixed evidence in previous research on the role of technology in services [,,], it is timely to assess to what extent GenAI may even have a role in shaping or disrupting health care services. Overall, the ground realities of the potential for emerging GenAI to benefit health care services rather than just being another knowledge and collation tool need to be assessed and reported to influence further research and practice activities.

Objectives

This study took a deep dive to review and synthesize preliminary evidence on how GenAI is used to assist, guide, and automate activities or functions during clinical service encounters in health care, with plausible indications for differential use. More evidence on the actual use is needed to assert that GenAI plays a considerable role in the digital transformation of health care. Therefore, this study aims to identify how GenAI is used in clinical settings by systematically reviewing preliminary evidence on its applications to assist, guide, and automate clinical activities or functions.


Methods

Article Search and Selection Strategy

This study aims to identify how physicians use GenAI in clinical settings, as evidenced in published studies. The design of this study adheres to the protocols outlined in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement [,]. Figure 1 provides a flowchart of this study’s article search and inclusion process.

Figure 1. Literature screening process for relevant articles on generative artificial intelligence (AI) tools and applications.

We focused our search exclusively on PubMed to ensure the credibility of the medical and clinical service settings covered in this study. PubMed is part of the National Library of Medicine and a trusted national source of peer-reviewed publications on medical devices, software applications, and techniques used in the clinical setting. We performed keyword searches to retrieve relevant GenAI publications in PubMed that mentioned “artificial intelligence” anywhere in the text and were written in English. The sampling period of the publications was from January 1, 2020, to May 31, 2023. The search yielded 42,459 results in the first round of identification of articles for evaluation.
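Such a query can also be issued programmatically. The sketch below is illustrative only, not the authors’ procedure (the search was run through PubMed itself); it shows how a comparable search could be sent to NCBI’s E-utilities via Biopython’s Entrez module, where the exact query string, field tags, and email address are assumptions.

```python
# Illustrative sketch only: a PubMed query comparable to the one described
# above, issued through NCBI E-utilities via Biopython. The query string,
# email address, and field tags are assumptions, not the authors' exact search.
from Bio import Entrez

Entrez.email = "researcher@example.org"  # NCBI asks for a contact email

handle = Entrez.esearch(
    db="pubmed",
    term='"artificial intelligence" AND english[Language]',
    datetype="pdat",            # filter on publication date
    mindate="2020/01/01",
    maxdate="2023/05/31",
    usehistory="y",             # needed to page through large result sets
)
record = Entrez.read(handle)
handle.close()

print(f"Matching records: {record['Count']}")  # ~42,459 in the first round
```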

Within PubMed’s classification system for articles, we used the “article type” that described the material presented in the article (eg, review, clinical trial, retracted publication, or letter). We used this article type feature in the PubMed classification system to identify peer-reviewed articles and other relevant types of publications that are pertinent to our study. A total of 52.02% (22,086/42,459) of the returned articles did not have an article type assigned from the 75 article types in PubMed’s classification system and were excluded from the study sample. We included clinical, multicenter, case report, news, evaluation, and validation studies. We excluded article types that were out of scope, such as uncategorized articles, government-funded studies, reviews, editorials, errata, opinion articles, nonscientific articles, retracted publications, and supplementary files. We also excluded preprint article types that were unlikely to have attracted attention. Errata or retracted publications (404/42,459, 0.95%), supplementary files (117/42,459, 0.28%), and 50 article types that had too few search returns (243/42,459, 0.57%) were also excluded.

The screening stage excluded review articles (6732/42,459, 15.86%) with an objective that was neither aligned with nor redundant to this study’s goal. Opinion articles such as editorials, letters, and commentaries were excluded (2455/42,459, 5.78%). Articles whose funding came from the government or a government agency were not considered because of a conflict of interest for the researchers of the evaluated study (8936/42,459, 21.05%), and preprint articles (77/42,459, 0.2%) were excluded because of lack of availability to the public. We also considered the full-text availability of the articles; 32.39% (490/1513) were excluded at the eligibility stage.

The resulting set of records included 1023 publications. To ensure the credibility of the publication source, we used CiteScore (Elsevier) [] as a citation index to remove publication sources whose influence is limited. Any publication source whose citation index was unavailable or <10 was removed, resulting in 268 records.
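A minimal sketch of this CiteScore screen is shown below, assuming hypothetical file and column names; the study applied the criterion against Elsevier’s CiteScore listings rather than through this code.

```python
# Minimal sketch (hypothetical file and column names): drop records whose
# source has no CiteScore or a CiteScore below 10.
import pandas as pd

records = pd.read_csv("pubmed_screened.csv")       # assumed export of the 1023 records
citescores = pd.read_csv("citescore_lookup.csv")   # assumed journal-level CiteScore table

merged = records.merge(citescores, on="journal", how="left")
retained = merged[merged["citescore"].fillna(0) >= 10]  # 268 records remained in the study
print(len(retained))
```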

In total, 2 raters, 1 author (DY) and 1 graduate assistant (BB), evaluated 161 articles. The 2 raters’ agreement was 91.93%, and the expected agreement was 82.99%. The κ score was 0.5252 (SE 0.0544; z score=9.66; P<.001). The author and the graduate assistant performed manual coding by reading each paper’s title, abstract, and introduction paragraph to gain a preliminary understanding of the study. After reading the abstract and introduction paragraph, each rater classified each article according to the definitions of the 3 classes. For articles that were difficult to classify, the rater read further into the article to gain a better understanding. We defined clinical service settings to include the life cycle of physician encounters with patients for the diagnosis, prognosis, and management of health conditions. The research and development of drug discovery, for instance, was not considered. This process eliminated 107 records, leaving a final data set of 161 articles for this study.
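For readers who wish to reproduce the agreement statistics, the following is a minimal sketch using scikit-learn’s Cohen κ; the rater label lists are illustrative placeholders, not the study’s coding data.

```python
# Minimal sketch with placeholder labels: computing observed agreement and
# Cohen's kappa for two raters assigning articles to assist/guide/automate.
from sklearn.metrics import cohen_kappa_score

rater_dy = ["assist", "assist", "guide", "automate", "assist"]  # placeholder labels
rater_bb = ["assist", "guide", "guide", "automate", "assist"]   # placeholder labels

observed = sum(a == b for a, b in zip(rater_dy, rater_bb)) / len(rater_dy)
kappa = cohen_kappa_score(rater_dy, rater_bb)

print(f"Observed agreement: {observed:.2%}")   # study: 91.93%
print(f"Cohen's kappa: {kappa:.4f}")           # study: 0.5252
```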

Ethical Considerations

The data collected for this study were obtained from publicly available sources. The study did not involve any interaction with users. Therefore, ethics approval was not required for this study.

Data Extraction and Categorization Process

We adopted a modified thematic synthesis approach for data analysis that involved coding the text, developing descriptive themes, and generating analytical themes []. Initially, each author coded each line of text extracted from the articles, assigning it to different dimensions. This line-by-line coding process facilitated identifying and capturing critical article information and concepts. Next, each author developed descriptive themes by grouping related codes and identifying common patterns or topics emerging from the coded data. These descriptive themes provided a broad overview of the various aspects of AI in the clinical service context. Building on the descriptive themes, each author generated analytical themes to deepen the understanding and interpretation of the data. The analytical themes involved exploring relationships, connections, and implications within and across the articles, allowing for the extraction of meaningful insights.

Throughout the analysis process, all the authors engaged in extensive discussions to refine and finalize the results of the thematic synthesis. By collectively examining and interpreting the data, the research team ensured the robustness and reliability of the synthesized findings. Similar dimensions were then merged, based on their relevance to the study objectives, into the following 3 meaningful dimensions: assist, guide, and automate, as shown in Textbox 1. The researchers manually coded each article into several groups and then synthesized them into 1 of the 3 categories of assist, guide, and automate by looking at the title, abstract, and introduction (where applicable).

Textbox 1. Use of generative artificial intelligence tools and applications in clinical services in the reviewed articles (N=161).

Assist

Improve diagnostic accuracy or reduce error by accessing knowledge during clinical services (141/161, 87.6%) [-]

Activities:
Disease detection (19/161, 11.8%) [,,,,,,,,-]
Diagnosis (14/161, 8.7%) [,-]
Screening (12/161, 7.5%) [,,,,-]

Service areas:
Radiology (17/161, 10.6%) [-,,]
Cardiology (12/161, 7.5%) [-,,-,]
Gastrointestinal medicine (4/161, 2.5%) [-]
Diabetes (6/161, 3.7%) [-]

Approaches and methods:
Deep learning (34/161, 21.1%) [,,,,,,,,,,,,,,,,,-]
Machine learning (9/161, 5.6%) [,,,,,-]
Image analysis (13/161, 8.1%) [,,,,,,,,,,,,]

Guide

Recommend treatment options, step-by-step instructions, or checklists to improve clinical services (13/161, 8.1%) [,,,,-]
Personalized treatment plans (1/161, 0.6%) []
Monitoring and managing (1/161, 0.6%) []

Automate

Minimize or eliminate human provider involvement in clinical services or follow-ups (7/161, 4.3%) [,,-]

In addition to manual coding by human researchers, we used ChatGPT (version 3.5; OpenAI) for automatic coding. ChatGPT-3.5 was chosen for speed and cost, as ChatGPT-4 is less accessible to users who do not have the funds to pay for its monthly subscription. ChatGPT-3.5 was trained through one-shot learning via the standard user interface in its “foundational” mode, and no fine-tuning was performed. Future studies may use focused data sets for fine-tuning to improve classification accuracy; however, our study demonstrates that classification accuracy is high and robust even without fine-tuning. This procedure was implemented to check for any subjective bias and to demonstrate AI’s potential to complement the human coding process. The abstracts and introductions of these 161 articles were fed into ChatGPT using an in-context or few-shot learning process, in which a pair of domain-specific inputs and outputs is provided to train the model, thereby enhancing the relevance and accuracy of ChatGPT’s automated coding output [,].

For instance, the input we used in the study was the abstract, which summarizes the article, and the output was the category identified by the experts. ChatGPT learns how to code a set of articles by repeating such pairs of inputs and outputs. One-shot learning, which consists of a single pair of inputs and outputs, generally performs as well as providing >2 samples or zero-shot learning. The benefits of in-context learning (ICL) in ChatGPT include enhanced relevance, where the foundational model becomes better at generating content for domain-specific tasks without additional training of the full model; controlled output, such as producing a single word matching the desired coding category or variable; and reduced biases inherent in manual coding. We used the definitions provided in Textbox 1 to train and restrict ChatGPT to choose only 1 of the 3 use-case categories. We further compared ChatGPT’s classification with expert coding and found a high level of agreement between the 2, with a κ score of 0.94.

As mentioned previously, the manual coding process involved the raters coding and evaluating each article. After each rater coded the articles, the results were compared and discussed to further refine the classification definitions and derive consensus on the final assignment of each article’s classification. This “gold standard” classification was compared with the automatic coding performed by ChatGPT-3.5. Classification training was performed using one-shot ICL, in which ChatGPT learns how to classify articles by being fed pairs of articles and classification labels. For example, a user can feed a prompt or use control tokens to indicate an article abstract and the label associated with the article. In our context, 3 articles and their labels were fed to the interface. After this initial prompt session of training on the 3 classification labels, subsequent interactions providing only an article abstract with a prompt asking for a class label would return ChatGPT’s prompt completion. Alternatively, training could involve >1 example of an article and its label, which would then be called few-shot learning. To summarize, 161 articles were coded by ChatGPT-3.5 based on a single instance of ICL.
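The one-shot ICL procedure was run through the ChatGPT web interface; the sketch below is a hedged programmatic analogue using the OpenAI Python SDK, in which the category definitions, the single labeled example, and the model name are illustrative assumptions rather than the study’s exact prompts.

```python
# Hedged analogue of the one-shot ICL coding step using the OpenAI Python SDK.
# The authors used the ChatGPT web interface; the definitions, example
# abstract, and model name here are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CATEGORY_DEFINITIONS = (
    "Classify the article abstract into exactly one category.\n"
    "assist: improves diagnostic accuracy or reduces error via knowledge access.\n"
    "guide: recommends treatment options, step-by-step instructions, or checklists.\n"
    "automate: minimizes or eliminates human provider involvement.\n"
    "Answer with a single word: assist, guide, or automate."
)

# One-shot example: a single labeled abstract (hypothetical) and its category.
EXAMPLE_ABSTRACT = "A deep learning model detects diabetic retinopathy on fundus images."
EXAMPLE_LABEL = "assist"

def classify_abstract(abstract: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # favor a deterministic, single-word completion
        messages=[
            {"role": "system", "content": CATEGORY_DEFINITIONS},
            {"role": "user", "content": EXAMPLE_ABSTRACT},
            {"role": "assistant", "content": EXAMPLE_LABEL},
            {"role": "user", "content": abstract},
        ],
    )
    return response.choices[0].message.content.strip().lower()
```

Repeating such a call over the 161 abstracts and comparing the returned labels with the gold standard labels (eg, with a κ statistic, as sketched earlier) would mirror the agreement check described above.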


Results

Findings From the Synthesis on the Use of GenAI to Assist in Different Aspects of Health Care Services

GenAI can improve clinical services in 3 ways. First, of the 161 articles, 141 (87.6%) reported using GenAI to assist services through knowledge access, collation, and filtering. The assistance of GenAI was used for disease detection (19/161, 11.8%) [,,,,,,,,-], diagnosis (14/161, 8.7%) [,-], and screening processes (12/161, 7.5%) [,,,,-,,] in the areas of radiology (17/161, 10.6%) [-,,], cardiology (12/161, 7.5%) [-,,-,], gastrointestinal medicine (4/161, 2.5%) [-], and diabetes (6/161, 3.7%) [-]. Thus, although the use of GenAI has percolated across almost all disease-relevant and main service–relevant areas in health care, it is mainly for assisting through knowledge access, collation, and filtering.

The use of GenAI in disease diagnosis has long-term implications. For instance, identifying “referrable” diabetic retinopathy using routinely collected data would help in population health planning and prevention [-]; however, rigorous testing and validation of the applications are critical before clinical implementation []. Similarly, using GenAI in remote care helps improve glycemia and weight loss [], yet challenges related to variable patient uptake and increased clinician participation necessitated by shared decision-making must be considered []. In radiology services, prediction models using deep learning and machine learning methods for predictive accuracy and as diagnostic aids have shown potential, and natural language processing has been used to improve readability by generating captions; however, studies report using high-quality images, highlighting the need for a future standardized pipeline for data collection and imaging detection.

In cardiology, AI analysis allows for early detection, population-level screening, and automated evaluation. It expands the reach of electrocardiography to clinical settings in which immediate interrogation of anatomy and cardiac function is needed and to locations with limited resources [-,,-,]. Nevertheless, there is evidence suggesting that integrating AI with patient data, including social determinants of health, enables disease prediction and early disease identification, which could lead to more precise and timely diagnoses, improving patient outcomes.

GenAI aids in diagnostic accuracy, although its focus on higher value creation in health care is limited. The articles in this review reported that they used deep learning (34/161, 21.1%) [,,,,,,,,,,,,,,,,,-], machine learning (9/161, 5.6%) [,,,,,-], and image analysis approaches of GenAI during the assistance process (13/161, 8.1%) [,,,,,,,,,,,,]. Knowledge access using GenAI has the potential to enable more options and flexibility in serving patients.

Evidence of GenAI Use for Guiding or Automation Services

Only 8.1% (13/161) of the studies provided insights into how GenAI is used to guide some services by seeking recommended treatment options, step-by-step instructions, or checklists to improve clinical services [,,,,-]. Of the 161 studies, 1 (0.6%) sought personalized treatment plans and discussed monitoring and managing service processes using GenAI []. Although this use category is nascent, GenAI can help provide speed, efficiency, and customized solutions in health services as in other contexts [,,].

Finally, only 4.3% (7/161) of the articles indicated the use of GenAI to automate any service functions that could minimize or eliminate human provider involvement. When used appropriately, automation provides a predictable, reliable, and faster experience everywhere, every time for all customers, which will be a standardized way to provide several health care services [,,-].

The use of GenAI in some instances of service automation and guidance may be in its infancy but is encouraging. Providers are trying to explore unique ways to use AI, which requires a set of steps such as understanding the current workflow, the changes needed or aspirational workflows, and then aligning or designing GenAI to help in that workflow. This is similar to modifying restaurant food delivery options to suit drive-in rather than sit-in service. Providers have more work to do to fully automate, streamline, or re-engineer service functions using GenAI in the future.

Summary of Findings

To summarize our findings, in this study, we conducted a systematic scoping review of the literature on how GenAI is used in clinical settings by synthesizing evidence on its application to assist, guide, and automate clinical activities and functions. Of the 161 articles, 141 (87.6%) reported using GenAI to assist services through knowledge access, collation, and filtering. The assistance of GenAI was used for disease detection (19/161, 11.8%), diagnosis (14/161, 8.7%), and screening processes (12/161, 7.5%) in the areas of radiology (17/161, 10.6%), cardiology (12/161, 7.5%), gastrointestinal medicine (4/161, 2.5%), and diabetes (6/161, 3.7%). Thus, we conclude that GenAI mainly informs and assists rather than guiding or automating service functions. Presumably, the potential of GenAI in clinical services is there, but it has yet to be actualized.

Robustness Check Using Additional Database Search

To ensure the comprehensiveness and robustness of our findings, we expanded the search to Web of Science using similar keywords and strategies (suggested by the review team). We used the same keyword, “artificial intelligence,” in all text fields over the sampling period between January 1, 2020, and November 27, 2023. Our search was restricted to peer-reviewed academic journal articles written in English. We used the Web of Science–provided “Highly Cited Papers” criterion as a filtering mechanism to follow influential papers. Given the nonclinical context of the journals in the database, we believe that filtering based on the article’s importance is reasonable. The initial search returned 1958 articles from the Web of Science Core Collection: 414 articles from 2023, 651 from 2022, 519 from 2021, and 374 from 2020. The search results were further reduced by removing PubMed articles for redundancy, resulting in 1221 articles.

Next, because Web of Science indexes medical, clinical, and nonmedical journals, we used simple keywords to filter out nonmedical and nonclinical contexts. We retained articles that mentioned the keywords “medical” and “health” in the abstract, which led to 133 articles. Finally, we read the abstracts and titles to exclude survey or meta-review and nonclinical studies. This process further narrowed down the selection to 51 relevant articles. Using ChatGPT-3.5 on November 27, 2023, we applied one-shot learning by providing the 3 class definitions. We asked ChatGPT-3.5 to classify each article’s abstract, resulting in 63% (32/51) in the assist category, 29% (15/51) in the guide category, and 8% (4/51) in the automate category. Diagnostic assistance articles dominated, similar to the results from PubMed. However, the other categories—prescriptive guidance and clinical service recommendations—were slightly higher. This difference may be explained by the nonmedical and broader clinical nature of the journals included in the database; their more “applied” orientation makes them more likely to explore prescriptive guidance and clinical service recommendation use cases.
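A rough sketch of the abstract keyword screen is given below; the CSV export and column names are hypothetical, and it assumes both keywords had to appear, which the text does not state explicitly.

```python
# Rough sketch (hypothetical export and column names) of the abstract keyword
# screen on the Web of Science records; assumes both keywords must appear.
import pandas as pd

wos = pd.read_csv("wos_records.csv")  # assumed export of the 1221 deduplicated records

mask = (
    wos["Abstract"].str.contains("medical", case=False, na=False)
    & wos["Abstract"].str.contains("health", case=False, na=False)
)
clinical_candidates = wos[mask]  # 133 articles reported in the study
print(len(clinical_candidates))
```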


Discussion

Principal Findings

This study asked RQs about how GenAI is used, with evidence, to shape health care services. It showed that 11.8% (19/161) of the studies were on automation and guidance, whereas 87.6% (141/161) reflected the assistance role of GenAI. These findings are essential for distinguishing between the optimism surrounding GenAI and its actual use in health care.

Study Implications

The aspiration that GenAI has the potential to change health care significantly needs a careful revisit. Health care organizations need to assess the actual ground use for GenAI and prepare for and understand the exciting possibilities with a cautious approach rather than overly high expectations. Concerns related to the cost, privacy, misuse, and regulatory aspects of implementing and using GenAI [-] will become more pronounced, particularly when there is a perceived overreliance without clear promising results or actual practical use [].

The literature synthesis in this study suggests that GenAI is mainly used for screening and diagnostic purposes through knowledge access; for diagnostic processes such as predicting disease outcomes, survival, or disease classification; and for improving the accuracy of diagnosis. This addresses the problem of making knowledge available and accessible in time, in a well-articulated manner, to provide or render services. It could help health care providers make more accurate and timely diagnoses, leading to earlier treatment and better patient outcomes. Such knowledge distillation improves diagnostic accuracy by providing physicians with sufficient knowledge during service encounters; however, it is not strongly oriented toward higher value creation in health care.

The research synthesis also suggests that there has been some use of GenAI during different steps and aspects of guiding the service delivery processes. Still, such use could be more encouraging and significant across the board. Plausibly, GenAI can analyze large amounts of disparate patient data to suggest personalized medicine, which may help inform treatment plans for individuals. Service delivery needs some guidance or step-by-step help to be efficient and to render the clinical service within the required time, which GenAI may provide. However, we have not yet found strong evidence of such use by any health system.

Currently, the automation of service functions using GenAI has seen only minimal instances and is yet to see widespread implementation. Automation helps offset some manual activities. Moreover, automation may improve service functions’ cost, efficiency, and flexibility while maintaining some standards across similar services.

Similarly, although we did not consider this area in the synthesis as it was out of the scope of services, GenAI can also be used in drug development and clinical trial pathways, a value proposition yet to be seen in practice. We do not discount that many laboratories and pharmaceutical companies have used machine learning and AI tools and techniques in drug development and clinical trials; however, reported commercial GenAI use has not yet come to the limelight.

Some other plausible uses of GenAI in health care include managing supply chain data, managing medical equipment assets, maintaining gadgets and equipment, and building a robust intelligent information infrastructure to support several other activities. For example, active efforts are being undertaken to incorporate GenAI, especially in administrative use cases such as the In Basket patient messaging applications. However, assessing the clinical accuracy of such tools remains a concern.

In addition, user-centered design and sociotechnical frameworks must be incorporated into designing and building GenAI for health care use cases. For instance, to prevent the common pitfall of developing models opportunistically (based on data availability or end-point labels), adopting a user-centered design framework is vital for GenAI tools []. Similarly, scientific or research-oriented use of GenAI for knowledge search, articulation, or synthesis is helpful []. However, how far that will translate to transformative clinical health care delivery processes while creating higher-order organizational capabilities to create value remains a concern [].

Limitations of the Study and Scope for Future Research

Several limitations and constraints affect the interpretation and generalizability of the findings of this study. Some of these limitations indicate the need for future research in relevant areas that we discuss further. First, the study’s findings were constrained by the availability of relevant, high-quality publications and by the exclusion of preprints and unpublished data, which kept the study within its specifically designed scope on the use of GenAI in health care clinical services but influences the comprehensiveness and accuracy of the review. There also might be a tendency for studies with positive or significant results to be published, leading to potential publication bias. In addition, harmful or neutral findings may not be adequately represented in the review, influencing the overall assessment of GenAI’s effectiveness in health care. Research should focus on patient-centered outcomes, including patient satisfaction and engagement and the impact of GenAI on the patient-provider relationship. Understanding the patient perspective is crucial for successfully integrating AI technologies into health care.

Second, the field of GenAI in health care is rapidly advancing, and new technologies and applications are continuously emerging. The findings of this study might not capture the most recent developments, and the conclusions of this study may become outdated quickly, specifically when some technologies have the potential to be adopted beyond institutional mechanisms, such as using GenAI mobile apps to scan images for retinopathy. Furthermore, an in-depth analysis of specific GenAI applications may open newer directions, and future research should focus on specific GenAI applications to provide detailed insights into their effectiveness and limitations. This could include applications such as diagnostic tools, treatment planning algorithms, and predictive analytics. GenAI in health care encompasses a wide range of heterogeneous applications, and this heterogeneity could make it challenging to draw overarching conclusions about GenAI’s impact on clinical services.

Third, this review may not comprehensively address ethical considerations and potential biases in the use of GenAI in health care. Ethical issues related to data privacy, algorithmic bias, and the responsible deployment of AI technologies may require more in-depth exploration. Future research should systematically explore the ethical considerations associated with GenAI use in health care. This includes issues related to data privacy, consent, transparency, and the ethical deployment of AI algorithms in clinical settings. Finally, more data, papers, articles, and longitudinal developments on some applications may enrich this study and enhance its current limited generalizability. Longitudinal studies are needed to track the impact of GenAI in health care over an extended period. This will help researchers understand the sustained effects, identify potential challenges that may arise over time, and assess the scalability and adaptability of these technologies.

Future studies could undertake comparative effectiveness research to assess how GenAI compares with traditional approaches in health care. Understanding the relative advantages and disadvantages will contribute to evidence-based decision-making. In addition, it is not clear what and how to measure the GenAI applications’ effectiveness in clinical services, leading to a call for standardized study metrics that can incorporate outcome measures and evaluation frameworks. Future research should investigate how the integration of GenAI into clinical health care services affects the workflow of health care providers. This includes understanding the time savings, challenges, and potential improvements in decision-making processes. By addressing these areas, future research can contribute to a more comprehensive understanding of the role, challenges, and potential benefits of GenAI in clinical health care services.

Actionable Policy and Practice Recommendations

The proliferation of technology often outpaces the development of appropriate regulatory and policy frameworks that are necessary for guiding proper dissemination. Our call is that, given that GenAI is still emerging, policy agencies and health care organizations should play a proactive role in guiding its use in health care.

What are some actionable steps for stakeholders, including health care organizations and policy makers, to navigate the integration of GenAI in health care? For health care organizations, the steps may include conducting a technology assessment vis-à-vis goals to achieve outcomes from GenAI. Evaluating the existing infrastructure and technological capabilities within the health care organization to determine readiness for GenAI integration is a first step. This will provide an understanding of the current state of technology and ensure that the necessary upgrades or modifications can be implemented to support GenAI applications, thus garnering the benefits of GenAI.

The second step is to invest in staff training and education through the development of training programs to enhance the skills of health care professionals in understanding and using GenAI technologies. Well-trained staff is essential for the effective and ethical implementation of GenAI, fostering a culture of continuous learning and adaptability. Third, health care organizations need to develop and communicate clear protocols and guidelines for the use of GenAI in different health care services, outlining ethical considerations, data privacy measures, and accountability standards. Transparent protocols help ensure the responsible and standardized use of GenAI, fostering trust among health care professionals and patients.

Fourth, health care organizations need to engage in research on GenAI through collaboration with research institutions and industry partners to participate actively in studies evaluating the effectiveness and impact of GenAI applications in specific health care domains. Involvement in research contributes to the evidence base, informs best practices, and positions the organization as a leader in health care innovation. Finally, as mentioned previously, we caution that GenAI should be integrated gradually rather than through rushed decisions. All health systems need to gradually plan and introduce GenAI technologies, starting with pilot programs in specific departments or use cases. Gradual integration allows for careful monitoring of performance, identification of potential challenges, and iterative improvement before broader implementation.

For policy makers, much work must be done at the regulatory framework level to better realize the potential of GenAI. Policy makers must establish clear and adaptive regulatory frameworks that address the unique challenges GenAI poses in health care, ensuring patient safety, data privacy, and ethical use. There is a concern that bias in GenAI algorithms could lead to discrimination in care delivery across patients, and the role of policy guidelines in training and using GenAI appropriately is critical. Policy frameworks must be developed to ensure lower risk, safe and ethical use, and responsible effectiveness of GenAI. Policy and industry partnerships among experts to determine relevant frameworks are vital to guide the future of GenAI in helping transform health care. Robust regulations will provide a foundation for the responsible and standardized integration of GenAI technologies.

An underlying challenge of GenAI is integrating it across different legacy IT systems, which involves developing and adopting interoperability standards to ensure seamless communication and data exchange between different GenAI applications and existing health care systems. Interoperability enhances efficiency, reduces redundancy, and facilitates the integration of diverse GenAI solutions. In this process, creating incentives for responsible innovation in ethical considerations and the continuous improvement of GenAI applications will drive a culture of responsibility and quality improvement, aligning technological advancements with societal needs.

Policy-level efforts also need to be oriented to allocate resources to enhance health care infrastructure, including robust connectivity and data storage capabilities, to support the data-intensive nature of GenAI applications. Adequate infrastructure is crucial for the reliable and secure functioning of GenAI in health care. Many of these enhancements may require collaboration between public health care systems, private organizations, and academia to leverage collective expertise and resources for GenAI research, development, and implementation. Finally, policies that address potential biases in GenAI applications and ensure equitable access to these technologies across diverse populations are necessary to help with proactive measures to prevent the exacerbation of existing health care disparities through the adoption of GenAI.

Conclusions

GenAI is both a tool and a complex technology. Complexity is the basis for GenAI, and thus, the use of GenAI in health care creates a set of unparalleled challenges. GenAI is costly to implement and integrate across all aspects of a health system []. In envisioning the future of GenAI in health care, we glimpse a transformative landscape in which technology and compassion converge for the betterment of humanity. As we stand at the intersection of innovation and responsibility, the prospect of GenAI holds immense promise in revolutionizing health care, shaping a future in which personalized, efficient, and equitable clinical services are not just aspirations but tangible realities. Our vision embraces a symbiotic relationship between technology and human touch, recognizing that the power of GenAI lies not only in its computational prowess but also in its potential to amplify the capabilities of health care professionals. Picture a world in which diagnostic accuracy is elevated, treatment plans are truly personalized, and each patient’s journey is marked by precision and empathy.

Crucially, this vision hinges on responsible adoption. We envisage a future in which regulatory frameworks ensure the ethical use of GenAI, safeguard patient privacy, and uphold the principles of equity. It is a future in which interdisciplinary collaboration flourishes, bridging the expertise of health care providers, policy makers, technologists, and ethicists to navigate the complexities of this evolving landscape.

In the future, the impact of AI on human lives will be profound. Patients experience a health care system that not only heals but also understands, a system in which the integration of GenAI contributes to quicker diagnoses, more effective treatments, and improved outcomes. The human experience is at the forefront—GenAI becomes a tool for health care professionals to better connect with patients and spend more time understanding their unique needs, fears, and hopes. As we embark on this journey, it is crucial to remember that the heart of health care lies in the compassion, empathy, and wisdom of its human stewards. GenAI catalyzes empowerment, freeing health care professionals from mundane tasks to engage in meaningful interactions. It fosters a health care culture in which technology serves humanity, and the collective mission is to enhance the quality of care and life.

In embracing this vision, we are not just architects of technological progress but also custodians of a future in which GenAI and human touch coalesce to redefine health care possibilities. Let our strides be guided by a commitment to responsible innovation, a dedication to inclusivity, and an unwavering focus on the well-being of those we serve. The future of GenAI in health care is not just a scientific evolution, but it is a narrative of healing; compassion; and a shared commitment to a healthier, more humane world. However, without enough evidence, we are skeptical about the current euphoria regarding GenAI in health care.

This systematic narrative review of the preliminary evidence of using GenAI in health care clinical services provides valuable insights into the evolving landscape of AI applications in health care. The existing literature synthesis reveals promising advancements and critical considerations for integrating GenAI into clinical settings. The positive evidence underscores the potential of GenAI to revolutionize health care by offering personalized treatment plans, enhancing diagnostic accuracy, and contributing to the development of innovative therapeutic solutions. The applications of GenAI in areas such as pathology assistance, oncology decision support, and medical imaging interpretation showcase its capacity to augment health care professionals’ capabilities and improve patient outcomes.

However, this review also highlights several limitations and challenges that warrant careful consideration. Issues such as the quality of available data, the rapid pace of technological evolution, and the potential for algorithmic bias highlight the complexities associated with adopting GenAI in health care. Ethical concerns, data privacy considerations, and the need for transparent guidelines underscore the importance of a thoughtful and measured approach to integration.

As we navigate the preliminary evidence, it becomes evident that a collaborative effort is required among health care organizations, policy makers, researchers, and technology developers. Establishing clear regulatory frameworks, fostering interdisciplinary collaboration, and prioritizing ethical considerations are crucial steps in ensuring the responsible deployment of GenAI. Addressing the identified limitations through targeted research initiatives, ongoing evaluation, and continuous improvement will be essential for maximizing the benefits of GenAI while mitigating potential risks.

Moving forward, it is imperative to recognize that integrating GenAI into health care is dynamic and evolving. Future research should focus on refining our understanding of the long-term impact, patient-centered outcomes, and scalability of GenAI applications. By collectively addressing the challenges outlined in this review, stakeholders can contribute to a health care landscape in which GenAI is a powerful ally in delivering personalized, efficient, and equitable clinical services.

JK expressly acknowledges the Health Administration Research Consortium at the Business School of the University of Colorado Denver for providing a platform for the stimulating discussion and insights on this topic. The authors acknowledge Mr Bhanukesh Balabhadrapatruni, graduate student fellow at the Health Administration Research Consortium, for assisting with data categorization and citation listing. AM thanks the participants from the Society of Physician Entrepreneurs for their input about artificial intelligence in health care. VP thanks Dr Ron Li at Stanford Medicine for insights and a stimulating discussion on this topic. We used the generative AI tool ChatGPT (version 3.5; OpenAI) for automatic coding and checking the accuracy of the human coding process used to categorize the articles reviewed and synthesized in this study [,].

JK is an associate editor of the Journal of Medical Internet Research.

Edited by A Castonguay; submitted 21.08.23; peer-reviewed by SH Kim, Y Wang, S Pesala; comments to author 19.09.23; revised version received 12.10.23; accepted 30.01.24; published 20.03.24.

©Dobin Yim, Jiban Khuntia, Vijaya Parameswaran, Arlen Meyers. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 20.03.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
