Pharmacovigilance is defined by the World Health Organization as “the science and activities relating to the detection, assessment, understanding, and prevention of adverse effects or any other drug-related problem” []. Pharmacovigilance plays a crucial role in ensuring the safety of medications and protecting patients’ health because it focuses primarily on identifying potential adverse drug reactions (ADRs) after medicinal products have been licensed and released to the public.
ADRs can range from mild and tolerable side effects to severe and life-threatening events, and they account for 5% to 7% of emergency department consultations []. Their public health impact is significant: estimates indicate that ADRs can prolong hospitalization stays in both outpatient (mean 9.2, SD 0.2 d) and inpatient (mean 6.1, SD 2.3 d) settings []. Typically, pharmacovigilance professionals analyze data from individual case safety report (ICSR) databases (such as the Food and Drug Administration Adverse Event Reporting System, maintained by the US Food and Drug Administration) to identify potential pharmacovigilance signals, namely potential causal relationships between an ADR and a drug. ICSRs are typically submitted by patients or by health care or pharmacovigilance professionals, and they are the main data source used today for pharmacovigilance. However, ICSR databases are subject to many biases, and underreporting has been identified as a major issue []. Moreover, such databases frequently lack information that could make a significant difference in the examination of a potential signal (eg, patients’ medical history). Hence, the early detection of potential pharmacovigilance signals by collecting and analyzing data from various sources is critical to prevent serious side effects as soon as possible.
The term “real-world data” (RWD) refers to data collected outside the controlled environment of clinical trials, such as electronic health records (EHRs), patient registries, insurance claims databases, and electronic prescription systems. There is a growing interest in using RWD for pharmacovigilance signal management to facilitate faster and more efficient postmarketing surveillance []. The significance of RWD in pharmacovigilance lies in its potential for representing longitudinal real-world patient experiences and health care practices that can provide insights into drug safety under real-life conditions. Analyzing RWD could also enrich and consolidate the existing knowledge on ADRs (eg, by detecting new confounders). For example, a federated RWD network was recently used to validate the value of RWD for pharmacovigilance signal management [].
To this end, the European Medicines Agency and the US Food and Drug Administration have established infrastructures to leverage RWD for drug safety purposes, namely the Data Analysis and Real World Interrogation Network (DARWIN) [] and the Sentinel Initiative [], respectively. RWD are also being actively investigated for purposes beyond drug safety (eg, epidemiology) []. It should be noted that although RWD could in principle provide a good overview of patients’ clinical course, two major challenges hinder their use: (1) these datasets typically come with significant data quality risks and usually contain a high proportion of null values and errors; and (2) legal, ethical, and regulatory issues (eg, patient privacy) make these data sources difficult to access.
Rationale
Artificial intelligence (AI) is widely acknowledged as a potentially transformative technology for supporting decisions in health care (eg, clinical decision support systems) because of its ability to efficiently process big data in search of useful information. AI can identify patterns and associations within large amounts of data (eg, RWD) that traditional statistical methods may struggle to extract because of the volume and complexity (eg, nonlinear relationships between variables) of the data. AI has been widely investigated for applications in health care (eg, personalized medicine) with promising results [,]; however, it is not yet widely applied in clinical practice. In the context of pharmacovigilance, AI could potentially support multiple aspects (eg, the identification of patient subpopulations who may be more vulnerable to specific ADRs), contributing to the vision of personalized drug safety management.
Objectives
The objective of this scoping review (SR) was to identify and characterize current research trends in the use of AI on structured RWD for pharmacovigilance and to identify relevant gaps.
The PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) [] methodology was applied. The PRISMA-ScR statement provides a road map for authors to describe the state of the art and the findings of the literature search more precisely, as well as to discuss the results.
Eligibility Criteria
Journal and conference articles written in English were selected if they focused on pharmacovigilance and reported the use of symbolic and nonsymbolic AI approaches applied to RWD, specifically EHRs, insurance claims databases, and administrative health data ().
Textbox 1. Inclusion and exclusion criteria for the scoping review.
Inclusion criteria
- Article type: research
- Language: English
- Data type: tabular
- Data analysis method: symbolic artificial intelligence (AI) and nonsymbolic AI
Exclusion criteria
- Article type: review and opinion articles
- Language: other
- Data type: image and text
- Data analysis method: statistical
Review and opinion articles were excluded from the final manuscript selection. Furthermore, research articles focusing on image and text data (eg, social media and clinical notes) were also excluded. In addition, AI methods focusing on the use of natural language processing (NLP), natural language understanding, image processing, or object detection were considered beyond the scope of this work.
A key issue that arose during this SR was the lack of a clear distinction between plain statistical methods and machine learning (ML) approaches because these 2 domains frequently overlap, and the 2 terms are sometimes used interchangeably. In this manuscript, we take the difference between AI and statistical methods to be that AI creates models that can “learn” from data during iterative training processes, whereas statistical methods deal with finding relationships between variables. Thus, we considered the iterative “learning” part of an algorithm as the key feature for classifying the algorithm as AI and ML. We excluded papers based on algorithms with no iterative “learning” scheme because we considered them part of the “plain statistical methods” approaches. Finally, we excluded papers that focused on adverse drug events related to medical devices.
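The working definition above can be made concrete with a minimal sketch. The data, model, and numbers below are entirely hypothetical and serve only to contrast a one-shot statistical calculation with an algorithm that iteratively “learns” from the same data:

```python
import math

# Toy (drug_exposed, adr_occurred) pairs -- entirely hypothetical.
data = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0), (1, 1), (0, 0)]

# "Plain statistical method": a one-shot risk-difference calculation,
# with no training loop.
exposed = [y for x, y in data if x == 1]
unexposed = [y for x, y in data if x == 0]
risk_difference = sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

# "AI/ML" under the working definition: logistic regression fitted by an
# iterative gradient-descent loop that repeatedly updates weights from data.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(500):  # the iterative "learning" scheme
    grad_w = grad_b = 0.0
    for x, y in data:
        p = 1 / (1 + math.exp(-(w * x + b)))  # predicted ADR probability
        grad_w += (p - y) * x
        grad_b += p - y
    w -= lr * grad_w / len(data)
    b -= lr * grad_b / len(data)

# On this toy dataset, both approaches recover the same risk contrast.
p_exposed = 1 / (1 + math.exp(-(w + b)))
p_unexposed = 1 / (1 + math.exp(-b))
print(round(risk_difference, 2), round(p_exposed, 2), round(p_unexposed, 2))
```

The distinction drawn in the review is procedural, not about the result: the fitted model and the closed-form statistic agree here, but only the former involves an iterative training process.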
Information Sources and Search Strategy
A search query was developed and executed on January 31, 2024, to include research articles from 2010 to 2024 exclusively from the MEDLINE scientific library, given that it is the oldest and largest repository of journal articles in the life sciences. Textbox 2 presents the query structure.
Textbox 2. Query structure ([pharmacovigilance terms with OR] AND [artificial intelligence (AI) terms with OR] AND [real-world data (RWD) terms with OR]).Pharmacovigilance (keywords relevant to known adverse drug reaction [ADR] categories, synonyms of drug safety, pharmacovigilance terminology, and known individual case safety report [ICSR] databases)
V OR “pharmacovigil*” OR “pharmaco-vigil*” OR “side effect*” OR “adverse reaction*” OR “Product Surveillance” OR “postmarket*” OR pharmacoepidemiol* OR pharmaco-epidemiol* OR “drug safety” OR “drug event*” OR “toxicit*” OR “drug reaction*” OR “adverse drug*” OR “allerg*” OR “post-market*” OR “post market*” OR vaccinovigil* OR vaccino-vigil* OR eudravigilance OR “individual case safety report*” OR ICSR OR VAERS OR FAERS OR AERS OR vigibase OR “adverse effect*” OR “adverse event*” OR hypersensitiv* OR “spontaneous report*” OR “yellow card” OR “yellow-card” OR ADR OR “personalized pharmacovigilance” OR “precision pharmacovigilance” OR “pharmacosurveillance” OR “pharmaco-surveillance”
AI (categories of AI, terms that are used in the development of an AI model, explainable and interpretable AI methods, and different AI architectures)
“artificial intelligence” OR AI OR “machine learning” OR ML OR “neural network*” OR NN* OR “deep learning” OR DL OR ontolog* OR “knowledge engineering” OR KE OR reasoning OR inference OR “semantic web” OR “OWL” OR “Web Ontology Language” OR SWRL OR “RDF” OR “Resource Description Framework” OR “prediction” OR “estimation” OR “XAI” OR “SHAP” OR “Shapley value” OR “LIME” OR “Local Interpretable Model-agnostic Explanations” OR “DeepSHAP” OR “DeepLIFT” OR “CXplain” OR “Explainable Artificial Intelligence” OR “Explainable machine learning” OR “Interpretable artificial intelligence” OR “Interpretable machine learning”
RWD or real-world evidence (categories of RWD and data models that are used to store RWD)
“Real World Evidence” OR “Real World Data” OR RWE OR RWD OR “Observational Medical Outcomes Partnership” OR “OMOP” OR “Electronic Healthcare Record*” OR “EHR” OR “Electronic Medical Record*” OR “EMR*” OR EHDEN OR OHDSI OR i2b2 OR Sentinel OR DARWIN OR “Data Analysis and Real World Interrogation Network” OR administrative OR claim* OR “Observational Health Data Sciences and Informatics” OR “European Health Data Evidence Network” OR “multimodal data” OR “multimodal drug data” OR “multidimensional data” OR “multidimensional drug data” OR “multi-modal data” OR “multi-modal drug data” OR “multi-dimensional data” OR “multi-dimensional drug data”
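The composition rule in the Textbox 2 caption (each concept block OR-joined internally, the three blocks AND-joined) can be sketched programmatically. The helper and the abbreviated term lists below are illustrative, not the full query:

```python
def or_block(terms):
    """Join a list of search terms into a single parenthesized OR block."""
    return "(" + " OR ".join(terms) + ")"

# Abbreviated excerpts of the three concept blocks from Textbox 2.
pharmacovigilance_terms = ['"pharmacovigil*"', '"drug safety"', '"adverse drug*"']
ai_terms = ['"artificial intelligence"', '"machine learning"', '"deep learning"']
rwd_terms = ['"Real World Data"', '"Electronic Healthcare Record*"', 'claim*']

# Blocks are OR-joined internally and AND-joined with each other.
query = " AND ".join(
    or_block(t) for t in [pharmacovigilance_terms, ai_terms, rwd_terms]
)
print(query)
```

Assembling the query this way makes the intersection logic explicit: a retrieved article must match at least one term from every block.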
Selection Process
The initial phase (phase 1) focused on screening the titles and abstracts of the articles retrieved from the search query () to map those that potentially met our inclusion criteria and exclude irrelevant studies using the Rayyan tool (Rayyan Systems Inc) []. Rayyan is an AI-based tool designed to facilitate remote collaboration among researchers conducting systematic literature reviews. The platform gathers the titles and abstracts of all articles selected for the study, and reviewers evaluate the eligibility (ie, “include,” “exclude,” or “maybe”) of every article against the review’s objectives in blind mode, that is, each reviewer assesses the articles without prior knowledge of the other reviewers’ decisions. We resolved any conflicts that arose during this process through consensus meetings involving all reviewers.
The second phase focused on the full-text review of the papers selected during phase 1 to decide on the final set for inclusion in this study. In the full-text review of the studies selected based on titles and abstracts, we excluded research papers that did not meet ≥1 of the inclusion criteria (ie, strong focus on AI, RWD, and pharmacovigilance) as well as studies that met the exclusion criteria (eg, studies related to image and text data or those following only statistical approaches).
Data-Charting Process
A standard data extraction form was used to obtain an overview of the 36 selected studies (Tables S1 and S2 in ). For each study, we extracted information about the authors; journal name (where the study was published); publication year; country of origin (where the study was conducted); the objective of the study; types of organizations that participated in the study (based on the authors’ affiliations); and key findings related to the scoping review question, which are described in the next subsection (Data Collection Process and Mapping). Any inconsistencies were discussed and resolved among the reviewers.
Data Collection Process and Mapping
The selected studies were further analyzed and mapped against evaluation criteria using a spreadsheet. The main categories of mapping criteria were as follows: pharmacovigilance objectives (drug safety core activities and drug safety special topics), data provenance (data source categories and data sources), countries of origin, AI algorithm categories, data preprocessing methods, the use of explainable AI (XAI) methods, code availability, the use of models in clinical practice, ethical AI, and so on. Table 2 presents a detailed description of the mapping criteria.
Table 1. The categories and subcategories used in the risk-of-bias assessment of included studies designed to characterize artificial intelligence (AI) studies on structured real-world data in pharmacovigilance.
Category of bias: Selection bias
Explanation: The bias that occurs when the input data of an AI model underrepresent the target population
Subcategories: Underrepresentation of certain demographic groups

To effectively map the risk of bias in each included study, we considered selection, measurement, temporal, implicit, confounding, and automation biases. Furthermore, we translated these categories into more specific subcategories according to our study ().
Synthesis Methods
The mapping strategy was designed based on the 3 main pillars of the objective; in addition, we included general information about the research papers (). Furthermore, we included free-text fields in the mapping Microsoft Excel file to capture significant extra details that could not be easily classified. These fields included “objective,” “methods,” “assessment,” and “interesting results.” The criteria encompassed specific attributes (eg, drug safety core activities) that were defined based on previous experience of conducting an SR in the field [] and key aspects of interest identified during the review.
Furthermore, in terms of ethical AI, the included studies were evaluated against the trustworthy AI guidelines for solutions in medicine and health care from the FUTURE (Fairness, Universality, Traceability, Usability, Robustness, and Explainability)-AI initiative []. These guidelines are separated into 7 categories (fairness, universality, traceability, usability, robustness, explainability, and a general category). For our evaluation procedure, we included only the highly recommended subcategories from each of the 7 main categories for proof of concept (low technology readiness levels for ML models) []. Table 3 presents the selected criteria and their descriptions.
Table 2. Mapping criteria architecture for each different category in the query. There are 2 types of criteria: textual and binary (yes or no).

General information
- PubMed and MEDLINE ID (number): ID number of the article
- Authors (text): List of authors
- Title (text): Article title
- Journal (text): Journal in which the article was published
- Year published (number): Year of article publication
- Types of organizations (text): Types of organizations based on the authors’ affiliations; possible values: health care, government, academia, industry, and pharmacovigilance monitoring
- Country (text): Country where the research was conducted based on the authors’ affiliations

Pharmacovigilance
- Drug safety core activities (text): Possible values: ADEa detection, ADE monitoring, ADE prevention, ADE assessment, ADE information collection, and ADE reporting
- Drug safety special topics (text): Possible values: comparative drug analysis, drug interactions, MoAb identification and analysis, personalized drug safety, signal detection, specific (class of) disease, specific (class of) drugs, specific adverse effect, and vaccine safety
- Drug (text): Drugs examined in the research papers
- Reaction (text): Reactions examined in the research papers
- Indication (text): Indications examined in the research papers
- Reference terminologies (text): Known health informatics terminologies detected in the research papers

AIc
- AI categories (text): Possible values: nonsymbolic AI and symbolic AI
- Nonsymbolic AI (text): Possible values: classification and regression
- Classification (text): Possible values: random forest, logistic regression, artificial neural network, XGBoostd, support vector machine, decision tree, knowledge graph, k-nearest neighbors, gradient boost, naïve Bayes, random survival forest, and extra tree
- Regression (text): Possible values: logistic regression, linear regression, LASSOe, and regularized Cox regression
- Data preprocessing type (text): Possible values: dimensionality reduction, feature engineering, null imputation, and data cleansing
- Data cleansing (text): Possible values: data normalization and removal of null values
- Feature engineering (text): Possible values: one-hot encoding, binning, splitting, and calculated features
- Null imputation (text): Possible values: regression or classification imputation
- Explainable AI methods (text): Possible values: LIMEf and SHAPg
- Knowledge representation formalism (text): Possible values: OWLh and RDFi
- Knowledge engineering core activities (text): Possible values: knowledge extraction, knowledge integration, and knowledge representation

Real-world data
- Data source categories (text): Possible values: ADE databases, clinical narratives, clinical trials drug information databases, drug regulation documentation, EHRsj, genetics and biochemical databases, spontaneous reporting systems, dispensing records from pharmacies, and administrative claims data
- Data source or sources (text): Possible values: proprietary closed data sources (eg, specific hospital EHR), FAERSk, SIDERl, SMILESm, UK Biobank, Osteoarthritis Initiative dataset, PharmGKBn, TwoSIDES, EU-ADRo reference set, Stockholm Electronic Patient Record Corpus, MIMICp, OMIMq, DisGeNetr, and AEOLUSs
- Data model (text): Possible values: OMOP-CDMt, Sentinel, and custom

Evaluation criteria
- Code availability (text): Availability of the code in an open registry; possible values: yes and no
- Data preprocessing: Information about the data preprocessing procedures; possible values: yes and no
- Clinical use: Information about the evaluation of the produced work pipeline in clinical environments; possible values: yes and no

aADE: adverse drug event.
bMoA: mechanism of action.
cAI: artificial intelligence.
dXGBoost: extreme gradient boosting.
eLASSO: least absolute shrinkage and selection operator.
fLIME: local interpretable model-agnostic explanations.
gSHAP: Shapley additive explanations.
hOWL: Web Ontology Language.
iRDF: resource description framework.
jEHR: electronic health record.
kFAERS: Food and Drug Administration Adverse Event Reporting System.
lSIDER: Side Effect Resource.
mSMILES: Simplified Molecular Input Line Entry System.
nPharmGKB: Pharmacogenomics Knowledge Base.
oEU-ADR: European Union Adverse Drug Reaction.
pMIMIC: Medical Information Mart for Intensive Care.
qOMIM: Online Mendelian Inheritance in Man.
rDisGeNet: gene-disease association network.
sAEOLUS: Adverse Event Open Learning through Universal Standardization.
tOMOP-CDM: Observational Medical Outcomes Partnership Common Data Model.
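Two of the data preprocessing criteria in Table 2 (one-hot encoding under feature engineering, and null imputation) can be illustrated with a minimal sketch. The patient records, feature names, and the simple mean imputation below are hypothetical; as the criterion notes, a regression or classification model could replace the mean:

```python
# Hypothetical structured RWD records; one creatinine value is missing.
records = [
    {"sex": "F", "age": 70, "creatinine": 1.1},
    {"sex": "M", "age": 64, "creatinine": None},  # null value to impute
    {"sex": "F", "age": 58, "creatinine": 0.9},
]

# One-hot encoding: expand the categorical "sex" feature into binary indicators.
categories = sorted({r["sex"] for r in records})
for r in records:
    for c in categories:
        r[f"sex_{c}"] = 1 if r["sex"] == c else 0
    del r["sex"]

# Null imputation: replace the missing value with the observed mean
# (regression or classification imputation would fit a model instead).
observed = [r["creatinine"] for r in records if r["creatinine"] is not None]
mean_value = sum(observed) / len(observed)
for r in records:
    if r["creatinine"] is None:
        r["creatinine"] = mean_value

print(records[1])
```

After these steps, every record is fully numeric and complete, which is the form most of the classification and regression algorithms in Table 2 expect as input.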
Table 3. Detailed description of the FUTURE (Fairness, Universality, Traceability, Usability, Robustness, and Explainability)-AI highly recommended and proof-of-concept machine learning guidelines used for this study, along with a general category (Table S3 in Multimedia Appendix 1).

Fairness
- Define sources of bias: Identification of possible types and sources of bias for the AIa tool during the design phase (eg, sex, gender, age, ethnicity, socioeconomics, geography, comorbidities or disability of patients, and human biases during data labeling)

Universality
- Define clinical settings: Specification of the clinical settings in which the AI tool will be applied (eg, primary health care centers, hospitals, home care, low- vs high-resource settings, and 1 country or multiple countries)
- Evaluate using external data: Testing of the developed AI model on an external dataset with different characteristics from the training set

Traceability
- Provide documentation (eg, technical and clinical): Creation of documentation files that provide technical (eg, public repositories) and clinical information (eg, bias of the model based on its use)

Usability
- Define user requirements: Specification of the model’s use by health care professionals

Robustness
- Define sources of data variation: Specification of variation in data sources that may impact the AI tool’s robustness in the real world (differences in equipment, technical faults of machines, data heterogeneities during data acquisition or annotation, or adversarial attacks)
- Train with representative data: Data for the training process should represent the population based on the case study for which the AI model has been developed
- Evaluate and optimize robustness: Risk mitigation measures should be implemented to optimize the robustness of the AI model, such as regularization, data augmentation, data harmonization, or domain adaptation

Explainability
- Define explainability needs: Use of interpretable or explainable models

General
- Engage interdisciplinary stakeholders throughout the AI lifecycle
- Implement measures for data privacy and security
- Define adequate evaluation plan (eg, datasets, metrics, and reference methods)

aAI: artificial intelligence.
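The per-study FUTURE-AI evaluation can be sketched as a simple score: the percentage of selected criteria a study satisfies, matching the score thresholds discussed in the Evaluation Results. The criterion identifiers and the example assessment below are hypothetical, for illustration only:

```python
# Selected highly recommended FUTURE-AI criteria (names are illustrative
# shorthand for the recommendations in Table 3).
criteria = [
    "define_sources_of_bias",       # Fairness
    "define_clinical_settings",     # Universality
    "evaluate_external_data",       # Universality
    "provide_documentation",        # Traceability
    "define_user_requirements",     # Usability
    "define_data_variation",        # Robustness
    "train_representative_data",    # Robustness
    "optimize_robustness",          # Robustness
    "define_explainability_needs",  # Explainability
]

def future_ai_score(assessment):
    """Percentage of the selected FUTURE-AI criteria met by one study."""
    met = sum(1 for c in criteria if assessment.get(c, False))
    return 100 * met / len(criteria)

# Hypothetical assessment of a single included study.
example_study = {c: False for c in criteria}
example_study.update({
    "define_sources_of_bias": True,
    "evaluate_external_data": True,
    "provide_documentation": True,
})
print(round(future_ai_score(example_study)))  # 3 of 9 criteria met
```

A study meeting 3 of 9 criteria scores about 33%, which under the thresholds used later in this review would fall below the 50% mark.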
Reporting Risk-of-Bias Assessment
The selection of only studies written in English and the exclusion of AI studies focused on text mining or NLP, image processing, and statistical analysis could be identified as potential risks of bias. Furthermore, the selection of papers only from the MEDLINE database could be a potential source of bias because it may lead to the omission of papers indexed in other databases (eg, AI databases).
The PubMed search query originally returned 4264 studies. During the abstract and title screening process (phase 1), we selected 93 (2.18%) of the 4264 articles for full-text screening (phase 2). During phase 2, 36 (39%) of these 93 research papers were selected based on the inclusion criteria. The PRISMA-ScR flowchart () presents a detailed overview of the selection procedure. The PRISMA-ScR checklist is presented in .
Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart. ADR: adverse drug reaction; AI: artificial intelligence; NLP: natural language processing.

Study Characteristics
The included studies were published between 2015 and 2023, with a notable increase in the number of studies after 2019 ().
Of the 36 studies, 19 (53%) originated from the United States, 4 (11%) from South Korea, and 4 (11%) from the United Kingdom, while the rest of the studies (n=9, 25%) were distributed across a variety of other countries ().
Most of the studies (30/36, 83%) were conducted by academic institutions ().
Table 4. The distribution of studies through the years (n=36).
Year: Studies, n (%)
2015: 1 (3)
2016: 2 (5)
2017: 3 (8)
2018: 2 (5)
2019: 2 (5)
2020: 4 (11)
2021: 10 (29)
2022: 7 (20)
2023: 5 (14)

Table 5. Country of origin of the included studies (n=36)a.
Country: Studies, n (%)
United States: 19 (53)
South Korea: 4 (11)
United Kingdom: 4 (11)
Canada: 3 (8)
Sweden: 3 (8)
China: 3 (8)
France: 3 (8)
Australia: 2 (6)
Netherlands: 2 (6)
Bangladesh: 1 (3)
Israel: 1 (3)
Belgium: 1 (3)
Denmark: 1 (3)
Taiwan: 1 (3)
Ireland: 1 (3)
Switzerland: 1 (3)
aIncludes studies conducted in multiple countries.

Table 6. Types of organizations that participated in the included studies (n=36)a.
Organization: Studies, n (%)
Academia: 30 (83)
Health care: 9 (25)
Industry: 6 (17)
Government: 2 (6)
Regulatory bodies: 1 (3)
aIncludes studies that involved >1 type of organization.
In terms of AI, of the 36 studies, 34 (94%) applied only nonsymbolic AI, and 1 (3%) used only symbolic AI, while 1 (3%) study combined the symbolic and nonsymbolic AI technical paradigms. Of the 34 nonsymbolic AI articles, 29 (85%) used classification tasks, whereas 3 (9%) selected regression algorithms, 3 (9%) applied causality algorithms (causal inference: n=2, 67%; causal discovery: n=1, 33%), and only 1 (3%) applied an association rule mining technique. The association rule mining study [] followed a mathematical framework called formal concept analysis to create association rules between drugs and phenotypes to detect possible ADRs. Moreover, of the 29 studies that used classification tasks, 6 (21%) used XAI techniques, of which 4 (67%) used Shapley additive explanations, 1 (17%) used local interpretable model-agnostic explanations, and 1 (17%) tested both approaches.
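The Shapley additive explanations used by most of the XAI studies attribute a model’s output to its input features using Shapley values from cooperative game theory; production tools such as SHAP approximate these values, but for a handful of features they can be computed exactly by enumerating feature coalitions. The toy ADR risk model below is hypothetical and is not taken from any included study:

```python
from itertools import combinations
from math import factorial

def toy_risk(features):
    """Hypothetical ADR risk model over a set of present risk factors."""
    score = 0.1  # baseline risk
    if "drug_A" in features:
        score += 0.3
    if "renal_impairment" in features:
        score += 0.2
    if "drug_A" in features and "renal_impairment" in features:
        score += 0.25  # interaction term
    return score

def shapley(player, players):
    """Exact Shapley value of one feature: its marginal contribution to
    the model output, averaged over all coalitions of the other features."""
    n = len(players)
    others = [p for p in players if p != player]
    value = 0.0
    for k in range(n):
        for coalition in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            value += weight * (toy_risk(set(coalition) | {player})
                               - toy_risk(set(coalition)))
    return value

players = ["drug_A", "renal_impairment", "age_over_65"]
attributions = {p: round(shapley(p, players), 3) for p in players}
print(attributions)  # drug_A ≈ 0.425, renal_impairment ≈ 0.325, age_over_65 = 0.0
```

Note the efficiency property: the three attributions sum to the difference between the full model output and the baseline, and the interaction term is split evenly between the two interacting features, while the unused feature receives exactly zero.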
Regarding RWD (), of the 36 articles, 28 (78%) focused on the use of EHRs (from local hospital databases), 4 (11%) used data from pharmacy dispensing records, and 3 (8%) used administrative claims data, while 2 (6%) focused on patient registries and 1 (3%) on insurance claims. In addition, a variety of other sources were used, including RWD such as drug information databases (3/36, 8%), spontaneous reports (3/36, 8%), adverse drug event databases (2/36, 6%), electronic prescription data (2/36, 6%), and genetics and biochemical databases (1/36, 3%).
Of the 36 studies, 23 (64%) used AI for ADR detection, 4 (11%) examined ADR assessment, 2 (6%) focused on ADR monitoring, 7 (19%) investigated ADR prevention, and 2 (6%) used AI to collect information about ADRs ().
Table 7. Variety of data used in the development of artificial intelligence models in the included studies (n=36)a.
Type of database: Studies, n (%)
EHRsb: 28 (78)
Drug information databases: 4 (11)
Dispensing records from pharmacies: 4 (11)
SRSsc: 3 (8)
Administrative claims data: 3 (8)
Patient registries: 2 (6)
Electronic prescription data: 2 (6)
ADEd databases: 2 (6)
Insurance claims: 1 (3)
aIncludes studies that involved multiple databases.
bEHR: electronic health record.
cSRS: spontaneous reporting system.
dADE: adverse drug event.

Table 8. Description of pharmacovigilance core activities in the included studies (n=36)a.
Pharmacovigilance core activities: Studies, n (%)
ADRb detection: 23 (64)
ADR prevention: 7 (19)
ADR assessment: 4 (11)
ADR monitoring: 2 (6)
ADR information collection: 2 (6)
aIncludes studies that examined multiple pharmacovigilance core activities.
bADR: adverse drug reaction.
The classification studies (29/36, 81%; ) tested several AI techniques, with random forest (RF) being the most frequently used algorithm (17/29, 59%). In contrast, the regression studies (3/36, 8%) developed AI models only with extreme gradient boosting (1/3, 33%) and logistic regression (2/3, 67%).
Finally, for the evaluation of AI models (), most of the studies (24/36, 67%) reported the area under the receiver operating characteristic curve (AUROC) as the primary metric.
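The AUROC reported by most included studies has a useful probabilistic reading: it is the probability that a randomly chosen positive case (ADR) receives a higher predicted score than a randomly chosen negative case. A minimal sketch of this pairwise formulation, with hypothetical scores, is:

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positive and negative cases.
    Ties count as half a win. O(n^2), which is fine for a sketch."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # hypothetical predicted ADR risks
labels = [1,   1,   1,    0,   0,   0  ]  # 1 = ADR occurred
print(auroc(scores, labels))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation; production work would typically use a library implementation (eg, scikit-learn’s `roc_auc_score`) rather than this quadratic sketch.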
Of the 36 studies, 32 (89%) investigated specific drug safety topics: 16 (50%) on specific adverse effects, 14 (44%) on specific class of drugs, 8 (25%) on specific (class of) diseases, 6 (19%) on signal detection, 3 (9%) on drug interactions, 2 (6%) on personalized drug safety, and 1 (3%) on vaccine safety ().
Table 9. The types of artificial intelligence (AI) algorithms used in the models developed in the included studies (n=36).
AI models and algorithms: Studies, n (%)
Classification (n=29)a
aIncludes studies that involved multiple AI algorithms.
bXGBoost: extreme gradient boosting.
cLASSO: least absolute shrinkage and selection operator.
dNo algorithms.

Table 10. Evaluation metrics of artificial intelligence (AI) models developed in the included studies (n=36)a.
AI model evaluation metrics: Studies, n (%)
Area under the receiver operating characteristic curve: 24 (67)
Accuracy: 10 (28)
F1-score: 8 (22)
Precision: 11 (31)
Recall: 10 (28)
Negative predictive value: 6 (17)
Sensitivity: 9 (25)
Specificity: 7 (19)
Other (≤2): 20 (56)
aIncludes studies that involved multiple model evaluation metrics.

Table 11. Specialized pharmacovigilance topics presented in the included studies (n=36)a.
Specialization of pharmacovigilance topics: Studies, n (%)
Specific adverse effect: 16 (50)
Specific (class of) drugs: 14 (44)
Specific (class of) disease: 8 (25)
Signal detection: 6 (19)
Drug interactions: 3 (9)
Personalized drug safety: 2 (6)
Vaccine safety: 1 (3)
aIncludes studies that examined multiple pharmacovigilance topics.
Table 12 presents the diversity of the data sources used in the included studies. Of the 36 studies, 29 (81%) chose proprietary closed data sources (eg, specific hospital EHRs) for their experiments. Along with EHR data, other data sources were also used (eg, the Food and Drug Administration Adverse Event Reporting System and the Side Effect Resource). Of the 36 studies, 2 (6%) selected the Stockholm Electronic Patient Record Corpus. The remaining RWD sources (the Medical Information Mart for Intensive Care and the Osteoarthritis Initiative dataset) are represented in only 2 (6%) of the 36 studies (n=1 study for each database).
In terms of data models, of the 36 studies, 27 (75%) used proprietary data models, 3 (8%) did not mention any data model, 5 (14%) used the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), and 1 (3%) used the Sentinel model ().
Figure 3 presents the case studies examined in the included articles. Notably, a substantial number of studies (20/36, 55%) did not focus on specific ADR case studies. Another significant outcome is the diversity of case studies; the articles do not focus on a specific drug, indication, or reaction. Chemotherapy drugs and their associated reactions in various types of cancers emerge as slightly more prominent categories in this review ().
Table 12. Variety of data sources used in the included studies (n=36)a.
Data sources: Studies, n (%)
Proprietary closed data sources: 29 (81)
Other: 12 (33)
SIDERb: 3 (8)
FAERSc: 2 (6)
Stockholm Electronic Patient Record Corpus: 2 (6)
aIncludes studies that used multiple data sources.
bSIDER: Side Effect Resource.
cFAERS: Food and Drug Administration Adverse Event Reporting System.

Table 13. Included studies’ distribution based on the data models in which the data are stored (n=36).
Data models: Studies, n (%)
Custom: 27 (75)
OMOP-CDMa: 5 (14)
Unknown: 3 (8)
Sentinel: 1 (3)
aOMOP-CDM: Observational Medical Outcomes Partnership Common Data Model.

Figure 2. Association pathways between artificial intelligence models, data sources, and drug safety categories in the included studies. ADE: adverse drug event; EHR: electronic health record; LASSO: least absolute shrinkage and selection operator; SRS: spontaneous reporting system; XGBoost: extreme gradient boosting.

Figure 3. Drugs, indications, and reactions in the included studies. (A) Drugs. (B) Indications. (C) Reactions.

Although most of the studies (21/36, 58%) used complex AI algorithms (black boxes), such as RF (an ensemble method) and artificial neural networks (ANNs), to construct their prediction models in all ADR categories, many studies (15/36, 42%) used simple interpretable ML approaches such as logistic regression. Moreover, it is important to highlight that all studies worked on EHR databases, except for the adverse drug event assessment category, in which we detected a single study with a vaccine database.
RWD databases were also used alongside other types of data; for example, EHRs were mostly combined with spontaneous reporting systems and drug information databases, vaccine data with adverse drug event databases, and administrative claims data with spontaneous reporting systems. Furthermore, some of the studies (3/36, 8%) integrated different types of observational data to develop AI models, combining pharmacy dispensing records with EHRs and administrative claims data.
Evaluation Results
Only 3 (8%) of the 36 studies included in this SR openly provided their code. In addition, only 16 (44%) of the 36 studies included a detailed description of their data preprocessing pipelines for RWD. Moreover, just 4 (11%) of the 36 studies evaluated their methodology within a clinical environment ().
Table 14. Summary of code availability, data preprocessing, and clinical validation evaluation criteria (n=36).
Evaluation metrics: Studies, n (%)

In terms of trustworthy AI, only 5 (14%) of the 36 studies scored <50% on the Fairness, Universality, Traceability, Usability, Robustness, and Explainability–AI (FUTURE-AI) criteria ( and ). Among the studies that achieved scores of >75% [-], 3 (75%) of 4 used external data to evaluate their models, addressing the Universality criterion (Table S4 in ).
Table 15. Distribution of included studies according to the FUTURE (Fairness, Universality, Traceability, Usability, Robustness, and Explainability)-AI guidelines (n=36)a.
aIncludes studies that fell into multiple FUTURE-AI and general categories.
Table 16. Evaluation of articles included in the scoping review based on the Fairness, Universality, Traceability, Usability, Robustness, and Explainability–AI (FUTURE-AI) and the code availability, data preprocessing, and clinical validation criteria.
Columns: Study; Year; Code availability; Data preprocessing; Clinical use; FUTURE-AI