Machine learning has grown in popularity in clinical settings, driven by the widespread adoption of electronic health records [-] combined with increasing data storage and computational capacity []. In this setting, machine learning can serve multiple purposes, including (1) facilitating diagnoses, as in pathology [,] and radiology []; (2) predicting outcomes for risk stratification; and (3) improving resource utilization by anticipating volumes of patients or services []. However, despite the initial enthusiasm around machine learning in health care, domain experts have expressed caution [,]. Similar information technology solutions have often failed to be implemented or to provide utility [].
An important consideration affecting utility is the choice of clinical setting and problem in which a machine learning model is to be implemented []. A machine learning model’s predictions need to augment current approaches in a way that is meaningful and actionable without introducing excessive burden. Careful planning of a machine learning model’s implementation is important because the costs of model deployment are considerable. Such costs may include the resources required to develop and maintain the machine learning model, training of the intended users in how to access and interpret the model’s predictions, and support to help users implement the results in practice [,].
Given these costs, a systematic approach for determining which machine learning models should be prioritized for implementation into clinical practice may be valuable. In determining priorities, it would be important to involve key stakeholders at the institution in which deployment is planned. We chose to survey 2 pediatric centers, 1 in the United States with a more established biomedical informatics program, and 1 in Canada with a less established biomedical informatics program, to gain insight into whether experience and expertise affected preferences for machine learning model prioritization. Consequently, the primary objective was to determine the health care attributes respondents at 2 pediatric institutions rate as important when prioritizing machine learning model implementation. The secondary objective was to describe their perspectives on machine learning model implementation using a qualitative approach.
This was a mixed methods study that included a quantitative and a qualitative component. The institutions were The Hospital for Sick Children (SickKids) in Toronto, Ontario, Canada, and Lucile Packard Children’s Hospital in Palo Alto, California, United States.
Participants
We included health system leaders, physicians, and data scientists at SickKids and Lucile Packard Children’s Hospital at the time of survey distribution. We excluded trainees.
Procedures
The survey was developed by the study team based on their impression of health care attributes respondents might consider to be important; the machine learning–focused questions are presented as . Potential participants were identified through organizational emailing lists. The quantitative survey was distributed by email and participants completed the survey in REDCap []. The survey asked respondents to indicate whether they were health system leaders, physicians, or data scientists; respondents could indicate multiple categories. Demographic variables included clinical specialty (if applicable), years employed following completion of training, and gender.
We then asked about their knowledge of artificial intelligence on a 5-point Likert scale ranging from 1 (no knowledge at all) to 5 (a lot of knowledge). We asked them to rate their understanding of how machine learning models are built and interpreted, and how statistics are conducted and interpreted, using 5-point Likert scales ranging from 1 (no understanding) to 5 (fully understand). We asked if they had decision-making ability to implement artificial intelligence initiatives within their work environment, and how many machine learning models had been deployed at their institutions in the last 5 years.
The next section asked respondents to rank the following 5 clinical problem and implementation consequence attributes in terms of whether machine learning implementation would be useful: “the clinical problem being solved is common,” “the clinical problem causes substantial morbidity or mortality,” “risk stratification would lead to different clinical actions that could reasonably improve patient outcomes,” “implementing the model could reduce physician workload,” and “implementing the model could save money.” Important attributes were defined as those ranked as most important or second most important (rank of 1 or 2) by respondents. The survey then asked 2 open-ended questions focused on clinical areas where being able to accurately predict an outcome might be useful, and clinical areas in which prioritization or reorganization of waitlists might be useful. Finally, the survey asked whether they would be willing to participate in a qualitative interview.
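The definition of an important attribute above reduces, for each attribute, to the proportion of respondents ranking it 1 or 2, alongside a median importance score. As an illustrative sketch only (the function name and sample rankings are hypothetical, not the study's code):

```python
from statistics import median

def summarize_attribute(ranks):
    """Given each respondent's rank (1 = most important) for one
    attribute, return the proportion who ranked it 1 or 2
    ("important") and the median importance score (lower = more
    important)."""
    n_important = sum(1 for r in ranks if r <= 2)
    return n_important / len(ranks), median(ranks)

# Hypothetical rankings of one attribute by five respondents
prop, med = summarize_attribute([1, 2, 4, 1, 3])
print(prop, med)  # 0.6 2
```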
For the qualitative aspect, we purposively sampled respondents to maximize variation by institution and self-rated understanding of machine learning. Semistructured interviews were conducted using Zoom (Zoom Video Communications, Inc.) or Microsoft Teams by a member of the SickKids team (EP) with expertise in the conduct of qualitative interviews. Respondents were asked to list 3 scenarios in which a machine learning model for risk stratification could be useful and then to state which scenario was the most important to implement first and the rationale for the choice. They were then asked how they would feel about using a machine learning model for risk stratification as opposed to their current approach, and to describe concerns they had about using a machine learning model to guide patient care. The interviews were recorded and transcribed verbatim.
Analysis
The data from the quantitative survey from SickKids and Lucile Packard Children’s Hospital were compared using the Fisher exact test. Analyses were performed in R version 3.6.1 (R Core Team) using RStudio [,].
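The Fisher exact test compares the 2x2 site-by-response contingency table directly. As a hedged illustration only (the study's analyses were run in R; this standalone Python sketch is our own), the two-sided p-value can be computed by summing hypergeometric probabilities:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table
    [[a, b], [c, d]]: sum the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the
    observed table."""
    r1, r2 = a + b, c + d          # row totals
    c1, n = a + c, a + b + c + d   # first column total, grand total
    denom = comb(n, c1)
    def prob(k):                   # P(top-left cell = k) given the margins
        return comb(r1, k) * comb(r2, c1 - k) / denom
    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small tolerance guards against floating-point ties
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Example input taken from Table 1 (male gender by site: 93/195 vs 35/80)
p_value = fisher_exact_2x2(93, 195 - 93, 35, 80 - 35)
print(round(p_value, 2))
```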
The analysis of qualitative data was performed according to the principles of grounded theory methodology; data collection and analysis occurred concurrently. Qualitative transcripts were analyzed by 2 independent reviewers (NA and EP) using the constant comparative method to develop a theoretical framework for respondents’ perspectives of machine learning that are grounded in their individual experiences and understandings. Sampling was continued until saturation was reached, which was defined as the point in which no new themes emerged from the data.
Ethics Approval
The study was approved by the Research Ethics Board at SickKids. The need for Institutional Review Board approval was waived by Lucile Packard Children’s Hospital as the data collection was performed by SickKids personnel. For the quantitative survey, completion of the survey was considered implied consent to study participation. For the qualitative component, respondents provided verbal consent to participate.
The quantitative survey was distributed at SickKids between November 1, 2021, and January 6, 2022, and at Lucile Packard Children’s Hospital between March 15, 2022, and April 12, 2022. Among 613 eligible respondents, 275 (44.9%) responded. shows the participant identification and selection flowchart, including the number participating in the qualitative interviews when saturation was reached.
presents the demographic characteristics of respondents; physician specialty (P<.001) and years from completion of training (P=.006) were significantly different between the 2 institutions. The majority of respondents were physicians (165/195, 84.6%, at SickKids and 73/80, 91.3%, at Lucile Packard Children’s Hospital). The number of respondents who had decision-making ability to implement artificial intelligence initiatives was 99/195 (50.8%) at SickKids and 41/80 (51.3%) at Lucile Packard Children’s Hospital. Most respondents did not know the number of machine learning models deployed at their institution over the last 5 years (137/195, 70.3%, at SickKids and 53/80, 66.3%, at Lucile Packard Children’s Hospital).
illustrates respondents’ self-perceived knowledge of artificial intelligence and understanding of machine learning and statistics. There were no statistically significant differences in these ratings by institution (artificial intelligence knowledge, P=.93; machine learning development and interpretation, P=.72; statistics conduct and interpretation, P=.19). The percentage of respondents who stated they had “moderate” or “a lot” of artificial intelligence knowledge was 17.9% (35/195) at SickKids and 17.5% (14/80) at Lucile Packard Children’s Hospital. compares respondent characteristics by those who self-rated their artificial intelligence knowledge as high (score of 4 or 5 on the 5-point Likert scale) versus not high across institutions. Those who self-rated their knowledge as high were significantly more likely to be males (P=.02) and nonphysicians (P=.006). The percentage of respondents who stated they understood machine learning development and interpretation at a “moderate” level or “fully” was 15.9% (31/195) at SickKids and 11.3% (9/80) at Lucile Packard Children’s Hospital. Across both institutions, the percentage who stated their understanding of machine learning was “none” or “very little” was 146/275 (53.1%). Conversely, the percentage of respondents who stated they understood statistics conduct and interpretation at a “moderate” level or “fully” was 54.4% (106/195) at SickKids and 42.5% (34/80) at Lucile Packard Children’s Hospital. Across both institutions, the percentage who stated their understanding of statistics was “none” or “very little” was 30/275 (10.9%).
Figure 1. CONSORT (Consolidated Standards of Reporting Trials) diagram of participant identification, selection, and participation.

Table 1. Demographic characteristics of participants at 2 pediatric institutions (N=275).
Characteristic | SickKids (n=195), n (%) | Lucile Packard Children’s Hospital (n=80), n (%) | P value
Male gender | 93 (47.7) | 35 (43.8) | .64
Professional role (a)
(a) Respondent may choose more than 1 option and thus, numbers do not add to 100%.
Table 2. Self-rating of knowledge of artificial intelligence and understanding of machine learning and statistics.
Areas | SickKids (n=195), n (%) | Lucile Packard Children’s Hospital (n=80), n (%) | P value
Artificial intelligence knowledge

reveals the proportion of respondents who ranked each attribute as important (ranked first or second among the 5 attributes) for prioritization of machine learning models. There were no significant differences in these proportions by institution for any of the 5 attributes (). Across both sites, the most common important attributes were risk stratification leading to different actions (205/275, 74.5%) and clinical problem causes substantial morbidity or mortality (177/275, 64.4%). The attributes considered least important were “implementing the model could reduce physician workload” (40/275, 14.5%) and “implementing the model could save money” (13/275, 4.7%). The median importance scores for both institutions combined are also shown in (where lower is more important).
Table 3. Ranked as important (a) by respondents for prioritization of machine learning.
Attributes considered important | SickKids (n=195), n (%) | Lucile Packard Children’s Hospital (n=80), n (%) | P value | Median importance score (IQR) (b)
The clinical problem being solved is common | 66 (33.8) | 35 (43.8) | .16 | 3 (2-3)
The clinical problem causes substantial morbidity or mortality | 133 (68.2) | 44 (55.0) | .05 | 2 (2-3)
Risk stratification would lead to different clinical actions that could reasonably improve patient outcomes | 145 (74.4) | 60 (75.0) | >.99 | 1 (1-2)
Implementing the model could reduce physician workload | 29 (14.9) | 11 (13.8) | .96 | 4 (3-4)
Implementing the model could save money | 11 (5.6) | 2 (2.5) | .42 | 5 (4-5)
(a) Important defined as attributes ranked as most important or second most important (rank of 1 or 2) in terms of whether a machine learning model would be useful.
(b) Across both institutions.
shows the themes and subthemes from the qualitative interviews. Perceived benefits of machine learning model implementation included facilitating decision making in complex scenarios, supporting less experienced clinicians, reducing cognitive load, and reducing cognitive bias. It was also expressed that machine learning models can potentially improve the quality of care through standardization, more effective triage, and facilitating precision medicine. Finally, machine learning models had the potential to reduce physician workload. However, perceived challenges of machine learning model implementation included the potential for algorithmic bias, lack of transparency and trust, and failure to incorporate clinical expertise. Machine learning model implementation might also adversely affect quality of care and respondents spoke about the need to evaluate the impact of machine learning model implementation. Practical concerns raised about machine learning model implementation included challenges incorporating the model into the clinical workflow and questions about accountability in the event of poor outcomes arising from machine learning model–directed actions. Finally, uncertainty about the physician’s role was identified. When asked to prioritize 1 clinical scenario for machine learning model implementation, the rationale for choosing which scenario to implement consistently related to impact on patient outcomes: “most benefit to kids,” “leading cause of death,” and “implications can be extremely serious.”
illustrates examples of clinical areas that could be prioritized for machine learning initiatives identified from the quantitative survey.
Table 4. Perspectives of machine learning implementation in pediatric medicine from qualitative interviews.
Themes and subthemes | Example quotations
Benefits of machine learning implementation

In this mixed methods study, we found that the attributes most commonly listed as important for machine learning model implementation were risk stratification leading to different actions that could reasonably improve patient outcomes and a clinical problem that causes substantial morbidity or mortality. Few respondents considered reducing physician workload and saving money as important. We also found that important attributes were similar at the 2 institutions despite different levels of biomedical informatics program establishment and different health care systems.
The wide range of recommended areas for machine learning model implementation highlights the need for prioritization given the likely limited capacity to develop, deploy, and monitor machine learning models, even at large institutions with mature bioinformatics programs. This study is important as it provides a framework by which institutional leaders could make decisions about which machine learning models to prioritize for implementation. While we found that risk stratification that improves patient outcomes was the most common important attribute, additional considerations include actions that would arise from high- and low-risk labels, evidence that differential actions will improve outcomes, and identifying ideal thresholds for risk categorization. Even once a model is deployed, ongoing monitoring of model performance and the impact of model deployment on patient care and clinical workflows are additional postimplementation considerations.
While we evaluated attribute importance across respondent types, Wears and Berg [] previously discussed the complex relationship between decision makers, beneficiaries of a machine learning solution, and those who shoulder the burden of implementation. They noted that a mismatch between these individuals can lead to failure. More specifically, it is often the administrator who is the decision maker and recipient of benefits, while it is the clinician who often shoulders the burden of implementation []. Anticipation and acknowledgement of conflicting perspectives will be required during the prioritization process among stakeholder types.
We also found that across both institutions, respondents had greater confidence in their understanding of statistics and lower confidence in their understanding of machine learning. These perspectives did not differ between the 2 institutions despite different levels of establishment of their biomedical informatics programs. Our results suggest that across pediatric medicine in general, more education focused on machine learning is required during training and continuing education.
Our results complement the work of others who have highlighted the requirements of clinical decision support, including support based on machine learning. Important considerations include avoiding black boxes, excessive time requirements, and complexity, in addition to ensuring relevance, respect, and scientific validity [-]. Our work also complements studies demonstrating that barriers to adoption of artificial intelligence are not restricted to clinicians but also extend to parents [,]. It may also be useful to compare our findings with studies conducted outside of pediatric medicine. We found that the main anticipated benefits of machine learning implementation were facilitation of decision making, improvement in quality of care, and reduction in physician workload. Similar benefits and challenges associated with artificial intelligence have been reported in ophthalmology, dermatology, radiology, optometry, and surgery [,]. However, our study is unique in its consideration of how to prioritize problems for implementation, a pragmatic consideration in developing a clinical program. In addition, the focus on pediatrics may be important because the nature of clinical problems, perspectives, and stakeholders can differ between pediatric and adult patient populations.
The strengths of this study include its mixed methods design and inclusion of 2 pediatric institutions that differ by country and by establishment of their biomedical informatics programs. However, our results should be interpreted in light of their limitations. We had a relatively low response rate, and respondents were likely enriched for those with an interest in machine learning. Thus, nonrespondents likely would have had lower familiarity with machine learning and less strong opinions about attributes considered important for machine learning prioritization. We also had a greater proportion of physicians than system leaders or data scientists; these groups may have different priorities or implementation concerns.
In conclusion, respondents prioritized machine learning model implementation in settings where risk stratification would lead to different clinical actions and for clinical problems that cause substantial morbidity or mortality. Implementations expected to improve patient outcomes were prioritized. These results can help provide a framework for prioritizing machine learning model implementation.
None declared.
Edited by C Lovis; submitted 02.06.22; peer-reviewed by S Ramgopal, H Hochheiser; comments to author 07.09.22; revised version received 15.09.22; accepted 10.10.22; published 17.11.22
©Natasha Alexander, Catherine Aftandilian, Lin Lawrence Guo, Erin Plenert, Jose Posada, Jason Fries, Scott Fleming, Alistair Johnson, Nigam Shah, Lillian Sung. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 17.11.2022.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.