A number of challenges hinder artificial intelligence (AI) models from effective clinical translation. Foremost among these challenges are: (1) reproducibility or repeatability, which is defined as the ability of a model to make consistent predictions on repeat images from the same patient taken under identical conditions; (2) the presence of clinical uncertainty or the equivocal nature of certain pathologies, which needs to be acknowledged in order to effectively, accurately and meaningfully separate true normal from true disease cases; and (3) lack of portability or generalizability, which leads AI model performance to differ across axes of data heterogeneity. We recently investigated the development of an AI pipeline on digital images of the cervix, utilizing a multi-heterogeneous dataset (“SEED”) of 9,462 women (17,013 images) and a multi-stage model selection and optimization approach, to generate a diagnostic classifier able to classify images of the cervix into “normal”, “indeterminate” and “precancer/cancer” (denoted as “precancer+”) categories. In this work, we investigated the performance of this multiclass classifier on external data (“EXT”) not utilized in training and internal validation, to assess the portability of the classifier when moving to new settings. We assessed both the repeatability and classification performance of our classifier across the two axes of heterogeneity present in our dataset: image capture device and geography, utilizing both out-of-the-box inference and retraining with “EXT”. Our results indicate strong repeatability of our multiclass model utilizing Monte-Carlo (MC) dropout, which carries over well to “EXT” (95% limit of agreement range = 0.2 - 0.4) even in the absence of retraining, as well as strong classification performance of our model on “EXT” that is achieved with retraining (% extreme misclassifications = 4.0% for n = 26 “EXT” individuals added to “SEED” in a 2n normal : 2n indeterminate : n precancer+ ratio), and incremental improvement of performance following retraining with images from additional individuals. We additionally find that device-level heterogeneity affects our model performance more than geography-level heterogeneity. Our work supports both (1) the development of comprehensively designed AI pipelines, with design strategies incorporating multiclass ground truth and MC dropout, on multi-heterogeneous data that are specifically optimized to improve repeatability, accuracy, and risk stratification; and (2) the need for optimized retraining approaches that address data heterogeneity (e.g., when moving to a new device) to facilitate effective use of AI models in new settings.
Competing Interest StatementThe authors have declared no competing interest.
Funding StatementThis project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. 75N91019D00024, Task Order 75N91019F00134, as part of the National Cancer Institute Cancer Cures Moonshot Initiative. No commercial support was obtained. Brian Befano was supported by NCI/NIH under Grant T32CA09168. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.
Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethics committee/IRB of the National Cancer Institute (NCI) and the National Institutes of Health (NIH) gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
留言 (0)