Transparent medical image AI via an image–text foundation model grounded in medical literature

Daneshjou, R., Yuksekgonul, M., Cai, Z. R., Novoa, R. & Zou, J. Y. SkinCon: a skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis. In Advances in Neural Information Processing Systems (eds Koyejo, S. et al.) 18157–18167 (Curran Associates, Inc., 2022).

Mendonça, T., Ferreira, P. M., Marques, J. S., Marcal, A. R. & Rozeira, J. PH²: a dermoscopic image database for research and benchmarking. In 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 5437–5440 (IEEE, 2013).

Kawahara, J., Daneshvar, S., Argenziano, G. & Hamarneh, G. Seven-point checklist and skin lesion classification using multitask multimodal neural nets. IEEE J. Biomed. Health Inform. 23, 538–546 (2019).

Nevitt, M., Felson, D. & Lester, G. The Osteoarthritis Initiative. Protocol for the cohort study V 1.1 6.21.06 (accessed 1 Nov 2023); https://nda.nih.gov/static/docs/StudyDesignProtocolAndAppendices.pdf

Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8748–8763 (PMLR, 2021).

Groh, M. et al. Evaluating deep neural networks trained on clinical images in dermatology with the Fitzpatrick 17k dataset. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 1820–1828 (IEEE, 2021).

Daneshjou, R. et al. Disparities in dermatology AI performance on a diverse, curated clinical image set. Sci. Adv. 8, eabq6147 (2022).

Gutman, D. et al. Skin lesion analysis toward melanoma detection: a challenge at the International Symposium on Biomedical Imaging (ISBI) 2016, hosted by the International Skin Imaging Collaboration (ISIC). Preprint at https://arxiv.org/abs/1605.01397 (2016).

Codella, N. C. F. et al. Skin lesion analysis toward melanoma detection: a challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018) 168–172 (ISBI, 2018).

Codella, N. et al. Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the International Skin Imaging Collaboration (ISIC). Preprint at https://arxiv.org/abs/1902.03368 (2019).

Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data https://doi.org/10.1038/sdata.2018.161 (2018).

Combalia, M. et al. BCN20000: dermoscopic lesions in the wild. Preprint at https://arxiv.org/abs/1908.02288 (2019).

Rotemberg, V. et al. A patient-centric dataset of images and metadata for identifying melanomas using clinical context. Sci. Data https://doi.org/10.1038/s41597-021-00815-z (2021).

Memorial Sloan Kettering Cancer Center. Consecutive biopsies for melanoma across year 2020. ISIC Archive https://doi.org/10.34970/151324 (2022).

Marchetti, M. A. et al. Prospective validation of dermoscopy-based open-source artificial intelligence for melanoma diagnosis (PROVE-AI study). npj Digit. Med. 6, 127 (2023).

Ricci Lara, M. A. et al. A dataset of skin lesion images collected in Argentina for the evaluation of AI tools in this population. Sci. Data 10, 712 (2023).

He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE, 2016).

Tiu, E. et al. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-022-00936-9 (2022).

Agbai, O. N. et al. Skin cancer and photoprotection in people of color: a review and recommendations for physicians and the public. J. Am. Acad. Dermatol. 70, 748–762 (2014).

Sierro, T. J. et al. Differences in health care resource utilization and costs for keratinocyte carcinoma among racioethnic groups: a population-based study. J. Am. Acad. Dermatol. 86, 373–378 (2022).

DeGrave, A. J., Janizek, J. D. & Lee, S.-I. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 3, 610–619 (2021).

Janizek, J. D., Erion, G., DeGrave, A. J. & Lee, S.-I. An adversarial approach for the robust classification of pneumonia from chest radiographs. In Proc. ACM Conference on Health, Inference, and Learning (ed. Ghassemi, M.) 69–79 (Association for Computing Machinery, 2020).

Bissoto, A., Fornaciali, M., Valle, E. & Avila, S. (De) Constructing bias on skin lesion datasets. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2766–2774 (IEEE, 2019).

Cassidy, B., Kendrick, C., Brodzicki, A., Jaworek-Korjakowska, J. & Yap, M. H. Analysis of the ISIC image datasets: usage, benchmarks and recommendations. Med. Image Anal. 75, 102305 (2022).

Winkler, J. K. et al. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatol. 155, 1135–1141 (2019).

Navarrete-Dechent, C., Liopyris, K. & Marchetti, M. A. Multiclass artificial intelligence in dermatology: progress but still room for improvement. J. Invest. Dermatol. 141, 1325–1328 (2021).

Daneshjou, R., Smith, M. P., Sun, M. D., Rotemberg, V. & Zou, J. Lack of transparency and potential bias in artificial intelligence data sets and algorithms: a scoping review. JAMA Dermatol. 157, 1362–1369 (2021).

Massie, J. P. et al. Patient representation in medical literature: are we appropriately depicting diversity? Plast. Reconstr. Surg. Glob. Open 7, e2563 (2019).

Massie, J. P. et al. A picture of modern medicine: race and visual representation in medical literature. J. Natl Med. Assoc. 113, 88–94 (2021).

Lester, J., Jia, J., Zhang, L., Okoye, G. & Linos, E. Absence of images of skin of colour in publications of COVID-19 skin manifestations. Br. J. Dermatol. 183, 593–595 (2020).

Louie, P. & Wilkes, R. Representations of race and skin tone in medical textbook imagery. Soc. Sci. Med. 202, 38–42 (2018).

Groh, M., Harris, C., Daneshjou, R., Badri, O. & Koochek, A. Towards transparency in dermatology image datasets with skin tone annotations by experts, crowds, and an algorithm. In Proc. ACM Human Computer Interactions 1–26 (Association for Computing Machinery, 2022).

Rajpurkar, P. et al. MURA: large dataset for abnormality detection in musculoskeletal radiographs. In 1st Conference on Medical Imaging with Deep Learning (MIDL, 2018).

Singh, C., Balakrishnan, G. & Perona, P. Matched sample selection with GANs for mitigating attribute confounding. Preprint at https://arxiv.org/abs/2103.13455 (2021).

Leming, M., Das, S. & Im, H. Construction of a confounder-free clinical MRI dataset in the Mass General Brigham system for classification of Alzheimer’s disease. Artif. Intell. Med. 129, 102309 (2022).

Zhao, Q., Adeli, E. & Pohl, K. M. Training confounder-free deep learning models for medical applications. Nat. Commun. 11, 6010 (2020).

Goel, K., Gu, A., Li, Y. & Ré, C. Model patching: closing the subgroup performance gap with data augmentation. In 9th International Conference on Learning Representations (ICLR, 2021); https://openreview.net/forum?id=9YlaeLfuhJF

Sagawa, S., Koh, P. W., Hashimoto, T. B. & Liang, P. Distributionally robust neural networks. In International Conference on Learning Representations (ICLR, 2020); https://dblp.org/rec/conf/iclr/SagawaKHL20.html?view=bibtex

Oakden-Rayner, L., Dunnmon, J., Carneiro, G. & Ré, C. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. In Proc. ACM Conference on Health, Inference, and Learning, ACM CHIL ’20 (ed. Ghassemi, M.) 151–159 (ACM, 2020).

Zhu, J., Park, T., Isola, P. & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision, ICCV 2017 2242–2251 (IEEE Computer Society, 2017).

Jones, O. T. et al. Artificial intelligence and machine learning algorithms for early detection of skin cancer in community and primary care settings: a systematic review. Lancet Digit. Health 4, e466–e476 (2022).

Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems (eds Guyon, I. et al.) 4768–4777 (Curran Associates Inc., 2017).

Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 3319–3328 (PMLR, 2017).

Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV) 618–626 (IEEE, 2017).

Kim, B. et al. Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In Proc. 35th International Conference on Machine Learning (eds Dy, J. G. & Krause, A.) 2668–2677 (PMLR, 2018).

Crabbé, J. & van der Schaar, M. Concept activation regions: a generalized framework for concept-based explanations. In Advances in Neural Information Processing Systems (eds Koyejo, S. et al.) 2590–2607 (Curran Associates Inc., 2022).

Abid, A., Yuksekgonul, M. & Zou, J. Meaningfully debugging model mistakes using conceptual counterfactual explanations. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 66–68 (PMLR, 2022).

Eyuboglu, S. et al. Domino: discovering systematic errors with cross-modal embeddings. In 10th International Conference on Learning Representations, ICLR 2022 (ICLR, 2022); https://openreview.net/forum?id=FPCMqjI0jXN

Chung, Y., Kraska, T., Polyzotis, N., Tae, K. & Whang, S. Automated data slicing for model validation: a big data–AI integration approach. In IEEE Transactions on Knowledge and Data Engineering 2284–2296 (IEEE, 2020).

DeGrave, A. J., Cai, Z. R., Janizek, J. D., Daneshjou, R. & Lee, S.-I. Auditing the inference processes of medical-image classifiers by leveraging generative AI and the expertise of physicians. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-023-01160-9 (2023).

Reyes, M. et al. On the interpretability of artificial intelligence in radiology: challenges and opportunities. Radiol. Artif. Intell. 2, e190043 (2020).

Arun, N. et al. Assessing the trustworthiness of saliency maps for localizing abnormalities in medical imaging. Radiol. Artif. Intell. 3, e200267 (2021).

Han, S. S. et al. The degradation of performance of a state-of-the-art skin image classifier when applied to patient-driven internet search. Sci. Rep. 12, 16260 (2022).

Navarrete-Dechent, C. et al. Automated dermatological diagnosis: hype or reality? J. Invest. Dermatol. 138, 2277–2279 (2018).

Koh, P. W. et al. Concept bottleneck models. In Proc. 37th International Conference on Machine Learning 5338–5348 (PMLR, 2020).

Yuksekgonul, M., Wang, M. & Zou, J. Post-hoc concept bottleneck models. In The Eleventh International Conference on Learning Representations (ICLR, 2023); https://dblp.org/rec/conf/iclr/YuksekgonulW023.html?view=bibtex

Rigel, D. S., Friedman, R. J., Kopf, A. W. & Polsky, D. ABCDE—an evolving concept in the early detection of melanoma. Arch. Dermatol. 141, 1032–1034 (2005).

Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).

Wang, Z., Wu, Z., Agarwal, D. & Sun, J. MedCLIP: contrastive learning from unpaired medical images and text. In Proc. 2022 Conference on Empirical Methods in Natural Language Processing (eds Goldberg, Y. et al.) 3876–3887 (Association for Computational Linguistics, 2022).

Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).

Combalia, M. et al. Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images: the 2019 International Skin Imaging Collaboration Grand Challenge. Lancet Digit. Health 4, e330–e339 (2022).

Corbin, C. K. et al. DEPLOYR: a technical framework for deploying custom real-time machine learning models into the electronic medical record. J. Am. Med. Inform. Assoc. 30, 1532–1542 (2023).

Coalition for Health AI. Blueprint for Trustworthy AI Implementation Guidance and Assurance for Healthcare (MITRE Corporation, 2023); https://tinyurl.com/CHAI-paper

Bedoya, A. D. et al. A framework for the oversight and local deployment of safe and high-quality prediction models. J. Am. Med. Inform. Assoc. 29, 1631–1636 (2022).

Pianykh, O. S. et al. Continuous learning AI in radiology: implementation principles and early applications. Radiology 297, 6–14 (2020).

Feng, J. et al. Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare. npj Digit. Med. 5, 66 (2022).

Vokinger, K. N., Feuerriegel, S. & Kesselheim, A. S. Continual learning in medical devices: FDA’s action plan and beyond. Lancet Digit. Health 3, e337–e338 (2021).

PMC Open Access Subset. National Library of Medicine www.ncbi.nlm.nih.gov/pmc/tools/openftlist (2022).

Gamper, J. & Rajpoot, N. M. Multiple instance captioning: learning representations from histopathology textbooks and articles. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021 16549–16559 (Computer Vision Foundation/IEEE, 2021).

Huang, G., Liu, Z., Maaten, L. V. D. & Weinberger, K. Q. Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2261–2269 (IEEE Computer Society, 2017).

Tan, M. & Le, Q. V. EfficientNetV2: smaller models and faster training. In Proc. 38th International Conference on Machine Learning, ICML 2021 (eds Meila, M. & Zhang, T.) 10096–10106 (PMLR, 2021).

Dosovitskiy, A. et al. An image is worth 16 × 16 words: transformers for image recognition at scale. In 9th International Conference on Learning Representations, ICLR 2021 (ICLR, 2021); https://openreview.net/forum?id=YicbFdNTTy
