A deep-learning algorithm to classify skin lesions from mpox virus infection

Ethical oversight was provided by the Stanford institutional review board (Protocol: 36050, 67068 and 66980). In this study, we evaluated publicly available images and clinical images acquired prospectively from patients with a laboratory-confirmed MPXV infection at the Stanford University Medical Center. Informed consent was obtained from patients for clinical images, but not for images sourced from publicly available datasets and repositories as it was not required after having received permission to use the images from the database manager(s). We followed the MINimum Information for Medical AI Reporting34 recommendations for reporting (1) data source, (2) detailed information on model architecture and development and (3) approaches to optimize, evaluate and validate the model performance.

Data sources

To train and test the MPXV-CNN, we constructed a new dataset of photographic images of skin diseases (n = 139,198) originating from multiple publicly available sources, an institutional cohort (Esteva dataset)13 and patients (Fig. 1): 676 images of MPXV skin lesions were aggregated from publications in the scientific literature, encyclopedia articles, news articles, social media (Twitter) and the prospective cohort (MPXV dataset), and 138,522 images of non-MPXV skin lesions (non-MPXV dataset) were aggregated from five dermatological repositories and three datasets (Table 1). Patients of the prospective cohort were recruited at the Stanford University Medical Center between July and August 2022. We included all patients with a laboratory-confirmed MPXV infection and visible skin lesions. We excluded patients who had received any prior treatment for their MPXV infection. Skin lesion images were taken from all affected body regions with a smartphone camera by a healthcare professional. The original Esteva dataset has been improved since its initial release and has undergone several rounds of data cleansing. We identified duplicate images in the MPXV and non-MPXV datasets by comparing the visual contents of the images using a conservative cutoff value of 80% for similarity. Instructions for obtaining publicly available MPXV and non-MPXV images are provided in the Data Availability section. A bibliography of sources with MPXV images and a list of URLs to non-MPXV images from Danderm, DermIS and HDA are provided as Supplementary Notes 1 and 2.
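The duplicate search compares visual contents with an 80% similarity cutoff; the exact comparison method is not specified above. A minimal sketch of one common approach, perceptual (average) hashing of grayscale images, is shown below — the hashing choice and all function names are assumptions for illustration, not the study's implementation.

```python
import numpy as np

def average_hash(img: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Compute a simple average hash of a grayscale image: downscale to
    hash_size x hash_size by block averaging, then threshold at the mean."""
    h, w = img.shape
    bh, bw = h // hash_size, w // hash_size
    img = img[: bh * hash_size, : bw * hash_size]  # crop to whole blocks
    blocks = img.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (blocks > blocks.mean()).flatten()

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of matching hash bits (1.0 = identical hashes)."""
    return float((a == b).mean())

def is_duplicate(img1: np.ndarray, img2: np.ndarray, cutoff: float = 0.80) -> bool:
    """Flag a pair as duplicates when hash similarity meets the 80% cutoff."""
    return similarity(average_hash(img1), average_hash(img2)) >= cutoff
```

In practice, each image's hash would be computed once and all pairs (or hash-bucket neighbors) compared against the cutoff.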

Image selection and annotation

We observed a high number of duplicate images between the Esteva dataset and the other non-MPXV datasets of this study (n = 45,440). We excluded images (total n = 47,518) from the MPXV dataset (n = 36) and non-MPXV dataset (n = 47,554) if any of the following criteria were met: absence of a skin lesion or rash, containing more than one photographic image, showing surgical or other medical interventions, nonphotographic content such as histopathology slides or radiology imaging, duplicate image or inaccessibility. We performed a reverse image search for all MPXV skin lesion images sourced from social media and excluded images that had been published previously in another context. We manually labeled the MPXV dataset for age group (child: <18 years, adult: ≥18 years, unknown), sex (male, female, unknown), skin tone (type I–VI, Fitzpatrick scale35), continent where the image was taken (Europe, Africa, Asia, South America, North America, Antarctica, Australia, unknown), number of skin lesions (up to n = 50; more than 50 lesions were labeled as 50, and highly coalesced lesions as unknown), body region of the skin lesion(s) (head, neck, torso, upper extremity, lower extremity, anogenital, multiple locations, zoomed in/unknown), duration of skin lesion presence (less than 7 d, 7 d or more, unknown) and association with the 2022 MPXV outbreak (yes/no), defined as publication of the image after May 1, 2022. For the prospective cohort, sex was defined as sex at birth self-reported by the patient. For other sources, sex was defined as reported in the textual information of the source. If no information on sex was reported, sex was assigned following evaluation of the image if sexual anatomy was visible. If age information was not available, we labeled the age group of the individual from the image using a panel and labeled the age group as unknown if no consensus could be reached.
We labeled MPXV images as coalesced if at least two MPXV lesions had grown together (yes/no or not applicable for MPXV rash). We evaluated the diagnoses found in the metadata of the Fitzpatrick 17k, PAD-UFES-20, DermNet and Esteva datasets and scraped metadata from websites of Danderm, DermIS, HDA, DermNet NZ repositories. To enable evaluations of non-MPXV diagnoses of all repositories and datasets, we mapped all diagnoses to a taxonomy of 2,032 individual skin diseases and classified them into nine main categories (benign dermal tumors, cysts, sinuses; cutaneous lymphoma and lymphoid infiltrates; epidermal tumors, hamartomas and milia; epidermal premalignant and malignant tumors; genodermatoses and supernumerary growths; inflammatory; malignant dermal tumor; pigmented benign lesions; pigmented malignant lesions) previously developed at our institute13. All diagnoses were classified as acute or chronic (defined as a persistent, progressive or recurring disease). Diagnoses with the possibility of acute and chronic courses were classified as acute. We specifically analyzed differential diagnoses with a similar appearance: varicella, drug-induced allergies, impetigo, measles, orf, molluscum contagiosum, scabies and syphilis. Where available, we evaluated information in the non-MPXV datasets and repositories in regard to the age group, sex, skin tone and location of the skin lesion(s) using identical definitions as for MPXV lesions.
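The mapping of scraped diagnoses to the nine main categories and an acute/chronic course can be sketched as a simple lookup table. The entries below are illustrative assumptions only — the actual taxonomy covers 2,032 individual skin diseases and is not reproduced here.

```python
# Illustrative fragment of the diagnosis taxonomy (hypothetical entries).
# Per the study's convention, diagnoses that can run either an acute or a
# chronic course are classified as acute.
TAXONOMY = {
    "varicella":            ("inflammatory", "acute"),
    "impetigo":             ("inflammatory", "acute"),
    "scabies":              ("inflammatory", "acute"),
    "seborrheic keratosis": ("epidermal tumors, hamartomas and milia", "chronic"),
    "melanoma":             ("pigmented malignant lesions", "chronic"),
}

def classify(diagnosis: str):
    """Map a scraped diagnosis string to (main category, acute/chronic);
    unmapped diagnoses fall back to ('unknown', 'unknown')."""
    return TAXONOMY.get(diagnosis.strip().lower(), ("unknown", "unknown"))
```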

Data splitting

After image filtering, there were 676 images for MPXV lesions and 138,522 images for non-MPXV lesions. We split these images into training and testing cohorts. The training cohort was used for training, hyperparameter tuning and internal validation, while the testing cohort was used as a hold-out dataset for external validation. For the MPXV lesions, we used 63 skin lesion images from the Stanford University Medical Center, 87 images from a recent publication with the largest MPXV case series to date from 16 countries4 and 8 images from a publication showing MPXV skin lesions in different stages36 as the MPXV testing cohort (total n = 158). The remaining MPXV images (n = 518) were used as the training cohort. While the training cohort contained skin lesion images of the 2022 MPXV outbreak and before, the testing cohort only contained images of the 2022 MPXV outbreak. In the training cohort, we used MPXV images sourced from publications of the scientific literature, news articles and social media. In the testing cohort, we exclusively used MPXV images with a laboratory-confirmed MPXV infection originating from publications and patients from our own institute. For the non-MPXV lesions, we used images (n = 12,045) from the DermNet NZ repository in the training cohort, due to the high number of available pictures, known ratios of sex and age groups and a high variety of diagnoses, races and origins. The remaining non-MPXV images (n = 126,477) were used in the testing cohort. For internal validation, we split the training cohort into 80% for training and 20% for validation.
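The 80%/20% internal split of the training cohort can be sketched as follows; the shuffling, seed and function name are assumptions for illustration.

```python
import random

def split_training(images: list, val_frac: float = 0.20, seed: int = 0):
    """Shuffle and split the training cohort into internal training (80%)
    and validation (20%) subsets, as used for internal validation."""
    imgs = images[:]                      # copy so the input is not mutated
    random.Random(seed).shuffle(imgs)
    n_val = round(len(imgs) * val_frac)
    return imgs[n_val:], imgs[:n_val]     # (training, validation)
```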

Image processing and training algorithm

We treated the problem as a binary image classification task for which the model aimed to predict whether a provided photographic image was an MPXV or non-MPXV skin lesion. Several challenges were encountered while developing a robust classification model. First, because the images were collected from different sources such as publications of the scientific literature, encyclopedias, news articles and social media, there was high variability in image features, such as resolution, lighting, angle, zoom, color profiles and filters. Second, despite our best efforts, the number of images collected for the MPXV cases was much smaller compared to the non-MPXV cases. Therefore, the class distribution was highly imbalanced, which caused bias in the predictions toward the majority class (that is, non-MPXV).

To overcome these issues, we incorporated several strategies into image processing, model selection and training. First, we made use of data augmentation. All images were first resized to 448 × 448 pixels, and we then performed random cropping and resizing (224 × 224 pixels), random horizontal flips, random rotation (max degree = 360°), random zoom (max scale = 1.1), perspective warping (max value = 0.2), random brightness and contrast, random affine transformations and random reflections. This data augmentation was performed on both MPXV and non-MPXV images in the training cohort to account for the aforementioned high image variation. Second, we pursued a transfer learning strategy using a pretrained model that was later fine-tuned on our domain-specific data. We experimented with a variety of CNN architectures for transfer learning, including ResNet18 (ref. 37), ResNet34 (ref. 37), ResNet50 (ref. 37), ResNet152 (ref. 37), DenseNet169 (ref. 38) and VGG19_bn39. We adopted the ResNet34 (ref. 37) CNN architecture, where the weights of the model were initialized using the weights of a model pretrained on ImageNet40 (approximately 14 million images), and we fine-tuned the model using our images of skin lesions. Third, we implemented a weighted categorical cross-entropy loss to account for class imbalance. Because the number of images of MPXV skin lesions was lower than that of non-MPXV skin lesions, we assigned a higher class weight to MPXV skin lesions in the cost function of the training algorithm so that it would apply a higher penalty to misclassification of the minority class. To find the optimal pair of class weights for the MPXV and non-MPXV skin lesions, we tested different weight pairs W drawn from a predefined set of candidate pairs. For each W, we fine-tuned the model for one epoch on the last layer and 20 epochs on all layers. The minibatch size was set to 64 and the base learning rate lr was set to 0.002.
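The weighted categorical cross-entropy can be sketched in NumPy as below. This is a minimal stand-in for a deep-learning framework's built-in loss, not the study's code; the normalization by the total sample weight mirrors common implementations and is an assumption, as is the class ordering.

```python
import numpy as np

def weighted_cross_entropy(logits, labels, class_weights):
    """Weighted categorical cross-entropy. class_weights[c] scales the loss
    of samples whose true class is c, so the minority class (MPXV) can be
    penalized more heavily than the majority class (non-MPXV)."""
    logits = np.asarray(logits, dtype=float)
    # numerically stable log-softmax
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]     # per-sample loss
    w = np.asarray(class_weights, dtype=float)[labels]   # per-sample weight
    return float((w * nll).sum() / w.sum())              # weighted mean
```

With two classes, passing for example `class_weights=(1.0, 0.01)` downweights the majority class relative to the minority class in the averaged loss.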
We computed the cross-entropy loss, sensitivity, specificity and AUC for the validation set. The optimal performance was achieved with a class weight W of (1.0, 0.01). Finally, to qualitatively verify that the MPXV-CNN learned to detect MPXV lesions, we generated explanation maps on a subset of images in the testing cohort using SHAP25. This method quantitatively annotates which image area(s) are critical for the final decision made by the MPXV-CNN.

Algorithm evaluation

Cross-validation

We carried out stratified fivefold cross-validation, where images from the training cohort were split into 80% for training and 20% for validation. Because images from the same source may originate from the same patient and share similar image features, we grouped images by source such that MPXV images coming from the same patient were not split between the training and validation sets. Running the cross-validation only once may yield a noisy estimate of model performance because different splits of the data can produce different results. Therefore, we repeated the cross-validation five times. In each repeat, we shuffled the order of images to obtain a different split of the dataset into the five folds.
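The grouped, stratified fold assignment can be sketched as below: each group (image source or patient) is kept in a single fold, and groups are dealt round-robin to folds per class to roughly balance the class ratio. This is a simplified stand-in for stratified group k-fold (libraries such as scikit-learn provide `StratifiedGroupKFold`); all names are illustrative.

```python
import random
from collections import defaultdict

def grouped_kfold(labels, groups, k=5, seed=0):
    """Assign each group to one of k folds, keeping images from the same
    source together and roughly balancing classes across folds."""
    by_group = defaultdict(list)
    for lbl, grp in zip(labels, groups):
        by_group[grp].append(lbl)
    # majority label per group, used for stratification
    group_label = {g: max(set(v), key=v.count) for g, v in by_group.items()}
    rng = random.Random(seed)
    fold_of = {}
    for cls in sorted(set(group_label.values())):
        cls_groups = [g for g, l in group_label.items() if l == cls]
        rng.shuffle(cls_groups)
        for i, g in enumerate(cls_groups):
            fold_of[g] = i % k   # deal groups of this class round-robin
    return [fold_of[g] for g in groups]
```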

Evaluation metrics

To evaluate our model performance, we used three metrics: sensitivity, specificity and AUC score. For each repeat of the fivefold cross-validation, we averaged the scores evaluated from each fold, and we reported the mean and standard deviation of scores obtained from the five repeats.
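The three metrics can be computed from per-image scores as sketched below, treating MPXV as the positive class; the AUC uses the rank-based Mann-Whitney formulation. The threshold and function names are assumptions for illustration.

```python
import numpy as np

def evaluate(scores, labels, threshold=0.5):
    """Sensitivity, specificity and AUC for a binary classifier,
    with MPXV as the positive class (label 1)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pred = scores >= threshold
    pos, neg = labels == 1, labels == 0
    sensitivity = float(pred[pos].mean())      # TP / (TP + FN)
    specificity = float((~pred[neg]).mean())   # TN / (TN + FP)
    # AUC via the Mann-Whitney U statistic (tied scores get averaged ranks)
    order = scores.argsort(kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):
        tie = scores == s
        ranks[tie] = ranks[tie].mean()
    n_pos, n_neg = pos.sum(), neg.sum()
    auc = float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))
    return sensitivity, specificity, auc
```

Per the protocol above, these scores would be averaged over the five folds of each repeat, then summarized as mean and standard deviation over the five repeats.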

Explainability

SHAP25 uses game theoretic approaches to calculate the importance of a feature when the model makes a specific prediction. A higher SHAP value indicates higher importance of the feature. To approximate SHAP values, we used the Gradient Explainer, which explains a model using expected gradients (an extension of integrated gradients41). We applied the explainer to the final model trained on the entire training cohort and used it to generate the SHAP values of the MPXV images from the testing cohort. The SHAP values were then overlaid on the gray-scaled images for visualization.
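The final visualization step — overlaying SHAP values on a gray-scaled image — can be sketched as below, assuming a per-pixel SHAP map has already been produced (for example by shap's `GradientExplainer`). The color mapping (red for positive, blue for negative values) and all names are assumptions for illustration.

```python
import numpy as np

def overlay_shap(image_rgb, shap_values, alpha=0.6):
    """Overlay per-pixel SHAP values on a gray-scaled version of the image.
    Positive values (supporting the MPXV class) are rendered in the red
    channel, negative values in the blue channel."""
    img = np.asarray(image_rgb, dtype=float)
    gray = img @ np.array([0.299, 0.587, 0.114])    # luminance grayscale
    out = np.stack([gray, gray, gray], axis=-1)
    sv = np.asarray(shap_values, dtype=float)
    scale = np.abs(sv).max() or 1.0                 # avoid division by zero
    sv = sv / scale                                 # normalize to [-1, 1]
    out[..., 0] += alpha * np.clip(sv, 0, None) * 255    # red: positive
    out[..., 2] += alpha * np.clip(-sv, 0, None) * 255   # blue: negative
    return np.clip(out, 0, 255)
```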

Development of the PRS

We developed a web-based app named ‘PoxApp’ that implemented a PRS for MPXV patient guidance. The source code was derived from an open-source PRS that we previously created for the SARS-CoV-2 pandemic42. Because the original PRS was purely survey-based, extensive development was necessary to integrate a mobile version of the MPXV-CNN. Survey questions and logical expressions were derived from the WHO case definitions for suspected and probable MPXV cases11, and we added an AI-assisted case definition based on the MPXV-CNN classification. Because many MPXV patients developed lesions in the anogenital region, privacy concerns might be a major issue for users when uploading images to the PRS. To increase user acceptance, we therefore made design decisions that allowed anonymous usage of the PRS. The PRS had the following components (Fig. 5).

Integrated development environment

We developed a web-based integrated development environment (IDE) to create and update PoxApp’s survey, the MPXV-CNN and logical expressions for MPXV infection risk estimation and personalized recommendations (Fig. 5a). We developed a module for picture-taking that could be integrated into the survey. Using the IDE’s script language, we translated clinical expert knowledge to logical expressions to estimate the risk of an MPXV infection from survey answers and the MPXV-CNN classification. We created personalized recommendations according to the estimated risk of infection. Using an application programming interface, the survey, MPXV-CNN, logical expressions and personalized recommendations were sent to web-based apps.
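A logical expression combining survey answers with the MPXV-CNN classification might look like the hypothetical sketch below. The questions, threshold and risk tiers are invented for illustration; the actual rules follow the WHO case definitions and are not reproduced here.

```python
def estimate_risk(answers, cnn_mpxv_prob):
    """Hypothetical risk-estimation rule combining survey answers (dict of
    booleans) with the MPXV-CNN output probability (or None if no image)."""
    # hypothetical survey-based case definition
    suspected = bool(answers.get("rash")) and bool(answers.get("close_contact"))
    # hypothetical AI-assisted case definition
    ai_positive = cnn_mpxv_prob is not None and cnn_mpxv_prob >= 0.5
    if suspected and ai_positive:
        return "high risk: seek testing and medical advice"
    if suspected or ai_positive:
        return "moderate risk: consider testing"
    return "low risk: monitor symptoms"
```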

Web-based app

We developed a web-based app named PoxApp for end users to answer survey questions, take photos of their skin lesion(s) and receive personalized recommendations (Fig. 5b). PoxApp could be used from web-enabled devices such as smartphones, tablets or personal computers. A built-in engine used the computing power of the user’s device to execute the logical expressions and the MPXV-CNN. This had two key advantages: (1) because user data were analyzed locally on the device, there was no need to send survey answers and images to external servers, resulting in maximum data privacy; and (2) the system was scalable to a high number of users at relatively low cost because no expensive servers with high computational power were necessary. We aimed to release PoxApp in the United States and Germany. For this reason, we translated PoxApp’s user interface into English and German and adapted the Terms of Use and Privacy Policies to the US and European jurisdictions.

Data donation service

We developed a data donation service, so users of PoxApp could volunteer to donate their answers and skin lesion images (Fig. 5c). The data donation service removed personal identifiers such as an IP address and forwarded the anonymized information to a database server. The donated data could potentially be used to generate next-generation MPXV-CNNs with higher performance (Fig. 5d).
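The identifier-stripping step of the donation service can be sketched as below; the field names are illustrative assumptions, not the service's actual schema.

```python
def anonymize(donation: dict) -> dict:
    """Strip direct identifiers (e.g., IP address) from a donation before
    forwarding it to the database server. Field names are hypothetical."""
    identifiers = {"ip_address", "user_agent", "device_id"}
    return {k: v for k, v in donation.items() if k not in identifiers}
```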

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
