Digital pathology-based artificial intelligence models for differential diagnosis and prognosis of sporadic odontogenic keratocysts

Data collection

The flowchart of the cohorts used in this study is shown in Fig. 1. A total of 543 cases, comprising OKC, OOC, and GS, were retrieved from Peking University Hospital of Stomatology for the period 2000 to 2020. Of these, 24 cases were excluded because of unclear or faded H&E staining. For development of the diagnostic model, the remaining 519 cases, with a total of 2 157 H&E-stained slides, were randomly assigned in a 7:3 ratio to a training cohort (363 cases) and a testing cohort (156 cases) (Supplementary Table 4). For development of the prognostic model, 400 OKC cases were likewise randomly assigned in a 7:3 ratio to a training cohort (280 cases) and a testing cohort (120 cases). Baseline data for the prognostic and diagnostic models are shown in Supplementary Tables 3 and 4, respectively. All H&E-stained slides were scanned with a NanoZoomer to create digital whole slide images (WSIs), which were exported as NDPI files using NDPView2 software. The Institutional Ethics Board of Peking University Hospital of Stomatology approved this study.
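A case-level random split like the one described can be sketched in plain Python. This is an illustration under assumptions, not the study's code: the `split_cases` helper, the case and slide identifiers, and the seed are hypothetical. The split is drawn over case IDs so that every slide of a case lands in the same cohort. With a test fraction of 0.3, 519 cases split into 363 training and 156 testing cases, and 400 OKC cases into 280 and 120, matching the counts above.

```python
import random
from collections import defaultdict


def split_cases(slide_to_case, test_fraction=0.3, seed=0):
    """Randomly assign whole cases to training/testing cohorts (7:3 by default).

    slide_to_case maps a slide identifier to its case identifier, so every
    slide of a case ends up in the same cohort and no patient appears in both.
    """
    cases = sorted(set(slide_to_case.values()))
    random.Random(seed).shuffle(cases)
    n_test = round(len(cases) * test_fraction)
    test_cases, train_cases = set(cases[:n_test]), set(cases[n_test:])
    slides = defaultdict(list)
    for slide, case in slide_to_case.items():
        slides["test" if case in test_cases else "train"].append(slide)
    return train_cases, test_cases, dict(slides)


# Hypothetical IDs: 519 cases with 4 slides each (real slide counts per case vary).
slide_to_case = {f"case{c:03d}_s{s}": f"case{c:03d}" for c in range(519) for s in range(4)}
train_cases, test_cases, slides = split_cases(slide_to_case)
print(len(train_cases), len(test_cases))  # 363 156
```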

Data processing

Because WSIs are too large to process directly, we implemented a systematic pre-processing strategy. Each WSI was partitioned into non-overlapping 512 × 512-pixel tiles at a resolution of 0.5 μm/pixel, yielding more than 2.5 million patches. To secure high-quality data, we removed white background with the deep learning-based background removal tool of the OnekeyAI platform.
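The tiling step can be sketched with NumPy. This is a minimal sketch under stated assumptions: `choose_level`, `tile_grid` and `is_background` are hypothetical helpers, and `is_background` is a simple brightness threshold swapped in for illustration, not the deep learning-based OnekeyAI tool used in the study. With OpenSlide, the base resolution would come from `slide.properties[openslide.PROPERTY_NAME_MPP_X]` and the pyramid from `slide.level_downsamples`; `read_region` takes level-0 coordinates, so grid corners computed at another level must be multiplied by that level's downsample. If no pyramid level matches 0.5 μm/pixel exactly, tiles would also need resampling.

```python
import numpy as np


def choose_level(base_mpp, level_downsamples, target_mpp=0.5):
    """Index of the pyramid level whose μm/pixel is closest to target_mpp."""
    mpps = np.asarray(level_downsamples, dtype=float) * base_mpp
    return int(np.argmin(np.abs(mpps - target_mpp)))


def tile_grid(width, height, tile=512):
    """Top-left (x, y) corners of non-overlapping tiles; incomplete edge tiles are skipped."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]


def is_background(patch, white_thresh=220, max_white_frac=0.8):
    """Threshold stand-in for background removal (not the OnekeyAI model):
    a patch is background if most pixels are near-white in every RGB channel."""
    near_white = np.all(np.asarray(patch) >= white_thresh, axis=-1)
    return bool(near_white.mean() > max_white_frac)


# Hypothetical scanner pyramid: 0.23 μm/pixel at level 0, downsamples 1, 2, 4, 8.
level = choose_level(0.23, [1, 2, 4, 8])
print(level, tile_grid(1100, 600))  # 1 [(0, 0), (512, 0)]
```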

We then applied the Macenko method to normalize the staining color of the tiles, followed by Z-score normalization of the RGB channels so that image intensities approximated a standard normal distribution before being fed to the model. During training, we applied online data augmentation in the form of random horizontal and vertical flips; during testing, only normalization was applied.
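The color pipeline can be sketched with NumPy. `macenko_normalize` follows the widely used open-source formulation of the Macenko method: convert to optical density, fit a plane by eigen-decomposition, take robust extreme angles as the two stain vectors, then rescale stain concentrations to a reference. The reference stain matrix and the `io`, `alpha` and `beta` values are common defaults assumed here, not values reported by the study. `zscore` and `random_flip` are hypothetical helpers for the per-channel Z-score and the training-time flips; the per-channel mean and standard deviation would normally be computed on the training tiles.

```python
import numpy as np

# Reference stain matrix (columns: hematoxylin, eosin in OD space) and reference
# 99th-percentile concentrations from common open-source Macenko code (assumed).
HE_REF = np.array([[0.5626, 0.2159],
                   [0.7201, 0.8012],
                   [0.4062, 0.5581]])
MAX_C_REF = np.array([1.9705, 1.0308])


def macenko_normalize(img, io=240, alpha=1, beta=0.15):
    """Macenko stain normalization of an RGB uint8 tile onto HE_REF."""
    h, w, _ = img.shape
    od = -np.log((img.reshape(-1, 3).astype(np.float64) + 1) / io)  # optical density
    od_hat = od[np.all(od >= beta, axis=1)]          # drop near-transparent pixels
    _, eigvecs = np.linalg.eigh(np.cov(od_hat.T))     # eigenvalues in ascending order
    plane = eigvecs[:, 1:3]                           # plane of the two largest
    proj = od_hat @ plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    phi_lo, phi_hi = np.percentile(phi, [alpha, 100 - alpha])
    v_lo = plane @ np.array([np.cos(phi_lo), np.sin(phi_lo)])
    v_hi = plane @ np.array([np.cos(phi_hi), np.sin(phi_hi)])
    # Hematoxylin first: by convention the vector with the larger red-channel OD.
    he = np.column_stack([v_lo, v_hi] if v_lo[0] > v_hi[0] else [v_hi, v_lo])
    conc = np.linalg.lstsq(he, od.T, rcond=None)[0]   # 2 x N stain concentrations
    conc *= (MAX_C_REF / np.percentile(conc, 99, axis=1))[:, None]
    out = io * np.exp(-HE_REF @ conc)
    return np.clip(out, 0, 255).T.reshape(h, w, 3).astype(np.uint8)


def zscore(tile, mean, std):
    """Per-channel Z-score with channel statistics from the training tiles."""
    return (tile.astype(np.float32) - mean) / std


def random_flip(tile, rng):
    """Training-only augmentation: random horizontal and vertical flips."""
    if rng.random() < 0.5:
        tile = tile[:, ::-1]
    if rng.random() < 0.5:
        tile = tile[::-1]
    return tile


# Synthetic tile rendered with a different H&E stain matrix (hypothetical data).
rng = np.random.default_rng(0)
conc = rng.uniform(0.1, 1.5, size=(2, 64 * 64))
stains = np.array([[0.65, 0.07], [0.70, 0.99], [0.29, 0.11]])
img = (240 * np.exp(-stains @ conc)).T.reshape(64, 64, 3).astype(np.uint8)
norm_tile = macenko_normalize(img)
pixels = norm_tile.reshape(-1, 3)
model_input = zscore(random_flip(norm_tile, rng), pixels.mean(axis=0), pixels.std(axis=0))
```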

Deep learning training

Our deep learning process comprised two tiers of prediction: patch level and WSI level. Given the large size and heterogeneity of the images, we first segmented the WSIs into smaller patches. We then used a multi-instance learning algorithm to aggregate the patch likelihoods into a WSI-level prediction. Because the diagnostic and prognostic models serve different purposes, we applied the same procedure separately to each of the two tasks.

For patch-level prediction, we used Inception_v3, a widely recognized convolutional neural network that performed strongly in the ImageNet classification challenge. The network was trained to estimate, for each patch, the probability that it carries the label of the WSI from which it was derived.

To improve the model’s ability to generalize across heterogeneous cohorts, we used transfer learning. The model’s parameters were initialized with weights pretrained on the ImageNet dataset, while the weights of the patch-level discriminator were retained. We then fine-tuned the entire model on a limited dataset (the training set of 363 cases) that had been weakly annotated for our task, that is, labeled only at the slide level. Transfer learning thus allowed knowledge learned from ImageNet to be adapted to our classification problem.
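The weak supervision described here, in which every patch simply inherits its slide's label, and the softmax cross-entropy loss listed among the training hyperparameters can be illustrated with NumPy. In the study the logits would come from the fine-tuned Inception_v3; here they are placeholder arrays, and the helper names, slide IDs and class coding are hypothetical.

```python
import numpy as np


def weak_patch_labels(patch_wsi_ids, wsi_labels):
    """Weak supervision: each patch inherits the label of the WSI it was cut from."""
    return np.array([wsi_labels[w] for w in patch_wsi_ids], dtype=int)


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy of patch logits against the inherited labels."""
    z = logits - logits.max(axis=1, keepdims=True)        # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# Hypothetical slide IDs and class coding (0 = OKC, 1 = OOC, 2 = GS).
wsi_labels = {"wsi_a": 0, "wsi_b": 2}
labels = weak_patch_labels(["wsi_a", "wsi_a", "wsi_b"], wsi_labels)
print(labels)  # [0 0 2]
print(softmax_cross_entropy(np.zeros((3, 3)), labels))  # equals log(3) for uniform logits
```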

To improve generalization, we scheduled the learning rate with the cosine decay algorithm. The learning rate is defined as follows:

$$\eta_{t}^{\text{sub-task}}=\eta_{\min}^{i}+\frac{1}{2}\left(\eta_{\max}^{i}-\eta_{\min}^{i}\right)\left(1+\cos \left(\frac{T_{\text{cur}}}{T_{i}}\pi \right)\right)$$

where \(\eta_{\min}^{i}=0\), \(\eta_{\max}^{i}=0.01\), and \(T_{i}=8\) denote the minimum learning rate, the maximum learning rate, and the number of iteration epochs, respectively, and \(T_{\text{cur}}\) is the current epoch. A relatively small \(T_{i}\) is sufficient because the dataset contains more than 2.5 million training patches. Because the backbone already contains pretrained parameters, it must be fine-tuned for effective transfer; we therefore start fine-tuning the backbone parameters when \(T_{\text{cur}}=\frac{1}{2}T_{i}\). The learning rate of the backbone is defined as follows:

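$$\eta_{t}^{\text{backbone}}=\begin{cases}0, & \text{if } T_{\text{cur}}\le \frac{1}{2}T_{i}\\ \eta_{\min}^{i}+\frac{1}{2}\left(\eta_{\max}^{i}-\eta_{\min}^{i}\right)\left(1+\cos \left(\frac{T_{\text{cur}}}{T_{i}}\pi \right)\right), & \text{if } T_{\text{cur}} > \frac{1}{2}T_{i}\end{cases}$$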
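The two schedules can be written out in plain Python to make their interaction concrete. The values of \(\eta_{\min}^{i}\), \(\eta_{\max}^{i}\) and \(T_{i}\) come from the text; `unfreeze_frac=0.5` (backbone frozen for the first half of \(T_{i}\)) and the per-epoch granularity of \(T_{\text{cur}}\) are assumptions, and the function names are hypothetical. In PyTorch the same effect could be obtained by putting backbone and task-specific parameters in separate optimizer parameter groups and setting each group's learning rate at the start of every epoch.

```python
import math

ETA_MIN, ETA_MAX, T_I = 0.0, 0.01, 8  # values reported in the text


def lr_task(t_cur, eta_min=ETA_MIN, eta_max=ETA_MAX, t_i=T_I):
    """Cosine-decayed learning rate for the task-specific layers at epoch t_cur."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(t_cur / t_i * math.pi))


def lr_backbone(t_cur, unfreeze_frac=0.5, eta_min=ETA_MIN, eta_max=ETA_MAX, t_i=T_I):
    """Backbone learning rate: 0 (frozen) until t_cur exceeds unfreeze_frac * t_i,
    then the same cosine schedule as the task-specific layers."""
    if t_cur <= unfreeze_frac * t_i:
        return 0.0
    return lr_task(t_cur, eta_min, eta_max, t_i)


for epoch in range(T_I + 1):
    print(epoch, round(lr_task(epoch), 5), round(lr_backbone(epoch), 5))
```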

Other hyperparameters were as follows: optimizer, SGD; loss function, softmax cross-entropy; batch size, 128. For the classical machine learning models, we used grid search to tune parameters such as n_estimators and max_depth. In practice, n_estimators was searched from 10 to 50 in steps of 5, and max_depth over 2, 3, 4, and 5, forming 40 corresponding search models.
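The grid search for the classical models can be sketched with scikit-learn's `GridSearchCV`. The random forest estimator, the synthetic data, AUC scoring and 3-fold cross-validation are assumptions for illustration. Read literally, with 50 included, the stated ranges give 9 × 4 = 36 combinations; the exact bounds behind the reported 40 search models are not given, so the grid below is one plausible reading.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid

# Grid from the text; the upper bound 50 is taken as inclusive (an assumption).
param_grid = {
    "n_estimators": list(range(10, 51, 5)),   # 10, 15, ..., 50
    "max_depth": [2, 3, 4, 5],
}
print(len(ParameterGrid(param_grid)))  # 36 combinations under this reading

# Synthetic WSI-level features and binary labels (hypothetical data).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="roc_auc", cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```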

Multi-instance learning for WSI fusion

After training the deep learning model, we predicted the label and its probability for every patch. A classifier then aggregated these patch probabilities into a WSI-level prediction. To aggregate the patch likelihoods, we used two machine learning pipelines:

Patch Likelihood Histogram (PLH) pipeline: a histogram represents the distribution of patch likelihoods within each WSI. Likelihoods were discretized by retaining one decimal place for the diagnostic model and two decimal places for the prognostic model, and the resulting distribution served as the representation of the WSI.
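A minimal NumPy sketch of the PLH representation for one WSI follows. Rounding to the nearest value rather than truncating, normalizing counts by the number of patches, and concatenating one histogram per class are assumptions; `plh_features` is a hypothetical helper and the probabilities are made up.

```python
import numpy as np


def plh_features(patch_probs, decimals=1):
    """Patch Likelihood Histogram for one WSI.

    patch_probs: (n_patches, n_classes) array of patch-level probabilities.
    Probabilities are rounded to `decimals` places (1 for the diagnostic model,
    2 for the prognostic model), giving 10**decimals + 1 bins per class. The
    per-class histograms are normalized by the patch count and concatenated.
    """
    probs = np.asarray(patch_probs, dtype=float)
    n_bins = 10 ** decimals + 1
    bins = np.rint(probs * 10 ** decimals).astype(int)   # 0.0 -> bin 0, 1.0 -> last bin
    hists = [np.bincount(bins[:, c], minlength=n_bins) / len(bins)
             for c in range(bins.shape[1])]
    return np.concatenate(hists)


probs = np.array([[0.91, 0.06, 0.03],
                  [0.12, 0.80, 0.08],
                  [0.55, 0.44, 0.01]])
print(plh_features(probs).shape)              # (33,)  3 classes x 11 bins
print(plh_features(probs, decimals=2).shape)  # (303,) 3 classes x 101 bins
```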

Bag of Words (BoW) pipeline: building on both histogram-based and vocabulary-based techniques, this pipeline applied a term frequency-inverse document frequency (TF-IDF) mapping to every patch, yielding TF-IDF feature vectors that represent the WSIs. These feature vectors were then used to train conventional machine learning classifiers to predict the status of each WSI.
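One way to realize the BoW pipeline is to turn each patch into a "word" and let scikit-learn's `TfidfVectorizer` weight those words across WSIs. The vocabulary used here (predicted class combined with the rounded top probability) is an assumption, since the text does not define it; `patch_words` and `bow_tfidf` are hypothetical helpers. Passing a callable `analyzer` lets the vectorizer accept documents that are already token lists.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def patch_words(patch_probs, decimals=1):
    """Map each patch to a 'word': predicted class plus its rounded probability."""
    probs = np.asarray(patch_probs, dtype=float)
    pred = probs.argmax(axis=1)
    conf = np.round(probs.max(axis=1), decimals)
    return [f"c{c}_p{p:.{decimals}f}" for c, p in zip(pred, conf)]


def bow_tfidf(wsi_patch_probs, decimals=1):
    """TF-IDF vector per WSI, treating each WSI as a document of patch words."""
    docs = [patch_words(p, decimals) for p in wsi_patch_probs]
    # The documents are already token lists, so the analyzer just returns them.
    vectorizer = TfidfVectorizer(analyzer=lambda words: words)
    return vectorizer.fit_transform(docs), vectorizer


rng = np.random.default_rng(0)
wsis = [rng.dirichlet(np.ones(3), size=n) for n in (40, 60, 25)]  # synthetic patch probabilities
X, vec = bow_tfidf(wsis)
print(X.shape)  # (3, vocabulary size)
```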

Together, the two independent pipelines aggregate the scattered patch-level predictions into WSI-level features, which provide the input for subsequent analyses.

Signature building

In this study, final patient representations were constructed by combining patch-level predictions, probability histograms, and TF-IDF features. For both the diagnostic and the prognostic model, a t test was first performed to identify statistically significant pathology features and thereby refine feature selection. We then developed models with several machine learning algorithms: support vector machines (SVM); tree-based models, namely random forest, extremely randomized trees (ExtraTrees), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM); and a multilayer perceptron (MLP). Each model is described in more detail below:
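The t-test screening step can be sketched with SciPy's `ttest_ind`, applied column-wise. This sketch assumes a binary outcome (for example, recurrence in the prognostic model), Student's equal-variance test and a 0.05 threshold; for the three-class diagnostic task the test would have to be applied pairwise or one-vs-rest, which the text does not specify. `ttest_select` and the data are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind


def ttest_select(X, y, alpha=0.05):
    """Indices of features whose means differ between label groups 1 and 0
    (two-sample Student t test, one test per feature column)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    _, p_values = ttest_ind(X[y == 1], X[y == 0], axis=0)
    return np.flatnonzero(p_values < alpha)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # synthetic WSI-level features
y = np.repeat([1, 0], 100)
X[y == 1, :3] += 1.5                    # make the first three features informative
selected = ttest_select(X, y)
print(selected)                         # includes 0, 1, 2 (noise features may also pass)
```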

Random forest is an ensemble learning technique that generates predictions by constructing and combining numerous decision trees. The number of trees in the forest is set by n_estimators, max_depth sets the maximum depth of each tree, and min_samples_split sets the minimum number of samples required to split an internal node.

XGBoost is an optimized distributed gradient boosting library that implements state-of-the-art gradient boosting algorithms. Its learning and optimization can be controlled with parameters such as n_estimators, max_depth, and min_child_weight.

LightGBM is another gradient boosting framework that uses decision trees as base learners. max_depth controls the maximum depth of each tree, and n_estimators controls the number of learners.

ExtraTrees is a variant of random forest that injects additional randomness during training: split thresholds are drawn at random for each candidate feature rather than optimized, which further decorrelates the trees. Its parameters are similar to those of random forest.

The SVM uses the radial basis function (RBF) kernel, with all other parameters left at their defaults. The MLP is a fully connected network with three hidden layers of 128, 64, and 32 nodes, respectively. All models were implemented with scikit-learn, a widely used Python machine learning library, or through scikit-learn-compatible interfaces.
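The candidate classifiers can be instantiated with scikit-learn as follows. The hyperparameter values are placeholders that the grid search would tune; `probability=True` on the SVM is an assumption added so it returns probabilities for AUC computation, and `build_models` is a hypothetical helper. XGBoost and LightGBM are separate packages with scikit-learn-style wrappers and are included only if installed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def build_models(seed=0):
    """Candidate WSI-level classifiers; hyperparameter values are placeholders."""
    models = {
        "RandomForest": RandomForestClassifier(n_estimators=50, max_depth=4,
                                               min_samples_split=2, random_state=seed),
        "ExtraTrees": ExtraTreesClassifier(n_estimators=50, max_depth=4, random_state=seed),
        "SVM": SVC(kernel="rbf", probability=True, random_state=seed),
        "MLP": MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=500,
                             random_state=seed),
    }
    try:  # separate package with a scikit-learn-style wrapper
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(n_estimators=50, max_depth=4, min_child_weight=1)
    except ImportError:
        pass
    try:  # separate package with a scikit-learn-style wrapper
        from lightgbm import LGBMClassifier
        models["LightGBM"] = LGBMClassifier(n_estimators=50, max_depth=4)
    except ImportError:
        pass
    return models


# Synthetic WSI-level features (hypothetical data).
X, y = make_classification(n_samples=150, n_features=10, random_state=0)
models = build_models()
for name, model in models.items():
    proba = model.fit(X, y).predict_proba(X)
    print(name, proba.shape)
```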

Model evaluation

To validate the accuracy of the pathology model in identifying regions, we carried out a comprehensive assessment using receiver operating characteristic (ROC) curves at the patch level. For performance evaluation, patch-level predicted labels and probability heatmaps were aggregated and visualized at the WSI level. For the diagnostic model, we used both micro- and macro-averaged area under the curve (AUC) metrics for a holistic evaluation, and the “One vs. Others” (one-vs-rest) strategy to compute the AUC for each class; confusion matrices were also used to assess performance. For the prognostic model, we used AUC as the performance metric and also calculated sensitivity and specificity. Furthermore, we compared the performance of single-slide and multi-slide fusion models, using DeLong’s test to assess significance.

The study used a range of software tools, including ITK-SNAP v.3.8.0 and custom code written in Python v.3.7.12. Python packages used in the analysis included Pandas v.1.2.4, NumPy v.1.20.2, PyTorch v.1.8.0, Onekey v.2.2.3, OpenSlide v.1.2.0, Seaborn v.0.11.1, Matplotlib v.3.4.2, SciPy v.1.7.3, scikit-learn v.1.0.2, and PyRadiomics v.3.0.
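The metrics above can be sketched with scikit-learn and SciPy. `multiclass_aucs` computes micro, macro and per-class one-vs-rest AUCs from binarized labels; `sens_spec` derives sensitivity and specificity from a binary confusion matrix; `delong_test` is a compact implementation of DeLong's test for two correlated AUCs using the midrank formulation of Sun and Xu. All three helpers and the synthetic scores are hypothetical; this is a sketch, not the study's evaluation code.

```python
import numpy as np
from scipy.stats import norm, rankdata
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.preprocessing import label_binarize


def multiclass_aucs(y_true, proba, classes=(0, 1, 2)):
    """Micro, macro and per-class (one-vs-rest) AUCs for the diagnostic task."""
    y_bin = label_binarize(y_true, classes=list(classes))
    return {"micro": roc_auc_score(y_bin, proba, average="micro"),
            "macro": roc_auc_score(y_bin, proba, average="macro"),
            "per_class": roc_auc_score(y_bin, proba, average=None)}


def sens_spec(y_true, y_pred):
    """Sensitivity and specificity of a binary (prognostic) prediction."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return float(tp / (tp + fn)), float(tn / (tn + fp))


def delong_test(y_true, score_a, score_b):
    """Two-sided DeLong test comparing the ROC AUCs of two scores on the same cases."""
    y = np.asarray(y_true).astype(bool)
    preds = np.vstack([score_a, score_b]).astype(float)
    pos, neg = preds[:, y], preds[:, ~y]
    m, n = pos.shape[1], neg.shape[1]
    tx = rankdata(pos, axis=1)                      # midranks among positives
    ty = rankdata(neg, axis=1)                      # midranks among negatives
    tz = rankdata(np.hstack([pos, neg]), axis=1)    # midranks among all cases
    aucs = (tz[:, :m].sum(axis=1) - m * (m + 1) / 2) / (m * n)
    v_pos = (tz[:, :m] - tx) / n                    # placement values of positives
    v_neg = 1.0 - (tz[:, m:] - ty) / m              # placement values of negatives
    cov = np.cov(v_pos) / m + np.cov(v_neg) / n     # 2 x 2 covariance of the AUCs
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / np.sqrt(var_diff)
    return aucs, float(2 * norm.sf(abs(z)))


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
score_a = y + rng.normal(0, 1.0, size=200)   # hypothetical stronger model
score_b = y + rng.normal(0, 2.0, size=200)   # hypothetical weaker model
aucs, p = delong_test(y, score_a, score_b)
print(aucs, p)
```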
