Deep learning-based multi-model approach on electron microscopy image of renal biopsy classification

Renal biopsy samples description

The 910 images in this study come from 319 renal needle-core biopsies performed between August 2017 and June 2019 on patients at the Children’s Hospital of Nanjing Medical University in China. All of the patients are children younger than 18 years of age.

We selected these 319 patients from 1252 patients according to the following criteria. First, we chose specimens from patients who had been clearly diagnosed with immune-mediated renal disease, based on the results of renal biopsy combined with clinical symptoms. Second, among the electron microscope images of these patients, we selected clear images that capture meaningful structures, preferably containing glomeruli, and in which electron-dense granules are present in the TEM image; these images were labeled as “Positive”. Third, we chose specimens from patients who were not diagnosed with immune-mediated renal disease, again based on the results of renal biopsy combined with clinical symptoms, and whose TEM images show no electron-dense granules; these images were labeled as “Negative”. According to these criteria, 319 renal biopsy specimens were selected from 1252 patients.

The renal biopsy specimens were prepared according to international standards [11]. Images were acquired at a magnification corresponding to a 2 μm scale bar.

The above-mentioned TEM images were diagnosed by two experienced renal pathologists: a pathologist from the General Hospital of Eastern Theater Command gave an initial result, which was then reviewed by a pathologist from the Children’s Hospital of Nanjing Medical University. If the two pathologists disagreed, the sample was not used. In this study, the total number of labeled TEM images is 910: 455 images are labeled as “Positive”, meaning electron-dense granules are present in the TEM image, and the remaining 455 images are labeled as “Negative”, meaning electron-dense granules are not present.

All of the renal biopsy samples used in this study passed ethical review (approval number 202008074-1).

Deep learning-based multi-model architecture

The novel algorithm develops a multi-model that combines two single deep learning models into one model to further improve classification accuracy. The first single model is the improved ResNet + SVM model, and the second is the conventional ResNet model.

Conventional ResNet50

The ResNet residual component has two types of blocks: the Identity Block and the Convolutional Block. The Identity Block is defined as

$$H\left(x\right)=F\left(x,\left\{W_{i}\right\}\right)+x$$

(1)

where x and H(x) are the input and output vectors of the layers, F(x, {W<sub>i</sub>}) is the residual mapping function, and {W<sub>i</sub>} represents the convolutional layer weights. The input x is added to the result of F through the shortcut connection path.

Figure 1 illustrates the structure and detailed process of the Identity Block. The input vector x follows two paths. The first is the main path, where x is transformed by the residual mapping function F, i.e., F(x, {W<sub>i</sub>}) is computed. The main path usually consists of two typical weight layers, each composed of a 2D convolutional layer and a Batch-Norm layer that normalizes along the channel axis. After the input x is multiplied by the weights in the first layer, the nonlinear activation function ReLU is applied, and the result is fed into the next weight layer. At the same time, the input x is fed into the second path, called the shortcut path, where it is added directly to the result of the main path. Finally, the combined result H(x) is passed through the ReLU activation function.
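To make the data flow concrete, below is a minimal PyTorch sketch of the Identity Block as described above, with the two-weight-layer main path shown in Fig. 1. The channel count and kernel sizes are illustrative assumptions; ResNet50 itself uses a three-layer bottleneck variant of this block, as in the stage descriptions below.

```python
import torch
import torch.nn as nn

class IdentityBlock(nn.Module):
    """Identity Block: the shortcut adds the input x to the main-path output unchanged."""

    def __init__(self, channels):
        super().__init__()
        # Main path: two weight layers, each a 2D convolution followed by Batch-Norm
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # first weight layer + ReLU
        out = self.bn2(self.conv2(out))           # second weight layer
        return self.relu(out + x)                 # H(x) = F(x, {W_i}) + x, then ReLU
```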

$$H\left(x\right)=F\left(x,\left\{W_{i}\right\}\right)+W_{s}x$$

(2)

where x and H(x) are the input and output vectors of the layers and F(x, {W<sub>i</sub>}) is the residual mapping function. Different from the Identity Block, in the shortcut path of the Convolutional Block the input x is also transformed by a 2D convolutional layer; W<sub>s</sub> denotes the weights of that convolutional layer.

Fig. 1 Identity Block Structure

Figure 2 shows the structure and process of the Convolutional Block. The only difference from Fig. 1 is the shortcut path: in the Identity Block, the input x is added directly to the result of the main path, whereas in the Convolutional Block the input x is first fed into a weight layer and the result is then combined with the output of the main path.

Fig. 2 Convolutional Block Structure
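Continuing the sketch above (same illustrative assumptions), a hypothetical Convolutional Block differs only in the projection shortcut W<sub>s</sub>, which lets the block change the spatial size and channel count:

```python
import torch.nn as nn

class ConvolutionalBlock(nn.Module):
    """Convolutional Block: the shortcut applies its own weight layer (W_s in Eq. 2)."""

    def __init__(self, in_channels, out_channels, stride=2):
        super().__init__()
        # Main path: same layout as the Identity Block, but it may change the shape
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Shortcut path: a 1x1 convolution (W_s) plus Batch-Norm so shapes match before adding
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride),
            nn.BatchNorm2d(out_channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # H(x) = F(x, {W_i}) + W_s x, then ReLU
```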

Improved ResNet + SVM model with transfer learning

In this paper, the conventional ResNet model is improved by adding a skip architecture and by replacing the softmax classifier with an SVM classifier.

The ResNet model is a state-of-the-art image classification model. Deeper layers can increase CNN model accuracy but introduce the vanishing gradient problem [12]. ResNet solves the vanishing gradient problem through its residual network design [13]. As a result, the residual components and very deep layers of ResNet ensure that the model can extract sufficient image features.

Improved ResNet with skip architecture

As our task is the histopathologic classification of TEM images, it places higher demands on feature extraction than traditional image classification problems, so we improved the feature extraction structure of the ResNet model. As addressed in [14], deeper layers tend to extract the global information of an image because their receptive fields are large, whereas the features extracted by shallow layers capture partial information of an image and focus on more detailed geometric information of local areas. In the conventional ResNet50 feature extraction structure, there are 49 convolutional layers in sequence, so the final extracted features come from a very deep layer and therefore contain mostly global and coarse information about the image. However, in our task of classifying electron-dense granules in TEM images, pathologists pay close attention to local-area information in their visual estimations, which requires the extracted features to also contain fine information about local areas. We therefore developed the skip architecture to combine deep, coarse layer information with shallow, fine layer information. Figure 3 shows the improved model structure.

(1)

Firstly, the input image is resized to 224 × 224 × 3 and passed through stage 1. Stage 1 is composed of a 2D convolution with 64 kernels of size (7 × 7) and stride (2, 2), a Batch-Normalization layer, a ReLU activation function, and a MaxPooling layer with kernel size (3 × 3) and stride (2, 2).

(2)

Secondly, the result from stage 1 passes through stage 2. Stage 2 consists of one Convolutional Block and two Identity Blocks. In each block, 3 convolution layers are stacked one over the other: the first layer has 64 filters with a kernel size of (1 × 1), the second layer has 64 filters with a kernel size of (3 × 3), and the last layer has 256 filters with a kernel size of (1 × 1). Since the features extracted by stage 2 contain fine and partial information about the image, the output of stage 2 takes two paths: the first path goes into stage 3 and then through stage 4 and stage 5 in sequence, and the second path is combined with the final features from stage 5, which contain coarse and global information about the image.

(3)

Thirdly, stage 3, stage 4, and stage 5 have similar structures. Stage 3 has one Convolutional Block and three Identity Blocks, stage 4 has one Convolutional Block and five Identity Blocks, and stage 5 has one Convolutional Block and two Identity Blocks. Each block stacks 3 convolution layers, with kernel sizes of (1 × 1), (3 × 3), and (1 × 1), respectively. The corresponding filter numbers for stage 3, stage 4, and stage 5 are [128, 128, 512], [256, 256, 1024], and [512, 512, 2048].

(4)

Finally, the features from stage 5 and stage 2 are concatenated, so the concatenated features contain both global and fine information about the image; they are then flattened to one dimension.
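A minimal sketch of this skip architecture, built on the torchvision ResNet50 backbone (ImageNet-pretrained weights assumed, matching the transfer learning in this section's heading). The paper does not state how the spatial mismatch between the stage 2 output (56 × 56) and the stage 5 output (7 × 7) is resolved before concatenation; here we assume global average pooling on each branch.

```python
import torch
import torch.nn as nn
import torchvision

class ImprovedResNetFeatures(nn.Module):
    """Skip architecture: concatenate fine stage 2 features with coarse stage 5 features."""

    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet50(
            weights=torchvision.models.ResNet50_Weights.DEFAULT)  # ImageNet transfer learning
        # Stage 1: 7x7 conv + Batch-Norm + ReLU + max pooling
        self.stage1 = nn.Sequential(backbone.conv1, backbone.bn1,
                                    backbone.relu, backbone.maxpool)
        self.stage2 = backbone.layer1                      # fine, partial features
        self.stages3to5 = nn.Sequential(backbone.layer2, backbone.layer3, backbone.layer4)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):                                  # x: (N, 3, 224, 224)
        x = self.stage1(x)
        fine = self.stage2(x)                              # (N, 256, 56, 56)
        coarse = self.stages3to5(fine)                     # (N, 2048, 7, 7)
        fine_vec = self.pool(fine).flatten(1)              # (N, 256)
        coarse_vec = self.pool(coarse).flatten(1)          # (N, 2048)
        return torch.cat([fine_vec, coarse_vec], dim=1)    # (N, 2304) combined descriptor
```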

Fig. 3 The structure of the Improved ResNet + SVM Model: in the feature extraction part, fine information and coarse information are combined; in the classification part, an SVM classifier is used

Improved ResNet50 combined with SVM model

In the improved ResNet50 model, the feature extraction component has been changed to obtain more accurate features. However, the classification component of the conventional ResNet model uses one fully connected layer with the softmax function. The Support Vector Machine is a classical binary classifier, and since our task is a binary classification problem, we combine the improved ResNet50 feature extraction part with an SVM classifier.

Support Vector Machine (SVM): the goal of the SVM is to find an optimal hyperplane that best separates a two-class dataset, such that the distance from the hyperplane to the nearest data points is maximized. This non-linear optimization problem can be transformed into a dual problem by the Lagrange method:

$$\max_{\alpha} Q\left(\alpha\right)=\sum_{i=1}^{l}\alpha_{i}-\frac{1}{2}\sum_{i,j=1}^{l}\alpha_{i}\alpha_{j}y_{i}y_{j}\left\langle \varphi \left(x_{i}\right)\bullet \varphi \left(x_{j}\right)\right\rangle$$

$$=\sum_{i=1}^{l}\alpha_{i}-\frac{1}{2}\sum_{i,j=1}^{l}\alpha_{i}\alpha_{j}y_{i}y_{j}K\left(x_{i},x_{j}\right)$$

$$\mathrm{s.t.}\quad \sum_{i=1}^{l}\alpha_{i}y_{i}=0,\quad 0\le \alpha_{i}\le C,\quad i=1,\dots ,l$$

(3)

where \(\alpha_{i}\) are the Lagrange multipliers, \(x_{i}\) are the support vectors, \(y_{i}\) are the class labels, C is the penalty factor, \(K\left(x_{i},x_{j}\right)=\left\langle \varphi \left(x_{i}\right)\bullet \varphi \left(x_{j}\right)\right\rangle\) is the nonlinear kernel function, and x denotes the input data.
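As a hedged sketch of how the SVM classifier could sit on top of the extracted features (the paper does not report the kernel or the penalty factor C, so the choices below are assumptions):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder features and labels; in the study these would come from the
# improved ResNet feature extractor applied to the 374 training images
X_train = np.random.rand(374, 2304)
y_train = np.random.randint(0, 2, size=374)    # 1 = "Positive", 0 = "Negative"

# Kernel and C are assumptions, not reported hyperparameters
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, probability=True))
svm.fit(X_train, y_train)
p_positive = svm.predict_proba(X_train)[:, 1]  # probability of the "Positive" class
```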

Multi-model

Although deep learning models can greatly improve prediction accuracy, they still make errors. Model errors are affected by uncertainties from various sources, such as noise in the observation data and deficiencies in the model structure [15]. Recently, several works on model uncertainty in deep learning have been published [16,17,18]. One way to reduce model uncertainty is the multi-model approach [19], which combines predictions from multiple models. This idea was explored more than 40 years ago in studies in econometrics and statistics [20,21,22,23].

Ajami et al. [24] proposed a scheme for hydrological model prediction that seeks a consensus from a combination of multiple model predictions, so that one model’s output errors can be compensated by the others’. One combination technique is to use deterministic weights to combine multiple model outputs [25]; the weighting strategy typically gives higher weights to the better-performing models. This approach can produce consensus predictions that are better than those from any single model [26, 27].

To our knowledge, however, this is the first work in the computer vision field to develop a multi-model scheme with an ANN weighting strategy to further improve the accuracy of deep learning models.

ANN-based multi-model scheme

The simulated output of an individual model usually has a nonlinear relationship with the ground truth. A neural network can model complex non-linear relationships, particularly when the explicit form of the relation between the variables involved is unknown, and it can integrate information from physically different sources.

Figure 4 shows the ANN-based multi-model algorithm scheme. The original input image is fed into two models: the improved ResNet + SVM model and the conventional ResNet model. The ANN input layer has two neurons: the first input is the prediction from the improved ResNet + SVM model, and the second is the output of the conventional ResNet model. There is one hidden layer and an output layer; the output neuron represents the ground-truth label.
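A minimal sketch of such a combiner network follows; the hidden-layer width and the sigmoid output are assumptions, since the paper specifies only the 2-neuron input layer, one hidden layer, and the output neuron.

```python
import torch
import torch.nn as nn

class MultiModelANN(nn.Module):
    """ANN weighting scheme: two model predictions in, one consensus prediction out."""

    def __init__(self, hidden_units=8):       # hidden width is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, 1),
            nn.Sigmoid(),                      # interpreted as probability of "Positive"
        )

    def forward(self, p_svm, p_resnet):
        # p_svm: prediction from the improved ResNet + SVM model, shape (N,)
        # p_resnet: prediction from the conventional ResNet model, shape (N,)
        pair = torch.stack([p_svm, p_resnet], dim=1).float()  # (N, 2) input neurons
        return self.net(pair).squeeze(1)                      # (N,) consensus output
```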

Fig. 4 Deep Learning-Based Multi-model Approach Architecture

Dataset for model

We divided the 910 images into a developing cohort, used for training the models, and a validation cohort, used for calculating and comparing model accuracy. The images were split into three parts of 374, 352, and 184 images. In the developing cohort, the 374 images are used for training the improved ResNet + SVM model and the conventional ResNet model, and the 352 images are used for training the multi-model. The remaining 184 images form the validation cohort for calculating and comparing the accuracy of these models.
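The published description fixes only the split sizes, not the assignment of individual images; a sketch that reproduces the 374/352/184 partition with a random shuffle might look like this:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seed is arbitrary; the real split is not published
indices = rng.permutation(910)
single_model_train, multi_model_train, validation = np.split(indices, [374, 374 + 352])
assert (len(single_model_train), len(multi_model_train), len(validation)) == (374, 352, 184)
```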

Model performance evaluation

For model performance evaluation, since this is a classification model, we refer to Jake Lever et al. [28], published in Nature Methods. In the artificial intelligence industry, the recall score, precision, F1 score, and ROC curve are the mainstream evaluation indicators for classification problems. As Lever et al. note, classifiers are commonly evaluated using either a numeric metric, such as precision, or a graphical representation of performance, such as a receiver operating characteristic (ROC) curve. They show that classification metrics are calculated from true positives (TPs), false positives (FPs), false negatives (FNs), and true negatives (TNs), all of which are tabulated in the so-called confusion matrix. A confusion matrix is a table often used to describe the performance of a classification model on test data; each column represents the model prediction for a category, and each row represents the actual label for a category.

For this binary classification problem, when one instance is predicted by the model, four situations are as follows:

(1)

If the true label of the instance is “Positive” and it is predicted as “Positive” by the model, it is called a “True Positive” and marked as TP;

(2)

If the true label is “Positive” but it is predicted as “Negative” by the model, it is called a “False Negative” and marked as FN;

(3)

If the true label is “Negative” but it is predicted as “Positive” by the model, it is called a “False Positive” and marked as FP;

(4)

If the true label is “Negative” and the model result is also “Negative”, it is called a “True Negative” and marked as TN.

In this paper, “Positive” means electron-dense granules are present in the TEM image, and “Negative” means electron-dense granules are not present in the TEM image.

Recall score

The recall score measures how many of the “Positive” samples are predicted by the model as “Positive”. It expresses the model’s ability to find all positive instances in the dataset (i.e., TEM images in which electron-dense deposits are present), and thus measures whether the model omits true positive instances. The closer the recall score is to 1, the higher the prediction accuracy of the model. The recall score is calculated as:

$$\mathrm{Recall}=\frac{TP}{TP+FN}$$

Precision

Precision indicates the proportion of cases predicted as “Positive” that are truly “Positive”. It shows how many of the model’s positive predictions were correct and thus measures the accuracy of the model in determining “Positive”. The closer the precision is to 1, the higher the model accuracy.

Precision is calculated as:

$$\mathrm{Precision}=\frac{TP}{TP+FP}$$

F1 Score

Precision and the recall score are sometimes in tension, so they need to be considered together. The most popular comprehensive classification metric is the F1 score, which balances recall and precision and measures the model prediction accuracy comprehensively. The value of the F1 score is between 0 and 1; the closer to 1, the higher the model accuracy.

The F1 score is calculated as:

$$F1=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$

Considering that the image sample size in this study is very small, it is difficult to improve model accuracy; we therefore regard a 5% improvement in model accuracy as significant.

ROC Curve

ROC curve is short for the receiver operating characteristic curve. Each point on the ROC curve corresponds to the model’s behavior at one decision threshold. The horizontal axis of the ROC curve is the false positive rate (FPR, equal to 1 − specificity), and the vertical axis is the true positive rate (TPR, the sensitivity). AUROC is the area under the ROC curve and is used to evaluate model accuracy; the closer it is to 1, the higher the model accuracy. FPR and TPR are calculated as follows:

$$FPR=\frac{FP}{FP+TN},\qquad TPR=\frac{TP}{TP+FN}$$
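All of the above metrics can be computed from a model’s validation-cohort outputs. A self-contained sketch with placeholder predictions (the real y_true and y_prob would come from the 184-image validation cohort):

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Placeholder labels and positive-class scores, for illustration only
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.92, 0.20, 0.71, 0.43, 0.11, 0.62, 0.85, 0.30])
y_pred = (y_prob >= 0.5).astype(int)          # a 0.5 threshold is an assumption

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("F1 score :", f1_score(y_true, y_pred))
print("AUROC    :", roc_auc_score(y_true, y_prob))    # area under the ROC curve
print("TPR:", tp / (tp + fn), " FPR:", fp / (fp + tn))
```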
