Tomography, Vol. 8, Pages 2749-2760: Estimation of Cardiac Short Axis Slice Levels with a Cascaded Deep Convolutional and Recurrent Neural Network Model

Conceptualization, Y.-C.K.; methodology, N.H.; software, N.H.; validation, N.H. and Y.-C.K.; formal analysis, Y.-C.K.; investigation, N.H.; resources, N.H.; data curation, N.H.; writing—original draft preparation, N.H. and Y.-C.K.; writing—review and editing, Y.-C.K.; visualization, Y.-C.K.; supervision, Y.-C.K.; project administration, Y.-C.K.; funding acquisition, Y.-C.K. All authors have read and agreed to the published version of the manuscript.

Figure 1. An overview of the proposed method. The input consists of a series of cardiac short axis slice images, which range from the out-of-apical slice level to the out-of-basal slice level. The slice levels were labeled as ‘oap’ for the out-of-apical slice, ‘ap’ for the apical slice, ‘mid’ for the mid-level slice, ‘bs’ for the basal slice, and ‘obs’ for the out-of-basal slice. For each slice image, a pre-trained deep CNN model is used as a feature extractor. The features are denoted by xi for i = 1, 2, …, N, where N is the number of slices. The RNN model takes a series of features (i.e., x1, x2, …, xN) as input and produces probability scores for each slice level via softmax. The red numbers indicate the maximum values in the output prediction scores.

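As a concrete illustration of the cascade in Figure 1, the sketch below assumes a TensorFlow/Keras implementation with a MobileNet backbone and a two-layer LSTM; the input size, number of hidden units, and training settings are illustrative and are not taken from the paper.

```python
# Minimal sketch of the cascaded CNN-RNN slice-level classifier (assumed Keras/TensorFlow).
# Classes: 'oap', 'ap', 'mid', 'bs', 'obs'. Hyperparameters are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5          # oap, ap, mid, bs, obs
IMG_SIZE = (224, 224)    # assumed slice image size

# 1) Per-slice feature extractor: pre-trained CNN followed by global average pooling.
backbone = tf.keras.applications.MobileNet(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,), pooling="avg"
)
backbone.trainable = False  # use the CNN purely as a feature extractor

# 2) Sequence model: a series of N slices -> per-slice probability scores via softmax.
slice_series = layers.Input(shape=(None,) + IMG_SIZE + (3,))     # (N, H, W, 3), N variable
features = layers.TimeDistributed(backbone)(slice_series)        # (N, 1024) = x1, ..., xN
x = layers.LSTM(128, return_sequences=True)(features)
x = layers.LSTM(128, return_sequences=True)(x)
outputs = layers.TimeDistributed(
    layers.Dense(NUM_CLASSES, activation="softmax")
)(x)                                                              # (N, 5) class probabilities

model = models.Model(slice_series, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

A bidirectional or GRU variant is obtained by swapping the two LSTM layers for Bidirectional or GRU layers, which corresponds to the Bi-LSTM, 2-GRU, and Bi-GRU models compared later.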

Figure 2. Learning curves for loss (top) and accuracy (bottom) when EfficientNetB0 was used as the baseline CNN architecture. Solid lines represent validation results, while dashed lines represent training results. Loss curves for the CNN-alone EfficientNetB0 model were not plotted because their loss values were well above those for EfficientNetB0_2-LSTM and EfficientNetB0_2-GRU.

Figure 3. A screenshot of slice level predictions on a series of short axis slices from an individual test subject. The predictions were made by the ‘MobileNet_2layerLSTM’ model. The slice index, the ground truth, and the predicted category are shown above each image. Green text indicates correct classification, while red text indicates incorrect classification. For example, ‘(BS) (P = OBS)’ indicates that the ground truth is ‘basal’ and the model’s prediction is ‘out-of-basal’.

Figure 4. Test prediction results from the CNN-alone models (a,b) and from the CNN-RNN models (c,d). For both MobileNet and NASNetMobile, the CNN-RNN model has a smaller sum of the elements outside the tridiagonal entries (SOTD) than the corresponding CNN-alone model. The elements outside the tridiagonal entries are indicated by the red contours in the confusion matrices. For example, the SOTD is 11 for MobileNet_2-LSTM in (c).

Figure 5. Comparison of SOTD values across deep learning models. SOTD was calculated as the sum of the elements outside the tridiagonal entries of the confusion matrix. SOTD values were grouped by the type of baseline deep CNN. In general, the lower the SOTD value, the higher the prediction performance.

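Because SOTD is defined directly on the confusion matrix, it can be computed in a few lines. The sketch below uses NumPy with a made-up 5 × 5 matrix (classes ordered oap, ap, mid, bs, obs); it sums every entry whose predicted class is more than one slice level away from the ground truth, i.e., everything outside the tridiagonal band.

```python
import numpy as np

def sotd(confusion_matrix) -> int:
    """Sum of the elements outside the tridiagonal entries.

    The main diagonal and the first sub-/super-diagonals (predictions within
    one slice level of the truth) are excluded; everything else is counted.
    """
    cm = np.asarray(confusion_matrix)
    rows, cols = np.indices(cm.shape)
    return int(cm[np.abs(rows - cols) > 1].sum())

# Illustrative confusion matrix (not the paper's data), classes: oap, ap, mid, bs, obs.
cm_example = np.array([
    [110,  12,   1,   0,   0],
    [  9, 150,  14,   2,   0],
    [  0,  11, 160,  10,   1],
    [  1,   3,  12, 120,   8],
    [  0,   0,   2,   9, 130],
])
print(sotd(cm_example))  # -> 10: only entries with |row - col| > 1 contribute
```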

Table 1. The numbers of subjects in training, validation, and testing groups.

  | Training | Validation | Testing | Total
Number of subjects | 576 | 214 | 184 | 974
Percentage (%) | 59.1 | 22.0 | 18.9 | 100
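The split in Table 1 is at the subject level (roughly 59/22/19%). A hedged sketch of how such a subject-wise split could be produced with scikit-learn is shown below; the random seed and the authors' exact splitting procedure are not stated in this excerpt, so the snippet is only illustrative.

```python
from sklearn.model_selection import train_test_split

# Illustrative subject-level split reproducing the counts in Table 1 (576/214/184).
# The seed and the actual splitting procedure are assumptions, not the paper's method.
subject_ids = [f"subj_{i:04d}" for i in range(974)]  # 974 subjects in total

train_ids, holdout_ids = train_test_split(subject_ids, test_size=398, random_state=0)
val_ids, test_ids = train_test_split(holdout_ids, test_size=184, random_state=0)

print(len(train_ids), len(val_ids), len(test_ids))  # 576 214 184
```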

Table 2. The numbers of samples in training, validation, and testing groups.

Model Type |  | Training | Validation | Testing | Total
CNN * | Number of samples | 12,070 | 4594 | 3868 | 20,532
CNN * | Percentage (%) | 58.8 | 22.4 | 18.8 | 100
CNN-RNN ** | Number of samples | 1152 | 428 | 368 | 1948
CNN-RNN ** | Percentage (%) | 59.1 | 22.0 | 18.9 | 100

Table 3. The number of images for each class label in training and validation datasets.

Class Label | oap | ap | mid | bs | obs | Total
Training, number of images | 1878 | 2784 | 2800 | 2132 | 2476 | 12,070
Training, percentage (%) | 15.5 | 23.1 | 23.2 | 17.7 | 20.5 | 100
Validation, number of images | 710 | 1086 | 1114 | 751 | 933 | 4594
Validation, percentage (%) | 15.5 | 23.6 | 24.2 | 16.4 | 20.3 | 100

Table 4. The comparison of base deep CNN models.

CNN Base Network | Number of Model Parameters | ImageNet Top-1 Accuracy | Number of Features after GAP * | Batch Size for CNN | Batch Size for CNN-RNN
EfficientNetB0 | 5.3 M | 77.1% | 1280 | 32 | 2
MobileNet | 4.2 M | 70.6% | 1024 | 32 | 2
NASNetMobile | 5.3 M | 74.4% | 1056 | 32 | 2
ResNet50V2 | 25.6 M | 76.0% | 2048 | 16 | 2
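The ‘Number of Features after GAP’ column is the channel width each backbone exposes after global average pooling, which is also the per-slice feature length passed to the RNN. A minimal sketch, assuming the TensorFlow/Keras implementations of these networks (the 224 × 224 input size is illustrative):

```python
import numpy as np
import tensorflow as tf

# Backbones compared in Table 4; pooling="avg" applies global average pooling (GAP),
# so each model maps one slice image to a single feature vector.
backbones = {
    "EfficientNetB0": tf.keras.applications.EfficientNetB0,   # 1280 features
    "MobileNet":      tf.keras.applications.MobileNet,        # 1024 features
    "NASNetMobile":   tf.keras.applications.NASNetMobile,     # 1056 features
    "ResNet50V2":     tf.keras.applications.ResNet50V2,       # 2048 features
}

dummy_slice = np.random.rand(1, 224, 224, 3).astype("float32")  # illustrative input

for name, ctor in backbones.items():
    model = ctor(include_top=False, weights="imagenet",
                 input_shape=(224, 224, 3), pooling="avg")
    feats = model.predict(dummy_slice, verbose=0)
    print(f"{name}: {feats.shape[1]} features after GAP")
```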

Table 5. Prediction performance of a variety of deep learning models. The bold indicates the highest value among the models.

Model Type | CNN Base Network | RNN Type | F1-Score (oap) | F1-Score (ap) | F1-Score (mid) | F1-Score (bs) | F1-Score (obs) | AUC * | Accuracy
CNN | MobileNet | - | 0.759 | 0.717 | 0.771 | 0.740 | 0.902 | 0.957 | 0.779
CNN | ResNet50V2 | - | 0.748 | 0.668 | 0.726 | 0.688 | 0.868 | 0.944 | 0.740
CNN | NASNetMobile | - | 0.680 | 0.585 | 0.625 | 0.570 | 0.790 | 0.904 | 0.650
CNN | EfficientNetB0 | - | 0.761 | 0.696 | 0.729 | 0.716 | 0.889 | 0.946 | 0.757
CNN-RNN | MobileNet | 2-LSTM | 0.811 | 0.783 | 0.827 | 0.800 | 0.918 | 0.972 | 0.829
CNN-RNN | MobileNet | Bi-LSTM | 0.825 | 0.779 | 0.812 | 0.793 | 0.922 | 0.972 | 0.827
CNN-RNN | MobileNet | 2-GRU | 0.808 | 0.769 | 0.814 | 0.785 | 0.909 | 0.970 | 0.817
CNN-RNN | MobileNet | Bi-GRU | 0.819 | 0.784 | 0.801 | 0.784 | 0.907 | 0.970 | 0.819
CNN-RNN | ResNet50V2 | 2-LSTM | 0.759 | 0.763 | 0.804 | 0.759 | 0.904 | 0.966 | 0.801
CNN-RNN | ResNet50V2 | Bi-LSTM | 0.821 | 0.781 | 0.782 | 0.769 | 0.908 | 0.968 | 0.812
CNN-RNN | ResNet50V2 | 2-GRU | 0.781 | 0.772 | 0.788 | 0.721 | 0.882 | 0.963 | 0.791
CNN-RNN | ResNet50V2 | Bi-GRU | 0.816 | 0.746 | 0.755 | 0.758 | 0.909 | 0.962 | 0.796
CNN-RNN | NASNetMobile | 2-LSTM | 0.771 | 0.713 | 0.733 | 0.683 | 0.861 | 0.952 | 0.753
CNN-RNN | NASNetMobile | Bi-LSTM | 0.809 | 0.713 | 0.772 | 0.711 | 0.874 | 0.960 | 0.777
CNN-RNN | NASNetMobile | 2-GRU | 0.738 | 0.721 | 0.740 | 0.667 | 0.853 | 0.947 | 0.746
CNN-RNN | NASNetMobile | Bi-GRU | 0.806 | 0.747 | 0.770 | 0.712 | 0.869 | 0.958 | 0.780
CNN-RNN | EfficientNetB0 | 2-LSTM | 0.805 | 0.772 | 0.800 | 0.777 | 0.901 | 0.967 | 0.811
CNN-RNN | EfficientNetB0 | Bi-LSTM | 0.827 | 0.772 | 0.800 | 0.764 | 0.904 | 0.969 | 0.814
CNN-RNN | EfficientNetB0 | 2-GRU | 0.811 | 0.763 | 0.793 | 0.764 | 0.909 | 0.965 | 0.808
CNN-RNN | EfficientNetB0 | Bi-GRU | 0.822 | 0.785 | 0.801 | 0.767 | 0.910 | 0.969 | 0.817
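The per-class F1-scores, AUC, and accuracy in Table 5 can be computed from a model's softmax outputs; the sketch below uses scikit-learn with placeholder arrays. The exact AUC averaging used by the authors is not stated in this excerpt, so a macro-averaged one-vs-rest AUC is assumed.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

CLASS_NAMES = ["oap", "ap", "mid", "bs", "obs"]

def report_metrics(y_true, y_prob):
    """y_true: integer labels, shape (n,); y_prob: softmax scores, shape (n, 5)."""
    y_pred = y_prob.argmax(axis=1)
    per_class_f1 = f1_score(y_true, y_pred, average=None,
                            labels=list(range(len(CLASS_NAMES))))
    return {
        "f1": dict(zip(CLASS_NAMES, per_class_f1.round(3))),
        "auc": round(roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"), 3),
        "accuracy": round(accuracy_score(y_true, y_pred), 3),
    }

# Illustrative usage with random placeholder data (shapes only; not the paper's results).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=200)
y_prob = rng.dirichlet(np.ones(5), size=200)
print(report_metrics(y_true, y_prob))
```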
