RN-Autoencoder: Reduced Noise Autoencoder for classifying imbalanced cancer genomic data

Evaluation of RN-Autoencoder

The performance of different classifiers is strongly affected by dimensionality reduction, resampling and other data pre-processing techniques. Therefore, to evaluate the effectiveness and performance of RN-Autoencoder, several classifiers have been utilized: Classification and Regression Tree (CART), RF, Gradient Boosting (GB), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Gaussian Naïve Bayes (GNB), K-Nearest Neighbours (KNN), Logistic Regression with Stochastic Gradient Descent (SGD-LR), Support Vector Machines with Radial Basis Function kernel (SVM-RBF), SVM with Linear kernel (SVM-Linear) and Linear Discriminant Analysis (LDA). These classifiers were selected because they perform well on a variety of datasets. Each classifier was used with its default settings in the Python libraries. All evaluation experiments were carried out on a machine with the Windows 10 operating system, 8 GB of RAM and an Intel i5 processor.

RN-Autoencoder has been evaluated by comparing the performance of each classifier in four different scenarios. The first scenario trains the classifier on the original data without any feature reduction. The second trains it on the original data after pre-processing with RN-SMOTE only. The third trains it on the extracted data after applying the autoencoder only. Finally, the fourth trains it on the data obtained after pre-processing with RN-Autoencoder. For a fair comparison, all classifiers are trained in each scenario on the same training set and evaluated on the same test set, as shown in Fig. 6. The performance of each classifier has been measured in terms of all metrics listed in Table 4.
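The four scenarios above can be sketched as a single evaluation loop. This is an assumed reconstruction, not the authors' code: `rn_smote` (the resampling step) and `encode` (the trained encoder) are placeholder callables, and the RN-Autoencoder scenario is assumed here to apply RN-SMOTE to the encoded training data.

```python
# Assumed sketch of the four evaluation scenarios over one fixed split.
from sklearn.base import clone
from sklearn.model_selection import train_test_split

def run_scenarios(clf, X, y, rn_smote, encode):
    # One fixed train/test split, reused by every scenario for fairness.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)
    scenarios = {
        "original": (X_tr, y_tr, X_te),
        "rn_smote": (*rn_smote(X_tr, y_tr), X_te),
        "autoencoder": (encode(X_tr), y_tr, encode(X_te)),
    }
    # RN-Autoencoder: RN-SMOTE on the encoded training set (assumed order).
    X_bal, y_bal = rn_smote(encode(X_tr), y_tr)
    scenarios["rn_autoencoder"] = (X_bal, y_bal, encode(X_te))

    scores = {}
    for name, (X_fit, y_fit, X_eval) in scenarios.items():
        model = clone(clf).fit(X_fit, y_fit)      # fresh copy per scenario
        scores[name] = model.score(X_eval, y_te)  # test accuracy
    return scores
```

In practice each scenario's predictions would be scored with the full metric suite of Table 4 rather than accuracy alone.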

Fig. 6

Flowchart of the evaluation of all the used classifiers

We built an autoencoder for each dataset. Each autoencoder is trained on the training set and evaluated on the reserved test set. All parameters of each autoencoder were optimized manually by trial and error. For all datasets, the autoencoders use an exponentially decaying learning rate with the initial learning rate, decay steps and decay rate values listed in Table 2. Three datasets (colon, leukemia and DLBCL) use 100 epochs, while the WDBC and lung cancer datasets use only 50 epochs. The architecture of each autoencoder is saved with its weights and used later to reduce the reserved test set to the same dimensionality as the training set. Figure 7 shows the resulting autoencoder learning curves for all used datasets. Each curve plots the training loss against the validation loss as the number of epochs increases. The figure shows that the autoencoder fits the WDBC dataset well, yielding the minimum error among all datasets. Increasing the dimensionality of the dataset widened the gap between the validation and training curves and hence increased the loss, as shown in the colon, leukemia, lung and DLBCL curves.
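A minimal Keras sketch of one such autoencoder follows. The single-hidden-layer architecture, activations and latent size are assumptions for illustration; only the exponential-decay learning-rate schedule and the reuse of the encoder on the test set come from the text and Table 2.

```python
# Assumed sketch of a per-dataset autoencoder with an exponentially
# decaying learning rate, as described in the text.
import tensorflow as tf

def build_autoencoder(n_features, latent_dim, init_lr, decay_steps, decay_rate):
    inputs = tf.keras.Input(shape=(n_features,))
    encoded = tf.keras.layers.Dense(latent_dim, activation="relu")(inputs)
    decoded = tf.keras.layers.Dense(n_features, activation="sigmoid")(encoded)
    autoencoder = tf.keras.Model(inputs, decoded)
    # The encoder half is kept so the reserved test set can later be
    # projected to the same dimensionality as the training set.
    encoder = tf.keras.Model(inputs, encoded)
    schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=init_lr,
        decay_steps=decay_steps,
        decay_rate=decay_rate)
    autoencoder.compile(optimizer=tf.keras.optimizers.Adam(schedule),
                        loss="mse")
    return autoencoder, encoder
```

Training would then call `autoencoder.fit(X_tr, X_tr, epochs=100, validation_data=(X_te, X_te))` (50 epochs for WDBC and lung), producing the train/validation curves of Fig. 7.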

Fig. 7

Autoencoder Learning Curves for all used datasets

Results for all datasets

In this section, the performance of the classifiers on each dataset is discussed. The performance metrics listed in Table 4 have been measured for each classifier on each dataset in each of the four evaluation scenarios. The results for each dataset are summarized in a table containing only the classifiers whose performance with RN-Autoencoder exceeded their performance with the original and extracted data. The best results are bolded in each table.
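The metric suite of Table 4 (test accuracy, precision, recall, F1, Kappa, MCC and GM) can be computed with scikit-learn. This is a sketch under the assumption of binary labels, with GM taken as the geometric mean of per-class recall, i.e. sqrt(sensitivity × specificity):

```python
# Assumed implementation of the Table 4 metric suite for binary labels.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, matthews_corrcoef)

def evaluate(y_true, y_pred):
    # Recall of each class; their geometric mean is the GM score.
    per_class_recall = recall_score(y_true, y_pred, average=None)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "mcc": matthews_corrcoef(y_true, y_pred),
        "gm": float(np.sqrt(np.prod(per_class_recall))),
    }
```

For example, `evaluate([0, 0, 1, 1], [0, 1, 1, 1])` yields accuracy 0.75 and GM ≈ 0.707, since the minority class is fully recalled but the majority class only half.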

Colon dataset

This section discusses the RN-Autoencoder results for the colon dataset. All classifiers whose performance increased in terms of any metric with the colon dataset are listed in Table 5. The results showed that only the GNB, AdaBoost and GB classifiers gained performance when combined with RN-Autoencoder.

Table 5 Results summary for the colon dataset

When used with RN-Autoencoder, the performance of the GNB classifier increased by 14.28, 16.67, 20, 18.18, 29.79, 30.12 and 15.63% in terms of test accuracy, precision, recall, F1, Kappa, MCC and GM scores compared with the original data. It also increased by 21.43, 26.67, 40, 32.73, 48.65, 49.27 and 27.24% in terms of the same metrics compared with RN-SMOTE, while it showed no increase compared with the extracted data.

The AdaBoost classifier with RN-Autoencoder gained 7.14, 15, 6.67, 13.38, 14.077 and 4.72% in terms of test accuracy, precision, F1, Kappa, MCC and GM scores compared with the original data, with no increase in the recall score. It also gained 14.28, 25, 12.12, 25.63, 26.03 and 9.78% in terms of the same metrics compared with RN-SMOTE. Finally, compared with the extracted data, it gained 7.14, 8.33, 20, 16.67, 19.45, 18.12 and 13.4% in terms of test accuracy, precision, recall, F1, Kappa, MCC and GM scores, respectively.

Finally, the GB classifier with RN-Autoencoder succeeded in classifying colon cancer with 100% in terms of all metrics. It gained 35.71, 50, 60, 55.56, 81.4, 81.14 and 44.22% in terms of test accuracy, precision, recall, F1, Kappa, MCC and GM scores compared with the original data, and 35.71, 50, 40, 45.45, 74.47, 74.18 and 36.75% compared with RN-SMOTE. Compared with the extracted data, it gained 28.57, 33.33, 60, 50, 68.29, 66.27 and 40.37% in terms of the same metrics, respectively.

Leukemia dataset

This section discusses the results of RN-Autoencoder with the leukemia dataset. All classifiers that improved their performance with the leukemia dataset in terms of any metric are listed in Table 6. The results showed that SVM-RBF, SGD-LR, KNN and LDA classifiers gained performance when combined with the RN-Autoencoder.

Table 6 Results summary for the leukemia dataset

The SVM-RBF classifier with RN-Autoencoder gained 23.53, 57.14, 52.71, 56.89, 45.48 and 42.35% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores compared with both the original data and RN-SMOTE. Compared with the extracted data, it gained 14.71, 35.71, 28, 34.3, 27.33 and 23.17% in terms of the same metrics. The SVM-RBF classifier maintained its precision score at 100% across the different scenarios.

The SGD-LR classifier with RN-Autoencoder gained 20.59, 50, 48.04, 50.34, 39.97 and 38.23% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores compared with the original data.

It also gained 14.71, 35.72, 30.7, 35.1, 27.52 and 24.76% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores compared with RN-SMOTE, and 8.83, 21.43, 16.66, 20.58, 16.29 and 13.81% in terms of the same metrics compared with the extracted data. The SGD-LR classifier also kept its precision score at 100% in the different cases.

The KNN classifier with RN-Autoencoder gained 8.82, 5.56, 21.43, 19.57, 21.5, 18.52 and 15.43% in terms of test accuracy, precision, recall, F1, Kappa, MCC and GM scores compared with the original data. Compared with RN-SMOTE, it gained 2.94, 17.46, 3.66 and 6.74% in terms of test accuracy, precision, Kappa and MCC, but decreased by 14.29, 1.86 and 1.91% in terms of recall, F1 and GM scores. Compared with the extracted data, it gained 2.94, 14.28, 9.57, 8.21, 2.84 and 8.21% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores, while its precision score decreased by 11.11%.

Finally, the LDA classifier with RN-Autoencoder succeeded in classifying leukemia with 100% in terms of all metrics. It gained 29.41, 71.43, 55.56, 68, 56.36 and 46.55% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores compared with the original data.

It also gained 17.65, 25, 14.29, 20, 35.66, 35.2 and 17.19% in terms of test accuracy, precision, recall, F1, Kappa, MCC and GM scores compared with RN-SMOTE, and 20.59, 18.18, 35.71, 28, 43.91, 42.89 and 23.94% in terms of the same metrics compared with the extracted data.

DLBCL dataset

This section discusses the results for the DLBCL dataset. All classifiers that performed better according to any metric with the DLBCL dataset are listed in Table 7. The results showed that RF, CART, SVM-RBF, AdaBoost and XGBoost classifiers gained performance when combined with the RN-Autoencoder.

Table 7 Results summary for the DLBCL dataset

The RF classifier with RN-Autoencoder gained 4.16, 16.66, 10.91, 13.24, 11.39 and 9.64% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores compared with both the original data and RN-SMOTE. Compared with the extracted data, it gained 12.5, 50, 40.91, 45.38, 36.63 and 33.55% in terms of the same metrics. The RF classifier preserved its precision score at 100% across the different scenarios.

The CART classifier with RN-Autoencoder succeeded in classifying cancer on the DLBCL dataset with 100% in terms of all metrics. Its performance increased by 8.33, 25, 14.29, 20, 18.35 and 5.72% in terms of test accuracy, precision, F1, Kappa, MCC and GM scores compared with both the original data and RN-SMOTE. Compared with the extracted data, it gained 4.17, 16.67, 9.09, 11.76, 11.15 and 8.71% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores. The CART classifier achieved 100% precision in the cases of the extracted data and RN-Autoencoder.

The SVM-RBF classifier with RN-Autoencoder gained 20.83, 100, 83.33, 90.91, 88.24, 88.85 and 91.29% in terms of test accuracy, precision, recall, F1, Kappa, MCC and GM scores compared with the original data, and 8.33, 33.33, 24.24, 28.24, 23.38 and 20.58% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores compared with RN-SMOTE. Compared with the extracted data, it gained 16.66, 66.66, 62.34, 65.16, 52.73 and 50.47% in terms of the same metrics, while keeping the precision score at 100% in the two cases.

The AdaBoost classifier with RN-Autoencoder succeeded in classifying cancer on the DLBCL dataset with 100% in terms of all metrics. It gained 4.17, 16.67, 9.09, 11.76, 11.15 and 8.71% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores compared with the original data, RN-SMOTE and the extracted data, with no increase in precision; the AdaBoost classifier maintained its precision score at 100% across all scenarios.

Finally, the XGBoost classifier with RN-Autoencoder gained 4.16, 16.66, 10.91, 13.24, 11.39 and 9.64% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores compared with the original data, RN-SMOTE and the extracted data, and kept its precision score at 100% in the different cases.

Lung (Michigan) dataset

This section discusses the results for the Lung dataset. All classifiers that resulted in an increase in their performance in terms of any metric with the Lung dataset are listed in Table 8. The results showed that GNB, GB and XGBoost classifiers gained in performance when combined with the RN-Autoencoder.

Table 8 Results summary for the lung dataset

The performance of the GNB and GB classifiers increased by 5, 5.26, 2.7, 35.71, 31.18 and 29.29% in terms of test accuracy, precision, F1, Kappa, MCC and GM scores when used with RN-Autoencoder compared with both the original data and RN-SMOTE, while the recall score remained at 100% in the two cases.

The XGBoost classifier's performance gained 5, 5.56, 2.86, 22.73, 20.65 and 2.82% in terms of test accuracy, precision, F1, Kappa, MCC and GM scores when used with RN-Autoencoder compared with both the original data and RN-SMOTE, while the recall score remained at 100%.

The GNB, GB and XGBoost classifiers succeeded in classifying lung cancer with 100% in terms of all metrics in both the extracted data and RN-Autoencoder scenarios.

WDBC dataset

This section discusses the results for the WDBC dataset. All classifiers that enhanced their performance with the WDBC dataset in terms of any metric are listed in Table 9. The results showed that GNB, SVM-Linear, XGBoost and LDA classifiers gained an increase in performance when combined with the RN-Autoencoder.

Table 9 Results summary for the WDBC dataset

The GNB classifier with RN-Autoencoder gained 4.77, 0.43, 0.277 and 1.18% in terms of recall, F1, Kappa and GM scores, with decreases of 4.62 and 0.12% in precision and MCC and no change in test accuracy, compared with both the original data and RN-SMOTE. Compared with the extracted data, it gained 0.88, 0.18, 2.38, 1.33, 1.97, 1.93 and 1.24% in terms of test accuracy, precision, recall, F1, Kappa, MCC and GM scores, respectively.

The SVM-Linear classifier with RN-Autoencoder gained 4.76, 0.11, 0.08, 0.0707 and 1.01% in terms of recall, F1, Kappa, MCC and GM scores, with a 4.55% decrease in precision, compared with both the original and the extracted data. The classifier maintained its test accuracy of 98.25% in the different cases. However, with the SVM-Linear classifier, RN-SMOTE exceeded RN-Autoencoder, gaining 0.87, 2.22, 1.15, 1.85, 1.81 and 0.7% in terms of test accuracy, precision, F1, Kappa, MCC and GM scores.

The XGBoost classifier with RN-Autoencoder gained 0.88, 7.14, 1.47, 2.06, 1.811 and 2.3% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores, with a 4.65% decrease in precision, compared with the original data. Compared with RN-SMOTE, it gained 2.38, 0.08, 0.05, 0.06 and 0.51% in terms of recall, F1, Kappa, MCC and GM scores, with a 2.21% decrease in precision. Compared with the extracted data, it gained 0.88, 0.11, 2.38, 1.23, 1.91, 1.93 and 1.19% in terms of test accuracy, precision, recall, F1, Kappa, MCC and GM scores.

Finally, the LDA classifier with RN-Autoencoder succeeded in classifying cancer on the WDBC dataset with 100% in terms of all metrics. It gained 3.51, 9.52, 5, 7.69, 7.42 and 4.88% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores compared with the original data, 4.39, 4.88, 7.14, 6.02, 9.47, 9.46 and 4.99% in terms of test accuracy, precision, recall, F1, Kappa, MCC and GM scores compared with RN-SMOTE, and 2.63, 7.14, 3.7, 5.74, 5.58 and 3.64% in terms of test accuracy, recall, F1, Kappa, MCC and GM scores compared with the extracted data. The LDA classifier kept its precision score at 100% across the different scenarios except with RN-SMOTE.

Table 10 summarizes which classifiers perform well on each dataset, along with the dimensionality and imbalance ratio of each dataset. RN-Autoencoder enhances the performance of different machine learning classifiers for cancer classification on high-dimensional imbalanced gene expression datasets. In addition, RN-Autoencoder performs well with different classifiers on cancer subclinical datasets such as the WDBC dataset.

Table 10 The improved classifiers using RN-Autoencoder on different datasets

Comparison with the current state of the art

In this section, we compare the performance of RN-Autoencoder with that of the recent studies mentioned in the related work section, using the colon, leukemia, DLBCL, lung and WDBC datasets that are mainly used to evaluate RN-Autoencoder. These studies are Pandit et al. [44], Devendra et al. [28], Menaga et al. [30], Uzma et al. [45], Majumder et al. [33], Bustamam et al. [35], Samieinasab et al. [47], Singh et al. [48] and Bacha et al. [50]. In each comparison, we use only the metrics and datasets reported by the corresponding study.

By comparing the performance of RN-Autoencoder with the best results obtained by FBBO + CNN [28], RN-Autoencoder outperformed it by 2% in terms of the test accuracy, recall, precision and F1 scores for both the leukemia and colon datasets.

When compared with Wavelet + CNN [44], RN-Autoencoder outperformed it by 1.45, 2.05, 1.57 and 1.95% in terms of test accuracy, recall, precision and F1 scores respectively for the colon dataset, and by 2.3, 2.87, 2.19 and 2.76% in terms of the same metrics for the leukemia dataset. Figure 8 shows the superior performance of RN-Autoencoder compared with both FBBO + CNN and Wavelet + CNN.

Fig. 8

RN-Autoencoder versus FBBO + DCNN and Wavelet + CNN. (A) Colon dataset. (B) Leukemia dataset

When comparing RN-Autoencoder to the results of FASO-DEEP-RNN introduced by Menaga et al. [30], RN-Autoencoder outperformed it by 7.13% in terms of the test accuracy with the colon dataset and 7.18% in terms of the test accuracy with the leukemia dataset. Figure 9 shows this comparison.

Fig. 9

RN-Autoencoder versus FASO-RNN

When comparing RN-Autoencoder to the best results obtained by Majumder et al. [33] using the colon dataset, we find that RN-Autoencoder outperformed them by 16, 13 and 13% in terms of test accuracy, precision and F1 scores respectively, as illustrated in Fig. 10.

Fig. 10

Contrasted with Gene-encoder [45], the SVM classifier with RN-Autoencoder outperformed SVM with Gene-encoder by 1.71 and 1.18% in terms of accuracy on the colon and leukemia datasets respectively, but lagged by 1.17% on the DLBCL dataset. For the KNN classifier, RN-Autoencoder outperformed Gene-encoder by 18.017, 19.183 and 0.67% on the colon, leukemia and DLBCL datasets, respectively. For the RF classifier, RN-Autoencoder outperformed Gene-encoder by 18.62 and 18.58% on the leukemia and DLBCL datasets respectively, but lagged by 2.2% on the colon dataset. Since some other classifiers with RN-Autoencoder scored 100% test accuracy, comparing these best results with Gene-encoder shows that RN-Autoencoder outperformed it by 16, 10 and 3% on the colon, leukemia and DLBCL datasets respectively. Figure 11 illustrates this performance comparison.

Fig. 11

RN-Autoencoder versus Gene Encoder using colon, leukemia and DLBCL datasets: (A) SVM Classifier. (B) KNN Classifier. (C) RF Classifier. (D) Best results

When comparing RN-Autoencoder to the work done in [35] using the lung (Michigan) dataset, we found that RN-Autoencoder outperformed it by 2% in terms of accuracy, as shown in Fig. 12.

Fig. 12

Accuracy of RN-Autoencoder vs SVM-RFE-ABC using Lung (Michigan) dataset

When comparing RN-Autoencoder to the Meta Health Stack introduced by Samieinasab et al. [47] using the WDBC dataset, we found that RN-Autoencoder outperformed it by 1.8, 1.5, 3.2 and 2.4% in terms of test accuracy, precision, recall and F1 scores respectively. When compared to the DE-RBF-KELM introduced by Bacha et al. [50] on the same dataset, RN-Autoencoder outperformed it by 8.87% in terms of test accuracy. Finally, compared to the best result obtained by the hybrid approach of Singh et al. [48], RN-Autoencoder outperformed it by 2.34% in terms of test accuracy. Figure 13 shows the comparison between RN-Autoencoder and these models on the WDBC dataset.

Fig. 13

RN-Autoencoder vs other models using WDBC dataset

The proposed RN-Autoencoder achieves accurate and precise cancer diagnosis on several imbalanced gene expression datasets, outperforming many different recent works. This enhancement stems mainly from the non-linear transformation of the gene expressions by the autoencoder, combined with the oversampling and noise handling of RN-SMOTE. This combination of steps addresses the curse of dimensionality, class imbalance and noise present in the used datasets, which degraded the performance of many classifiers in recent studies. With this sequence of steps, RN-Autoencoder improves the performance of many different classifiers in terms of various classification metrics compared to earlier proposals.
