Analysis of computer-aided diagnostics in the preoperative diagnosis of ovarian cancer: a systematic review

This systematic review identified numerous studies that used CAD to assess the nature of an ovarian tumor. Owing to the large heterogeneity, we were not able to pool the data. However, the highest performance, as measured by AUC, was seen in both CT- and MRI-based CAD models.

A meta-analysis was attempted for the seven studies that used an external dataset for validation. However, it could not be performed for several reasons. One study, describing a CAD-MRI model for differentiating borderline from malignant ovarian tumors, reported sensitivity and specificity only for the radiologists' performance and only the AUC for the model [61]. Another study was unclear about which data were used to calculate the diagnostic performance of its model [56]. Consequently, for neither study was it possible to derive the diagnostic 2×2 counts, such as true positives (TP) and true negatives (TN), needed for the meta-analysis.
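To illustrate why a reported AUC alone is insufficient: a diagnostic meta-analysis needs each study's 2×2 table (TP, FP, FN, TN). When a study reports sensitivity, specificity and the numbers of malignant and benign tumors in its validation set, the counts can be back-calculated as in the minimal sketch below; an AUC does not define a single operating point, so no such table can be derived from it. The function name and the example numbers are hypothetical.

```python
def reconstruct_2x2(sensitivity, specificity, n_malignant, n_benign):
    """Back-calculate a diagnostic 2x2 table from reported summary rates.

    Assumes sensitivity and specificity refer to the same validation set,
    whose numbers of malignant (reference-positive) and benign
    (reference-negative) tumors are known. Counts are rounded to the
    nearest integer because published rates are usually rounded.
    """
    tp = round(sensitivity * n_malignant)  # malignant tumors called malignant
    fn = n_malignant - tp                  # malignant tumors missed
    tn = round(specificity * n_benign)     # benign tumors called benign
    fp = n_benign - tn                     # benign tumors called malignant
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical validation set with 50 malignant and 100 benign tumors.
table = reconstruct_2x2(0.90, 0.80, 50, 100)
print(table)
```

If only an AUC is reported, none of the four cells can be recovered, which is why such studies had to be excluded from pooling.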

For the five remaining studies, heterogeneity proved too large, with I² values of 92.8% and 90.7%. An additional subgroup analysis restricted to ultrasound CAD models showed similarly high heterogeneity, with I² values of 94.3% and 83.5%. These analyses can be found in Additional file 1: Appendix 3. This heterogeneity can be explained by (1) the different types of CAD models, using either conventional machine learning or deep learning techniques, (2) different inclusion and exclusion criteria and (3) the type of imaging modality used.

Among the twenty-four studies without an independent dataset, pooling of the results was not viable because the data were too diverse. This was illustrated by differences in the imaging techniques used, e.g., 2D or 3D ultrasound and CT, or 2D, 3D or pharmacokinetic MRI. Furthermore, different CAD techniques were applied, e.g., conventional machine learning and deep learning models. Moreover, some studies added clinical features such as patients' age, menopausal status or serum CA125 to support the classifiers. Finally, the outcome categories differed per classifier, e.g., benign, malignant and borderline, sometimes combined with a specific tumor subtype such as mucinous ovarian tumors.
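The I² statistic expresses the percentage of the total variation across studies that is due to heterogeneity rather than chance, and values above roughly 75% are conventionally read as considerable heterogeneity. The minimal sketch below computes it from Cochran's Q with inverse-variance weights; applying it to logit-transformed sensitivities (or specificities) is a common convention, but the exact model behind Appendix 3 is not restated here, and the per-study counts are hypothetical.

```python
import math


def logit_and_variance(events, non_events):
    """Logit of a proportion and its approximate (delta-method) variance.

    For sensitivity use events=TP, non_events=FN; for specificity use
    events=TN, non_events=FP. Zero cells would need a continuity
    correction, which this sketch does not handle.
    """
    return math.log(events / non_events), 1.0 / events + 1.0 / non_events


def i_squared(effects, variances):
    """Higgins' I^2 (in %) via Cochran's Q with inverse-variance weights."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q == 0:
        return 0.0  # identical estimates: no observable heterogeneity
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical (TP, FN) counts from three external validation studies.
studies = [(45, 5), (30, 20), (80, 2)]
logits, variances = zip(*(logit_and_variance(tp, fn) for tp, fn in studies))
print(f"I^2 for sensitivity: {i_squared(logits, variances):.1f}%")
```

Because I² is truncated at zero, it cannot indicate less variation than expected by chance; it only signals how much of the observed spread exceeds it.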

All studies assessed computer-aided diagnostics based on machine learning. We found that classifying the nature of an ovarian tumor by CAD on ultrasound images resulted in sensitivities of 40.3% to 100% and specificities of 76.3% to 100%. For CT, sensitivities of 84.6% to 100% and specificities of 69% to 100% were described. For MRI, sensitivities and specificities ranged from 66.7% to 100% and from 77.8% to 100%, respectively. Even though some studies report high performance, they are at risk of overfitting due to the lack of an independent test set. Twenty-three studies lacked an independent test set for evaluating model performance.

With conventional machine learning techniques, features extracted from medical images are used to optimize a mathematical model that predicts the class of new, unseen data. A model should be built on a training set of images and validated on a test set. If the model is fitted too tightly to the training data and does not generalize to new data, this is called overfitting. Overfitting occurs more often with conventional machine learning, where many parameters are hand-selected instead of being learned from the data, especially when the model is not validated on an independent test set [64].
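The optimism of evaluating a model on its own training data can be shown with a deliberately extreme, hypothetical example: a 1-nearest-neighbour classifier applied to synthetic "image features" whose benign/malignant labels are assigned at random, so there is nothing real to learn. Because the classifier memorizes its training set, its apparent accuracy is perfect, whereas accuracy on an independent test set stays near chance (about 50%). This is not one of the reviewed CAD models; it only illustrates why performance reported without an independent test set can be misleading.

```python
import random


def nearest_neighbor_predict(train_x, train_y, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]


def accuracy(train_x, train_y, eval_x, eval_y):
    """Fraction of evaluation cases whose predicted label is correct."""
    correct = sum(nearest_neighbor_predict(train_x, train_y, x) == y
                  for x, y in zip(eval_x, eval_y))
    return correct / len(eval_y)


rng = random.Random(0)


def make_data(n, n_features=10):
    """Hypothetical features with random labels (1 = malignant, 0 = benign)."""
    x = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return x, y


train_x, train_y = make_data(100)
test_x, test_y = make_data(100)

train_acc = accuracy(train_x, train_y, train_x, train_y)  # apparent performance
test_acc = accuracy(train_x, train_y, test_x, test_y)     # independent test set
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Only the held-out figure estimates how the model would behave on new patients.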

Ultrasound

Earlier published studies assessing ultrasound prediction models showed reasonable sensitivity (72–77%) and specificity (85–89%) for the RMI [65, 66]. An external validation of the IOTA ADNEX model showed a better sensitivity of 98% (95% CI 93–100%), but a low specificity of 62% (95% CI 55–68%) at a cutoff value for malignancy of 10% [5]. The GI-RADS and O-RADS systems performed better, with sensitivities of 92.7% and 93.6% and specificities of 97.5% and 92.8%, respectively [6]. However, all these models depend on specific terminology and on the expertise of their users. Furthermore, interpretation of ultrasound images of ovarian tumors has proven difficult for novice clinicians and for clinicians who do not perform ultrasonography on a regular basis [8, 9]. Given the number of studies in this review assessing CAD for ultrasound, CAD may be a promising tool to aid clinicians in determining the nature of ovarian tumors. Moreover, when compared with experienced clinicians or existing models, CAD models achieved similar or even better diagnostic accuracy. Nevertheless, this comparison was performed in only three studies. Even though overfitting mitigation techniques were applied in twenty-one ultrasound studies, only four studies used external validation. Thus, the risk of overfitting is high, which could make the reported performance unreliable.

CT

The diagnostic performance of CT in preoperatively classifying the nature of an ovarian tumor is primarily known for multidetector computed tomography (MDCT), with a diagnostic accuracy of 90–93% [11]. Therefore, no fair comparison with CAD for CT can be made. However, based on the studies included in this review, the performance of CAD for CT is promising. The models show a high diagnostic accuracy and low selection bias. Nonetheless, only three studies in total assessed CAD for CT, of which only one used independent validation, thus risking overfitting.

For CAD on CT scans, more research is needed to further evaluate its potential benefits.

MRI

For MRI in ovarian tumor classification, a sensitivity and specificity of 96% and 91%, respectively, have been reported [7, 9]. The O-RADS MRI score performs comparably, with a sensitivity of 93% and a specificity of 91%, and shows similar performance among junior and senior radiologists (κ = 0.784; 95% CI, 0.743–0.824) [9, 10]. As an additional diagnostic method for ovarian tumors, CAD for MRI has the potential to aid radiologists owing to its high diagnostic performance, both as a single model and when compared with SA by radiologists. However, caution is needed when using MRI-CAD as a supplementary tool. First, because there are no international guidelines on when to perform MRI for ovarian tumor classification, selection bias is introduced. Moreover, performing an MRI currently has no further clinical consequences for the patient. However, if radiologists are trained in the O-RADS MRI classification model, MRI can have an additional beneficial effect on ovarian tumor classification, especially when classifying benign and/or possibly malignant lesions [67]. That said, familiarity and expertise are essential for using the O-RADS MRI scoring system [7, 10].
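The κ statistic corrects the raw agreement between two readers for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed and p_e the chance-expected proportion of agreement. A minimal sketch of the unweighted form for two raters follows. Because O-RADS MRI scores are ordinal, the cited studies may have used a weighted κ, which this sketch does not implement; the example scores are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of cases given the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical O-RADS MRI scores assigned by two readers to the same lesions.
junior = [2, 3, 4, 5, 4, 2, 3, 5]
senior = [2, 3, 5, 5, 4, 2, 4, 5]
print(f"kappa = {cohen_kappa(junior, senior):.3f}")
```

A κ close to 1 means that agreement far exceeds what the readers' category frequencies alone would produce, which is the sense in which low interobserver agreement limits feature-based models.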

Second, only one of the six MRI CAD studies showed a low overall risk of bias [59]. Unfortunately, the authors did not compare their CAD with ovarian tumor characterization by radiologists or with other models, such as the O-RADS model. Hence, one study alone cannot support clinical implementation of MRI CAD. Moreover, although CAD outperformed the radiologists in three studies, no external validation sets were used in these studies and the risk of bias was mostly unclear [58, 59, 60, 61]. Furthermore, only two of the six studies used an external validation set [59, 61]. Another study used 3D MRI for model building and showed good results; however, this is a rather expensive MRI technique [57]. Finally, two studies used the same dataset [59, 63]. Therefore, only limited evidence is available to support the additional use of MRI CAD.

Hence, more studies with external validation sets should be undertaken before these CAD-MRI models can be implemented in clinical practice.

Trends among publications

Over the last three decades, several trends can be observed among the included studies in the CAD field.

An increasing number of publications presented clear inclusion and exclusion criteria for the data before using them to construct a CAD model [33,34,35,36,37,38,39, 44, 48,49,50, 52, 54,55,56,57,58,59,60,61,62,63]. In addition, more studies used statistical tests to select the most promising features for the CAD model, and in most articles this was described precisely [34, 37, 41, 43,44,45,46, 48,49,50,51, 54,55,56,57,58, 60,61,62,63]. Furthermore, study cohorts became substantially larger [33]. Finally, clinicians have become more involved in CAD model construction, e.g., in the delineation of the images. As a result, uniformity among studies has improved, making them more comparable.

Regarding the outcomes, almost all studies used the same outcome measures, i.e., sensitivity, specificity, accuracy and area under the curve (AUC). Studies also connect more closely with the clinical setting. In particular, a comparison of the CAD model with either the assessment of scans by clinicians, such as radiologists, sonographers or gynecologists, or with models commonly used in ultrasound (RMI or LR1-2) is now included [33, 35,36,37,38, 48, 51, 60, 61, 68].
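The four outcome measures can be computed from a model's scores on a labelled test set as in the minimal sketch below; the function name and data are hypothetical. It illustrates that sensitivity, specificity and accuracy depend on the chosen decision threshold (compare the 10% risk cutoff used for IOTA ADNEX), whereas the AUC summarizes discrimination over all thresholds.

```python
def diagnostic_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity, accuracy and AUC for a binary classifier.

    labels: 1 = malignant (reference standard), 0 = benign.
    scores: model output; higher means more likely malignant.
    threshold: score at or above which a tumor is called malignant.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))

    # AUC as the probability that a randomly chosen malignant case scores
    # higher than a randomly chosen benign case (ties count as 0.5); this
    # equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
        "auc": sum(pairs) / len(pairs),
    }


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(diagnostic_metrics(labels, scores))                 # default cutoff 0.5
print(diagnostic_metrics(labels, scores, threshold=0.1))  # lower cutoff
```

Lowering the threshold raises sensitivity at the cost of specificity while leaving the AUC unchanged, which is why studies reporting only an AUC cannot be placed at a common operating point.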

This makes the technically complex process of CAD model development more comprehensible for clinicians.

Finally, more deep learning models have been developed in recent years, showing the potential of this new type of CAD. If these trends continue, they could contribute substantially to patient care.

Previous studies have shown that, depending on the imaging technique used, interobserver agreement is low for many features, and that these features are prone to significant measurement errors when assessed by inexperienced clinicians. Greater uncertainty in the measured features can therefore reduce the accuracy of a model. It is thus important to develop new techniques with less inter- and intra-observer variability that reach higher test performance, in order to prevent unnecessary referrals to tertiary centers and unnecessary stress for the patient.

Based on this literature review, computer-aided ultrasound, CT and MRI techniques based on different (deep) neural networks and on conventional machine learning techniques such as support vector machines are promising. They can be used either as a single entity or in combination with SA or other prediction models, and could potentially offer a noninvasive and cost-effective method in the future. However, this has been shown in only eight studies: five ultrasound studies and three MRI studies. Of these, four used independent validation sets: three ultrasound CAD studies and one MRI CAD study. For the remaining studies, the lack of a validation cohort might cause a high risk of overfitting. The CT CAD models appear to perform reasonably well, but they are based on small datasets, lack a comparison with SA, and only one study used an external validation set; therefore, a risk of overfitting is present.

Furthermore, CAD is gaining ground only slowly in gynecologic oncology compared with other oncology specialties. Prospective cohorts that combine datasets and provide larger test sets are needed [22, 33, 69].

It is likely that deep learning will reach higher test performance than traditional machine learning in assessing the nature of an ovarian tumor. For MRI and CT, the number of studies in this review is limited, and more research is needed [22].

Strengths and weaknesses

To the best of our knowledge, this is the most comprehensive review on computer-aided diagnostics for differentiating benign from borderline and malignant ovarian tumors on ultrasound, MRI and CT scans. We worked according to a clearly defined protocol that was first submitted to PROSPERO, to provide transparency in the review process and avoid reporting bias. There was no substantial disagreement among the authors on the inclusion of articles, which can be regarded as a strength of the review process. A meta-analysis of the studies with an external validation set was attempted. Limitations of this review are the heterogeneity between studies, the lack of independent validation sets and the lack of comparisons with SA.
