Artificial Intelligence and Acute Appendicitis: A Systematic Review of Diagnostic and Prognostic Models

Study Selection

Our initial database search identified 628 articles, and four more were added from auxiliary sources. After duplicates were removed, 382 articles remained for screening. Title and abstract review left 84 articles eligible under our predetermined criteria, and full-text examination reduced these to the 29 articles that met the inclusion standards. The flowchart in Additional file 3: Figure S1 shows each step of the article selection process.
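The counts in this paragraph imply how many records dropped out at each stage. The sketch below recomputes them by subtraction. The exclusion figures are derived here and are not reported in the text, and the variable names are illustrative only.

```python
# Counts taken from the Study Selection paragraph.
identified_databases = 628
identified_other = 4
after_deduplication = 382
eligible_after_screening = 84
included = 29

total_identified = identified_databases + identified_other

# Derived values: each stage's loss is the difference between consecutive counts.
flow = {
    "records identified": total_identified,
    "duplicates removed": total_identified - after_deduplication,
    "excluded at title/abstract screening": after_deduplication - eligible_after_screening,
    "excluded at full-text review": eligible_after_screening - included,
    "studies included": included,
}

for stage, n in flow.items():
    print(f"{stage}: {n}")
```

Under these assumptions, 250 duplicates were removed, 298 records were excluded at title and abstract screening, and 55 were excluded at full-text review.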

Characteristics of the Studies

Of the 29 included articles, 21 (72%) focused on the diagnosis of AA, seven (24%) on its prognosis, and one addressed both diagnosis and prognosis [19]. Over half of the studies (51%) used a cross-sectional design. Sixteen studies (55%) were conducted within the past 5 years (2019 and later), and all prognosis-related studies were carried out within the last 3 years (2021 and later). A comprehensive overview is given in Additional file 4: Figure S2.
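As a quick check on the proportions, the reported counts can be converted to whole-number percentages of the 29 included studies. This is a minimal sketch. The helper `share` and the dictionary labels are hypothetical names introduced for illustration.

```python
TOTAL_STUDIES = 29

def share(count, total=TOTAL_STUDIES):
    """Return count as a percentage of total, rounded to the nearest whole percent."""
    return round(100 * count / total)

# Counts as stated in the paragraph above.
counts = {
    "diagnosis only": 21,
    "prognosis only": 7,
    "diagnosis and prognosis": 1,
    "published 2019 or later": 16,
}

shares = {label: share(n) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]}/{TOTAL_STUDIES} = {pct}%")
```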

Risk of Bias Assessment

Using PROBAST, we rated 11 studies as having a low risk of bias [3, 12, 19, 20, 21, 22, 23, 24, 25, 26, 27] and 18 as having a high risk of bias [28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]. The most common source of high risk was selection bias, identified in eight studies [30, 32, 33, 36, 39, 42, 44, 45]. Seven studies lacked internal validation and were therefore excluded from further quality assessment [34, 35, 37, 38, 40, 41, 43]. Six studies had issues related to their analyses [29, 30, 31, 32, 42, 44]. In two studies, the high risk of bias was associated with the outcome or its determination [28, 32], and in one study it was introduced by the predictors or their assessment [32]. The quality of each domain across the studies is illustrated in Fig. 1. For a detailed breakdown of the quality assessment and PROBAST domains, see Additional file 5: Table S1.
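Because several studies are flagged for more than one reason, the per-reason counts do not sum to 18. The sketch below tallies the reasons using the reference numbers cited in the paragraph above as study identifiers. The dictionary keys are shorthand labels chosen for illustration, not PROBAST domain names.

```python
# Reference numbers from the paragraph, used here as study identifiers.
low_risk = {3, 12, 19, 20, 21, 22, 23, 24, 25, 26, 27}
high_risk = set(range(28, 46))  # references [28] to [45]

reasons = {
    "selection bias": {30, 32, 33, 36, 39, 42, 44, 45},
    "no internal validation": {34, 35, 37, 38, 40, 41, 43},
    "analysis": {29, 30, 31, 32, 42, 44},
    "outcome": {28, 32},
    "predictors": {32},
}

for reason, refs in reasons.items():
    print(f"{reason}: {len(refs)} studies")

# Every high-risk study should be covered by at least one reason.
flagged = set().union(*reasons.values())
print("all high-risk studies accounted for:", flagged == high_risk)

# Studies cited under more than one reason.
multi = sorted(
    ref for ref in flagged
    if sum(ref in refs for refs in reasons.values()) > 1
)
print("flagged for multiple reasons:", multi)
```

With the citations as listed, the five reasons together cover all 18 high-risk studies, and references 30, 32, 42, and 44 appear under more than one reason.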

Fig. 1 Quality assessment of included studies

Artificial Intelligence Algorithms

A total of 24 distinct artificial intelligence algorithms were applied across the selected articles. These algorithms fell into six main categories: (1) Statistical classifiers, encompassing Logistic Regression (LR) and Naïve Bayes (NB); (2) ML classifiers such as SVM, Decision Trees (DT), and K-Nearest Neighbors (KNN); (3) Ensemble ML techniques, which include RFs, Pre-clustering Ensemble Learning (PEL), and variants of Boosted DT like Gradient Boosting (GB), Extreme Gradient Boosting (XGB), and CatBoost; (4) ML Neural Networks, including Artificial Neural Networks (ANN) and other specialized forms like Multilayer Perceptron (MLP), Backpropagation Neural Network (BPNN), Radial Basis Function Network (RBFN), Adaptive Resonance Theory (ART), Self-Organizing Maps (SOMs), Learning Vector Quantization (LVQ), Multilayer Neural Network (MLNN), Probabilistic Neural Network (PNN), Extreme Learning Machines (ELM), and Kernel ELM (KELM); (5) DL techniques, particularly CNNs; and (6) other miscellaneous algorithms including fuzzy rule-based and neuro-fuzzy approaches.
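The six categories can be written down as a simple lookup structure. The sketch below uses the abbreviations from the paragraph above. Since the paragraph introduces some categories with "such as" and "including", the listing may not be exhaustive, and the grouping keys are shorthand chosen for illustration.

```python
# Category -> algorithms, as named in the paragraph above.
taxonomy = {
    "statistical classifiers": ["LR", "NB"],
    "ML classifiers": ["SVM", "DT", "KNN"],
    "ensemble ML": ["RF", "PEL", "GB", "XGB", "CatBoost"],
    "neural networks": [
        "ANN", "MLP", "BPNN", "RBFN", "ART", "SOM",
        "LVQ", "MLNN", "PNN", "ELM", "KELM",
    ],
    "deep learning": ["CNN"],
    "other": ["fuzzy rule-based", "neuro-fuzzy"],
}

# Reverse lookup: algorithm -> category.
category_of = {alg: cat for cat, algs in taxonomy.items() for alg in algs}

total = sum(len(algs) for algs in taxonomy.values())
print(f"{len(taxonomy)} categories, {total} algorithms")
print("XGB belongs to:", category_of["XGB"])
```

Counting the algorithms named in the paragraph this way gives 24 entries across the six categories, which agrees with the stated total.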

Over half of the studies (51%) used a single algorithm, while the remainder employed multiple models, ranging from two [26, 40, 43, 44] to six [25]. Details of the models employed are given in Tables 1 and 3.

Table 1 Characteristics of studies in artificial intelligence applications for appendicitis diagnosis

ANN models and their variants were the most commonly used, appearing in 13 studies [21, 24, 25, 26, 27, 32, 33, 34, 37, 38, 40, 42, 45]. LR followed, appearing in nine studies [3, 24, 25, 26, 28, 30, 32, 42, 43]. DT [25, 30, 35, 36, 40, 41, 43] and SVM [23, 24, 28, 30, 37, 45] were each used in seven studies, while RF [23, 24, 29, 31, 45] was used in six studies.

The distribution of algorithms and their subtypes is outlined in Fig. 2, and temporal trends are depicted in Fig. 3.

Fig. 2 Algorithms utilized in studies

Fig. 3 Trends of different algorithms

Diagnosis of Appendicitis

A total of 22 studies applied algorithms for the diagnosis of AA. Details of these studies are summarized in Tables 1 and 2.

Table 2 Details of artificial intelligence methods applied and outcomes in studies for appendicitis diagnosis

Input Features

Each study used its own set of input variables to train its models. Most commonly, studies combined demographic factors, clinical indicators, and laboratory measurements as the primary features for model training [21, 24, 26, 34, 35, 37, 39, 42, 44]. Radiological assessments, particularly CT images, were the input modality in three studies [12, 20, 22]. Laboratory data served as the exclusive input in four studies [19, 23, 32, 36]. Three studies used a combination of clinical observations and laboratory data [38, 40, 41], and one study combined demographic and laboratory data [33]. Infrared thermographic evaluations of the abdomen were employed in a single study [
