Gastric cancer ranks fifth in cancer incidence worldwide, at 14.0 per 100,000, and fourth in mortality, at 9.9 per 100,000 (1). Its prognosis is poor, but it can be improved substantially by early detection (2,3). Mass screening for gastric cancer has been conducted in high-incidence countries such as Japan and South Korea (4–8). However, now that more health resources are being devoted to controlling coronavirus disease 2019, accurate risk prediction would allow the limited capacity for gastroscopy to be allocated more efficiently. Risk prediction models can also inform individuals of their own risk and thereby improve attendance and compliance with cancer screening among high-risk groups (9). Meanwhile, people assessed as not being at high risk can avoid nosocomial infection, psychological burden, and other physical harms. In addition, risk stratification could facilitate the primary prevention of gastric cancer, including Helicobacter pylori eradication and other early interventions, which are also effective ways to reduce the burden of gastric cancer (10,11).
To date, several risk prediction models for gastric cancer have been developed to support risk-stratified strategies, differing in study design, statistical methods, and performance (12–15). It is unclear which of these models are of high quality, perform well, and are easy to use. Systematic reviews of prediction models are available for colorectal cancer (16,17), breast cancer (18), and lung cancer (19); however, to our knowledge, no corresponding review exists for gastric cancer. In this study, we aimed to systematically summarize the published risk prediction models for gastric cancer in the general population, map their characteristics, and assess their risk of bias (ROB) and applicability, so as to inform the selection of candidate models for future gastric cancer prevention and screening practice.
MATERIAL AND METHODS
This systematic review was prospectively registered at the International Prospective Register of Systematic Reviews (registration number: CRD42021203804) and was conducted following the Checklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (20). Supplementary Table S1 presents the key items to guide the framing of this review (see Supplementary Table S1, Supplementary Digital Content 1, https://links.lww.com/CTG/A891).
Search strategy
A systematic search for relevant publications was conducted in 2 electronic bibliographic databases (PubMed and EMBASE) from inception to August 1, 2021, without language restriction. The search strategies combined free-text words with MeSH/Emtree terms; details are provided in the Supplementary Material (see Supplementary Digital Content 1, https://links.lww.com/CTG/A891). In addition, the reference lists and citing articles of all eligible articles were screened to ensure a comprehensive search.
Eligibility criteria
We included studies that (i) were published as an original article in a peer-reviewed journal; (ii) developed or validated a tool, score, or algorithm that could calculate an individual's relative or absolute risk for risk stratification; (iii) used incident gastric cancer as the only outcome; (iv) reported the area under the receiver operating characteristic curve (AUC); (v) were applicable to asymptomatic individuals or a population at average risk of gastric cancer; and (vi) were published in English. For articles that reported more than 1 prediction model for gastric cancer, we selected only the model regarded as the primary outcome of the study (e.g., the enhanced model rather than the conventional model) or the one with the best performance (e.g., the highest c-statistic).
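Criterion (iv) and the tie-breaking rule both rest on the AUC, also called the c-statistic. As a minimal sketch (assuming a binary outcome and one predicted risk per person; the function names, model labels, and scores below are hypothetical and not taken from any included study), the AUC equals the proportion of case-control pairs in which the case receives the higher score, with ties counted as one half:

```python
from itertools import product


def c_statistic(case_scores, control_scores):
    """Rank-based AUC (c-statistic) for a binary outcome.

    Every (case, control) pair scores 1 if the case has the higher
    predicted risk, 0.5 for a tie, and 0 otherwise; the result is the mean.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordance = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            concordance += 1.0
        elif case == control:
            concordance += 0.5
    return concordance / (len(case_scores) * len(control_scores))


def select_best_model(auc_by_model):
    """Return the name of the model with the highest c-statistic."""
    return max(auc_by_model, key=auc_by_model.get)


# Hypothetical predicted risks for 3 cases and 3 controls.
auc = c_statistic([0.9, 0.8, 0.4], [0.3, 0.5, 0.2])
best = select_best_model({"conventional": 0.74, "enhanced": auc})
print(f"AUC = {auc:.3f}; selected model: {best}")
```

The pairwise loop is quadratic in sample size; for large cohorts the same value can be obtained from the Mann-Whitney U statistic divided by the number of case-control pairs.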
Studies were excluded if they (i) were not population-based, such as models derived from natural history, meta-analyses, or literature reviews; (ii) collected data from patients with a confirmed diagnosis of gastric disease; or (iii) reported models with fewer than 2 predictors. Because we intended to review models for selecting high-risk individuals for endoscopy, we excluded diagnostic models that included predictors derived from endoscopy, fluoroscopy, or gastric tissue. However, prognostic models with endoscopy-derived variables were included.
Data extraction and quality assessment
In accordance with the Checklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies, 2 reviewers independently screened the articles and extracted the data; any disagreement was resolved by consensus discussion. For each eligible article, we collected information on study design; participants; and the development, validation, and evaluation of the prediction models. To assess the quality of the included studies, we used the Prediction model Risk of Bias Assessment Tool (PROBAST) to evaluate the ROB and applicability of each prediction model through signaling questions in 4 domains: participants, predictors, outcome, and statistical analysis (the applicability assessment covers only the first 3 domains) (21).
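PROBAST combines the domain-level ratings into an overall judgment: high if any domain is rated high, unclear if at least one domain is unclear and all others are low, and low only if every domain is low. The sketch below encodes that combination rule, assuming the domain ratings have already been assigned from the signaling questions; the enum, constants, and example ratings are hypothetical. PROBAST's further advice to consider downgrading a model developed without external validation is a reviewer judgment and is deliberately left out of the code.

```python
from enum import Enum


class Rating(Enum):
    LOW = "low"
    UNCLEAR = "unclear"
    HIGH = "high"


# Risk of bias is judged on all 4 domains; applicability only on the first 3.
ROB_DOMAINS = ("participants", "predictors", "outcome", "analysis")
APPLICABILITY_DOMAINS = ("participants", "predictors", "outcome")


def overall_rating(domain_ratings, domains):
    """Combine per-domain ratings into one overall PROBAST-style judgment."""
    missing = [d for d in domains if d not in domain_ratings]
    if missing:
        raise ValueError(f"no rating for domain(s): {missing}")
    ratings = [domain_ratings[d] for d in domains]
    if Rating.HIGH in ratings:
        return Rating.HIGH
    if Rating.UNCLEAR in ratings:
        return Rating.UNCLEAR
    return Rating.LOW


# Hypothetical model: sound design, but a weak statistical analysis.
rob = {
    "participants": Rating.LOW,
    "predictors": Rating.LOW,
    "outcome": Rating.LOW,
    "analysis": Rating.HIGH,
}
# Applicability is rated separately; here the outcome definition is unclear.
applicability = {
    "participants": Rating.LOW,
    "predictors": Rating.LOW,
    "outcome": Rating.UNCLEAR,
}
print("overall ROB:", overall_rating(rob, ROB_DOMAINS).value)
print("overall applicability:", overall_rating(applicability, APPLICABILITY_DOMAINS).value)
```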
RESULTS
In this review, 4,223 articles were identified, and the full texts of 127 articles were screened. Of these, 104 articles were excluded because of irrelevant topics, lack of required data, ineligible participants, or language. A total of 28 articles met all the inclusion criteria, reporting 18 diagnostic models (12,13,22–37) and 10 prognostic models (14,15,38–45) for gastric cancer risk prediction. Figure 1 shows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of study selection.
Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of study selection.
Basic characteristics
In general, the prognostic models were mostly developed from prospective studies, whereas the diagnostic models were based on case-control studies or medical records. Of the 18 diagnostic models, 13 (72.2%) were developed in Asian populations (Table 1). Half were developed without any validation, and only 1 model underwent both internal and external validation. Sample sizes varied widely, from 58 to 9,838. Five models (27.8%) focused on gastric adenocarcinoma, including 1 on noncardia adenocarcinoma. In addition, the sex distribution was notably unbalanced in several studies, especially between the case and control groups, yet sex was not considered in the development of these prediction models (13,15,22).
Table 1. General characteristics of included studies
| Study | Type of studyᵃ | Country/region | Study period of baselineᵇ | Data source | Sample size (events) | Missing values (method) | Sex (male, %) | Age (mean ± SD), years | Primary outcome |
|---|---|---|---|---|---|---|---|---|---|
| Diagnostic models | | | | | | | | | |
| Lee et al. (23) | 2a | South Korea | 2005 | Questionnaire | 382 (183) | None | P: 65.0; C: 47.7 | NI | Gastric cancer |
| Kaise et al. (29) | 1a | Japan | P: 2007–2009 | | | | | | |
C, control group; D, development; EV, external validation; IV, internal validation; NI, no information; P, patient group; V, validation.
ᵃType of study: 1a, development only; 2a, development + internal validation; 3a, development + external validation; 4a, development + internal validation + external validation; 5a, external validation only.
Regarding the prognostic models, most (60.0%) were developed in Japan. Five studies reported only model development, 1 study externally validated an existing model, and 4 studies reported both development and validation. Over half (60.0%) had missing values, possibly because of the long-term follow-up required, and only 1 study used imputation. The prognostic models did not restrict the subtype of gastric cancer. Although the sex ratio was also uneven in the prognostic studies, half of them included sex as a predictor, and Eom et al. developed separate prediction models for each sex (41).
From the perspective of real-world practice, the diagnostic models in this review were mainly intended to provide a reliable tool for prescreening before large-scale endoscopic screening (12,22,23), surveillance after intervention (27,36), early diagnosis of gastric cancer, or preliminary diagnosis of symptomatic patients based on nongastroscopic predictors (35,37). The prognostic models were applied to select appropriate high-risk targets for further endoscopic examination (14,38,40) or to promote cancer prevention (including health education, behavior change, and encouragement of screening) by serving as risk reminders (15,41). However, most studies stated only a general purpose and failed to clearly describe the intended application scenarios of their models.
Development and performance
Traditional methods, including logistic regression and Cox proportional hazards regression, were commonly used to develop prediction models for gastric cancer (Table 2). Some studies also used machine learning methods, such as support vector machines and random forests. The discrimination of the diagnostic models was acceptable, with AUCs ranging from 0.73 to 0.99. Internal validation was performed with various methods, such as bootstrap resampling, random splitting, and leave-one-out cross-validation. Except for 1 study (37), the AUCs did not differ significantly between development and validation, suggesting that the selected diagnostic models were unlikely to have overfitted their training data sets (Figure 2). In addition, performance did not decrease significantly in external validation, suggesting that these models retained their discrimination on new data. However, only 3 of the 18 models reported calibration. The events per variable (EPV) exceeded 20 for 10 diagnostic models, but 4 models had an EPV below 10, and their reliability should be interpreted with caution.
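EPV is the number of outcome events divided by the number of predictor variables. At least for Lee et al. (23), the EPV in Table 2 matches dividing the 183 events in Table 1 by the 11 predictors in Table 3. The usual rule of thumb is stated in terms of all candidate predictor parameters considered during modeling, which can exceed the final count, so EPVs computed from the final model may be optimistic. A minimal sketch that reproduces this value and applies the thresholds used above (over 20; below 10); the helper names are hypothetical:

```python
def events_per_variable(n_events, n_predictors):
    """EPV = number of outcome events / number of predictor variables."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


def epv_band(epv):
    """Bands used in the text: over 20, 10 to 20, below 10 (flagged for caution)."""
    if epv > 20:
        return "over 20"
    if epv < 10:
        return "below 10 (interpret with caution)"
    return "10 to 20"


# Lee et al. (23): 183 events (Table 1) and 11 predictors (Table 3).
epv = events_per_variable(183, 11)
print(f"EPV = {epv:.2f} -> {epv_band(epv)}")
```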
Table 2. Key information on the development and validation of included models
Columns are grouped as model development (EPV, type of predictor, modeling method), model evaluation (discrimination, calibration), and model validation (internal validation, external validation).
| Study | EPV | Type of predictor | Modeling method | Discrimination | Calibration | Internal validation | External validation |
|---|---|---|---|---|---|---|---|
| Diagnostic models | | | | | | | |
| Lee et al. (23) | 16.64 | Demographic characteristics + medical history + lifestyle-related factors | Logistic regression | 0.888 | H-L test: P = 0.1747 | Bootstrap resampling technique: 0.904 (0.876–0.932) | None |
| Kaise et al. (29) | 93.5 | Blood measurements | Logistic regression | 0.883 (0.856–0.909) | NR | None | None |
| Ahn et al. (13) | 10.91 | Blood measurements | Support vector machine | 0.955 | NR | None | None |
| Cho et al. (27) | 79 | Demographic characteristics + disease stages | Logistic regression | 0.783 | NR | None | None |
| Yang et al. (36) | 26.5 | Blood measurements | Logistic regression | 0.959 (0–1) | NR | None | None |
| Zhu et al. (37) | 8 | Blood measurements | Logistic regression | 0.989 | NR | Random split sampling: 0.812 | None |
| Kucera et al. (24) | 7.2 | Blood measurements | Logistic regression | 0.9553 | NR | None | None |
| Tong et al. (35) | 45.6 | Blood measurements | Random forest | 0.8788 (0.8127–0.9449) | NR | Random split sampling (NR) | None |
| In et al. (22) | 11.25 | Demographic characteristics + lifestyle-related factors + family history + immigration/acculturation | Logistic regression | 0.941 (0.901–0.982) | H-L test: P = 0.8562 | None | None |
| Wang et al. (25) | 46.5 | Blood measurements | Logistic regression | 0.841 (0.808–0.871) | NR | None | Wang et al.: 0.856 (0.812–0.893) |
| Cai et al. (12) | 38.14 | Demographic characteristics + lifestyle-related factors + blood measurements | Logistic regression | 0.76 (0.73–0.79) | H-L test: P = 0.605; calibration-in-the-large: P < 0.001 | Bootstrap resampling technique: 0.76 (0.71–0.80) | Cai et al.: 0.73 (0.68–0.77) |
| Dong et al. (28) | 59.5 | Blood measurements | Logistic regression | 0.821 (0.750–0.878) | NR | None | None |
| In et al. (26) | 5 | Demographic characteristics + lifestyle-related factors + family history + immigration/acculturation | Logistic regression | 0.95 (0.92–0.98) | NR | None | None |
| Kong et al. (31) | 118.5 | Lifestyle-related factors + results from genomics | Logistic regression | 0.745 | NR | None | None |
| Liu et al. (33) | 37.5 | Results from genomics | Lasso logistic regression | 0.986 | NR | None | None |
| Kim et al. (30) | 5.75 | Results from proteomics | Generalized linear models + random forest | 0.9098 | NR | Random split sampling: 0.9706 | None |
| Lee et al. (32) | 10.8 | Results from transcriptomics | Logistic regression | 0.924 (0.845–0.970) | NR | Bootstrap resampling technique: 0.896 (0.894–0.898) | Lee et al.: 0.988 (0.916–1.000) |
EPV, events per variable; H-L test, Hosmer-Lemeshow test; NR, not reported.
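Where calibration is reported in Table 2, it is mostly summarized by the Hosmer-Lemeshow (H-L) test, for which a large P value means no evidence of miscalibration. As a minimal sketch (assuming 0/1 outcomes, predicted probabilities strictly between 0 and 1, and the conventional 10 risk groups; the function name, constant, and data are hypothetical), the statistic sorts individuals by predicted risk, cuts them into groups, and sums the standardized squared differences between observed and expected events; it is referred to a chi-square distribution with (groups - 2) degrees of freedom:

```python
# Upper 5% point of the chi-square distribution with 8 df (10 groups - 2).
CHI2_CRITICAL_8DF_95 = 15.507


def hosmer_lemeshow(outcomes, probs, groups=10):
    """Return (H-L statistic, degrees of freedom).

    Individuals are sorted by predicted risk and cut into `groups`
    roughly equal-sized groups; each group contributes
    (observed - expected)^2 / (n_g * mean_p * (1 - mean_p)).
    """
    n = len(outcomes)
    if n != len(probs):
        raise ValueError("outcomes and probs must have the same length")
    if n < groups:
        raise ValueError("need at least as many individuals as groups")
    order = sorted(range(n), key=lambda i: probs[i])
    statistic = 0.0
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        observed = sum(outcomes[i] for i in members)
        expected = sum(probs[i] for i in members)
        mean_p = expected / len(members)
        variance = len(members) * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic, groups - 2


# Hypothetical data: 10 risk levels x 20 people, with observed events
# equal to expected events at every level (a perfectly calibrated model).
probs, outcomes = [], []
for level in range(10):
    p = 0.05 + 0.1 * level
    n_events = round(20 * p)
    probs += [p] * 20
    outcomes += [1] * n_events + [0] * (20 - n_events)

stat, df = hosmer_lemeshow(outcomes, probs)
print(f"H-L = {stat:.3f} on {df} df; miscalibrated at 5%: {stat > CHI2_CRITICAL_8DF_95}")
```

An exact P value needs the chi-square survival function (for example, `scipy.stats.chi2.sf(stat, df)` in SciPy); the sketch compares against the tabulated critical value to stay dependency-free. The H-L test is sensitive to sample size and to how groups are formed, which is one reason calibration plots or calibration-in-the-large (as reported by Cai et al.) are useful complements.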
Figure 2. C-statistics reported by the included models. The models are grouped into prognostic and diagnostic models: the upper values are those reported for prognostic models, and the lower values are those for diagnostic models. The type of model (development, internal validation, or external validation) and the internal validation method are indicated in the figure.
In general, the prognostic models performed worse than the diagnostic models, with AUCs ranging from 0.66 to 0.86. The model based on machine learning showed better discrimination; its external validation AUCs were close to those of the original model, and its EPVs were high, suggesting that it was more reliable. However, half of the prognostic models had EPVs below 20 and had been neither internally nor externally validated, so they need further verification and optimization. In addition, follow-up time varied among the included models, ranging from 3 to 20 years.
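Prognostic models with follow-up periods like these are typically fitted with Cox regression, and their discrimination is then expressed as a concordance (C) statistic that accounts for censoring. The included studies may have used different estimators, so the following is only a minimal sketch of Harrell's C under simplifying assumptions: a higher score means higher predicted risk, and pairs with tied follow-up times are skipped. A pair is usable only when the person with the shorter follow-up actually developed gastric cancer, and it is concordant when that person also had the higher score. The function name and data are hypothetical.

```python
def harrell_c(times, events, scores):
    """Harrell's concordance index for right-censored time-to-event data.

    times:  follow-up time for each person
    events: 1 if gastric cancer was diagnosed at that time, 0 if censored
    scores: predicted risk (higher = higher risk), e.g. a Cox linear predictor
    Pairs with tied follow-up times are skipped in this simplified version.
    """
    if not (len(times) == len(events) == len(scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored person cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # j was still event-free when i had the event
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical cohort of 4 people followed for up to 8 years.
years = [2, 4, 6, 8]
cancer = [1, 1, 0, 1]
risk = [0.9, 0.1, 0.3, 0.6]
print(f"Harrell's C = {harrell_c(years, cancer, risk):.2f}")
```

Harrell's C can be biased when censoring is heavy, which matters for follow-up as long as 20 years; Uno's C, which weights pairs by the inverse probability of censoring, is a common alternative.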
Considered variables of the prediction models
Laboratory indicators were the predictors most commonly considered in diagnostic models, mainly routine tests for H. pylori infection and pepsinogen, as well as molecular-level measurements of proteins, genes, microRNAs, and hormones (Table 3). Specifically, a variety of proteins were used to predict the risk of gastric cancer, including typical tumor markers (carcinoembryonic antigen [CEA], CA125, and CA19-9), antibodies, and proteins involved in physiological processes (e.g., metabolism, blood coagulation, and chemotaxis), as well as other cytokines. Personal characteristics were also frequently used in diagnosing suspected individuals, mainly sociodemographic variables (age and sex), lifestyle-related factors (dietary habits, alcohol intake, and smoking), and health conditions.
Table 3. Predictors included in the risk prediction models for gastric cancer
Columns: Study; No. (number of predictors); Demographic characteristics (age, sex, others); Health situation (family history, disease history, BMI, others); Lifestyle-related factors (smoking, alcohol drinking, eating habit, others); Laboratory measurement (H. pylori infection, PG testing, others).
Diagnostic models
Lee et al. (23): No. 11; • Financial status • History of gastroscopy or UGI series; health status; Occupational hazards
Kaise et al. (29): No. 2
BMI, body mass index; PG, pepsinogen; UGI, upper gastrointestinal.