Real-World Practice of Gastric Cancer Prevention and Screening Calls for Practical Prediction Models

INTRODUCTION

Gastric cancer ranks fifth among cancers worldwide in incidence (14.0 per 100,000) and fourth in mortality (9.9 per 100,000) (1). The prognosis of gastric cancer is poor but can improve significantly when the disease is detected early (2,3). Mass screening for gastric cancer has been conducted in high-incidence countries such as Japan and South Korea (4–8). However, now that more health resources are being devoted to controlling coronavirus disease 2019, accurate risk prediction can help allocate the limited gastroscopy capacity more efficiently. Risk prediction models can also inform individuals of their risk and thereby improve attendance at, and compliance with, cancer screening among high-risk groups (9). Meanwhile, people assessed as not being at high risk can avoid nosocomial infection, mental burden, and other physical harms. In addition, risk stratification could facilitate primary prevention of gastric cancer, including Helicobacter pylori eradication and the adoption of early interventions, which is also a valid way to reduce the gastric cancer burden (10,11).

To date, several risk prediction models for gastric cancer have been developed to support risk-stratified strategies; they differ in study design, statistical methods, and performance (12–15). It is unclear which of these models are of high quality, perform well, and are easy to use. Systematic reviews of prediction models are available for colorectal cancer (16,17), breast cancer (18), and lung cancer (19). To our knowledge, however, no corresponding review exists for gastric cancer. In this study, we aimed to systematically summarize the published risk prediction models for gastric cancer in the general population, map their characteristics, and assess the risk of bias (ROB) and applicability of the included models, so as to inform the selection of candidate models for gastric cancer prevention and screening in practice.

MATERIAL AND METHODS

This systematic review was prospectively registered at the International Prospective Register of Systematic Reviews (registration number: CRD42021203804) and was conducted following the Checklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (20). The key items that guided the framing of this review are presented in Supplementary Table S1 (see Supplementary Digital Content 1, https://links.lww.com/CTG/A891).

Search strategy

A systematic search for relevant publications was conducted in 2 electronic bibliographic databases (PubMed and EMBASE) from inception to August 1, 2021, without language restriction. The search strategies combined free-text words with MeSH/Emtree terms; details are provided in the Supplementary Material (see Supplementary Digital Content 1, https://links.lww.com/CTG/A891). In addition, the references and citing articles of all articles eligible for inclusion were screened to ensure that the search was comprehensive.

Eligibility criteria

We included studies that met the following criteria: (i) published as an original article in a peer-reviewed journal; (ii) developing or validating a tool, score, or algorithm that could calculate individual relative or absolute risk, so as to perform risk stratification; (iii) including only incident gastric cancer as the outcome; (iv) reporting the area under the receiver-operating characteristic curve (AUC); (v) applicable to asymptomatic individuals or populations at average risk of gastric cancer; and (vi) published in English. For articles that reported more than 1 prediction model for gastric cancer, we selected only the model regarded as the primary result of the study (e.g., the enhanced model, but not the conventional model) or the one with the best performance (e.g., the highest c-statistic).
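Because the AUC (equivalently, the c-statistic for a binary outcome) is the common performance measure required by criterion (iv), a minimal sketch of how it can be computed may help. The function name and the predicted risks below are hypothetical and serve only as an illustration; they are not taken from any included study.

```python
from itertools import product


def c_statistic(case_scores, control_scores):
    """Probability that a randomly chosen case has a higher predicted
    risk than a randomly chosen control; ties count as 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    # Compare every case with every control (Mann-Whitney formulation).
    for case, control in product(case_scores, control_scores):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks, for illustration only.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
auc = c_statistic(cases, controls)
print(f"c-statistic = {auc:.3f}")
```

A value of 0.5 corresponds to discrimination no better than chance, and a value of 1.0 to perfect separation of cases from controls.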

Studies were excluded if they (i) were not population-based, such as those developed from natural history, meta-analysis, or literature review; (ii) collected data from patients with a definite diagnosis of gastric disease; or (iii) reported models with fewer than 2 predictors. Because we intended to review models that could be used to select high-risk individuals for endoscopy, we removed diagnostic models that included predictors derived from endoscopy, fluoroscopy, or gastric tissues. Prognostic models with endoscopy-derived variables, however, were included.

Data extraction and quality assessments

Following the Checklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies, 2 reviewers independently screened the articles and extracted the data. Any disagreement was resolved by consensus discussion. For each eligible article, we collected information on study design; participants; and the development, validation, and evaluation of prediction models. To assess the quality of the included studies, we used the Prediction model Risk of Bias Assessment Tool (PROBAST) to evaluate the ROB and applicability of each prediction model through signaling questions in 4 domains: participants, predictors, outcome, and statistical analysis (the applicability assessment covers only the first 3 domains) (21).
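As an illustration of how domain-level ratings can be combined into an overall judgement, the sketch below assumes the commonly described PROBAST rule: "high" if any domain is rated high, "unclear" if no domain is high but at least one is unclear, and "low" only when all domains are low. The function name and the example ratings are hypothetical, and the sketch is simplified; it does not capture every refinement in the tool's guidance.

```python
ROB_DOMAINS = ("participants", "predictors", "outcome", "analysis")
APPLICABILITY_DOMAINS = ("participants", "predictors", "outcome")
VALID_RATINGS = {"low", "high", "unclear"}


def overall_judgement(ratings, domains):
    """Combine per-domain ratings into one overall rating (assumed rule)."""
    values = []
    for domain in domains:
        rating = ratings[domain]
        if rating not in VALID_RATINGS:
            raise ValueError(f"unexpected rating for {domain}: {rating}")
        values.append(rating)
    if "high" in values:
        return "high"
    if "unclear" in values:
        return "unclear"
    return "low"


# Hypothetical ratings for a single model, for illustration only.
example = {
    "participants": "low",
    "predictors": "low",
    "outcome": "unclear",
    "analysis": "high",
}
rob = overall_judgement(example, ROB_DOMAINS)
applicability = overall_judgement(example, APPLICABILITY_DOMAINS)
print(rob, applicability)
```

In this example the analysis domain drives the overall ROB, whereas the applicability judgement ignores it because only the first 3 domains are assessed.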

RESULTS

In this review, 4,223 articles were identified and the full texts of 127 articles were screened. Of these, 104 articles were removed because of an irrelevant topic, lack of required data, ineligible participants, or language. A total of 28 articles met all the inclusion criteria, reporting 18 diagnostic models (12,13,22–37) and 10 prognostic models for risk prediction of gastric cancer (14,15,38–45). Figure 1 shows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of study selection.

Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of study selection.

Basic characteristics

In general, the prognostic models were developed from prospective studies, whereas the diagnostic models were based on case-control studies or medical records. Of the 18 diagnostic models, 13 (72.2%) were developed in Asian populations (Table 1). Half reported model development only, and 1 underwent both internal and external validation. Sample sizes varied widely, from 58 to 9,838. Five models (27.8%) focused on gastric adenocarcinoma, including 1 on noncardia adenocarcinoma. In addition, the sex distribution differed markedly in several studies, especially between the case and control groups, yet sex was not considered in the development of these prediction models (13,15,22).

Table 1. General characteristics of included studies

| Study | Type of study^a | Country/region | Study period of baseline^b | Data source | Sample size (events) | Missing values (method) | Sex (male, %) | Age (mean ± SD), years | Primary outcome |
|---|---|---|---|---|---|---|---|---|---|
| Diagnostic models | | | | | | | | | |
| Lee et al. (23) | 2a | South Korea | 2005 | Questionnaire | 382 (183) | None | P: 65.0; C: 47.7 | NI | Gastric cancer |
| Kaise et al. (29) | 1a | Japan | P: 2007–2009; C: 2008–2009 | Laboratory test | 748 (187) | None | NI | P: 64.3 ± 9.7; C: 52.3 ± 12.4 | Gastric cancer |
| Ahn et al. (13) | 2a | South Korea | P: 2002–2003/2006–2007; C: 2004 | Laboratory test | D: 240 (120); IV: 146 (95) | None | P: 59; C: 28 | P: 59.4 ± 11.1; C: 52.1 ± 6.6 | Gastric adenocarcinomas |
| Cho et al. (27) | 1a | South Korea | P: 2006–2008; C: 2007–2010 | Medical record | 948 (474) | None | P: 65.0; C: 65.0 | P: 52.6 ± 9.1; C: 52.9 ± 9.6 | Gastric adenocarcinomas |
| Yang et al. (36) | 1a | China | 2011–2013 | Laboratory test | 426 (106) | None | P: 72.64; C: 62.5 | P: 59.7 ± 13.4 | Gastric cancer |
| Zhu et al. (37) | 2a | China | 2007–2011 | Laboratory test | D: 80 (40); IV: 150 (48) | None | D: P 72.5, C 72.5; IV: P 72.9, C 70.6 | D: P 53.83 ± 10.34, C 53.55 ± 10.11; IV: P 56.63 ± 10.37, C 54.03 ± 10.45 | Gastric noncardia adenocarcinoma |
| Kucera et al. (24) | 1a | Czech | 2013–2015 | Laboratory test | 105 (36) | None | NI | P: 65.2; C: 63.6 | Gastric cancer |
| Tong et al. (35) | 2a | China | 2008–2010 | Laboratory test | D: 418 (228); IV: 95 (48) | None | P: 71.93; C: 63.16 | P: 59.82 ± 11.32; C: 59.15 ± 9.27 | Primary gastric adenocarcinoma |
| In et al. (22) | 1a | United States | NI | Questionnaire | 140 (90) | NI | P: 50.0; C: 24.0 | NI | Gastric cancer |
| Wang et al. (25) | 3a | China | 2013–2015 | Laboratory test | D: 558 (279); EV: 327 (186) | None | D: 74.9; EV: 73.7 | D: 58.7 ± 12.0; EV: 58.8 ± 11.6 | Gastric cancer |
| Cai et al. (12) | 3a | China | 2015–2017 | Questionnaire + laboratory test | D: 9,838 (267); EV: 5,091 (138) | Yes (delete) | D: 49.63; EV: 49.77 | D: 56.2 ± 9.6; EV: 56.3 ± 9.7 | Gastric cancer |
| Dong et al. (28) | 1a | China | 2016–2017 | Laboratory test | 150 (119) | None | P: 74.79 | P: range (23–82) | Gastric cancer |
| In et al. (26) | 1a | United States | NI | Questionnaire | 140 (40) | Yes (subgroup) | P: 50.0; C: 24.0 | NI | Gastric cancer |
| Kong et al. (31) | 1a | China | 2016–2017 | Questionnaire + laboratory test | 1,017 (474) | None | P: 43.88; C: 47.88 | P: 58.00 ± 6.98; C: 57.41 ± 5.50 | Gastric cancer |
| Liu et al. (33) | 1a | United States | 2017 | Public domain | 407 (375) | None | P: 37.33 | P: 64.92 ± 10.65 | Stomach adenocarcinoma |
| Kim et al. (30) | 2a | South Korea | NI | Laboratory test | D: 484 (69); IV: 207 (30) | None | NI | P: 61 ± 11.0 (27–88); C: 57 ± 8.8 (38–79) | Stomach cancer |
| Lee et al. (32) | 4a | South Korea | 2012–2015 | Laboratory test | D: 85 (54); EV: 58 (35) | None | D: 61.18; EV: 81.03 | D: P 55, C 48; EV: P 59, C 54 | Gastric cancer |
| Song et al. (34) | 2a | Poland | 1994–1996 | Laboratory test | 200 (100) | None | 61 | 65 | Stomach cancer (ICD-O 151 or ICD-O-2 C16) |
| Prognostic models | | | | | | | | | |
| Shikata et al. (40) | 1a | Japan | 1988 | Questionnaire + medical record | 2,446 (69) | Yes (delete) | 41.5 | 57.3 ± 11.4 | Gastric cancer |
| Eom et al. (41) | 3a | South Korea | 1996–1997 | Questionnaire + medical record | | Yes (imputation) | Model for males and females, respectively | D: P 45.08 ± 10.47, C 48.7 ± 11.0; EV: P 46.83 ± 12.80, C 51.08 ± 12.05 | Gastric cancer (C16) |
| Charvat et al. (15) | 2a | Japan | 1993–1994 | Questionnaire + laboratory test | 19,028 (412) | Yes (delete) | P: 61.9; C: 35.7 | P: 63.3; C: 59.3 | Gastric cancer (C160-C169) |
| Ikeda et al. (38) | 1a | Japan | 1988 | Questionnaire + medical record + laboratory test | 2,446 (123) | Yes (delete) | 41.5 | 58.3 ± 11.4 | Gastric cancer |
| Iida et al. (14) | 3a | Japan | 1988–2002 | Questionnaire + laboratory test | D: 2,444 (90); EV: 3,204 (35) | Yes (delete) | D: 41.6; EV: 42.1 | D: 58 ± 11; EV: 62 ± 13 | Gastric cancer |
| Taninaga et al. (44) | 2a | Japan | 2006–2017 | Medical record | D: 1,144 (74); IV: 287 (15) | None | P: 84.2; C: 77.6 | P: 56.7 ± 8.8; C: 46.2 ± 1.0 | Gastric cancer |
| Charvat et al. (42) | 5a | Japan | 1990–1993 | Questionnaire + laboratory test | 1,292 (27) | None | 34.1 | 56.52 ± 5.78 | Gastric cancer (C160-C169) |
| Jang et al. (39) | 1a | South Korea | 1993–2004 | Questionnaire + laboratory test | 476 (238) | Yes (delete) | 41.01 | 53.50 ± 10.23 | Gastric cancer |
| Sarkar et al. (43) | 1a | United States | 2015–2016 | Questionnaire | 140 (40) | None | 31.4 | NI | Gastric cancer |
| Trivanovic et al. (45) | 1a | Croatia | NI | Laboratory test | 116 (25) | None | 60.3 | 68.34 ± 13.93 | Gastric cancer |

C, control group; D, development; EV, external validation; IV, internal validation; NI, no information; P, patient group; V, validation.

^a Type of study: 1a, development only; 2a, development + internal validation; 3a, development + external validation; 4a, development + internal validation + external validation; 5a, external validation only.

Regarding the prognostic models, most (60.0%) were developed in Japan. Five studies reported model development only, 1 study externally validated an existing model, and 4 studies reported both development and validation. Over half (60.0%) had missing values, possibly because of the need for long-term follow-up, and 1 study used imputation. The prognostic models were not limited to a particular subtype of gastric cancer. Although the ratio of men to women was also uneven in the prognostic models, half of the studies included sex as a predictor, and Eom et al. developed separate prediction models of gastric cancer for each sex (41).

From the perspective of real-world practice, the diagnostic models selected in this review mainly aimed to provide a reliable tool for pre-examination before large-scale endoscopic screening (12,22,23), surveillance after intervention (27,36), early diagnosis of gastric cancer, or preliminary diagnosis of symptomatic patients based on nongastroscopic predictors (35,37). The prognostic models were applied to identify appropriate high-risk targets for further endoscopic examination (14,38,40) or to promote cancer prevention (including health education, behavior change, and encouragement of screening) as a risk reminder (15,41). However, most studies stated only a general purpose and did not clearly describe the application scenario that the model was intended for.

Development and performance

Traditional methods, including logistic and Cox proportional hazards regression models, were commonly used to develop prediction models for gastric cancer (Table 2). Machine learning was also used for modeling in some of the included studies. The discrimination of the diagnostic models was acceptable, with AUCs ranging from 0.73 to 0.99. Different methods were applied for internal validation, such as bootstrap resampling, random splitting, and leave-one-out cross-validation. Except for 1 study (37), the differences in AUCs between development and validation were not substantial, which suggests that the selected diagnostic models might not overfit the training data set (Figure 2). In addition, performance did not decrease substantially in external validation, so the models in this review might not suffer from underfitting. However, only 3 of the 18 models reported calibration performance. The events per variable (EPV) values of 10 diagnostic models exceeded 20, but 4 had values below 10, so the reliability of those models should be interpreted with caution.
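The EPV arithmetic can be sketched as follows, assuming the simple definition of outcome events divided by the number of predictors. The inputs for the two worked examples are taken from this review's tables: Lee et al. (23) with 183 events and 11 predictors, and Kaise et al. (29) with 187 events and 2 predictors. The function names are hypothetical, and the bands simply mirror the thresholds of 10 and 20 mentioned above.

```python
def events_per_variable(n_events, n_predictors):
    """EPV under the simple definition: events / number of predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


def epv_band(epv):
    """Classify an EPV against the thresholds of 10 and 20 used in the text."""
    if epv < 10:
        return "below 10"
    if epv > 20:
        return "above 20"
    return "10 to 20"


# Events and predictor counts as reported in the review's tables.
lee_epv = events_per_variable(183, 11)
kaise_epv = events_per_variable(187, 2)

for name, epv in (("Lee et al. (23)", lee_epv), ("Kaise et al. (29)", kaise_epv)):
    print(f"{name}: EPV = {epv:.2f} ({epv_band(epv)})")
```

Under this assumed definition, the two results reproduce the EPVs listed for these studies in Table 2 (16.64 and 93.5). Stricter definitions count all candidate predictors or model parameters considered during development, which would yield lower values.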

Table 2. Key information on the development and validation of included models

Column groups: model development (EPV, type of predictor, modeling method); model evaluation (discrimination, calibration); model validation (internal validation, external validation).

| Study | EPV | Type of predictor | Modeling method | Discrimination | Calibration | Internal validation | External validation |
|---|---|---|---|---|---|---|---|
| Diagnostic models | | | | | | | |
| Lee et al. (23) | 16.64 | Demographic characteristics + medical history + lifestyle-related factors | Logistic regression | 0.888 | H-L test: P = 0.1747 | Bootstrap resampling technique: 0.904 (0.876–0.932) | None |
| Kaise et al. (29) | 93.5 | Blood measurements | Logistic regression | 0.883 (0.856–0.909) | NR | None | None |
| Ahn et al. (13) | 10.91 | Blood measurements | Support vector machine | 0.955 | NR | None | None |
| Cho et al. (27) | 79 | Demographic characteristics + disease stages | Logistic regression | 0.783 | NR | None | None |
| Yang et al. (36) | 26.5 | Blood measurements | Logistic regression | 0.959 (0–1) | NR | None | None |
| Zhu et al. (37) | 8 | Blood measurements | Logistic regression | 0.989 | NR | Random split sampling: 0.812 | None |
| Kucera et al. (24) | 7.2 | Blood measurements | Logistic regression | 0.9553 | NR | None | None |
| Tong et al. (35) | 45.6 | Blood measurements | Random forest | 0.8788 (0.8127–0.9449) | NR | Random split sampling (NR) | None |
| In et al. (22) | 11.25 | Demographic characteristics + lifestyle-related factors + family history + immigration/acculturation | Logistic regression | 0.941 (0.901–0.982) | H-L test: P = 0.8562 | None | None |
| Wang et al. (25) | 46.5 | Blood measurements | Logistic regression | 0.841 (0.808–0.871) | NR | None | Wang et al.: 0.856 (0.812–0.893) |
| Cai et al. (12) | 38.14 | Demographic characteristics + lifestyle-related factors + blood measurements | Logistic regression | 0.76 (0.73–0.79) | H-L test: P = 0.605; calibration in the large: P < 0.001 | Bootstrap resampling technique: 0.76 (0.71–0.80) | Cai et al.: 0.73 (0.68–0.77) |
| Dong et al. (28) | 59.5 | Blood measurements | Logistic regression | 0.821 (0.750–0.878) | NR | None | None |
| In et al. (26) | 5 | Demographic characteristics + lifestyle-related factors + family history + immigration/acculturation | Logistic regression | 0.95 (0.92–0.98) | NR | None | None |
| Kong et al. (31) | 118.5 | Lifestyle-related factors + results from genomics | Logistic regression | 0.745 | NR | None | None |
| Liu et al. (33) | 37.5 | Results from genomics | Lasso logistic regression | 0.986 | NR | None | None |
| Kim et al. (30) | 5.75 | Results from proteomics | Generalized linear models + random forest | 0.9098 | NR | Random split sampling: 0.9706 | None |
| Lee et al. (32) | 10.8 | Results from transcriptomics | Logistic regression | 0.924 (0.845–0.970) | NR | Bootstrap resampling technique: 0.896 (0.894–0.898) | Lee et al.: 0.988 (0.916–1.000); Bootstrap: 0.947 (0.946–0.949) |
| Song et al. (34) | 25 | Results from immunoproteomics | Lasso logistic regression | 0.73 | NR | Leave-one-out cross validation (NR) | None |
| Prognostic models | | | | | | | |
| Shikata et al. (40) | 5.75 | Demographic characteristics + medical history + lifestyle-related factors + health examination results | Cox proportional hazards model | 0.809 (0.761–0.856) | NR | None | None |
| Eom et al. (41) | Men: 2433.13; Women: 929.83 | Demographic characteristics + family history + lifestyle-related factors | Cox proportional hazards model | Men: 0.764 (0.760–0.768); women: 0.706 (0.698–0.715) | Calibration plot and slope: men: 1.000 (0.983–1.017); women: 1.000 (0.962–1.038) | None | Eom et al.: men: 0.782 (0.777–0.787); women: 0.705 (0.696–0.714) |
| Charvat et al. (15) | 68.67 | Demographic characteristics + family history + lifestyle-related factors + blood measurements | Cox proportional hazards model | 0.777 | H-L test: P = 0.06; and calibration plot | Bootstrap resampling technique: 0.768 | Charvat et al.: 0.798 (0.725–0.861) |
| Ikeda et al. (38) | 12.3 | Demographic characteristics + lifestyle-related factors + health examination results + blood measurements | Cox proportional hazards model | 0.773 | NR | None | None |
| Iida et al. (14) | 18 | Demographic characteristics + lifestyle-related factors + blood measurements | Cox proportional hazards model | 0.79 (0.74–0.83) | H-L test: P = 0.31 | None | Iida et al.: 0.76 (0.69–0.83) |
| Taninaga et al. (44) | 9.25 | Health examination results | XGBoost | 0.899 | NR | Cross validation: 0.874 | None |
| Charvat et al. (42) | 4.6 | Demographic characteristics + family history + lifestyle-related factors + blood measurements | Parametric survival regression model | 0.798 (0.725–0.861) | Nam-D'Agostino χ² test: χ² = 5.57, P = 0.23 | / | / |
| Jang et al. (39) | 47.6 | Demographic characteristics + lifestyle-related factors + blood measurements | Logistic regression | 0.71 (0.64–0.78) | NR | None | None |
| Sarkar et al. (43) | 10 | Demographic characteristics | Logistic regression | 0.859 (0.796–0.922) | NR | None | None |
| Trivanovic et al. (45) | 12.5 | Blood measurements | Logistic regression | 0.700 (0.57–0.83) | NR | None | None |

EPV, events per variable; H-L test, Hosmer-Lemeshow test; NR, not reported.


Figure 2. The c-statistics reported by the included models. The models are grouped into prognostic and diagnostic models: the upper part shows values reported for prognostic models, and the lower part shows values reported for diagnostic models. The type of model (development, external validation, and internal validation) and the internal validation method are indicated in the figure.

In general, the performance of the prognostic models was inferior to that of the diagnostic models, with AUCs ranging from 0.66 to 0.86. The model based on machine learning (44) showed better discrimination. For the externally validated models, the validation results were close to the AUCs of the original models and the EPVs were high, suggesting that these models were more reliable. However, half of the prognostic models had EPVs below 20 and were neither internally nor externally validated; these models need further verification and optimization. In addition, follow-up time varied across the included models, ranging from 3 to 20 years.

Considered variables of the prediction models

Laboratory indicators were most commonly considered in the diagnostic models, mainly routine tests for H. pylori infection and pepsinogen as well as molecular-level detection of proteins, genes, microRNAs, and hormones (Table 3). Specifically, a variety of proteins were used to predict the risk of gastric cancer, including typical tumor markers (CEA, CA125, and CA19-9), antibodies, and proteins involved in physiological processes (such as metabolism, blood coagulation, and chemotaxis, as well as other cytokines). Personal characteristics were also frequently used in diagnosing suspected individuals, mainly sociodemographic variables (age and sex), lifestyle-related factors (dietary habits, alcohol intake, and smoking), and health conditions.

Table 3. - Predictors included in the risk prediction models for gastric cancer Study No. Demographic characteristics Health situation Lifestyle-related factors Laboratory measurement Age Sex Others Family history Disease history BMI Others Smoking Alcohol drinking Eating habit Others H. pylori infection PG testing Others Diagnostic models  Lee et al. (23) 11 • Financial status • History of gastroscopy or UGI series; health status Occupational hazards  Kaise et al. (29) 2
