Development and validation of a claims-based algorithm to identify incidents and determine the progression phases of gastric cancer cases in Japan

Patient characteristics

We performed chart reviews of “candidate patients with incident GC”, who were identified using the cancer registry and claims information at the three participating institutions. The number of patients with incident GC was 355 (7 from only the cancer registry, 14 from only claims, and 334 from both) in the development cohort; 1142 (23 from only the cancer registry, 31 from only claims, and 1088 from both) in the temporal validation cohort; and 95 (1 from only the cancer registry, 7 from only claims, and 87 from both) in the external validation cohort (Online Resource Fig. 3). None of the patients opted out. The mean ages (SD) of patients with incident GC in the development cohort, temporal validation cohort, and external validation cohort were 72.0 (9.9), 71.2 (10.7), and 73.2 (11.8) years, respectively; the proportions of male patients were 70.4%, 69.1%, and 68.4%, respectively (Table 1). The disease progression phases in patients in each cohort are shown in Table 1. The number of patients in the endoscopic curative, surgical curative, and non-curative groups was 156, 131, and 68 in the development cohort; 493, 406, and 243 in the temporal validation cohort; and 20, 54, and 21 in the external validation cohort, respectively.

Table 1 Patients’ characteristics in each cohort

Algorithm development

As detailed in the following subheadings, we initially prototyped an algorithm to identify incident GC cases (referred to as Algorithm X) according to our previous study [26]. After evaluating the performance of Algorithm X, we refined it and named the improved version Algorithm Y. Algorithms X and Y differed only in the specific treatment lists used for patient extraction: Algorithm X used specific treatment list X, whereas Algorithm Y used specific treatment list Y (Online Resource Table 1). Figure 1 shows a diagram depicting how Algorithms X and Y identify incident GC cases using claims data, and Fig. 2 shows a flowchart of patient selection according to Algorithms X and Y.

Fig. 1

The algorithm diagram depicts how incident gastric cancer cases in a given month are identified using claims data. It is read from top to bottom, showing the sequence of actions taken for patient extraction. First, patients diagnosed with GC within 6 months, including the index month, are selected in inclusion assessment window 1 (INCL 1). Next, patients receiving specific treatments in the index month are selected in INCL 2. Prevalent cases are filtered out in exclusion assessment windows 1 and 2 (EXCL 1 and 2). Patients receiving any listed treatments within 6 months before the index month are considered to be under ongoing treatment and are excluded (EXCL 1). Additionally, patients diagnosed with GC between 12 and 6 months before the index month are excluded to eliminate recurrent cases (EXCL 2). These constraints result in the extraction of incident cases.

Fig. 2

A flowchart showing the extraction of patient data by the algorithms developed in this study

The prototyping process of Algorithm X

Patient identification: We initially identified patients who had been diagnosed with GC (ICD-10 codes C160–C169 or corresponding Japanese claim codes for diagnosis) (Online Resource Table 2) within 6 months, including the index month, to ensure comprehensive capture of all relevant cases. This extended inclusion period was set because we anticipated instances where the diagnosis might have been delayed, for example when the diagnosis changed after pathological results or when physicians took considerable time to establish it. Subsequently, we included patients who received, in the index month, any treatment on specific treatment list X (Online Resource Table 1), a comprehensive list of GC-specific surgical procedures and GC-specific chemotherapy medications in addition to endoscopic treatment, radiation therapy, and treatment of obstruction. This step extracted 2,200 cases, far more than the true number of patients with incident GC (n = 355), because prevalent cases (e.g., patients under ongoing treatment and recurrent cases) were included.

Filtering prevalent cases: To improve accuracy, we implemented constraints to exclude prevalent cases in the index month. Patients who received any listed treatments within 6 months before the index month were considered to be under ongoing treatment and were excluded. Additionally, patients diagnosed with GC between 12 and 6 months before the index month were excluded to eliminate recurrent cases. These constraints excluded 1,779 cases, leaving 421 cases.
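The inclusion and exclusion windows described above can be sketched in Python as follows. This is a minimal illustration, not the study's implementation. The `Claim` record, the placeholder code sets, and the exact window boundaries are assumptions made for this sketch: INCL 1 is taken as the index month and the 5 months before it, EXCL 1 as the 6 months before the index month, and EXCL 2 as 12 to 6 months before it. The actual code lists are those in Online Resource Tables 1 and 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    patient_id: str
    month: int  # running month index, e.g. year * 12 + month (assumption)
    kind: str   # "diagnosis" or "treatment"
    code: str


# Hypothetical placeholder codes; the real lists are in the Online Resource tables.
GC_DIAGNOSIS_CODES = {"C160", "C169"}
TREATMENT_LIST = {"ENDOSCOPIC_RESECTION", "GASTRECTOMY", "CHEMOTHERAPY"}


def incident_patients(claims, index_month):
    """Return the IDs of patients flagged as incident GC in index_month."""
    by_patient = {}
    for claim in claims:
        by_patient.setdefault(claim.patient_id, []).append(claim)

    incident = set()
    for pid, records in by_patient.items():
        dx = [c.month for c in records
              if c.kind == "diagnosis" and c.code in GC_DIAGNOSIS_CODES]
        tx = [c.month for c in records
              if c.kind == "treatment" and c.code in TREATMENT_LIST]

        # INCL 1: GC diagnosis within 6 months including the index month.
        incl1 = any(index_month - 5 <= m <= index_month for m in dx)
        # INCL 2: a listed treatment in the index month.
        incl2 = index_month in tx
        # EXCL 1: a listed treatment in the 6 months before the index month.
        excl1 = any(index_month - 6 <= m <= index_month - 1 for m in tx)
        # EXCL 2: GC diagnosis between 12 and 6 months before the index month.
        excl2 = any(index_month - 12 <= m <= index_month - 6 for m in dx)

        if incl1 and incl2 and not excl1 and not excl2:
            incident.add(pid)
    return incident


INDEX = 2019 * 12 + 4
claims = [
    # A: newly diagnosed and treated in the index month.
    Claim("A", INDEX, "diagnosis", "C160"),
    Claim("A", INDEX, "treatment", "GASTRECTOMY"),
    # B: already under treatment 3 months earlier (removed by EXCL 1).
    Claim("B", INDEX, "diagnosis", "C160"),
    Claim("B", INDEX - 3, "treatment", "CHEMOTHERAPY"),
    Claim("B", INDEX, "treatment", "CHEMOTHERAPY"),
    # C: diagnosed 8 months earlier (removed by EXCL 2).
    Claim("C", INDEX - 8, "diagnosis", "C169"),
    Claim("C", INDEX, "diagnosis", "C169"),
    Claim("C", INDEX, "treatment", "ENDOSCOPIC_RESECTION"),
    # D: diagnosis only, no listed treatment (fails INCL 2).
    Claim("D", INDEX, "diagnosis", "C160"),
]
result = incident_patients(claims, INDEX)
```

In this toy data set only patient A satisfies both inclusion windows and neither exclusion window.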

Results of Algorithm X: After a chart review, 12 cases were removed (7 patients undergoing highly advanced medical care not covered by insurance and 5 patients referred between institutions after receiving initial GC-specific treatment), leaving 409 cases for analysis. The PPV and sensitivity were 85.3% (95% CI, 81.5–88.6%) (349/409) and 98.3% (95% CI, 96.4–99.4%) (349/355), respectively. This algorithm yielded 60 false positives, which were categorized into four types: (1) non-epithelial tumors labeled as GC and treated with surgery or other procedures (41.7%, 25/60), (2) gastric adenomas labeled as GC and treated with endoscopic treatments (33.3%, 20/60), (3) cases labeled as GC and treated with chemotherapy or radiotherapy for other organ cancers (21.7%, 13/60), and (4) recurrent GC (3.3%, 2/60).
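The PPV and sensitivity reported above follow directly from the counts (349 true positives, 60 false positives, 6 missed cases). The sketch below recomputes them and attaches an exact binomial (Clopper-Pearson) interval. The choice of the Clopper-Pearson method is an assumption, since this section does not state how the 95% CIs were derived, and the function names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _root(f):
    """Bisection on [0, 1] for a function that rises from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a proportion x / n."""
    lower = 0.0 if x == 0 else _root(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else _root(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


# Counts for Algorithm X in the development cohort (from the text above).
tp, fp, fn = 349, 60, 6
ppv = tp / (tp + fp)          # 349/409
sensitivity = tp / (tp + fn)  # 349/355
ppv_ci = clopper_pearson(tp, tp + fp)
sens_ci = clopper_pearson(tp, tp + fn)
print(f"PPV {ppv:.1%}, sensitivity {sensitivity:.1%}")
```

Working in log space with `math.lgamma` avoids overflow of the binomial coefficients for the larger validation cohort (n above 1,000).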

Algorithm refinement

Algorithm Y incorporated an updated specific treatment list (specific treatment list Y), which excluded surgeries rarely performed for GC and drugs not recommended in the guidelines (Online Resource Table 1). The refined algorithm extracted 398 cases. After a chart review, the same 12 cases that were excluded from the Algorithm X analysis were removed, leaving 386 cases for analysis. Algorithm Y achieved a higher PPV than Algorithm X (90.2%, 95% CI, 86.7–92.9%, 348/386) (Table 2). The 38 false positives included non-epithelial tumors (23.7%, 9/38), gastric adenomas treated with endoscopic treatments (52.6%, 20/38), cases of GC diagnosis treated for other organ cancers (18.4%, 7/38), and recurrent GC (5.3%, 2/38). When gastric adenomas were included in the definition of GC, the PPV of Algorithm Y was 95.3% (95% CI, 92.7–97.2%) (368/386). Algorithm Y outperformed Algorithm X and was adopted as the established algorithm for identifying incident GC cases in this study.

Table 2 Performance metrics of Algorithm Y in the identification of incident gastric cancer cases in each cohort

Progression phase determination

Based on the aforementioned methodology, Algorithm Y was customized to determine the disease progression phase of the identified patients with incident GC. The initial diagnostic accuracy of progression phase determination was 92.8% (95% CI, 89.6–95.3%) (323/348). Additional constraints were integrated for refinement (Fig. 3).

Fig. 3

A refined algorithm for progression phase determination based on the initial series of treatments

Endoscopic group

– Patients who initially underwent endoscopic treatment in the index month, followed by chemotherapy or radiotherapy. In these cases, early GC was cured by endoscopic resection, and the chemotherapy or radiotherapy was likely administered for concurrent cancers of other organs.

Moreover, given individual variability in postoperative recovery time, the period within which chemotherapy initiated after radical surgery was regarded as postoperative chemotherapy was changed from within 2 months to within 3 months postoperatively.

These adjustments improved the diagnostic accuracy of progression phase determination to 94.5% (95% CI, 91.6–96.7%) (329/348) (Table 3).
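The two refinements described above can be sketched as a small rule function. This is an illustration under stated assumptions, not the algorithm in Fig. 3: the event representation, the labels, and the function name `classify_phase` are hypothetical, "within 3 months" is interpreted as a month difference of at most 3, and every pattern not covered by the two refinements returns `None` because the full rule set is not given in this section.

```python
def classify_phase(events, index_month, postop_window_months=3):
    """Classify a patient from (month, kind) treatment events.

    kind is one of "endoscopic", "surgery", "chemotherapy", "radiotherapy"
    (hypothetical labels). Only the two refinements described in the text
    are encoded; all other patterns return None.
    """
    index_kinds = {kind for month, kind in events if month == index_month}
    later_chemo = [month for month, kind in events
                   if kind == "chemotherapy" and month > index_month]

    # Refinement 1: endoscopic treatment in the index month keeps the patient
    # in the endoscopic group even if chemotherapy or radiotherapy follows.
    if index_kinds == {"endoscopic"}:
        return "endoscopic curative"

    # Refinement 2: chemotherapy first started within the postoperative
    # window after radical surgery is treated as postoperative chemotherapy.
    if index_kinds == {"surgery"}:
        if not later_chemo:
            return "surgical curative"
        if min(later_chemo) - index_month <= postop_window_months:
            return "surgical curative"

    return None  # pattern not covered by this sketch


endo_then_chemo = [(10, "endoscopic"), (11, "chemotherapy")]
surgery_then_chemo = [(10, "surgery"), (13, "chemotherapy"), (20, "chemotherapy")]

phase_a = classify_phase(endo_then_chemo, index_month=10)
phase_b = classify_phase(surgery_then_chemo, index_month=10)
# The earlier 2-month window would not have treated this as postoperative.
phase_c = classify_phase(surgery_then_chemo, index_month=10,
                         postop_window_months=2)
```

Keeping the window as a parameter makes the change from 2 to 3 months a single setting, which mirrors how the refinement is described.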

Table 3 Diagnostic accuracy of the developed algorithm in progression phase determination in each cohort

Validation of the developed algorithm

The developed algorithm was applied to the temporal validation cohort. PPV and sensitivity in identifying patients with incident GC were 90.0% (95% CI, 88.1–91.6%) (1119/1244) and 98.0% (95% CI, 97.0–98.7%) (1119/1142), respectively (Table 2). The diagnostic accuracy for progression phase determination was 94.1% (95% CI, 92.6–95.4%) (1053/1119) (Table 3).

Similarly, the algorithm was applied to the external validation cohort. PPV and sensitivity in identifying patients with incident GC were 95.9% (95% CI, 89.9–98.9%) (94/98) and 98.9% (95% CI, 94.3–100.0%) (94/95), respectively (Table 2). The diagnostic accuracy for progression phase determination was 93.6% (95% CI, 86.6–97.6%) (88/94) (Table 3).

Subgroup analysis

Subgroup analysis showed no apparent differences in algorithm performance between the institutions and study periods (Online Resource Tables 3 and 4). Online Resource Tables 5 and 6 present the differences in algorithm performance across age brackets. The algorithm’s accuracy for progression phase determination was the lowest at 90.5% (95% CI, 82.1–95.8%) (76/84) in the development cohort for those aged ≥ 80 years.

Differences in the algorithm performance by washout periods

Online Resource Table 7 shows the performance differences in the identification of incident GC cases when the washout period varied from 1 year to 1.5, 2, 2.5, and 3 years, using the development cohort from April to September 2019. When the washout period was set to 1 year, the sensitivity was 98.3% (95% CI, 95.1–99.6%) (174/177) and the PPV was 89.7% (95% CI, 84.5–93.6%) (174/194), whereas when the washout period was extended to 3 years, the sensitivity was 97.7% (95% CI, 94.3–99.4%) (173/177) and the PPV was 90.1% (95% CI, 85.0–93.9%) (173/192). The number of extracted cases and algorithm performance were almost identical across different washout periods.
