Accurate staging of non-metastatic colon cancer with CT: the importance of training and practice for experienced radiologists and analysis of incorrectly staged cases

Study population

We included patients who underwent presurgical CT followed by surgical resection of colon cancer within the MATCH database. The MATCH study is a prospective multicenter cohort study including patients with stage I-III colorectal cancer from 2007 until December 2017 in seven hospitals in the region of Rotterdam, the Netherlands [9]. The MATCH study was approved by the Erasmus MC medical ethics review board (MEC-2007-088) and all patients provided written informed consent.

We included colon cancer patients who underwent pre-treatment CT with a slice thickness of 3 mm, which was required for adequate evaluation. We excluded patients with rectal cancer, patients with small colon tumors that could not be visualized on CT, scans of poor image quality, and patients who received neoadjuvant therapy. After applying these criteria, scans from one of the seven MATCH centers remained, with slice thickness as the main discriminating criterion. From this center, 45 cases were selected consecutively, in the sequential order of the original study. Next, patients were divided into 5 batches (1 baseline batch of 5 cases and 4 batches of 10 cases), so that each batch contained a similar variety of pathologic TNM stages. The baseline batch of 5 cases was used to assess the baseline accuracy of radiologic staging; the other batches were used to evaluate the learning curves.

CT scans of the included colon cancer patients

All patients were kept nil per os for 2–4 h, and no bowel preparation was performed before the CT scan. CT scans were performed with a 16-channel CT scanner (Aquilion, Canon, Tokyo, Japan). All patients underwent preoperative abdominal CT with iodine-based intravenous contrast (3–5 ml/s, total amount of 90–150 ml, followed by a bolus injection of 30 ml normal saline) in the portal-venous phase at a 70-s delay. All images were reconstructed at 3-mm slice thickness.

Image interpretation

A total of 5 board-certified radiologists (all with 5+ years of experience in abdominal imaging, two of whom had 10+ years) from two separate academic hospitals participated in this study. Readers were blinded to all clinical and pathological data, except for the tumor location. The following items were recorded independently by each reader: (1) the T-stage of the tumor and the reader’s confidence on a 0 to 4 scale, with 4 as the most confident and 0 as the least confident; (2) the N-stage and the reader’s confidence; and (3) the reading time in seconds. T1-2 tumors were defined as an intraluminal mass with no evidence of extraluminal extension or bowel wall deformation. T3 tumors were defined as tumors with a smooth or nodular, not spiculated, extension beyond the normal delineation of the bowel wall. T4 tumors were defined as tumors extending into the adjacent peritoneum or growing into other adjacent tissues or organs. A lymph node with metastasis was defined as a lymph node with a short-axis diameter over 8 mm [10].
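The reading criteria above can be summarized as a small decision rule. The following is a minimal sketch, assuming the reader's visual impressions have already been reduced to two yes/no findings and one measurement; the names `CTFindings`, `t_category`, and `node_positive` are hypothetical and are not part of the study protocol.

```python
from dataclasses import dataclass


@dataclass
class CTFindings:
    """Hypothetical summary of one reader's findings for one scan."""
    extends_beyond_wall: bool          # smooth or nodular extension beyond the bowel wall
    invades_adjacent_structures: bool  # extension into peritoneum or other tissues/organs
    largest_node_short_axis_mm: float  # short-axis diameter of the largest lymph node


def t_category(f: CTFindings) -> str:
    """Map findings to the T categories defined in the text."""
    if f.invades_adjacent_structures:
        return "T4"
    if f.extends_beyond_wall:
        return "T3"
    return "T1-2"


def node_positive(f: CTFindings, threshold_mm: float = 8.0) -> bool:
    """A node counts as metastatic when its short axis is over the threshold."""
    return f.largest_node_short_axis_mm > threshold_mm


example = CTFindings(extends_beyond_wall=True,
                     invades_adjacent_structures=False,
                     largest_node_short_axis_mm=9.5)
print(t_category(example), node_positive(example))
```

The checks run from the most to the least advanced category, so a tumor that both extends beyond the wall and invades an adjacent structure is labeled T4.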

First, all readers scored 5 scans without any instructions or training to assess baseline accuracy. They then attended a 45-min lecture on colon cancer CT staging given by an experienced board-certified radiologist (EKH, with over 8 years of experience, who had evaluated over 600 cases of colon cancer staging), under the supervision of a senior faculty member (RBT, with over 20 years of experience in abdominal CT imaging). This lecture covered the principles and criteria of colon cancer staging, including the radiologic and pathologic definitions of T- and N-staging. Next, readers received 1 batch of 10 scans per week, for a total of 4 batches. All scans were additionally scored by EKH to assess expert radiologist performance.

The readers were randomly assigned to the feedback group (n = 2) or the no-feedback group (n = 3). Each group contained one reader with more than 10 years of experience. The readers in the feedback group were provided with histopathological data after interpretation of each batch, allowing a comparison with their radiological findings.

Pathologic interpretation

Routine pathologic staging was used as the reference standard. Processing of the specimen was performed according to local institutional protocols. The national pathology database (i.e., nationwide network and registry of histo- and cytopathology in the Netherlands, PALGA) protocol was used for standardized reporting of the histopathological information [11]. Pathologic T- and N-staging were utilized for analysis. Additionally, we performed a thorough review of the pathology reports in cases scored incorrectly by 3 or more radiologists and/or the expert (regarding T-stage) or scored incorrectly by 4 or more radiologists and/or the expert (regarding N-stage). Hematoxylin and eosin (H&E)-stained sections from the challenging T-stage cases were re-evaluated by an experienced pathologist (HK).

Statistics

Accuracies, sensitivities, specificities, and positive and negative predictive values (PPV and NPV) of the readers in differentiating T1-2 from T3-4 and N0 from N1-2 colon cancer were evaluated both by batch and overall. To assess improvement in these quantities with increased reader experience, performance was compared between groups of batches: batch 0 versus batches 1–4, batches 0–1 versus batches 2–4, batches 0–2 versus batches 3–4, and batches 0–3 versus batch 4. The significance of the difference between groups of batches was tested using Wald tests with robust standard errors obtained from logistic generalized estimating equations (GEE) models with the group of batches as the only independent variable, an independence working correlation structure, and patient id as the clustering variable. These analyses were repeated with only the post-training batches, i.e., batches 1–4.
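As an illustration of the performance measures named above, the sketch below computes them from a dichotomized reference standard and reading (for example, T3-4 coded as 1 and T1-2 as 0). The function name `diagnostic_metrics` and the example values are assumptions made for illustration; they are not study data, and the GEE-based significance testing is not reproduced here.

```python
def diagnostic_metrics(reference, predicted):
    """Return accuracy, sensitivity, specificity, PPV and NPV.

    reference: pathologic result per case (1 = positive, 0 = negative)
    predicted: radiologic result per case, coded the same way
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    tn = sum(1 for r, p in pairs if not r and not p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)

    def ratio(num, den):
        # Undefined when a denominator is empty (e.g. no positive cases).
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Made-up batch of 10 cases: pathology versus one reader's CT staging.
pathology = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
reading = [1, 1, 0, 0, 1, 1, 0, 1, 1, 0]
metrics = diagnostic_metrics(pathology, reading)
print(metrics)
```

In this made-up batch, the reader understages one case and overstages one case, which gives 5 true positives, 3 true negatives, 1 false positive, and 1 false negative.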

Averages for confidence and reading time were obtained both by batch and overall. Since additional radiologic features were scored in batches 1–4, the reading time for batch 0 was not comparable and was therefore not studied. Because individual batches were not large enough to fit ordinal GEE models, we treated confidence as a continuous score and tested for differences in mean confidence between feedback groups using Wald tests with robust standard errors obtained from standard linear GEE models. In these models, feedback group was the independent variable, the working correlation structure was exchangeable, and patient id was the clustering variable. Differences in mean reading time between feedback groups were tested in the same way, with reading time as the dependent variable instead of confidence. For confidence, groups of batches were also compared to assess the effect of increased reader experience, and an overall difference in confidence between correctly and erroneously staged cases was assessed using a GEE model with correctness of staging (yes/no) as the independent variable. Agreement between pathologic and radiologic staging was assessed using Cohen’s kappa.
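Cohen’s kappa, used above to compare pathologic and radiologic staging, corrects the observed agreement for the agreement expected by chance. The sketch below shows the unweighted form; the function name `cohens_kappa` and the example labels are illustrative assumptions, and the source does not state whether a weighted variant was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(rater_a)
    # Observed proportion of cases on which both ratings agree.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n

    # Chance agreement from the marginal label frequencies of each rater.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)

    if expected == 1.0:
        return float("nan")  # both raters used one single label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up example: pathologic versus radiologic T category for 10 cases.
pathologic = ["T3-4", "T3-4", "T1-2", "T3-4", "T1-2",
              "T3-4", "T1-2", "T3-4", "T3-4", "T1-2"]
radiologic = ["T3-4", "T3-4", "T3-4", "T3-4", "T1-2",
              "T3-4", "T1-2", "T1-2", "T3-4", "T1-2"]
kappa = cohens_kappa(pathologic, radiologic)
print(round(kappa, 3))
```

Here the observed agreement is 0.80 and the chance agreement is 0.52, so kappa is (0.80 − 0.52) / (1 − 0.52), approximately 0.583.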

Finally, learning curves were obtained for T-staging and N-staging using logistic GEE models with individual reader intercepts, a separate learning effect for each feedback group, and an exchangeable working correlation structure.
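One way to write such a learning-curve model is sketched below. It assumes that the batch number enters as a linear term on the logit scale; the source does not state the exact parameterization, so the symbols are illustrative.

```latex
\operatorname{logit}\,\Pr(Y_{rbi} = 1) = \alpha_r + \beta_{g(r)}\, b
```

Here $Y_{rbi} = 1$ if reader $r$ stages case $i$ of batch $b$ correctly, $\alpha_r$ is the intercept of reader $r$, $g(r)$ denotes the group of reader $r$ (feedback or no feedback), and $\beta_{g(r)}$ is the learning effect per batch in that group.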

All statistical analyses were performed using IBM SPSS Statistics software (version 28), R version 4.1.1 and MedCalc version 19.1.3. P values < 0.05 were considered statistically significant.
