Diameter thresholds for pure ground-glass pulmonary nodules at low-dose CT screening: Chinese experience

Introduction

The advent of low-dose CT (LDCT) has significantly increased the detection of pure ground-glass nodules (pGGNs) in regions like China.1–3 These pGGNs, which may be transient or evolve into preinvasive or invasive lung adenocarcinomas, often exhibit slow growth and low metastatic potential, raising concerns about overdiagnosis and overtreatment.4 Establishing precise follow-up criteria is crucial to balance therapeutic benefits with the risks of unnecessary treatment.

Prior research has employed various criteria for defining positive results in lung screening. The National Lung Screening Trial (NLST) set a threshold of nodules over 4 mm,1 while the Early Lung Cancer Action Project (ELCAP)5 and its extensions like New York ELCAP (NY-ELCAP) and International ELCAP (I-ELCAP) used a threshold of ≥5.0 mm.4 The NELSON trial6 applied volumetric measurements, considering nodules less than 100 mm³ or 5 mm as non-predictive of cancer, requiring immediate evaluation for nodules larger than 300 mm³ or 10 mm. Tools like the Brock University calculator, shown to outperform Lung-RADS in predicting nodule malignancy, have been integrated into various studies.7 The Fleischner Society recommends measuring the average diameter of nodules for a more accurate three-dimensional representation.8 Recent studies advocate adjusting thresholds for solid or part-solid nodules. Given the high prevalence of pGGNs in the Chinese population, setting a specific diameter threshold for this group is recommended.

The WHO’s 2021 classification identifies atypical adenomatous hyperplasia (AAH) and adenocarcinoma in situ (AIS) as precancerous stages,9 with studies showing nearly universal 5-year survival for these early-stage lung adenocarcinomas.2 This supports a conservative management approach in such cases.3 Nonetheless, the potential for minimally invasive adenocarcinomas (MIA) to require intervention and the invasiveness of some pGGNs are well-documented.10 11 A pivotal aspect of our research is to assess whether the inclusion or exclusion of AAH and AIS in defining positive results changes the ideal lung cancer diagnosis threshold for pGGNs.

Our study specifically targets the optimisation of LDCT screening thresholds for pGGNs in the Chinese population, aiming to refine criteria for positive results and improve the diagnostic accuracy of lung cancer screenings.

Materials and methods

Study design

Our study conducts a secondary retrospective analysis using data from the ‘Guangzhou Lung-Care Project’,12 which screened 11 708 participants with a one-off LDCT from 2015 to 2021 for early lung cancer detection (figure 1). We extracted data from all LDCT scans that identified pGGNs, focusing on metrics such as average transverse diameter,13 location and available pathological results. Using diameter thresholds ranging from 5 to 10 mm, we assessed diagnostic performance—comprising area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV)—to evaluate the efficacy of these thresholds in predicting malignant pGGNs. Malignancy was confirmed via histopathological or cytopathological examination, adhering to the WHO’s 5th Edition classification of lung neoplasms.

Figure 1

CONSORT flow diagram for nodule selection in the current study. AAH, atypical adenomatous hyperplasia; AIS, adenocarcinoma in situ; IA, invasive adenocarcinoma; LDCT, low-dose CT; MIA, minimally invasive adenocarcinoma.

Participants from the ‘Guangzhou Lung-Care Project’ who had pGGNs detected by LDCT were included; those who did not complete the planned follow-up were excluded. Detailed inclusion and exclusion criteria are provided in online supplemental material. For this study, lung cancer was defined as any pGGN confirmed as malignant, including categories such as AAH, AIS, MIA and invasive adenocarcinoma (IA). Malignant nodules were classified into three groups to evaluate diagnostic thresholds: group 1 (MIA+IA), group 2 (AAH+AIS+MIA+IA) and group 3 (IA-only). This categorisation helped assess the impact of including or excluding precancerous lesions (AAH+AIS) on the optimal threshold for diagnosing malignant pGGNs. Additionally, we examined the diagnostic accuracy of various diameter thresholds across these groups and across subgroups defined by gender, age and nodule count (multiple vs solitary). This retrospective secondary analysis obtained an Institutional Review Board waiver for informed consent.

Diagnosis of lung cancer

Lung cancer diagnoses were confirmed through histopathological examination of resected specimens or cytopathological assessment of needle-aspiration biopsy samples. The classification of resected tumours followed the WHO’s 5th Edition guidelines for lung neoplasms.

Definition of average transverse diameter

In alignment with the Fleischner Society’s guidelines,8 we measured the ‘average transverse diameter’ of detected nodules rather than the ‘maximal transverse diameter’. Our methodology involved using software to determine the longest diameter and its perpendicular diameter on each transverse CT image containing the nodule, and averaging the two on each image. The maximum of these average values across images was designated as the nodule’s average transverse diameter.14 This measurement method has been recommended by the Fleischner Society since 2005 and is also used in other protocols such as I-ELCAP.1 14
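The measurement rule can be illustrated with a minimal sketch. It assumes that the per-image value is the arithmetic mean of the longest diameter and its perpendicular diameter, which is how we read the description above; the function name and the example measurements are hypothetical and are not taken from the study software.

```python
from typing import Iterable, Tuple


def average_transverse_diameter(slices: Iterable[Tuple[float, float]]) -> float:
    """Largest per-image mean of (longest, perpendicular) diameters, in mm.

    Each tuple holds the longest diameter and its perpendicular diameter
    measured on one transverse CT image containing the nodule.
    """
    per_image_means = [(longest + perpendicular) / 2.0
                       for longest, perpendicular in slices]
    if not per_image_means:
        raise ValueError("at least one image measurement is required")
    return max(per_image_means)


# Hypothetical nodule visible on three transverse images (values in mm).
example_slices = [(6.0, 4.0), (8.0, 6.0), (7.0, 5.0)]
example_diameter = average_transverse_diameter(example_slices)
```

In this illustration the per-image means are 5.0, 7.0 and 6.0 mm, so the nodule would be recorded as 7.0 mm.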

Definition of screening results and outcomes

In LDCT lung cancer screening, a positive result triggers follow-up LDCT or further diagnostic procedures.15 Nodules meeting or exceeding set diameter thresholds that are histologically confirmed as benign are categorised as false positives, while those confirmed as AAH, AIS, MIA or IA are true positives. Subsequent cancer diagnoses are verified through medical record review.
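As an illustration of these definitions, the sketch below counts true and false positives at a single diameter threshold and derives sensitivity, specificity, PPV and NPV. The `Nodule` class, the function name and the toy data are hypothetical; the sketch also assumes that every nodule not confirmed as malignant counts as benign, which may differ from how unresected nodules were handled in the study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Nodule:
    diameter_mm: float  # average transverse diameter
    malignant: bool     # True if pathology confirmed the outcome of interest


def diagnostic_metrics(nodules: List[Nodule], threshold_mm: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV for one diameter threshold."""
    tp = fp = fn = tn = 0
    for nodule in nodules:
        positive = nodule.diameter_mm >= threshold_mm  # meets or exceeds threshold
        if positive and nodule.malignant:
            tp += 1
        elif positive:
            fp += 1
        elif nodule.malignant:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data only: ten hypothetical nodules, four of them malignant.
toy_nodules = [Nodule(d, m) for d, m in [
    (3.0, False), (4.0, False), (5.0, False), (6.0, False), (6.0, True),
    (7.0, False), (8.0, True), (9.0, True), (10.0, False), (12.0, True),
]]
toy_metrics = diagnostic_metrics(toy_nodules, threshold_mm=8.0)
```

Running the same function over thresholds from 5 to 10 mm would give one row of metrics per threshold, in the spirit of the analysis described here.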

Unnecessary follow-up CTs

Unnecessary follow-up CTs were defined for participants who were monitored for over 3 years without meeting surgical criteria, or whose nodules were confirmed as benign after surgery. Any CTs performed after 3 years were deemed unnecessary.14

Delayed lung cancer diagnoses

This metric estimates potential diagnostic delays when the diameter threshold is increased, potentially leading to initial negative classifications for nodules like a 7 mm ground-glass nodule diagnosed later as IA. For example, under a 5 mm threshold, such a nodule would be marked for immediate follow-up, but an 8 mm threshold might delay necessary intervention, increasing the delayed diagnosis count for each threshold change. Additionally, we evaluated positive rates (proportion of positive results among all participants) for each threshold. Furthermore, we analysed changes in ‘immediate intervention’ versus ‘delayed intervention’ (surgical treatment at least 1 year after the first LDCT) for lung cancer cases as the diameter threshold increased.
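The two quantities described above can be sketched as follows. This is one plausible reading rather than the study's exact procedure: a participant is counted as positive when at least one nodule meets the threshold, and a malignant nodule is counted as a presumed delayed diagnosis when it falls below the threshold. The record layout, function names and toy values are hypothetical.

```python
from typing import List, Tuple

# (participant id, average transverse diameter in mm, pathology-confirmed malignant)
Record = Tuple[str, float, bool]


def positive_rate(records: List[Record], n_participants: int, threshold_mm: float) -> float:
    """Share of participants with at least one nodule at or above the threshold."""
    positive_ids = {pid for pid, diameter, _ in records if diameter >= threshold_mm}
    return len(positive_ids) / n_participants


def presumed_delayed_fraction(records: List[Record], threshold_mm: float) -> float:
    """Share of malignant nodules that would be read as negative at this threshold."""
    malignant_diameters = [diameter for _, diameter, malignant in records if malignant]
    if not malignant_diameters:
        return 0.0
    delayed = sum(1 for diameter in malignant_diameters if diameter < threshold_mm)
    return delayed / len(malignant_diameters)


# Toy data only: five participants with nodules out of ten screened.
toy_records: List[Record] = [
    ("p1", 7.0, True), ("p1", 4.0, False), ("p2", 9.0, True),
    ("p3", 5.5, False), ("p4", 11.0, True), ("p5", 6.0, False),
]
rate_5mm = positive_rate(toy_records, n_participants=10, threshold_mm=5.0)
rate_8mm = positive_rate(toy_records, n_participants=10, threshold_mm=8.0)
delayed_5mm = presumed_delayed_fraction(toy_records, threshold_mm=5.0)
delayed_8mm = presumed_delayed_fraction(toy_records, threshold_mm=8.0)
```

In the toy data, moving from 5 mm to 8 mm lowers the positive rate from 0.5 to 0.2, while the 7 mm malignant nodule becomes a presumed delayed diagnosis (one of three malignant nodules).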

Statistical analysis

We assessed the diagnostic performance of each nodule type in identifying patients with lung cancer at various thresholds using the AUC and decision curve analysis (DCA).16 We analysed sensitivity, specificity, accuracy, PPV and NPV for each threshold individually, along with their respective 95% CIs. Fisher’s test was employed to compare sensitivity and specificity across different thresholds, considering differences with a p value <0.05 as statistically significant. Statistical analyses were performed using IBM SPSS statistical software (V.24.0), MedCalc statistical software V.20.2 and STATA V.14.0 (Stata Corp., College Station, Texas, USA), with WY, WF and CL as the statistical guarantors.

Results

Characteristics of the participants

In our study, 2720 (22.8%) participants from the ‘Guangzhou Lung-Care Project’ had at least one pGGN detected on their LDCT scans and were enrolled, including 1224 (45.0%) men and 1496 (55.0%) women, with a median age of 61 years (IQR, 55–66). Among them, 547 were current smokers, 256 were former smokers and 1917 were never smokers (table 1). After baseline LDCT and follow-up, 73 (2.7%) patients with pGGNs were diagnosed with lung cancer.

Table 1

Characteristics of participants with pure ground-glass nodules

Characteristics of nodules

In the LDCT scans of 2720 participants, a total of 3580 pGGNs were detected, averaging 1.32 pGGNs per participant. The average transverse diameter for pGGNs was 5.18 mm (IQR, 4.0–6.0). Among the 3580 pGGNs, 105 (2.9%) pGGNs were diagnosed as malignant nodules, 90 (2.5%) of them were diagnosed as MIA or IA and 15 (0.4%) of them were diagnosed as precancerous lesions (table 2). Further details on the proportion of lung cancers and benign and malignant nature of the resected lung nodules are shown in online supplemental e-figure 1 and e-table 1.

Table 2

Distribution of pure ground-glass nodules size in three groups

Diagnostic performances at different thresholds

Table 3 and figure 2 illustrate the diagnostic performance of pGGNs across various diameter thresholds. An inverse relationship was observed between sensitivity and specificity (figure 2), as well as between NPV and PPV (figure 2). As the threshold increased, sensitivity and NPV decreased while specificity and PPV increased. In the ‘MIA+IA’ group, raising the diameter threshold from 5 mm to 10 mm resulted in a decrease in sensitivity from 94.4% to 60.0% and in NPV from 99.8% to 98.9%, while specificity increased from 61.0% to 93.5% and PPV from 5.9% to 19.2%. Similar trends were noted in the ‘AAH+AIS+MIA+IA’ and ‘IA-only’ groups, with sensitivity decreasing and specificity increasing as thresholds rose. The ‘IA-only’ group showed the highest sensitivity and NPVs, while the ‘AAH+AIS+MIA+IA’ group had the lowest sensitivity. Sensitivity and specificity changes were statistically significant across all three groups for each 1 mm increase in diameter threshold (p<0.001). Online supplemental e-table 2 illustrates the diagnostic performance of all types of pulmonary nodules across various diameter thresholds.
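The low PPVs alongside very high NPVs follow from the low proportion of malignant nodules. As a rough consistency check, the sketch below applies Bayes' rule to the sensitivity and specificity quoted above for the ‘MIA+IA’ group. It assumes a nodule-level analysis with prevalence 90/3580 (the MIA or IA count among all pGGNs); the function name is hypothetical and this is not the authors' code.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed nodule-level prevalence of MIA or IA among pGGNs.
prevalence_mia_ia = 90 / 3580

# Sensitivity and specificity as reported for the 5 mm and 10 mm thresholds.
ppv_5mm, npv_5mm = predictive_values(0.944, 0.610, prevalence_mia_ia)
ppv_10mm, npv_10mm = predictive_values(0.600, 0.935, prevalence_mia_ia)
```

Under these assumptions the implied values are close to the reported PPVs (5.9% and 19.2%) and NPVs (99.8% and 98.9%) at 5 mm and 10 mm.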

Table 3

Diagnostic performance for three outcomes in pGGNs after low-dose CT

Figure 2

Diagnostic performance of different diameter thresholds in pGGNs. (a) The left Y-axis shows the proportion of MIA and IA in pGGNs in each diameter range and the right Y-axis shows AUCs when various diameter thresholds are used to define a positive result for pGGNs in MIA and IA. (b) Sensitivity and specificity at different diameter thresholds: specificity for the three outcomes of LDCT increased while sensitivity gradually decreased as the diameter threshold rose from 5 to 8 mm. Beyond an 8 mm diameter threshold, the rate of increase in specificity becomes more gradual, while the rate of decrease in sensitivity becomes more pronounced. There was no significant difference between the three outcomes. (c, d) PPVs (c) and NPVs (d) at different diameter thresholds: PPVs for the three outcomes of LDCT increased while NPVs gradually decreased as the diameter threshold rose from 5 to 8 mm. Beyond an 8 mm diameter threshold, the rate of increase in PPVs becomes more gradual, while the rate of decrease in NPVs becomes more pronounced. AAH, atypical adenomatous hyperplasia; AIS, adenocarcinoma in situ; AUCs, areas under the curve; IA, invasive adenocarcinoma; LDCT, low-dose CT; MIA, minimally invasive adenocarcinoma; NPV, negative predictive value; pGGNs, pure ground-glass nodules; PPV, positive predictive value.

AUC initially increased before declining after surpassing optimal thresholds in three groups (online supplemental e-figure 2). The best diagnostic performance was at 7 mm for the ‘MIA+IA’ and ‘AAH+AIS+MIA+IA’ groups and 8 mm for the ‘IA-only’ group, as shown in online supplemental e-figure 2. In the ‘MIA+IA’ group, AUC rose from 0.777 to 0.829 between 5 mm and 7 mm thresholds, then dropped to 0.767 at 10 mm. The ‘AAH+AIS+MIA+IA’ group experienced a similar trend, with AUC increasing from 0.748 to 0.804 and then decreasing to 0.744. The ‘IA-only’ group’s AUC increased from 0.783 to 0.846 before falling to 0.80 at the 10 mm threshold. Receiver operating characteristic (ROC) analysis across the three groups revealed that incremental 1 mm changes from 7 mm to 8 mm in AUC were statistically significant (table 3).
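The text does not state how the per-threshold AUC was computed. One common convention for a test dichotomised at a single cut-off is the area under the two-segment ROC curve through (0, 0), (1 − specificity, sensitivity) and (1, 1), which equals the mean of sensitivity and specificity. The sketch below assumes that convention and applies it to the ‘MIA+IA’ sensitivity and specificity reported at 5 mm and 10 mm; the function name is hypothetical.

```python
def single_cutoff_auc(sensitivity: float, specificity: float) -> float:
    """Trapezoidal area under the ROC curve defined by one operating point."""
    fpr = 1.0 - specificity
    # Segment from (0, 0) to (fpr, sensitivity), then from there to (1, 1).
    left_area = 0.5 * fpr * sensitivity
    right_area = 0.5 * (1.0 - fpr) * (sensitivity + 1.0)
    return left_area + right_area


# Reported sensitivity and specificity for the 'MIA+IA' group.
auc_5mm = single_cutoff_auc(0.944, 0.610)
auc_10mm = single_cutoff_auc(0.600, 0.935)

# The trapezoidal area reduces algebraically to (sensitivity + specificity) / 2.
balanced_accuracy_5mm = (0.944 + 0.610) / 2.0
```

Under this assumption the sketch gives about 0.777 at 5 mm and about 0.768 at 10 mm, in line with the reported 0.777 and 0.767.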

We conducted subgroup analyses for the ‘MIA+IA’ group by gender, age and nodule count, observing similar trends. In the male subgroup (online supplemental e-table 3 and e-figure 3), increasing the threshold from 5 mm to 10 mm led to decreased sensitivity (from 93.0% to 65.1%) and NPV (from 99.7% to 98.9%), while specificity and PPV increased significantly (from 59.4% to 92.4% and from 5.8% to 18.9%, respectively). Optimal AUC was at an 8 mm threshold (0.82), decreasing slightly at 10 mm (0.79). The female subgroup showed comparable results, with optimal AUC reached at 8 mm (0.852), then declining at 10 mm (0.75). Age-specific analyses (online supplemental e-figure 4) revealed similar patterns, with the highest sensitivity and specificity in the <60 age group and AUC peaking at 8 mm across age groups (online supplemental e-table 4). For nodule number (online supplemental e-table 5), increasing the threshold led to higher specificity and PPV, with the AUC peaking at 8 mm for multiple nodules and at 7 mm for solitary nodules. The multiple nodule group showed significantly higher sensitivity and AUC than the solitary nodule group (online supplemental e-figure 5). More information about the distribution of pGGNs at different thresholds in the multiple and solitary groups and in different age groups is shown in online supplemental e-tables 6 and 7.

Effect on unnecessary CT scans and the presumed delayed lung cancer diagnosis

In a cohort of 2720 participants, positive rates for pGGNs decreased with increasing diameter thresholds (table 4), with rates ranging from 11.1% at a 5 mm threshold to 2.7% at 10 mm. Elevating the threshold from 5 mm to 7 mm increased the frequency of unnecessary follow-up CT scans, rising from 28.8% to 42.5% (figure 3). This adjustment also led to presumed delays in diagnosing MIA and IA, with delayed diagnosis rates escalating from 4.5% to 16.9%. Conversely, increasing the threshold from 8 mm to 10 mm reduced unnecessary CT scans slightly but increased the presumed delay in lung cancer diagnosis rates for MIA and IA from 23.6% to 47.2%. Additionally, the rates of presumed delayed diagnoses for AAH and AIS also increased significantly, from 46.7% at a 5 mm threshold to 73.3% at 10 mm (figure 4). The analysis of unnecessary CT scans and presumed delayed lung cancer diagnosis in all pulmonary nodules is displayed in online supplemental e-table 8.

Table 4

Presumed unnecessary follow-up CT and delayed diagnosis of lung cancer at modified thresholds in pure ground-glass nodules

Figure 3

(a) The proportion of unnecessary follow-up CT scans in each diameter range. (b) The proportion of confirmed invasive adenocarcinoma in groups of ‘Immediate intervention’ and ‘Delayed intervention’.

Figure 4

The proportion of delayed diagnosis of MIA and IA or AAH and AIS in pure ground-glass nodules in each diameter range. AAH, atypical adenomatous hyperplasia; AIS, adenocarcinoma in situ; IA, invasive adenocarcinoma; MIA, minimally invasive adenocarcinoma.

Effect on the proportion of IA

With the increment in diameter threshold, there was a concomitant increase in the proportion of IA. Notably, beyond an 8 mm threshold, the frequency of ‘delayed interventions’ in IA cases exhibited a substantial rise, from 30.0% to 50.0%. Concurrently, the trends of IA proportions in ‘immediate interventions’ and ‘delayed interventions’ began to converge, as evidenced in figure 3 and table 4.

Decision curve analysis

DCA17 indicated that all models outperformed the ‘treat-all’ and ‘treat-none’ strategies (figure 5). Despite overlaps among the models’ DCA curves, the 8 mm threshold model consistently showed the highest net benefit (NB), especially within the critical probability range of 0.05–0.20. Notably, at a decision threshold of 10%, the 8 mm model’s NB was 0.011 higher than that of the 9 mm model, confirming its superiority in providing NB across threshold probabilities from 0.35 to 1.8.
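For readers unfamiliar with DCA, net benefit at a threshold probability p_t is conventionally defined as TP/N − (FP/N) × p_t/(1 − p_t), with ‘treat-none’ fixed at zero and ‘treat-all’ obtained by treating everyone as positive. The sketch below implements this standard definition; the function names and the counts are hypothetical and are not taken from the study data.

```python
def net_benefit(true_pos: int, false_pos: int, n_total: int, p_threshold: float) -> float:
    """Net benefit of a decision rule at threshold probability p_threshold."""
    if not 0.0 < p_threshold < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    odds = p_threshold / (1.0 - p_threshold)  # weight given to a false positive
    return true_pos / n_total - (false_pos / n_total) * odds


def net_benefit_treat_all(prevalence: float, p_threshold: float) -> float:
    """Net benefit when every case is treated as positive."""
    odds = p_threshold / (1.0 - p_threshold)
    return prevalence - (1.0 - prevalence) * odds


# Hypothetical counts: 1000 nodules, 20 true positives and 100 false positives.
nb_model = net_benefit(true_pos=20, false_pos=100, n_total=1000, p_threshold=0.10)
nb_all = net_benefit_treat_all(prevalence=0.025, p_threshold=0.10)
nb_none = 0.0
```

With these illustrative numbers the rule has a small positive net benefit at a 10% threshold probability, whereas ‘treat-all’ is negative because of the low prevalence.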

Figure 5

Decision curve analysis for a lung cancer prediction model using nodule diameter, with the addition of different diameter thresholds.

Discussion

Previous large-scale lung cancer screening studies, such as the European NELSON,18 American NLST and I-ELCAP, have focused on solid and part-solid nodules. Predictive models like Brock19 and others developed in Western populations have shown limited applicability in Asian populations due to differences in disease prevalence and environmental factors. Validation in Chinese cohorts20 21 reported AUCs of 0.58 to 0.71, highlighting the need for region-specific models. New models, including a random forest approach, have shown promising AUCs of up to 0.88 for the Chinese population. Our study pioneers a focus on pGGNs in early lung cancer screening in China.

In the NELSON trial, lung cancer probability was low for participants with nodules under 5 mm in diameter (0.4% (0.2–0.7)), similar to those without nodules (0.4% (0.3–0.6), p=1.00). For nodules between 5 and 10 mm, the probability was intermediate, necessitating follow-up CT scans (1.3% (1.0–1.8)). However, nodules 10 mm or larger showed a significantly higher likelihood of lung cancer (15.2% (12.7–18.1)).22–24 Furthermore, within the Lung-RADS classification, nodules 10–19 mm were associated with a higher malignancy risk (6%) compared with smaller pGGNs (<10 mm, 1.3%; p=0.01).18 These findings support reclassifying pGGNs larger than 10 mm into a higher Lung-RADS category due to increased cancer risk.25 26 The NELSON trial underscores the need for more precise diameter thresholds within the 5–10 mm range to accurately assess malignancy risk in early lung cancer screening. Large lung cancer screening initiatives like Korean Lung Cancer Screening Project (K-LUCAS), NLST and I-ELCAP have refined the prediction of pulmonary nodule malignancy by implementing precise diameter thresholds. In I-ELCAP, pGGNs were categorised as semi-positive in LDCT screenings,27 contrasting with NLST and NELSON, which generally regarded pGGNs as benign, focusing more on solid and part-solid nodules. The prevalence of pGGNs in the I-ELCAP baseline was 4.2%, leading to 73 lung cancer diagnoses out of 57 496 participants. 
In our study, pGGNs were more prevalent, occurring in 22.8% of 11 708 participants.28 29 Comparative analysis shows our positivity rates (7.7% at 6 mm, 3.3% at 9 mm) were similar or slightly lower than those reported by NLST (10.5% at 6 mm, 4.1% at 9 mm),30 I-ELCAP (10.2% at 6 mm, 4.0% at 9 mm)27 and K-LUCAS (8.7% at 6 mm, 3.7% at 9 mm).14 NLST data indicated that sensitivity exceeded specificity below a 7 mm threshold, whereas specificity dominated above this level.30 Our findings mirrored this trend: specificity and PPV increased with higher thresholds, while sensitivity decreased, particularly beyond 8 mm, consistent with trends observed in K-LUCAS.14

Our study aligned closely with major lung cancer screening data regarding PPVs for ‘MIA+IA’ at different thresholds: 8.3% at 6 mm and 17.2% at 9 mm, comparable to NLST (8.5% at 6 mm, 19.6% at 9 mm)30 and surpassing both K-LUCAS (5.5% at 6 mm, 12.6% at 9 mm)14 and I-ELCAP (5.5% at 6 mm, 13.2% at 9 mm).27 A notable factor is our cohort’s lower prevalence of tuberculosis sequelae (5.3%) compared with K-LUCAS (13.2%),31 potentially influencing the higher PPVs observed. Variations in demographics likely drive these differences in positivity rates and PPVs. Incremental increases in the diameter threshold for LDCT screenings, suggested by NLST data, might slightly delay lung cancer diagnoses but significantly reduce false positives,30 which can lead to unnecessary follow-ups, invasive procedures and increased healthcare costs. In our study, delayed diagnoses for IA and MIA increased with higher thresholds, reaching 47.2% at 10 mm, with even higher rates for AAH and AIS. An 8 mm threshold for pGGNs optimally balances diagnostic accuracy against the reduction in unnecessary follow-ups, which rose from 31.4% at 7 mm to 71.9% at 9 mm, similar to K-LUCAS data (35.5% at 7 mm, 60.2% at 9 mm).14 Diagnostic performance declined beyond 8 mm, despite fewer unnecessary follow-ups. We recommend an 8 mm threshold as the upper limit to minimise risks while maintaining diagnostic accuracy.

DCA provides a framework for evaluating prediction models by comparing NB across decision thresholds (0–1), considering true and false positives. In cancer research,32 thresholds typically range from 0.05 to 0.20, reflecting varying patient preferences. In our study, five models with different diameter thresholds for predicting lung cancer in pGGNs were assessed. The 8 mm threshold model showed the highest NB, especially at a 10% decision threshold, predicting 12 additional malignant nodules per 100 screenings compared with the 7 mm model. This model outperformed others across decision thresholds from 0.35 to 1.8. Lower thresholds enhance sensitivity for patients concerned about disease risk, while higher thresholds reduce overtreatment. Incorporating tumour markers like carcinoembryonic antigen, DNA methylation, and circulating tumour DNA may further improve diagnostic accuracy for those favouring higher thresholds, optimising treatment and resource use.

Our study found that increasing the diameter threshold from 7 mm to 8 mm slightly reduced sensitivity, resulting in more missed cases, as evidenced by the increase in delayed lung cancer diagnoses at higher thresholds (figure 4). This decrease in sensitivity, though a trade-off, was accompanied by improvements in specificity and accuracy (figure 2), enhancing the detection of more clinically significant nodules. Notably, this adjustment from 5 mm to 8 mm did not introduce any sex-based disparities in diagnostic performance. We observed the highest proportion of IA and MIA diagnoses in nodules sized around 8 mm (figure 2). Although sensitivity decreased, the improved specificity and accuracy suggest more effective identification of larger, potentially more malignant nodules. Furthermore, as the diameter threshold increased, the rate of unnecessary follow-up CTs initially rose but then significantly dropped at 8 mm—from 42.51% at 7 mm to 29.64% at 8 mm (figure 3, table 4), thus conserving medical resources. Our analysis also showed that the proportion of IA in the delayed intervention group was lowest at 8 mm threshold, indicating a more targeted and efficient approach to early lung cancer detection. Increasing the threshold beyond 8 mm did not significantly improve early screening outcomes for IA, as shown by the convergence of IA proportions in different groups (figure 3). For the ‘MIA+IA’ group, the AUC values at 6 mm, 7 mm and 8 mm were 0.777, 0.829 and 0.822, respectively. Although the AUC is slightly higher at 7 mm, there are no significant differences between 6 mm and 7 mm. While diagnostic accuracy is important, clinical utility is best assessed through DCA. The 8 mm threshold model shows superior NB in the 0.05–0.20 probability range (figure 5). At a 10% decision threshold, it outperforms the 7 mm model, detecting 12 additional malignant nodules per 100 screenings. The 8 mm model offers the highest NB, highlighting its efficacy in this cohort.

Subgroup analysis shows that raising the diameter threshold to 8 mm consistently improves diagnostic accuracy for malignant persistent pGGNs across various demographics, including age, sex and nodule count. This threshold is particularly effective in patients with multiple nodules and yields the highest accuracy in those under 60 years. ROC analysis indicates that, except for solitary nodules (which peak at 7 mm), the AUC for all other subgroups is highest at 8 mm. Notably, increasing the threshold from 5 to 8 mm has a similar impact across sex, age and nodule count.

We categorised malignant pGGNs into three groups to assess the impact of including pre-malignant lesions (AAH and AIS) on early detection models. At an 8 mm threshold, all groups showed consistent diagnostic performance, with sensitivity ranging from 71.4% to 80.6%, specificity around 88.9% and AUC values between 0.80 and 0.85. The IA-only group had the highest sensitivity, while the inclusion of smaller precancerous lesions in the ‘AAH+AIS+MIA+IA’ group led to more missed diagnoses. Adding AAH and AIS did not significantly improve accuracy, suggesting current threshold settings effectively balance sensitivity and specificity for early detection.9 10 28 33

This study’s retrospective design and use of higher nodule diameter thresholds to define positive results leave a gap in direct evidence on whether elevated thresholds contribute to stage progression or impact outcomes, as seen in NLST and I-ELCAP. Longer follow-up may also reveal more false negatives, potentially reducing sensitivity in lung cancer detection.

Acknowledgments

We are indebted to the coordination of Mr Xiaoping Tang, Mr Zhongqi Liu, Mr Ruihua Zhou, Mrs Yi Zhang, Mr Tiegang Li, Mrs Suya Zeng, Mrs Xiaoping Zhong (Guangzhou Municipal Health Commission), Mr Jingqing He, Mr Jinjian Wei, Mr Wei, Mr Jiguo Xie, Mrs Zhen Li, Mr Zhuanghui Zhuang (Guangzhou Civil Affairs Bureau and Guangzhou Charity Federation), Mr Huazhang Liu, Mrs Weiyun He, Mr Guozhen Lin, Mr Boheng Liang (Guangzhou Center for Disease Control and Prevention), Mr Huanqing Huang, Mr Qiang Yan (People's Government of Guangzhou Yuexiu District), Mr Fuchu Huang, Mrs Yunjie Cheng, Mr Keng Li (Health Bureau of Guangzhou Yuexiu District), Mr Chenghua Gong, Mr Fan Weng (Guangzhou Yuexiu District Center for Disease Control and Prevention), which greatly facilitated the initiation and development of the Guangzhou Lung-Care Project. We thank the staff of Renming Street, Zhujiang Street, Guangta Street, and Beijing Street Community Health Service Centers and Subdistrict Offices for their assistance in the enrolment and screening of residents and their dedication to data collection and verification.
