This study could inform the development of new risk prediction models which minimise the number of individuals that need to go into surveillance following lung cancer screening.
Defining the outcome as lung cancer in the patient is novel, as most existing models predict risk of malignancy in a specific nodule.
A team of experts in the field will oversee and advise the meta-analysis team.
It is expected that some publications will report figures for the outcome of malignancy of the nodule, while others will report figures for the outcome of lung cancer in the screenee, which is a potential limitation.
The lack of patient and public involvement (PPI) in the current stage is a limitation of the study. However, involvement of a PPI panel will be sought in the future.
Introduction
Worldwide, lung cancer (LC) is the second most frequent cancer and has the highest cancer-related mortality.1 Approximately 48 500 people are diagnosed in the UK each year with survival highly dependent on the stage.2 According to the Office for National Statistics, 5-year survival for people diagnosed in England between 2013 and 2017 was 55% for stage 1 lung cancer, 35% for stage 2, 15% for stage 3 and 5% for stage 4.3
Therefore, shifting the distribution of diagnoses towards early stages would significantly reduce mortality, and there is increasing implementation of lung cancer screening, using low-dose computed tomography (LDCT) as an early detection strategy for populations at high risk of lung cancer. The US National Lung Screening Trial (NLST) showed a 20% reduction in lung cancer mortality from three annual LDCT screens, compared with annual chest X-ray,4 while the Dutch-Belgian Nederlands–Leuvens Longkanker Screenings Onderzoek (NELSON) showed a 24% reduction among men in the screening group (LDCT) relative to men in the control group (no screening).5 In addition to the NLST and the NELSON, worldwide, there have been numerous completed and ongoing clinical trials on lung cancer screening with LDCT. Field et al published the results of the UK Lung Screening (UKLS) trial alongside a meta-analysis of the randomised worldwide evidence which indicated a significant reduction in lung cancer mortality with a pooled overall relative rate of 0.84 (95% CI 0.76 to 0.92) from nine eligible trials.6 Furthermore, the Cochrane systematic review of eight major international randomised clinical trials (RCTs) found that CT screening reduced deaths from lung cancer by 21%.7
Currently, there are implementation studies ongoing around the world (eg, USA, Italy and France), including the UK, where the UK National Screening Committee recommended a targeted national screening programme in June 2022, although targeted screening is already offered in 23 selected areas of England through NHS England’s Targeted Lung Health Check Programme and is expected to be expanded to 49 during the course of 2023.8–10 In the research context, implementation studies are also ongoing to test feasibility and build the necessary capacity and infrastructure. Following on from the UKLS LDCT trial, the Yorkshire Lung Screening Trial is randomising high-risk individuals in Leeds for community-based LDCT screening using mobile units or no screening and the SUMMIT study is a single-arm demonstration study assessing the feasibility of delivering LDCT screening in North Central and East London.11–13
One challenge for implementing LDCT screening is the substantial proportion of screenees needing further investigation or more intensive surveillance for abnormal findings detected through screening that ultimately transpire to be harmless. Lung nodules are a particularly common finding. Across five UK lung cancer screening programmes, 15% of screening participants had nodules detected on their baseline scan which needed surveillance for 2 years using repeat LDCT imaging, but less than 3% of participants undergoing screening were diagnosed with screen-detected LC.14 Further investigation and surveillance for individuals with low immediate risk for lung cancer exposes individuals to potential physical harm through cumulative radiation exposure from follow-up scans, as well as risk of psychological harm as the individual is put in a prolonged period of uncertainty and stress.15–18 The increased demand surveillance places on staff and imaging capacity also causes unnecessary cost for the health system.
LDCT screening programmes follow established guidelines for the investigation and management of lung nodules, which are designed to reduce unnecessary surveillance of low-risk nodules. For example, the British Thoracic Society’s 2015 guidelines are followed by the majority of UK programmes.19 However, these guidelines cite evidence published predominantly between 2010 and 2015, before the surge in publications on screen-detected nodules and some of the studies cited are on incidentally detected nodules rather than screen detected nodules. According to the guidelines, the risk of malignancy of a nodule and follow-up decisions are based on factors including radiological characteristics and the use of the Brock model. The recommendation is that the same management approach is used regardless of the route to diagnosis (screening or incidental). However, this approach might be problematic because individuals with screen-detected nodules are likely to have a different risk profile for LC than individuals with incidentally detected nodules. The guidelines are currently being revised, and the new version is anticipated in 2023 or 2024 and will incorporate the latest published evidence on screening populations. When more than one scan is available, management of nodules is to a large degree dependent on nodule growth.20 However, management of nodules based on factors known at the time of a single screen is less straightforward.
Identifying risk factors for lung cancer which are known at the time of screening could reduce the number of individuals needing to enter the surveillance process at all, thereby reducing the associated physical, psychological and resource implications. Although most studies to date evaluate risk factors for malignancy in specific screening-detected lung nodules, this study aims to investigate factors known at the time of screening, including nodule characteristics, which are predictive of lung cancer in the individual at follow-up, rather than predictive of a specific nodule containing or progressing to malignancy. This choice is based on the implications for patient management during participation in lung cancer screening and identified as a priority during discussions with respiratory physicians suggesting that this outcome will have more clinical relevance.
Objective
The objective of this systematic review is to identify risk factors for lung cancer in individuals who are undertaking LDCT screening and are found to have at least one lung nodule in one of their routine screening scans. Only factors that are known at the time of the routine screening scan and are related to the lung nodules or to the screenees’ characteristics will be considered. The association of the factors with subsequent diagnosis of lung cancer will be assessed. The objective of the meta-analysis is to evaluate the unadjusted predictive capacity of the risk factors for lung cancer in these participants on a per-participant basis, as opposed to risk models that predict malignancy on a per-nodule basis. The subsequent development of a multivariable model will be the subject of a future project.
For the individual participant data (IPD) meta-analysis process, individual participant data will be used where available. For the studies for which individual level data are not available, aggregated data from publications will be extracted where possible.
Methods and analysis
A multidisciplinary steering group consisting of researchers and clinicians who are active in the field of lung cancer screening has been assembled to advise the core study team. This includes statisticians, radiologists, respiratory clinicians and a behavioural scientist with expertise in cancer screening. The group will provide guidance and advice throughout the whole process.
Information sources
A systematic search of publications will be performed on the MEDLINE and EMBASE databases using the syntax described in online supplemental table S1. The results will be stored in EndNote. One investigator (PA) will screen studies for titles that should be excluded. The same investigator will then screen studies and exclude them based on abstract. In addition, three other investigators (RG, SD, SQ) will independently screen one-third of the abstracts each. The results will be compared, and abstracts will be included if any one of the four investigators has selected them. The full texts of potential articles will then be screened for eligibility by the first investigator (PA) with any ambiguity around inclusion resolved by discussion with the full investigator screening team.
Additional sources for individual data suggested by the steering group and other experts will also be considered. Also, backward reference searching will be performed.
For each cohort identified in the above process, publications containing extractable data will be identified where possible. A Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) IPD flow diagram will be developed to describe the systematic search and paper screening processes.
Eligibility criteria
Peer-reviewed articles will be included if:
They are on cohorts, or mention cohorts, of screenees who were found to have a lung nodule in one of their scheduled screens.
They are published between 1 January 2000 and 30 August 2024.
They are published in English.
They report or mention cohorts who have collected numbers or incidence of LC in the participants who are found to have screen detected lung nodules.
They are on individuals with nodules detected as a result of a screening CT scan rather than a diagnostic examination prompted by consultation for symptoms.
IPD cohorts will be identified if they are mentioned not only in the included articles but also through other means (eg, suggestions by collaborators).
If an IPD cohort mentioned in an article is not obtainable, then aggregate data from that article will be extracted where possible.
Articles will be excluded if:
They are on modalities other than LDCT.
They are based on predicted or simulated data only.
They are on symptomatic individuals or a non-screening context.
It is anticipated that a major contribution to this evidence base will be the intervention groups of randomised controlled trials.
Eligibility criteria apply across studies regardless of whether IPD or aggregate data will be extracted. However, studies with larger population sizes will be prioritised for obtaining IPD. All studies from the systematic review will be included and aggregated data will be extracted when available for inclusion in meta-analysis.
Data extraction and management
IPD on individuals with screen-detected nodules will be obtained. Data from cohorts of studies and programmes that do not have published results will also be considered. If IPD are not available, then aggregate data will be extracted from publications, if possible.
References will be imported into EndNote for publication management. Excel will be used to track whether publications are relevant when screening titles and abstracts, and to record the information extracted from each included study according to the data extraction form.
Individual participant data
Meta-data from each cohort will be requested directly from the authors of papers identified during the systematic review (online supplemental table S2a). Furthermore, a form describing the requested variables will be shared with all data providers. A cohort will not be excluded if some of the variables are not available. The data will consist of data from the scan, data on the most suspicious nodule of the scan, personal data of the screenee (including spirometry data) and outcome data (online supplemental table S2b).
The outcome will be defined as a lung cancer diagnosis at any time during the screenee’s follow-up. Histology and stage of lung cancer will be collected as secondary outcomes. Temporal data will be collected if available and used to perform survival analysis.
The sharing of individual data will be governed by a data sharing agreement between the lead institution and each of the data providers.
The datasets obtained will be formatted and checked by one of the investigators (PA), and any inconsistencies will be dealt with in collaboration with the data provider.
Aggregate data from publications
For the published studies for which individual data cannot be obtained, aggregated data will be extracted (online supplemental table S3).
Assessment of the quality of studies
The Prediction model Risk Of Bias ASsessment Tool (PROBAST) will be used for assessment of quality of the studies.21 This tool will be used to assess the risk of bias in study conduct and analysis and the study’s applicability based on the participants, the predictors and the outcomes. Three investigators (PA, SQ and SD) will independently make the assessment for the first ten studies. Then one investigator (PA) will assess the risk of bias for the remaining studies. Finally, a second investigator (SQ) will assess the risk of bias for a subsample of the remaining studies.
Analysis
IPD meta-analysis: main analysis
A one-stage approach will be followed in the IPD meta-analysis. The IPD from all cohorts will be formatted and combined into a single dataset. In the main analysis, data from the earliest available screening scan for each individual will be used. Univariate logistic regression will be performed on each of the collected variables as the primary analysis. This will be augmented by time-to-event analyses if sufficient temporal data (ie, time from the LDCT scan when the nodule was detected to diagnosis of LC) are available from the IPD datasets. Clustering will be accounted for by having separate intercept terms for each cohort, along with random effects for the slopes to allow for variability in effects of risk factors between cohorts, as indicated by significant heterogeneity. For the time-to-event analysis, univariate Cox models with shared frailty will be used to account for random effects.
The same analysis will be carried out for individuals with prevalent screening scans and incident screening scans separately. In the incident screening scan analysis, data from the first available incident screening scan where a nodule was detected will be used.
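As an illustration only, the one-stage models described above could be written as follows for a single risk factor. The notation is introduced here and is not taken from the protocol: $Y_{ij}$ indicates a lung cancer diagnosis for participant $i$ in cohort $j$, $x_{ij}$ is the risk factor, $\alpha_j$ is the cohort-specific intercept, $\beta$ is the common log OR (or log HR), $b_j$ is the cohort-specific deviation in slope and $w_j$ is the shared frailty. The normal distribution for $b_j$ and the gamma distribution for $w_j$ are common choices and are assumptions here, since the protocol does not specify them.

```latex
\begin{align}
\operatorname{logit}\,\Pr(Y_{ij}=1) &= \alpha_j + (\beta + b_j)\,x_{ij},
  & b_j &\sim N(0,\tau^2), \\
h_{ij}(t) &= h_0(t)\,w_j\,\exp(\beta\,x_{ij}),
  & w_j &\sim \text{Gamma with mean } 1 .
\end{align}
```

Under this sketch, $\tau^2 = 0$ corresponds to a risk factor whose effect does not vary between cohorts.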
Aggregate data meta-analysis
The effects of the risk factors will be extracted from published papers that have been identified during the paper screening process. Where the effects are not reported, they will be calculated, where possible, using reported figures. All effects, whether directly extracted or calculated using published figures, will be converted into ORs and meta-analysed using random effects models. There is a potential publication bias issue, in that some factors with no association with cancer in the original studies may not be reported. For this reason, it will be important to check consistency of IPD and aggregate results. A sensitivity analysis will be performed to investigate the degree to which using the outcome of malignancy in the nodule instead of the outcome of lung cancer in the screenee influences the effects extracted.
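The following is a minimal sketch of how an OR might be calculated from reported counts and how study-level log ORs might be pooled with a random effects model. It assumes the DerSimonian-Laird estimator and a 0.5 continuity correction for empty cells, neither of which is specified in the protocol; the function names are hypothetical and the counts in the usage note are invented for illustration.

```python
import math


def log_or_from_counts(a, b, c, d):
    """Log odds ratio and its variance (Woolf method) from a 2x2 table.

    a: exposed with lung cancer      b: exposed without lung cancer
    c: unexposed with lung cancer    d: unexposed without lung cancer
    """
    if min(a, b, c, d) == 0:
        # Assumption: add 0.5 to every cell when any cell is empty.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dersimonian_laird(effects, variances):
    """Random effects pooling; returns (pooled effect, standard error, tau squared)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures dispersion around the fixed effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    scale = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau2
```

For example, `log_or_from_counts(10, 90, 5, 95)` gives the log of an OR of about 2.11, and the pooled log OR returned by `dersimonian_laird` can be exponentiated, together with `pooled ± 1.96 * se`, to give a pooled OR with an approximate 95% CI.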
Comparison of IPD and aggregate data effects
The effects from the IPD meta-analysis will be the main finding of the study and will be compared with the results from the aggregate data meta-analysis. IPD results are not anticipated to be subject to publication bias as we will have data on all variables whether significant or not. Major differences between aggregate and IPD results may indicate publication bias in the former. Tables of characteristics of the studies contributing either aggregate or individual data, risk factors identified and their individual and pooled effects will be presented.
Governance issues in evidence synthesis
The overall strength of the body of evidence for each risk factor will be assessed by four investigators (PA, SD, SQ, RG) using the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) criteria for guidance.22
The PRISMA 2020 statement will be followed for reporting.23
This study started in October 2021 and will be completed in December 2024.
Strategy for data synthesis
Heterogeneity
Observational studies can display heterogeneity for a number of reasons.24 Quite commonly, different studies may measure the factor of interest by different means. In the case of lung nodules, some studies will have automatic volumetric measurement of size; others will have two-dimensional human measurement often summarised as maximum diameter. To address the issue of varying means of measurement, all effects from IPD and non-IPD studies will be synthesised where possible, with appropriate transformation of some measures (see section Differences in coding of data), but heterogeneity of results will be assessed by different methods (eg, interaction tests or the I² statistic) and subgroup results will be presented if necessary. Furthermore, the population followed up in different observational studies may be very different in terms of demographics or risk of the outcome. In a meta-analysis of RCTs, the populations tend to have common features in that all the subjects have the disease under study and are in principle suitable for the therapy being researched. Lung cancer screening trials and projects all have their own eligibility criteria, which may differ markedly. For example, in some parts of East Asia, LDCT screening is offered to lifelong non-smokers, which would be very unusual in a European or North American study.25 In addition, studies may differ significantly in follow-up periods, method of determination of the disease outcome and whether the outcome was the original study endpoint or an incidental one. Hierarchical models with random effects for population factors will be used in such cases. However, it should be noted that the inclusion criteria of a nodule or nodules detected by screening may confer a degree of homogeneity.
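As a brief sketch of the I² statistic mentioned above, the function below computes Cochran's Q from study-level effects (for example, log ORs) and their variances, and derives I² as the percentage of total variation attributable to between-study heterogeneity. The function name is hypothetical, and the input values in the usage note are invented.

```python
def cochran_q_i2(effects, variances):
    """Return (Q, degrees of freedom, I-squared as a percentage)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations from the inverse-variance pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is truncated at zero when Q does not exceed its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2
```

With two studies reporting log ORs of 0 and 2, each with variance 0.5, `cochran_q_i2([0.0, 2.0], [0.5, 0.5])` gives Q = 4 on 1 degree of freedom and I² = 75%.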
Differences in coding of data
Differences in recording or coding information in observational studies are common. For example, some studies may record nodule diameter and others may record volume. If such is the case in the IPD collected, assumptions will be made (such as a cubic relationship between diameter and volume). Furthermore, it is likely that in the aggregate data meta-analysis different studies will report different kinds of effects (such as relative risks, Poisson regression coefficients, ORs). The effects of risk factors from all studies will be converted to ORs or HRs (depending on the completeness of time-to-event data) for each study and for each risk factor for the purposes of pooling.
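The conversions described above might be sketched as follows. Two assumptions are made that go beyond the protocol: the cubic relationship is taken to be that of a sphere (volume = π/6 × diameter³), and the conversion of a relative risk to an OR assumes that the risk in the reference group (`baseline_risk`) is reported or can be derived. All function names are hypothetical.

```python
import math


def diameter_to_volume_mm3(diameter_mm):
    """Volume of a sphere with the given diameter (assumed nodule shape)."""
    return math.pi / 6.0 * diameter_mm ** 3


def volume_to_diameter_mm(volume_mm3):
    """Diameter of the sphere that has the given volume."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


def rr_to_or(relative_risk, baseline_risk):
    """Convert a relative risk to an OR given the risk in the reference group."""
    exposed_risk = relative_risk * baseline_risk
    if not 0.0 < baseline_risk < 1.0 or not 0.0 < exposed_risk < 1.0:
        raise ValueError("risks must lie strictly between 0 and 1")
    exposed_odds = exposed_risk / (1.0 - exposed_risk)
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    return exposed_odds / baseline_odds
```

Under the spherical assumption, a 10 mm nodule corresponds to roughly 524 mm³, and a relative risk of 2 with a 10% baseline risk corresponds to an OR of 2.25. The gap between the relative risk and the OR narrows as the outcome becomes rarer.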
Patient and public involvement
One identified limitation of the study is the absence of patient and public involvement in the current phase. Although respiratory clinicians and radiologists actively contribute to the collaborative team advising the primary project, the engagement of patients has not been initiated thus far. Recognising the importance of incorporating diverse perspectives, the Early Diagnosis Team within the Centre for Prevention, Detection and Diagnosis, where the research is based, is in the process of assembling a panel comprising up to 15 public and patient representatives. This panel will play a pivotal role in advising on the formulation, interpretation and dissemination of the centre’s studies, with a specific focus on early cancer detection and diagnosis. The selection process ensures a varied representation of experiences with different cancer types, including lung cancer, encompassing a spectrum of skill sets and backgrounds. Involvement from this panel will be sought in two key activities: (1) interpreting the study findings and (2) disseminating the results across public forums and scientific audiences in a manner that prioritises and reflects the perspectives of patients.
Biases
Different types of observational studies are known to be susceptible to certain biases. While this project includes only prospective studies, which are less prone to bias than retrospective case-control studies, such studies are not guaranteed to be completely free of bias. Potential biases include selection bias, information bias, measurement error, confounding and differing rates of study participation depending on the subjects’ cultural background, age or socioeconomic status. When dealing with smoking data, there is the possibility of social desirability bias, which could lead to under-reporting of smoking and so affect variables related to smoking history.26 The problem can be approached by quantifying the biases and correcting for them. Repeated measures can be particularly useful in the quantification of and correction for bias caused by mismeasurement or misclassification.27–30 While we do not anticipate repeat measures data in the cohorts in this meta-analysis, independent repeated measures results are available, for example, in observer variability studies on nodule patients.31–33
The metan, admetan and ipdmetan suites of commands in Stata will be used to provide regression-based tests and funnel plots, pooled estimates and forest plots in order to assess meta-biases (eg, publication bias).
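The protocol does not state which regression-based test will be used. As one commonly used example, Egger's regression regresses the standardised effect (effect divided by its SE) on precision (1/SE), and an intercept that differs from zero suggests funnel plot asymmetry. The sketch below is an illustrative Python implementation and not the Stata routine; the function name is hypothetical.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression; returns (intercept, SE of intercept, t statistic, df)."""
    n = len(effects)
    if n < 3:
        raise ValueError("Egger's regression needs at least three studies")
    precision = [1.0 / se for se in std_errors]
    standardised = [y / se for y, se in zip(effects, std_errors)]
    x_bar = sum(precision) / n
    z_bar = sum(standardised) / n
    sxx = sum((x - x_bar) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((x - x_bar) * (z - z_bar) for x, z in zip(precision, standardised))
    # Ordinary least squares fit of standardised effect on precision.
    slope = sxy / sxx
    intercept = z_bar - slope * x_bar
    rss = sum((z - intercept - slope * x) ** 2
              for x, z in zip(precision, standardised))
    df = n - 2
    se_intercept = math.sqrt(rss / df * (1.0 / n + x_bar ** 2 / sxx))
    # The t statistic is undefined when the fit is exact.
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat, df
```

The t statistic would be compared with a t distribution on `df` degrees of freedom. With few studies the test has low power, so it is best read alongside the funnel plot.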
Ethics and dissemination
Ethical approval was not required as this study is a secondary analysis where precollected data will be used from participants in studies that have consented to the sharing of their data.
The results of the study will be disseminated through publication in peer-reviewed journals, targeting professionals who influence policy and protocols for managing screenees with nodules. Furthermore, the findings will be presented at relevant scientific conferences to engage experts in the field.
Discussion
The main aim of this systematic review and IPD meta-analysis is to identify risk factors for LC, either now or in the future, in the subcohort of screenees who are found to have nodules. This initial univariate meta-analysis stage will inform the development of a future multivariable risk model. By better understanding the risk profile of screenees with nodules, both short and long term, it will be possible to better classify those who should have immediate further investigation or surveillance and those who may be left without any follow-up until the time of their next scheduled screening scan.
Defining the outcome as LC in the patient adds an element of novelty to the study, as most existing models predict the risk of malignancy in a specific nodule (eg, the Mayo Clinic Model, the Veterans Administration model and the Brock model).34–36
As with all meta-analyses, we are limited by the availability of the data of relevant cohorts. It is expected that in some publications, figures for the outcome of malignancy of the nodule will be reported, while in others, figures for the outcome of lung cancer in the screenee will be reported. The degree to which that affects the effects extracted will be assessed by performing a sensitivity analysis. This is a potential weakness of the aggregate data meta-analysis process.
The current study will focus solely on identifying individual risk factors. However, ultimately, this meta-analysis could inform the basis of new risk prediction models which minimise the number of individuals that need to go into surveillance following lung cancer screening. The IPD obtained can be used to train and validate such multivariable models, while the risk factors identified can guide the strategy for the building of those models. These models would need to take into account the nature of the screen, such as being first or a repeated screen, and risk at future time points aligned with screening intervals. The implications of the results for practice in terms of surveillance policy will vary according to the interscreening interval in use.
Ethics statements
Patient consent for publication
Not applicable.