Lung function indices are important in the prognostication of patients with interstitial lung disease and form the basis of inclusion criteria for clinical trials. However, the thresholds for these indices were based on the European Coal and Steel Community (ECSC)/Miller reference equations.
WHAT THIS STUDY ADDS
Compared with the ECSC/Miller reference equations, the updated Global Lung Function Initiative (GLI) reference equations yielded higher percentage predicted values for diffusing capacity of the lungs for carbon monoxide (DLCO) and lower percentage predicted values for lung volumes, overall and when stratified by diagnosis, sex and ethnicity.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
With the updated GLI reference equations, fewer patients would qualify for antifibrotic agents under existing clinical trial inclusion criteria. However, risk prediction models still predict mortality consistently with the new GLI reference equations.
Introduction
Interstitial lung disease (ILD) encompasses a heterogeneous group of diffuse parenchymal lung diseases with variable disease behaviour patterns and prognoses.1 2 The worldwide prevalence of ILD has risen over the last decade, likely reflecting improved disease awareness and earlier diagnosis.3–8 The high morbidity and mortality of ILD carry substantial healthcare costs and financial burden.9
The approach to diagnosis, prognostication and management of ILD is multifaceted, with lung function evaluation playing an invaluable role.10 Apart from its role in the initial diagnosis, lung function testing has additional prognostic value among patients with ILD, both at baseline and longitudinally.10–12 Poor baseline forced vital capacity (FVC) and diffusing capacity of the lungs for carbon monoxide (DLCO) both predict increased mortality among patients with ILD.12 13 Longitudinally, serial measurement of lung function indices helps determine disease progression.11 14 These indices have also been incorporated into risk prediction models for ILD such as the Composite Physiologic Index (CPI)15 and the ILD-GAP (Gender, Age, Physiology) index.16 In addition, lung function indices, including FVC and DLCO, have served as inclusion criteria for ILD clinical trials, and FVC change is now accepted by the US Food and Drug Administration as a valid surrogate end-point.17–20
Changes introduced in a 2022 revision of the technical standards for lung function interpretation have several potential implications for ILD management.21–23 First, the new severity classification advocates using Z-scores rather than the percentage predicted values used in the 2005 guidelines.24 Second, a new multi-ethnic reference equation, derived from the average of four other Global Lung Function Initiative (GLI) 2012 reference populations, was also suggested.25 26 However, incorporating race into these prediction equations remains controversial, as the observed differences may reflect systemic differences in health, social and environmental exposures rather than true genetic differences.27 Bowerman et al subsequently introduced the race-neutral GLI reference equation for spirometry (GLI Global) in 2023 to reduce the racial bias inherent in traditional reference equations such as the European Coal and Steel Community (ECSC) reference equation.25 The GLI Global reference equations have since been officially endorsed by the American Thoracic Society and the European Respiratory Society, despite recognised limitations. Of note, the GLI equations for lung volumes and DLCO (GLI-2017) were validated among people of white ancestry only.23 28 All reference equations have inherent limitations, and any change in equation may alter the percentage predicted values of lung function indices, with potential clinical implications for eligibility for antifibrotic clinical trials, assessment of disease progression and risk prediction models.
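The difference between percentage predicted values and Z-scores can be illustrated with the LMS (lambda-mu-sigma) model on which the GLI equations are built. The Python sketch below assumes the standard LMS transform; the L, M and S values are made-up placeholders, not GLI coefficients, which in practice come from the published GLI equations and lookup tables for a given sex, age and height.

```python
import math


def lms_z_score(measured: float, L: float, M: float, S: float) -> float:
    """Z-score of a measurement under the LMS (Box-Cox) model.

    L (skewness), M (predicted median) and S (coefficient of variation)
    are hypothetical placeholders here, not real GLI values.
    """
    if L == 0:
        # Limiting form of the Box-Cox transform as L -> 0
        return math.log(measured / M) / S
    return ((measured / M) ** L - 1.0) / (L * S)


def value_at_z(L: float, M: float, S: float, z: float = -1.645) -> float:
    """Inverse transform; the default z = -1.645 gives the lower limit of normal."""
    if L == 0:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


def percent_predicted(measured: float, M: float) -> float:
    """Conventional percentage predicted: measured value over predicted median."""
    return 100.0 * measured / M


# Same measurement and predicted median, two hypothetical spreads (S)
for S in (0.12, 0.15):
    z = lms_z_score(2.4, L=1.0, M=3.0, S=S)
    print(f"S={S}: {percent_predicted(2.4, 3.0):.0f}% predicted, z = {z:.2f}")
```

With these hypothetical inputs the measurement is 80% of predicted in both cases, but its Z-score is about −1.67 (just below the −1.645 lower limit of normal) when S is 0.12 and about −1.33 (within the normal range) when S is 0.15. Because the Z-score scales the deviation by the expected between-person spread, a fixed percentage predicted threshold and a Z-score threshold need not agree.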
We sought to evaluate the impact of the GLI reference equations and the 2022 lung function technical updates in patients with ILD with respect to (a) differences in lung function percentage predicted values, (b) eligibility for clinical trial participation and (c) ILD risk prediction models.
Materials and methods
Study cohorts
This observational, prospective cohort study comprised patients with ILD enrolled in two separate registries: the Australasian ILD registry (AILDR)29 and the National Healthcare Group ILD registry from Singapore (NHGILDR). Patients aged >18 years from these registries who had a baseline lung function test were eligible for inclusion. Patients were excluded if (1) demographic details, including age, ethnicity and height, were missing, (2) baseline FVC was unavailable or (3) they were determined not to have ILD after discussion at local multidisciplinary meetings (MDM). Patients with missing forced expiratory volume in 1 s (FEV1), total lung capacity (TLC) or DLCO results were not excluded.
Data collection
We compared the GLI Global reference equation with the ECSC reference equation for FVC and FEV1, and the GLI-2017 reference equation with the Miller reference equation for DLCO. Both the ECSC and Miller equations have been well validated in large data sets and shown to perform adequately compared with other reference equations among patients with ILD.30–32 The absolute values of the lung function indices (FVC, FEV1, TLC and DLCO) were collected. To derive reference values from these equations, demographic parameters (age, sex, self-reported ethnicity and height) were also collected. Reference values were computed using the published ECSC,30 Miller31 and GLI reference equation formulas.21 23 28 Smoking status was also recorded.
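For the ECSC spirometry equations, the reference value is a linear function of height and age, and the percentage predicted is the measured value divided by that reference. The Python sketch below assumes the ECSC 1993 adult coefficients as they are commonly quoted; treat them as an assumption to be checked against the original source (ref 30) before any reuse. The GLI equations are not linear in this way (they use the LMS method with age-dependent spline terms) and are not reproduced here; the patient values are hypothetical.

```python
# ECSC 1993 linear coefficients (slope on height in metres, slope on age in
# years, intercept), as commonly quoted for adults. ASSUMPTION: verify
# against the original publication before reuse.
ECSC_COEFFICIENTS = {
    ("FVC", "male"): (5.76, -0.026, -4.34),
    ("FVC", "female"): (4.43, -0.026, -2.89),
    ("FEV1", "male"): (4.30, -0.029, -2.49),
    ("FEV1", "female"): (3.95, -0.025, -2.60),
}


def ecsc_predicted(index: str, sex: str, height_m: float, age_years: float) -> float:
    """Predicted value in litres from the linear ECSC equation."""
    b_height, b_age, intercept = ECSC_COEFFICIENTS[(index, sex)]
    return b_height * height_m + b_age * age_years + intercept


def percent_predicted(measured: float, predicted: float) -> float:
    """Measured value as a percentage of the reference value."""
    return 100.0 * measured / predicted


# Hypothetical patient: male, 1.75 m, 68 years, measured FVC 2.60 L
pred = ecsc_predicted("FVC", "male", 1.75, 68)
print(f"predicted FVC {pred:.2f} L, {percent_predicted(2.60, pred):.1f}% predicted")
```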
ILD diagnosis was based on the consensus diagnosis at ILD MDMs. Where no MDM diagnosis was made, the diagnosis was based on the local treating ILD physician’s clinical diagnosis. Vital status was collected and censored at 200 months or 31 May 2023, whichever was earlier.
Statistical approach
In descriptive analyses, including demographics and lung function indices, categorical variables were expressed as frequencies (percentages) and continuous variables as medians (IQR) or means (SD). We compared categorical variables with the χ2 test or Fisher’s exact test, and continuous variables with one-way analysis of variance and linear regression analysis. Differences between the percentage predicted values derived from the ECSC/Miller reference equations and those derived from the GLI reference equations were determined overall and stratified by diagnosis, sex and ethnicity.
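The stratified comparison amounts to computing, for each patient, the paired difference between the two percentage predicted values and summarising it within each stratum. A minimal Python sketch follows; the record layout and key names (`gli_pp`, `ecsc_pp`) are hypothetical, and the sketch covers only the descriptive step, not the analysis of variance or regression models used for inference.

```python
from collections import defaultdict
from statistics import mean


def stratified_mean_difference(records, stratum_key, gli_key="gli_pp", ecsc_key="ecsc_pp"):
    """Mean paired difference (GLI minus ECSC/Miller % predicted) per stratum.

    Records lacking either percentage predicted value are skipped, e.g. when
    the GLI equations do not cover the patient's age.
    """
    differences = defaultdict(list)
    for record in records:
        gli, ecsc = record.get(gli_key), record.get(ecsc_key)
        if gli is None or ecsc is None:
            continue
        differences[record[stratum_key]].append(gli - ecsc)
    return {stratum: mean(values) for stratum, values in differences.items()}


# Hypothetical records
patients = [
    {"sex": "female", "gli_pp": 70.0, "ecsc_pp": 80.0},
    {"sex": "female", "gli_pp": 60.0, "ecsc_pp": 66.0},
    {"sex": "male", "gli_pp": 90.0, "ecsc_pp": 95.0},
    {"sex": "male", "gli_pp": None, "ecsc_pp": 95.0},  # no GLI value available
]
print(stratified_mean_difference(patients, "sex"))
```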
We also compared severity classification using the GLI reference equations against severity classification using the ECSC reference equation. GLI severity classification has four categories for all lung function indices: normal, mild, moderate and severe, corresponding to Z-score cut-offs of >−1.645, −1.65 to −2.5 (1 SD), −2.51 to −4 (2 SD) and <−4.1 (3 SD), respectively.21 In contrast, ECSC severity classification for FEV1 has five linearly scaled categories: mild, moderate, moderately severe, severe and very severe, corresponding to percentage predicted cut-offs of >70%, 60%–69%, 50%–59%, 35%–49% and <35%, respectively.24 For this evaluation, the ECSC FEV1 severity classification criteria were extrapolated to FVC. Severity classifications for FVC and FEV1 were considered matched in the following permutations: (a) the GLI normal and mild categories corresponded to the ECSC mild category, (b) the GLI moderate category corresponded to the ECSC moderate and moderately severe categories, and (c) the GLI severe category corresponded to the ECSC severe and very severe categories. Separately, DLCO severity classification has three categories: mild, moderate and severe, corresponding to >60%, 40%–60% and <40%, respectively.24 Severity classifications for DLCO were considered matched in the following permutations: (a) the GLI normal and mild categories corresponded to the Miller mild category, (b) the GLI moderate category corresponded to the Miller moderate category, and (c) the GLI severe category corresponded to the Miller severe category.
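The matching rules above can be expressed as two classifiers and a lookup table. This Python sketch assumes that band boundaries are treated as continuous (the stated cut-offs leave small gaps, e.g. between −4 and −4.1, and the handling of exact boundary values is not specified); the function names and example pairs are hypothetical.

```python
def gli_severity(z: float) -> str:
    """2022 Z-score bands; boundaries treated as continuous (an assumption)."""
    if z >= -1.645:
        return "normal"
    if z >= -2.5:
        return "mild"
    if z >= -4.0:
        return "moderate"
    return "severe"


def ecsc_spirometry_severity(pp: float) -> str:
    """2005 percentage predicted bands for FEV1, extrapolated to FVC."""
    if pp > 70:
        return "mild"
    if pp >= 60:
        return "moderate"
    if pp >= 50:
        return "moderately severe"
    if pp >= 35:
        return "severe"
    return "very severe"


def miller_dlco_severity(pp: float) -> str:
    """Three percentage predicted bands for DLCO."""
    if pp > 60:
        return "mild"
    if pp >= 40:
        return "moderate"
    return "severe"


# GLI category -> set of ECSC/Miller categories counted as a match
SPIROMETRY_MATCH = {
    "normal": {"mild"},
    "mild": {"mild"},
    "moderate": {"moderate", "moderately severe"},
    "severe": {"severe", "very severe"},
}
DLCO_MATCH = {
    "normal": {"mild"},
    "mild": {"mild"},
    "moderate": {"moderate"},
    "severe": {"severe"},
}


def discordance_rate(pairs, classify_pp, match):
    """Fraction of (Z-score, % predicted) pairs whose categories do not match."""
    discordant = sum(
        classify_pp(pp) not in match[gli_severity(z)] for z, pp in pairs
    )
    return discordant / len(pairs)


# Hypothetical FVC pairs: (GLI Z-score, ECSC % predicted)
fvc_pairs = [(-1.0, 75.0), (-3.0, 55.0), (-2.0, 65.0), (-4.5, 40.0)]
print(discordance_rate(fvc_pairs, ecsc_spirometry_severity, SPIROMETRY_MATCH))
```

In this toy example only the pair with Z=−2.0 (GLI mild) and 65% predicted (ECSC moderate) is discordant, giving a rate of 0.25; the study's rates (for example, 207/2219 for FVC) are the same fraction computed over the cohort.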
Based on the participant’s ILD diagnosis, we also computed the number of patients who would fulfil the inclusion criteria used in recent multinational clinical trials evaluating nintedanib versus placebo in fibrotic lung disease. For idiopathic pulmonary fibrosis (IPF) patients, we used inclusion criteria from the INPULSIS study, that is: FVC >50% and DLCO 30%–80%.19 For non-IPF ILDs, we used the inclusion criteria from the INBUILD study: FVC >45% and DLCO 30%–80%.20
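The eligibility check reduces to two thresholds that differ only in the FVC cut-off. A minimal Python sketch, assuming the bounds exactly as written above (strict FVC bound, inclusive DLCO range); the trial protocols themselves should be consulted for exact boundary handling, and the function name is hypothetical.

```python
from typing import Optional


def meets_nintedanib_trial_criteria(
    diagnosis: str, fvc_pp: Optional[float], dlco_pp: Optional[float]
) -> Optional[bool]:
    """Apply simplified INPULSIS (IPF) or INBUILD (non-IPF) thresholds.

    Returns None when either index is missing, so such patients can be left
    out of the denominator rather than counted as ineligible.
    """
    if fvc_pp is None or dlco_pp is None:
        return None
    # FVC cut-off: >50% for IPF (INPULSIS), >45% for non-IPF ILD (INBUILD)
    fvc_threshold = 50.0 if diagnosis == "IPF" else 45.0
    # DLCO 30%-80% predicted for both trials (treated as inclusive here)
    return fvc_pp > fvc_threshold and 30.0 <= dlco_pp <= 80.0
```

For example, a hypothetical patient with non-IPF ILD whose DLCO is 76% predicted under Miller but 82% under GLI-2017 would pass this check with the former and fail with the latter, even with an unchanged FVC.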
For the risk prediction models, the lung function values entered were based on either the GLI or the ECSC/Miller reference equations. Time-to-mortality models were then constructed using Cox proportional hazards models for comparison. In the ILD-GAP index, the lung function indices (FVC and DLCO) were initially categorised into three levels of severity using the same linear cut-off points used in its derivation.16 In an exploratory analysis, we replaced these categories with the new Z-scores and generated receiver operating characteristic curves to compare the two versions of the ILD-GAP index.
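The physiology component of the ILD-GAP index assigns points from FVC and DLCO percentage predicted bands, so a change of reference equation can move a patient across a band boundary. The cut-offs in this Python sketch are those we understand the original GAP derivation (ref 16) to use; they are an assumption here and should be verified. The sketch omits the gender, age and ILD-subtype components and the category for patients unable to perform DLCO; the function names and patient values are hypothetical.

```python
def gap_fvc_points(fvc_pp: float) -> int:
    """FVC points: >75% -> 0, 50-75% -> 1, <50% -> 2 (ASSUMED cut-offs)."""
    if fvc_pp > 75:
        return 0
    if fvc_pp >= 50:
        return 1
    return 2


def gap_dlco_points(dlco_pp: float) -> int:
    """DLCO points: >55% -> 0, 36-55% -> 1, <=35% -> 2 (ASSUMED cut-offs)."""
    if dlco_pp > 55:
        return 0
    if dlco_pp > 35:
        return 1
    return 2


def physiology_points(fvc_pp: float, dlco_pp: float) -> int:
    """Sum of the FVC and DLCO point contributions."""
    return gap_fvc_points(fvc_pp) + gap_dlco_points(dlco_pp)


# Hypothetical patient under each set of reference equations
print(physiology_points(78.0, 50.0))  # ECSC/Miller-style values
print(physiology_points(70.4, 53.9))  # GLI-style values
```

In this hypothetical example the patient scores 1 physiology point with ECSC/Miller-style values and 2 with GLI-style values, because FVC falls below the 75% cut-off while DLCO stays in the same band.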
Analyses were performed with Stata statistical software, Release 15 (StataCorp).
Results
Baseline characteristics of the study population
A total of 2219 patients were studied (table 1), including 1265 (57.9%) males, 1712 (77.2%) white individuals and 404 (18.2%) Asian individuals. The median age was 68 (58–75) years; 1053 (47.5%) patients were ex-smokers or current smokers, and 426 (20.4%) deaths were recorded during follow-up. The three most common diagnoses were IPF (n=636, 28.6%), connective tissue disease-associated ILD (n=457, 20.6%) and unclassifiable ILD (n=245, 11.0%). Patients with non-IPF ILD were younger, more likely to be Asian and had lower mortality. Asian patients were shorter, more likely to have non-IPF ILD and had higher mortality (online supplemental table 1).
Table 1 Baseline characteristics of study population
Median FVC was 2.60 (2.01–3.36) L, FEV1 was 2.09 (1.67–2.66) L and DLCO was 13.60 (10.16–17.60) mL/min/mm Hg (table 2). A total of 67 DLCO and 143 TLC readings had no GLI percentage predicted value because the patient exceeded the upper age limit of the relevant GLI equation (85 years for DLCO and 80 years for TLC). Despite this, the GLI reference equations consistently yielded higher DLCO percentage predicted values and lower lung volume percentage predicted values than the ECSC/Miller equations. As shown in table 2, the mean FVC percentage predicted was 7.62% (4.50%–12.68%) lower with GLI, while the mean DLCO percentage predicted was 3.86% (1.85%–6.20%) higher; these differences were statistically significant (p<0.001). This finding remained consistent when stratified by diagnosis, sex and ethnicity, although the FVC differences were larger in females and in those of Asian ancestry. These findings were also reflected in severity classification: the discordance rates between the GLI and ECSC/Miller severity classifications were 207/2219 (9.3%), 222/2215 (10.0%) and 140/1918 (7.3%) for FVC, FEV1 and DLCO, respectively (table 3). This remained consistent when stratified by ethnicity and sex (online supplemental tables 2–4).
Table 2 Spirometry indices
Table 3 Rate of disagreement of lung function indices severity between ECSC/Miller and GLI reference equations
Eligibility for nintedanib clinical trials
When the antifibrotic clinical trial inclusion criteria were applied,19 20 fewer patients qualified for the nintedanib clinical trials with the GLI reference equations than with the ECSC/Miller equations (table 4). This relationship remained largely consistent when stratified by ethnicity among those with non-IPF ILD. Among patients with non-IPF ILD, 1050/1413 (74.3%) met the inclusion criteria with the ECSC/Miller reference equations, compared with 889/1377 (64.6%) with the GLI reference equations. The IPF group showed the same pattern: 463/572 (80.9%) and 412/541 (76.2%) met the inclusion criteria with the ECSC/Miller and GLI reference equations, respectively. Although there was no difference in eligibility among Asian patients with IPF, the number analysed was small (n=76).
Table 4 Comparison of reference equations in fulfilling study inclusion criteria for INPULSIS and INBUILD studies
Compared with the ECSC/Miller equations, applying the GLI reference equations meant that an additional 3% and 1.5% of patients with IPF did not meet the DLCO and FVC criteria, respectively, for inclusion in the nintedanib clinical trials for IPF. Among those with non-IPF ILD, an additional 6.5% and 2.8% of patients did not meet the DLCO and FVC criteria, respectively, with the GLI reference equations. This translates to 19 IPF patients and 119 non-IPF ILD patients who would no longer qualify for the nintedanib clinical trials.
Risk prediction models
Risk prediction models performed similarly in predicting mortality with both reference equations. With either reference equation, the HR of the ILD-GAP index was 1.56 (1.36–1.79, p<0.001) (figure 1). The findings were similar for the CPI: the HR for a CPI threshold of 40 was 2.23 (1.77–2.80, p<0.001) with either reference equation (figure 2).
Figure 1 Kaplan-Meier survival curve for ILD-GAP index (GLI). GAP, Gender, Age, Physiology; GLI, Global Lung Function Initiative; ILD, interstitial lung disease.
Figure 2 (A) Kaplan-Meier survival curve with CPI (GLI) cut-off of 40. (B) Kaplan-Meier survival curve with CPI (ECSC) cut-off of 40. (C) Kaplan-Meier survival curve with CPI (GLI) cut-off of 50. (D) Kaplan-Meier survival curve with CPI (ECSC) cut-off of 50. CPI, Composite Physiologic Index; ECSC, European Coal and Steel Community; GLI, Global Lung Function Initiative.
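The CPI combines percentage predicted DLCO, FVC and FEV1 in a single linear score, so a change of reference equation propagates directly into it. The coefficients in this Python sketch are the CPI weights as commonly cited from the original derivation (ref 15); treat them as an assumption to verify. The input values are hypothetical.

```python
def composite_physiologic_index(dlco_pp: float, fvc_pp: float, fev1_pp: float) -> float:
    """CPI from percentage predicted DLCO, FVC and FEV1.

    ASSUMPTION: weights as commonly cited for the original CPI derivation.
    """
    return 91.0 - 0.65 * dlco_pp - 0.53 * fvc_pp + 0.34 * fev1_pp


# Hypothetical patient, compared against the CPI threshold of 40
cpi = composite_physiologic_index(dlco_pp=50.0, fvc_pp=70.0, fev1_pp=75.0)
print(f"CPI = {cpi:.1f}; above threshold of 40: {cpi > 40}")
```

Because DLCO and FVC both carry negative weights, the GLI pattern of higher DLCO and lower FVC percentage predicted pushes the CPI in opposite directions for a given patient; the net change depends on the size of each shift.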
In an exploratory analysis, we replaced the FVC and DLCO severity categories in the ILD-GAP index with Z-scores. Compared with the original ILD-GAP index using the ECSC/Miller reference equations, there was no significant difference in the area under the curve (0.676 vs 0.677, p=0.899) (figure 3).
Figure 3 Area under the curve of the ILD-GAP index comparing ECSC and Z-score severity values. AUC, area under the curve; ECSC, European Coal and Steel Community; GAP, Gender, Age, Physiology; ILD, interstitial lung disease.
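The area under the receiver operating characteristic curve can be read as the probability that a randomly chosen patient who died has a higher score than one who survived. A minimal Python sketch of this rank-based estimate follows, with hypothetical scores; it treats mortality as a binary outcome and ignores censoring, and it does not implement the paired test needed to compare two correlated AUCs from the same patients (DeLong's method is one common choice).

```python
def rank_auc(scores, died):
    """Area under the ROC curve as the Mann-Whitney probability.

    Counts how often a patient who died has a higher score than one who
    survived, scoring ties as one half.
    """
    cases = [s for s, d in zip(scores, died) if d]
    controls = [s for s, d in zip(scores, died) if not d]
    if not cases or not controls:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical ILD-GAP scores and vital status
gap_scores = [1, 2, 3, 4, 5, 6]
died = [False, False, True, False, True, True]
print(rank_auc(gap_scores, died))
```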
Discussion
When compared with the ECSC/Miller reference equations, applying the GLI reference equations among patients with ILD consistently yielded higher DLCO percentage predicted values (mean 4.9% higher) and lower lung volume percentage predicted values (mean 8.8% lower FVC), overall and when stratified by diagnosis, sex and ethnicity. This led to disagreement rates in severity classification of 9.3%, 10.0% and 7.3% for FVC, FEV1 and DLCO, respectively. Interestingly, the GLI reference equations classified many patients (n=1291) as having ‘normal’ lung function despite all of them having confirmed ILD, which may lead to under-recognition of disease. Importantly, applying the GLI reference equations meant that fewer patients qualified for the nintedanib clinical trials and therefore, in some countries, for treatment. Despite these differences, risk prediction models predicted mortality equally well with the GLI approach.
Applying the latest lung function interpretative strategies has raised concerns about the impact of disagreement between the thresholds advocated by the two interpretation guidelines.21 24 In obstructive airway diseases, studies have already shown that different interpretative strategies can give very different impressions of disease prevalence and distribution, with consequences for public policy planning.33 34 Although the severity classification disagreement rate among patients with ILD was low, discordance was still 9.3% for FVC and 7.3% for DLCO. Discordance between DLCO values from the Miller and GLI-2017 reference equations is well known, as the Miller reference values were based on data collected before the DLCO technique was standardised. In addition, the GLI-2017 reference equation used data corrected for equipment dead space and altitude above mean sea level, and more complex statistical analyses.35 Notably, 1291 patients were classified as having normal lung function with the GLI reference equations, even though every patient in the study had known ILD. This normal category, which does not exist in the ECSC classification, may have significant implications for the recognition of early disease, as it may give physicians false reassurance. It also re-emphasises that the diagnosis of ILD should not rely on lung function alone but should also incorporate clinical and radiological findings.
Despite the inherent variability of lung function testing, which depends on patient effort, patient technique and the reference equations applied,21 there remains a lack of other objective and easily measurable outcomes that consistently reflect progression and mortality.11 17 In many countries, the prescribing criteria for antifibrotic agents are therefore based primarily on lung function indices.18–20 Some countries also rely on these trial lung function inclusion criteria to formulate policies for financial claims.31 In our population, the increase in patients with ILD failing to qualify was driven mainly by these patients having DLCO >80% predicted with the GLI reference equations. The lower FVC percentage predicted values may further reduce the number of patients meeting study inclusion criteria; however, we did not demonstrate this, as most of our patients had mild disease at recruitment.
It is also reassuring that the risk prediction models (ILD-GAP index and CPI) performed equally well with both reference equations despite the differences in severity estimation. This reiterates that prognostic assessment of patients with ILD is multidimensional and that lung function is only one part of a comprehensive assessment. As practice shifts towards Z-scores for severity assessment following the 2022 technical update,21 our findings lend further support to the use of Z-scores for severity assessment that accounts for population variability.
Aside from the 2022 technical updates, the race-neutral reference equations also require further validation.25 The GLI reference equations were developed to account for other factors that may influence normal values, such as pollution exposure, perinatal events, socioeconomic status and genetic associations.22 However, the original GLI reference populations comprised mainly healthy white individuals, and the GLI lung volume and DLCO values are currently based only on data from people of European ancestry.21 23 28 Several studies in Asian populations have provided mixed results on the appropriateness of these reference populations for FVC and FEV1.36–38 To our knowledge, our study is the first to determine the impact of the GLI reference equations in a population with ILD rather than in a healthy population.
The strength of our study lies in its large, multi-ethnic population of patients with a wide range of ILD conditions. We were also able to evaluate the GLI reference equations in the largest Asian ILD cohort to date. We acknowledge, however, that other ethnic groups, including First Nations people, were under-represented in this analysis as well as in the reference equation populations, limiting the broader applicability of our findings. Our study population also consisted chiefly of patients at the milder end of disease severity. This could reflect several factors, including increasing incidental detection of ILD before symptom onset on CT scans performed for other indications. Also relevant is the growing trend towards early, proactive recruitment of patients with ILD to registries through specialist clinics, allowing timely institution of disease-modifying interventions and population-level research.
Conclusion
Applying the GLI reference equations among patients with ILD consistently yielded higher DLCO percentage predicted values and lower lung volume percentage predicted values than the ECSC/Miller reference equations. Although the population was predominantly white, the findings remained consistent when stratified by diagnosis, sex and ethnicity. Risk prediction models for patients with ILD still predicted mortality consistently despite the changes in reference equations and severity classification. However, the GLI reference equations translated to fewer patients meeting the inclusion criteria for clinical trials of nintedanib.
Data availability statement
Data are available on reasonable request.
Ethics statements
Patient consent for publication
Not applicable.
Ethics approval
This study involves human participants and was approved by the Sydney Local Health District Research Ethics and Governance Office for the AILDR (Protocol No. X16-0275 and 2019/ETH06440, 'The Australasian Interstitial Lung Disease Registry') and by the National Healthcare Group Domain Specific Review Board for the NHGILDR (2019/00894). Participants gave informed consent to participate in the study before taking part.