Diagnostic yield and safety of diagnostic techniques for pulmonary lesions: systematic review, meta-analysis and network meta-analysis

Introduction

Lung cancer, which is a leading cause of cancer morbidity and mortality, is still diagnosed at an advanced stage with a 5-year overall survival rate of only 10–33% globally [1, 2]. Screening for lung cancer with low-dose computed tomography (CT) has demonstrated survival benefits in high-risk populations [3]. Around 1.6 million pulmonary nodules have been detected by CT scans as of 2010 in the United States and the incidence continues to increase each year with an extended screening of lung cancer with low-dose CT [4].

Sampling of the lesion via a nonsurgical approach is indicated for patients with intermediate to high-risk pulmonary nodules (pre-test probability of cancer >5%) and is usually performed through bronchoscopic techniques or transthoracic needle biopsy under CT guidance [5]. Traditionally, CT-guided transthoracic needle biopsy or needle aspiration (CT-TBNA) was preferred for peripheral and smaller lesions with a reported diagnostic yield of >90% [6, 7]. With the advent of enhanced bronchoscopic navigational tools, including radial endobronchial ultrasound (rEBUS), virtual bronchoscopy (VB), electromagnetic navigation (EMN) and robot-assisted bronchoscopy (RAB), it is now more feasible to sample peripheral pulmonary lesions. The bronchoscopic diagnostic yield for peripheral pulmonary nodules has increased to 84% with RAB [8, 9]. Furthermore, sampling via bronchoscopy comes with the added advantage of performing hilar and mediastinal lymph node staging via linear endobronchial ultrasound in the same setting.

Given the rapid emergence of newer technologies within the last 10 years, comprehensive and consolidated evidence comparing the different diagnostic techniques is needed to assess the most appropriate sampling method, considering the risks and benefits of each procedure. We performed a meta-analysis and a network meta-analysis comparing the diagnostic yield and complications of percutaneous CT-guided transthoracic and guided bronchoscopic techniques (rEBUS, VB, EMN and RAB) for the biopsy of peripheral pulmonary lesions (PPLs) suspected of lung cancer.

Materials and methods

Search strategy and selection criteria

A comprehensive and highly sensitive electronic search was performed in Medline, Scopus, Embase and Web of Science by review authors (P.B. and G.L.) using the search terms listed in table S1. The initial search was performed on 20 June 2023 and the search was updated on 27 October 2023. In addition, the abstracts presented at conferences from 2015 to 2023 at the American Thoracic Society, American College of Chest Physicians, European Respiratory Society and American Association for Bronchology and Interventional Pulmonology were searched in Google Scholar. We included observational studies, case series of more than five patients and randomised controlled studies published in the English literature which included patients who underwent biopsies for PPL suspected of lung cancer and reported an overall diagnostic yield of the procedure. Case reports, case series of fewer than five patients and studies that did not include data on the diagnostic yield were excluded from our meta-analysis. We screened the studies using the software Covidence (Melbourne, VIC, Australia). Titles and abstracts of all the identified studies were reviewed independently by two reviewers (A.B. and A.G.) and disagreements were resolved by discussion. The full text of the selected articles was reviewed by the same reviewers and any disagreements were resolved by discussion. We adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), PRISMA for Network Meta-analyses (PRISMA-NMA) and Meta-analysis of Diagnostic Test Accuracy Studies statements [10–12]. Our study protocol was registered in PROSPERO (www.crd.york.ac.uk/prospero/, reference ID CRD42023432829).

Data extraction and quality assessment

The following data were collected: first author, publication year, publication type (retrospective or prospective or randomised control trial), number of participants, index test, additional techniques, number of techniques, diagnostic procedures and total number of procedures. Data for subgroup analysis included the data on diagnostic procedures and the total number of procedures among malignant nodules, benign nodules, size of nodules (<2 and >2 cm), nodule density (solid or subsolid), use of cryobiopsy and presence of bronchus sign. For simplicity, the main diagnostic procedures were divided into the following subgroups: 1) CT-TBNA, 2) rEBUS (with or without guide sheath but no other additional techniques, including VB, EMN or RAB), 3) VB (with or without rEBUS), 4) EMN (with or without rEBUS) and 5) RAB (with or without rEBUS). Data on complications including pneumothorax, pneumothorax requiring a chest tube and clinically significant bleeding (as defined by the Common Terminology Criteria for Adverse Events ≥2 or pulmonary bleeding on the CT scan with haemoptysis) were collected. The data was initially collected by primary reviewers (A.B., A.G., A.K. or N.C.) and then reviewed by a secondary reviewer different from the primary reviewer in the same team. A senior reviewer (P.B.) performed a final check of the data and resolved conflicts. Among studies from the same author(s), the most recent publication with the larger sample size was included. The quality of the data of single-arm studies was assessed with the QUADAS-2 tool and the same for comparative studies was assessed with the QUADAS-C tool, modified for our study as mentioned in supplementary appendix 2 [13, 14]. This was collected by both primary and secondary reviewers with P.B. resolving any conflicts.

Data analysis

The diagnostic yield was defined as the number of diagnostic procedures over the total number of procedures. A procedure was considered diagnostic if 1) a definite diagnosis of malignancy or a specific benign aetiology, such as granuloma or hamartoma, was made or 2) a nonspecific result such as inflammation was obtained, with resolution of the lesion on serial follow-up imaging and/or consistency with the results of a second diagnostic procedure, including sampling or surgery. The overall diagnostic yield as well as the diagnostic yield for each diagnostic modality were calculated using a univariate analysis model after logit transformation [15].
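As an illustration of pooling diagnostic yields after logit transformation, the following is a minimal Python sketch. It assumes an inverse-variance random-effects model with the DerSimonian-Laird estimator of τ2; the source does not state which estimator was used, and the authors worked in R, so the function name `pool_logit_proportions` and the estimator choice are assumptions made only for this example.

```python
import math


def pool_logit_proportions(events, totals):
    """Pool proportions on the logit scale (DerSimonian-Laird random effects).

    events: number of diagnostic procedures per study
    totals: total number of procedures per study
    The estimator choice is an assumption for illustration only.
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # Apply a 0.5 continuity correction only when the logit is undefined.
        if e == 0 or e == n:
            e, n = e + 0.5, n + 1.0
        y.append(math.log(e / (n - e)))       # logit of the study yield
        v.append(1.0 / e + 1.0 / (n - e))     # approximate variance of the logit

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    tau2 = 0.0
    if df > 0:
        c = sw - sum(wi ** 2 for wi in w) / sw
        tau2 = max(0.0, (q - df) / c)

    # Random-effects estimate and 95% CI on the logit scale.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return {
        "yield": inv_logit(mu),
        "ci_low": inv_logit(mu - 1.96 * se),
        "ci_high": inv_logit(mu + 1.96 * se),
        "tau2": tau2,
        "q": q,
    }
```

Calling `pool_logit_proportions([80, 90], [100, 100])` with these made-up counts returns a pooled yield between the two study yields, with the confidence interval back-transformed to the proportion scale.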

The Tau-square (τ2) statistic was used to assess between-study heterogeneity by the variance of the true effect size. The I2 statistics, which represent the ratio of total heterogeneity to total variability, were also calculated. I2>75% was considered as high heterogeneity and I2<25% as low heterogeneity [16]. The heterogeneity of the diagnostic yield of each modality of sampling was further addressed by assessing outlier studies using a Baujat plot, by recalculating the diagnostic yield after excluding outlier studies with z-scores greater than two, as calculated by externally studentised residuals, and by assessing publication bias via funnel plot and Egger's regression model. The funnel plots were contour enhanced with trim and fill along with imputed studies to identify potential missing articles that could represent reporting bias. Finally, a meta-regressional analysis was performed to identify potential moderators influencing the heterogeneity, including year of publication, type of study, studies with inclusion bias by including only malignant nodules or only solid nodules, nodule size and number of nodules [17].
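The heterogeneity and small-study checks described above can be sketched as follows. This is a simplified illustration, not the authors' R code: `i_squared` derives I2 from Cochran's Q, `classify_heterogeneity` applies the 75% and 25% thresholds stated in the text (the label "intermediate" for values in between is an assumption), and `egger_regression` fits the classic unweighted Egger regression of the standardised effect on precision, whose intercept estimates funnel plot asymmetry. All function names are hypothetical.

```python
def i_squared(effects, variances):
    """Return Cochran's Q and I2 (%) for study effects and their variances."""
    w = [1.0 / v for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I2 is the share of total variability beyond chance, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


def classify_heterogeneity(i2):
    """Apply the thresholds from the text; 'intermediate' is an assumed label."""
    if i2 > 75.0:
        return "high"
    if i2 < 25.0:
        return "low"
    return "intermediate"


def egger_regression(effects, std_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Returns (intercept, slope). An intercept far from zero suggests
    funnel plot asymmetry; significance testing is omitted in this sketch.
    """
    x = [1.0 / s for s in std_errors]
    z = [y / s for y, s in zip(effects, std_errors)]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mz - slope * mx
    return intercept, slope
```

In practice the effects would be the logit-transformed yields and the standard errors the square roots of their variances; at least three studies with differing precision are needed for the regression to be meaningful.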

A pairwise meta-analysis was performed on the comparative studies with data on diagnostic yield for two different diagnostic modalities (CT-TBNA, rEBUS, VB, EMN or RAB), and the modalities were compared using relative risk. We then performed a network meta-analysis among studies presenting comparative data with a frequentist method based on a random effects consistency model with the provision of relative risk and 95% confidence intervals from the frequency distribution of the estimate [18]. A network plot was also constructed to graphically compare the treatment groups. The overall ranking of each diagnostic modality relative to each other in the network model was calculated using the P-score [19].
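To make the two quantities in this paragraph concrete, the sketch below computes a relative risk of diagnostic yield with a 95% confidence interval for a single two-arm comparison (log-scale standard error) and a P-score ranking from a matrix of pairwise estimates. It is an illustration under stated assumptions: the function names and input layout are hypothetical, the network estimates themselves are taken as given rather than fitted, and a higher relative risk is assumed to be better.

```python
import math


def relative_risk(diag_a, total_a, diag_b, total_b, z=1.96):
    """Relative risk of a diagnostic result (A versus B) with a 95% CI.

    Assumes non-zero diagnostic counts in both arms.
    """
    rr = (diag_a / total_a) / (diag_b / total_b)
    # Standard error of log(RR) for two independent binomial samples.
    se_log = math.sqrt(1.0 / diag_a - 1.0 / total_a
                       + 1.0 / diag_b - 1.0 / total_b)
    low = math.exp(math.log(rr) - z * se_log)
    high = math.exp(math.log(rr) + z * se_log)
    return rr, low, high


def p_scores(names, log_rr, se):
    """P-score per modality from pairwise network estimates.

    log_rr[i][j]: estimated log relative risk of modality i versus j
                  (positive values favour i); se[i][j]: its standard error.
    The P-score is the mean one-sided certainty that a modality is
    better than each of the others.
    """
    def phi(x):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    scores = {}
    for i, name in enumerate(names):
        probs = [phi(log_rr[i][j] / se[i][j])
                 for j in range(len(names)) if j != i]
        scores[name] = sum(probs) / len(probs)
    return scores
```

For example, `relative_risk(90, 100, 75, 100)` uses made-up counts and returns a relative risk of 1.2 with an interval that excludes 1. A P-score close to 1 indicates a modality that is consistently estimated to outperform the others, and a value close to 0 the opposite.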

The certainty of evidence and conclusion of the network meta-analysis model was evaluated following the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach by the authors P.B. and G.L. [20, 21]. The complications with each diagnostic procedure were represented individually using a proportion test and 95% confidence interval. Subgroup analyses, including the diagnostic yield of benign versus malignant lesions, nodule size (<2 or >2 cm), bronchus sign, consistency of lesion and use of cryoprobe, were calculated as pooled means with 95% confidence intervals and compared using the paired t-test. The sensitivity analysis of studies with low and high risk of bias was performed with a univariate analysis model after logit transformation with random effects as mentioned above. Results were expressed as a percentage or relative risk and 95% confidence interval with an alpha value of 0.05. Statistics were performed with R 4.2.3 (R Foundation for Statistical Computing, Vienna, Austria).
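A 95% confidence interval for a complication proportion can be sketched as below. The example uses the Wilson score interval without continuity correction; the source does not specify which interval the proportion test produced, so this choice and the function name `wilson_ci` are assumptions for illustration only.

```python
import math


def wilson_ci(events, total, z=1.96):
    """Wilson score interval for a proportion (no continuity correction).

    events: number of procedures with the complication
    total:  number of procedures
    Returns (proportion, lower bound, upper bound).
    """
    p = events / total
    denom = 1.0 + z ** 2 / total
    centre = (p + z ** 2 / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total
                         + z ** 2 / (4.0 * total ** 2)) / denom
    # Clamp to [0, 1] to absorb floating point rounding at the boundaries.
    return p, max(0.0, centre - half), min(1.0, centre + half)
```

Unlike the simple normal approximation, this interval stays within 0 and 1 and remains usable when the observed complication rate is zero, which matters for rare events such as pneumothorax requiring a chest tube.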

Results

The electronic search retrieved 2376 articles from different sources, excluding duplicates. After title and abstract screening, 642 articles were selected. A total of 363 studies, including 18 randomised controlled trials, 92 prospective observational studies and 253 retrospective studies published from 1998 to 2023, were included in the final analysis. Of these, 40 were conference abstracts (figure 1 and figure S2).

FIGURE 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram.

Diagnostic yield

A total of 79 519 nodules were sampled among the 363 studies. The rEBUS studies were the majority in the analysis (n=146, 36.5%), while the studies using CT-TBNA comprised the largest sample size among the sampled nodules (n=31 964, 40.1%). The overall mean±se nodule size was 2.31±0.06 cm, with the maximum size in the rEBUS subgroup (2.66±0.12 cm) and the minimum in the RAB subgroup (1.78±0.14 cm), with an overall subgroup difference Q-value=28.39, p<0.0001. The overall diagnostic yield was highest for CT-TBNA (88.9%, 95% CI 87–90.5, I2=93.2%), followed by RAB (84.8%, 95% CI 81.1–87.8, I2=74.8%). rEBUS had the lowest diagnostic yield (72%, 95% CI 70.1–73.8, I2=87.8%). The combined diagnostic yield of all bronchoscopic procedures was 73.9% (95% CI 72.4–75.3, I2=87.1%). The subgroup difference for the overall univariate model with each diagnostic procedure on the diagnostic yield had a Q-value of 150.63 with p<0.0001 and that of CT-TBNA versus bronchoscopic procedures combined had a Q-value of 114, p<0.001 (table 1). The forest plots of each diagnostic modality are represented in figure 2.

TABLE 1

Diagnostic yield overall and across subgroups

FIGURE 2

Forest plots representing the diagnostic yield of individual and overall studies by a) computed tomography-guided transthoracic biopsy or needle aspiration (CT-TBNA), b) radial endobronchial ultrasound with no additional techniques (rEBUS only), c) virtual bronchoscopy, d) electromagnetic navigation and e) robot-assisted bronchoscopy. Citation details for individual studies are included in Supplement 2.

Pairwise meta-analysis

A total of 37 studies had a comparison of two diagnostic procedures with the majority comparing rEBUS versus VB (n=14), followed by rEBUS versus CT-TBNA (n=9) and the least comparing CT-TBNA versus RAB (n=1). In the pairwise meta-analysis, the superiority of diagnostic yield was demonstrated only for CT-TBNA compared to rEBUS (relative risk 1.19, 95% CI 1.04–1.37, p=0.02, I2=79%) (table 2). There was no difference in diagnostic yield between CT-TBNA versus EMN or RAB, VB versus rEBUS, EMN versus rEBUS, VB or RAB. The forest plots and funnel plots of the head-to-head comparative studies are represented in figures S1 and S2.

TABLE 2

Pairwise meta-analysis of the studies comparing two diagnostic procedures

Network meta-analysis

The network model combining direct and indirect evidence is represented in figure 3a, with rEBUS having the maximum overall sampled nodules (n=2519) and RAB (n=289) the least. The head-to-head comparison of the different diagnostic procedures in the network model along with the GRADE certainty of evidence is represented in table 3 and table S2, respectively. The forest plot of relative risk of diagnostic yield with different diagnostic procedures compared with CT-TBNA as reference is represented in figure 3b. rEBUS had a lower diagnostic yield compared to CT-TBNA (relative risk 0.84, 95% CI 0.76–0.93) with a low GRADE certainty. Other head-to-head comparisons did not prove superiority or inferiority with moderate (CT-TBNA versus RAB or rEBUS versus VB), low (CT-TBNA versus rEBUS or VB, EMN, rEBUS versus EMN or RAB, EMN versus VB or RAB versus VB) and very low (EMN versus RAB) GRADE certainty. Using the GRADE approach, we conclude from our network meta-analysis model that CT-TBNA can be considered the most effective diagnostic modality to sample PPLs, followed by VB, EMN and RAB, all with low GRADE certainty. In contrast, rEBUS without any additional guided techniques might be the least effective diagnostic modality to sample PPLs, with low GRADE certainty (table 4).

FIGURE 3

a) Network geometry displaying a network of studies comparing the diagnostic yield with different modalities for sampling pulmonary lesions (the size of the nodes is proportional to the total number of nodules sampled and the thickness of the lines is proportional to the number of studies). b) Forest plot showing the relative risk of different diagnostic techniques in the network model with computed tomography-guided transthoracic biopsy or needle aspiration (CT-TBNA) as a reference. EMN: electromagnetic navigation; RAB: robot-assisted bronchoscopy; rEBUS: radial endobronchial ultrasound; VB: virtual bronchoscopy.

TABLE 3

Summary of findings reporting the comparative diagnostic yield with different diagnostic methods for pulmonary nodules

TABLE 4

Drawing conclusions of the network meta-analysis using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach

Complications

The incidence of pneumothorax and pneumothorax requiring chest tube was highest with CT-TBNA (16.8 and 1.6%, respectively) and lowest with rEBUS with no additional diagnostic techniques (0.9 and 0.2%, respectively). Clinically significant bleeding was greatest with CT-TBNA (5.2%) and least with RAB (0.3%). Further details on complications with different diagnostic procedures are described in table 5.

TABLE 5

Complications with different diagnostic procedures

Subgroup analysis

Details of the subgroup analysis are elucidated in table 6. The diagnostic yield was higher with nodules >2 cm compared to nodules <2 cm across all diagnostic modalities. Additionally, the diagnostic yield was greater with malignant lesions compared to benign lesions with the use of CT-TBNA and VB, and higher with solid compared to subsolid lesions with the use of rEBUS and EMN. Regarding bronchus sign, only rEBUS had a higher diagnostic yield with lesions presenting positive bronchus sign. There was no subgroup difference found with or without the use of cryoprobe, the use of cone beam CT or studies with high or low-risk bias. Additional use of rEBUS increased diagnostic yield with RAB but not with VB and EMN.

TABLE 6

Subgroup analysis

Assessment of heterogeneity

The re-estimation of diagnostic yield after removing the outlier studies showed similar values for all diagnostic modalities, with a decrease in I2 values. The publication bias, as assessed by Egger's test, was significant for the studies on rEBUS, EMN and RAB. The contour-enhanced trim and fill funnel plot showed potential missing studies across all diagnostic modalities, with the most for rEBUS (34 potential missing studies, 23.2%) and the fewest for CT-TBNA (12, 16.2%). Meta-regressional analysis identified the following statistically significant moderators of diagnostic yield: retrospective study design for CT-TBNA, inclusion of only solid nodules for rEBUS, increasing number of nodules for VB and increasing year of publication for RAB. Further details are elucidated in table S3 and figure S3.

Discussion

Data on the best modality to sample peripheral pulmonary lesions is lacking in the current literature, especially focused on the rapid advancement of guided bronchoscopic techniques compared with CT-TBNA. Our comprehensive study including meta-analysis, pairwise meta-analysis and network meta-analysis showed that CT-TBNA presented the best diagnostic yield, with low GRADE certainty. Although RAB had a diagnostic yield close to CT-TBNA (84.8% versus 88.9%) and showed noninferiority in the pairwise meta-analysis, it ranked fourth in the network meta-analysis, after CT-TBNA, VB and EMN, with low GRADE certainty. rEBUS alone (without additional techniques such as VB, EMN or RAB) had the lowest diagnostic yield in the meta-analysis, was inferior to CT-TBNA in the pairwise meta-analysis and ranked last in the network meta-analysis with low GRADE certainty.

The overall pooled diagnostic yield of CT-TBNA of 88.9% is similar to previously published meta-analyses ranging from 88 to 93% [22, 23]. The previously published meta-analyses included only six and nine studies, respectively, compared to 80 studies included in our analysis, which might explain the high heterogeneity in our pooled analysis for diagnostic yield (I2=93.2%). Likewise, Nadig et al. [24] and Kops et al. [25] published meta-analyses on guided bronchoscopy with an overall diagnostic yield of 69.4% (95% CI 67–71, 126 studies) and 70.9% (95% CI 68.4–73.2, 96 studies), respectively. Our overall diagnostic yield with bronchoscopic procedures is slightly higher (73.9%, 95% CI 72.4–75.3, 320 studies). This finding is explained by a higher diagnostic yield specifically with RAB in our meta-analysis (84.8%) compared to the above-mentioned studies (77.6 and 76.5% for RAB, respectively). Of note, the pooled diagnostic yield of 84.8% with RAB is comparable to more recently published meta-analyses on RAB by Pyarali et al. [9] and Ali et al. [8] reporting 81.9 and 84.3%, respectively. This suggests that the diagnostic yield with RAB is improving with increased familiarity and usage of this technique, including in the community setting, and with the introduction of advanced sampling techniques such as cryobiopsy and three-dimensional fluoroscopy [26–29].

Although the pooled diagnostic yield of RAB is comparable to CT-TBNA (84.8% versus 88.9%), with noninferiority in the pairwise meta-analysis, it ranked inferior to CT-TBNA in the network meta-analysis, with low GRADE certainty. This is, however, limited by the fact that there is only one study comparing RAB versus CT-TBNA directly and the rest of the network is mainly contributed by four indirect studies [30–34]. Thus, further prospective studies are needed to assess the superiority or noninferiority of RAB compared to CT-TBNA.

The complication rates in terms of pneumothorax, pneumothorax requiring a chest tube and clinically significant bleeding were highest with CT-TBNA (16.8, 1.6 and 5.2%, respectively). The complications reported in our study were lower than in previously published meta-analyses reporting up to 25.9, 6.9 and 4.1% with core biopsy and up to 18.8, 4.3 and 1.7% with fine needle aspiration, respectively [35, 36]. However, these studies, which focused on CT-TBNA complications, included only 36 and 32 studies, respectively, compared to 70 studies in ours, with possible variations in terms of techniques, including patient position, number of nodules sampled, biopsy versus needle aspiration, etc. The risk of pneumothorax with CT-TBNA is higher with deeper needle insertion depth, smaller lesions and underlying emphysema [37]. As noted in our study as well as in the previously published studies, despite the high pneumothorax rate, only one fifth to one tenth of those patients with pneumothorax would require chest tube insertion, because manoeuvres such as manual aspiration of air before removing the insertion needle can be performed as soon as the pneumothorax is identified during the CT-TBNA procedure [37]. However, sampling via CT-TBNA has the disadvantage that clinically significant bleeding is difficult to control, unlike bronchoscopic procedures where bleeding can be controlled endobronchially with a variety of interventions, including instillation of haemostatic agents, isolation of the airway with balloon tamponade or extraction of clots with a cryoprobe.

The complications with guided bronchoscopic procedures are comparable to or slightly lower than prior published meta-analyses [24, 25]. Bronchoscopic procedures not only showed decreased complication rates compared to CT-TBNA but also had the added advantage of sampling mediastinal lymph nodes for staging via endobronchial ultrasound and transbronchial needle aspiration. Thus, guided bronchoscopic procedures, particularly RAB with a diagnostic yield comparable to CT-TBNA, could become the preferred approach to sample PPLs as well as the mediastinal lymph nodes in the same anaesthetic event as a one-stop-shop for diagnosis and staging.

The main strength of our study is that it is the first study comparing CT-TBNA with other bronchoscopic procedures in the sampling of PPLs. In addition, it is the first network meta-analysis among the diagnostic techniques to biopsy PPLs. We included all possible studies within our inclusion criteria, obtaining a final sample size of 363 studies, larger than any of the previously published meta-analyses in similar settings. We followed the latest recommendations to report the findings of our network meta-analysis using the GRADE approach and, thus, we consider the findings of our study to be the most comprehensive evidence available encompassing the different diagnostic procedures for sampling PPLs.

The main limitation of our study is that the findings are associated with high heterogeneity as measured by I2, although the total heterogeneity as measured by τ2 was low. We tried to explore it by excluding the outlier studies, analysing the publication bias and performing meta-regressional analysis. Interestingly, there was no significant moderator effect on diagnostic yield in comparing studies with a sample size of more than 20 versus fewer than 20 among all diagnostic modalities. As noted in the contour-enhanced trim and fill funnel plots, at least 15% of negative studies (or studies with lower diagnostic yield) in each modality could potentially be unreported, thus representing reporting bias. We tried to avoid reporting bias by using a very sensitive search strategy that even included small studies and conference abstracts, as shown by the large sample size across each modality. Historically, high heterogeneity is associated with most of the previously published meta-analyses estimating diagnostic yield across all the modalities (table S4). Although we tried to assess clinical and methodological diversity by performing subgroup and meta-regressional analysis, it is hard to standardise certain confounding factors because the use of accessory techniques (three-dimensional fluoroscopy, larger needle size, cryoprobe, etc.) to sample complex nodules, the operator's expertise with the procedure or inherent patient characteristics could affect the outcomes. We used the random-effects model in our analysis to incorporate the heterogeneity from clinical and methodological diversity.

The other limitations of our study include heterogeneity in the definition of diagnostic yield, as some studies, especially conference abstracts, did not report the strictness of their criteria whereas others followed strict or liberal criteria. We preferred to use strict criteria over liberal criteria whenever possible. We also classified the studies as high risk of bias in the QUADAS-2 if the strictness was not transparent or was liberal. We included all possible studies, including conference abstracts and studies with selection bias, to have comprehensive evidence for our findings. Interestingly, there was no significant difference in the diagnostic yield between studies with low and high risk of bias across all diagnostic modalities.

In our meta-analysis, pairwise meta-analysis and network meta-analysis comparing the different diagnostic procedures to sample PPLs, CT-TBNA appeared to be the most effective approach to sample PPLs with low GRADE certainty, although with the highest adverse event rate. RAB has a comparable diagnostic yield to CT-TBNA with noninferiority in pairwise meta-analysis but might be inferior to CT-TBNA with low GRADE certainty as per the network meta-analysis. rEBUS alone is the procedure with the lowest diagnostic yield with low GRADE certainty. Further prospective studies, particularly comparing RAB with CT-TBNA, are needed, considering the recently published work describing increasing diagnostic yield with RAB as well as the ability to perform mediastinal lymph node staging in the same procedural setting.

Points for clinical practice

Lung cancer is the leading cause of cancer-related death worldwide, for men and women.

PPLs are sampled with a variety of diagnostic procedures, including CT-TBNA, rEBUS (without any additional techniques), VB, EMN and RAB.

Among all the procedures, CT-TBNA had the highest diagnostic yield (88.9%) but is associated with high complication rates. On the other hand, RAB has a diagnostic yield close to CT-TBNA (84.8%) with much lower complication rates. The diagnostic yield is lowest with rEBUS without any additional techniques.

Questions for future research

Further research should focus on performing prospective and randomised controlled trials on CT-TBNA versus guided bronchoscopic procedures, especially with RAB given the increasing diagnostic yield with time comparable to CT-TBNA and the added benefit of performing mediastinal lymph node staging during the same procedure.
