Development of a Bronchoscopy-Radiologic Skills and Task Assessment Tool (BRadSTAT): A Tool for Evaluating the Radiological Skills of Bronchoscopists with Different Experience

Background: Competency using radiologic images for bronchoscopic navigation is presumed during subspecialty training, but no assessments objectively measure combined knowledge of radiologic interpretation and ability to maneuver a bronchoscope into peripheral airways. Objectives: The objectives of this study were (i) to determine whether the Bronchoscopy-Radiologic Skills and Task Assessment Tool (BRadSTAT) discriminates between bronchoscopists of various levels of experience and (ii) to improve construct validity using study findings. Methods: BRadSTAT contains 10 questions that assess chest X-ray and CT scan interpretation using multiple images per question and 2 technical skill assessments. After administration to 33 bronchoscopists (5 Beginners, 9 Intermediates, 10 Experienced, and 9 Experts), discriminative power was strengthened by differentially weighting CT-related questions, producing the BRadSTAT-CT score. Cut points for both scores were determined via cross-validation. Results: Mean BRadSTAT scores for Beginner, Intermediate, Experienced, and Expert were 74 (±13 SD), 78 (±14), 86 (±9), and 88 (±8), respectively. Statistically significant differences were noted between Expert and Beginner, Expert and Intermediate, and Experienced and Beginner (all p ≤ 0.05). Mean BRadSTAT-CT scores for Beginner, Intermediate, Experienced, and Expert were 63 (±14), 74 (±15), 82 (±13), and 90 (±9), respectively, all differences statistically significant (p ≤ 0.03). Cut points for BRadSTAT-CT had lower sensitivity but greater specificity and accuracy than those for BRadSTAT. Conclusion: BRadSTAT is the first validated assessment tool measuring knowledge and skills for bronchoscopic access to peripheral airways that discriminates between bronchoscopists of various experience levels. Refining BRadSTAT produced the BRadSTAT-CT, which had higher discriminative power. Future studies should focus on their usefulness in competency-based bronchoscopy programs.

© 2022 The Author(s). Published by S. Karger AG, Basel

Introduction

Technological advances, associated with modern imaging techniques such as radial probe endobronchial ultrasound (EBUS) and multi-row detector computed tomography (CT), render the ability to accurately select endobronchial paths to peripheral pulmonary lesions and skillfully navigate peripheral airways more important than ever before. Competency using radiologic images to navigate the airways is presumed during hands-on subspecialty education, but no existing validated assessment tools measure knowledge and technical skill in this area. Current assessment tools focus on dexterity and handling of the bronchoscope and knowledge of bronchoscopic airway anatomy [1-3] but do not address the radiological component. The purpose of this study was to evaluate a novel instrument, the Bronchoscopy-Radiologic Skills and Task Assessment Tool (BRadSTAT), that objectively tests a user’s combined knowledge of radiologic interpretation and ability to maneuver a flexible bronchoscope into peripheral airway regions.

Methods

BRadSTAT Design and Scoring

BRadSTAT is designed for bronchoscopists with passing scores on the Bronchoscopy Skills and Task Assessment (BSTAT) [1] to ensure documented knowledge of airway anatomy and precise bronchoscopic handling. The assessment tool (Fig. 1) contains 10 questions that assess chest X-ray (CXR) (questions 1, 2, and 7) and CT scan (questions 3–6, 8–10) interpretation, using multiple images per question. Each image is scored separately and equally for a total possible score of 10 for each question. For example, if a question contains five images, each has a score of 2 if answered correctly and 0 if answered incorrectly. The maximum possible BRadSTAT score therefore is 100. Airway and bronchopulmonary segmental anatomy of the right and left lungs are addressed separately. All questions require users to mentally convert 2-dimensional images to a 3-dimensional perspective. Users must (1) match images of normal lobar and segmental anatomy with their corresponding anatomic descriptions (questions 1–6); (2) match abnormalities on CXR and CT with their corresponding descriptions of location (questions 7 and 8); and (3) identify the location of an abnormality on CT before navigating the bronchoscope to the appropriate pulmonary segment (questions 9 and 10).
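
To make the scoring rule concrete, the following minimal Python sketch (ours, not part of the published tool; function names are illustrative) computes per-question and total scores from per-image correctness:

```python
def question_score(correct_flags: list[bool]) -> float:
    """Score one question: 10 points divided equally across its images."""
    per_image = 10 / len(correct_flags)
    return per_image * sum(correct_flags)


def bradstat_total(answers: dict[int, list[bool]]) -> float:
    """Total BRadSTAT score over questions 1-10 (maximum 100)."""
    assert set(answers) == set(range(1, 11)), "expects answers for questions 1-10"
    return sum(question_score(flags) for flags in answers.values())


# Example: a five-image question scores 2 points per correct image.
print(question_score([True, True, False, True, True]))  # 8.0
```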

Fig. 1.

BRadSTAT. Questions 1–10.

BRadSTAT Administration

Radiologic images for questions 5 and 6 are viewed as still photos on the assessment tool or in slideshow mode (online suppl. Questions 5, 6; see www.karger.com/doi/10.1159/000526011 for all online suppl. material) to mimic how users scroll through images in real life. Technical skill (questions 9 and 10) is measured using an airway model or simulator. We used the commercially available, previously validated ORSIM high-fidelity simulator (Auckland, New Zealand) [4, 5], comprising a proxy flexible bronchoscope, an interface device into which the bronchoscope is inserted, and a laptop with simulation software. Sensors detect lever movements and bronchoscope rotation. Maneuvering feels realistic, and the virtual airway images appear true to life.

Study Participants

Participants were trainees in medical thoracic training programs and pulmonary consultants. All were asked to complete a questionnaire regarding bronchoscopy experience (number of procedures and whether they performed transbronchial lung biopsy, radial EBUS, or navigational bronchoscopy independently) and year of training. Participants were categorized into four groups: Beginners were trainees in the first half of their 3-year program who had completed <100 bronchoscopies; Intermediates were in the second half of their training who had completed between 100 and 200 bronchoscopies; Experienced were consultants who had performed >200 bronchoscopies; and Experts were interventional consultants who performed navigational bronchoscopy and radial EBUS.
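
The grouping rule can be summarized in a short, hypothetical helper (our illustration; field names such as n_bronchoscopies are ours, not from the study questionnaire):

```python
def categorize(is_trainee: bool, second_half_of_training: bool,
               n_bronchoscopies: int, does_nav_and_radial_ebus: bool) -> str:
    """Map questionnaire responses to the study's four experience groups."""
    if is_trainee:
        if not second_half_of_training and n_bronchoscopies < 100:
            return "Beginner"
        if second_half_of_training and 100 <= n_bronchoscopies <= 200:
            return "Intermediate"
        raise ValueError("trainee does not fit a study category")
    if does_nav_and_radial_ebus:
        return "Expert"  # interventional consultants
    if n_bronchoscopies > 200:
        return "Experienced"
    raise ValueError("consultant does not fit a study category")
```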

BRadSTAT was administered over a 6-month period by examiners (E.Y., J.W., P.N.) at hospitals in three cities (Middlemore Hospital, Auckland; Liverpool and Macquarie Hospitals, Sydney; and Royal Adelaide Hospital, Adelaide), each with interventional pulmonology units accredited for advanced training in thoracic medicine with the Royal Australasian College of Physicians (RACP). Some Experts were recruited in Auckland during the June 2019 Australia-New Zealand Interventional Pulmonology (ANZIP) meeting.

Testing Protocol

BRadSTAT was administered to participants during a single session, and the time for each participant to complete the assessment was noted. Questions 1–8 were addressed using a combination of paper-based and laptop computer images without examiner assistance. For questions 9 and 10 (technical skill), examiners asked participants to verbally identify the location of an abnormality before maneuvering the bronchoscope to that target location. Questions were scored as incorrect if either the verbal or the maneuvering portion was erroneous.

Study Aims

Our primary aim was to determine whether BRadSTAT scores discriminate between bronchoscopists of varying levels of experience, thus establishing construct validity. The secondary aim was to analyze data to further refine or improve BRadSTAT as an assessment tool.

Statistical Methods

Statistical analysis was performed in three phases by an investigator (A.V.) who neither knew nor observed study participants. In the first phase, BRadSTAT scores were assessed visually, overall, and question by question. Descriptive models were produced with total and question scores linearly regressed on the group. The BRadSTAT score’s association with the groups was tested using the Jonckheere-Terpstra procedure [6]. Questions were assessed for internal consistency using Cronbach’s α.
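
For readers wishing to reproduce this phase, the sketch below is ours: SciPy does not ship a Jonckheere-Terpstra test, so the statistic is implemented directly with a permutation p-value, which may differ from the authors' exact procedure. Cronbach's α is computed from a participants-by-questions score matrix.

```python
import numpy as np


def jonckheere_terpstra(groups: list) -> float:
    """JT statistic: Mann-Whitney counts summed over ordered group pairs."""
    j = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                j += (x < groups[b]).sum() + 0.5 * (x == groups[b]).sum()
    return j


def jt_permutation_pvalue(groups, n_perm=10_000, seed=0):
    """One-sided permutation p-value for an increasing trend across groups."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate(groups)
    sizes = np.cumsum([len(g) for g in groups])[:-1]
    observed = jonckheere_terpstra(groups)
    hits = sum(jonckheere_terpstra(np.split(rng.permutation(pooled), sizes))
               >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha; rows are participants, columns are questions."""
    k = item_scores.shape[1]
    item_var = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)
```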

A second-phase analysis was performed to determine whether BRadSTAT could be refined. Thirteen weighting schemes were assessed, including the original BRadSTAT weighting, which allotted equal weight to each question. The assessment criterion was misclassification error, i.e., the proportion of misclassified participants per total number of participants. Weights were applied as regression coefficients in an adjacent-category logit proportional odds model. This model called for the estimation of thresholds on the log-odds scale between every pair of adjacent categories. These thresholds could be translated into maximum likelihood cut points for the corresponding scores, conditionally on the weighting scheme. Misclassification was minimized using leave-one-out cross-validation in all cases (see online suppl. material).
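
The following simplified sketch (ours) illustrates the leave-one-out workflow for comparing weighting schemes. Note one deliberate substitution: the paper estimates thresholds by maximum likelihood in an adjacent-category logit model, whereas for brevity this sketch fits cut points by brute-force search over candidate thresholds, so its results would only approximate the published analysis.

```python
from itertools import combinations
import numpy as np


def misclassification(scores, labels, cuts):
    """Proportion of participants whose score falls in the wrong band."""
    pred = np.searchsorted(cuts, scores, side="left")
    return np.mean(pred != labels)


def fit_cutpoints(scores, labels, n_groups=4):
    """Brute-force thresholds minimizing training misclassification."""
    uniq = np.unique(scores)
    candidates = (uniq[:-1] + uniq[1:]) / 2  # midpoints between scores
    best = min(combinations(candidates, n_groups - 1),
               key=lambda c: misclassification(scores, labels, np.array(c)))
    return np.array(best)


def loo_error(question_scores, labels, weights):
    """Leave-one-out misclassification error for one weighting scheme."""
    scores = question_scores @ weights  # weighted total per participant
    n = len(scores)
    errors = 0
    for i in range(n):
        mask = np.arange(n) != i
        cuts = fit_cutpoints(scores[mask], labels[mask])
        errors += np.searchsorted(cuts, scores[i], side="left") != labels[i]
    return errors / n
```

A candidate weighting can then be scored as, e.g., loo_error(q_scores, groups, np.ones(10)) for the original equal-weight scheme, with group labels coded 0 (Beginner) through 3 (Expert).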

In the third phase, we selected the weighting method that yielded the smallest misclassification error and was most clinically relevant. The first-phase analysis was then repeated with the corresponding score. The conditional maximum likelihood cut points of this score and of the BRadSTAT score were assessed by producing cross-validated sensitivity, specificity, and accuracy estimates for three dichotomies: Beginner versus non-Beginner, Beginner or Intermediate versus Experienced or Expert, and non-Expert versus Expert. Sensitivity and specificity here correspond to correct classification in the lower ability and the higher ability groups, respectively.
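
A minimal helper (ours; in the study these estimates were cross-validated, whereas this computes them on a single split for illustration) makes the convention explicit:

```python
import numpy as np


def dichotomy_metrics(scores, labels, cut, split):
    """Metrics for one dichotomy. `split` is the lowest group index on the
    higher-ability side (1: Beginner vs. rest; 2: Beginner/Intermediate vs.
    Experienced/Expert; 3: non-Expert vs. Expert). Sensitivity is correct
    classification of the lower-ability side, specificity of the higher."""
    truth_high = labels >= split
    pred_high = scores > cut
    sensitivity = np.mean(~pred_high[~truth_high])
    specificity = np.mean(pred_high[truth_high])
    accuracy = np.mean(pred_high == truth_high)
    return sensitivity, specificity, accuracy
```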

Ethics

No ethics approval was required according to the Health and Disability Ethics Committee in New Zealand as no patient data were collected for this study. Participants provided written informed consent prior to participation in the study.

Results

Phase 1

BRadSTAT was administered to 33 participants: Beginners (5), Intermediates (9), Experienced (10), and Experts (9). Eight were from Sydney, four from Adelaide, and thirteen from Auckland; the remaining eight Experts were recruited during the ANZIP conference.

Mean scores for each of the four groups (Beginner, Intermediate, Experienced, and Expert) were 74 (±13 SD), 78 (±14), 86 (±9), and 88 (±8), respectively (Fig. 2). Statistically significant differences were noted between Expert and Beginner (p = 0.02), Expert and Intermediate (p = 0.04), and Experienced and Beginner groups (p = 0.05) (Table 1). Experts’ scores were less variable than those of Beginners (SD 8 vs. 13, p = 0.02). When participants were grouped into trainees (n = 14) or consultants (n = 19), the difference between mean scores remained statistically significant (76 vs. 87, respectively, p = 0.014). The median time to complete the BRadSTAT differed significantly between the Expert group (median 29 min, range [21, 35]) and the others (p = 0.0028): Beginner (median 50 min, range [30, 53]), Intermediate (median 47 min, range [35, 55]), and Experienced (median 40 min, range [30, 64]) (Fig. 3). The difference in completion time between trainees and consultants was also statistically significant (45.2 vs. 36.6 min, respectively, p = 0.014).

Table 1.

Between group differences for BRadSTAT and BRadSTAT-CT

Fig. 2.

Comparing BRadSTAT and BRadSTAT-CT scores across the four groups.

Fig. 3.

Test time required by groups to complete BRadSTAT.

Phases 2 and 3

Analysis of the ten individual questions showed good discriminative power of questions 3, 4, 6, 8, 9, and 10, which were CT-related questions (Fig. 4), whereas CXR-related questions 1, 2, and 7 were less discriminative. Question 5 had lower discriminative power compared to other CT-related questions.

Fig. 4.

Sample means of the 10 questions in BRadSTAT by group.


Based on these findings and our examination of weighting methods, we selected the weighting scheme that yielded the smallest misclassification error and was most clinically relevant. We named this weighting scheme the BRadSTAT-CT because it included only CT-related questions. Questions 3, 4, 5, 6, and 8 each carried a maximum score of 10 if answered correctly, and questions 9 and 10 each carried a maximum score of 25. The rationale for the higher weighting of questions 9 and 10 was that they required a combination of cognition (localizing the abnormality on CT) and technical skill (navigation to the correct bronchopulmonary segment), as opposed to CT interpretation alone.
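
A minimal sketch of this weighting (ours; it assumes, per the scoring description in the Methods, that partial credit accrues per image within each question):

```python
# Question weights for BRadSTAT-CT (maximum total: 5 * 10 + 2 * 25 = 100).
CT_WEIGHTS = {3: 10, 4: 10, 5: 10, 6: 10, 8: 10, 9: 25, 10: 25}


def bradstat_ct(fraction_correct: dict[int, float]) -> float:
    """fraction_correct maps a question number to the fraction of its
    images answered correctly (0.0-1.0); non-CT questions are ignored."""
    return sum(w * fraction_correct.get(q, 0.0) for q, w in CT_WEIGHTS.items())


# Perfect CT interpretation but both technical questions wrong scores 50.
print(bradstat_ct({3: 1, 4: 1, 5: 1, 6: 1, 8: 1, 9: 0, 10: 0}))  # 50.0
```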

The BRadSTAT-CT score was determined for all 33 participants. Mean BRadSTAT-CT scores for the four groups (Beginner, Intermediate, Experienced, and Expert) were 63 (±14), 74 (±15), 82 (±13), and 90 (±9), respectively (Fig. 2). Statistically significant differences were noted between Expert and Beginner (p = 0.007), Expert and Intermediate (p = 0.01), and Experienced and Beginner scores (p = 0.03) (Table 1). A statistically significant difference between the mean BRadSTAT-CT scores of trainees and consultants (70.4 vs. 86.2, p = 0.002) was also noted. The Jonckheere-Terpstra test relating group order to the BRadSTAT and BRadSTAT-CT scores as predictors yielded p values of 0.004 and 0.0001, respectively (Table 2). Median time to complete BRadSTAT-CT was not measured because BRadSTAT-CT was determined a posteriori.

Table 2.

Observed significance level of scores as predictors of group, level, and total test time

Cut Point Analysis

Table 3 presents the cut points for the BRadSTAT and BRadSTAT-CT scores and the associated cross-validated sensitivity, specificity, and accuracy. Both the BRadSTAT and BRadSTAT-CT cut points have lower sensitivity than specificity. The accuracy of the BRadSTAT cut points (≤67 for Beginner level, 68–80 for Intermediate, 81–88 for Experienced, and ≥89 for Expert) ranged between 69% and 81%. The accuracy of the BRadSTAT-CT cut points (≤50 for Beginner level, 51–69 for Intermediate, 70–82 for Experienced, and ≥83 for Expert) ranged between 70% and 88%. Overall, the BRadSTAT-CT cut points showed higher specificity and accuracy but lower sensitivity than the BRadSTAT cut points.
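
Using the published cut points, banding a score reduces to a simple threshold lookup (a sketch, ours):

```python
# Upper bounds of the Beginner, Intermediate, and Experienced bands (Table 3).
CUTS = {"BRadSTAT": (67, 80, 88), "BRadSTAT-CT": (50, 69, 82)}


def classify(score: float, tool: str = "BRadSTAT-CT") -> str:
    """Map a score to an experience band; scores above all cuts are Expert."""
    for cut, level in zip(CUTS[tool], ("Beginner", "Intermediate", "Experienced")):
        if score <= cut:
            return level
    return "Expert"


print(classify(85, "BRadSTAT-CT"))  # Expert
print(classify(85, "BRadSTAT"))     # Experienced
```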

Table 3.

Estimated cut points, sensitivity, specificity, and accuracy for BRadSTAT and BRadSTAT-CT

Discussion

BRadSTAT objectively and systematically tests a user’s combined knowledge of radiologic interpretation and ability to maneuver a bronchoscope to a desired peripheral location. Technological advances in airway imaging and peripheral bronchoscopy make these skills increasingly necessary, particularly for nodule management in the era of CT screening for lung cancer. Precise knowledge of bronchopulmonary segments is also essential for accurate bronchoscopic lavage in patients with patchy interstitial lung disease.

The importance of accurate endobronchial path selection to maximize diagnostic yield was highlighted by Dolina et al. [7], who noted a wide range of accuracy across experience levels. Variability in yields might also result from varying bronchoscopic dexterity and handling [8] and inexperience [9]. While both can be measured using existing bronchoscopy assessment tools [1-3], it is noteworthy that none apply to the combined use of radiologic imaging and technical skill. The BRadSTAT was designed to fill this void.

In this study, we demonstrated the BRadSTAT’s construct validity; it discriminated between participants at different levels of experience, from Beginner to Expert, and between thoracic trainees and consultants. The time to complete the assessment also decreased from Beginner to Expert, reflecting greater efficiency with experience. Time should be included in assessments because proficiency is characterized by good results as well as task efficiency. Although we did not test the longitudinal aspects of BRadSTAT, we propose that repeated use of this assessment tool in training programs would be consistent with the strategy of using formative and barrier assessments such as the BSTAT [1], EBUS-STAT [10], and UG-STAT [11], employed by TSANZ to assess competency in flexible bronchoscopy [12], EBUS bronchoscopy [13], and pleural ultrasound [14]. Deliberate practice on specific tasks under supervision with constructive feedback helps achieve mastery learning in other medical fields [15-17].

The secondary aim of this study was to analyze our data to potentially refine the BRadSTAT. Based on high discriminative power, low misclassification, and a clinically relevant weighting scheme, we identified and studied the BRadSTAT-CT scoring system, which consisted of seven CT-related questions. That BRadSTAT-CT had superior discriminative power suggests that an assessment tool based on CT-related questions alone better discriminates between participants of varying levels of experience, i.e., has stronger construct validity than BRadSTAT. One possible explanation is that CXR interpretation has become a fundamental skill for all physicians; few differences in CXR interpretation skill might therefore be expected between Beginners and more advanced trainees or consultants. It is also possible that a tool using only three CXR-related questions is insufficient to detect statistically significant differences. Regardless, in an era when CT is the imaging modality of choice for patients requiring bronchoscopy for peripheral airway pathology, a CXR-based bronchoscopy assessment tool might rapidly become obsolete.

BRadSTAT-CT shows how an assessment tool may become more discriminating when different weights are allotted to questions based on their complexity. This was also the case for the RIGID-TASC [18], an instrument that objectively measures rigid bronchoscopy-related skills. In BRadSTAT-CT, the combined radiologic interpretation and technical skill questions (9 and 10) required multiple steps to complete and thus were weighted more heavily than the remaining five questions, which required only one step. This approach is in line with Miller’s [19] suggestion that assessment should not only capture important components of a complex task but also ensure that test elements are weighted appropriately.

Cut points, also referred to as cut scores, provide a precise way of establishing a standard of performance for a test [20]. They are often used by educators and policy makers to help define whether a particular test score is sufficient for some purpose [21] and are valuable for setting levels of proficiency or competency [22, 23]. If used longitudinally, cut points can also enable assessors to determine a trainee’s progress along the learning curve. Cut points for the BRadSTAT and BRadSTAT-CT were identified via cross-validation. They allowed stratification into Expert, Experienced, Intermediate, and Beginner groups (Table 3). Both sets of cut points had high accuracy rates for predicting these groups, a finding further supported by the Jonckheere-Terpstra test, which showed stronger predictive power for the BRadSTAT-CT score than for the BRadSTAT score.

The BRadSTAT-CT’s cut points also make clinical sense. For instance, a participant who scored perfectly in CT interpretation (questions 3, 4, 5, 6, 8) but made technical errors (questions 9 and 10) could score 50 points, i.e., Beginner level, which would be consistent with a subjective assessment of a trainee who has CT interpretation skills but is not yet able to complete the more complex task of combining radiologic interpretation with bronchoscopic navigation. Similarly, to perform at an Expert level (cut point ≥83), one would expect a perfect score in CT interpretation and no more than three mistakes across the two technical questions, each of which contained five images worth 5 points apiece for a total possible score of 25 per question. It is an interesting coincidence that our proposed BRadSTAT-CT cut points closely resemble the grading system used in New Zealand and Australia’s tertiary education systems, where a grade A is conferred for scores ≥80% (≥85% in Australia), B for scores ≥65% (≥70% in Australia), and C for scores ≥50% [24, 25].

Limitations

Because the lead investigator (E.Y.) also helped administer and score BRadSTAT, one limitation of this study is possible investigator bias. A similar possibility for bias is found in studies of other assessment tools [1, 10]. Overall, the objective nature of BRadSTAT’s scoring system makes investigator bias unlikely. A second limitation is that trainees recruited from centers with a strong focus on bronchoscopy education may have been better trained, thereby reducing BRadSTAT’s ability to discriminate between groups. A third limitation relates to the small sample size and the possibility of a priori group classification error. Each of these limitations can be addressed through a larger-scale study. A fourth limitation relates to our decision to test the technical skills components of questions 9 and 10 using the ORSIM bronchoscopy simulator rather than patients. While this device is validated and provides a realistic procedural experience [4, 5], it is not widely available. Future studies can incorporate assessments in the clinical setting.

Finally, while BRadSTAT and BRadSTAT-CT are important steps toward the development of an objective, fully validated assessment tool, the BRadSTAT-CT was derived from differential scoring and not tested independently. Future clinical studies employing subjects of varying experience are needed to determine its repeatability, reproducibility, and longitudinal applicability. We believe that the objective nature of its questions and BRadSTAT-CT’s weighted scoring system make instability over time or between different assessors unlikely. How cut points might be used to help define competency, however, and how often BRadSTAT or BRadSTAT-CT should be performed during training is less clear.

Conclusions

Combining accurate radiologic interpretation, precise airway navigation, and dexterous technical skill is essential for competent bronchoscopic access to peripheral airways. BRadSTAT is the first validated assessment tool that objectively measures these skills and effectively discriminates between bronchoscopists with different levels of experience. Refining BRadSTAT by restricting it to CT-related questions and applying differential weighting produced the BRadSTAT-CT, which has stronger discriminative power. Studies are warranted to demonstrate the reproducibility and repeatability of both tools and to assess their usefulness in competency-based bronchoscopy training programs.

Statement of Ethics

No ethics approval was required according to the Health and Disability Ethics Committee in New Zealand as no patient data were collected for this study. Participants provided written informed consent prior to participation in the study.

Conflict of Interest Statement

The authors have no conflicts of interest to declare.

Funding Sources

No funding for the study was received by any of the authors.

Author Contributions

Elaine L.C. Yap: conceptualization: lead, formal analysis: supporting, investigation: equal, methodology: lead, writing – original draft: lead, and writing – review and editing: lead. Alain C. Vandal: formal analysis: lead, methodology: equal, writing – original draft: equal, and writing – review and editing: equal. Jonathan P. Williamson: conceptualization: supporting, investigation: equal, methodology: supporting, and writing – review and editing: equal. Phan Nguyen: investigation: equal, methodology: supporting, and writing – review and editing: equal. Henri Colt: conceptualization: lead, formal analysis: supporting, methodology: lead, writing – original draft: lead, and writing – review and editing: lead.

Data Availability Statement

All data generated or analyzed during this study are included in this article and its online supplementary material. Further inquiries can be directed to the corresponding author.

This article is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC). Usage and distribution for commercial purposes requires written permission.
