Baylor College of Medicine’s institutional review board approved the study. Only data from consenting students were included in our analyses. A Consolidated Standards of Reporting Trials (CONSORT) is provided as supplementary material. Data were analyzed using R (R Core Team, 2023).
Study Design
We conducted a randomized cross-over trial with second-year medical students (MS2s) at a single U.S. institution over a two-day period. The learners had been previously exposed to TBL during their first year of coursework; however, their experience with lectures was largely in a passive format. Simple randomization was completed by the first author using Excel to produce random values for group assignment. Given that the two classrooms were at different locations, students were informed of their room assignment prior to the first day of the experiment. Each group participated in either an LGI or a large-group lecture on the first day and in the other modality on the second day.
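As a rough illustration of simple randomization, the sketch below draws one uniform random value per student and assigns a sequence group from it. The study used Excel; this Python version, the 0.5 cutoff, the group labels, and the student identifiers are assumptions made for illustration, not the authors' procedure.

```python
import random


def simple_randomize(student_ids, seed=None):
    """Independently assign each student to one of two sequence groups.

    Hypothetical rule: a uniform draw below 0.5 means the student attends
    the LGI on day one; otherwise the lecture on day one.
    """
    rng = random.Random(seed)
    assignment = {}
    for sid in student_ids:
        draw = rng.random()  # analogous to one spreadsheet random value
        assignment[sid] = "LGI-first" if draw < 0.5 else "lecture-first"
    return assignment


# Hypothetical roster of ten students
groups = simple_randomize([f"S{n:03d}" for n in range(1, 11)], seed=2023)
for sid, group in groups.items():
    print(sid, group)
```

Because each draw is independent, simple randomization does not guarantee equal group sizes.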
Day one focused on hypertension and day two on electrocardiogram interpretation, each conducted in a 50-min session and starting simultaneously at 8 am. Each group had the same instructor for both LGI and lecture-based learning sessions. Instructors were selected for their prior experience teaching in the medical school, their expertise in the subject matter, and their completed training on the use of active learning strategies. Because of the differences in topics, we did not anticipate carryover effects. The first and second authors observed the classrooms to verify that the learning modality (LGI or lecture) was implemented as intended.
The two learning conditions were developed such that the only difference between them was the active engagement of learners. The large-group lecture consisted of clinical vignette presentation with minimal interaction between the instructor and students. The active learning condition required students to work collaboratively in teams of 4 to 6 on the same clinical vignettes. Each vignette included multiple-choice questions. In the passive lecture, the instructor presented the questions and talked through each of the options for the multiple-choice questions. In the LGI, the learners were given 1 to 2 min to discuss the question and select an answer. This was followed by a brief explanation of the answer by the instructor. Between vignettes, the instructor gave a brief presentation of material that connected to the next clinical vignette. All participants, regardless of modality, were presented with identical learning objectives, PowerPoint slides, content, clinical cases, and questions. When developing the lessons, each instructor created PowerPoint slides and clinical cases for one lesson and then shared their work with the other instructor. Doing so provided consistency of the learning content and presentation materials, regardless of instructional format. At the conclusion of each session, all learners completed a feeling of learning survey and then a test of learning.
Measures
Test of Learning
Three faculty members worked independently of the instructors and, without reviewing lesson materials, developed the 12-question test of learning for each topic. These items were reviewed and refined by additional team members. Only the session objectives were shared between the lesson instructors and assessment developers. By blinding the instructors to the assessments and the assessment developers to the session materials, we eliminated the possibility of bias in the developed assessments or instructors “teaching to the test.”
Feeling of Learning
The feeling of learning (FOL) survey was adapted from previous research [14]. Each item was Likert-scaled with 5 response options ranging from Strongly Disagree to Strongly Agree. The four items used were “I enjoyed this session on ___,” “I feel like I learned a great deal from this session,” “The instructor was very effective at teaching the material,” and “I wish all of my courses were taught in the same way as this session.” A single question was added: “I feel prepared to apply the material I learned in another class or in a future clinical context such as clerkships.”
Covariates
Covariates included first-year academic achievement and demographic information. The 12 course grades from the first year of medical school were averaged to produce a single score, similar to a grade point average but in the original metric of the course grades (0 to 100). This score was then dichotomized using a median split, so that half of the learners were in the lower 50% of prior achievement and half were in the upper 50%. Additional covariates included sex (male/female), age, and race/ethnicity, with categories of Asian, Black, White (non-Hispanic/Latinx), White (Hispanic/Latinx), multiracial, and unknown. Prior evidence indicates that differences among groups along lines of race/ethnicity and sex exist on medical exams [15]. By including these factors, we accounted for potential residual differences in performance.
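The prior-achievement covariate can be sketched as follows: average each student's course grades, then split at the median. The student labels and grades are made up, and the handling of ties (a score exactly at the median is placed in the lower group) is an assumption, since the text does not state it.

```python
from statistics import mean, median


def prior_achievement_split(course_grades):
    """Average each student's grades (0 to 100) and apply a median split.

    course_grades: dict mapping a student label to a list of course grades.
    Returns a dict mapping each student to (average, "upper" or "lower").
    Assumption: averages exactly equal to the median fall in "lower".
    """
    averages = {sid: mean(grades) for sid, grades in course_grades.items()}
    cutoff = median(averages.values())
    return {
        sid: (avg, "upper" if avg > cutoff else "lower")
        for sid, avg in averages.items()
    }


# Hypothetical data: two grades per student instead of twelve, for brevity
example = {
    "a": [90, 80],
    "b": [70, 60],
    "c": [85, 95],
    "d": [50, 70],
}
split = prior_achievement_split(example)
for sid, (avg, half) in split.items():
    print(sid, avg, half)
```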
Analysis
Outcome: TOL
The sum score of the twelve items on each lesson's test of learning (TOL) was converted to a z-score for analysis. As a z-score, each TOL score expresses the deviation of the individual score from the mean score in units of the standard deviation of the score distribution. For example, if a participant in the LGI session for the hypertension lesson earned a z-score of 0.2 on the TOL, then this participant scored 0.2 standard deviations above the mean of scores on the hypertension TOL.
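The z-score conversion described above can be illustrated with a minimal sketch. The sum scores are invented, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the text does not say which denominator was used.

```python
from statistics import mean, stdev


def to_z_scores(sum_scores):
    """Standardize TOL sum scores within one lesson.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    m = mean(sum_scores)
    s = stdev(sum_scores)
    return [(score - m) / s for score in sum_scores]


# Hypothetical sum scores (out of 12) for one lesson, both conditions pooled
hypertension_scores = [6, 8, 9, 10, 12]
z = to_z_scores(hypertension_scores)
for raw, standardized in zip(hypertension_scores, z):
    print(raw, round(standardized, 3))
```

A score equal to the lesson mean maps to a z-score of 0, and positive values indicate scores above the mean.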
Outcome: FOL
To analyze FOL, composite FOL scores were created from the four FOL questions using principal components analysis (PCA). PCA is a data reduction strategy in which a reduced set of composite scores is created to explain variability in the items. The number of components retained determines the number of composite scores created for each participant. The number of components is determined using an eigenvalue > 1 rule of thumb, parallel analysis, and review of the component loadings of items. A component with an eigenvalue greater than 1 represents a composite that explains more variability in the observed items than a single item does and is therefore a useful component for data reduction. In a parallel analysis, a PCA is conducted on a randomly generated dataset with the same number of items as the observed dataset, but in which the items are uncorrelated. Any component from the observed dataset whose eigenvalue is greater than that of the corresponding component extracted from the randomly generated, uncorrelated data is considered to exist beyond chance and is retained. Finally, the component loadings indicate the association between each item and the derived component; high loadings across items indicate that the component represents all items well. Because the FOL items are ordinal, the PCA is based on the polychoric correlation matrix rather than the Pearson r correlation matrix. The reliability of the four items is evaluated using Cronbach’s alpha.
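For the reliability step, the standard formula for Cronbach's alpha is k / (k - 1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score), where k is the number of items. The sketch below applies it to invented 5-point responses; it works on raw item scores and does not reproduce the polychoric PCA described above.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    responses: list of rows, each row holding one respondent's k item scores.
    """
    k = len(responses[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from five students to the four FOL items (1 to 5)
fol_responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(fol_responses)
print(round(alpha, 3))
```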
Analysis Plan
Linear mixed-effects models were used to evaluate differences in student knowledge and feeling of learning between LGI and lecture-based learning sessions. The linear mixed-effects model accounts for within-participant variability while making within-participant comparisons. The following model was fit:
$$Y_{it} = \beta_{0} + \beta_{1}\,\text{Lesson}_{t} + \beta_{2}\,\text{Sequence}_{i} + \beta_{3}\,\text{Treatment}_{it} + \sum\limits_{k=1}^{K} \beta_{k+3}\,X_{ki} + e_{it} + u_{0i}$$
where \(Y_{it}\) is the outcome observed for participant i at time t, \(\text{Lesson}_{t}\) is the content of the lesson (ECG interpretation or hypertension) at time t, \(\text{Sequence}_{i}\) is the instructional modality sequence group that participant i was randomized to, and \(\text{Treatment}_{it}\) is an indicator for treatment condition at time t for participant i (LGI or lecture). The coefficient of the treatment indicator (\(\beta_{3}\)) is the estimate of the effect of being in an LGI versus lecture. The summation captures covariates including prior achievement and demographic characteristics. The error terms, \(e_{it}\) and \(u_{0i}\), capture residual variability due to within- and between-person differences in outcomes, respectively, that are unexplained by the included variables.
To evaluate if prior achievement moderated the observed effects on knowledge or feeling of learning, prior performance was dichotomized using a median split, and the impact of participation in an LGI was compared between those in the upper and lower 50% of prior achievement. Analytically, this was accomplished by interacting the median-split indicator with the treatment indicator.
Regarding the question “I feel prepared to apply the material I learned in another class or in a future clinical context such as clerkships,” Wilcoxon signed-rank tests were used to compare the responses of learners in the LGI and lecture-based conditions for each lesson.
Power
The necessary sample size to detect a minimum effect of interest was determined by statistical simulation. The minimum effect of interest was a standardized mean difference of 0.2, that is, a difference of one-fifth of a standard deviation between the LGI and lecture-based learning conditions on the test of learning. In total, 142 participants were needed to detect this effect with power of 0.80 and alpha of 0.05. With 181 students in the MS2 course, 39 students (21.5%) could choose to not consent or participate in the experiment, and we would remain adequately powered to detect the effect of interest.
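The details of the simulation are not reported, so the following is only a simplified sketch of how power can be estimated by simulation for a within-participant comparison. It assumes unit-variance outcomes, a hypothetical within-person correlation `rho`, a test on the paired differences with a normal critical value, and no covariates or lesson effects. It is not the mixed-effects simulation used in the study and is not expected to reproduce the figure of 142.

```python
import random
from statistics import NormalDist


def simulate_power(n, effect=0.2, rho=0.5, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power for a paired comparison by Monte Carlo simulation.

    n: number of participants; effect: standardized mean difference;
    rho: assumed (hypothetical) correlation between a participant's two scores.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation
    # SD of the difference of two unit-variance scores with correlation rho
    sd_diff = (2 * (1 - rho)) ** 0.5
    rejections = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(effect, sd_diff) for _ in range(n)]
        m = sum(diffs) / n
        var = sum((d - m) ** 2 for d in diffs) / (n - 1)
        test_stat = m / (var / n) ** 0.5
        if abs(test_stat) > z_crit:
            rejections += 1
    return rejections / n_sims


power_142 = simulate_power(142)
power_30 = simulate_power(30)
print(power_142, power_30)

# Margin for non-participation stated in the text: 181 enrolled, 142 needed
margin = 181 - 142
print(margin, round(100 * margin / 181, 1))
```

The estimated power is sensitive to `rho`: a higher within-person correlation shrinks the variance of the paired differences and raises power for the same number of participants.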