Assessing fragility of statistically significant findings from randomized controlled trials assessing pharmacological therapies for opioid use disorders: a systematic review

This is the first study to evaluate the FI in the field of addiction medicine, and more specifically in OUD trials. Among the ten RCTs evaluating the OSAT for OUD, we found that, in some cases, changing the outcome of one or two participants could completely alter the study’s conclusions and render the results statistically non-significant.
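To make concrete how one or two changed outcomes can overturn significance, the following is a minimal sketch of an FI calculation. It assumes the commonly described procedure: in the arm with fewer events, non-events are converted to events one at a time, and a two-sided Fisher's exact test is recomputed until p reaches 0.05 or more. The function names and the toy counts are hypothetical and are not taken from the included trials.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are as likely as, or less likely than, the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of outcome flips needed to lose significance (None if not significant)."""
    def p_value(ea, eb):
        return fisher_exact_two_sided(ea, n_a - ea, eb, n_b - eb)

    if p_value(events_a, events_b) >= alpha:
        return None  # the FI is only defined for statistically significant results
    ea, eb, flips = events_a, events_b, 0
    modify_a = events_a <= events_b  # flip outcomes in the arm with fewer events
    while p_value(ea, eb) < alpha:
        if modify_a and ea < n_a:
            ea += 1
        elif not modify_a and eb < n_b:
            eb += 1
        else:
            raise ValueError("significance cannot be removed by flipping this arm")
        flips += 1
    return flips


# Toy example: 0/5 events versus 5/5 events.
print(fragility_index(0, 5, 5, 5))  # -> 2
```

In this toy table the initial p-value is about 0.008, yet changing the outcome of two participants is enough to move it above 0.05.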

We compare our findings to those of Holek et al., who examined the mean FI across all reviews published in PubMed between 2014 and 2019 that assessed the distribution of FI, irrespective of discipline (though none were in addiction medicine) [13]. Among 24 included reviews with a median sample size of 134 (IQR 82, 207), they found a mean FI of 4 (95% CI 3, 5) [13]. This is slightly lower than our calculated median FI of 7.5 (IQR 4–12; range 1–26). It is important to note that half of the reviews included in the study by Holek et al. were conducted in surgical disciplines, which are generally subject to more limitations to internal and external validity, as it is often not possible to conceal allocation or to blind participants or operators, and the intervention is operator dependent [27]. To date, no study has directly applied FI to the findings of trials in OUD. In the HIV/AIDS literature, however, which covers a population commonly shared with addiction medicine because the two conditions frequently coexist, the median FI across all trials assessing anti-retroviral therapies (n = 39) was 6 (IQR 1, 11) [28], which is closer to our calculated FI. Among the included studies, only 3 were deemed to be at high risk of bias, whereas 13 and 20 studies were deemed to be at low and some risk of bias, respectively.

Loss to follow-up plays an important role in the interpretation of the FI. For instance, when the number of study participants lost to follow-up exceeds the FI of the trial, the unobserved outcomes of these participants could have altered the statistical significance and final conclusions of the study. While only two of the included studies had an FI that was greater than the total number of participants lost to follow-up [23, 26], this metric is less important in our case because the primary outcome assessed by the majority of trials was retention in treatment, which renders loss to follow-up an outcome in itself. In our report, we considered participants to be lost to follow-up if they left the study for reasons that were known and not necessarily indicative of treatment failure, such as factors beyond the participants' control, including incarceration or transfer to another treatment location.
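The comparison between loss to follow-up and the FI can be sketched as a simple check. This is an illustrative sketch only: the helper name and the trial values below are hypothetical and do not correspond to any trial included in this review.

```python
def ltfu_exceeds_fi(fragility_index, lost_to_follow_up):
    """True when the unobserved outcomes alone could overturn significance."""
    return lost_to_follow_up > fragility_index


# Hypothetical trials (illustrative values only, not data from this review).
trials = [
    {"name": "Trial A", "fi": 2, "lost": 15},
    {"name": "Trial B", "fi": 12, "lost": 4},
]

# Flag trials in which more participants were lost than the FI.
flagged = [t["name"] for t in trials if ltfu_exceeds_fi(t["fi"], t["lost"])]
print(flagged)  # -> ['Trial A']
```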

Findings from our analysis of the literature, as well as the application of FI to the existing clinical trials in the field of addiction medicine, demonstrate significant concerns regarding the robustness of the evidence. This, in conjunction with the large differences between the clinical population of opioid-dependent patients and the participants in addiction medicine trials, raises larger concerns about a growing body of evidence with deficiencies in both internal and external validity. The findings from this study raise important clinical concerns regarding the applicability of the current evidence to treating patients in the context of the opioid epidemic. Are we recommending the appropriate treatments for patients with OUD based on robust and applicable evidence? Are we completing our due diligence and ensuring clinicians and researchers alike understand the critical issues rampant in the literature, including the fragility of the data and misconceptions of p-values? Are we possibly putting our patients at risk by employing such treatments based on fragile data? These questions cannot be answered until the evidence is appropriately re-evaluated using both pragmatic trial designs and transparent metrics that reflect the reliability and robustness of the findings.

Strengths and limitations

Our study is strengthened by a comprehensive search strategy, rigorous and systematic screening of studies, and the use of an objective measure to gauge the robustness of studies (i.e., FI). The limitations of this study are inherent in the limitations of the FI. Specifically, it can only be calculated for RCTs with a 1:1 allocation ratio, a parallel arm or two-by-two factorial design, and a dichotomous primary outcome. As a result, 94 RCTs evaluating OSAT for OUD were excluded for not meeting these criteria (Fig. 1). Nonetheless, the FI provides a general sense of the robustness of the available studies, and our data reflect studies published across almost four decades in journals of varying impact factor.

Future direction

This study serves as further evidence of the need for a shift away from p-values [29, 30]. Although statisticians are increasingly moving away from relying on statistical significance because of its inability to convey clinical importance [31], the p-value remains the simplest and most commonly reported metric in manuscripts. p-values provide a simple statistical measure for assessing evidence against a null hypothesis, indicating how likely a result at least as extreme as the one observed would be if the null hypothesis were true. An arbitrary cutoff of 5% is traditionally used as a threshold for rejecting the null hypothesis. However, a major drawback of the p-value is that it does not take into account the effect size of the outcome measure, such that a small incremental change that may not be clinically significant may still be statistically significant in a large enough trial. Conversely, a very large effect size that has biological plausibility, for instance, may not reach statistical significance if the trial is not large enough [29, 30]. This is highly problematic given the common misconceptions surrounding the p-value. Increasing emphasis is being placed on the importance of transparency in outcome reporting, and on the reporting of confidence intervals to allow the reader to gauge the uncertainty in the evidence and make an informed decision about whether a finding is clinically significant. It has also been recommended that studies report FI where possible to provide readers with a comprehensible way of gauging the robustness of their findings [12, 13]. There is also a push to make all data publicly available, allowing for replication of study findings as well as pooling of data among databases to generate more robust analyses using larger pragmatic samples [32].
Together, these efforts aim to increase transparency of research and facilitate data sharing to allow for stronger and more robust evidence to be produced, allowing for advancements in evidence-based medicine and improvements in the quality of care delivered to patients.
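The dependence of the p-value on trial size rather than on effect size alone, as described above, can be illustrated with a short sketch. It assumes a pooled two-proportion z-test, which is only a rough approximation in small samples. The function name and all counts are hypothetical and are chosen purely for illustration.

```python
from math import erfc, sqrt


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal


# Small effect, very large trial: 52% versus 50% with 20,000 participants per arm.
p_small_effect = two_proportion_p_value(10400, 20000, 10000, 20000)

# Large effect, very small trial: 60% versus 20% with 10 participants per arm.
p_large_effect = two_proportion_p_value(6, 10, 2, 10)

print(p_small_effect < 0.05)  # -> True
print(p_large_effect < 0.05)  # -> False
```

Under these assumptions, the 2-percentage-point difference crosses the 5% threshold while the 40-percentage-point difference does not, which is why effect sizes and confidence intervals need to be reported alongside the p-value.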
