Accounting for bias due to outcome data missing not at random: comparison and illustration of two approaches to probabilistic bias analysis: a simulation study

Abstract

Background: Bias from data missing not at random (MNAR) is a persistent concern in health-related research. A bias analysis quantitatively assesses how conclusions change under different assumptions about missingness using bias parameters which govern the magnitude and direction of the bias. Probabilistic bias analysis specifies a prior distribution for these parameters, explicitly incorporating available information and uncertainty about their true values. A Bayesian approach combines the prior distribution with the data's likelihood function whilst a Monte Carlo approach samples the bias parameters directly from the prior distribution. No study has compared a Monte Carlo approach to a fully Bayesian approach in the context of a bias analysis to MNAR missingness. Methods: We propose an accessible Monte Carlo probabilistic bias analysis which uses a well-known imputation method. We designed a simulation study based on a motivating example from the UK Biobank study, where a large proportion of the outcome was missing and missingness was suspected to be MNAR. We compared the performance of our Monte Carlo probabilistic bias analysis to a principled Bayesian probabilistic bias analysis, complete case analysis (CCA) and missing at random implementations of inverse probability weighting (IPW) and multiple imputation (MI). Results: Estimates of CCA, IPW and MI were substantially biased, with 95% confidence interval coverages of 7-64%. Including auxiliary variables (i.e., variables not included in the substantive analysis which are predictive of missingness and the missing data) in MI's imputation model amplified the bias due to assuming missing at random. With reasonably accurate and precise information about the bias parameter, the Monte Carlo probabilistic bias analysis performed as well as the fully Bayesian approach. However, when very limited information was provided about the bias parameter, only the Bayesian approach was able to eliminate most of the bias due to MNAR whilst the Monte Carlo approach performed no better than the CCA, IPW and MI. Conclusion: Our proposed Monte Carlo probabilistic bias analysis approach is easy to implement in standard software and is a viable alternative to a Bayesian approach. We caution careful consideration of choice of auxiliary variables when applying imputation where data may be MNAR.

Competing Interest Statement

TPM has received consultancy fees from: Bayer Healthcare Pharmaceuticals, Alliance Pharmaceuticals, Gilead Sciences, and Kite Pharmaceuticals. Since January 2023, ARC has been an employee of Novo Nordisk Research Centre Oxford, which is not related to the current work and had no involvement in the decision to publish. The remaining authors declare that they have no competing interests.

Funding Statement

This work was supported by the Bristol British Heart Foundation (BHF) Accelerator Award (AA/18/7/34219), the University of Bristol and Medical Research Council (MRC) Integrative Epidemiology Unit (MC_UU_00032/01, 02 & 05), the BHF-National Institute of Health Research (NIHR) COVIDITY flagship project, the John Templeton Foundation (61917), and the Wellcome Trust and Royal Society (215408/Z/19/Z). TPM was funded by the UKRI Medical Research Council, grant number MC_UU_00004/09. The computation work was carried out using the computational facilities of the Advanced Computing Research Centre, University of Bristol - http://www.bristol.ac.uk/acrc/.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

UK Biobank received ethical approval from the UK National Health Service's National Research Ethics Service (ref. 11/NW/0382). All participants provided written and informed consent for data collection, analysis, and record linkage. This research was conducted under UKB application number 16729.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The software code to generate the simulated datasets analysed during the simulated study are available in the COVIDITY_ProbQBA repository, https://github.com/MRCIEU/COVIDITY_ProbQBA. The UK Biobank study dataset analysed during the current study is available from the UK Biobank Access Management Team (https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/contact-us) but restrictions may apply to the availability of these data, which were used under license for the current study, and so are not publicly available. All methods discussed in this paper can be implemented using the provided software code available from the COVIDITY_ProbQBA repository, https://github.com/MRCIEU/COVIDITY_ProbQBA.

https://github.com/MRCIEU/COVIDITY_ProbQBA

留言 (0)

沒有登入
gif