Non-linear Mendelian randomization: detection of biases using negative controls with a focus on BMI, Vitamin D and LDL cholesterol

Negative control outcomes as an assessment of the potential risk of bias

For our negative control outcome analysis, we generated instrumental variables (IVs) for two exposures (Vitamin D and BMI) using a summed linear score across single nucleotide polymorphisms (SNPs), and performed conventional and non-linear MR on participant age and sex in the self-reported white British population in UK Biobank.
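As a rough illustration of the two ingredients named above, the sketch below builds a summed linear score from SNP allele counts and computes a conventional MR estimate as a ratio of regression slopes. It assumes a weighted score and a ratio (Wald) estimator; the text does not specify the weights or the estimator actually used, and the function names and toy genotypes are hypothetical.

```python
import statistics

def allele_score(genotypes, weights):
    """Summed linear score: for each participant, sum of weight * allele count."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def slope(x, y):
    """Ordinary least squares slope from regressing y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ratio_estimate(score, exposure, outcome):
    """Ratio (Wald) estimate: score-outcome slope over score-exposure slope.

    This is one common conventional MR estimator, assumed here for illustration.
    """
    return slope(score, outcome) / slope(score, exposure)

# Toy data (hypothetical): 4 participants, 3 SNPs coded as 0/1/2 allele counts.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0]]
weights = [0.5, 1.0, 2.0]
scores = allele_score(genotypes, weights)
```

For a valid negative control outcome such as age or sex, the score-outcome slope, and hence the ratio, is expected to be close to zero.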

In total, we included 351,005 participants in our analysis using Vitamin D as an exposure and 383,793 participants in our analysis on BMI. Demographics of the population are in Supplementary Table S4.

Conventional MR estimates are close to the null

Conventional MR estimates of the effect of Vitamin D on age (beta per nmol/L: 0.0028; 95% CI -0.004 to 0.009, p = 0.39) and sex (OR per nmol/L 1.002; 95% CI 1.000 to 1.003, p = 0.06) were close to the null. MR estimates of the effect of BMI on age (beta per kg/m2: -0.019; 95% CI -0.060 to 0.023, p = 0.37) and sex (OR per kg/m2 1.01; 95% CI 1.00 to 1.02, p = 0.04) were also close to the null.

We also calculated the effect of BMI and Vitamin D on age in males and females separately, and the effect of BMI and Vitamin D on sex in deciles of age. These are reported in Supplementary Table S5. Allowing for the play of chance, estimates were similar across subgroups in these analyses.

Non-linear MR estimates differ substantially across strata

We then used both the residual and doubly-ranked methods to generate stratum-specific estimates of the effect of each exposure on age and sex in univariable analyses. We chose to use ten strata, although results were similar using a different number of strata (data not shown).
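The two stratification schemes can be sketched as follows, assuming their usual descriptions: the residual method forms strata from quantiles of the exposure after regressing out the IV, while the doubly-ranked method first ranks participants by IV into pre-strata of size k and then ranks by exposure within each pre-stratum. This is a minimal sketch of stratum assignment only, not the authors' code; function names are hypothetical and published implementations may handle ties and incomplete pre-strata differently.

```python
import statistics

def residual_strata(iv, exposure, k):
    """Assign strata 1..k by quantiles of the exposure residualised on the IV."""
    mi, me = statistics.fmean(iv), statistics.fmean(exposure)
    beta = (sum((z - mi) * (x - me) for z, x in zip(iv, exposure))
            / sum((z - mi) ** 2 for z in iv))
    # Subtracting beta * iv removes the part of the exposure predicted by the IV;
    # the intercept is omitted because it does not change the ordering.
    resid = [x - beta * z for z, x in zip(iv, exposure)]
    order = sorted(range(len(resid)), key=resid.__getitem__)
    strata = [0] * len(resid)
    for rank, i in enumerate(order):
        strata[i] = rank * k // len(resid) + 1
    return strata

def doubly_ranked_strata(iv, exposure, k):
    """Rank by IV into pre-strata of size k, then rank by exposure within each."""
    by_iv = sorted(range(len(iv)), key=iv.__getitem__)
    strata = [0] * len(iv)
    for start in range(0, len(by_iv), k):
        # A final incomplete pre-stratum simply fills the lowest strata first.
        block = sorted(by_iv[start:start + k], key=exposure.__getitem__)
        for pos, i in enumerate(block):
            strata[i] = pos + 1
    return strata
```

A stratum-specific estimate would then be obtained by running the IV analysis separately within each stratum label.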

In contrast to the null estimate across the whole cohort, estimates differed markedly across strata using either method (Fig. 2). Focussing on the residual method, for the predicted effect of Vitamin D on age (Fig. 2A), effect estimates were positive in lower strata and decreased towards the null moving from lower to upper strata. However, the uppermost stratum had a positive effect estimate. For BMI on age (Fig. 2C), we saw a similar trend, with positive effect estimates in lower strata and negative effect estimates in higher strata.

Fig. 2

The predicted causal effect of Vitamin D on age (A), sex (B) and BMI on age (C) and sex (D). Estimates were generated in each stratum using the residual method (blue) and the doubly-ranked method (red) and were unadjusted for covariates. The black estimate represents the conventional MR estimate generated using the whole cohort

For the effect of BMI on sex (Fig. 2D), we saw positive estimates in lower strata and negative estimates in upper strata, in line with a previous similar analysis we have performed, although that analysis used different parameters, including a different instrumental variable [21]. In short, the estimates would suggest that BMI increased the odds of being male in those with the lowest non-genetic BMI and increased the odds of being female in upper strata. For the effect of Vitamin D on sex (Fig. 2B), we observed null effects in lower strata but modest positive effects in upper strata, suggesting that Vitamin D increases the odds of being male in those with the highest non-genetic Vitamin D.

For the doubly-ranked method, we identified largely null estimates for the effect of Vitamin D on age (Fig. 2A). For the effect of Vitamin D on sex (Fig. 2B), effect estimates were similar between the residual and doubly-ranked methods, with increasingly positive effect estimates when moving from lower to upper strata. For the effect of BMI on age (Fig. 2C), we saw similar estimates to the residual method, with positive estimates in lower strata and negative estimates in upper strata. For the effect of BMI on sex (Fig. 2D), we saw marked non-null estimates with a similar trend but, in general, estimates closer to the null for the doubly-ranked relative to the residual method. We calculated Cochran’s Q to formally assess the heterogeneity of stratum-specific estimates. These results are reported in Supplementary Figure S6. There was evidence of heterogeneity between stratum-specific estimates (extremely strong in many cases) across all negative control outcome analyses except the analyses of Vitamin D on age and sex using the doubly-ranked method.
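For readers unfamiliar with the heterogeneity statistic, a minimal sketch of Cochran's Q under its standard inverse-variance definition is given below. The function name is hypothetical, and no values from the paper are used.

```python
def cochran_q(estimates, std_errors):
    """Cochran's Q for a set of stratum-specific estimates.

    Returns (Q, degrees of freedom, inverse-variance weighted pooled estimate).
    Under homogeneity, Q approximately follows a chi-square distribution
    with len(estimates) - 1 degrees of freedom.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return q, len(estimates) - 1, pooled
```

A p value can then be read from the upper tail of the chi-square distribution, for example with `scipy.stats.chi2.sf(q, df)` if SciPy is available.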

Estimates were similar in analyses adjusted for covariates: age (for sex as an outcome), sex (for age as an outcome) and the first 5 genetic principal components (Supplementary Figure S1). Log-transforming Vitamin D (as recommended by the creators of the residual method) brought estimates from the residual model closer on average to the doubly-ranked model (Supplementary Figure S2), although these were still non-null across many strata.

For our BMI analysis, we then stratified by smoking status (as was performed in a previous non-linear MR analysis of BMI, producing the headline findings from the paper [17]) and re-ran analyses (Supplementary Figure S3). Smoking had been shown to have a bidirectional relationship with BMI [41] before the non-linear BMJ MR paper was published [17], and therefore it should have been clear that this could have produced collider bias [22]. In these analyses, stratification on smoking made a considerable difference to some stratum-specific estimates, although the general shape of the association remained similar.

To summarise, using both the residual and doubly-ranked methods, we identified non-null, stratum-specific associations between two exposures (Vitamin D and BMI) and two negative control outcomes (sex and age) across strata of the exposure, where the expected result is null.

LDL cholesterol and myocardial infarction

To examine both methods in a scenario where we anticipate the shape of the causal relationship, we examined the association between LDL-C and a key outcome: myocardial infarction. RCT data from > 30 trials have demonstrated strong and broadly linear effects of LDL-C reduction on this outcome, with meta-analyses identifying slightly larger estimates of the effect of more intensive therapy in those with higher levels of LDL-C at baseline, while a recent MR analysis identified the opposite effect (increased effectiveness of LDL-C reduction in those with lower LDL-C [29]; see Box 1 for further background).

First, as with the analyses presented above, we performed negative control outcome analyses of the effect of increasing LDL-C on age and sex. In conventional MR, estimates were essentially null for age (beta on age per 1 mmol/L increase in LDL-C -0.10; 95% CI -0.23 to 0.03, p = 0.15), but suggested increased LDL-C was ‘causal’ for being more female (OR 0.95; 95% CI 0.92–0.98, p = 0.004). Conventional MR estimates of the effect of LDL-C on age were similar in men and women (Supplementary Figure S6), although there was some evidence of heterogeneity of MR estimates of LDL-C on sex across age strata (p value for heterogeneity 0.03).

Compared to conventional MR estimates, divergence from the null was much greater in non-linear MR (Fig. 3), with estimates on age as high as 5.74 (95% CI 1.79–9.63) years per 1 mmol/L LDL-C increase in one stratum from the doubly-ranked method, a ~50-fold increase in bias compared to the conventional MR estimate. In contrast to our NCO analyses above, the bias was more extreme for the doubly-ranked method than the residual method.

Fig. 3

The predicted causal effect of increased LDL-C on A age and B sex. Estimates were generated in each stratum using the residual method (blue) and the doubly-ranked method (red) and were unadjusted for covariates

We then went on to perform MR of the effect of increased LDL-C on myocardial infarction. For our primary analysis we included both incident and prevalent cases of MI; the effects on incident and prevalent MI separately are shown in Supplementary Figure S4. In conventional MR we saw the expected effect of LDL-C (OR for MI 1.74 per mmol/L increase in LDL-C; 95% CI 1.61–1.87). However, in non-linear MR (NLMR) we saw unexpected effects, particularly with the doubly-ranked method. For the effect on MI (Fig. 4), we saw large differences in effects across strata, with the strongest effect in those with the lowest LDL-C, and the weakest effects in strata 5, 6, and 8.

Fig. 4

The estimated causal effect of increased LDL-C on MI. Estimates were generated in each stratum using the residual method (blue) and the doubly-ranked method (red) and were unadjusted for covariates

For the residual method, the effect of LDL-C on MI was broadly stable and positive (although the trend still suggested a reduction in the size of the effect across strata, p = 0.02). When running analyses adjusting for age, sex, and the first 5 principal components, effects were similar but less extreme for MI (Supplementary Figure S5), although there was still a clear negative trend (p = 0.001 using the doubly-ranked method), with effect estimates much larger in those with lower LDL-C than in those with higher LDL-C.

We ran sensitivity analyses for LDL-C on MI that included a) adjusting for statin use as a covariate, b) stratifying into statin users and non-statin users, and c) restricting to under-50s, in whom statin use was rare (5.1%) (Supplementary Figure S6). We recognise that the estimates adjusting for statin use are highly likely to be affected by collider bias, but include them for interest. As expected, stratifying or adjusting for statin usage altered estimates dramatically. When adding statin use as a covariate, estimates of LDL-C favoured protection in upper and lower strata, but were null in the middle strata (u-shaped). When analyses were restricted to those under 50, estimates were similar to our primary analyses but less precise. Similarly, analyses restricted to non-statin users looked similar to our primary analyses. Analyses in statin users had reversed estimates in most strata, with increased LDL-C associated with reduced risk of MI.

Triglycerides and cancer mortality

In the recent paper on non-linear effects of lipids [29], one analysis focussed on the effect of triglycerides (TG) on cancer mortality. The overall effect was null in both univariable and multivariable MR, but the authors report extremely implausible results: a strong positive effect in stratum 1 (of ten), with an OR of 2.57 per mmol/L increase in TG (95% CI 1.67 to 3.96), but then a strong negative effect in stratum 2, with an OR of 0.56 (95% CI 0.39 to 0.83). All other stratum-specific estimates are close to the null. We report these estimates in Fig. 5. These results are especially implausible given the known variability in measurements of triglycerides [46, 47]. Therefore we aimed to replicate this as closely as possible, using the same dataset, exposure, outcome, and covariates. In our analysis (Fig. 6), we were unable to replicate this finding, despite the correlation between our stratum-specific mean triglyceride levels and theirs being > 0.99, suggesting we are analysing the same strata.

Fig. 5

Estimates of the effect of triglycerides on cancer mortality from Yang et al. [29]. Strata specific estimates generated using the doubly-ranked method [29]

Fig. 6

Estimates of the effect of triglycerides on cancer mortality using both the doubly-ranked (red) and residual (blue) methods. Estimates were generated adjusting for age, sex, age * sex, age * age, age * age * sex, and the first ten principal components

To further assess whether the prior analysis was unreliable, we used the repeat blood sampling data from UK Biobank, which were taken approximately 2 years after the original visit. Data on triglyceride levels at both time points were available for 13,535 participants. Correlation of TG measures between the two time points was moderate (Pearson’s R 0.60). As expected, when generating strata of TG using the doubly-ranked method on the original and repeat samples, participants were often not classified in the same stratum. In fact, only 37% of those participants originally in stratum 1 remained in stratum 1, with 22% of them in stratum 2, and the rest in higher strata. Of those participants originally in stratum 2, only 20% remained in stratum 2, with 22% now in stratum 1, and the rest in higher strata. These results are visualised in the alluvial plot below (Fig. 7).

Fig. 7

Classification of 13,535 participants who have repeat measurements into doubly-ranked strata based on triglyceride levels. The left hand Y axis represents the original strata, the right hand Y-axis the strata at repeat sampling
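The reclassification summarised above amounts to a row-normalised cross-tabulation of stratum labels at the two visits. A minimal sketch, assuming each participant has one stratum label per visit (the function name and toy labels are hypothetical):

```python
from collections import Counter

def transition_proportions(baseline, repeat):
    """Proportion of each baseline stratum that falls in each repeat stratum.

    Returns a dict keyed by (baseline_stratum, repeat_stratum); proportions
    sum to 1 within each baseline stratum.
    """
    counts = Counter(zip(baseline, repeat))
    totals = Counter(baseline)
    return {(a, b): n / totals[a] for (a, b), n in counts.items()}

# Toy labels (hypothetical): six participants, strata at the two visits.
props = transition_proportions([1, 1, 1, 1, 2, 2], [1, 1, 2, 3, 2, 1])
```

The entry `props[(1, 1)]` corresponds to the proportion of participants who start in stratum 1 and remain there at repeat sampling.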

This variability is not consistent with the reported estimates from Yang et al. [29], which would suggest that those in stratum 1 are at greatly increased risk of cancer mortality, and those in stratum 2 at greatly reduced risk, with no effect elsewhere. If this were the case, more than half of the participants in the lower two strata would dramatically change their risk of cancer mortality over 2 years, with some estimates flipping from greatly increased risk to protective. To present analyses which implicitly assume this is possible is simply not credible.
