Using the global randomization test as a Mendelian randomization falsification test for the exclusion restriction assumption

Overview of the randomization test procedure

The global randomization test approach as presented in [15] has the following steps:

1.

Define set of covariates to test—this depends on the specific scenario (see applied examples).

2.

Calculate the test statistic T, the Mahalanobis distance (with values \(\left[ \right))\), which is a global measure of balance and bias across all covariates tested.

3.

Permute the genetic IV Np times and for each calculate the test statistic t, where Np is specified by the researcher. We use Np = 5000 in our simulations and applied examples below.

4.

Calculate the P value as the proportion of permutations with a test statistic t at least as strong as T, i.e. \(\left| \right|/}_\).

We generalize the approach in [15] (that focused on binary IVs) to continuous, ordinal and binary IVs, as described in the following section.

Generalising the Mahalanobis distance to allow continuous, ordered categorical and binary variables

We use the Mahalanobis distance as a global measure of balance defined for an IV with two categories as:

$$} = \sqrt _ - \bar_ } \right)^ \left[ \right]^ \left( _ - \bar_ } \right)}$$

where C is a \(m \times n\) matrix of m participants and n covariates, and \(\bar_\) is a vector containing the mean of the covariates for the subset of participants where z = a.

Since MD is affinely invariant, this is also a global measure of bias (i.e. changing prevalence difference \(\bar_ - \bar_\) to bias measure \(\frac_ - \bar_ }}_ - \bar_ }}\) in the above equation, where X is the exposure, would result in the same MD value).

To generalize to IVs with three categories (i.e., SNP dosages) and continuous IVs (i.e., genetic risk scores), we generalize this equation to:

$$} = \sqrt \left[ \right]^ meandiff\left( C \right)}$$

where meandiff(C) is a vector of length n, of the mean difference of each covariate per 1 unit higher IV. This assumes a linear relationship between the IV and covariates.

We estimate the mean difference using the correlation between z and each covariate \(}_\):

$$meandiff\left( }_ } \right) = }\left( },}_ } \right) \times \frac}_ } \right)}}$$

where cor(A, B) is the Pearson’s r2 between A and B, and SD(X) is the standard deviation of X. This approach also assumes a linear relationship between the IV and covariates but is ~12 times faster than estimating the mean difference using linear regression (see Supplementary section S1 for further details). This is particularly helpful for this work as the randomization test uses permutation testing and we conduct extensive simulations. Note that the MD is invariant to the scale of covariates in C, but not invariant to the scale of Z, and the resultant global randomization test P value is independent of both.

Simulations

We conduct simulations to explore scenarios where the global randomization test may be useful when testing for horizontal pleiotropy and selection bias. We conduct separate simulations to test for selection bias and horizontal pleiotropy, illustrating how the covariates can be chosen to test each scenario. We report the aims, data‐generating mechanisms, estimands, methods, and performance measures of our simulations (the ADEMP approach) [23].

Simulation A: Assessing statistical power to detect potential selection bias

This simulation is based on the situation where a researcher wants to determine whether the GRS relates to covariates that are unlikely to be downstream effects of the GRS, such that an association would indicate possible selection bias. Traits that are downstream effects of a GRS and affect selection cannot be assessed here due to their association with the GRS irrespective of selection bias. Furthermore, it is necessary to assume that the genetic determinants of the covariates causing selection are different to, and uncorrelated with, those of the exposure.

Aim: To compare statistical power of the global randomization test compared to alternative tests that test each covariate individually, across different (1) number of covariates that affect selection, (2) number of covariates than do not affect selection, and (3) correlations between covariates.

Data generating mechanism: The directed acyclic graph (DAG) on which our data generating mechanism is based is shown in Fig. 2a. We set the proportion selected to 5.5%, based on the UK Biobank recruitment rate where 9.2 million invited and 5.5% of those joined the study. We use a sample size of 920,000, 10% of the number invited in UK Biobank to keep the simulation manageable [24]. A set of covariates C including those affecting selection \(C_\) and those not affecting selection \(C_}}\) are included. We vary the number of covariates affecting selection \(N_\), the number of covariates not affecting selection \(N_}}\), the correlation between all variables in C, \(r_^\), the variance of X explained by Z, \(r_^\), and the pseudo variance of S explained by Cs and X, \(r_^\). We use a fully factorial design, running our simulation with all combinations of the following values: \(N_ \in\) , \(N_}} \in\) , \(r_^\) \(\in\) \right)\)}, \(r_^ \in \left\ \right\}\), and \(r_^ \in \. For the \(r_^ = N\left( \right)\) setting the covariate correlations are generated from a normal distribution with mean = 0 and standard deviation (SD) = 0.1, to reflect correlations reported previously [25]. All covariates in C, and exposure X are continuous with mean = 0 and SD = 1. Instrument Z is assumed to be a normally distributed genetic risk score with mean = 0 and SD = 1.

Fig. 2figure 2

DAGs for simulation data generating mechanisms. a Selection bias, b Horizontal pleiotropy. DAG (a): Covariates \(C_\) and \(C_}}\) are confounders of X and Y. Covariates \(C_\) and exposure X affect selection (S) inducing an association between instrument Z and \(C_\). X, \(C_\) and \(C_}}\) may affect Y but effects on Y do not impact associations between Z and \(C_\) tested by global randomization test. The total effect of the following paths on the DAG is kept constant irrespective of the number of covariates in \(C_\) and \(C_}}\): \(C_ \to X\); \(C_ \to Y\); \(C_ \to S\); \(C_}} \to X\); \(C_}} \to Y\). Dashed line indicates a statistical association induced through conditioning on S. DAG (b): Covariates \(C_\) and \(C_ }}\) are confounders of X and Y. In this DAG we depict a horizontally pleiotropic instrument that affects Y both via and not via X. While this isn’t necessarily the case (i.e. Zhp might affect Y via X only, or directly (i.e. not via X)) here we are showing an example—the exact relationship between the instruments and X and Y does not impact the randomization test because the randomization test only tests the association between each instrument and the covariate set, and the relationships with X and Y do not impact the strength of these associations (unlike in the selection bias example where e.g. the effect of Z on X impacts the magnitude of the selection induced association between Z and the covariate set).

A DAG with simulation parameters is shown in Supplementary Figure 1a. The outcome Y is not modelled as, according to our DAG in Fig. 2a, the association between the SNPs and covariates induced by conditioning on selection does not depend on our definition of Y.

Estimand or other target: Our target is the test of the null hypothesis of no association between the GRS and covariate set. Note that in this scenario (in contrast to simulation B, see below) we use the randomization test with the GRS as we assume (see DAG in Fig. 1a) the exposure of interest affects selection, hence selection bias would impact all instruments jointly (and testing using the GRS rather than SNPs individually maximizes statistical power).

Methods: We compare the global randomization test with 3 alternative approaches to test the association of the IV with C:

a.

individual tests of each covariate with Bonferroni correction, referred to as test-Bonf.

b.

individual tests of each covariate with correction for the effective number of tests performed, referred to as test-indep.

c.

permutation testing where the test statistic is the maximum r2 of each of the covariates with the IV, referred to as test-r2perm.

To calculate (a) and (b) we first regress the IV on each of the covariates using univariable regression, and find the lowest P value of these results, pvaluemin. The test-Bonf P value is then calculated as \(\min \left( \times }_ } \right)\). The P value for test-indep in calculated as \(}\left( \times }_ } \right)\), where NI is the estimated effective number of independent tests calculated using spectral decomposition [26, 27]. Spectral decomposition estimates the effective number of independent tests from the phenotype correlation matrix, and has been shown to be accurate (compared to the more computationally intensive permutation testing approach) [28].

Performance measures: We evaluate statistical power using rejection percentage [23], which is the proportion of simulation repetitions, \(n_\), where the null hypothesis is rejected (see further details in Supplementary section S3). We set \(n_\) = 500 in all our simulations. The four tests (global randomization test, test-Bonf, test-indep and test-r2perm) are applied to the same simulated dataset in each simulation repetition.

We repeated these simulations, including only half the covariates in \(C_\) and \(C_}}\) to represent the scenario where only a subset of these covariates is either available or hypothesized to affect selection. We also repeated these simulations in the whole sample (i.e., with no selection) to check that we observe ~5% type-1 error, i.e., around 5% of the permutations incorrectly identify an association between the IV and covariate set.

Simulation B: Assessing statistical power to detect potential horizontal pleiotropy

This simulation is based on the situation where a researcher has a GRS but wants to determine whether any of the SNPs included may affect the outcome via horizontally pleiotropic pathways rather than (solely) via the exposure of interest.

Aim: To compare statistical power of the global randomization test with alternatives that test the association of a SNP with each covariate individually, to identify whether the SNP acts via a horizontally pleiotropic pathway. We evaluate this across different: (1) numbers of covariates affected and not affected by a horizontally pleiotropic SNP, and (2) magnitude of effect of a horizontally pleiotropic SNP on covariates on the horizontal pleiotropy pathway.

Data generating mechanisms: The DAG on which our DGM is based is shown in Fig. 2b. We use a sample size of 500,000 reflecting the size of UK Biobank. We include one horizontally pleiotropic SNP, \(Z_ ,\) and one non-horizontally pleiotropic \(Z_ }}\). We generate a set of covariates C including those affected by \(Z_\), and not affected by \(Z_\), denoted \(C_\) and \(C_ }}\), respectively. We vary the number of covariates in \(C_\) and \(C_ }}\), denoted \(N_\) and \(N_ }}\), the variance of each covariate in \(C_\) explained by \(Z_\), \(r_^\), and the correlation between all variables in C, \(r_^\). We use a fully factorial design where \(N_ \in\) , \(N_ }} \in\) , \(r_^ \in\) , and \(r_^\) \(\in\) }\left( \right)\)}. The exposure X and outcome Y are not modelled as, according to our DAG in Fig. 2b, the association between the SNPs and covariates are not dependent on X or Y.

All covariates in C are continuous with mean = 0 and SD = 1. Each SNP is a 3-category ordinal variable (representing SNP dosages) assuming allele frequencies of 0.8 and 0.2 and assuming Hardy Weinberg Equilibrium (such that Pdosage0 = 0.64, Pdosage1 = 0.32, and Pdosage2 = 0.04). A DAG with simulation parameters is shown in Supplementary Figure 1b.

Estimand or other target: Our target is the test of the null hypothesis of no association between each SNP in \(Z_\) and the covariates. Note that in this scenario (in contrast to simulation A, see above) we use the randomization test with the SNPs individually, as we are seeking to identify which SNPs may affect the outcome via horizontally pleiotropic pathways.

Methods: As in simulation A, we compare 4 approaches to test the association of the IV with C: the global randomization test, test-Bonf, and test-indep and test-r2perm.

Performance measures: We evaluate statistical power using rejection percentage [23].

We also estimated these performance measures using \(Z_ }}\) to check that we observe ~5% type-1 error, i.e., around 5% of the permutations incorrectly identify an association between \(Z_ }}\) and C.

Applied examplesStudy population

UK Biobank is a prospective cohort of 503 325 men and women in the UK aged between 37 and 73 years (99.5% were between 40 and 69 years). This cohort includes a large and diverse range of data from blood, urine and saliva samples and health and lifestyle questionnaires. UK Biobank received ethical approval from the UK National Health Service’s National Research Ethics Service (ref 11/NW/0382). This research was conducted under UK Biobank application number 16729, using phenotypic dataset ID 48196.

Of the 463,005 UK Biobank participants with genetic data passing quality control [29], we removed 77,758 minimally related participants, 48,233 non-white British participants, and 39 participants who had since withdrawn from the study. Our sample therefore included 336,975 participants. A data flow diagram is provided in Supplementary Figure 2.

Example 1: Testing for evidence of selection bias

We assess the potential for selection bias in Mendelian randomization studies in UK Biobank that use C-reactive protein (CRP) as the exposure of interest. A previous GWAS meta-analysis (that did not include UK Biobank) identified 58 SNPs robustly associated with CRP [30]. We generate the CRP GRS as a weighted sum of the 58 SNPs, weighted by the effect size of the CRP-increasing allele of each SNP on CRP.

We use two sets of covariates, a restricted set and a more liberal set. The restricted set comprises just age and sex—two factors that cannot be on the causal path between the (constituent SNPs of the) CRP GRS and outcome. The liberal set were chosen as phenotypes that, given our exposure of interest (CRP) we believe are not likely to either be on the causal pathway between the CRP GRS and CRP, determinants of SNPs in CRP GRS, or downstream effects of CRP (or CRP GRS), such that if an association exists between this set and the CRP GRS we would think it is more likely that this is due to selection bias rather than a causal effect of one or more of the CRP SNPs on (one or more of) these traits. This second set additionally includes socio-economic factors (Townsend deprivation index, age completed full time education), north and east coordinates of home location, and height [31]. Age, sex, home location and education were self-reported at baseline. Sex was validated against genetic sex. Height was measured at baseline, to the nearest cm using a Seca 202 device. The participant’s age when completing full time education was used as a measure of education level. Townsend deprivation index (a score representing the deprivation of the participant’s neighbourhood) was calculated immediately prior to participants joining UK Biobank using their self-reported postcode of residence. This gave 7 variables included in the liberal covariate set.

We ran the global randomization test and alternative approaches to test for an association of the CRP GRS with the restricted and liberal covariate sets, respectively. We also repeated these analyses using the rs2794520 cis CRP SNP only, to explore detection of selection bias using the SNP set (in this case just one SNP) that is unlikely to be horizontally pleiotropic. SNP rs2794520 was used as it is only cis CRP SNP among the 58 independent SNPs identified in the GWAS (and used in the GRS above).

Example 2: Testing for evidence of horizontal pleiotropy among CRP SNPs

We used the global randomization test to identify CRP-associated SNPs that may have horizontally pleiotropic effects on coronary heart disease (CHD). We formed our covariate set using a previous study [10] that found little evidence of an association of cis CRP SNPs with a set of CHD risk factors. These risk factors are therefore unlikely to be on the causal pathway between CRP levels and CHD, such that associations with other CRP-associated SNPs (or a combined CRP GRS) would be most likely due to this SNP being horizontally pleiotropic. These risk factors can therefore be used as the covariate set in the randomization test, to test for evidence of horizontal pleiotropy.

Our covariate set comprised the subset of these phenotypes that were measured in the full UKB sample (e.g., some such as LDL cholesterol were only available in the NMR metabolomics UKB subsample), namely: BMI, systolic blood pressure (SBP), diastolic blood pressure (DBP), total cholesterol, HDL cholesterol, apolipoprotein A1, apolipoprotein B, albumin, lipoprotein A, leukocyte count, glucose, smoking pack years, weight and waist hip ratio. Details of the covariates are provided in Supplementary section S4. CHD events were ascertained using both self-reported data and linkage to mortality data and hospital inpatient records (see Supplementary section S5 for further details).

We estimated the causal effect of CRP on CHD using two-stage IV probit regression, first using all CRP SNPs and then using only those not identified as horizontally pleiotropic, using a nominal threshold of p < 0.05. We use log transformed CRP levels (mg/L) such that the IV probit regression estimate is the difference in probit index per 1 unit higher log CRP levels. We then took the exponent of 1.6 times the probit regression estimates, to approximate the association in terms of the change of odds per 1 unit higher log CRP levels [32]. We repeated analyses using a threshold of p < 0.001, to assess the sensitivity of results to the stringency of SNP selection.

Analyses were performed in R version 4.0.3, Stata version 15 and Matlab r2015a, and all of our analysis code are available at https://github.com/MRCIEU/MR-randomization-test. Git tag v0.2 corresponds to the version of the analyses presented here.

留言 (0)

沒有登入
gif