Rare variants at KCNJ2 are associated with LDL-cholesterol levels in a cross-population study

Discovery cohort

Study subjects were 1751 twins (97% female; Supplementary Table 1) from the TwinsUK cohort, a UK national register of adult volunteer twins unselected for any disease or trait6. The St. Thomas’ Hospital Research Ethics Committee approved the study, and all twins provided informed written consent.

LDL-cholesterol data

Longitudinal plasma measurements of HDL-cholesterol (HDL-C), triglycerides (TG) and total cholesterol (TC) were available for TwinsUK subjects. Fasting plasma levels were measured on a Cobas Fara analyser (Roche Diagnostics, Lewes, UK), and TC, HDL-C and TG were determined by a colorimetric enzymatic method24. A median of four longitudinal measures was available per subject (maximum: eight). The median duration of follow-up was 5 years (interquartile range: 2.87–7.51). Longitudinal data on medication usage indicated that 292 subjects were using cholesterol-lowering drugs at the time of lipid measurement. To account for the effect of these drugs, the measured total cholesterol of these subjects was divided by 0.8, as previously suggested25. LDL-C values were estimated from the adjusted TC, HDL-C and TG levels using the Friedewald equation26. Longitudinal LDL-C data were averaged, inverse-normal transformed, and adjusted for sex, age and BMI by linear regression (lm function in the stats R package, v. 3.4.2). No outliers (individuals with any trait value more than four standard deviations from the dataset mean) were detected.
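The following Python sketch illustrates the per-visit LDL-C derivation, the medication adjustment and the transformation steps described above. It rests on three assumptions. First, lipid concentrations are in mmol/L, so the Friedewald TG term is TG/2.2 (TG/5 would apply in mg/dL). Second, the inverse-normal transform uses the (r − 0.5)/n rank offset. Third, the covariate regression on sex, age and BMI is omitted. The function names and the toy visit data are hypothetical and are not taken from the study's code.

```python
import statistics
from statistics import NormalDist

# Assumption: concentrations in mmol/L, so the TG term of the Friedewald
# equation is TG / 2.2 (it would be TG / 5 in mg/dL).
LIPID_LOWERING_FACTOR = 0.8  # treated subjects: measured TC divided by 0.8


def friedewald_ldl(tc, hdl, tg, on_lipid_lowering=False):
    """Estimate LDL-C for one visit from TC, HDL-C and TG (mmol/L)."""
    if on_lipid_lowering:
        tc = tc / LIPID_LOWERING_FACTOR  # approximate untreated TC
    return tc - hdl - tg / 2.2


def inverse_normal(values, offset=0.5):
    """Rank-based inverse-normal transform: Phi^-1((r - offset) / (n - 2*offset + 1)).

    offset=0.5 gives (r - 0.5) / n. Tied values are ranked in input order
    (a simplification; ties would usually receive average ranks).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    nd = NormalDist()
    return [nd.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


def outlier_indices(values, k=4.0):
    """Indices of values more than k standard deviations from the mean."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sd]


# Usage with hypothetical data: (TC, HDL-C, TG, on lipid-lowering drugs) per visit.
visits = {
    "twin_a": [(5.0, 1.2, 1.1, False), (5.4, 1.3, 1.3, False)],
    "twin_b": [(4.0, 1.0, 1.1, True)],
    "twin_c": [(6.1, 1.5, 2.0, False)],
}
# Average each subject's visit-level LDL-C, then transform across subjects.
mean_ldl = {sid: statistics.fmean(friedewald_ldl(*v) for v in vs) for sid, vs in visits.items()}
ldl_int = dict(zip(mean_ldl, inverse_normal(list(mean_ldl.values()))))
```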

Whole-genome sequencing and quality control

Whole-genome sequencing was carried out at Human Longevity, Inc. (HLI). Details on sample preparation, library preparation, clustering and sequencing have been reported elsewhere27. Reads were mapped to the UCSC GRCh37 human reference genome, and variants were called using the Illumina Isaac Analysis Software28 (v. 2.5.26.13). SNV calls that did not pass all of Isaac’s filters (see the Isaac Whole Genome Sequencing user guide for details: https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/basespace/isaac-wgs-user-guide-15050954b.pdf) were discarded as low-quality calls. Multiallelic sites were removed, and missing genotypes were assumed to be homozygous reference calls. We further removed SNVs with a call rate <95% and/or a Hardy–Weinberg equilibrium (HWE) deviation P < 1 × 10⁻⁹, as calculated with PLINK29 (v. 1.9). Finally, we removed variants mapping to the sex chromosomes, resulting in 58,461,543 biallelic SNVs.
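The SNV-level call-rate and HWE filters above can be sketched in Python as follows. The exact HWE test follows the Wigginton et al. (2005) algorithm, which is the approach behind PLINK's exact HWE test. PLINK's own implementation handles floating-point ties slightly differently, so P-values very close to the cut-off may not match exactly. The sketch assumes genotypes coded 0/1/2 with None for a missing call, and assumes that multiallelic and sex-chromosome sites were dropped upstream. Computing the call rate on the raw calls, before missing genotypes are set to homozygous reference, is also an assumption about step order.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Exact two-sided HWE P-value (Wigginton et al. 2005 algorithm)."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))  # rarer / commoner homozygote
    rare = 2 * n_homr + n_het  # copies of the rarer allele
    n = n_het + n_homr + n_homc
    if n == 0:
        return 1.0
    probs = [0.0] * (rare + 1)
    # Start from the most likely heterozygote count (same parity as `rare`).
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = total = 1.0
    # Unnormalised probabilities below the mode ...
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets > 1:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    # ... and above it.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2.0) * (hets + 1.0))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    # Sum the probabilities of all configurations no more likely than the observed one.
    p_obs = probs[n_het]
    return min(1.0, sum(q for q in probs if q <= p_obs) / total)


def keep_snv(genotypes, min_call_rate=0.95, hwe_p_min=1e-9):
    """Call-rate and HWE filters for one biallelic autosomal SNV (0/1/2/None).

    Assumption: the call rate is computed on the raw calls, before missing
    genotypes are set to homozygous reference.
    """
    called = [g for g in genotypes if g is not None]
    if not called or len(called) / len(genotypes) < min_call_rate:
        return False
    n_ref, n_het, n_alt = (called.count(k) for k in (0, 1, 2))
    return hwe_exact_p(n_het, n_ref, n_alt) >= hwe_p_min
```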

To investigate population structure, we used PLINK to carry out a principal component analysis (PCA) on the genomic data. We used all common high-quality variants (i.e., zero missingness and MAF > 5%), pruned with PLINK (option: --indep-pairwise) using a window size of 50 kb, a shift of 5 variants and an r² threshold of 0.05. No significant population stratification was observed (Supplementary Fig. 3).
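The following simplified Python sketch illustrates the variant selection and LD pruning that precede the PCA. It uses a greedy, position-ordered scheme: a common variant is kept only if its r² with every already-kept variant within 50 kb is at most 0.05. This is not PLINK's exact --indep-pairwise procedure, which evaluates pairs inside a sliding window advanced by a variant-count step. The sketch also assumes complete 0/1/2 dosages for a single chromosome, and the PCA itself is not shown.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic vector: correlation undefined
    return sxy * sxy / (sxx * syy)


def greedy_ld_prune(variants, window_bp=50_000, r2_max=0.05, maf_min=0.05):
    """Return positions of common variants kept after greedy LD pruning.

    `variants`: list of (position, dosages) for one chromosome, no missing calls.
    Simplified greedy scheme, not PLINK's exact --indep-pairwise algorithm.
    """
    kept = []
    for pos, dos in sorted(variants, key=lambda v: v[0]):
        af = sum(dos) / (2 * len(dos))
        if min(af, 1 - af) <= maf_min:
            continue  # keep only common variants (MAF > 5%)
        nearby = [d for p, d in kept if pos - p <= window_bp]
        if all(r_squared(dos, d) <= r2_max for d in nearby):
            kept.append((pos, dos))
    return [p for p, _ in kept]


# Toy dosages: g and h are uncorrelated; g appears twice within 1 kb.
g = [0, 1, 2] * 30
h = ([0] * 3 + [1] * 3 + [2] * 3) * 10
print(greedy_ld_prune([(1_000, g), (2_000, g), (10_000, h), (100_000, g)]))  # [1000, 10000, 100000]
```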

Single-point association testing

SNVs with MAF ≥ 1% were tested for association with plasma LDL-C levels (adjusted for sex, age and BMI) using the linear mixed model implemented in GEMMA30 (v. 0.97), which uses the matrix of expected genetic sharing to model the non-independence of the twin data. Manhattan and Q–Q plots were generated using the ggplot2 R package. SNVs reaching the conventional genome-wide significance threshold (P < 5 × 10⁻⁸) were annotated with the Ensembl VEP web interface (v. 94) using the NCBI Reference Sequence Database (release 2015-01).

Identification of known LDL-C loci

We interrogated the NHGRI-EBI GWAS catalog7 (v. 1.0, release: 2019-01-11; association P < 5 × 10⁻⁸) to identify overlap between SNVs identified in our study (or SNVs in high linkage disequilibrium with them; r² ≥ 0.8) and previously reported associations with LDL-C levels. Linkage disequilibrium statistics were calculated with PLINK.

Region-based association testing

Region-based testing was carried out using MONSTER8 (v. 1.3), which implements a generalised version of the SKAT-O method for non-independent samples to perform robust region-based rare-variant association testing. Briefly, the programme fits a mixed-effects model that includes covariates and additive polygenic effects, thereby accounting for relatedness among individuals. It also adaptively estimates the correlation structure of variant effects to maximise statistical power. Coefficients of relationship were assumed to be 1 for monozygotic twins and 0.5 for dizygotic twins. We assessed associations by collapsing rare (MAF < 1%) variants within fixed-size sliding windows. Specifically, 1,341,836 sliding windows of 4 kb (overlapping by 2 kb) were generated beginning at position 1 bp of each chromosome, as described previously31,32,33. The median number of SNVs per window was 48 (interquartile range: 39–65). We adopted a stringent genome-wide significance threshold of 6 × 10⁻⁹. This follows previous simulation studies of WGS-based analytic strategies that combine single-variant testing of common variants with aggregated rare-variant tests in sliding windows34.
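The window layout described above can be sketched in Python as follows. The sketch assumes 1-based inclusive coordinates and truncates the last window on a chromosome at the chromosome end; the source does not say how chromosome ends were handled. Each rare variant (MAF < 1%) falls into at most two overlapping windows. The function names and the (position, MAF) input format are hypothetical.

```python
from bisect import bisect_left, bisect_right


def sliding_windows(chrom_length, size=4_000, step=2_000, start=1):
    """1-based inclusive (start, end) windows of `size` bp, advancing by `step` bp."""
    windows = []
    s = start
    while True:
        end = s + size - 1
        windows.append((s, min(end, chrom_length)))  # assumption: truncate at chrom end
        if end >= chrom_length:
            break  # this window already reaches the chromosome end
        s += step
    return windows


def rare_variants_per_window(variants, windows, maf_max=0.01):
    """Map each window to the sorted positions of the rare variants it contains.

    `variants` is an iterable of (position, maf) pairs for one chromosome.
    """
    rare = sorted(pos for pos, maf in variants if maf < maf_max)
    return {
        (s, e): rare[bisect_left(rare, s):bisect_right(rare, e)]
        for s, e in windows
    }


wins = sliding_windows(10_000)
print(wins)  # [(1, 4000), (2001, 6000), (4001, 8000), (6001, 10000)]
print(rare_variants_per_window([(1_500, 0.001), (3_000, 0.005), (5_000, 0.2)], wins))
```

Each window's rare-variant set would then be passed to the region-based test, and windows would be called significant at P < 6 × 10⁻⁹.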

Region-based conditional analysis

We used secondary conditional analyses to assess whether the association signals were independent of previously known GWAS signals arising from common variation near the chr17:70,493,859–70,519,858 and chr16:67,304,097–67,308,096 regions. Specifically, we retrieved from the NHGRI-EBI GWAS catalog7 (v. 1.0, release: 2019-01-11) all reported SNVs associated with blood lipids, adiposity or cardiovascular risk within 500 kb on either side of the studied regions. We then fitted a new region-based regression model for LDL-C in MONSTER, including the reported SNVs as covariates.
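Retrieving the conditioning SNVs amounts to a positional and trait filter. The following Python sketch is hypothetical. It represents the catalog as a list of dictionaries with "id", "chrom", "pos" and "trait_class" keys, and the SNV identifiers are made up. Real GWAS catalog exports use different column names and would first need their traits mapped onto the three categories named above.

```python
# Hypothetical trait categories, mirroring the three named in the text.
RELEVANT_TRAITS = {"blood lipids", "adiposity", "cardiovascular risk"}


def conditioning_snvs(catalog, chrom, start, end, flank=500_000):
    """IDs of catalog SNVs with a relevant trait within `flank` bp of chrom:start-end."""
    lo, hi = max(1, start - flank), end + flank
    return [
        snv["id"]
        for snv in catalog
        if snv["chrom"] == chrom
        and lo <= snv["pos"] <= hi
        and snv["trait_class"] in RELEVANT_TRAITS
    ]


# Hypothetical catalog entries around the chr16 region from the text.
catalog = [
    {"id": "snv_a", "chrom": "16", "pos": 67_000_000, "trait_class": "blood lipids"},
    {"id": "snv_b", "chrom": "16", "pos": 66_700_000, "trait_class": "blood lipids"},
    {"id": "snv_c", "chrom": "16", "pos": 67_808_096, "trait_class": "adiposity"},
    {"id": "snv_d", "chrom": "16", "pos": 67_310_000, "trait_class": "height"},
]
print(conditioning_snvs(catalog, "16", 67_304_097, 67_308_096))  # ['snv_a', 'snv_c']
```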

Replication cohort

The Qatar Genome Programme (QGP) is a national population-based initiative that aims to combine comprehensive analysis of Qatari genomes with the extensive phenotypic data collected at the Qatar Biobank (QBB)11. All subjects are enrolled in the Qatar Biobank via informed consent. The Qatar Biobank study was approved by the Institutional Review Board of the Hamad Medical Corporation Ethics Committee (IRB protocol E/2017/RES-ACC-0032/0002). Whole-genome sequencing was performed on 2935 blood samples using the Illumina X10 sequencing platform, with 150-base paired-end single-index reads and an average coverage of 30×. Raw WGS data were converted from the native BCL format to paired-end FASTQ format using bcl2fastq (v. 2.16). The quality of the raw data was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads were aligned to the human reference genome version GRCh37 using bwakit (v. 0.7.12) (https://github.com/lh3/bwa/tree/master/bwakit). Variant calling was performed using the GATK HaplotypeCaller35 (v. 3.3). SNVs with a call rate <95% or a Hardy–Weinberg equilibrium (HWE) deviation P < 1 × 10⁻⁹ were discarded. We used genotypes at 48 ancestry-informative SNVs to distinguish the three major Qatari subpopulations (Bedouin Arabs, Persians and East Africans) in the QBB cohort, as previously described36. We excluded individuals of African ancestry because of their small number (n = 63). We also excluded 78 individuals, mostly admixed (n = 75, 96%), who separated from the core of the cohort on the second principal component (PC2 < −0.03). Relatedness between subjects was modelled using the kinship matrix between all individuals, as estimated by PLINK (v. 1.9), to account for both familial and more distant relatedness.

LDL-C was measured in blood samples at the Hamad Medical Centre Laboratory, Doha, and values were inverse-normal transformed and adjusted for sex, age and BMI. Outliers (measurements more than four standard deviations from the dataset mean) were removed, and subsequent analyses were conducted on 2587 individuals. The phenotypic details of the study subjects are summarised in Supplementary Table 2.

Food frequency data were not available for the QBB cohort.

Meta-analysis

Window-based association results were meta-analysed by combining P-values across TwinsUK and QBB with Fisher’s combined probability test, as implemented in the metap R package (v. 1.1, sumlog function). A Bonferroni-adjusted threshold of 7.14 × 10⁻³ was obtained by dividing a conventional α-threshold of 0.05 by the number of non-overlapping 4-kb windows tested for replication (i.e., six windows mapping to KCNJ2 and one window mapping to KCTD19).
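Fisher's method can be written in a few lines of Python, which is useful for checking combined P-values by hand. The statistic X = −2 Σ ln pᵢ follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. For even degrees of freedom the survival function has a closed form, so no statistics library is needed. This is a sketch of the standard test that metap's sumlog computes, not the package's own code, and the example P-values are made up.

```python
import math


def fisher_combined(pvalues):
    """Fisher's method: return (X, P) with X = -2 * sum(ln p) on 2k df."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # Chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return x, math.exp(-half) * series


alpha = 0.05 / 7  # seven non-overlapping windows, about 7.14e-3
x, p = fisher_combined([0.01, 0.03])  # made-up window P-values (TwinsUK, QBB)
print(f"X = {x:.3f}, P = {p:.2e}, significant: {p < alpha}")
```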

Single-SNV association results were meta-analysed using a sample-size-weighted Z-score approach, as implemented in METAL37 (release 2011-03-25). Sex- and ancestry-specific allele frequencies for these SNVs were downloaded from the gnomAD browser38 (v. 3.1.2). The regional association plot was generated using LocusZoom39. Reported eQTLs for KCNJ2 and KCNJ2-AS1 (Ensembl gene identifiers ENSG00000123700.4 and ENSG00000267365.1, respectively) were retrieved from the GTEx40 (v. 8) and eQTLGen41 (phase I) portals.
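The sample-size-weighted Z-score scheme can be sketched in Python as follows, using the weighting described in the METAL documentation. Each study's P-value is converted to a Z-score signed by its effect direction and weighted by the square root of its sample size. The P-values and effect sizes in the usage line are invented for illustration. METAL also aligns alleles across studies, which this sketch does not do.

```python
import math
from statistics import NormalDist


def weighted_z_meta(pvalues, effects, sizes):
    """Sample-size-weighted Z meta-analysis; returns (Z, two-sided P)."""
    nd = NormalDist()
    num = den = 0.0
    for p, beta, n in zip(pvalues, effects, sizes):
        z = abs(nd.inv_cdf(p / 2.0)) * math.copysign(1.0, beta)  # signed Z_i
        w = math.sqrt(n)  # weight w_i = sqrt(N_i)
        num += w * z
        den += w * w
    z_meta = num / math.sqrt(den)
    return z_meta, 2.0 * nd.cdf(-abs(z_meta))


# Invented P-values and effects; sample sizes are the cohort sizes in the text.
print(weighted_z_meta([1e-4, 0.02], [0.35, 0.20], [1751, 2587]))
```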

Food frequency data

Overall, 1360 female TwinsUK subjects with WGS data completed a validated 131-item food frequency questionnaire42 (FFQ) between 1993 and 2015 and had BMI data available within ±1 year of completing the FFQ. These comprised 345 dizygotic pairs, 218 monozygotic pairs and 234 singletons (Supplementary Table 10). Details on quality control, subject exclusion criteria and methods for deriving nutrient intakes from FFQ data can be found elsewhere43. Briefly, participants reported how often, over the past year, they had consumed an average serving of each of 131 foods and beverages, on a 9-point scale (ranging from never or less than once per month to 6+ times per day). The 131 food items were aggregated into food groups, defined by similarity in nutrient content and culinary use, from which intakes of 54 nutrients were derived. We considered 12 macronutrients for association testing, based on their relevance to LDL-C metabolism44. These were the four major classes (alcohol, protein, total fat and total carbohydrate) and eight subclasses (starch, total sugar, fibre, saturated fatty acids, monounsaturated fatty acids, polyunsaturated fatty acids, trans fatty acids and cholesterol). Two or more longitudinal measures were available for 53% of the study subjects (maximum: four). The median duration of follow-up was 9 years (interquartile range: 3.57–10.22 years). When longitudinal data were available, nutrient values calculated at each time point were averaged. Outliers (values more than four standard deviations from the nutrient mean) were removed.

Association between KCNJ2 and nutrients

We tested the association of aggregated rare variants at KCNJ2 with nutrient intake, adjusted for daily total energy intake, age and BMI, using MONSTER. Coefficients of relationship were assumed to be 1 for monozygotic twins and 0.5 for dizygotic twins. To correct for multiple comparisons, we used Li’s method45 to estimate the effective number of independent tests in the nutrient dataset. The derived P-value threshold for statistical significance was 0.05/9 = 5.56 × 10⁻³.
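The following Python sketch assumes that "Li’s method" refers to the eigenvalue-based estimator of Li and Ji (2005). Under that estimator, the effective number of tests is computed from the eigenvalues λ of the nutrients' correlation matrix as M_eff = Σ f(|λ|), with f(x) = I(x ≥ 1) + (x − ⌊x⌋). The sketch uses a plain Jacobi eigenvalue routine to stay dependency-free and is illustrative only; the correlation matrices are toy examples. The 0.05/9 threshold in the text corresponds to an effective number of nine tests.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(row) for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def li_ji_meff(corr, decimals=8):
    """Effective number of tests: sum of f(|lambda|), f(x) = I(x >= 1) + (x - floor(x))."""
    meff = 0.0
    for lam in jacobi_eigenvalues(corr):
        x = round(abs(lam), decimals)  # avoid floor(2.9999999999) == 2 artefacts
        meff += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return meff


print(li_ji_meff([[1.0, 0.5], [0.5, 1.0]]))  # 2.0
print(0.05 / 9)  # threshold for an effective number of nine tests
```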

Subsequently, we used linear mixed-model regression (lmer function from the lme4 R package, v. 1.1-21) to test for association between lead rare variants at KCNJ2 and selected nutrients. Nutrients were expressed as percentages of daily total energy intake, family structure was modelled as a random effect, and age and BMI were included as covariates. Average nutrient intakes in the TwinsUK cohort (Supplementary Table 10) fell within the interquartile range of the UK adult population46.
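Expressing a nutrient as a percentage of total energy intake is a unit conversion. The following Python sketch assumes general Atwater factors (4 kcal/g for protein and carbohydrate, 9 kcal/g for fat, 7 kcal/g for alcohol), which may differ from the factors used in the FFQ nutrient database43. It also maps subclasses such as starch or saturated fatty acids onto their parent class through a hypothetical lookup table. The mixed model itself, fitted with lme4 in the text, is not reproduced here.

```python
# Assumed general Atwater energy factors (kcal per gram).
KCAL_PER_G = {"protein": 4.0, "total carbohydrate": 4.0, "total fat": 9.0, "alcohol": 7.0}

# Hypothetical mapping of energy-yielding subclasses onto their parent class.
# Nutrients without an energy factor (e.g. fibre, dietary cholesterol) raise KeyError.
PARENT_CLASS = {
    "starch": "total carbohydrate",
    "total sugar": "total carbohydrate",
    "saturated fatty acids": "total fat",
    "monounsaturated fatty acids": "total fat",
    "polyunsaturated fatty acids": "total fat",
    "trans fatty acids": "total fat",
}


def percent_energy(nutrient, grams_per_day, total_kcal_per_day):
    """Daily intake of an energy-yielding nutrient as % of total energy intake."""
    cls = PARENT_CLASS.get(nutrient, nutrient)
    return 100.0 * grams_per_day * KCAL_PER_G[cls] / total_kcal_per_day


print(percent_energy("saturated fatty acids", 25.0, 1800.0))  # 12.5
```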

Ethical approval

The TwinsUK study was approved by the National Research Ethics Service London-Westminster, the St. Thomas’ Hospital Research Ethics Committee (EC04/015 and 07/H0802/84). The Qatar Biobank study was approved by the Institutional Review Board of the Hamad Medical Corporation Ethics Committee (IRB protocol E/2017/RES-ACC-0032/0002). Written informed consent was obtained from all participants. All research was carried out in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.
