Genome‐wide association analysis of serum alanine and aspartate aminotransferase, and the modifying effects of BMI in 388k European individuals

1 INTRODUCTION

Nonalcoholic fatty liver disease (NAFLD) is an epidemic in the United States with a prevalence between 30% and 40% among adults (Sharma & John, 2019; Spengler & Loomba, 2015). Although often benign, NAFLD may also progress to nonalcoholic steatohepatitis (NASH), which can lead to cirrhosis, liver failure, and liver cancer if left untreated (Adams et al., 2005). Obesity is a strong risk factor for NAFLD. The prevalence of NAFLD in normal-weight (body mass index [BMI] < 25 kg/m2) men and women is on average 7.5% and 6.7%, respectively, compared with 57% and 44% in men and women with a BMI >35 kg/m2 (Yki-Jarvinen, 2014). Although the pathophysiology between obesity and NAFLD is not fully understood, it has been hypothesized that fat accumulation in the liver may be linked to the exposure to free fatty acids and adipokines released from adipose tissue (Jakobsen et al., 2007).

Serum alanine aminotransferase (ALT) and aspartate aminotransferase (AST) are commonly measured biomarkers of liver health. Elevated ALT and AST levels are signatures of liver disease or damage, such as NAFLD, viral hepatitis, and drug-induced liver damage (Kaplan, 2002). Serum ALT and AST levels are considered highly heritable with genetic factors explaining 20%–60% of the phenotypic variance (Makkonen et al., 2009; Rahmioglu et al., 2009; Sookoian & Pirola, 2015). Previous genome-wide association studies (GWAS) identified numerous significant genetic loci associated with ALT and AST levels (Moon et al., 2019; Prins et al., 2017; Sinnott-Armstrong et al., 2019; Young et al., 2019). In addition, some ALT and AST signals were reported to have obesity-dependent effects. For example, PNPLA3 and HSD17B13 associations have been shown to have stronger effects in obese individuals (Abul-Husn et al., 2018; Giudice et al., 2011; Mann & Anstee, 2017; Stojkovic et al., 2014). However, no genome-wide agnostic screening of obesity-dependent effects has been performed.

Here we report a GWAS of serum ALT and AST levels in 388k unrelated individuals of European ancestry from UKB and DiscovEHR. We also report the first genome-wide interaction study (GWIS) to investigate the effect of BMI on ALT and AST genetic associations. Finally, we show that ALT- and AST-associated variants that are significantly modified by BMI may have an important impact on the risk of liver disease risks, for example, fatty liver disease, shedding light on the development of potential therapeutics.

2 METHODS 2.1 UK Biobank (UKB) data

A detailed description of the UKB study design, and collection of phenotypic and genotype data has been published previously by UKB (Bycroft et al., 2018). Consenting individuals participating in the UKB study were genotyped using the Affymetrix UK Biobank Axiom Array and the UK BiLEVE Axiom Array. Genotype imputation was performed centrally by UKB based on a merged reference panel incorporating UK 10 K, 1000 Genome, and Haplotype Reference Consortium (HRC). Imputed variants were then filtered based on minor allele frequency (MAF ≥ 0.5%) and Hardy–Weinberg (p < 10 × 10−15). Individuals of European ancestry were identified using a linear model trained based on PC estimates from HapMap3. Overall, 319,882 unrelated individuals of European ancestry were included for analysis of two enzyme levels: ALT and AST. Serum levels of ALT and AST from the initial visit (2006–2010) were measured centrally by UKB based on International Federation of Clinical Chemistry (IFCC). A description of the UKB sample demographics is shown in Table S1. Further information about the UKB sample collection and each phenotype can also be found via the UKB Showcase website (https://biobank.ndph.ox.ac.uk/showcase/).

2.2 DiscovEHR data

A detailed description of the DiscoverEHR study design has been published previously (Dewey et al., 2016). In short, the DiscovEHR cohort is a subset of individuals enrolled in the Geisinger Healthcare system who consented to participate in Geisinger's MyCode Community Health Initiative. Genomic DNA samples were transferred to the Regeneron Genetics Center from the Geisinger Health System. Genotyping was performed at the Regeneron Genetics Center in two waves. In the first wave, individuals were genotyped using the Illumina Human OmniExpressExome array (8v1-2). In the second wave, genotyping was performed using the llumina Global Screening Array. These two waves are referred as “DiscovEHR OMNI” and “DiscovEHR GSA,” respectively. All analyses were performed in each cohort separately.

Individuals of European ancestry were identified using a linear model trained based on PC estimates from HapMap3. Pairwise identity-by-decent (IBD) estimates were calculated using PLINK2 (Purcell et al., 2007) and pedigrees were reconstructed using PRIMUS (Staples et al., 2014) as described previously (Dewey et al., 2016). Genotype imputation of European individuals was performed separately for DiscovEHR OMNI and GSA using the Michigan Imputation Server (Das et al., 2016) based on the HRC hg19 reference panel. Imputed variants were mapped (lifted over) to GRCh38/hg38, and then filtered based on MAF (MAF ≥ 0.5%), Hardy-Weinberg (p < 10 × 10−15), and imputation info score (≥0.3). A total of 30,980 and 38,003 unrelated European individuals with DiscovEHR OMNI DiscovEHR GSA data, respectively, were included for analysis of serum ALT and AST levels. The median of serially measured laboratory values was selected for analysis following removal of likely spurious values that were >3 standard deviations from the intra-individual median value. Age was defined as age at last encounter.

2.3 Statistical analysis Genome-wide associations analysis (GWAS) of ALT and AST were tested within each cohort using linear regression in PLINK2 (Purcell et al., 2007). Rank inverse normalized ALT and AST residuals were used for analyses after regressing out Age, Age2, Sex, the first 10 principle components, UKB-specific covariates (study site and array, only adjusted in UKB), and BMI (to minimize the discovery of ALT and AST associations confounded by BMI). The rank inverse normalized residuals (RINT) were then tested for association based on the model urn:x-wiley:07410395:media:gepi22392:gepi22392-math-0001where Y is the RINT residuals of ALT or AST, and G is the dosage genotype. Genome-wide interaction analysis (GWIS) was performed using linear regression in PLINK2 (24). Rank inverse normalized ALT and AST residuals were used for analyses after regressing out Age, Age2, Sex, the first 10 principle components, and UKB-specific covariates. BMI was not used for residualization but was instead included as the interaction variable (INT) in the interaction model: urn:x-wiley:07410395:media:gepi22392:gepi22392-math-0002where, Y is the RINT ALT and AST residuals, G is the dosage genotype.

Summary statistics for the UKB and DiscovEHR cohorts were jointly meta-analyzed after genomic correction using the fixed effect inverse variance weighted method implemented in METAL (Willer et al., 2010). Specifically, GWAS genomic correction was performed based on the LDSC regression intercept within each cohort (Bulik-Sullivan et al., 2015); in GWIS, since LDSC intercept has not been tested as a genomic correction factor in interaction models, genomic correction was performed based on inflation factor (lambda). After meta-analysis, no major inflation was detected (Table S2) and therefore post meta-analysis genomic correction was not performed. HLA region was removed in Manhattan plots but were included for analyses.

2.4 Genome-wide significant variants and signals

GCTA COJO was performed on meta-analyzed GWAS and GWIS data, respectively, to identify a set of independently associated signals in each data set (31). A 10 Mb window was selected around signals with p values less than 5 × 10−8. The default settings for collinearity (R2 > 0.9) and allele frequency differences (>0.2) were selected. Linkage disequilibrium (LD) estimates were derived from a random selection of 10 K unrelated European individuals in UKB. A locus is defined as a 1 Mb region. A novel signal is defined with a r2 < 0.1 and at least 1 Mb away from any previously reported ALT or AST GWAS hits (ALT and AST GWAS catalog (Buniello et al., 2019) and a recent UKB study published on bioarchive (Sinnott-Armstrong et al., 2019). A significant GTEx expression quantitative trait locus (eQTL) is defined based on the GTEx Portal accessed on 12/09/2020 (dbGaP accession number phs000424. vN. pN) with a p < 9.80 × 10−10 (Bonferroni correction of the genome-wide significance threshold based on 51 tissue types) in at least one of the issue types (GTEx Consortium, 2015).

2.5 Gene–gene interaction analysis Interaction between PNPLA3 p.I148M and all GCTA COJO selected independent ALT and AST signals were tested. Similar to GWIS, a linear regression model was performed in PLINK2 (24). Rank inverse normalized ALT and AST residuals were analyzed after Age, Age2, Sex, BMI, the first 10 principle components, and UKB-specific covariates were regressed out. The PNPLA3 p.I148M genotype was coded as 0, 1, 2 and was included as the interaction variable in the model below: urn:x-wiley:07410395:media:gepi22392:gepi22392-math-0003where, Y is the RINT ALT and AST residuals, and G is the dosage genotype. A significant interaction signal is defined using a Bonferroni corrected p value threshold. 2.6 Polygenic risk score (PRS) Independent association signals identified by GCTA COJO were used to construct PRS according to the formula: urn:x-wiley:07410395:media:gepi22392:gepi22392-math-0004

The PRS for a given individual i is the sum product of the associated effect size (β) times the number of alternative (effect) alleles at all sites j. Scores were then transformed to a normal distribution with N (0,1). PRS associations are reported in standard deviation units.

2.7 Expression enrichment analysis

Independent association variants were mapped to genes if: (1) had a coding COJO variant, (2) had a coding variant in LD with a COJO variant or, (3) had an eQTL in LD with a COJO variant (but not in LD with a coding variant). Tissue expression enrichment analysis was performed using FUMA (Watanabe et al., 2017). In brief, 30 general tissue type tissue-specific expression patterns were derived from GTEx v8 RNA-seq data (GTEx Consortium, 2015). Upregulated gene-set enrichment was tested and Benjamini–Hochberg (FDR) was used to control for multiple testing. Only gene sets which overlap with ≥2 genes with the input list are reported.

2.8 Liver disease associations

A total of six liver disease traits were selected for associations: fatty liver (K760), Cirrhosis, Fibrosis or Cirrhosis, NALD Cirrhosis, NALD Composite, NASH-NAFLD Composite. The definition and number of cases for each liver disease trait in UKB is summarized in Table S12. Mixed effect associations were computed with the same set of imputed markers using SAIGE (Zhou et al., 2018). Since SAIGE accounts for relatedness, the entire European data set instead of the unrelated subset was analyzed. Age, Age2, Sex, Age × Sex, first 10 principle components, and UKB-specific covariates were adjusted. A fixed effect inverse variance weighted meta-analysis was performed using metal.

3 RESULTS 3.1 UKB and DiscovEHR

In total, 11 million imputed variants from 388,865 unrelated European individuals were analyzed for associations with ALT and AST levels. Sample demographics are summarized in Table S1. In UKB, 319,882 unrelated European individuals (53.7% females) were analyzed with 23.8% of the individuals being obese (BMI > 30 kg/m2). In DiscovEHR, 68,983 unrelated European individuals were included from DiscovEHR OMNI (N = 30,980) and DiscovEHR GSA (N = 38,003), respectively. Compared to UKB, DiscovEHR cohorts have proportionally more females (57.9% in OMNI and 61.3% in GSA) and a higher prevalence (50.2%) of obesity (Table S1).

3.2 Genome-wide association analysis of serum ALT and AST levels

GWAS of ALT and AST was performed in DiscovEHR and UKB separately. In the meta-analysis of the summary statistics from each study, 26,366 ALT and 43,727 AST variants reached genome-wide significance (p < 5 × 10−8) (Figures 1 and S1 and Table S2). SNP-heritability estimates for ALT and AST were approximately 19.09% (SE: 0.0131) and 21.75% (SE: 0.0215), respectively (Bulik-Sullivan et al., 2015). Conditional analysis using GCTA COJO identified 300 ALT and 336 AST independent associations (from 255 to 268 loci) (Tables S3 and S4). Of these, 55 ALT and 71 AST variants are coding or in strong LD (r2 > 0.8) with a coding variant based on Ensembl 85 gene model. Also, 172 ALT and 187 AST signals are in strong linkage disequilibrium (LD) with a significant GTEx expression quantitative trait locus (eQTL) (p < 9.80 × 10−10, after Bonferroni correction of the number of tissue types, Tables S3 and S4) (GTEx Consortium, 2015).

image

Manhattan plots of ALT and AST genome-wide associations. (a) Manhattan plots of ALT genome-wide associations. ALT GWAS main effects are plotted at the top; BMI interaction effects are plotted at the bottom. GCTA COJO selected variants are highlighted. Previously reported signals are highlighted in blue; novel signals are highlighted in green (defined as R2 < 0.1 with any previously reported signals and at least 1 Mb away from any previously reported signals). For visualization, main effect p values are capped at 1E−100, interaction p values are capped at 1E−25. HLA region was excluded in the plot. (b) Manhattan plots of AST genome-wide associations. AST GWAS main effects are plotted at the top; BMI interaction effects are plotted at the bottom. GCTA COJO selected variants are highlighted. Previously reported signals are highlighted in blue; novel signals are highlighted in green (defined as R2 < 0.1 with any previously reported signals and at least 1 Mb away from any previously reported signals). For visualization, main effect p values are capped at 1E−75, interaction p values are capped at 1E−25. HLA region was excluded in the plot. ALT, alanine aminotransferase; AST, aspartate aminotransferase; GWAS, genome-wide association studies

As expected, GWAS identified multiple previously reported liver enzyme associations. For example, rs738409 in patatin-like phospholipase domain-containing protein 3 (PNPLA3) gene (p.I148M, pALT = 4.15 × 10−402, pAST = 1.03 × 10−344, Figure S2) is associated with 1.66 and 1.02 units higher ALT and AST levels (Romeo et al., 2008). Similarly, rs10433937 in 17 β-hydroxysteroid dehydrogenase type 13 (HSD17B13) gene (pALT = 6.31 × 10−68) is significantly associated with lower ALT levels (Abul-Husn et al., 2018). In addition, 81 ALT and 61 AST variants are reported for the first time (having a r2 < 0.1 and at least 1 Mb away from any previously reported ALT or AST GWAS hits, see detail in method). The most significant novel association observed is an intronic variant within the gene peroxisome proliferator-activated receptor gamma (PPARG, rs13083375, pALT = 1.04 × 10−43, Figure S3), lowering ALT by 0.523 units per allele in an additive genetic model. A complete list of novel signals is summarized in Tables S3 and S4.

3.3 GWIS of BMI-dependent effects

A GWIS was performed to identify ALT- and AST-associated loci with BMI-dependent effects. In total, 571 ALT and 951 AST variants with significant BMI interactions were identified (p value for interaction (pINT) < 5 × 10−8, Figures 1 and S1 and Table S2). After conditional analysis, 9 ALT and 12 AST independent signals were observed (Tables 1 and 2). Among them, 4 ALT and 6 AST signals are either coding or in strong LD (r2 > 0.8) with a coding variant; 5 ALT and 8 AST signals are in strong LD with a significant GTEx eQTL (p < 9.80 × 10−10, Tables S5 and S6).

Table 1. Meta-analysis results of genome-wide significant ALT BMI-interaction effect association signals Main effect BMI interaction MarkerNamea Gene Annotation AAFb β (SE) p Directionc β (SE) p Direction 1:220796686:A:G MARC1 Missense 0.7017 0.0373 (0.0026) 2.52E−47 +++ 0.0182 (0.0024) 7.08E−14 +++ 4:87292732:T:C HSD17B13/− Intergenic 0.4324 −0.0415 (0.0024) 1.20E−67 −−− −0.0147 (0.0022) 6.30E−11 −−− 8:58480714:A:G CYP7A1/− Intergenic 0.6605 −0.0044 (0.0025) 0.07531 -++ −0.0134 (0.0023) 1.10E−08 −−− 8:125469835:A:G TRIB1/- Intergenic 0.5061 −0.0509 (0.0024) 7.58E−103 −−− −0.0181 (0.0022) 3.84E−16 −−− 9:129804387:G:A TOR1B Intronic 0.0951 −0.0306 (0.0041) 4.59E−14 −−− −0.0231 (0.0038) 1.40E−09 −−− 10:112142660:A:C GPAM Intronic 0.2475 0.036 (0.0027) 2.57E−39 +++ 0.0147 (0.0026) 1.39E−08 +++ 19:19349732:G:C MAU2 Intronic 0.0708 0.1005 (0.0046) 4.22E−105 +++ 0.0422 (0.0044) 4.86E−22 +++ 19:44908684:T:C APOE Missense 0.1519 −0.0417 (0.0033) 1.04E−36 −−− −0.0223 (0.0031) 5.59E−13 −−− 22:43928850:C:T PNPLA3 Missense 0.2191 0.1220 (0.0028) 4.15E−402 +++ 0.0588 (0.0027) 8.32E−107 +++ Abbreviations: ALT, alanine aminotransferase; BMI, body mass index. aMarkerName is based on chromosome number, position (hg38), reference, and alternative/effect alleles. bAlternative/effect allele frequency. cDirection of the effect across UKB, DiscovEHR OMNI, and DiscovEHR GSA. Table 2. Meta-analysis results of genome-wide significant AST BMI-interaction effect association signals Main effect BMI interaction MarkerNamea Gene Annotation AAFb β​​ (SE) p Directionc β​​ (SE) p Direction 1:220797157:A:G MARC1 Intronic 0.6854 0.0152 (0.0026) 2.49E−09 +++ 0.0206 (0.0025) 4.30E−16 +++ 2:27508073:T:C GCKR Missense 0.6004 −0.0246 (0.0024) 3.53E−24 −−− −0.0239 (0.0024) 2.78E−23 −−− 4:87292006:C:T HSD17B13/− Intergenic 0.4277 −0.0351 (0.0024) 2.92E−48 −−− −0.0191 (0.0024) 1.11E−15 −−− 8:125469835:A:G TRIB1/− Intergenic 0.5061 −0.024 (0.0024) 5.20E−24 −−− −0.0201 (0.0024) 1.57E−17 −−− 9:114383763:C:G AKNA Intronic 0.4853 −0.0324 (0.0024) 9.01E−42 −−− −0.0146 (0.0024) 7.30E−10 −−− 9:129804387:G:A TOR1B Intronic 0.0951 −0.0261 (0.0041) 1.52E−10 −−− −0.0225 (0.004) 2.74E−08 −−− 10:100174478:C:T ERLIN1 Intronic 0.4379 −0.0458 (0.0024) 2.75E−81 −−− −0.0144 (0.0024) 1.48E−09 −−− 10:112187282:C:T GPAM Upstream 0.7193 −0.0208 (0.0027) 5.73E−15 −−− −0.0153 (0.0026) 7.70E−09 −−− 19:7222366:G:C INSR Intronic 0.5818 0.0123 (0.0024) 3.25E−07 +++ 0.0134 (0.0024) 2.33E−08 +++ 19:19349732:G:C MAU2 Intronic 0.0708 0.0742 (0.0046) 1.38E−57 +++ 0.0459 (0.0046) 3.57E−23 +++ 19:44888997:C:T NECTIN2 3′ UTR 0.1702 0.003 (0.0032) 0.3392 −++ −0.025 (0.0031) 1.39E−15 −−− 22:43928847:C:G PNPLA3 Missense 0.2191 0.1136 (0.0029) 1.03E−344 +++ 0.0697 (0.0028) 2.95E−133 +++ Abbreviations: AST, aspartate aminotransferase; BMI, body mass index. aMarkerName is based on chromosome number, position (hg38), reference, and alternative/effect alleles. bAlternative/effect allele frequency. cDirection of the effect across UKB, DiscovEHR OMNI, and DiscovEHR GSA.

GWIS identified several previously reported BMI-modified signals, for example, PNPLA3, HSD17B13 (Abul-Husn et al., 2018; Giudice et al., 2011; Mann & Anstee, 2017; Stojkovic et al., 2014). The most significant BMI interaction was detected at rs738409 in PNPLA3 (p.I148M, pALT_INT = 8.32 × 10−107, pAST_INT = 2.95 × 10−133). In the highest BMI quartile (top 25%, BMI > 29.82 kg/m2), the effect of alternate allele (G) is 10-fold greater (3.37 units/allele) than the effect observed in the low BMI quartile (bottom 25%, BMI < 24.13 kg/m2) (Figure 2). Similarly, rs6811902 in HSD17B13 is also significantly modified by BMI (pALT_INT = 6.30 × 10−11, pAST_INT = 1.11 × 10−15) where the alternate allele (C) is associated with a greater effect on lowering ALT and AST in individuals with elevated BMI relative to the low BMI quartile (Figure 2).

image

Forest plot of PNPLA3, HSD17B13, MARC1, and CYP7A1 associations with ALT, stratified by BMI groups. (a) PNPLA3 I148M (22:43928847:C:G) association with ALT, stratified by BMI groups. (b) HSD17B13 (4:87292732:T:C) association with ALT, stratified by BMI groups. (c) MARC1 (1:220796686:A:G) association with ALT, stratified by BMI groups. (d) CYP7A1 (8:58480714:A:G) association with ALT, stratified by BMI groups. Association was analyzed with RINTed phenotypes in UKB with the adjustment of age, age2, sex, BMI, 10PCs, and study-specific covariates. BMI groups are defined based on the 25% quartiles of BMI distribution. ALT, alanine aminotransferase; BMI, body mass index

In addition, the GWIS also identified novel BMI-dependent associations in previously reported liver disease loci. For example, consistent with previous reports (Emdin et al., 2020), the alternative allele (G) of the missense variant rs2642438 (p.T165A) in mitochondrial amidoxime reducing component 1 (MARC1) is associated with higher ALT and AST levels (pALT = 2.52 × 10−47, pAST = 6.24 × 10−11). The associations were significantly modified by BMI (pALT_INT = 7.08 × 10−14, pAST_INT = 4.70 × 10−16) and a greater effect was observed in the higher BMI quartile. On average, the alternative allele is associated with 0.128 units higher ALT in the low BMI quartile and 0.935 units higher ALT in the high BMI quartile (Figure 2). Similarly, significant BMI-dependent effects were also observed in variants from gene MAU2 sister chromatid cohesion factor (MAU2) and tribbles pseudokinase 1 (TRIB1) (Tables S5 and S6).

GWIS also identified a novel BMI interaction with insignificant main effect association. An intergenic variant (rs4738684) near gene cytochrome P450 family 7 subfamily A member 1 (CYP7A1) was identified with a significant BMI interaction effect (pINT = 1.10 × 10−8). The alternative allele (G) is associated with lower ALT level only in the high BMI quartile and no significant effect is detected in the low BMI individuals (Figures 2 and S5). CYP7A1 encodes a protein that catalyzes the first reaction in the cholesterol catabolic pathway and converts cholesterol to bile acids, which is the primary mechanism for the removal of cholesterol from the body (O'Leary et al., 2016). However, it is still unclear why observed ALT association is only present in high BMI individuals and no effect is observed in low BMI individuals.

3.4 Gene × Gene interaction with PNPLA3 I148M

Independently associated ALT (N = 300) and AST (N = 336) signals were evaluated for genetic interactions with PNPLA3 p.I148M, as a proxy for their therapeutic potential in PNPLA3 risk allele carriers. Only HSD17B13 variants (rs10433937, pALT_INT = 3.19 × 10−7; rs13117201, pAST_INT = 4.91 × 10−9) met the stringent Bonferroni corrected significant threshold (Tables S7 and S8). The magnitude of the per PNPLA3 p.I148M allele increase in ALT and AST was significantly lowered by HSD17B13 genotype. On average, per HSD17B13 allele reduces the PNPLA3 p.I148M allelic effect on ALT by 21%. Interestingly, a greater effect was observed in high BMI quartile (Figure 3).

留言 (0)

沒有登入
gif