Sex differences in DNA methylation across gestation: a large scale, cross-cohort, multi-tissue analysis

Study populations

Our analysis included three cohorts: the betamethasone (BET) sample, the InTraUterine sampling in early pregnancy (ITU) sample and the Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) sample.

BET

The BET cohort is described in detail in [23,24,25]. In brief, pregnant mothers exposed to a single course of antenatal BET (Celestan®, MSD GmbH, Haar, Germany) for fetal lung maturation (2 × 12 mg intramuscular; n = 54) between 23 + 5 weeks to 34 + 0 weeks (weeks + days of gestation) were recruited prospectively before birth and compared to a gestational-age-matched control group that received no antenatal BET (n = 85). The control group consisted of gestational age (weeks) and fetal sex-matched pregnancies at the time of delivery. This study was approved by the Ethics Committee of the Charité Universitäts-medizin, Berlin, Germany.

ITU

The InTraUterine sampling in early pregnancy (ITU) is a prospective pregnancy cohort study [26]. It includes 943 pregnant Finnish women, their partners and their singleton children born alive between April 2012 and December 2017. Women were recruited through the national, voluntary trisomy 21 screening between 9 + 0 weeks and 21 + 6 gestational weeks. Of the participating women, 544 were screened positive and underwent fetal chromosomal testing. Test results then suggested no fetal chromosomal abnormality. Three hundred and ninety-nine women who screened negative and did not undergo fetal chromosomal testing were assessed as controls. The ITU research protocol has been approved by the Coordinating Ethics Committee of the Helsinki and Uusimaa Hospital District, Finland.

PREDO

The Prediction and Prevention of Preeclampsia and Intrauterine Growth Restriction (PREDO) study is a prospective, multicenter study of Finnish women who gave birth to a singleton live child between January 2006 and December 2010 [27]. The cohort includes 4774 mother–child dyads. PREDO recruited women when they visited antenatal clinics at any of the 10 study hospitals for their first ultrasound screening at 12 + 0 weeks to 13 + 6 weeks + days of gestation. Two groups of pregnant women were enrolled based on two eligibility criteria: pregnant women with a known clinical risk factor status for preeclampsia and intrauterine growth restriction (IUGR) and pregnant women who volunteered to participate regardless of their risk factor status for preeclampsia and IUGR. The first criterion was used to enrich the cohort for preeclampsia. The study protocol was approved by the Ethics Committee of Obstetrics and Gynaecology and Women, Children and Psychiatry of the Helsinki and Uusimaa Hospital District and by the participating hospitals.

Pregnancy and birth outcome data

Gestational age at sampling was based on fetal ultrasound. Child sex, maternal age at delivery and smoking during pregnancy (yes/no) were obtained from pre- and postnatal assessments and medical chart abstractions in the BET study, and extracted from the Finnish Medical Birth Register (MBR) [28] for ITU and PREDO. Further characteristics such as placental weight, birth weight, birth length and head circumference, mode of delivery (vaginal delivery or caesarean section), and induction of labor (yes or no) were obtained from medical records or the MBR.

Biological sampling and pre-processing of omics dataPlacental tissue sampling in BET, ITU and PREDO

In the BET study, full-thickness placental biopsies were taken by a uniform random sampling protocol from both peripheral and central areas starting immediately after delivery. In ITU, placenta samples were collected at birth, whereby midwives/trained staff took nine-site biopsies (within maximum 120 min after delivery) from the fetal side of the placenta. In PREDO, placenta samples were collected at birth, whereby midwives/trained staff took nine-site biopsies (within maximum 90 min after delivery) from the decidual side of the placenta. All samples were stored at − 80 °C.

Chorionic villus sampling (CVS) and cord blood sampling in ITU

In the ITU study also first-trimester placental biopsies as well as cord blood samples were available. First trimester placenta samples were obtained from leftover CVS, due to indications of elevated risk for chromosomal abnormalities between 10 and 15 weeks of gestation. Cord blood samples were taken immediately after birth by a midwife.

Pre-processing of DNA methylation (DNAm) in CVS, placenta and cordblood

DNA was extracted according to standard procedures and DNAm was assessed using the Illumina Infinium MethylationEPIC array v1 (Illumina, San Diego, USA). All DNAm data were pre-processed in the same way, as described in detail previously [29, 30] using an adapted pipeline from [31] implemented in the R package minfi [32]. Probes with SNPs at the CpG-site as well as probes reported to be cross-hybridizing [33, 34] were removed. Beta values were normalized using stratified quantile normalization [35] and beta-mixture quantile normalization (BMIQ) [36]. Batch-effects were removed performing ComBat [37] on the M-values (see Figure S1A–C for PCA-plots illustrating how much variability was removed due to Combat), and only autosomal DNAm probes were analyzed. The final datasets comprised 137 placenta samples (n = 708,222 probes) from the BET study, 264 CVS samples (n = 716,331 probes), 470 birth placental samples (n = 665,190 probes) and 426 cord blood samples (n = 724,075 probes) from ITU, and 139 placenta samples (n = 755,154 probes) from PREDO. Overall, 659,048 probes overlapped across all three birth placenta datasets.

Cell-type composition estimations from DNAm data

Cell-type composition into nucleated red blood cells, trophoblasts, syncytiotrophoblasts, stromal cells, Hofbauer and endothelial cells in CVS and placenta was estimated using the R package planet, by applying the robust partial correlation algorithm [38] based on a reference sample as described in [39]. Cell-type composition into nucleated red blood cells, granulocytes, monocytes, natural killer cells, B cells, CD4(+) T cells and CD8(+) T cells in cord blood was estimated in the R-package minfi [32] based on the approach proposed in [40].

Gene expression in CVS and placenta

The QuantSeq 3′ mRNA-Seq Library Prep Kit (Lexogen) was used to generate messenger RNA (mRNA) sequencing libraries from both CVS and birth placenta RNA samples in ITU. Libraries were multiplexed and sequenced using the Illumina HighSeq4000 system at a depth of 10 million reads per mRNA library. Adapter sequences were trimmed with cutadapt [41], sequenced reads were aligned to the human genome reference using the STAR aligner [42] and reads were quantified with featureCounts [43]. We filtered for autosomal genes with a raw read count of at least 10 in at least 90% of all individuals. This resulted in 260 individuals and 8754 genes for CVS and 478 individuals and 7955 genes for placenta.

Cell-type correction in gene expression analysis

We applied surrogate variable (SV) analysis to correct for possible batch effects and cell type heterogeneity in the ITU gene expression samples [37]. For CVS and placenta, the first SV was detected as significant (according to the permutation procedure implemented in the package) and used as covariate in the subsequent analyses.

Genotyping and imputation in cord blood and placenta

Genotyping was performed on Illumina Infinium Global Screening arrays for BET (from placenta) and ITU (from cord blood) and on Illumina Human Omni Express Arrays for PREDO (from cord blood). Genotypes were pre-processed separately for each cohort using a standard quality control pipeline—details are given in [29]. After quality control, each cohort was imputed based on the 1000 Genomes Phase 3 reference sample using shapeit2 [44] and impute2 [44]. After imputation, only SNPs with an info-score > 0.6 were retained in the analyses and converted into best-guessed genotypes using a probability threshold of 90%. We ran a second round of stringent quality control, keeping only SNPs with a call rate of at least 98%, a minor allele frequency of at least 5% and p-value for deviation from Hardy–Weinberg-Equilibrium > 1 × 10–05. This resulted in 136 individuals with 2,941,734 SNPs in BET, 425 individuals with 3,526,614 SNPs in ITU and 117 individuals with 5,284,432 SNPs in PREDO.

Statistical analyses

If not stated otherwise, statistical analyses were conducted in R 4.2.1 (https://www.R-project.org/). All statistical analyses performed on DNAm data was based on M-values.

Robust linear regression on DNAm

We conducted robust linear regression models using White’s estimator as implemented in the R-package MASS in each cohort separately. We fitted one regression model per CpG using the M-value of the respective CpG-site as dependent variable and sex (using males as reference category) as predictor and gestational age at sampling, maternal smoking during pregnancy, maternal age as well as cell type proportions as covariates. The same analysis was conducted separately for each tissue in the ITU CVS (n = 264) and cord blood samples (n = 426) as well as in the ITU samples which had DNAm available in CVS, placenta at birth and cord blood (n = 65).

Meta-analysis

Inflation of test-statistics from the robust linear regression in each cohort was assessed using the R-package bacon [45]. Afterwards, we conducted a random-effects meta-analysis using DerSimonian-Laird random effects model as implemented in an extension of METAL [46] correcting the individual results for the estimated inflation-factors (BET: λ = 1.14, ITU: λ = 1.37, PREDO: λ = 1.17). As suggested by [47], we considered p-values below 9 × 10–08 as epigenome-wide significant in the meta-analyses. These CpGs were defined as differentially methylated probes (DMPs). We included those 758,101 CpG-sites in the meta-analysis which were present in at least one cohort, 711,597 CpG-sites were available in at least two cohorts. Manhattan plots were created using the R-package GWASTools [48].

Sensitivity analysis

To evaluate if any of the sex-differential DMPs were driven by cohort-specific variables such as maternal health, we performed sensitivity analyses. Herefore, we additionally covaried for betamethasone intake in the BET cohort (24/29 male/female with betamethasone treatment, 46/38 male/female without betamethasone treatment), prenatal testing (142/139 male/female with prenatal testing, 96/93 male/female without prenatal testing) and maternal education (45/38 male/female with primary or secondary education, 184/189 male/female with tertiary education) as proxy for socio-economic status (SES) in the ITU cohort as well as known risk factors for pre-eclampsia and IUGR (24/27 male/female with known risk, 43/45 male/female without known risk) and maternal education (available in 136 IDs with: 23/28 male/female with primary or secondary education, 42/43 male/female with tertiary education) in PREDO. Unfortunately, no SES measurement was available in the BET cohort. Maternal education was used as proxy for SES as data was most complete for this variable and maternal education has been shown to be highly correlated with SES [49]. Afterwards, we compared effect directions and effect sizes to the original estimates. For all DMPs, effect directions were consistent across analyses with and without further covariates. Furthermore, effect estimates correlated very highly across analyses (r > 0.93).

Differential methylated regions (DMRs)

We calculated DMRs with comb-p [50] using the meta-analysis p-value for each CpG-site, requiring a nominal significant p-value to define a region and setting a cut-off of 200 base pairs to extend a region if another CpG-site with an at least nominal significant p-value was found within that region. Afterwards, only regions presenting with an epigenome-wide significant p-value (p < 9 × 10–08) were retained.

Enrichment for genomic location

To check if DMPs were enriched for specific genomic locations, we mapped them with regards to relation to CpG-islands, based on Illumina’s annotation, as well as with regards to location in genes using the R-package ChipSeeker [51]. To test for enrichment, we applied two-sided Fisher-tests using those 747,781 CpGs which were included in our analysis but did not show epigenome-wide significance in the meta-analysis as background.

Enrichment for pathways and tissues

For enrichment analysis, we mapped CpG-sites to the nearest gene with the R-package bumphunter [52]. DMPs mapped to 3,817 unique genes. We performed enrichment tests for GO terms and tissue specific expression in the Genotype-Tissue Expression (GTEx) project v8 [53] with FUMA (https://fuma.ctglab.nl, [54]). All CpGs included in the analysis mapped to unique 20,214 genes and were used as background. Furthermore, we checked for tissue specific gene expression in the Human Protein Atlas [55] using the R-package TissueEnrich [56].

meQTL analysis in birth placenta

The datasets used for meQTL calculation comprised 136 individuals (70 males and 66 females) with 2,941,734 SNPs available in BET, 425 individuals (219 males and 206 females) with 3,526,614 SNPs in ITU and 117 individuals (57 males and 60 females) with 5,284,432 SNPs in PREDO. To test for enrichment with meQTLs (methylation quantitative trait loci), we first calculated meQTLs in each cohort using matrixEQTL [57] and a p-value threshold of 0.05. Within each cohort separately, M-values of each CpG-site were tested for association with SNP-genotypes in a 150 kb window around the specific CpG, using maternal smoking during pregnancy, gestational age, cell type composition, the first two multi-dimensional-scaling-components for population stratification as well as sex as covariates. Furthermore, we ran sex-stratified meQTL analyses in each cohort. Afterwards, we meta-analysed meQTL results in BET, ITU and PREDO as well as from the sex-stratified analyses with the DerSimonian-Laird random effects model as implemented in an extension of METAL [46]. Meta-analysis p-values below 1 × 10–08 were considered as significant cis meQTLs based on the threshold used in [58]. Only meQTLs where the specific SNP and the specific CpG were available in at least 2 of the 3 cohorts were considered further. This resulted in 3,040,103 significant CpG-SNP combinations in the overall sample (based on 678 individuals, formed by 1,087,708 unique SNPs and 70,855 unique CpGs), 1,634,651 significant cis meQTLs in males (based on 346 males, formed by 721,621 unique SNPs and 42,751 unique CpGs) and 1,461,207 significant cis meQTLs in females (based on 332 males, formed by 664,352 unique SNPs and 39,918 unique CpGs).

Sex-consistent and sex-specific meQTLs

In a next step, we wanted to evaluate if significant meQTLs were different between sexes. At first, we checked the 3,040,103 significant CpG-SNP combinations in the overall sample for direction of effects in the male-only and female-only samples. For the high majority (over 99%), the direction of effects was consistent in males and females. To evaluate sex-specific effects, we checked if the effect direction in the significant male-only meQTLs was consistent with effect direction in the female-only sample and vice versa. Again here, effect directions were consistent for over 99% of combinations. Based on these results, we created a list of sex-consistent and sex-specific SNP-CpG-combinations. Sex-consistent meQTLs were defined as SNP-CpG-combinations which were significant in the overall sample and showed the same direction of effects in males and females. Sex-specific meQTLs were defined as SNP-CpG-combinations which were significant in the male-only or female-only samples and showed a different effect direction in the respective different sex. Few CpGs (n = 167) were contained in both groups. To create two disjunct groups, the specific meQTLs were categorized into the group that presented with the lowest p-value. This resulted in 3,039,847 sex-consistent meQTLs (formed by 1,087,627 unique SNPs and 70,848 unique CpGs) and only 173 sex-specific meQTLs (formed by 169 unique SNPs and 22 unique CpGs).

Enrichment for meQTLs

We checked if the 10,320 epigenome-wide significant sex-differential DMPs overlapped with CpGs involved in sex-consistent and sex-specific meQTLs. To test if the overlap was significant, we applied Fisher-tests.

Phenome-wide association analysis (PheWAS)

To evaluate if SNPs implicated in meQTLs which overlapped with DMPs were also associated with other phenotypes, we performed a PheWAS. For each overlapping meQTL, we chose the SNP with the lowest p-value, yielding a list of 1924 unique SNPs. GWAS associations for these SNPs were extracted from the MRC IEU OpenGWAS platform (https://gwas.mrcieu.ac.uk). We focused on phenotypes originating from the batches UK Biobank study, the NHGRI-EBI GWAS Catalog, and a GWAS on brain imaging phenotypes based on UK Biobank data. This list was further reduced and filtered for duplicates and high similarity in trait names as described in Krontira et al. [59] resulting in a final list of 7503 phenotypes.

Extracted associations were corrected by false discovery rate (FDR, across phenotypes and SNPs). Overall, 12,035 associations (made up of 1150 unique SNPs and 1698 unique GWAS traits) survived multiple testing correction at FDR 0.05 and associated traits were manually grouped into the broad domain categories “anthropometric”, “blood-cell composition”, “bloodmarker”, “brain imaging”, “disease”, “education”, “environment”, “metabolism”, “nutrition”, “puberty” and “other” (if less than 1% of hits matched into the respective category). We applied the same approach on sex-stratified GWAS focusing on traits identified in [60]. These authors conducted a genome and trait-wide genetic effect comparison between males and females and identified 103 traits where at least one autosomal genetic variant presented with a significantly different effect at a p-value threshold of 1 × 10−8 threshold. Overall, 87 associations (made up of 41 unique SNPs and 41 unique GWAS traits) survived multiple testing correction at FDR 0.05. These associations grouped into the (manually created) broad domains “anthropometric”, “blood-cell composition”, “disease” and “metabolism”.

Differential gene expression

Differential gene expression with sex in ITU CVS and birth placenta was tested using the R-package DESeq2 [61]. SV, gestational age at sampling and maternal smoking were used as covariates. For placental gene expression, we additionally covaried for induced labor yes/no and caesarian section yes/no. P-values were FDR-corrected at 0.05. The same analysis was conducted in the 91 individuals with RNASeq available in CVS and placenta.

留言 (0)

沒有登入
gif