Multi-omics analysis reveals novel causal pathways in psoriasis pathogenesis

Study design

An overview of our analytical framework is presented in Fig. 1. Our study integrated three types of molecular quantitative trait loci (QTL) data: methylation QTL (mQTL) from McRae et al. (n = 1,980 Europeans) [13], expression QTL (eQTL) from the eQTLGen Consortium (n = 31,684 Europeans) [14], and protein QTL (pQTL) from Ferkingstad et al. (n = 35,559 Icelanders) [15]. For instrument selection, we required p < 5 × 10⁻⁸ and selected top SNPs within a ± 2,000 kb window. For psoriasis associations, we used two independent datasets: a discovery cohort from EMBL-EBI (5,459 cases and 324,074 controls) and a replication cohort from UK Biobank (5,314 cases and 457,619 controls). For validation, we used tissue-specific data from the GTEx Consortium (V8 release) [16], including sun-unexposed skin, sun-exposed skin, and EBV-transformed lymphocytes. Additional protein-level validation was conducted using the inflammation panel of the UK Biobank Pharma Proteomics Project (UKB-PPP) [17]. All datasets used in this study are publicly available and are detailed in Table 1.

Fig. 1 Study design and workflow. The figure outlines each step of the study: instrument selection, Mendelian randomization analysis, colocalization, multi-omics integration, and validation. The data sources, selection criteria, and analytical methods used at each stage, from the initial discovery cohorts to the final validation with tissue-specific and proteomic data, are included.

Table 1 Summary of datasets included in this study

Methylation, expression, and protein quantitative trait loci datasets

For mQTL analysis, we used whole-blood data from McRae et al. [13], which included 417,580 CpG sites measured using the Illumina HumanMethylation450 array. The CpG sites were filtered using a detection p-value threshold of 0.01 in at least 95% of samples. Methylation levels were expressed as both beta-values and M-values, with beta-values used for interpretability and M-values for statistical testing.

For eQTL analysis, we used blood-derived data from the eQTLGen Consortium [14]. Gene expression levels were quantified using RNA sequencing or gene expression arrays, with subsequent quality control including removal of technical covariates and normalization [14]. Expression data were adjusted for known and hidden confounders using principal component analysis [14].

Blood pQTL data from Ferkingstad et al. [15] covered 4,907 proteins measured using the SOMAscan platform. Raw protein measurements underwent several quality control steps, including hybridization control normalization, median signal normalization, and calibration to remove batch effects. The protein levels were log-transformed and standardized to have a mean of zero and a standard deviation (SD) of one.

For tissue-specific validation, we used GTEx V8 data [16] from sun-exposed (n = 605) and sun-unexposed (n = 517) skin samples, as well as EBV-transformed lymphocytes (n = 147). Gene expression was quantified using RNA-seq, with reads aligned to the GRCh38 reference genome using STAR, followed by gene-level quantification using RNA-SeQC v1.1.9. Expression values were normalized using the TMM method and transformed to log2 counts per million [16].

Additional protein-level validation used UKB-PPP data [17], which covered 1,463 proteins measured using the Olink® Explore platform. The protein levels were normalized using Olink's standard pipeline, including normalization against the extension control and inter-plate control, and adjustment for technical variation [17]. The processing and quality control steps for all datasets followed the protocols established in their respective original publications.
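The two transformations mentioned above (beta-values to M-values, and log-transformation followed by standardization) can be illustrated with a minimal Python sketch. It is not the pipeline of the original publications: the function names, the clipping constant `eps`, the use of the natural logarithm, and the use of the sample SD are assumptions made for illustration only.

```python
import math
from statistics import mean, stdev


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta-value (0..1) to an M-value: log2(beta / (1 - beta)).

    eps is a hypothetical clipping constant that keeps the logit finite at 0 or 1.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def log_standardize(raw_levels):
    """Log-transform raw levels, then scale them to mean 0 and SD 1.

    Assumes strictly positive inputs; the natural log and the sample SD
    are illustrative choices, not taken from the source.
    """
    logged = [math.log(v) for v in raw_levels]
    mu = mean(logged)
    sd = stdev(logged)
    return [(v - mu) / sd for v in logged]


if __name__ == "__main__":
    print(beta_to_m(0.5))                      # a half-methylated site maps to M = 0
    print(log_standardize([1.0, 10.0, 100.0]))
```

M-values are unbounded and closer to homoscedastic than beta-values, which is the usual reason for preferring them in statistical testing while reporting beta-values for interpretation.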

Psoriasis outcome datasets

Summary-level data for psoriasis were obtained from the European Molecular Biology Laboratory–European Bioinformatics Institute (EMBL-EBI) and the UK Biobank. The EMBL-EBI dataset (GCST90014456) included 329,533 individuals of European descent, with 5,459 psoriasis cases and 324,074 controls [18]. For replication, we used data from the UK Biobank, which comprised 462,933 European individuals (5,314 psoriasis cases and 457,619 controls) [19].

Summary data-based Mendelian randomization analysis

We employed summary-data-based Mendelian randomization (SMR) analysis to investigate potential causal relationships between molecular traits and psoriasis risk. The SMR approach extends traditional Mendelian randomization by utilizing summary-level data from independent GWAS and QTL studies to examine whether the effect of a SNP on a trait (psoriasis) is mediated through molecular features (such as gene expression, DNA methylation, or protein levels). The SMR method has been described in detail by Zhu et al. [12]. Briefly, the SMR effect size (bxy) was estimated as:

$$b_{xy} = b_{zy} / b_{zx}$$

where bzy represents the SNP's effect on psoriasis from GWAS data, and bzx represents the SNP's effect on the molecular trait from QTL studies. The corresponding test statistic (T_SMR) was calculated using z-statistics from both GWAS and QTL studies:

$$T_{SMR} = z_{zy}^{2} z_{zx}^{2} / \left( z_{zy}^{2} + z_{zx}^{2} \right)$$

where zzy and zzx are the z-statistics from GWAS and QTL studies, respectively. To implement this analysis, we used the SMR software (v1.3.1) [12] with the following criteria: (1) top cis-QTLs were selected within ± 2,000 kb of each gene; (2) QTL associations were required to have p < 5 × 10⁻⁸ [12]; and (3) SNPs with allele frequency differences > 0.2 between datasets were excluded. Statistical significance was defined as a false discovery rate (FDR)-corrected p-value < 0.05, using the Benjamini–Hochberg method.
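As a worked illustration of the quantities defined above, the following Python sketch computes the SMR effect size and test statistic for a single instrument and applies a Benjamini–Hochberg adjustment to a list of p-values. It is a sketch under stated assumptions, not the SMR software: the function and argument names are hypothetical, and the p-value assumes that the test statistic is approximately chi-square distributed with one degree of freedom, as described for the SMR method [12].

```python
import math


def smr_estimate(b_zx, se_zx, b_zy, se_zy):
    """SMR effect size and test statistic for one top cis-QTL.

    b_zx, se_zx: SNP effect on the molecular trait (QTL study) and its SE.
    b_zy, se_zy: SNP effect on the outcome (GWAS) and its SE.
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)
    # Upper tail of a chi-square(1) variable: P(X > t) = erfc(sqrt(t / 2)).
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_smr


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical summary statistics, for illustration only.
    b_xy, t_smr, p_smr = smr_estimate(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.02)
    print(b_xy, t_smr, p_smr)
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

With z-statistics of 10 (QTL) and 5 (GWAS), the test statistic is 25 × 100 / 125 = 20. Because the weaker of the two associations dominates the denominator, a strong QTL instrument cannot compensate for a weak GWAS signal.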

Distinguishing functional association from linkage

To differentiate between pleiotropy and linkage disequilibrium, we implemented the heterogeneity in dependent instruments (HEIDI) test. Under the assumption of a single causal variant, the SMR effect size (bxy) estimated using any SNP in LD with the causal variant should be consistent. The HEIDI test statistic evaluates this consistency by comparing the bxy of the top associated cis-QTL (bxy(top)) with those of other significant SNPs in the cis-QTL region (bxy(i)):

$$d_{i} = b_{xy(i)} - b_{xy(top)}$$

where di follows a multivariate normal distribution MVN(d,V), with V representing the covariance matrix. The HEIDI test statistic (T_HEIDI) is calculated as:

$$T_{HEIDI} = \sum_{i} z_{d(i)}^{2}$$

where zd(i) = di/√var(di). We excluded SNPs in very high LD with the top cis-QTL (r² > 0.9) and those with weak associations (p > 1.6 × 10⁻³) to ensure robust testing. A p_HEIDI > 0.01 indicates no significant heterogeneity, which is consistent with a single causal variant affecting both the molecular trait and the outcome.
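The HEIDI quantities can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SMR implementation: the record fields `r2_with_top` and `p_qtl` are hypothetical names, and the covariance matrix of the bxy estimates is taken as a given input (in practice it depends on LD between SNPs). The sketch stops at the test statistic; p_HEIDI is not computed, because the zd(i) are correlated and the reference distribution must account for that.

```python
def select_heidi_snps(snps, r2_max=0.9, p_max=1.6e-3):
    """Keep SNPs that are neither in very high LD with the top cis-QTL
    nor only weakly associated. Field names are hypothetical."""
    return [s for s in snps if s["r2_with_top"] <= r2_max and s["p_qtl"] <= p_max]


def heidi_statistic(b_xy, cov):
    """Sum of squared standardized deviations from the top cis-QTL estimate.

    b_xy: list of SMR estimates; b_xy[0] belongs to the top cis-QTL.
    cov:  covariance matrix (list of lists) of the b_xy estimates,
          assumed to be supplied by the caller.
    """
    t_heidi = 0.0
    for i in range(1, len(b_xy)):
        d_i = b_xy[i] - b_xy[0]
        # var(d_i) = var(b_i) + var(b_top) - 2 * cov(b_i, b_top)
        var_d = cov[i][i] + cov[0][0] - 2.0 * cov[i][0]
        t_heidi += d_i ** 2 / var_d
    return t_heidi


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    snps = [
        {"id": "snp_a", "r2_with_top": 0.95, "p_qtl": 1e-10},  # dropped: LD too high
        {"id": "snp_b", "r2_with_top": 0.60, "p_qtl": 1e-5},   # kept
        {"id": "snp_c", "r2_with_top": 0.30, "p_qtl": 0.02},   # dropped: weak
    ]
    print([s["id"] for s in select_heidi_snps(snps)])

    cov = [[0.010, 0.005, 0.005],
           [0.005, 0.010, 0.005],
           [0.005, 0.005, 0.010]]
    print(heidi_statistic([0.2, 0.3, 0.1], cov))
```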

Colocalization analysis

To determine whether association signals from separate studies at the same locus share a causal variant, we performed colocalization analysis using the "coloc" R package (v5.2.3) [20,21,22]. Given the significant role proteins play in disease, we focused on genetic associations between psoriasis and the corresponding pQTLs. The colocalization analysis tests five hypotheses: (H0) no causal variant for either protein or psoriasis in the locus; (H1) one causal variant for protein only; (H2) one causal variant for psoriasis only; (H3) two distinct causal variants for protein and psoriasis; and (H4) one shared causal variant for both protein and psoriasis. The corresponding posterior probabilities are denoted PPH0, PPH1, PPH2, PPH3, and PPH4, respectively. We defined colocalization regions as ± 1,000 kb around the locus and considered PPH4 > 0.7 (corresponding to an FDR of < 5%) as strong evidence for a shared causal variant [23].
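To make the five hypotheses concrete, the Python sketch below shows how posterior probabilities of this kind can be derived from per-SNP approximate Bayes factors under the single-causal-variant assumption. It re-expresses the general approach in Python for illustration and is not the "coloc" package itself. The function names are hypothetical, and the prior probabilities `p1`, `p2`, and `p12` are assumed values, since the priors used in this study are not stated here.

```python
import math


def log_sum(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_diff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))


def wakefield_labf(beta, se, prior_sd):
    """Log approximate Bayes factor for one SNP (Wakefield's approximation).

    prior_sd is the assumed prior SD of the true effect size.
    """
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PPH0, PPH1, PPH2, PPH3, PPH4].

    labf1, labf2: per-SNP log ABFs for trait 1 (protein) and trait 2
    (psoriasis), given for the same SNPs in the same order.
    p1, p2, p12: assumed prior probabilities that a SNP is causal for
    trait 1 only, trait 2 only, or both.
    """
    joint = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = log_sum(labf1), log_sum(labf2), log_sum(joint)
    log_h = [
        0.0,                                                   # H0
        math.log(p1) + s1,                                     # H1
        math.log(p2) + s2,                                     # H2
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),  # H3
        math.log(p12) + s12,                                   # H4
    ]
    denom = log_sum(log_h)
    return [math.exp(h - denom) for h in log_h]


if __name__ == "__main__":
    # Hypothetical log ABFs: the same SNP carries both signals.
    print(coloc_posteriors([0.0, 20.0, 0.0], [0.0, 20.0, 0.0]))
    # Hypothetical log ABFs: the signals sit on different SNPs.
    print(coloc_posteriors([20.0, 0.0, 0.0], [0.0, 0.0, 20.0]))
```

In the first toy example nearly all posterior mass falls on H4 (shared variant). In the second it falls on H3 (distinct variants), the situation that the PPH4 > 0.7 threshold is meant to exclude.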

Integration of multi-omics results

We implemented a systematic approach to integrate multi-omics data. Our analytical framework was guided by the central dogma of molecular biology, where genetic variants influence phenotypes through sequential molecular changes from DNA methylation to gene expression to protein levels. First, we applied SMR analysis with HEIDI tests at each molecular level, requiring both SMR FDR-adjusted p-value < 0.05 and HEIDI p-value > 0.01 to identify significant associations while excluding potential linkage effects. Since proteins represent the functional endpoints of gene regulation, we prioritized our analysis by first identifying proteins showing robust causal associations with psoriasis. We then traced back through the molecular cascade to identify consistent signals at gene expression and DNA methylation levels.

For colocalization analysis, we applied a PPH4 threshold of > 0.7, following established precedents in genomic research. This threshold was chosen based on the demonstration by Foley et al. that it corresponds to an FDR of < 5% [23], and it has been applied in multiple recent genomic studies [24,25,26]. To define regulatory pathways, we required evidence of consistent effects across molecular layers. Specifically, a candidate pathway needed to meet three criteria: (1) the protein showed a significant causal association with psoriasis (SMR FDR-corrected p-value < 0.05, p_HEIDI > 0.01) and strong colocalization evidence (PPH4 > 0.7); (2) the corresponding gene showed a significant expression-level association with psoriasis (SMR FDR-corrected p-value < 0.05, p_HEIDI > 0.01); and (3) at least one CpG site in the gene region showed a significant methylation-level association with psoriasis (SMR FDR-corrected p-value < 0.05, p_HEIDI > 0.01). For example, if methylation at a CpG site (e.g., cg26804944) was associated with psoriasis in the mQTL analysis, and we simultaneously observed consistent associations at both the gene expression level (through eQTL) and the protein level (through pQTL) for the same gene, we considered this evidence for a potential regulatory pathway.
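The three criteria can be expressed as a simple filter. The Python sketch below is illustrative only: the dictionary keys (`smr_fdr`, `p_heidi`, `pph4`) and function names are hypothetical, and the thresholds are those stated above.

```python
def passes_smr(result, fdr_max=0.05, heidi_min=0.01):
    """SMR FDR-corrected p-value < 0.05 and HEIDI p-value > 0.01."""
    return result["smr_fdr"] < fdr_max and result["p_heidi"] > heidi_min


def is_candidate_pathway(protein, expression, cpg_sites, pph4_min=0.7):
    """Apply the three criteria for one gene.

    protein:    pQTL-level result, including the colocalization PPH4.
    expression: eQTL-level result for the corresponding gene.
    cpg_sites:  mQTL-level results for CpG sites in the gene region.
    """
    protein_ok = passes_smr(protein) and protein["pph4"] > pph4_min
    expression_ok = passes_smr(expression)
    methylation_ok = any(passes_smr(site) for site in cpg_sites)
    return protein_ok and expression_ok and methylation_ok


if __name__ == "__main__":
    # Hypothetical results for a single gene, for illustration only.
    protein = {"smr_fdr": 0.001, "p_heidi": 0.20, "pph4": 0.85}
    expression = {"smr_fdr": 0.010, "p_heidi": 0.05}
    cpg_sites = [
        {"smr_fdr": 0.30, "p_heidi": 0.50},  # not significant
        {"smr_fdr": 0.02, "p_heidi": 0.10},  # passes
    ]
    print(is_candidate_pathway(protein, expression, cpg_sites))
```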
