Kernel-based hierarchical structural component models for pathway analysis on survival phenotype

Background

High-throughput sequencing, particularly RNA-sequencing (RNA-seq), has advanced differential gene expression analysis, revealing pathways involved in various biological conditions. Traditional pathway-based methods generally analyze each pathway independently, overlooking both the correlations among pathways and the many biomarkers that pathways share. In addition, most pathway-based approaches assume that biomarkers have linear effects on the phenotype of interest.

Objective

This study aims to develop the HisCoM-KernelS model to identify survival phenotype-related pathways by accommodating complex, nonlinear relationships between genes and survival outcomes, while accounting for inter-pathway correlations.

Methods

We applied the HisCoM-KernelS model to the TCGA pancreatic ductal adenocarcinoma (PDAC) RNA-seq dataset, comprising 4,498 protein-coding genes mapped to 186 KEGG pathways in 148 PDAC samples. Kernel machine regression was used to model pathway effects on survival outcomes, incorporating hierarchical gene-pathway structures. Model parameters were estimated using the alternating least squares algorithm, and the significance of pathways was assessed through a permutation test.
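As a rough illustration of two ingredients named above, the sketch below builds a Gaussian kernel matrix from one pathway's expression values and computes a generic permutation p-value. It is not the HisCoM-KernelS implementation: the function names, the default bandwidth `rho` (set here to the number of genes), and the user-supplied `statistic` callable are all assumptions made for the example, and the alternating least squares fit is not shown.

```python
import numpy as np


def gaussian_kernel(X, rho=None):
    """Gaussian kernel K[i, j] = exp(-||x_i - x_j||^2 / rho).

    X is an (n_samples, n_genes) expression matrix for one pathway.
    """
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances between samples.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negatives
    if rho is None:
        rho = X.shape[1]  # assumption: bandwidth equal to the gene count
    return np.exp(-sq_dists / rho)


def permutation_p_value(statistic, X, y, n_perm=1000, seed=0):
    """Permutation p-value for a user-supplied test statistic.

    statistic(X, y) must return a scalar where larger means stronger
    association. y holds one row per sample, e.g. (time, event) pairs,
    so shuffling rows keeps each sample's time and event together.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = statistic(X, y)
    exceed = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # shuffles along the first axis
        if statistic(X, y_perm) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

In a pathway analysis, `statistic` would be whatever pathway-level effect measure the fitted model produces, recomputed on each permuted outcome. Permuting whole outcome rows breaks the link between expression and survival while preserving the correlation structure among genes and pathways.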

Results

HisCoM-KernelS identified several pathways significantly associated with pancreatic cancer survival, including those corroborated by previous studies. HisCoM-KernelS, especially with the Gaussian kernel, achieved a better balance between detection rate and number of significant pathways than four existing pathway-based methods: HisCoM-PAGE, Global Test, GSEA, and CoxKM.

Conclusion

HisCoM-KernelS successfully extends pathway-based analysis to survival outcomes, capturing complex nonlinear gene effects and inter-pathway correlations. Its application to the TCGA PDAC dataset emphasizes its utility in identifying biologically relevant pathways, offering a robust tool for survival phenotype research in high-throughput sequencing data.
