PCLassoLog: a protein complex-based, group Lasso-logistic model for cancer classification and risk protein complex discovery

Elsevier

Available online 6 December 2022

Computational and Structural Biotechnology Journal

Abstract

Risk gene identification has attracted much attention over the past two decades. However, most genes carry out cellular functions only after being translated into proteins that cooperate with other proteins to form protein complexes, which significantly extends the functional diversity of individual proteins. Revealing the molecular mechanisms of cancer from a comprehensive perspective therefore requires a shift from identifying individual risk genes toward identifying risk protein complexes. Here, we embed protein complexes into the regularized learning framework and propose a protein complex-based, group Lasso-logistic model (PCLassoLog) to discover risk protein complexes. Experiments on deep proteomic data from two cancer types show that PCLassoLog yields superior predictive performance on independent datasets. More importantly, PCLassoLog identifies risk protein complexes that contain not only individual risk proteins but also close partners that synergize with them. Furthermore, selection probabilities are calculated and two other protein complex-based models are proposed to complement PCLassoLog in identifying reliable risk protein complexes. Based on PCLassoLog, a pan-cancer analysis is performed to identify risk protein complexes in 12 cancer types. Finally, PCLassoLog is used to discover risk protein complexes associated with gene mutations. We implement all protein complex-based models as an R package, PCLassoReg, which may serve as an effective tool for discovering risk protein complexes in various contexts.
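To illustrate the general idea of a group Lasso-logistic model, the following is a minimal Python sketch and not the PCLassoReg implementation. It assumes non-overlapping groups (real protein complexes can share proteins, which calls for an overlapping-group formulation), a penalty weight of the square root of the group size, and a plain proximal gradient solver. The function names, the synthetic data, and the values of `lam`, `step`, and `n_iter` are all hypothetical choices made for the example.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink a whole block, or zero it out."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def fit_group_lasso_logistic(X, y, groups, lam, step=0.1, n_iter=500):
    """Proximal gradient descent for logistic loss plus a group Lasso penalty.

    groups: list of index arrays, one per group (assumed non-overlapping).
    The intercept is left unpenalized.
    """
    n, p = X.shape
    beta = np.zeros(p)
    intercept = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta + intercept)))
        resid = prob - y                      # gradient of the loss w.r.t. the linear predictor
        intercept -= step * resid.mean()
        beta_new = beta - step * (X.T @ resid) / n
        for idx in groups:
            # Each group is penalized by lam * sqrt(group size) * ||beta_g||_2.
            t = step * lam * np.sqrt(len(idx))
            beta_new[idx] = group_soft_threshold(beta_new[idx], t)
        beta = beta_new
    return intercept, beta

# Synthetic stand-in for protein abundances: 200 samples, 6 "proteins",
# two hypothetical complexes of three proteins each; only the first is informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
true_beta = np.array([2.0, -2.0, 1.5, 0.0, 0.0, 0.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

intercept, beta = fit_group_lasso_logistic(X, y, groups, lam=0.1)
selected = [g for g, idx in enumerate(groups) if np.linalg.norm(beta[idx]) > 0]
print("selected groups:", selected)
```

Because the penalty acts on the norm of each group's coefficient block, a group is either dropped as a whole or kept with all of its members, which is what allows a selected complex to carry partner proteins alongside an individually predictive one.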

Keywords

protein complex

deep proteomic data

group Lasso

logistic model

cancer classification

© 2022 The Author(s). Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
