BC-Predict: Mining of signal biomarkers and multilevel validation of cascade classifier for early-stage breast cancer subtyping and prognosis

Abstract

Disease heterogeneity is the hallmark of breast cancer, which remains a scourge and the most common malignancy among women. With a steep increase in breast cancer morbidity and mortality, there is a critical need for effective early-stage theragnostic and prognostic biomarkers. Such biomarkers would help in patient stratification and optimal treatment selection, leading to better disease management. In this study, we examined four key problems in the characterization of breast cancer heterogeneity, namely: (i) cancer screening; (ii) identification of metastatic cancers; (iii) molecular subtype (TNBC, HER2, or luminal); and (iv) histological subtype (ductal or lobular). We mined the available public-domain transcriptomic data of breast cancer patients from the TCGA and other databases using stage-encoded statistical models of gene expression, and identified stage-salient, monotonically expressed, and problem-specific biomarkers. Next, we trained different classes of machine learning algorithms for each of the above problems in the corresponding feature spaces. Hyperparameters specific to each algorithm were optimized using 10-fold cross-validation on the training dataset. The optimized models were evaluated on the holdout test set to identify the overall best model for each problem. The best model for each problem was validated against: (i) multi-omics data from the same cohort (miRNA and methylation profiles); (ii) external datasets from out-of-domain cohorts; and (iii) the state of the art, including commercially available breast cancer panels. In external validation, our models matched or bested available benchmarks in the respective problem domains (balanced accuracies of 97.42% for cancer vs normal; 88.22% for metastatic vs non-metastatic; 88.79% for ternary molecular subtyping; and ensemble accuracy of 94.23% for histological subtyping).
We have translated the results into BC-Predict, a freely available web-server that forks the best models developed for each problem, and provides the cascade annotation of input instance(s) of expression data, along with uncertainty estimates. BC-Predict is meant for academic use and has been deployed at: https://apalania.shinyapps.io/BC-Predict
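
The model-selection procedure summarized in the abstract (hyperparameter tuning by 10-fold cross-validation on the training set, followed by scoring on a holdout test set with balanced accuracy) can be illustrated with a minimal, standard-library-only Python sketch. Everything in it is an assumption made for illustration: the one-feature synthetic "expression" data, the k-nearest-neighbour classifier standing in for the unspecified algorithm classes, the candidate grid, and all function names. Balanced accuracy is taken here as the mean of per-class recall, which is the common definition; the abstract does not state the exact formula used.

```python
import random
from collections import Counter, defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, query_x, n_neighbors):
    """Toy 1-D k-nearest-neighbour classifier (stand-in model, not the paper's)."""
    preds = []
    for q in query_x:
        order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - q))
        votes = Counter(train_y[i] for i in order[:n_neighbors])
        preds.append(votes.most_common(1)[0][0])
    return preds


def cv_score(x, y, n_neighbors, k=10):
    """Mean balanced accuracy of one hyperparameter value over k folds."""
    scores = []
    for fold in k_fold_indices(len(x), k):
        held_out = set(fold)
        tr = [i for i in range(len(x)) if i not in held_out]
        pred = knn_predict([x[i] for i in tr], [y[i] for i in tr],
                           [x[i] for i in fold], n_neighbors)
        scores.append(balanced_accuracy([y[i] for i in fold], pred))
    return sum(scores) / len(scores)


def select_and_evaluate(train, holdout, candidates=(1, 3, 5, 7)):
    """Tune n_neighbors by CV on the training data only, then score the holdout once."""
    tx, ty = train
    hx, hy = holdout
    best = max(candidates, key=lambda c: cv_score(tx, ty, c))
    return best, balanced_accuracy(hy, knn_predict(tx, ty, hx, best))


def make_toy_data(n_per_class, seed):
    """Synthetic one-feature data; the values carry no biological meaning."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    x += [rng.gauss(5.0, 1.0) for _ in range(n_per_class)]
    y = ["normal"] * n_per_class + ["tumor"] * n_per_class
    return x, y


if __name__ == "__main__":
    best_k, score = select_and_evaluate(make_toy_data(50, seed=1),
                                        make_toy_data(20, seed=2))
    print(f"selected n_neighbors={best_k}, holdout balanced accuracy={score:.3f}")
```

The holdout data are touched only once, after the hyperparameter has been fixed, which is what keeps the reported holdout score an unbiased estimate in this kind of setup.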

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported in part by DST-SERB grant EMR/2017/000470. Computing in our lab is also supported by a Google TPU Research Cloud (TRC) grant of Cloud TPU VMs.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

TCGA - BRCA dataset (generated by The Cancer Genome Atlas Consortium), METABRIC dataset (generated by the Molecular Taxonomy of Breast Cancer International Consortium), ICGC dataset (generated by International Cancer Genome Consortium), and GEO datasets.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes
