An autoantibody-based machine learning classifier for the detection of early-stage non-small cell lung cancer

Abstract

The humoral immune system plays a significant role in the immune response to cancer but is challenging to study at scale. We used programmable phage immunoprecipitation sequencing (PhIP-Seq) to profile the autoantibody repertoire of 301 primarily early-stage, asymptomatic non-small cell lung cancer (NSCLC) patients and 352 healthy controls, and used these profiles to train a machine learning classifier that distinguishes NSCLC patients from healthy controls. The classifier performed well in cross-validation (average ROC-AUC = 0.94) and in an independently analyzed clinical validation cohort of 134 NSCLC patients and 96 healthy controls (ROC-AUC = 0.84). Classification performance can be maintained with only a few hundred target peptides, provided the training cohort is sufficiently large. Our findings suggest the existence of a measurable autoreactive humoral profile in NSCLC and demonstrate the potential for serum-based early detection of cancer independent of nucleic acids.
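For readers less familiar with the ROC-AUC metric reported above, the following is a minimal sketch of how it can be computed from classifier scores. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are assumptions made for illustration only, and the numbers are unrelated to the study data. The sketch uses the pairwise interpretation of ROC-AUC, namely the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 1 (case) or 0 (control)
    scores: iterable of classifier scores, higher meaning more case-like
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 3 cases and 3 controls.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 case/control pairs are ordered correctly
```

In a cross-validation setting, a value like this would be computed on each held-out fold and the per-fold values averaged, which is one common reading of an "average ROC-AUC".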

Competing Interest Statement

AFK, CAD, JRK, DMJ, and JLD are co-inventors on a patent application submitted by the Regents of the University of California and the Chan Zuckerberg Biohub San Francisco related to this work.

Funding Statement

Funding was provided by the Chan Zuckerberg Biohub (San Francisco). LB was supported by grants from the Lung Cancer Research Foundation (LCRF) and the MGH Transformative Scholars Program.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The Institutional Review Board of the University of California, San Francisco gave ethical approval for this work (IRB# 11-06107).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All PhIP-Seq data are available on Dryad at <https://doi.org/10.5061/dryad.08kprr5bk>. Code associated with the analysis is available on GitHub at <https://github.com/afkung/nsclc-classify>. Normal expression data from the Genotype-Tissue Expression Project (GTEx) used in the analyses described were downloaded from the GTEx Portal and dbGaP accession number phs000424.vN.pN. Lung adenocarcinoma data from The Cancer Genome Atlas Program (TCGA-LUAD) used in the analyses described in this manuscript were downloaded through the UCSC Xena Data Portal (University of California, Santa Cruz).

