Antiprogrammed cell death protein 1 (anti-PD-1) immune checkpoint inhibitors such as pembrolizumab benefit a subset of patients with recurrent or metastatic head and neck squamous cell carcinoma, but current biomarkers are inadequate at identifying these patients.
WHAT THIS STUDY ADDS
This study describes the validation of a new RNA-based test that predicts disease control and progression-free survival in response to anti-PD-1 therapy with high sensitivity and specificity.
The test was validated using two independent cohorts of patients from 17 community and academic sites.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
The test had significantly higher sensitivity than tumor mutational burden and significantly higher specificity than programmed death-ligand 1 combined positive score, enabling clinicians to make more informed decisions when prioritizing treatment.
Use of the test has the potential to avoid unnecessary chemotherapy and/or anti-PD-1 treatment and improve patient outcomes.
Background
Head and neck squamous cell carcinomas (HNSCC) represent a significant healthcare burden. Worldwide, HNSCC is the seventh most common cancer with 870 000 new cases and 440 000 deaths annually.1 More than 65% of these HNSCC patients are ultimately diagnosed with recurrent or metastatic disease.2 3 Patients with recurrent or metastatic HNSCC (RM-HNSCC) have a poor prognosis with median overall survival (OS) of just 10.7–13.0 months.4 The introduction of antiprogrammed cell death protein 1 (anti-PD-1) immune checkpoint inhibitors (ICIs) such as pembrolizumab and nivolumab has improved outcomes for a subset of patients, but ICIs are associated with serious adverse reactions and a high financial burden to the health system.5–9 In practice, many patients receive ICI in combination with platinum or other chemotherapies, and choosing between ICI monotherapy and ICI in combination with chemotherapy (chemo-immunotherapy) is an important treatment decision.4 10
In KEYNOTE-048, both monotherapy ICI and chemo-immunotherapy demonstrated a survival benefit in patients with combined positive score (CPS) ≥1 relative to the non-ICI control arm, leading to the approval of pembrolizumab and recommendations for its use in patients with CPS ≥1.4 10 However, while chemo-immunotherapy-treated patients with CPS 1–19 had improved survival, there was no clear survival benefit for CPS 1–19 monotherapy-treated patients.11 As a result, many oncologists treat CPS 1–19 patients with the more aggressive chemo-immunotherapy option, limiting monotherapy ICI to patients with CPS ≥20. There is a critical need for novel predictive biomarkers to improve on CPS, particularly to provide more confidence to give more patients monotherapy, sparing potentially unnecessary chemotherapy.12 13 While tumor mutational burden (TMB) is sometimes used to aid treatment decisions, its clinical utility in HNSCC is less clear.14–16 ICI undoubtedly provides clinical benefit to a subset of patients, but current methods for identifying which patients benefit are insufficient.
There is an unmet clinical need for more robust methods of predicting disease control in response to PD-1 inhibitors. Previously, we described the development of an RNA-sequencing-based classifier to predict disease control with increased sensitivity and specificity compared with programmed death-ligand 1 (PD-L1) CPS in patients with RM-HNSCC treated with anti-PD-1 monotherapy.17 In that study, we classified patients as progressors or non-progressors based on the median predicted probability of disease control as determined by the test. Building on that work, we refined the test and its thresholds for disease control prediction to create three groups correlated with predicted likelihood of disease control: low, medium and high. Here, we report the validation and test performance in two independent cohorts of patients, one treated with anti-PD-1 alone (monotherapy) and one with anti-PD-1 in combination with platinum-based chemotherapy (chemo-immunotherapy). The resulting laboratory developed test, OncoPrism-HNSCC, classifies patients into three groups, low (25% of patients with lowest likelihood of disease control), medium (25% of patients with indeterminate likelihood of disease control) and high (50% of patients with high likelihood of disease control). The test predicts disease control in response to anti-PD-1 treatments with both high sensitivity and specificity.
Methods
Study design and participants
Patients were recruited from the following academic and community study sites across the USA, with the aim of a representative sample of the affected population: Washington University in St. Louis (St. Louis, Missouri), University of California San Diego (San Diego, California), Intermountain Healthcare (Salt Lake City, Utah), Gundersen Medical Foundation (La Crosse, Wisconsin), Cancer Care Northwest (Spokane, Washington), Cox Medical Centers (Springfield, Missouri), Decatur Memorial Hospital (Decatur, Illinois), Holy Cross Hospital (Fort Lauderdale, Florida), John B Amos Cancer Center (Columbus, Georgia), MultiCare Institute for Research and Innovation (Tacoma, Washington), Northwest Oncology and Hematology (Hoffman Estates, Illinois), Ochsner Lafayette General Medical Center (Lafayette, Louisiana), Providence Regional Cancer System (Lacey, Washington), Sharp Clinical Oncology Research (San Diego, California), Stanford University (Stanford, California), William Beaumont Army Medical Center (Fort Bliss, Texas), Baylor College of Medicine (Houston, Texas), Brooke Army Medical Center (Fort Sam Houston, Texas), Dayton Physicians Network (Dayton, Ohio), Mayo Clinic (Rochester, Minnesota), Revive Research Institute (Sterling Heights, Michigan), and Valley Cancer Associates (Harlingen, Texas).
Patients were enrolled from 2019 to 2023 in a retrospective, observational study. No patient-level study data were reported to patients or physicians and the patients and public were not involved in the study design. Patients were enrolled following the inclusion and exclusion criteria outlined below. Eligible patients had recurrent or metastatic histologically or cytologically confirmed HNSCC and were treated with anti-PD-1 either as a single agent (monotherapy) or in combination with chemotherapy (chemo-immunotherapy) for recurrent or metastatic disease. Acceptable 10th revision of the International Classification of Diseases codes are listed in online supplemental table S1. Tissue specimens analyzed in the study were collected from pretreatment tumor samples originally processed as formalin-fixed and paraffin-embedded (FFPE) specimens using standard histological protocols. De-identified, pretreatment FFPE tumor biopsy specimens were provided to Cofactor Genomics for OncoPrism-HNSCC and PD-L1 immunohistochemistry analysis. Following treatment, each patient’s tumor response to immunotherapy was evaluated using RECIST (Response Evaluation Criteria in Solid Tumors), PERCIST (PET Response Criteria in Solid Tumors), or other clinical criteria as appropriate in standard of care to determine disease control. This outcome label was extracted from the documented medical record for the purposes of this study. Patients with insufficient tissue for analysis (<10% tumor cells as determined by a study pathologist (EJD)) and samples with >22.4 months between biopsy and treatment were excluded from the study.17 Primary or metastatic tumor specimens were accepted, but metastatic tumors from liver or bone were not included due to confounding tissue RNA expression and the difficulty of recovering and processing decalcified FFPE RNA. Length of follow-up ranged from 34 days to 64 months. 
The study protocol, ‘A Multicenter Cancer Biospecimen Collection Study’ is registered as ‘NCT04510129—PREDicting immunotherapy efficacy from Analysis of Pre-treatment Tumor biopsies (PREDAPT)’ on ClinicalTrials.gov. Independent data monitoring was conducted by the study clinical research organization Curebase (San Francisco, California).
RNA extraction
RNA was extracted using RNAstorm (Biotium, Fremont, California) according to the manufacturer’s instructions. RNA quantity was assessed by the high-sensitivity RNA Qubit assay (Thermo Fisher Scientific, Waltham, Massachusetts). A predefined yield of 40 ng FFPE RNA was used as the minimum QC threshold. Quality of the RNA was assessed using a bioanalyzer (Agilent Technologies, Santa Clara, California), and a DV200 of >24% was required for all samples.
Library preparation and sequencing
Libraries were prepared using the QuantSeq 3’ mRNA-Seq Library Prep Kit FWD for Illumina (Lexogen, Greenland, New Hampshire), following the manufacturer’s instructions. Library RNA input was 40 ng for all samples. UMI Second Strand Synthesis Module for QuantSeq FWD (Lexogen) replaced Second Strand Synthesis Mix 1 in the workflow. All samples were processed with two OncoPrism-HNSCC-positive controls and a No Template Control. The positive (high or medium scoring) controls were RNA extracted from RM-HNSCC samples as described above. Final libraries were sequenced to a minimum depth of 10 million single-end 75 base pair reads on a NextSeq500 (Illumina, San Diego, California), following the manufacturer’s protocols.
Immunohistochemistry
PD-L1 staining was performed by Mosaic Labs (Lake Forest, California) using the 22C3 pharmDx antibody (Agilent Technologies) or by NeoGenomics Laboratories (Fort Myers, Florida) using the PD-L1 22C3 FDA (KEYTRUDA) assay for HNSCC stain. CPS assessment was performed by WHW or by NeoGenomics. H&E staining was performed by NeoGenomics as part of the PD-L1 22C3 test or at Cofactor Genomics using xylene substitute Slide Brite (Newcomer Supply, Middleton, Wisconsin), as detailed by manufacturers.
Processing of RNA-sequencing data
FASTQ files were preprocessed with trim_galore/cutadapt V.0.4.1 to remove adapter sequences, reads with PHRED quality scores <20, and reads shorter than 20 base pairs. The trimmed reads were aligned to the human genome GRCh38 with STAR V.2.5.2a using the two-pass method as previously described.18 Read counts were generated using htseq-count V.0.9.1 and annotation from Gencode V.22.18 The data were normalized as counts per million and log2 transformed using unique reads aligning to protein coding regions. Samples were required to have a minimum of 30% exonic alignment and 800 000 unique deduplicated counts to be included in the study.
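As a rough illustration of the normalization and sequencing QC steps described above, the following Python sketch converts raw per-gene counts to log2 counts per million and checks the two inclusion thresholds. The function names and the pseudocount of 1 are assumptions made for this example (the text does not state how zero counts were handled), and this is not the pipeline code used in the study.

```python
import math

MIN_EXONIC_FRACTION = 0.30   # from the text: minimum 30% exonic alignment
MIN_UNIQUE_COUNTS = 800_000  # from the text: minimum unique deduplicated counts
PSEUDOCOUNT = 1.0            # assumption: avoids log2(0); not stated in the text


def passes_sequencing_qc(exonic_fraction, unique_counts):
    """Return True when a sample meets both sequencing QC thresholds."""
    return (exonic_fraction >= MIN_EXONIC_FRACTION
            and unique_counts >= MIN_UNIQUE_COUNTS)


def log2_cpm(gene_counts):
    """Convert raw per-gene counts (dict) to log2(CPM + pseudocount)."""
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {
        gene: math.log2(count / total * 1_000_000 + PSEUDOCOUNT)
        for gene, count in gene_counts.items()
    }
```

In this sketch, `gene_counts` would hold only unique reads aligning to protein coding regions, matching the description above.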
Tumor mutational burden
TMB was measured using the GatewaySeq targeted DNA assay (Washington University), which was run in a CLIA-accredited clinical laboratory. GatewaySeq calculates TMB using the Illumina (San Diego) Dragen TMB caller in tumor-only mode and non-synonymous TMB output. We used the clinically validated GatewaySeq definition of TMB high (20 or more mutations per megabase) to categorize patients as TMB high or TMB low; 50–250 ng DNA was used as input. DNA was extracted using DNAstorm (Biotium) according to the manufacturer’s instructions.
Model training
Data from 1205 total samples were used to select features, refine the protocol, train the model, and validate the model (figure 1). Data from 790 patient samples were used to identify 149 candidate features related to immune response with detectable expression across two publicly available datasets (online supplemental table S2).19 20 The features are enriched in genes related to T cell activation, JAK/STAT signaling, interleukin signaling, interferon-gamma signaling, and inflammation. An additional 415 patient samples were collected for the PREDAPT trial. Samples that were excluded or failed quality control (QC) requirements are detailed in online supplemental table S3. The remaining PREDAPT samples (n=211) were divided into a training cohort and two validation cohorts based on treatment and on when enrollment was completed. The training cohort consisted of 99 samples from patients receiving anti-PD-1 monotherapy at 11 PREDAPT healthcare systems.17 A supervised machine learning logistic regression model was built using this training dataset. Patients with complete response, partial response or stable disease were treated as the positive class. Samples from 34 patients ultimately assigned to validation cohort 1 were used as a preliminary evaluation of the training model performance.
Figure 1. Samples used to develop, train, and validate OncoPrism-HNSCC. Data from a total of 1205 samples were used to select features, refine the protocol, train the model, and validate the model. Data from 790 publicly available samples were used to select features. 415 patient samples were collected as part of the PREDicting immunotherapy efficacy from Analysis of Pre-treatment Tumor biopsies trial. 116 samples were ineligible and were excluded from this study. 86 samples failed quality control (QC) and were excluded. Two patients were withdrawn from the study. Of the remaining patient samples, 161 were treated with monotherapy antiprogrammed cell death protein 1 (anti-PD-1) and 50 were treated with chemo-immunotherapy anti-PD-1. Of the monotherapy samples, 99 were used to train the model, and the remaining 62 monotherapy samples served as validation cohort 1. The 50 chemo-immunotherapy samples served as validation cohort 2. HNSCC, head and neck squamous cell carcinomas.
OncoPrism Scores and prediction
The OncoPrism-HNSCC biomarker generates an OncoPrism Score from 0 to 100 that correlates with predicted disease control in patients with RM-HNSCC treated with anti-PD-1 monotherapy. Higher OncoPrism Scores represent higher confidence by the model that the patient will have disease control. The thresholds for the OncoPrism groups were defined from the training data. Given n unique patient samples, patients were drawn n times with replacement and used to train a model. Using this trained model, an ‘out-of-bag’ score was generated for each of the remaining patients, that is, those not drawn in that round.17 21 This process was repeated 1000 times, and the out-of-bag scores of each patient were averaged to generate a mean training OncoPrism Score. The threshold between the low group (OncoPrism Scores 0–37) and the medium group (OncoPrism Scores 38–51) is defined as the value of the 25th percentile mean score. The threshold between the medium group and the high group (OncoPrism Scores 52–100) is defined as the value of the 50th percentile mean score. These training cohort mean score thresholds are used for all subsequent validation and analysis to define the OncoPrism groups.
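The out-of-bag procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the OncoPrism-HNSCC implementation: the one-feature logistic regression fitted by gradient descent is a stand-in for the actual multi-feature model, small feature values are assumed, and all function names and the linear-interpolation percentile rule are choices made for this example.

```python
import math
import random
from statistics import mean


def fit_logistic(xs, ys, steps=200, lr=0.1):
    """Stand-in model: one-feature logistic regression via gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_score(model, x):
    """Scale the predicted probability of disease control to 0-100."""
    w, b = model
    return 100.0 / (1.0 + math.exp(-(w * x + b)))


def mean_oob_scores(xs, ys, n_rounds=1000, seed=0):
    """Average each patient's score over rounds in which they were not drawn."""
    rng = random.Random(seed)
    n = len(xs)
    oob = [[] for _ in range(n)]
    for _ in range(n_rounds):
        drawn = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        in_bag = set(drawn)
        model = fit_logistic([xs[i] for i in drawn], [ys[i] for i in drawn])
        for i in range(n):
            if i not in in_bag:                       # out-of-bag patient
                oob[i].append(predict_score(model, xs[i]))
    return [mean(s) if s else None for s in oob]


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def group_thresholds(mean_scores):
    """Return (low/medium, medium/high) cut points: 25th and 50th percentiles."""
    scores = [s for s in mean_scores if s is not None]
    return percentile(scores, 25), percentile(scores, 50)
```

Because each patient is scored only by models that never saw that patient, the averaged scores approximate performance on unseen data, which is why the thresholds are taken from them rather than from in-sample predictions.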
Validation of performance
Clinical validation of the OncoPrism-HNSCC assay was performed using a separate cohort of 112 unique patient samples divided into two independent cohorts (cohort 1 and cohort 2). Samples were processed in the Cofactor Genomics CAP-accredited, CLIA-certified laboratory using strict QCs. The primary validation metric was disease control rate (DCR) in each OncoPrism group. DCR was calculated by dividing the sum of patients with RECIST 1.1-defined categories of stable disease, partial response, and complete response as initial response by the total number of patients in each group. RECIST label was determined 2–4 months after initiation of ICI treatment when possible, but six patients were evaluated at 5 months or later for reasons related to treatment regimen or availability for follow-up imaging. DCR was used because of similar progression-free survival (PFS) and clinical benefit previously observed among patients with best response of stable disease and partial response.17 To measure the test’s ability to enrich for disease control in response to anti-PD-1 monotherapy, 62 FFPE tumor samples from 62 monotherapy-treated patients from 15 clinical sites were processed through the OncoPrism-HNSCC workflow (cohort 1). As an additional independent validation, 50 FFPE tumor samples from 50 chemo-immunotherapy-treated patients at 11 clinical sites were processed through the OncoPrism-HNSCC workflow (cohort 2). Patient specimens came from 17 unique clinical sites in total. OncoPrism Scores were generated for each sample. Patients were assigned to the low, medium, or high OncoPrism groups based on these scores. Operators were blinded to the RECIST label when processing samples and generating OncoPrism Scores. The RECIST labels for each patient were determined independently from the OncoPrism group and were used to determine the DCR for each group in the validation set.
For the purposes of test validation and treatment recommendations, low group patients are classified as predicted progressors, medium group patients are considered an indeterminate result, and high group patients are classified as predicted to have disease control in response to ICI. The medium group is considered an indeterminate result due to the variation seen in medium group DCR and PFS across datasets (data not shown and figure 2B,C and E,F). When comparing performance with PD-L1 CPS, the high group patients are considered the predicted positive class (predicted disease control), while the low group and medium group patients are treated as the predicted negative class (no predicted disease control). Including the entire set of patients in these calculations allows for a direct comparison with PD-L1 CPS, even though the intended use of OncoPrism-HNSCC is to consider a medium group result as indeterminate.
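The group assignment and per-group DCR calculation described in the Methods can be sketched as follows. The score ranges (0–37, 38–51, 52–100) are taken from the text; the assumption that scores are reported as integers, the response codes (`CR`, `PR`, `SD`, `PD`), and the function names are illustrative choices rather than details of the actual assay software.

```python
from collections import Counter

# Responses counted as disease control, per the DCR definition in the text.
DISEASE_CONTROL = {"CR", "PR", "SD"}


def assign_group(score):
    """Map an integer OncoPrism Score (0-100) to low, medium, or high."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 37:
        return "low"
    if score <= 51:
        return "medium"
    return "high"


def dcr_by_group(records):
    """records: iterable of (score, response). Returns {group: DCR}."""
    totals, controlled = Counter(), Counter()
    for score, response in records:
        group = assign_group(score)
        totals[group] += 1
        if response in DISEASE_CONTROL:
            controlled[group] += 1
    return {group: controlled[group] / totals[group] for group in totals}
```

For example, `dcr_by_group([(10, "PD"), (20, "SD"), (60, "PR"), (90, "CR")])` uses four made-up records (not study data) and yields a DCR of 0.5 for the low group and 1.0 for the high group.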
Figure 2. OncoPrism-HNSCC score and group are correlated with disease control in independent monotherapy and chemo-immunotherapy validation cohorts. Samples are ordered by their OncoPrism Score for the monotherapy-treated (A) and chemo-immunotherapy (D) validation cohorts. Patients with lower scores are more likely to be progressors (gray), while patients with higher scores are more likely to have disease control (orange). Based on their OncoPrism Score and predetermined thresholds (dotted lines), each patient sample is assigned to an OncoPrism group (low, medium, or high). OncoPrism groups are significantly correlated with disease control rate (DCR) in the monotherapy (p=0.004) (B) and chemo-immunotherapy (p=0.004) (E) validation cohorts. P values for the significance of the trend were calculated using the Cochran-Armitage test. OncoPrism groups are significantly correlated with progression-free survival (PFS) in the monotherapy (p=0.015) (C) and chemo-immunotherapy (p=0.037) (F) validation cohorts. P values for PFS were calculated using log rank methods. HNSCC, head and neck squamous cell carcinomas.
Statistics
The primary end point of this study was DCR. A two-sided Cochran-Armitage test for trends was used to test the significance of the trend of increasing proportions for the DCRs of OncoPrism groups. Power analysis was performed using the training cohort out-of-bag area under the receiver operating characteristic (ROC) curve, seeking a two-sided type 1 error (alpha) of 0.05 and a power (1 − beta) of 0.80. A minimum sample size of 36 was calculated to power the primary end point (Cochran-Armitage test for trend in proportions for DCR). Expecting that training cohort out-of-bag performance may overestimate independent cohort performance, we sought a minimum of 50 samples for each cohort. PFS was defined as the time from start of ICI treatment to progression or death. Patients were censored if they had not progressed at last follow-up. One OncoPrism high group patient treated with chemo-immunotherapy was excluded because of an unknown date of progression. PFS figures and analysis were done using the ‘survminer’ and ‘survival’ packages, and significance was determined using log rank methods.22 23 Differences in sensitivity and specificity were tested using McNemar’s test; 95% CIs for model performance metrics were calculated using a non-parametric bootstrap resampling method. In all cases, a p value of <0.05 was considered significant.
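For readers unfamiliar with the Cochran-Armitage test, the following Python sketch shows one common form of the two-sided trend statistic for ordered groups such as low, medium, and high. It assumes equally spaced group scores (0, 1, 2) and a normal approximation without a finite-sample correction; the text does not specify which variant or software was used, so this is an illustration rather than the study's analysis code.

```python
import math


def cochran_armitage(events, totals, scores=None):
    """Two-sided Cochran-Armitage trend test.

    events: number of patients with disease control in each ordered group
    totals: number of patients in each ordered group
    scores: group scores (assumption: 0, 1, 2, ... when omitted)
    Returns (z statistic, two-sided p value).
    """
    if scores is None:
        scores = list(range(len(totals)))
    n = sum(totals)
    p_bar = sum(events) / n  # pooled proportion under the null hypothesis
    t_stat = sum(s * (e - m * p_bar) for s, e, m in zip(scores, events, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value
```

A positive `z` indicates that the proportion with disease control increases across the ordered groups, which is the direction expected from low to high.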
Results
Patients meeting all inclusion criteria (n=211) were divided into a training cohort (n=99) and two validation cohorts (cohort 1, n=62 and cohort 2, n=50; figure 1; online supplemental table S4). The training and validation cohorts have similar patient and disease characteristics, except that the training cohort and validation cohort 1 were treated with monotherapy ICI, while validation cohort 2 was treated with ICI in combination with chemotherapy (chemo-immunotherapy; table 1).
Validation samples were processed with OncoPrism-HNSCC, generating an OncoPrism Score and resulting OncoPrism group for each patient sample. Analytical variation was low, with highly repeatable results for replicate samples (manuscript in preparation). The primary end point of this study was disease control, so performance was evaluated using the DCR for each OncoPrism group, with an expected trend from lower DCR in the low group to higher DCR in the high group.
Validation cohort 1: monotherapy patients
Specimens from 62 patients treated with anti-PD-1 monotherapy (cohort 1) were scored and categorized into the low, medium, or high group based on their score and the pre-established thresholds (figure 2A). The groups roughly mirrored the expected population distribution, with 27% of patients in the low group, 32% of patients in the medium group, and 40% of patients in the high group (table 2).
Table 2. Clinical validation cohorts DCR by OncoPrism group
In cohort 1, the DCR increases from OncoPrism low to medium to high groups (table 2; figure 2B; Cochran-Armitage trend p=0.004). In addition to higher DCR, patients in the high group also had significantly longer PFS (log rank test, p=0.015). Median PFS was 2.6 months for the low group, 4.2 months for the medium group, and 9.8 months for the high group (figure 2C). Table 3 shows key performance metrics when comparing the low group with the high group, the two actionable OncoPrism groups. OncoPrism medium patients were excluded from these calculations as an indeterminate result (see ‘Methods’ section). OncoPrism-HNSCC predicted disease control with high accuracy (0.71), high sensitivity (0.88), specificity (0.60), positive predictive value (PPV; 0.60), and negative predictive value (NPV; 0.88).
Table 3. Performance metrics for validation cohorts
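The performance metrics reported here follow the standard confusion-matrix definitions, with the OncoPrism high group as the predicted positive class and the low group as the predicted negative class. A minimal Python sketch of those definitions is given below; the function name is hypothetical and the counts in the usage note are invented for illustration, not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts.

    tp: predicted disease control, observed disease control
    fp: predicted disease control, observed progression
    tn: predicted progression, observed progression
    fn: predicted progression, observed disease control
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

With made-up counts of `tp=8, fp=2, tn=6, fn=4`, the function gives an accuracy of 0.70, a specificity of 0.75, a PPV of 0.80, and an NPV of 0.60.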
Validation cohort 2: chemo-immunotherapy-treated patients
To test the ability of OncoPrism-HNSCC to predict disease control in chemo-immunotherapy-treated patients, performance of OncoPrism-HNSCC was evaluated in specimens from 50 patients treated with anti-PD-1 in combination with chemotherapy (cohort 2). The groups roughly mirrored the expected population distribution, with 22% of patients in the low group, 32% of patients in the medium group, and 46% of patients in the high group (table 2; figure 2D). As expected, the overall DCR for this cohort was higher than the monotherapy cohort, likely due to the additional effect of chemotherapy on outcome (61% vs 45%; table 2). As with cohort 1, the DCR increases from OncoPrism low to medium to high groups (table 2; figure 2E; Cochran-Armitage trend, p=0.004). This trend corresponded with significantly longer PFS for patients in the high group (log rank test, p=0.037). Median PFS was 3.0 months for the low group, 3.4 months for the medium group, and 16.3 months for the high group in this cohort (figure 2F). One patient from the high group was excluded due to an unknown date of progression (n=49). OncoPrism-HNSCC predicted disease control with high accuracy (0.76), high sensitivity (0.83), specificity (0.64), PPV (0.83), and NPV (0.64) when treating the OncoPrism high group as the predicted positive class and the OncoPrism low group as the predicted negative class (with the OncoPrism medium group excluded as an indeterminate result; table 3).
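Median PFS values such as those above are typically read from a Kaplan-Meier curve that accounts for censored patients. The Python sketch below illustrates the idea under a simple convention, returning the first event time at which estimated survival falls to 0.5 or below. The study used the R ‘survival’ and ‘survminer’ packages, whose handling of a curve that sits exactly at 0.5 may differ, so this is an approximation for illustration with a hypothetical function name.

```python
def km_median(times, events):
    """Kaplan-Meier median time to event.

    times:  follow-up time for each patient
    events: 1 if progression or death was observed, 0 if censored
    Returns the first time at which estimated survival is <= 0.5,
    or None when the median is not reached.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]  # all patients with this time
        n_events = sum(1 for e in tied if e)
        if n_events:
            survival *= 1 - n_events / at_risk   # product-limit update
            if survival <= 0.5:
                return t
        at_risk -= len(tied)                     # events and censored leave the risk set
        i += len(tied)
    return None
```

Censored patients reduce the number at risk without lowering the survival estimate, which is why a group with many censored patients can have a median that is late or not reached.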
OncoPrism-HNSCC is not predictive in non-ICI datasets
Our data show that the OncoPrism group assignment is correlated with DCR in patients treated with ICIs. To explore whether OncoPrism group is predictive of disease control in response to ICI or simply prognostic of outcome regardless of therapy, we used the underlying OncoPrism-HNSCC model on four publicly available datasets of patients with HNSCC who were not treated with ICI.19 20 24 25 The OncoPrism-HNSCC biomarker was not significantly correlated with OS in any of the non-ICI datasets (online supplemental table S5). This suggests that it is not an overall prognostic biomarker per se, and it is consistent with the idea that OncoPrism-HNSCC is predictive of ICI disease control specifically.
OncoPrism-HNSCC outperforms the existing biomarkers PD-L1 CPS and TMB
Currently, the biomarkers most frequently used to predict response to ICI in patients with RM-HNSCC are PD-L1 CPS and, less commonly, TMB. Using our two validation cohorts, we compared the performance of OncoPrism-HNSCC with PD-L1 CPS and TMB. First, we compared PD-L1 CPS with OncoPrism-HNSCC at all possible thresholds for each biomarker using ROC curves. For monotherapy-treated cohort 1, the area under the curve (AUC) for OncoPrism-HNSCC was 0.73, compared with 0.62 for PD-L1 CPS (figure 3A). Likewise, for the chemo-immunotherapy-treated cohort 2, the OncoPrism-HNSCC AUC was 0.76 compared with 0.61 for PD-L1 CPS (figure 3B).
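The AUC summarizes performance across all possible thresholds and equals the probability that a randomly chosen patient with disease control has a higher biomarker value than a randomly chosen progressor, with ties counted as one half. The Python sketch below computes it directly from that definition; the function name is hypothetical, and the same function would apply to either OncoPrism Scores or CPS values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: biomarker values (higher = more likely disease control)
    labels: 1 for disease control, 0 for progression
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which frames the reported values for the two biomarkers.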
Figure 3. OncoPrism-HNSCC outperforms existing assays programmed death-ligand 1 (PD-L1) combined positive score (CPS) and tumor mutational burden (TMB). Receiver operating characteristic (ROC) curves are shown for the monotherapy (A) and chemo-immunotherapy (B) cohorts. OncoPrism-HNSCC (orange) has a higher area under the curve (AUC) than PD-L1 CPS (gray) in both cohorts. In monotherapy (C) and chemo-immunotherapy (D) cohorts, OncoPrism-HNSCC (orange) has high sensitivity and specificity, while PD-L1 CPS (gray) has high sensitivity but low specificity. The distribution of CPS is similar in each validation cohort (E). (F) Sensitivity and specificity for OncoPrism-HNSCC (orange), PD-L1 CPS (gray), and TMB (blue) in 32 patients from the OncoPrism high and low groups. Error bars represent 95% CIs. HNSCC, head and neck squamous cell carcinomas.
Examining ROC curves is useful since each test has its own thresholds for dividing groups. However, it is also important to compare performance using the commonly used thresholds for each test. For OncoPrism-HNSCC, these thresholds are the divisions between the low, medium, and high OncoPrism groups. For PD-L1 CPS, we categorized patients as PD-L1 CPS <1 or PD-L1 CPS ≥1, the threshold recommended by the American Society of Clinical Oncology guidelines.10 In cohort 1, the DCR for PD-L1 CPS <1 patients was 25% compared with 48% for PD-L1 CPS ≥1 patients. In cohort 2, PD-L1 CPS <1 patients had a DCR of 50% compared with 59% for PD-L1 CPS ≥1. PD-L1 status was not correlated with PFS in either cohort (online supplemental figure S1).
To compare sensitivity and specificity between OncoPrism-HNSCC and PD-L1 in the same population, the OncoPrism high group was designated as predicted disease control (predicted positive class), while the OncoPrism medium and low groups were designated as predicted disease progression (predicted negative class). This strategy differs from the metrics shown in table 3, where the medium group was excluded as indeterminate in order to match the intended use of the test, but it allows calculation of metrics in the same patient populations for direct comparison. In the monotherapy cohort (figure 3C), OncoPrism-HNSCC had a sensitivity of 0.54 and a specificity of 0.71. The lower sensitivity compared with table 3 is due to the inclusion of the medium group. Using CPS ≥1 to define predicted disease control, PD-L1 CPS had a sensitivity of 0.93 in this cohort. However, the OncoPrism-HNSCC specificity of 0.71 is significantly higher than the CPS specificity of 0.18 (McNemar’s test, p<0.001). Likewise, in the chemo-immunotherapy cohort (figure 3D), OncoPrism-HNSCC had a sensitivity of 0.63, compared with 0.90 for PD-L1 CPS (McNemar’s test, p<0.05). Again, the OncoPrism-HNSCC specificity of 0.80 is significantly higher than the PD-L1 CPS specificity of 0.15 (McNemar’s test, p<0.001; figure 3C,D and online supplemental table S6). Reflecting the relative sensitivities and specificities of each test, OncoPrism-HNSCC had more false negatives while PD-L1 CPS had more false positives, although 71% of the OncoPrism-HNSCC false negatives were in the medium group and would typically be treated as an indeterminate result (online supplemental table S7). Interestingly, the proportions of patients in each CPS category were very similar between cohort 1 and cohort 2, suggesting that the CPS result was not influencing the treatment of the patients or skewing the datasets (figure 3E).
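Because both biomarkers are measured on the same patients, McNemar’s test compares them using only the discordant pairs, that is, patients classified correctly by one test and incorrectly by the other. The Python sketch below uses the exact binomial form of the test; the text does not state whether the exact or chi-square version was used, so that choice and the function names are assumptions made for illustration.

```python
from math import comb


def discordant_counts(pred_a, pred_b, truth, target):
    """Count discordant pairs among patients whose true label equals target.

    target=1 restricts to patients with disease control (sensitivity);
    target=0 restricts to progressors (specificity).
    Returns (a correct and b wrong, b correct and a wrong).
    """
    a_only = b_only = 0
    for a, b, t in zip(pred_a, pred_b, truth):
        if t != target:
            continue
        a_ok, b_ok = a == t, b == t
        if a_ok and not b_ok:
            a_only += 1
        elif b_ok and not a_ok:
            b_only += 1
    return a_only, b_only


def mcnemar_exact(a_only, b_only):
    """Two-sided exact McNemar p value from the two discordant counts."""
    n = a_only + b_only
    if n == 0:
        return 1.0
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Patients on whom the two tests agree do not enter the statistic, so a large difference in specificity reaches significance only when one test is correct on many progressors that the other misclassifies.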
Because PD-L1 CPS has low specificity, clinicians may have limited confidence that a PD-L1 CPS ≥1 patient will indeed benefit from ICI, driving more aggressive treatment decisions. To test whether OncoPrism-HNSCC can predict PD-L1 CPS ≥1 patients who will benefit from ICI, we evaluated PFS for each OncoPrism group in PD-L1 CPS ≥1 patients only (combined monotherapy and chemo-immunotherapy cohorts). OncoPrism high patients had significantly longer PFS than OncoPrism medium or low patients (log rank test, p<0.001; figure 4A). Likewise, CPS ≥20 is a common threshold for considering ICI monotherapy. In patients with CPS ≥20, OncoPrism high patients had significantly longer PFS (log rank test, p<0.001; figure 4B).
Figure 4. (A) OncoPrism high patients have significantly longer progression-free survival (PFS) in combined positive score (CPS) ≥1 patients (p<0.001, log rank methods). (B) OncoPrism high patients have significantly longer PFS in CPS ≥20 patients (p<0.001, log rank methods). Cohort 1 and cohort 2 were combined for this analysis of programmed death-ligand 1 (PD-L1) CPS subgroups. (C) Immune checkpoint inhibitor (ICI) decision tree based on test results. Patients with recurrent or metastatic head and neck squamous cell carcinomas (HNSCC) tested with OncoPrism-HNSCC are categorized into the OncoPrism low, medium, or high group. Because OncoPrism-HNSCC has high specificity relative to PD-L1 CPS and OncoPrism high patients have longer PFS regardless of PD-L1 status, OncoPrism high patients should typically be treated with ICI regardless of PD-L1 status. OncoPrism low patients have low ICI disease control rate (DCR) and are not good candidates for ICI even if they are PD-L1 CPS ≥1. Patients in the OncoPrism medium group do not have a definitive treatment path; all test results and treatment options should be considered. Typically, ICI should be favored for OncoPrism medium patients who are PD-L1 CPS ≥1 while non-ICI or clinical trial options should be considered for PD-L1 CPS <1 patients. Tumor mutational burden (TMB) testing is not recommended for most patients with HNSCC. However, if TMB testing is performed, ICIs should be prioritized for TMB high patients given the high observed specificity of TMB. Only 9% of patients in our study were TMB high. Due to the low sensitivity of TMB, a TMB low result should not be strongly considered in treatment decisions. PD-1, programmed cell death protein 1.
TMB is less commonly used to guide treatment decisions in RM-HNSCC, but is recommended in some tumor types and when PD-L1 CPS is not available.10 To compare the performance of OncoPrism-HNSCC with TMB, we evaluated TMB status for samples from the monotherapy-treated cohort (cohort 1). Specifically, all OncoPrism high or low samples with sufficient material were evaluated (32 samples in total). We evaluated samples in the high and low groups as these two categories are the most likely OncoPrism-HNSCC test results to influence a clinical decision. TMB of at least 20 mutations/Mb was classified as TMB high, while <20 mutations/Mb was considered TMB low (see ‘Methods’ section). Overall, OncoPrism-HNSCC had a sensitivity of 0.85 and a specificity of 0.53 in this group, compared with a sensitivity of 0.23 and a specificity of 1 for TMB (figure 3F). The sensitivity and specificity for CPS in this group is also shown for reference. While TMB had significantly higher specificity than OncoPrism-HNSCC (McNemar’s test, p=0.008), it only identified three patients with disease control (online supplemental table S8). OncoPrism-HNSCC had significantly higher sensitivity than TMB (McNemar’s test, p=0.027).
Discussion
OncoPrism-HNSCC significantly predicts disease control and PFS in response to anti-PD-1 (ICI) therapy in patients with pretreatment RM-HNSCC. Importantly, the test was validated in two separate cohorts using patient samples from 17 clinical academic and community sites from across the USA, which allowed us to account for test performance across a variety of possible pre-analytic sample processing conditions. The multidimensional biomarker underlying OncoPrism-HNSCC was built using the careful evaluation of a previously published study of cell composition, cell state and immune modulatory genes in the tumor microenvironment.17 Both validation cohorts (cohort 1: monotherapy and cohort 2: chemo-immunotherapy) had similar results, with a significant correlation of OncoPrism group classification with DCR and PFS, as well as high accuracy, sensitivity, specificity, PPV and NPV. The OncoPrism-HNSCC model was not predictive in patients treated with non-ICI therapies, suggesting that the biomarker is predictive rather than prognostic (online supplemental table S5). These results also suggest that the predictive nature of the biomarker may be specific to the ICI component in chemo-immunotherapy-treated cohort 2.
There was no distinct difference observed in PD-L1 CPS status of those patients prescribed monotherapy (cohort 1) vs chemo-immunotherapy (cohort 2) (figure 3E), suggesting that PD-L1 score was not driving treatment decision between monotherapy and chemo-immunotherapy in this study. Limitations of PD-L1 for guiding treatment have been previously published.12 13
The intended use of OncoPrism-HNSCC is to aid clinicians in choosing whether to treat with anti-PD-1 as a single agent, anti-PD-1 in combination with chemotherapy, or alternative treatment options. Currently, PD-L1 CPS is the most common biomarker used to guide such decisions. Unfortunately, PD-L1 CPS has high sensitivity but low specificity for predicting disease control (figure 3C,D and online supplemental table S6). This low specificity means that many patients with high CPS do not clinically benefit from ICI, and clinicians are reluctant to use the CPS to exclude patients from more aggressive treatment options like chemo-immunotherapy. OncoPrism-HNSCC has significantly higher specificity than PD-L1. In addition, OncoPrism groups stratify patients by PFS among all patients (figure 2C and E), in patients with PD-L1 CPS ≥1 (figure 4A), and in patients with PD-L1 CPS ≥20 (figure 4B). Together, these results suggest that ICI therapy should be prioritized for OncoPrism high patients, with a preference for monotherapy given the reduced toxicities (figure 4C).4 9 26 Because OncoPrism-HNSCC predicts PFS in both the CPS ≥1 and CPS ≥20 populations, it can help identify patients who should be considered for monotherapy ICI and CPS ≥20 patients who should nevertheless be considered for chemo-immunotherapy. Finally, given the low DCR and PFS in the OncoPrism low group, clinicians should consider a non-ICI treatment and/or available clinical trials for OncoPrism-HNSCC low patients regardless of PD-L1 status. While PD-L1 CPS had significantly higher sensitivity than OncoPrism-HNSCC when categorizing OncoPrism low and medium patients as predicted progressors (figure 3C,D), OncoPrism-HNSCC had similar sensitivity to PD-L1 when comparing OncoPrism low with OncoPrism high (see ‘Methods’; table 3 and online supplemental table S6).
An important limitation of this study is that it does not investigate interaction effects between OncoPrism group and CPS or evaluate performance in the CPS 1–19 subgroup. A future prospective, randomized study comparing OncoPrism-HNSCC directed treatment to standard of care, with consideration of interaction effects, would increase confidence in these treatment recommendations. In contrast to PD-L1, TMB had high specificity but low sensitivity (figure 3F). As a result, ICI treatment should be prioritized for TMB high patients, but OncoPrism-HNSCC results appear to have superior prediction over TMB low results. In all treatment decisions, it is important to consider both the potential treatment benefit and the potential toxicities of ICI and chemotherapy, as single agents and in combination. Online supplemental figure S2 provides a summary of treatment recommendations based on test results.
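The recommendations summarized in figure 4C can be read as a small set of rules. The Python sketch below encodes them purely to make the logic explicit. It is not clinical software: the function name, argument names, and returned wording are hypothetical, and the text does not say how a TMB high result should be weighed against an OncoPrism low result, so here a TMB high result is only appended as a supporting note (an assumption).

```python
from typing import Optional

NON_ICI = "consider non-ICI treatment or a clinical trial"


def suggest_path(oncoprism_group: str, cps: float,
                 tmb_high: Optional[bool] = None) -> str:
    """Illustrative reading of the figure 4C decision tree (hypothetical helper).

    oncoprism_group: "low", "medium", or "high"
    cps: PD-L1 combined positive score
    tmb_high: True/False if TMB was tested, None if not tested
    """
    group = oncoprism_group.lower()
    if group not in {"low", "medium", "high"}:
        raise ValueError(f"unknown OncoPrism group: {oncoprism_group!r}")

    if group == "high":
        # High group: ICI regardless of PD-L1 status, monotherapy preferred.
        suggestion = "prioritize ICI (monotherapy preferred)"
    elif group == "low":
        # Low group: not a good ICI candidate even when CPS >= 1.
        suggestion = NON_ICI
    else:
        # Medium group is indeterminate; PD-L1 CPS guides the choice.
        suggestion = "favor ICI" if cps >= 1 else NON_ICI

    # ASSUMPTION: TMB high is reported as a supporting note, not an override.
    # A TMB low result (False) or no TMB test (None) is deliberately ignored.
    if tmb_high:
        suggestion += "; TMB high supports prioritizing ICI"
    return suggestion


print(suggest_path("medium", cps=5))
print(suggest_path("low", cps=25))
```

A TMB low result changes nothing in this sketch, which mirrors the statement that a TMB low result should not be strongly considered given the low sensitivity of TMB.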
Better treatment decisions improve patient outcomes, and they limit unnecessary treatment-associated toxicities and reduce costs. For example, while patients treated with chemo-immunotherapy have higher DCR than monotherapy-treated patients, chemo-immunotherapy is also associated with higher toxicities.4 9 26 The health economics of non-ICI treatments (eg, EXTREME (platinum plus fluorouracil and cetuximab)), chemo-immunotherapy, and monotherapy ICI have previously been studied.27 Based on these published costs, OncoPrism-HNSCC has the potential to decrease costs, primarily by increasing the fraction of patients treated with monotherapy ICI while reducing the number of patients receiving chemo-immunotherapy (online supplemental figure S3). Future studies will estimate the impact with real-world use.
OncoPrism-HNSCC fills the unmet clinical need of predicting which patients with RM-HNSCC will benefit from ICI. Outside of PD-L1 CPS, there are no widely used biomarkers for predicting benefit from ICI in RM-HNSCC. TMB is used rarely and has only modest correlation with clinical benefit.15 28 Attempts at pan-cancer biomarkers typically are trained and validated with relatively few HNSCC samples, making their performance in HNSCC difficult to evaluate.29 30 Gene expression-based biomarkers in HNSCC have been previously described. In some cases, these biomarkers are prognostic, but not trained to predict response to ICI.31 32 Others are trained to predict response to ICI, but have not been robustly validated and made available for clinical use.16 33 OncoPrism-HNSCC is an RNA-sequencing-based biomarker trained and validated using patients with RM-HNSCC to aid treatment decisions.
This real-world observational validation study has several limitations. These include patients with a long interval between test biopsy and treatment, the inclusion of patients who had intervening treatments between the biopsy and the ICI, and instances of incomplete data within the patient’s clinical record. The inclusion criteria in this study balanced the recruitment of real-world, well-controlled patient cohorts that reflect the intended treatment scenario while maximizing the study size. Ongoing and future studies aim to refine the model and study additional end points such as OS with additional patient cohorts. Importantly, OncoPrism-HNSCC predicts PFS, which is correlated with OS in response to ICI in RM-HNSCC.34 35 We also plan to further test clinical utility using a prospective cohort of patients. In addition, we aim to perform a meta-analysis of current and future cohorts to evaluate OS and potential interactions among test results, clinical features, and patient outcomes, as well as subgroup analysis, particularly in CPS 1–19 patients. Future work will also refine the thresholds for each OncoPrism group to maximize the number of patients with actionable results without sacrificing the sensitivity or specificity of the test. The medium group is considered an indeterminate result due to the variation seen in medium group DCR and PFS across the training cohort and validation cohorts (data not shown and figure 2B,C and E,F). As a result, the medium group result currently does not provide clear clinical guidance to physicians and patients. In the validation cohorts, 32% of patients fell in the medium group, meaning 68% of patients would have received treatment-guiding information. Because OncoPrism-HNSCC balances high sensitivity and high specificity for predicting disease control, it is clinically useful and has the potential to aid the treatment decisions of more patients than existing tests.
Although there is no ICI predictive biomarker with perfect sensitivity and specificity, OncoPrism-HNSCC addresses significant shortcomings of PD-L1 and TMB in the RM-HNSCC population through a balance of sensitivity and specificity, enabling clinicians to identify patients most likely to benefit from immunotherapy. OncoPrism-HNSCC exhibits clinical validity across a diverse patient population and holds promise to guide treatment decisions and improve patient outcomes.
Data availability statement
Data are available on reasonable request. The data underlying this study, including anonymized patient-level OncoPrism-HNSCC, PD-L1, and TMB measurements, are available for non-commercial use from the corresponding author on reasonable request.
Ethics statements
Patient consent for publication
Not applicable.
Ethics approval
The study protocol was approved by institutional review boards at either the study (Advarra, Columbia, Maryland (ID: Pro00051202) or WCG IRB, Puyallup, Washington (ID: 20201975)) or site level, as appropriate. All patients provided signed, informed consent to participate, or consent was waived for deceased patients according to study protocol.