Somatic copy number alteration and fragmentation analysis in circulating tumor DNA for cancer screening and treatment monitoring in colorectal cancer patients

Tumor-specific global fragmentation pattern

To establish a comprehensive data set for LB analysis in all stages of CRC, we applied WGS with a median coverage of 6x (SD = 2.37) in 259 plasma samples from CRC patients (n = 50) and healthy controls (n = 61) (Additional file 2: Table S2). We first evaluated whether the global fragmentation pattern of cfDNA might be a suitable marker for untargeted ctDNA detection. Fragmentation patterns are a result of various chromatin states that are associated with altered expression of tumor-associated genes [19, 21, 28, 29].

We compared the global fragmentation of cfDNA from CRC patients to that of cfDNA from healthy controls, which typically shows a peak at ~ 167 bp corresponding to DNA bound by one nucleosome plus linker DNA [20] (Fig. 1A). We observed a significant enrichment of short fragments (90–150 bp) in CRC patient samples with clinically diagnosed tumor burden (n = 134) compared to healthy controls (n = 55) (Mann–Whitney U test, p-value = 4.75 × 10⁻⁵) (Fig. 1B). When grouping CRC patient samples according to the course of disease, we observed a significant enrichment of short fragments during therapy (between surgery and adjuvant chemotherapy, or during chemotherapy before staging) (n = 27) (p-value = 1.48 × 10⁻⁴). A tendency (albeit not statistically significant) toward a higher proportion of short fragments could be identified in all other progression sample groups with clinically diagnosed tumor burden (Fig. 1C). When stratifying CRC patient samples collected at diagnosis according to disease stage, we further observed a significant enrichment of short fragments in patients with stage IV CRC (n = 16) (p-value = 7.25 × 10⁻⁵) (Fig. 1D).
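The comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names are hypothetical, the 90–410 bp denominator window is an assumption borrowed from the range shown in Fig. 1A, and the test uses the tie-corrected normal approximation without continuity correction, which may differ from the implementation used in the study.

```python
import math
from statistics import NormalDist


def short_fraction(lengths, short=(90, 150), window=(90, 410)):
    """Share of fragments in the `short` range among all fragments in `window` (bp, inclusive)."""
    in_window = [n for n in lengths if window[0] <= n <= window[1]]
    if not in_window:
        raise ValueError("no fragments inside the analysed size window")
    return sum(short[0] <= n <= short[1] for n in in_window) / len(in_window)


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (tie-corrected normal approximation).

    Returns (U statistic of x, p-value). No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    rank_of, tie_term, i = {}, 0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    n = n1 + n2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # every value identical, no evidence of a shift
        return u, 1.0
    z = (u - n1 * n2 / 2) / sigma
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


# Toy short-fragment fractions, invented purely for illustration
patients = [0.30, 0.35, 0.40]
controls = [0.10, 0.15, 0.20]
u_stat, p_value = mann_whitney_u(patients, controls)
```

In practice, `short_fraction` would be applied to the fragment lengths of each sample first, and the resulting per-sample fractions of the two cohorts would then be passed to the test.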

Fig. 1

Differences in global fragmentation between cfDNA from CRC patients and healthy controls. A Heat map showing enrichment or decrease in cfDNA fragments from 90 to 410 bp according to their length as z-scores of each sample compared to healthy controls. B Short cfDNA fragments (90–150 bp) are significantly enriched in samples collected from CRC patients with clinically diagnosed tumor burden. C Only for samples collected in the beginning of therapy a significantly enriched fraction of short fragments can be observed. D At diagnosis, a significant enrichment in short fragments was only observed in patients with stage IV CRC. (ns: p-value ≤ 1; *: p-value ≤ 5 × 10⁻², **: p-value ≤ 1 × 10⁻², ***: p-value ≤ 1 × 10⁻³, ****: p-value ≤ 1 × 10⁻⁴)

Interestingly, albeit not statistically significant, we detected different fragmentation profiles due to enrichment of short fragments < 167 bp when comparing samples from CRC patients in remission with no evidence of disease to healthy controls. When focusing on samples from CRC patients in remission more than six weeks post-treatment, we no longer observed an enrichment of short fragments < 167 bp (Fig. 1A). The enrichment observed within the first weeks post-surgery is likely associated with the intake of low-molecular-weight heparin, in accordance with previous findings [30, 31]. Taken together, our results indicate that cfDNA is more fragmented in CRC patients than in healthy controls and can therefore support untargeted detection of ctDNA.

Tumor-specific regional fragmentation profiles

To assess whether regional fragmentation across the genome could serve as another non-genetic marker for ctDNA detection in CRC patients, we calculated the ratio of short (100–150 bp) to long (151–220 bp) fragments (S/L ratio) in 100 kb bins for each chromosome in CRC patients and healthy controls, as recently described [19, 20]. Notably, data of chromosome arms harboring SCNAs were excluded to avoid bias due to regionally enriched ctDNA. Compared to healthy controls, we observed distinct differences in the S/L ratio of CRC patients at diagnosis, during therapy, and with stable or progressive disease. In contrast, in CRC patients with partial remission or in remission, we did not observe such differences (Fig. 2A). Focusing on CRC patient samples with clinically diagnosed tumor burden, we observed a significant enrichment in short fragments on chromosome arms 1p and 15q, and significant enrichment of long fragments on chromosome arms 4p, 5p, 11p, 11q, 19q, 21p and 21q (Fig. 2B). Overall, we were able to detect ctDNA in 75% (100/134) of samples collected from CRC patients with clinically diagnosed tumor burden based on significantly different regional fragmentation on at least one chromosome arm.
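As a rough illustration of the S/L ratio described above, the following Python sketch counts short (100–150 bp) and long (151–220 bp) fragments per 100 kb bin and z-scores each bin against a set of healthy controls. The input layout (a list of `(start_position, length)` pairs for one chromosome) and the function names are assumptions made for this example; the exclusion of chromosome arms with SCNAs and any coverage or GC correction are deliberately left out.

```python
from statistics import mean, stdev


def sl_ratios(fragments, bin_size=100_000):
    """Return {bin_index: short/long ratio} for one chromosome.

    `fragments` is an iterable of (start_position, length) pairs.
    Bins without any long fragment are skipped to avoid division by zero.
    """
    short, long_ = {}, {}
    for position, length in fragments:
        b = position // bin_size
        if 100 <= length <= 150:
            short[b] = short.get(b, 0) + 1
        elif 151 <= length <= 220:
            long_[b] = long_.get(b, 0) + 1
    return {b: short.get(b, 0) / n_long for b, n_long in long_.items()}


def z_scores(sample, controls):
    """z-score each bin of `sample` against the per-bin mean and SD of `controls`."""
    scored = {}
    for b, value in sample.items():
        reference = [c[b] for c in controls if b in c]
        if len(reference) < 2:
            continue  # an SD needs at least two control values
        sd = stdev(reference)
        if sd > 0:
            scored[b] = (value - mean(reference)) / sd
    return scored


# Toy fragments, invented for illustration: two bins of one chromosome
toy = [(10, 120), (20, 130), (30, 167),
       (150_000, 140), (150_500, 200), (160_000, 210)]
ratios = sl_ratios(toy)
```

Per-bin z-scores could then be summarized per chromosome arm before testing patients against controls, in whatever way the downstream statistics require.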

Fig. 2

Differences in regional fragmentation between cfDNA from CRC patients and healthy controls. A Heat map showing the z-scored S/L ratios in 100 kb bins of each sample compared to healthy controls. B Significant differences in z-scored S/L ratios between samples collected from CRC patients with clinically diagnosed tumor burden and healthy controls were observed on multiple chromosome arms. (*: p-value ≤ 5 × 10⁻², **: p-value ≤ 1 × 10⁻², ***: p-value ≤ 1 × 10⁻³, ****: p-value ≤ 1 × 10⁻⁴)

The differences in regional fragmentation between CRC patients and healthy controls support recent findings identifying cfDNA fragmentation as an independent biological feature representing chromatin profiles of the cells of origin [19, 20].

Combination of global and regional fragmentation analysis using machine learning

To test whether machine learning (ML) classifiers based on global fragmentation and regional fragmentation in 5 Mb bins improve the accuracy of ctDNA detection, we trained four ML algorithms using 100 bootstrapping iterations with fivefold cross-validation (see Materials and Methods). For each iteration, the prediction of the best model was stored, and predictions for the two classifiers based on global and regional fragmentation were combined within a supervised meta-learner [20]. Samples collected from CRC patients with clinically diagnosed tumor burden (n = 134) served as the positive cohort, and healthy individuals, including samples collected from patients in remission more than six weeks post-treatment without any known recurrence at a later time point (n = 63), served as the control cohort for a better representation of biological variability (Additional file 1: Figure S1). All classifiers showed high prediction performance in distinguishing cfDNA from CRC patients and healthy controls, with receiver operating characteristic (ROC) area under the curve (AUC) values of up to 94% and sensitivity at 95% specificity of up to 70% (Fig. 3A). Since our ultimate goal was to develop a workflow applicable in clinical practice, we trained a final model based on the best performing ML algorithm for each feature set. Evaluating the performance of ML classifiers using only the support vector machine, we observed ROC AUC values and sensitivity at 95% specificity of up to 95% and 75%, respectively (Fig. 3B). Eventually, we trained final ML models for both feature sets as well as the meta-learner including all data of CRC patients (n = 134) and controls (n = 63) without further subsetting.
Applying these models at 95% specificity, ctDNA presence was correctly predicted in 36% (48/134) of samples based on global fragmentation (34/91 metastatic, 14/43 localized), in 90% (121/134) of samples based on regional fragmentation (85/91 metastatic, 36/43 localized), and in 90% (120/134) of samples based on the meta-learner (84/91 metastatic, 36/43 localized). However, samples collected from patients in remission, especially within the first six weeks post-surgery, were also classified as ctDNA positive (Fig. 3C). These results, in combination with the findings above, indicate that the non-genetic cfDNA features analyzed within LIFE-CNA are not informative for the correct identification of ctDNA within the first six weeks post-surgery. However, the effects of surgery on cfDNA fragmentation seem to normalize after six weeks, indicating a potential use for recurrence monitoring starting at this time point.
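For readers less familiar with the metrics reported above, the following Python sketch shows one common way to derive a decision threshold at a fixed specificity from control scores, the resulting sensitivity in cases, and the ROC AUC as a rank statistic. The function names and toy scores are hypothetical, and the rule "call positive if the score is strictly above the threshold" is an assumption of this example, not a description of the LIFE-CNA classifiers.

```python
import math


def threshold_at_specificity(control_scores, specificity=0.95):
    """Smallest observed cut-off that leaves at least `specificity` of controls at or below it."""
    ordered = sorted(control_scores)
    # number of controls that must be called negative (small tolerance guards float noise)
    k = max(1, math.ceil(specificity * len(ordered) - 1e-9))
    return ordered[k - 1]


def sensitivity(case_scores, threshold):
    """Fraction of cases scoring strictly above the threshold."""
    return sum(s > threshold for s in case_scores) / len(case_scores)


def roc_auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control (ties count half)."""
    wins = sum((c > h) + 0.5 * (c == h) for c in case_scores for h in control_scores)
    return wins / (len(case_scores) * len(control_scores))


# Toy classifier scores, invented for illustration
control_scores = list(range(20))   # 20 controls scoring 0..19
case_scores = [5, 19, 25, 30]

cutoff = threshold_at_specificity(control_scores)   # 19 of 20 controls are <= cutoff
sens = sensitivity(case_scores, cutoff)
auc = roc_auc(case_scores, control_scores)
```

Fixing the threshold on controls first and only then counting detected cases keeps the two quantities separate, which is why sensitivity is always quoted together with the specificity at which it was measured.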

Fig. 3

Performance of ML classifiers based on global and regional fragmentation as well as a meta-learner. Performance was assessed over 100 bootstrapping iterations with fivefold cross-validation A using the best performing model out of four classifiers for each iteration and B using only a support vector machine over all iterations. C The three final classifiers detect ctDNA in CRC patients with high sensitivity
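How bootstrapping and fivefold cross-validation were combined is specified in the Materials and Methods, not here. The Python sketch below shows one possible arrangement under stated assumptions: each iteration draws a bootstrap resample of the cohort, and folds are assigned per distinct sample so that duplicated samples never appear in both the training and the test part of a split. The function name is hypothetical, and class stratification is omitted for brevity.

```python
import random


def bootstrap_cv_splits(n_samples, n_iterations=100, n_folds=5, seed=0):
    """Yield (iteration, fold, train_indices, test_indices) for every split."""
    rng = random.Random(seed)
    for iteration in range(n_iterations):
        # bootstrap: draw n_samples indices with replacement
        boot = [rng.randrange(n_samples) for _ in range(n_samples)]
        # assign folds per distinct sample so duplicates stay on one side of a split
        distinct = sorted(set(boot))
        rng.shuffle(distinct)
        fold_of = {idx: pos % n_folds for pos, idx in enumerate(distinct)}
        for fold in range(n_folds):
            train = [i for i in boot if fold_of[i] != fold]
            test = [i for i in boot if fold_of[i] == fold]
            yield iteration, fold, train, test


# 134 patient samples + 63 control samples, as in the cohorts described in the text
splits = list(bootstrap_cv_splits(n_samples=197))
```

Each yielded split could then be used to fit a candidate model on `train` and score it on `test`, keeping the best model's predictions per iteration.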

CRC-specific active chromatin for ctDNA detection

We evaluated whether CRC-specific chromatin signatures can be detected based on coverage changes using the LIQUORICE tool [20] and whether these chromatin signatures represent an independent marker for ctDNA detection. Specifically, we analyzed five sets of enhancer regions identified to be active in CRC including (i) active distal ChromHMM-defined [32] enhancer regions, (ii) CRC-specific gained enhancers identified by Hi-C [33], (iii) gained enhancers occupied by the transcriptional coactivators YAP/TAZ, (iv) highly conserved enhancers occupied by YAP/TAZ, and (v) active transcriptional start sites (TSS) in CRC [26]. In addition, we analyzed the coverage in universal DNase I hypersensitive sites (DHS). In total, we observed significantly stronger coverage drops in all region sets in samples collected from CRC patients compared to healthy controls. In 5% (3/55) of healthy controls, significantly stronger coverage drops in one of the analyzed region sets were detected when comparing the coverage to all 54 other healthy control samples. Therefore, to ensure a specificity of ≥ 95% for ctDNA detection based on the coverage in CRC-specific active chromatin regions, significantly stronger coverage drops need to be identified in at least two of the analyzed region sets rather than one. Overall, we detected ctDNA based on differential coverage in 33% (44/134) of samples collected from CRC patients with clinically diagnosed tumor burden (Additional file 2: Table S6). However, we obtained similar values for remission patients [32% (12/37)] and for remission patients more than six weeks post-treatment [33% (3/9)]. Taken together, coverage-based chromatin site analysis for ctDNA detection is suitable at diagnosis, but not for recurrence monitoring, not even more than six weeks post-treatment.
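The reasoning behind the "at least two region sets" rule can be checked with a short calculation. In the Python sketch below, the counts (55 healthy controls, three of them with a significant drop in exactly one region set) come from the text; which region set was affected in each control is not stated, so the labels are placeholders, and the function names are hypothetical.

```python
def calls_ctdna(significant_region_sets, min_sets=2):
    """Call a sample ctDNA positive if enough region sets show a significant coverage drop."""
    return len(set(significant_region_sets)) >= min_sets


def specificity(control_results, min_sets):
    """Fraction of control samples that are not called ctDNA positive."""
    false_positives = sum(calls_ctdna(r, min_sets) for r in control_results)
    return 1 - false_positives / len(control_results)


# 3 of 55 controls with one significant region set each (placeholder labels), 52 with none
controls = [["set_a"], ["set_b"], ["set_c"]] + [[] for _ in range(52)]

spec_one_set = specificity(controls, min_sets=1)   # 52/55, just below 95%
spec_two_sets = specificity(controls, min_sets=2)  # 55/55
```

With a single region set as the criterion, specificity is 52/55 (about 94.5%), which misses the 95% target; requiring two region sets removes these three false positives in the control cohort.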

Quantification of the ctDNA fraction in CRC patients

To quantify the ctDNA fraction as a complement to fragmentation and coverage-based chromatin site analysis, we used the ichorCNA tool [17], which led to correct prediction of ctDNA in only 35% (47/134) of samples with clinically diagnosed tumor burden, even when selectively enriching for ctDNA-associated 90–150 bp fragments (Additional file 1: Figure S3) [18, 20].

Detection of genome-wide and focal SCNAs in CRC patients

To identify genome-wide and focal SCNAs, we applied a combination of the Illumina DRAGEN CNV workflow and Plasma-Seq [15, 22], considering ctDNA-associated 90–150 bp fragments (Additional file 1: Methods, Figures S4 and S5) [18, 20]. We analyzed paired tumor tissue and plasma samples collected at diagnosis to validate the SCNA pipeline. To correct for germline CNVs, constitutional DNA from saliva was additionally analyzed. In 44% (12/27) of patients with localized and in 94% (15/16) of patients with metastatic CRC, genome-wide SCNA profiles were highly concordant with the corresponding tissue. SCNAs unique to plasma were identified in 78% (21/27) of patients with localized and 81% (13/16) of patients with metastatic CRC (Fig. 4A). In addition, we identified focal SCNAs in plasma matching tumor tissue in 4% (1/27) of patients with localized and in 63% (10/16) of patients with metastatic CRC, and focal SCNAs only in plasma in 15% (4/27) of patients with localized and in 63% (10/16) of patients with metastatic CRC (Fig. 4B). Certain genetic events found in plasma may not be present in tumor tissue because a tissue sample represents only one site of the tumor mass rather than the complete tumor heterogeneity, including metastatic sites. Conversely, low-amplitude SCNAs are likely to go undetected in plasma, since ctDNA represents only a fraction of total cfDNA. Overall, although some SCNAs might be missed in plasma, our approach detects genome-wide SCNAs in plasma from CRC patients across all stages, including subclonal events not identified in tumor tissue.
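The three categories used in the matched analysis (plasma only, tumor only, shared) amount to simple set operations once SCNA calls are reduced to comparable keys. The Python sketch below assumes, purely for illustration, that each call is represented as a `(chromosome_arm, type)` pair; a real comparison would work on genomic intervals with an overlap criterion, and the example calls are invented.

```python
def compare_scnas(plasma_calls, tumor_calls):
    """Split SCNA calls into shared, plasma-only, and tumor-only events."""
    plasma, tumor = set(plasma_calls), set(tumor_calls)
    return {
        "shared": plasma & tumor,
        "plasma_only": plasma - tumor,
        "tumor_only": tumor - plasma,
    }


# Invented example calls for one hypothetical patient
plasma_calls = [("8q", "gain"), ("18q", "loss"), ("20q", "gain")]
tumor_calls = [("8q", "gain"), ("18q", "loss"), ("17p", "loss")]

result = compare_scnas(plasma_calls, tumor_calls)
```

Because several SCNAs can lie on the same chromosome, one chromosome may contribute to more than one of the three categories for the same patient.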

Fig. 4

Matched plasma and tumor analysis. To validate the SCNA analysis integrated in LIFE-CNA we performed a matched analysis of plasma samples collected at diagnosis with tumor tissue. A Total SCNAs present in plasma (red) or tumor (blue) only or in both plasma and tumor (yellow) and B focal SCNAs present in plasma (pink) or tumor (violet) only or in both plasma and tumor (green) present on each chromosome for individual patients and summarized over all patients below. Since more than one SCNA can be present per chromosome, it is possible that on the same chromosome different SCNAs are detected in plasma only, tissue only or in both plasma and tumor tissue

Complementary ctDNA detection by combining cfDNA features

Based on our results showing that global and regional fragmentation, chromatin signatures, and SCNAs are each capable of independently detecting ctDNA, we compared the sensitivity of all features in CRC patients in general and across stages, considering the time point of sample collection in the course of disease (Fig. 5A, B).

Fig. 5

LIFE-CNA enables accurate disease monitoring in CRC patients. SCNAs, focal SCNAs (foc. SCNA), tumor fraction in all (tum. frac.) and filtered fragments (tum. frac. short), enrichment in fragments from 90 to 150 bp (glob. frag.), regional fragmentation (reg. frag.), and significantly stronger coverage drops (low cov.) were analyzed with LIFE-CNA. In addition, ctDNA was predicted with machine learning classifiers based on global (ML glob. frag.) and regional fragmentation (ML reg. frag.), and a meta-learner (ML Meta.) integrated into LIFE-CNA. To assess performance of LIFE-CNA, hotspot variants (SNVs), cfDNA concentration (cfDNA), and CEA were analyzed A in samples from CRC patients collected at different time points during disease summarized over all samples and B stratified by disease stage. C LB-CRC-32 was used as one example to show response and resistance to treatment throughout the course of disease

Regional fragmentation and coverage in active chromatin enabled ctDNA detection in 77% (33/43) and 23% (10/43) of patients with localized CRC and in 74% (67/91) and 37% (34/91) of patients with metastatic CRC with clinically diagnosed tumor burden, respectively. As expected, increased numbers of called SCNAs as well as elevated tumor fractions (Additional file 1: Data) were mainly observed in patients with metastatic CRC (57%, 52/91 vs. 26%, 11/43 and 45%, 41/91 vs. 14%, 6/43, respectively). Enriched short cfDNA fragments enabled ctDNA detection only in a small number of patients with metastatic CRC (19%, 17/91). Considering the three ML classifiers integrated in our LIFE-CNA workflow, we observed that the classifier based on regional fragmentation and the meta-learner have a higher sensitivity for ctDNA detection (90%, 121/134 and 120/134, respectively) than the classifier based on global fragmentation (36%, 48/134). However, when focusing on samples collected within the first six weeks post-surgery, we observed ctDNA predictions with all non-genetic cfDNA features besides global fragmentation, with the highest rate, 68% (25/37), obtained with the ML classifier based on regional fragmentation and the meta-learner. When focusing on only those samples collected from patients in remission more than six weeks post-treatment, ctDNA detection rates decreased.

LIFE-CNA for accurate treatment monitoring in CRC patients

The analysis of multiple ctDNA features improves the sensitivity of untargeted ctDNA detection. To assess the clinical validity of LIFE-CNA for disease monitoring, we tracked changes in our measures over a median follow-up time of 7.5 months (range 1–35.5 months) in 15 patients and correlated these changes with treatment outcome as a proof-of-concept (Additional file 2: Table S6). In addition to LIFE-CNA, we analyzed the commonly used serum protein marker CEA, plasma cfDNA concentration, and SNVs for patients with available hotspot variant data (n = 5). We were able to predict response to treatment in 77% (10/13) of patients (7/7 metastatic, 3/5 localized) by decreasing numbers of SCNAs, normalizing regional or global fragmentation, and/or normalizing coverage in regions of interest. CEA was informative in only 25% (3/12) of patients, and in two of those patients ~ 2 months later than LIFE-CNA; decreasing plasma cfDNA concentrations could be correlated to treatment response in only 46% (6/13) of patients, and in one of those patients ~ 1 month later than LIFE-CNA. Further, LIFE-CNA correctly predicted progressive disease in 100% (5/5) of patients up to four months before clinical evidence, with increasing differences to healthy controls in all analyzed cfDNA features. CEA was informative in only 80% (4/5) of patients, in one of those patients ~ 3.5 months later than LIFE-CNA, and cfDNA concentration was informative in only 20% (1/5) of patients, ~ 9 months later than LIFE-CNA (Additional file 1: Figures S6–S20). For example, response and resistance to treatment could be detected with LIFE-CNA in patient LB-CRC-32 up to five and three months before clinical evidence, respectively (Fig. 5C). CEA identified response to treatment > 2 months later and resistance to treatment in parallel to LIFE-CNA.
Although decreasing cfDNA concentration was associated with response to treatment, no increase could be observed at the time of progression, which is in line with previous reports showing low sensitivity and specificity of cfDNA concentration for treatment monitoring [34]. For SNVs, response to treatment could be identified in 3/4 samples, whereas no data were available to evaluate changing SNV levels for progression detection.

LIFE-CNA for cancer screening but not for MRD

To analyze whether LIFE-CNA could be applied for the detection of MRD post-surgery, plasma samples of 33 CRC patients collected up to 8 days pre-surgery and follow-up samples collected between 1 and 42 days post-surgery were analyzed (Additional file 1: Figure S21). Pre-surgery, we detected ctDNA in 92% (22/24) of patients with localized and in 89% (8/9) of patients with metastatic CRC. Post-surgery, ctDNA was identified in 96% (23/24) of patients with localized and in 100% (9/9) of patients with metastatic CRC, in particular due to the classifier based on regional fragmentation and the meta-learner. Further, significant differences in coverage were observed in a large number of post-surgery samples (Additional file 1: Figures S21–S54). Decreasing ctDNA predictions more than six weeks post-treatment might enable the application of LIFE-CNA for recurrence monitoring (Fig. 5A, B, turquoise: remission more than six weeks post-treatment). In addition, the high sensitivity of ctDNA detection at diagnosis in patients with localized CRC (92%) suggests that LIFE-CNA has considerable potential for cancer screening.

Proof-of-principle of LIFE-CNA using six healthy controls and in silico dilutions

We evaluated the specificity of all cfDNA features by analyzing six additional healthy controls not included in the reference set. Of all analyzed cfDNA features, only differential regional fragmentation was detected in 1/6 healthy controls, while the remaining cfDNA features did not indicate ctDNA (Fig. 6). The ML classifier based on regional fragmentation and the meta-learner predicted ctDNA in 2/6 healthy controls. These results indicate low specificity of the regional-fragmentation and meta-learner based classifiers for ctDNA detection.

Fig. 6

Proof-of-principle showing the high sensitivity of LIFE-CNA. Focal SCNAs (foc. SCNA), tumor fraction (tum. frac.), tumor fraction in 90 to 150 bp fragments (tum. frac. short), enrichment in fragments from 90 to 150 bp (glob. frag.), differential regional fragmentation (reg. frag.), significantly stronger coverage drop in at least two region sets (low cov.), classifier based on global fragmentation (ML glob. frag.), classifier based on regional fragmentation (ML reg. frag.), and classifier based on meta-learner (ML Meta.) were analyzed in six additional healthy controls not included in the panel of normals and in in silico dilutions with 0.5%, 1%, 2.5%, 5% and 10% tumor fraction as a proof-of-principle for ctDNA detection using LIFE-CNA

In addition to specificity, we also assessed the sensitivity of LIFE-CNA for the detection of low ctDNA levels using in silico dilutions with tumor fractions of 0.5%, 1%, 2.5%, 5% and 10% (Additional file 2: Table S7). As in disease monitoring, we observed the highest sensitivity for ctDNA detection in the in silico dilutions with regional fragmentation, which correctly identified ctDNA in 4/5 samples with 0.5% tumor fraction and in all samples with 1% tumor fraction. At 0.5% tumor fraction, elevated tumor fractions based on ichorCNA and significant enrichment of short fragments could be detected in one sample. Further, SCNAs could be detected in 4/5 samples with 2.5% tumor fraction. These results indicate that our SCNA analysis is more sensitive than previously described approaches, which require tumor fractions above 5% to 10%. Focusing on the ML classifiers for ctDNA prediction, it was not possible to detect ctDNA based on global fragmentation in any of the in silico dilutions. Using the classifier based on regional fragmentation, we detected ctDNA in 1/5 samples with 1% tumor fraction.
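The arithmetic behind an in silico dilution can be illustrated as follows. The text does not describe how the dilutions were generated, so this Python sketch is only a generic assumption: reads from a patient sample with a known (estimated) tumor fraction are mixed with reads from a healthy control so that the mixture reaches a target tumor fraction. All names and numbers are hypothetical.

```python
import random


def mixing_counts(target_tf, source_tf, total_reads):
    """Number of reads to draw from the patient and the control sample.

    A patient sample with tumor fraction `source_tf` contributes a share of
    target_tf / source_tf of the reads; the rest comes from the control.
    """
    if not 0 < target_tf <= source_tf <= 1:
        raise ValueError("need 0 < target_tf <= source_tf <= 1")
    n_patient = round(total_reads * target_tf / source_tf)
    return n_patient, total_reads - n_patient


def dilute(patient_reads, control_reads, target_tf, source_tf, total_reads, seed=0):
    """Randomly subsample both read lists (without replacement) and pool them."""
    rng = random.Random(seed)
    n_patient, n_control = mixing_counts(target_tf, source_tf, total_reads)
    return rng.sample(patient_reads, n_patient) + rng.sample(control_reads, n_control)


# Hypothetical example: patient sample at 25% tumor fraction, 1,000,000 reads per mixture
plan = {tf: mixing_counts(tf, 0.25, 1_000_000) for tf in (0.005, 0.01, 0.025, 0.05, 0.10)}

# Tiny toy read lists to show the pooling step
mixture = dilute(list(range(100)), list(range(1000, 2000)), 0.05, 0.25, 50)
```

The achievable accuracy of such a dilution depends on how well the tumor fraction of the undiluted patient sample is known, which is itself an estimate.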
