Ganciclovir-induced mutations are present in a diverse spectrum of post-transplant malignancies

Target panel sequencing cancer patient cohort

A set of 121,771 cancer patient samples with somatic mutation and limited clinical information were obtained from AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) release 11-public [9] (n = 104,264) and FoundationMedicine (FM) [10] (n = 17,507). AACR Project GENIE is a publicly accessible database of real-world cancer genomics panel sequencing data assembled from cancer centres worldwide and includes the most common cancer types. The GENIE cohort consisted of a mixture of data from 93 different panels ranging from 6 to 1422 genes (average 544), spanning 6 bases to 9.95 megabases (average 1.06 megabases), with the number of mutations ranging from 0 to 9364 (average 7.938). The FM cohort consists of adult solid tumour samples that underwent genomic profiling on a single uniform platform with 287 genes, spanning 0.83 megabases as part of standard clinical care. The number of mutations in the cohort ranged from 1 to 366 (average 6.787) (see Additional file 1: Table S2 for detailed cohort characteristics). For patients with multiple samples in the database, only the most recent sample, indicated by participant age, was included. This combined dataset is used to identify the presence of GCV-associated signature across different cancer types.

Accessibility analysis for COSMIC census genes

Somatic mutation data was obtained for a set of recurrently mutated genes from the COSMIC census v77 database [11]. Genes with candidate hotspot mutations were identified by selecting those genes with high variance in mutation counts across the gene. We restricted our analysis to the 50 genes with the highest variance and focused on mutations with at least 100 occurrences in COSMIC. TP53 was removed because the volume of hotspot sites dominated the analysis. We used the mutational signature for GCV (GCVsig, derived from H015 [8]) to assess the accessibility of hotspot mutations based on their trimer context. We also assessed accessibility at the level of each driver gene by summing the values across all hotspots in that gene.

Cell line and organoid treatment and whole-genome sequencing

CRC cell lines H414 and HCT116, murine myeloid cells (HoxA9-Meis1-transformed primary cells), and a normal human colon organoid line were used as models for GCV treatment. Clonal H414 and HCT116 cells were seeded in a human plasma-like medium (Gibco, A4899101), for whole-genome sequencing experiments, or Dulbecco’s modified Eagles’ medium, for all other experiments, supplemented with 10% foetal bovine serum 24 h prior to drug treatment. To test cell viability, these cells were treated with GCV (Abcam) or ACV (Abcam) ranging from 0.1 to 1000 μM. After 48-h incubation, the cell viability was assessed by trypan blue exclusion assay. For mutation analysis, cells were treated with 100 μM GCV, 100 μM ACV, and/or 1 μM MMF (Roche) for 48 h. A colony (2–4 cells) was then isolated and expanded, as single cell clones were found to be not viable after drug treatment. Following expansion, DNA was extracted for WGS with 150-bp PE sequencing at 30 × using an Illumina Novaseq sequencer.

Myeloid cell lines were generated by infecting E13.5 C57/BL6 liver cells with pMSCV-HoxA9-IRES-Meis1 (a gift from Guy Sauvageau). These cells were infected with pMIG (GFP +) and single cell cloned by plating in Dulbecco’s modified Eagles’ medium with 20% FBS, 0.3% BactoAgar, and 10 ng/mL mIL-3 (WEHI). Individual colonies were expanded and maintained in IMDM with 10% FBS and 10 ng/mL mIL-3. For growth inhibition experiments, myeloid cell lines were seeded and treated with ACV (Hospira AU) or GCV (Pharmaco AU) ranging from 0.00316 to 100 µM. To assess mutagenicity, clonal cultures were treated with GCV and ACV continuously for 13–15 days, then single cell cloned and expanded in liquid culture. Control samples were either untreated or cultured in vehicle (DMSO). DNA was extracted for WGS with 150-bp PE sequencing at 250–600 M reads per sample using an Illumina Novaseq sequencer.

The normal colon organoid was previously generated from pooled sigmoid organoid derived from a 40-year-old female CRC patient [12]. Briefly, bulk sigmoid organoid was trypsinized into single cells, passed through a cell strainer, and then loaded to a BD Influx™ cell sorter (Biosciences). Single cells were selected based on their physical size and molecular granularity. The sorted cells were serially diluted and seeded in Matrigel for expansion. The clonal organoids were manually picked at 1–2 weeks post-sorting. Fresh medium (advanced DMEM/F12, 1 × GlutaMax, 1 × HEPES, 1 × P/S, 50% Wnt3a, 10% RSPO-1, 10% Noggin, 1xB27, 50 ng/mL EGF, 200 ng/mL FGF10, 1 mM N-acetylcysteine, 1 nM gastrin, 2 µM A83-01) was changed every 2–3 days and organoids were passaged every week. Organoids were treated with 1 µM MMF (Roche), 20 μM GCV (Hainan Poly Pharm Co Ltd), or in combinations (1 µM MMF + 20 μM GCV or 1 µM MMF + 40 μM GCV) continuously for 4–6 weeks. The fresh medium was changed every 2–3 days and organoids were passaged every week. DNA was extracted from the bulk culture as the organoids were found to be not viable as single clones after drug treatment. WGS was performed with 150-bp PE sequencing at 30 × using an Illumina Novaseq sequencer.

The concentration for each cell line treatment is provided in Additional file 1: Table S3. A detailed description of the methods associated with cell viability, DNA damage response, and mass spectrometry assays are provided in Additional file 2: Supplementary Methods.

Mutation calling from whole-genome sequencing data

For H414 and HCT116 and normal human organoids, raw sequencing reads were aligned to the human (hg38) reference genome using bwa (v0.7.17-r1188) [13]. Mutations were called using MuTect2 (v4.2.5.0) [14] and Strelka2 (v2.9.7) [15] were run in paired normal-tumour mode, where the respective parental clone of each cell line was used as the “normal”, while the drug-treated or control cells were used as “tumour”. For MuTect2, mutation calls required support from 3 or more reads with at least one on each strand. The final set of mutations used for the downstream analysis required that a mutation was annotated as PASS by both MuTect2 and Strelka2 and that the mutation was not shared with any other sample.

For data from the murine cell line, reads were aligned to mm10 with bwa (v0.7.17-r1188) [13], and samples from the same clone were grouped for analysis with superFreq (v1.4.3) [16]. For signature analysis, we used somatic variants with somaticP > 0.5, read depth ≥ 15, VAF ≥ 0.25, and required VAF ≤ 0.05 in all other samples. Bases with at least 15 read depth were classified as callable. The median callable region for the clones was 2.0 Gbp (range 1.7 to 2.3 Gbp). Regions classified as Simple_repeat or Low_complexity by repeatMasker were excluded from the analysis.

Statistical analysis

To identify patients with evidence of GCV-associated mutagenesis, single nucleotide substitutions from each sample were represented based on trinucleotide substitution frequency [17]. The GCV-associated signature (GCVsig) was obtained from patient H015 [8] (Fig. 1A). The R package, sigfit [18], was used to identify samples that exhibited GCVsig. Briefly, sigfit applies Bayesian inference to fit known mutational signatures to an observed mutational spectrum. Sigfit was used to compute the contribution of mutational signatures (referred to as contribution score) from COSMICv3 with the addition of GCVsig for each patient.

Fig. 1figure 1

Identification of ganciclovir (GCV)-associated mutational signature from targeted sequencing cancer cohorts. A Trinucleotide mutational spectrum from an in-house colorectal cancer patient with GCVsig. B Identification of samples with GCVsig from patients from AACR Project GENIE and Foundation Medicine cohorts. One percent FDR is set at the sigfit GCVsig contribution score at which there is a 1% chance that the observed mutational spectrum arose from SBS18 (green line) or SBS38 (blue line). Samples with mutation contribution from GCV only or multiple mutational processes are also indicated. C Sample spectrum from one example patient from the AACR Project GENIE cohort with GCVsig. The mutational spectra of the other 21 GENIE + FM samples with GCVsig are shown in Additional file 3: Fig. S2. D Oncoprint of recurrent mutations in CA > AA context across 22 patients with GCVsig. Mutations from KRAS, HRAS, and NRAS were combined and labelled as RAS. Cancer type abbreviations: bladder cancers (BLCA), gastrointestinal epithelial cancers (GI), haematolymphoid malignancies (HEME), head and neck cancers (HNSC), sarcoma (SARC), skin cancers (SKIN), and other/unknown primary cancers (OTHER). E Mutational potential of known hotspot driver mutations from COSMIC cancer census genes, based on the ability of GCVsig to access trimer sequences (listed in descending order of accessibility)

To estimate the false discovery rate (FDR), we simulated trinucleotide mutational spectra based on COSMIC mutational signature SBS18, which shared the highest similarity with GCVsig among common COSMIC mutational signatures (Additional file 1: Table S4). A minor contribution from the age-associated SBS5 was also included in the simulation as a background process. To this end, mutation spectra consisting of 0.95 SBS18 and 0.05 SBS5 were simulated using the R rmultinom function. Sigfit was used to provide an estimate of GCVsig contribution for each simulated mutation spectrum. The process was repeated 1000 times for each mutation count ranging from 1 to 10,000. The 99th percentile GCVsig contribution score from the simulated SBS18 spectra was set as the 1% FDR cutoff for identifying samples with GCV-associated mutagenesis. A GCVsig cutoff was also calculated for SBS38 spectra, but this was only applied to skin malignancies as it has been specifically associated with UV damage [19].

As sigfit assigns uniformly low contribution scores across all signatures when it cannot confidently identify any distinctive mutational processes, samples with less than 10 mutations were excluded as our analysis lacks the ability to confidently detect GCVsig below this mutation count based on our 1% FDR threshold (Additional file 3: Fig. S1A).

While this analysis identified cancers where GCVsig was dominant, we noted that the FDR cutoff based on 0.95 SBS18 contribution would be too stringent for cancers with large contributions from multiple mutational processes. To this end, we further simulated SBS18 and SBS38 spectra at a range of SBS contributions (0.05–0.95, the remainder contribution being SBS5) across a range of mutation counts (5–250) to obtain the 99th percentile GCVsig contribution score from the simulated SBS18 or SBS38 spectra (Additional file 3: Fig. S1B-C). Using these values, we then used the following regression model to estimate the 1% FDR GCVsig score cutoff.

$$}^}}_}=_+__}+_\mathrm(X_})+__}+__}\mathrm(X_})$$

where \(_}\) indicates the SBS contribution fraction and \(_}\) is the mutation burden. The model provides a very good estimation of the actual GCVsig score (R2 = 0.98 and 0.85 for SBS18 and SBS38 respectively, Additional file 3: Fig. S1D-E) and enables the 99th percentile GCVsig cutoff score to be estimated without a computationally costly simulation for each sample. For individual samples with sufficient CA > AA mutations (> 10), the GCVsig contribution is estimated based on the fraction of CA > AA mutations and, along with the total mutation count, is input to the regression model for SBS18 and SBS38 to determine if the observed GCVsig score is above the estimated 99th percentile GCVsig score cutoff. These cutoff values are listed for each sample in Additional file 1: Table S5. To test if the GCVsig of the sample passed this cutoff, the SBS contribution is estimated using the fraction of CA > AA mutations in the sample.

To ensure that our FDR cutoff is robust, particularly for samples with low number of mutations, we further estimated the FDR of GCVsig−positive samples for the cohort. To do this, we randomly sampled mutational spectra from the cohort-wide average and calculated the GCVsig contribution score and whether it is above 1% SBS18 or SBS38 FDR as described above. This whole process was repeated 10 times.

Finally, to further account for multiple testing correction, we calculated p-values for each sample by comparing the difference between the expected and the observed frequency of mutations attributed to GCVsig. This frequency and the frequency of non-GCVsig mutations are calculated respectively as,

$$_}=\mathrm}^}}_}X}_}\ \mathrm\ _-\mathrm}=(1-\mathrm}^}}_})X}_}$$

Under the null hypothesis, the expected frequency of GCVsig is based on the cohort-wide average mutational spectrum simulated 1000 times. The chi-square test is then used to compute the p-value, which is then multiple testing corrected by the Benjamini–Hochberg procedure.

For analysis of significance relating to cell line and organoid models, two-sided Student’s test, ratio t-test, or Fisher’s exact test was used as appropriate.

留言 (0)

沒有登入
gif