Evolution of synchronous female bilateral breast cancers and response to treatment

The research complies with all relevant ethical regulations and was approved by the institutional review board of Institut Curie (Breast Group) on 6 July 2016.

Patients and treatments

We identified a cohort of 17,575 female patients with non-metastatic breast cancer treated at the Institut Curie (Paris and Saint-Cloud, France) between 2005 and 2015 in the institutional database (CNIL no. 1766392-v1; data collection and storage in REDCap 12.4.14 and MACRO version 3). Patients were treated according to local guidelines. When indicated, chemotherapy was administered in a neoadjuvant or adjuvant setting; endocrine therapy was indicated in the case of positivity for hormone receptor and according to prognostic factors; and patients with HER2+ tumors received neoadjuvant and/or adjuvant trastuzumab from 2007 onwards. This study was approved by the Breast Cancer Study Group and by the institutional review board of Institut Curie and was conducted in accordance with institutional and ethical rules regarding research on tissue specimens and patients. sBBCs were defined as the occurrence of primary tumors occurring in both breasts with a time interval no greater than 6 months. Metachronous breast cancers, defined as a time interval greater than 6 months between the diagnoses of the first and second tumors, were not included in the current study. Patients with exclusive ductal carcinoma in situ (DCIS) in one of the two sides were excluded from the analyses. Written informed consent was obtained for all patients included in the genomic analyses. No participant received any compensation. In the cohort of patients with sBBCs treated with NAT, regimens were as follows: neoadjuvant chemotherapy (n = 50, 46 of whom received anthracyclines/taxanes-based sequential regimen) or neoadjuvant endocrine therapy (NET, n = 20, 18 of whom received aromatase inhibitors).

Tumor samples, subtype of breast cancer and pathological review

In accordance with guidelines used in France46, cases were considered ER or PR positive if at least 10% of the tumor cells expressed ERs and/or PRs. HER2 expression was determined by immunohistochemistry with scoring in accordance with American Society of Clinical Oncology (ASCO)/College of American Pathologists (CAP) guidelines47. Scores 3+ were reported as positive; scores 1+/0 were reported as negative (−). Tumors with scores 2+ were further tested by fluorescence in situ hybridization (FISH). HER2 gene amplification was defined in accordance with ASCO/CAP guidelines.

The subtype of breast cancers were defined as follows: tumors positive for either ER or PR and negative for HER2 were classified as luminal; tumors positive for HER2 were considered HER2+ breast cancer; tumors negative for ER, PR and HER2 were considered to be TNBCs. In case of multiple tumors in the same breast, the subtype classification was made based upon the tumor of the largest diameter.

Bulk tumor specimens—and the corresponding pretreatment core needle biopsy specimens in case of neoadjuvant treatment—were reviewed by an expert in breast pathology (M.L.). All tumoral tissues studied were formalin-fixed, paraffin-embedded (FFPE) samples.

In the cases with bilateral invasive carcinoma, TILs were reviewed specifically for the purposes of this study, between September 2016 and March 2017. In accordance with the recommendations of the international TILs Working Group25, we checked for presence of a mononuclear cell infiltrate in the stroma on H&E-stained sections without additional staining, after excluding areas around DCIS, and tumor zones with necrosis and artifacts. Infiltrates were scored on a continuous scale, as the mean percentage of the stromal area occupied by mononuclear cells. We also assessed intratumoral TIL levels and TIL levels after NAT. Intratumoral TILs were defined as intraepithelial mononuclear cells within tumor nests or in direct contact with tumor cells, and stromal TILs were defined as mononuclear inflammatory cells within intratumoral stromal area and were reported as percentage of stromal area. After NAT, we assessed TIL levels within the borders of the residual tumor bed, as defined by the residual cancer burden (RCB)48 and as recommended by the TILs Working Group26. To ensure that the mononuclear cells infiltrate considered as TILs in the analyses indeed corresponded to lymphocytes, we carried out CD3+ immunostaining on a subset of 24 specimens, which strongly correlated with the levels of unstained TILs (Supplementary Fig. 9). We defined pCR as the absence of invasive residual tumor from both the breast and axillary nodes (ypT0/is N0).

Concordance between the tumors of sBBC pairs

We evaluated the concordance of the clinical and pathological characteristics between the two tumors within the same patient. Pairs of sBBCs composed of tumors of the same subtype of breast cancers were classified as concordant and, where otherwise, classified as discordant. As breast cancer subtype is known to be the main determinant of tumor biology (notably tumor grade and proliferation), immune infiltration, response to anti-cancer agents and, ultimately, oncologic outcomes, we displayed the results of the clinical section of the paper according to the concordance.

Independent datasets used in this study

To validate our findings, we analyzed different types of data from three validation cohorts:

Validation cohort 1 was a breast cancer dataset from the SEER program of the National Cancer Institute, which collects data on cancer diagnoses, treatment and survival for 35% of the US population. Based on 396,179 SEER medical records of patients with breast cancer diagnosed in the 2010–2016 period, we identified 8,367 patients with sBBCs (n = 16,734 tumors), defined as two breast cancers diagnosed within 6 months of the primary index diagnosis. This cohort was used to describe the concordance of the breast cancer subtype of the pairs of SBBCs and their relative repartition. To assess response to NAC, we used a subcohort of patients treated with NAC (n = 467 patients, representing 934 tumors). We calculated the axillar response rate to NAC, defined as the proportion of patients with a post-NAC number of positive nodes >1 divided by the total number of patients. No data were available on tumor immune infiltration. The use of the SEER database as a validation cohort was approved by the institutional review board of Institut Curie, and access to data followed the standard request access to SEER data.

Validation cohort 2 was obtained from the GBG and was composed of 105 patients with sBBCs treated within four NAC published trials (GeparTrio41 (NCT00544765), GeparQuattro42 (NCT00288002), GeparQuinto43 (NCT00567554) and GeparSixto44 (NCT01426880) trials). Validation cohort 2 was used to validate data on immune infiltration and response to NAT. The GeparTrio, GeparQuattro, GeparQuinto and GeparSixto trials were approved by ethics committees, and all patients consented to the reuse of their data.

Validation cohort 3 was an in-house cohort composed of patients diagnosed with sBBC at the Institut Curie/René Hugunin between 1984 and 1998 and with bilateral frozen material on bulk tumor specimens of sufficient quality to perform whole-exome analysis. Eight patients with sBBCs had left and right tumor sequencing, but both left and right samples were of sufficient quality for seven patients only. This retrospective study on human biological samples and clinical data collected during care was validated by the Institut Curie institutional review board on 5 July 2021 under reference DATA210188. As per French Public Health Law (art L 1121-1-1 and art L 1121-1-2), written consent is not required for human non-interventional and retrospective studies.

Sample preparation and next-generation sequencing analyses

DNA and RNA were obtained from the Biological Resource Center of Institut Curie. After selecting patients treated with NAC, tumors from which sufficient frozen material from both left and right tumors was available in the institutional tissue bank both before treatment (defined as PT) and after treatment (in case of RD) were included. Fresh-frozen samples were subjected to genomic DNA extraction and DNA qualification using the QuBit system.

DNA pre-processing

One microgram of genomic DNA from each sample was subjected to shearing using the Covaris system, and Illumina-compatible libraries were performed according to Agilent SureSelect XT2 library protocol consisting in repairing DNA ends and ligating Illumina barcoded adapters, followed by polymerase chain reaction (PCR) amplification. Libraries were pooled in equimolar condition before being hybridized on dedicated biotinylated RNA probes targeting whole-exome sequences (Agilent Human All Exon V5 capture probes). After selection using streptavidin beads and PCR amplification, enriched library pools were subjected to qPCR quantification using the KAPA Library Quantification Kit (Roche). Sequencing was carried out on the HiSeq 2000 instrument from Illumina based on a 2 × 100 cycles mode (paired-end reads, 100 bases) using high-output flow cells to produce over 50 million and 170 million paired-end reads for 30× (germline) and 100× (tumors), respectively.

DNA sequencing

Samples were sequenced to a median depth of coverage of 153 reads, with 95% of exonic bases passing 50× coverage. Reads were aligned on the human genome reference hg19/GRCh37 by Burrows–Wheeler Aligner49 version 0.7.5a; filtering of reads was based on target intersection, mapping quality and PCR duplicate removal, using Picard50, BEDTools51 and SAMtools52, and pre-process using GATK53 for local realignment around indels and base score recalibration. Preliminary variant calling was performed using Mutect2 (ref. 54) for tumor samples and haplotype caller55 for normal samples. Germline mutations are reported in Supplementary Table 12. SuperFreq version 1.3.2 (ref. 56) performed annotation and filtering of somatic indels and single-nucleotide variants (SNVs), copy number and purity estimation and subclonal reconstruction, using SNVs, indels and CNAs. SuperFreq was run to analyze together the samples of the same side for each patient. We performed an additional filtering of alterations present in either dbSNP57 or ExAC58 at a frequency greater than 0.2 or after manual review of the alignment on the Integrative Genomics Viewer or a Somatic Score computed by SuperFreq greater than 0.5. Two samples (RD4_R and RD6B_R) were discarded from the analyses owing to insufficient quality criterion (purity <0.3 and relative difference between the number of mutations in the PT and in the RD superior to 25%).

We computed the pairwise cosine similarity among the copy number calls at the gene level. To ensure a relevant computation, we subtracted the reference copy number (that is, 2) from the calls. The metric then reflects losses and gains, with some tolerance to the exact copy number. We used the function ‘cosine_similarity’ from the module ‘sklearn.metrics.pairwise’ in the package scikit-learn version 0.21.3 (ref. 59).

Somatic mutations interpretation

Somatic variants were annotated using VEP (version 104)60. A variant was denoted as driver if the mutation was present as a splice site, a nonsense, a frameshift or a non-synonymous SNV or indel in COSMIC census gene. The percentage of shared mutations in pairs (pair left–right; pair PT–RD; pair of two multicentric tumors of the same side) was defined as the intersect between the mutations present in both tumors × 2 divided by the sum of the mutation in the two tumors of the pair × 100.

Mutational signature deconvolution

The contribution of mutational signatures to individual tumor samples was explored using the signatures deconvoluted by Alexandrov et al.29 and referenced in the COSMIC database. We restricted the analyses to the 13 signatures previously evidenced in breast cancer. Signature activities were estimated using the decompTumor2Sig algorithm61 in the musicatk (version 1.0.0) R package62. The percentage of mutational signatures was calculated by summing the relative contribution of each signature in PT samples to the whole tumor spectrum, divided by the number of sampled, and the result was multiplied by 100. To avoid overrepresentation of patient 6 to the cohort and because signature profiles were similar on each side, we averaged the values of the left side on the one hand and the values of the right side on the other hand.

Neoantigens

The VCF files were converted to FASTA format and annotated using the convert2annovar.pl (which converts ‘genotype calling’ format into ANNOVAR format) and annotate_variation.pl (which classifies them as intergenic, intronic, non-synonymous single-nucleotide polymorphism (SNP), frameshift deletion or large-scale duplication) scripts in the ANNOVAR tool. We next used the ANNOVAR coding_change.pl script to infer mutated protein sequence to determine potential premature stop codons. Using these mutated amino acid sequences predicted from annotated non-synonymous variant calls, the peptide sequence surrounding the amino acid corresponding to the new variant was extracted for epitope prediction. As binding of the epitopes to major histocompatibility complex class I (MHC-I) is dependent on the patient-specific HLA alleles, HLA haplotypes were cross-referenced with HLA haplotypes determined by Seq2HLA, before executing netMHCpan for neoantigen prediction, outputting strong and weak binders. Only strong binders were used for neoantigen analysis.

RNA pre-processing

Total RNA extracts from tumor samples were subjected to quality control on a Bioanalyzer instrument, and only RNA with RNA integrity number (RIN) >7 were used for sequencing. RNA quantification was achieved using absorbance at 260 nm with a NanoDrop spectrophotometer. RNA-seq libraries were prepared from 1 µg of total RNA using the Illumina TruSeq Stranded mRNA Library Preparation Kit (following the provider’s instructions), which allows performing a strand-specific sequencing. In brief, a first step of polyA+ RNA selection using oligodT magnetic beads is done to focus sequencing on polyadenylated transcripts. After fragmentation, cDNA synthesis was performed, and resulting fragments were used for dA-tailing and then ligated to the TruSeq indexed adapters. PCR amplification is finally achieved to create the final cDNA library. After qPCR quantification using the KAPA Library Quantification Kit (Roche), sequencing was carried out using 2 × 100 cycles (paired-end reads 100 bases).

RNA-seq

Sequencing was performed by multiplexing barcoded libraries with the Illumina HiSeq2000 instrument using high-output flow cells to obtain 100 million paired-end reads per sample. Alignments were performed on human reference sequences using TopHat63 version 2.0.621. Reads with mapping quality <20 and reads marked as duplicates by Picard version 1.97 were excluded from further analysis. Gene-level read counts were obtained using HTSeq-count64 and RefSeq hg19/GRCh37. RNA-seq data are provided as raw counts in Supplementary Table 11.

Selection of the genes with the most variant expression, clustering and PCA

We selected the most variant genes, based on the inflection point of the interquartile range (IQR) distribution for gene expression. The gene expression was previously rlog-transformed with DESeq2 (1.22.1)65. This method is more data-driven than a fixed threshold to define the proportion of genes with the highest level of variation. For each gene, we applied the following procedure: (1) we calculated the IQR for all samples; (2) we sorted the IQR values of the genes in ascending order, to generate an ordered distribution; (3) we estimated the major inflection point of the IQR curve as the point on the curve farthest away from a line drawn between the start and end points of the distribution; and (4) we retained genes with an IQR higher than the inflection point. Hierarchical clustering was performed with Pearson correlation and Ward linkage. We next performed PCA to reduce dimensionality using the 3,000 most variable genes.

CIBERSORT

CIBERSORT30 is an analytical tool quantifying the levels of distinct immune cell types within a complex gene expression mixture. We applied the original CIBERSORT gene signature LM22 defining 22 immune cell subtypes to all the samples of the cohort, the number of permutations being set to 100 and the mode being set to ‘absolute’. For each immune subpopulation, (1) we calculated the difference between a given sample and the rest of the cohort; (2) we squared the result; and (3) we summed the difference between this patient and each other sample of the cohort, resulting in a dissimilarity index. We displayed the overall results on a correlogram.

Multiplexed immunofluorescence and immunohistochemistry

To characterize the microenvironment of these samples using an orthogonal experimental approach, we performed immunofluorescence stainings using two antibody panels (panel 1: CD8 CD45RO CD20 CD4 and FoxP3; panel 1: CD4 – CD8 – CD45RO – FOXP3 – CD20 – pan-cytokeratin; Extended Data Fig. 7a–f; panel 2: CD68 – CD163 – CD138 – CKIT – pan-cytokeratin; Extended Data Fig. 7g,h) to assess the concordance between the immune subpopulations inferred by gene expression and the number of cells stained on pathologic slides. We performed multiplexed immunofluorescence staining on the 20 samples of the cohort (14 microbiopsies and six residual tumor samples). Immunostaining was processed in a Bond RX automated (Leica) with Opal 7-Color IHC Kits (Akoya Biosciences, NEL821001KT) according to the manufacturer’s instructions. We used two different panels of antibodies: panel 1: CD4 (clone EP204) – CD8 (clone C8/144B) – CD45RO (clone 2B11 + PD7/26) – FOXP3 (clone 236A/E7) – CD20 (clone L26) – pan-cytokeratin (clone AE1/AE3); panel 2: CD68 (clone KP1) – CD163 (clone 10D16) – CD138 (clone MI15) – CKIT (polyclonal) – pan-cytokeratin (clone AE1/AE3). After applying DAPI for visualization of nuclei, slides were mounted and cover-slipped. Multiplexed slides were scanned using the Vectra 3 automated quantitative pathology imaging system (Vectra 3.0.5, Akoya Biosciences), and images were analyzed using inForm Advanced Image Analysis Software (inForm 2.6.0, Akoya Biosciences). We used the total number of stained cells for an antibody to perform the correlation with immune subsets inferred by the deconvolution, with the following correspondence: CD8+ cells, cytotoxic T cells; CD4+/FOXP3+ cells, T regs; CD45RO+ cells, T cells memory activated; CD20+ cells, B cells naive; CD68+/CD163+ cells, macrophage M2; CD69+ cells, macrophages M0, M1 and M2; CD138+ cells, plasma cells.

TCR sequencing analysis

We applied the MixCR algorithm on RNA-seq data of 20 samples to identify and quantify TCR beta chain CDR3 sequences. MiXCR (version 2.1.5)31 was used with its default parameters to extract and quantify TCR beta chain CDR3 sequences from RNA-seq FASTQ files. From the MiXCR output, we obtained for each sample the total number of distinct TCR beta clones and the number of reads supporting each clone, and we normalized the result by the total number of reads. We used immunarch32 to calculate quantitative descriptors of both the diversity and sharing of the TCR beta chain repertoire. For the estimation of repertoire diversity, we calculated Chao-1, a non-parametric asymptotic estimator of species richness and the D50 diversity index, representing the number of clonotypes occupying the 50% of repertoires. For repertoire similarity, we calculated the total number of shared clones between samples against ‘public’ clonotypes.

Statistical analysis

The study population was described in terms of frequencies for qualitative variables or medians and associated ranges for quantitative variables. Chi-squared or Fisher tests were performed to search for differences between subgroups for each variable (considered significant for P ≤ 0.05). Continuous variables were compared between groups in Wilcoxon–Mann–Whitney tests for groups of fewer than 30 patients and for variables following multimodal distributions. Student t-tests were used in all other cases. Pre-NAC and post-NAC TIL levels were analyzed as continuous variables. In case of categorical variables, the kappa coefficient60 was computed as a measure of concordance between the left and right side of a same patient (varying from −1 as absolute discordance to + 1 as absolute concordance and 0 as absence of concordance); otherwise, in case of numeric or integer variables, the Kendall test was used. Correlations between continuous variables were calculated using Spearman coefficient. Factors predictive of pCR were introduced into a univariate logistic regression model. A multivariate logistic model with a forward stepwise selection procedure was then applied; the covariates included having a likelihood ratio test P ≤ 0.05. The same approach was used for univariate and multivariate linear regression models.

All P values no greater than 0.05 were considered significant. In the case where we tested the hypothesis of potentially different effects of the concordant or discordant status of the tumor pair regarding the subtype of breast cancer on immune infiltration or response to treatment, we included an interaction term in a linear regression model or logistic regression model, respectively. A P value of 0.10 was selected to determine the statistical significance of the interaction term, as it has been suggested due to a low power of the test in the interaction setting. When appliable, all statistical tests were two-sided.

RFS was defined as the period from surgery to death, locoregional recurrence or distant recurrence, whichever came first. Patients who did not have any of these occurrences documented were censored at the last known contact date. The Cox proportional hazards model was used to determine hazard ratios associated with RFS and associated 95% confidence intervals. In the univariate analysis, variables with a P value for the likelihood ratio test of 0.05 or below were chosen for inclusion in the multivariate analysis. The final multivariate model was built using a forward stepwise selection approach with a significance criterion of 5%. In box plots, lower and upper bars represent the first and third quartiles, respectively; the medium bar is the median; and whiskers extend to 1.5 times the IQR. Data were processed and statistical analyses were carried out with R software version 4.2.1 (www.cran.r-project.org).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

留言 (0)

沒有登入
gif