Single-cell exome sequencing reveals polyclonal seeding and TRPS1 mutations in colon cancer metastasis

Human CRC clinical tissues

The paraffin-embedded CRC specimens and the corresponding evaluation of the histomorphology were obtained from the Department of Pathology. Clinical parameters were obtained from the Department of Hepatobiliary Surgery. The characteristics of the patients are detailed in Table 1.

Single-cell exome sequencing and mutation analysis

Tissues (n = 9) from a 76-year-old male stage IV colon patient were sequenced with a mean fold depth of 108.6 and 99.2% coverage (Supplementary Table 1 and Supplementary Fig. 1). Single cells from the tissues were lysed, and whole-genome amplification of the DNA was performed with multiple displacement amplification (MDA). Genomic libraries were constructed from the amplified DNA and the coding regions were enriched using the Agilent SureSelect Human All Exon 50 M platform (Supplementary Table 2-3 and Supplementary Fig. 1). The exome libraries were sequenced on the Hiseq 2000 platform (Illumina, San Diego, CA, USA).

Classical SNV detection algorithms were not supported by the ADO parameter. In addition, there were some SNVs that did not achieve sufficient coverage depth for SNV calling. In this case, a Bayes method was used to take both ADO and sequence data into consideration to estimate the genotype at SNV sites, assuming a diploid genotype at each SNV site. After Bayes estimation, the mutant allele frequency across the single-cell population was compared with the frequency observed in the deep tissue sequencing. All SNVs were supported by at least 10 sequencing reads in at least one tissue and 3 single cells. This high correlation also indicated that SNV calling followed by Bayes estimation exhibited good performance with a low false discovery rate.

Heterogeneity analysis and comparison

To compare the evolutionary relationship and the heterogeneity of the single cells within and between the tumor tissues, the exact mutation status of each single cell was determined. The status of a mutation in a single cell was classified as mutant, wild type, or cannot determine (CND) due to insufficient coverage.

The similarity between every pair of single cells was carefully compared based on the mutation status. The similarity of two single cells was defined as follows. (a) We considered two single cells as the same (R = 1) only when all the mutation statuses were the same or at least one of them could not be determined. (b) If the two cells had different mutation statuses among all sites, the R value was −1. c) If the mutation status was wild type in one single cell and mutant in the other, this pair was deemed as different (R = -1). The similarity index of a tissue was the average R value of all the cell-cell combinations.

Mutational signature analysis

The mutational signatures of 150 single cells were analyzed using the Wellcome Trust Sanger Institute mutational signatures framework with the non-negative matrix factorization (NMF) algorithm.40 Mutational signature analysis was performed using a 4-step process: a, 96-class mutational matrix was constructed, encompassing the 6 mutation types (T > C, T > G, T > A, C > G, C > T and C > A), along with the 5’ context (A, G, C, T), and 3’ context (A, G, C, T), derived from the mutation data of all samples. b, the number of operative processes in 150 single cells was determined by analyzing signature stability and Frobenius reconstruction errors across K = 1 to K = 15 signatures. c, mutational signatures (Sig A and Sig B) across all samples were identified using the NMF algorithm, reflecting the number of active processes established in step b. d, cosine similarity was used for unsupervised hierarchical clustering of the two mutational signatures (Sig A and Sig B) identified in our series, comparing them to the 30 mutational signatures (Sig 1-30) previously characterized in a pan-cancer study.3

Subclonal structure analysis within 10 metastatic colon cancers

Mutations in 20 tissues (colon primary tumor and liver metastases tumor) from 10 colon cancer patients with liver metastasis were analyzed through whole exome sequencing. The method of whole exome sequencing and data analysis were performed as previous described.41 The data quality control and mutation of each sample are shown in Supplementary Table 4-5. Subclonal structure and phylogenetic trees were elucidated with the “pigeon-hole” principle within each patient.42 The branch lengths of the phylogenetic trees were determined from the proportion of all mutations within a cluster.

Single-cell RNA-seq analysis

We obtained the single-cell RNA sequencing (scRNA-seq) data of colorectal liver metastases (CRLM) samples from the GEO database (GSE178318).43 The dataset included the primary CRC and corresponding liver metastases samples of six CRLM patients. We used the Seurat R package (version 4.2.2)44 for standard downstream processing of the sc RNA-seq data of the CRLM patients. The analysis excluded the genes detected in fewer than three cells, and cells containing fewer than 200 detected genes. The mitochondrial content was limited to less than 20%. Then, the data was normalized using the LogNormalize method. Principal component analysis (PCA) was used for unsupervised clustering, and the “JackStraw” function was applied to identify and visualise the selection of principal components. We employed nonlinear dimensionality reduction UMAP for clustering and applied the “FindAllMarkers” function to identify marker genes based on inter-cluster differences. The cell clusters were subsequently annotated according to known cell-specific marker genes. The “limma” R package was used to identify DEGs between clusters, with thresholds set at |log2 fold change (FC)| >1 and P < 0.01. The “clusterProfiler” R package was employed to conduct GO and KEGG enrichment analyses of the DEGs, with a threshold of P < 0.05 indicating significant enrichment. Epithelial cells were identified epithelial cells (EPCs) leveraging the higher expression of EPCAM and utilizing the InferCNV package to detect CNVs in EPCs and to recognize real cancer cells. The CNV score for each cell was determined by calculating the quadratic sum of CNV regions.45 The similarity between the primary lesion of CRC and the corresponding liver metastases was calculated using SCMAP method,46 and the extent of similarity was determined by the “metastatic contribution score (MC score)”. Then, we used the R package “ggalluvial” to construct a Sankey diagram. Based on previous studies, we defined the EMT characteristics and assessed the “EMT score” of each cell using the “AddModuleScore” function. The relevance between “contribution degree” and “EMT score” was examined using linear regression analysis. Histograms were employed to illustrate the distribution of tumor cell subsets migrating from the primary site to the liver in different patients.

To explore the correlation between EMT scores and clinical characteristics at the single-cell level, we collected six fresh tumor samples from liver metastases for single-cell transcriptome sequencing. These samples were obtained from six patients diagnosed with colorectal cancer liver metastases at the Cancer Hospital of the Chinese Academy of Medical Sciences. The analysis workflow for these samples is the same as the one described above for single-cell data analysis.

Cell culture

The cell lines HCT116, SW480, 293T, HT29, SW620, and RKO cell lines were acquired from the Cell Resource Center, Peking Union Medical College (Beijing, China). HCT116, SW480, RKO, HT29, and SW620 were cultured in IMDM (Hyclone, SH30228.01, Logan, UT, USA), which was supplemented with 10% FBS (Gibco/Thermo Fisher Scientific, 10099141C, Waltham, MA, USA) and 1% penicillin-streptomycin (PS) (Gibco, SH30256.01). The cultures were maintained at 37 °C in an atmosphere containing 5% CO2. 293T cells were maintained in DMEM (Hyclone, SH30023.01), which was supplemented with 10% FBS and 1% PS, under the same conditions of temperature and CO2 concentration. Authentication of all cell lines was performed using STR profiling, and they were also tested for mycoplasma contamination.

Plasmids and stable cell line constructions

The full-length human TRPS1 gene (Accession: NM_014112.5) was cloned into a 3 × Flag expression pHBLV vector (GeneChem, Shanghai, China) to generate a TRPS1 overexpression plasmid. A TRPS1 mutant (R544Q) was also constructed in the same vector with the Mut Express MultiS Fast Mutagenesis Kit (Vazyme, C215, Nanjing, China). All plasmids underwent verification through Sanger sequencing (TSINGKE, Beijing, China). Subsequently, the overexpression plasmid (including the empty vector [Con], TRPS1 wild-type vector [WT], and TRPS1 mutant vector [MT]) was co-transfected with packaging vectors psPAX2 and pMD2.G into 293 T cells for lentivirus production. Virus harvests were conducted at 48 h and 72 h post-transfection. Cell lines were infected with the lentiviruses and selected using puromycin (1 µg/mL, Selleck Chemicals, S7417, Houston, TX, USA) for two weeks. The selected cells were examined on a western blot and maintained to perform experiments.

Western blotting

Cells were lysed using an SDS lysis buffer consisting of 50 mM Tris–HCl (pH 6.8), 10% glycerol, and 2% SDS, which was supplemented with a protease inhibitor cocktail (Roche, 11697498001, Basel, Switzerland). Protein quantification was performed using the BCA protein assay kit (Applygen, P1511, Beijing, China). Subsequently, the cell lysates were loaded onto an 8%-12% SDS–PAGE gel, fractionated, and then transferred to a PVDF membrane (Merck Millipore, IPVH00010, Burlington, MA, USA). The membranes were blocked with 5% (w/v) skim milk at room temperature for 1 hour and then incubated with primary antibodies overnight at 4 °C. Following this, the membranes were washed with TBST (3 × 10 min) and incubated with HRP-conjugated secondary antibodies (Zhongshan Golden Bridge Biotechnology, ZB2305, ZB2301, Beijing, China) for 1 hour at room temperature. Protein bands were visualized using an ECL blot detection system (Applygen, P1020). The primary antibodies employed included rabbit anti-TRPS1 (1:1000, NOVUSBIOLOGICALS, NBP2-04082, CO, USA), rabbit anti-EpCAM (1:1000, Proteintech, 21050-1-AP, Rosemont, IL, USA), rabbit anti-SPARC (1:1000, Proteintech, 15274-1-AP), mouse anti-CDH1 (1:1000, Cell Signaling, 14472, Danvers, MA, USA), rabbit anti-Flag (1:1000, Cell Signaling, 14793), rabbit anti-ZEB1 (1:1000, Cell Signaling, 3396), and mouse anti-GAPDH (1:3000, Proteintech, 60004-1-Ig).

In vitro migration and invasion assays

The wound healing assay was conducted by plating cells (1 × 106 HCT116 or SW480 per well) in a 6-well plate, with wounds created in a confluent monolayer using a pipette tip. Assessment was performed at 0 h and 36 h, with representative images captured at each time point. For the migration assay, 1 × 105 HCT116 or SW480 cells in serum-free IMDM were seeded into the upper chamber of uncoated transwell inserts (Corning, 3422, Corning, NY, USA) featuring 8 μm pores. The lower chamber was filled with 600 μL of complete IMDM containing 10% FBS as a chemo-attractant. After 24 hours, cells were fixed with methanol and stained with 0.1% crystal violet at room temperature. Similarly, for the invasion assay, 1 × 105 HCT116 or SW480 cells in serum-free media were placed in the top chamber of Matrigel-coated inserts (Corning, 354480), with medium containing 20% FBS added to the lower chamber. Cells were fixed and stained after 48 hours as described above. For both assays, the number of cells in five random fields was counted using an inverted microscope.

In vivo metastasis assays

For the assessment of lung metastasis via tail vein injection, female BALB/c nude mice (aged 4–5 weeks) and NSG mice (aged 5–6 weeks) were obtained from SPF Biotechnology Company (Beijing). To analyze metastasis, 5 × 106 viable HCT116 or SW480 cells were resuspended in 0.2 mL of PBS and administered into BALB/c or NSG mice through the lateral tail vein. The survival of the mice was monitored daily. Following a period of 42 days for HCT116 or 90 days for SW480, mice were euthanized. The lungs, livers, and other metastatic sites were extracted, photographed, and fixed in 4% paraformaldehyde (PFA). The tissues were then embedded in paraffin, sectioned, and stained with hematoxylin and eosin (H&E). The foci of visible lung and liver metastases were measured and counted under a microscope. Additionally, HCT116 cells tagged with luciferase were constructed. A total of 1 × 106 cells resuspended in 0.2 mL of PBS were injected into the lateral tail vein of nude mice. After 21 days, mice were administered D-luciferin (150 mg/kg) intraperitoneally and imaged 15 minutes post-injection using an IVIS 100 Imaging System, with an acquisition time of 1 minute.

A mouse model of liver metastases was established through intrasplenic injection of HCT116 and SW480 cells. Briefly, the spleens of the mice were exposed via an incision on the left upper abdomen. For each mouse, 2 × 106 HCT116 cells or 5 × 106 SW480 cells were injected into the spleen in 50 μL of PBS, allowing the cells to be delivered to the liver through the portal circulation. Ten minutes post-injection, the spleen was removed and the abdominal cavity was closed. Thirty-two days after implantation, all mice were euthanized, and the number of tumor foci on the liver surface was counted. Liver tissues were fixed and stained with H&E to detect liver metastases.

Nuclear/Cytosol fractionation

Fractionation of cellular nuclear and cytoplasmic proteins was conducted utilizing the NE-PER Nuclear and Cytoplasmic Extraction Reagents (Thermo Fisher Scientific, 78833), adhering to the manufacturer’s guidelines. In summary, cells were collected and resuspended in CER I, followed by incubation on ice for 10 minutes. Subsequently, CER II was introduced to the lysates, which were then vortexed and maintained on ice for an additional 10 minutes. The cell lysates underwent centrifugation for 5 minutes at 16,000 g, and the supernatant was retained as the cytoplasmic extract. The insoluble pellets were resuspended in ice-cold NER and vortexed at intervals of 10 minutes over a total duration of 40 minutes. The lysates were centrifuged at 4 °C for 10 minutes at 16,000 g, and the supernatant was collected as the nuclear fraction. The cytoplasmic and nuclear extracts were subjected to western blotting analysis. GAPDH and H3 served as loading controls.

SiRNA transfection

HCT116 or SW480 cells were plated in 6-well plates at a density of 3 × 105 cells per well. Following an incubation period of 16 to 18 hours, the pooled siRNAs targeting TRPS1 and ZEB1 were transfected into the cells using Lipofectamine RNAiMAX (Invitrogen, 13778150, CA, USA), with a final siRNA concentration of 50 nM. Simultaneously, transfection with corresponding nonsense negative controls (NC) was performed. After 48 hours, RNA and protein levels were assessed to quantify the downregulation of gene expression. All siRNAs were synthesized by Shanghai GenePharma Co., Ltd. The siRNA sequences utilized are provided in Supplementary Table 7.

RNA isolation and quantitative real -time PCR (qRT-PCR)

RNA extraction from the cells was carried out using the RNA simple Total RNA Kit (TIANGEN, 4992858), adhering to the manufacturer’s protocol. Subsequently, a total of 1 μg of RNA was reverse transcribed into cDNA using the SMART Scribe Reverse Transcriptase (TaKaRa, 639536, Tokyo, Japan). To detect differential expression of ZEB1, SPARC, CDH1, and EpCAM in the cDNAs, qPCR was employed with TB Green® Fast qPCR Mix (TaKaRa, RR430). The amplification process was conducted on a 7500 Fast Real-time PCR System (Applied Biosystems, Waltham, MA, USA). GAPDH served as an internal control for normalizing RNA input. The relative expression levels of the target genes, defined as the fold change, were calculated in comparison to the gene expression of control groups using the delta delta Ct (ΔΔCt) formula. The primer sequences utilized are provided in Supplementary Table 7.

RNA sequencing (RNA-seq) and data analysis

RNA extraction from HCT116 and SW480 cells was carried out using TRIzol reagent (Invitrogen, 15596018). The preparation of RNA-seq libraries was performed using the TruSeq Ribozero Kit (Illumina, RS-122-2103). These libraries were then sequenced on the Illumina NovaSeq platform. The raw RNA-seq reads were trimmed using Skewer (version 0.2.2) to eliminate adapter sequences and subsequently aligned to the human hg19 reference genome using STAR (version 2.4.2a). RSEM (version 1.2.29) was employed to quantify expression abundance. Differentially expressed genes (DEGs) were identified based on a |fold change| of >=2 and an adjusted P-value of <0.05, using edgeR. Functional enrichment analysis of genes differentially enriched in experimental groups was conducted using the Metascape database (accessible at https://metascape.org/gp/index.html). The data have been deposited in the Gene Expression Omnibus database under accession number GSE206729.

CUT & Tag assay

The CUT & Tag assay was carried out utilizing the NovoNGS® CUT & Tag 3.0 High-Sensitivity Kit (NOVOPROTEIN, N259-YH01-01A, Suzhou, China), following the manufacturer’s protocols. Briefly, cells were incubated with activated NovoNGS ConA Beads. Subsequently, the bead-bound cells were permeabilized and incubated initially with Flag primary antibody (1:50, Cell Signaling, 14793), followed by incubation with anti-rabbit antibody (10 µg/mL; Abcam, ab6702, Cambridge, MA, USA). The diluted pAG-Tn5 adapter complex was then added, followed by the tagmentation reaction. The extracted DNA fragments were used for library preparation. The libraries were sequenced by Novogene on the Illumina NovaSeq 6000 platform. The raw DNA sequencing reads were preprocessed to filter out sequencing adapters, short-fragment reads, and other low-quality reads. Bowtie (version 0.12.8) was employed to map the cleaned reads to the human hg19 reference genome. Peak detection was conducted using MACS with a minimum P-value cut-off of 0.00001 (version 1.4.2, available at https://pypi.python.org/pypi/MACS/1.4.2). The peak recognition, overlap, subtraction, merge, and feature annotation analysis of enriched regions were performed using the Hypergeometric Optimization of Motif EnRichment 2 (HOMER2) suite (version 3.0, accessible at http://homer.ucsd.edu/homer/). Motif-density histograms were generated using HOMER for target regions and promoters, which were defined as −2 kb to +1 kb relative to the transcription start site (TSS). The data were deposited in the Gene Expression Omnibus database under accession number GSE206564.

Chromatin immunoprecipitation (ChIP) and ChIP-qPCR

Chromatin immunoprecipitation (ChIP) assays were conducted utilizing the Pierce Agarose ChIP Kit (Pierce/Thermo Fisher Scientific, 26156), adhering to the provided instructions. Briefly, HCT116 and SW480 cells were subjected to crosslinking with 1% formaldehyde (Sigma, F8775, St. Louis, MO, USA) for a duration of 10 minutes, following which the reaction was terminated with 0.125 M glycine. The cell pellets were lysed using micrococcal nuclease on ice for 10 minutes. Subsequently, rabbit anti-Flag antibody (5 μg per sample) or isotype IgG control (5 μg per sample) was added to 5 μg of sonicated chromatin and incubated at 4 °C overnight. The complexes were then captured on Protein A/G magnetic beads. The immunoprecipitates were washed and subjected to de-crosslinking. The eluted chromatin was purified and utilized as a template in qPCR, which was performed on a 7500 Fast Real-time PCR System (Applied Biosystems, USA). The primers employed for ChIP assays are enumerated in Supplementary Table 7.

Dual-luciferase reporter assay

The promoter sequence of ZEB1, specifically the fragment spanning −2079 to +200, along with its three truncated derivatives (fragment 1: −2079 to ion status was wild type in one single cell 1079; fragment 2: −1080 to +200; and fragment 3: −200 to +200), were cloned into the pGL3-Basic plasmid. Subsequently, 293 T cells were seeded at a density of 2.5 × 105 cells per well into 12-well plates and cultured until they reached approximately 70% confluence. The cells were then co-transfected with 200 ng/well of the pGL3-Basic plasmid harboring the specific promoter sequence, 500 ng/well of the PLBMV-TRPS1 WT/MT plasmid, and 10 ng/well of the pRL-TK plasmid. After 48 hours, the cells were assayed for fluorescence intensity utilizing the Dual-Luciferase Reporter Assay System (Promega, E1910, Madison, WI). Prior to luminescence detection, the cells were washed with PBS and lysed using 250 μL of PLB per well, incubated for 15 minutes at room temperature. A 20 μL aliquot of the lysate was transferred to a 96-well plate (Corning, 3917, Corning, NY) for luminescence measurement. The outcomes are presented as the relative luciferase activity, calculated as the ratio of firefly luciferase intensity to renilla luciferase intensity.

Immunohistochemistry (IHC) analysis

Formalin-fixed, paraffin-embedded tissue sections underwent deparaffinization and rehydration through a graded series of ethanol solutions, followed by immersion in PBS buffer. Antigen retrieval was achieved by heating the slides for 10 minutes at 97 °C in sodium citrate buffer, subsequently followed by incubation in 0.3% H2O2 for 10 minutes to inhibit endogenous peroxidase activity. To block nonspecific binding sites, the sections were incubated with 10% normal goat serum for 30 minutes. Overnight incubation at 4 °C was performed with the primary antibody (rabbit anti-ZEB1, diluted 1:100, Cell Signaling, 3396). Detection was carried out using the LSAB+ System-HRP (Dako/Agilent, Santa Clara, CA, USA). The staining intensity was categorized as 0 (negative), 1 (weak), 2 (medium), or 3 (strong). H scores were determined by multiplying the staining intensity by the percentage of positive cells, yielding a range of overall scores from 0 to 300. The histological specimens were evaluated by a trained pathologist with expertise in CRC disease.

Statistical analysis

Data are presented as the mean ± SEM. For unpaired group comparisons, the two-tailed unpaired Student’s t-test was utilized. Multiple comparisons were analyzed using one-way ANOVA. Contingency tables were assessed with the Fisher’s exact test. The comparison of Kaplan-Meier overall survival curves was conducted using the log-rank (Mantel-Cox) test. Statistical significance was set at P-values < 0.05. Significant differences are denoted as follows: * for P < 0.05, ** for P < 0.01, *** for P < 0.001, and **** for P < 0.0001. All statistical analyses were performed using GraphPad Prism 8.0 (San Diego, CA, USA) or SPSS, version 19.0 software (released in 2010 by IBM Corp., Armonk, NY, USA).

留言 (0)

沒有登入
gif