Circular RNA landscape in extracellular vesicles from human biofluids

EV collection and RNA-sequencing

Extracellular vesicles (EVs) were isolated from a comprehensive collection of human biofluids and supernatants from five cell lines. Detailed information on the biofluids is provided in Additional file 1: Table S1. In summary, the dataset consists of four types of human biofluids: plasma samples (n = 1031), urine samples (n = 28), bile samples (n = 19), and cerebrospinal fluid samples (n = 4). Specifically, for plasma samples, the dataset includes 707 samples from 11 disease types: breast cancer (n = 141), colon cancer (n = 74), hepatocellular carcinoma (n = 92), pancreatic adenocarcinoma (n = 150), ovarian cancer (n = 51), gastric cancer (n = 9), kidney cancer (n = 15), malignant lymphoma (n = 42), non-small cell lung cancer (n = 55), small cell lung cancer (n = 67), and coronary heart disease (n = 11). Additionally, 172 samples were collected from patients with benign conditions (including benign breast (n = 19), ovarian (n = 28), pancreatic (n = 49), colorectal (n = 43), and hepatic (n = 33) conditions), and 152 samples from healthy individuals. The samples were collected following standardized protocols and were approved by the Institutional Review Board of Fudan University Shanghai Cancer Center.

The isolation process commenced with the initial centrifugation of the biofluids at 3000 g for 10 min at room temperature (25 °C), followed by a further purification step at 13,000 g for 15 min at 4 °C. Subsequently, the supernatants were treated with a binding buffer and transferred onto an exoRNeasy affinity membrane centrifuge column (Qiagen, Hilden, Germany). After the EVs were bound to the membrane, they were cleaved and purified with QIAzol (Qiagen, Hilden, Germany). Finally, all EV-RNAs were separated and purified with the miRNeasy (RNeasy MinElute spin column).

After the extraction of EV RNAs, any adulterated DNAs were removed by deoxyribonuclease I (DNase I, NEB, Ipswich, Massachusetts, USA). Then, the strand-specific RNA-seq library was prepared using SMARTer Stranded Total RNA-Seq Kit—Pico Input Mammalian (Clontech, USA). The purified double-stranded DNA was amplified by 13–16 cycles of PCR. Finally, the quality of the library was assessed using Qubit (Thermo Fisher Scientific, USA) and Qsep100 (BiOptic Inc.), and sequencing was performed on the Illumina sequencing platform (San Diego, California, USA) using 150-bp paired-end sequencing.

Detailed methodologies for the isolation of EV RNAs, characterization of the isolated EVs, and preparation of the EV RNA-seq library were provided in the supplementary materials (Additional file 2) and can also be referenced in our previously published work [14].

Identification of EV-circRNAs in EVs

After the raw data underwent quantity control, EV-circRNAs were identified using two software, CIRI2 (V2.0.6) [15] and ASJA [16]. For the CIRI2, the sequences were aligned to hg38 using BWA-MEM (V0.7.17) with default settings, and then EV-circRNAs were identified using CIRI2.pl with the GENCODE v29 annotation file. Information regarding the expression count, host gene, circRNA types, and genome origin was obtained. ASJA also performed with default setting. Briefly, the chimeric genome was aligned using STAR (V2.5.3a) [17] in a two-pass mapping approach to GRCh38. The alignment was performed with the parameters "–chimOutType WithinBAM" and "–chimSegmentMin 20" to detect chimeric alignments and generate a Chimeric.out.junction file. To identify back-splicing junctions, we utilized the ASJA-All.pl script. Additionally, ASJA-All.pl allowed us to retrieve exon length and number information.

To ensure the reliability of the identified EV-circRNAs, candidate EV-circRNAs needed to be detected by both CIRI2 and ASJA identification methods. Furthermore, for inclusion in our analysis, EV-circRNAs had to be present in at least three plasma samples or detected at least once in other human biofluids, including urine, CSF, or bile. The expression levels of these EV-circRNAs were normalized to counts per million (CPM) [5], using the counts obtained from CIRI2. To ensure consistent levels of EV-circRNA expression across samples within each cohort, we introduced a filtering step to remove samples exhibiting a Pearson correlation coefficient lower than 0.4 among their EV-circRNA expression profiles in each cohort. In total, we identified 136,327 EV-circRNAs from 1082 EV samples.

Back-splicing junction ratio of EV-circRNAs

The back-splicing junction ratio of EV-circRNA can be calculated using the formula (2 * #junction_reads / (2 * #junction_reads + #non_junction_reads)), as described in the CIRI2 manual. To compare with cellular circRNA in tissue, we downloaded the raw data from five studies [18,19,20,21] (GSE144269, GSE77661, GSE142279, GSE172032, GSE221240), encompassing a total of 196 samples. The tissue cohort includes normal tissue, cancer tissue, and matched adjacent tissue (Additional file 1: Table S3). Only the circRNAs detected by both CIRI2 and ASJA methods, and expressed in at least two samples were considered reliable. Their back-splicing junctions were also calculated using the same method as described above. To compare the circRNA back-splicing ratio between different groups, we calculated the median value of the circRNA back-splicing ratio in each group as a representative measure (related to Figs. 2, 3, 4, and 5).

Gene ontology and KEGG enrichment

The function of host gene of EV-circRNA was enriched by DAVID [22], including biological process (BP), cellular component (CC), and Kyoto Encyclopedia of Genes and Genomes (KEGG).

Differential expression EV-circRNAs

A strategy akin to that employed in prior studies was utilized to estimate the specific expression of EV-circRNAs. Specifically, EV-circRNA with frequency greater than 30% in only one cancer type were defined as cancer-specific, while those with frequency greater than 30% in more than two cancer types were defined as pan-cancer specific. Note, that the frequency of these EV-circRNAs in plasma healthy samples was less than 10%.

To identify reliable differential expression of EV-circRNA between cancer patients and healthy samples, we performed a Wilcoxon rank-sum test and fold change analysis using EV-circRNA with an average expression greater than 0.1. EV-circRNA with a fold change of 1.5-fold (FC ≥ 1.5 or FC ≤ 0.67), a P-value less than 0.01, and a frequency greater than 5% were defined as differential EV-circRNA.

For the specific analysis of biomarkers related to NSCLC immunotherapy response, clinical response data were obtained from our recent study [23]. Out of the 55 samples in the NSCLC cohort, 36 samples had available clinical information on immunotherapy response. These 36 samples were divided into two groups: 16 classified as responders and 20 as non-responders. This subset of samples, referred to as the "NSCLC immunotherapy group," was used for the biomarker analysis. EV-circRNAs with an average expression greater than 0.1 in the immunotherapy group were selected for downstream analysis. Differentially expressed EV-circRNAs were identified using the Wilcoxon test and fold change (median). For PFS and OS survival analysis, the optimal cutoff was determined using the surv_cutpoint function, and the Kaplan–Meier analysis was performed using the survfit and surv functions with the log-rank test. We identified 20 candidate EV-circRNAs as biomarkers for NSCLC immunotherapy based on the following criteria: First, EV-circRNAs were filtered by differential expression analysis with a p-value less than 0.05, higher expression in the response group, and an AUC (calculated by pROC package) greater than 0.65, which is higher than the AUC of the clinical biomarker NLR (0.622). Among these filtered EV-circRNAs, we then selected those meeting at least one of the following conditions: (1) OS analysis with a p-value less than 0.05 and expression in more than 10 responders (10/16); (2) PFS analysis with a p-value less than 0.05 and expression in more than 10 responders (10/16); (3) Exclusive expression in the response group, with expression in more than 6 samples. For training the Lasso model, we included all 36 samples from the NSCLC immunotherapy group. We first obtained all pairwise combinations of the 20 candidate EV-circRNAs identified above. For each pair of EV-circRNAs, we used the cv.glmnet function (nfolds = 10, family = "binomial") to determine the best lambda for the Lasso model. The glmnet function, with “binomial” and best lambda parameters, was then used to establish the model. AUC was used to assess predictive performance. For internal validation of the model, we randomly selected 60% of the samples (22/36) from the NSCLC immunotherapy group 50 times and calculated the AUC for each iteration. Additionally, we collected another cohort for external validation, consisting of 17 responders and 19 non-responders, to validate the model’s performance using AUC. qPCR was performed on 57 samples (32 non-responders and 25 responders) to validate these findings. Single EV-circRNA and Lasso models were also established based on the qPCR results of EV-circRNAs.

EV-circRNAs as prognosis biomarkers

For the prognosis biomarker analysis, we retained EV-circRNA with an average expression greater than 0.1 and frequency greater than 10%. Patients were divided into two groups based on median expression, and the survfit and surv functions were used to compare survival analysis. The coxph function was used to obtain the risk and protective factors. Potential biomarkers were defined as EV-circRNAs that expressed and had a p-value of less than 0.05 for at least ten patients in the high-expression group.

Motif of EV-circRNAs

The top 1% (1363) most highly frequency expressed EV-circRNAs were analyzed by retrieving their genome sequences using samtools faidx. We incorporated 119 motifs from previous studies by Ray et al. [24] and Fabbiano et al. [25], which were then converted to the strand input format of MEME using iupac2meme. The resulting sequences were analyzed using online MEME SEA (online version 5.5.3) [26] with default settings.

RBP regulated EV-circRNAs into EVs

To further validate the binding between RBPs and EV-circRNA, we obtained the bed file for RBP CLIP-seq data from ENCODE (K562) [27] and GEO (GSE133620 [28] and GSE39682 [29]). Additionally, we downloaded a list of proteins expressed in EVs from the study by Kugeratski et al. [30]. We utilized RBP CLIP-seq data only when its corresponding protein was expressed in EVs. The binding site was defined as the overlap between EV-circRNA and RBP CLIP-seq data using bedtools intersect (V2.30.0) [31].

To compare with tissue-derived circRNA, we randomly selected 1363 cellular circRNAs in a tissue cohort with similar exon length and number distribution to EV-circRNA, and this step was repeated 100 times. Additionally, we conducted an analysis comparing the top 1% of EV-circRNAs with a subset of highly expressed tissue circRNAs. This subset consists of the top 1000 circRNAs, ranked by their expression levels (CPM) and occurrence frequency. The number of binding sites between EV-circRNAs and tissue cellular circRNAs was calculated using the paired Wilcoxon test.

To annotate the position of RBP binding on EV-circRNA, we normalized the binding site positions from 0 to 1, based on the genome position of the corresponding EV-circRNA.

Brain-derived circRNA enriched in plasma EVs

To investigate the origin of EV-circRNAs in healthy plasma EVs, we downloaded the human normal tissue gene expression matrix from GTEx [32] (GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct.gz) to obtain tissue-specific genes. We calculated the row mean of each tissue and filtered out those with values less than 0.1 for downstream analysis. Using the same method as our previous study [14], we calculated the Ti-score and retained only the tissue-specific genes with a Ti-score greater than 1 and a relative frequency greater than twice that of the second-most frequent tissue. We removed whole blood and testis, and merged brain, nerve, and pituitary as one category “Brain”. The EV-circRNA was classified as originating from the tissue if its host gene is a tissue-specific gene. To obtain more accurate tissue-specific EV-circRNAs, we queried each EV-circRNA in CircAtlas3.0 [33] to confirm its tissue specificity. The final results were determined by selecting circRNAs that maintained consistent tissue specificity according to CircAtlas3.0. To calculate the Brain-derived circRNAs score, we first obtained the average expression of brain-derived circRNAs excluding zero values. Subsequently, we transformed this average expression to the log2 scale.

To compare plasma EV-circRNAs with other cellular components in blood, we obtained cell-type markers from the TISCH database [34]. The EV-circRNA is classified as originating from the blood cell type if its host gene is a cell type-specific gene.

Cell culture and treatments

HuH7 cells were obtained from the Japanese Collection of Research Bioresources (JCRB, Tokyo, Japan). The HepG2, Hela, SK-Hep-1, and HEK-293 T cell lines were purchased from American Type Culture Collection (ATCC, Manassas, Virginia, USA) and cultured in DMEM supplemented with 10% FBS and Penicillin–Streptomycin at 37℃ with 5% CO2.

RT-qPCR

RNA was extracted using Trizol reagent (Life Technologies, Carlsbad, CA, USA), and the cDNA was synthesized using Evo M-MLV RT Master Mix (Accurate Biology, Hunan, China). SYBR Premix Ex Taq (Accurate Biology, Hunan, China) was used to measure the expression level of gene expression. The 2^ − ΔΔCt method was used to calculate each RNA’s relative expression level. The primers are provided in Additional file 1: Table S11.

RNA immunoprecipitation (RIP) assay

The cells were lysed with RIP lysis buffer (150 mM NaCl, 25 mM Tris–HCl (pH 7.4), 0.1% NP-40, 1 mM EDTA, 5% glycerol) at 4 °C for 1 h. After incubating the anti-FLAG antibody (Sigma-Aldrich, St. Louis, MO, USA) and immunoglobulin G antibody (Sigma-Aldrich, St. Louis, MO, USA) with protein G beads (Life Technologies, Carlsbad, CA, USA) for 1 h at room temperature. The supernatants were collected after centrifugation at 12,000 × g for 20 min. The antibody-bound beads were added to the supernatants and incubated overnight at 4 °C after being washed three times with NT2 buffer (150 mM NaCl, 1 mM MgCl2, 50 mM Tris–HCl (pH 7.4), 0.05% NP-40). The complexes were then lysed with Trizol reagent after the beads had been washed six times with NT2 buffer. Reverse transcribed FLAG-interacting RNAs were isolated. RNA binding to YBX1 was detected by qPCR. The normalization process uses a 1% input sample as a reference. Specifically, the calculated value is derived from 2^ − ΔCT, with ΔCT values determined using the formula: ΔCT = Ct(Sample) − Ct(1% input).

Plasmid construction

The coding sequence of YBX1 was amplified and then cloned into the laboratory modification plasmid pCDH-3*Flag. In order to confirm YBX1 regulation circRNA into EVs, we used CRISPR/Cas9 technology to knockdown YBX1. sgRNA of YBX1 (Additional file 1: Table S11) was cloned into lentiGuide-Puro vector (Addgene #52,963).

Lentiviral production and transduction

HEK-293 T cells were transfected with pCDH-3*Flag-YBX1, lentiGuide-puro-gRNA, and the packaging and envelope plasmids psPAX2 and pMD2.G. Virus particles were collected 48 h after transfection. SK-Hep-1 cells were infected with lentiGuide-puro-gRNA lentivirus and polybrene (Sigma-Aldrich, St.Louis, MO, USA).

Statistical analyses

All experimental data were expressed as mean ± standard deviation (SD). P-values were obtained by using two-tailed Student’s t-test as indicated in corresponding figure legends. A p-value less than 0.05 was considered statistically significant and was noted by asterisks (*, p < 0.05; **, p < 0.01; ***, p < 0.001, ****, p < 0.0001).

留言 (0)

沒有登入
gif