Isoform-resolved mRNA profiling of ribosome load defines interplay of HIF and mTOR dysregulation in kidney cancer

Overview of the cell line and experimental conditions

VHL-defective kidney cancer cell lines, RCC4 and 786-O, were from Cell Services at the Francis Crick Institute and were maintained in DMEM (high glucose, GlutaMAX Supplement, HEPES, Thermo Fisher Scientific, no. 32430100) with 1 mM sodium pyruvate (Thermo Fisher Scientific, 12539059) and 10% FBS at 37 ˚C in 5% CO2. Cells were confirmed to be of the correct identity by STR profiling and to be free from mycoplasma contamination.

Hypoxic incubation was performed using an InvivO2 workstation (Baker Ruskinn) in 1% O2 and 5% CO2 for 24 hours. To inhibit mTOR, cells were treated with 250 nM of Torin 1 (Cell Signaling Technology, no. 14379) for 2 hours.

An overview of the experimental interventions and analyses is provided in Supplementary Data 1. Biological replicates are individual experiments using different clones derived from the same cell line. All other replicates are defined as technical replicates.

Genetic modification of cellsLentiviral transduction

Reintroduction of VHL or the empty vector control was performed using lentiviral transduction. The expression vector (pRRL-hPGK promoter-VHL-IRES-BSD) containing the coding sequence and the last 6 nucleotides of the 5′ UTR of VHL (RefSeq ID, NM_000551) and the empty control vector (pRRL-SFFV promoter-MCS-IRES-BSD) were constructed from pRRL-SFFV promoter-MCS-IRES-GFP (provided by K. R. Kranc, Queen Mary University of London). Lentiviruses were prepared from these plasmids, and RCC4 or 786-O cells were transduced with the viruses. Three or four clones each of VHL- or empty-vector-transduced RCC4 or 786-O cells were isolated using flow cytometry. These cells were maintained in DMEM (high glucose, GlutaMAX Supplement, HEPES) with 1 mM sodium pyruvate, 10% FBS and 5 µg/mL blasticidin (Thermo Fisher Scientific, A1113903) at 37 ˚C in 5% CO2. Empty-vector-transduced RCC4 or 786-O cells are referred as RCC4 or 786-O, and VHL-transduced RCC4 or 786-O cells are referred as RCC4 VHL or 786-O VHL.

CRISPR–Cas9-mediated HIF1B or EIF4E2 inactivation of 786-O cells

CRISPR–Cas9-mediated inactivation of HIF1B or EIF4E2 was performed using the electroporation of gRNA–Cas9 ribonucleoprotein (RNP). CRISPR RNAs (crRNAs) with the following sequences were synthesized by Integrated DNA Technologies (Alt-R CRISPR–Cas9 crRNA):

HIF1B, rGrArCrArUrCrArGrArUrGrUrArCrCrArUrCrArC

EIF4E2 (g1), rGrUrUrUrGrArArArGrArUrGrArUrGrArCrArGrU

EIF4E2 (g2), rGrGrUrCrCrCrCrArGrGrArCrGrUrArCrCrArUrG.

The HIF1B and EIF4E2 (g2) gRNA sequences were designed by Integrated DNA Technologies (Hs.Cas9.ARNT.1.AD and Hs.Cas9.EIF4E2.1.AH, respectively), whereas the EIF4E2 (g1) gRNA sequence was designed using an online tool developed by F. Zhang’s lab (https://crispr.mit.edu).

To prepare the gRNA, 100 µM of crRNA and 100 µM of tracrRNA (Integrated DNA Technologies, no. 14899756) were annealed in duplex buffer (Integrated DNA Technologies, 11-01-03-01) by incubation at 95 ˚C for 5 minutes, then at room temperature for 30 minutes. Cas9–gRNA RNP was formed by mixing 10 µM of the annealed tracrRNA–crRNA and 16.5 µg of TrueCut Cas9 protein (Thermo Fisher Scientific, A36498) in PBS, followed by incubation at room temperature for 30 minutes. The RNP was transfected into 786-O cells or 786-O VHL cells (pools of cells were used for HIF1B inactivation, whereas clone 1 of each sub-line was used for EIF4E2 inactivation). Transfections were performed using a 4D-Nucleofector System (Lonza) with a SF Cell Line 4D-Nucleofector X Kit L (Lonza, V4XC-2024) and the EW-113 transfection program. The transfected cells were cultured in DMEM (high glucose, GlutaMAX Supplement, HEPES) with 1 mM sodium pyruvate and 10% FBS at 37 ˚C in 5% CO2 for at least 3 days, and single clones were isolated using flow cytometry. Inactivation of the target genes was confirmed by Sanger sequencing of the gRNA target region using TIDE analysis78 and by immunoblotting.

ImmunoblottingProtein extraction

Cells were grown on 6-cm dishes. Cells were washed with 3 mL of ice-cold PBS and lysed by adding 150 µL of urea SDS lysis buffer (10 mM Tris-HCl pH 7.5, 6.7 M urea, 5 mM DTT, 10% glycerol, 1% SDS, 1× HALT protease and phosphatase inhibitor (Thermo Fisher Scientific, 78447), and 1/150 (v/v) of benzonase (Sigma-Aldrich, E1014-25KU)). The lysate was incubated at room temperature for 30 minutes before mixing with loading buffer (LI-COR Biosciences, 928-40004).

Immunoblotting

Proteins were separated by SDS–PAGE using a Mini-PROTEAN TGX Gel (4–15% or 8–16%, Bio-Rad Laboratories, nos. 4561086 and 4561106, respectively) and transferred to Immobilon-FL PVDF Membrane (Sigma-Aldrich, IPFL00010). Membranes were stained using a Revert 700 Total Protein Stain (LI-COR Biosciences, 926-11011). The data acquisition was performed using an Odyssey CLx system (LI-COR Biosciences), and the data were analyzed using Image Studio software (LI-COR Biosciences). The membrane was blocked by incubating in TBS (20 mM Tris-HCl pH 7.6 and 137 mM NaCl) with 5% fat-free milk for 1 hour with shaking at room temperature. The membrane was incubated in Odyssey Blocking Buffer (PBS, LI-COR Biosciences, no. 927-40000) with 0.2% Tween-20 and 1/1,000 (vol/vol) primary antibody (for anti-HIF2A antibody) or TBST (TBS with 0.1% Tween-20) with 5% fat-free milk and 1/1,000 (vol/vol) primary antibody (for other primary antibodies), with shaking overnight at 4 ˚C. The membrane was washed three times with TBST and incubated in Odyssey Blocking Buffer (PBS for anti-HIF2A antibody and TBS (LI-COR Biosciences, no. 927-50000) for other primary antibodies) with 0.2 % Tween-20, 0.01% SDS, and 1/15,000 (vol/vol) secondary antibody, with shaking for 1 hour at room temperature. The membrane was washed three times with TBST and once with TBS.

Antibodies

The following antibodies were used for the western blotting analysis. Primary antibodies (used at 1/1,000 dilution): anti-VHL (Santa Cruz Biotechnology, sc-135657), anti-HIF1A (BD Biosciences, 610959), anti-HIF2A (Cell Signaling Technology, 7096), anti-HIF1B (Cell Signaling Technology, 5537), anti-EIF4E2 (Proteintech, 12227-1-AP), anti-NDRG1 (Cell Signaling Technology, 9485), anti-SLC2A1 (Cell Signaling Technology, 12939), anti-EGFR (Santa Cruz Biotechnology, sc-373746), and anti-CA9 (Cell Signaling Technology, 5649). Secondary antibodies (used at 1/15,000 dilution): anti-mouse-IgG DyLight 800 (Cell Signaling Technology, 5257) anti-mouse-IgG IRDye 680RD (LI-COR Biosciences, 925-68072), and anti-Rabbit-IgG IRDye 800CW (LI-COR Biosciences, 926-32213).

Total RNA extraction

Cells were grown on 6-well plates or 6-cm dishes. Total RNA used for the analysis of unfractionated mRNAs was extracted from the cells using the RNeasy Plus Mini Kit (QIAGEN, 74136), according to the manufacturer’s instructions, except for technical replicate 2 of the samples from RCC4 cells (see Supplementary Data 1). For these samples, cells were lysed with 350 µL of Buffer RLT Plus (QIAGEN, 1053393), and total RNA was extracted from the lysate using an RNA clean and concentrator-25 kit (Zymo Research, R1018) with the following modification: 752.5 µL of preconditioned RNAbinding buffer (367.5 µL of RNA binding buffer (supplied with an RNA Clean & Concentrator-25 kit), 367.5 µL of absolute ethanol, and 17.5 µL of 20% SDS) was added to the cell lysate. After mixing, the material was loaded onto the column of an RNA Clean & Concentrator-25 kit, and the manufacturer’s instructions were followed for the remaining steps.

HP5 protocol (polysome profiling)Sucrose gradient preparation

Sucrose gradients were prepared in polyallomer tubes (Beckman Coulter, 326819) by layering 2.25 mL 50% sucrose in 1× polysome gradient buffer (10 mM HEPES pH 7.5, 110 mM potassium acetate, 20 mM magnesium acetate, 100 mM DTT, 40 U/mL RNasin plus (Promega, N2615), 20 U/mL SuperaseIn RNase Inhibitor (Thermo Fisher Scientific, AM2694) and 100 µg/mL cycloheximide (Sigma-Aldrich, C4859-1ML)) under 2.15 mL of 17% sucrose in 1× polysome gradient buffer. Each tube was sealed with parafilm, placed on its side, and kept in the horizontal position at 4 ˚C overnight to form the gradient79.

Cell lysis and fractionation

Cells were grown on 15-cm dishes. To arrest mRNA translation, the cells (~80% confluency) were treated with 100 µg/mL cycloheximide for 3 minutes. The medium was removed, and the dish was placed on ice during the following steps. Cells were washed with 10 mL of ice-cold PBS with 100 µg/mL cycloheximide. Cells were then lysed by adding 800 µL of polysome lysis buffer (10 mM HEPES pH 7.5, 110 mM potassium acetate, 20 mM magnesium acetate, 100 mM potassium chloride, 10 mM magnesium chloride, 1% Triton X-100, 2 mM DTT, 40 U/mL RNase plus, 20 U/mL SuperaseIn RNase Inhibitor, 1× HALT Protease inhibitor (Thermo Fisher Scientific, 78438), and 100 µg/mL cycloheximide).

The cytoplasmic lysate was homogenized by passage through a 25-G syringe needle 5 times. To remove debris, the lysate was centrifuged at 1,200g for 10 minutes at 4 ˚C, and the supernatant was collected. This material was centrifuged again at 1,500g for 10 minutes at 4 ˚C, and the supernatant was collected. The protein and RNA concentrations were measured using 660-nm Protein Assay Reagent (Thermo Fisher Scientific, 22660) with Ionic Detergent Compatibility Reagent (Thermo Fisher Scientific, 22663) and Qubit RNA BR Assay Kit (Thermo Fisher Scientific, Q10210), respectively.

Lysate was then normalized according to the protein concentration, and 500 µL of the normalized lysate was overlaid on the sucrose gradient, as prepared above. The gradient was ultracentrifuged at 287,980g (average; 55,000 r.p.m.) for 55 minutes at 4 ˚C, with max acceleration and slow deceleration using an Optima LE-80K Ultracentrifuge and SW55Ti rotor (Beckman Coulter). The sucrose gradient was fractionated according to the number of associated ribosomes (from 1 to 8 ribosomes; material lower in the gradient was pooled with the 8 ribosome fraction), as determined by the profile of the absorbance at 254 nm using a Density Gradient Fractionation System (Brandel, Model BR-188). The fractionated samples were then snap-frozen on dry ice.

External control RNA addition and RNA extraction

Equal amounts of external control RNA were added to the polysome-fractionated samples after thawing the snap-frozen samples on ice. Commercially available external control RNA, including the ERCC RNA Spike-In Mix-1 kit (Thermo Fisher Scientific, 4456740) that we used, does not have a canonical mRNA cap. This can influence the template-switching reaction efficiency. Thus, the amount of external control RNA added to the polysome-fractionated samples was determined by preliminary experiments, so as to result in a library containing around 0.1% of reads from the external control RNA.

RNA was extracted from 150 µL of the fractionated samples using an RNA Clean & Concentrator-5 kit (Zymo Research, R1016), using the same procedure to extract RNA from unfractionated cell lysate (described above), and was eluted into 10 µL of water. For a subset of samples, as indicated in Supplementary Data 1, half of the input volume was used, and RNA was eluted into 8 µL of water. The integrity of the purified RNA was confirmed using a Bioanalyzer (Agilent); the median value of RNA integrity number (RIN) for the samples from RCC4 VHL cells was 9.5, indicating that the RNA was largely intact.

5′ end-seq protocolPrimer sequences

The sequences of oligonucleotide primers used for 5′ end-seq are summarized in Supplementary Data 4. All the primers were synthesized and HPLC-purified by Integrated DNA Technologies.

The 5′ end-seq method involves the following steps.

Step 1: reverse transcription and template switching

cDNAs with adapter sequences at both the 5′ and 3′ ends were generated from full-length mRNAs using a combined reverse-transcription and template-switching reaction. The RT primers, containing an oligonucleotide (dT) sequence, were annealed to the poly A tail of mRNAs by incubating 4 µL reaction mix (1.9 µL extracted RNA, 1 µL 10 mM dNTP, 0.1 µL 20 U/µL SUPERaseIn RNase-Inhibitor, and 1 µL 10 µM RT primer) at 72 ˚C for 3 minutes and holding it at 25 ˚C. Then, 1 µL of 10 µM template-switching oligonucleotide (TSO), and 5 µL of RT reaction mix (2 µL of 5× RT buffer (supplied with Maxima H Minus Reverse Transcriptase), 2 µM of 5 M betaine, 0.25 µL of water, 0.25 µL of SUPERaseIn RNase-Inhibitor, and 0.5 µL of 200 U/µL Maxima H Minus Reverse Transcriptase (Thermo Fisher Scientific, EP0753)) were added to the reaction. The TSO contained an adapter sequence (the constant region annealed by the PCR primers), an index sequence (to identify the sample source of the cDNA), unique molecular identifiers (UMI), and three riboguanosines at the 3′ end (to facilitate template-switching reaction80). To perform the reverse-transcription and template-switching reactions, the mixture was kept at 25 ˚C for 45 minutes, 42 ˚C for 25 minutes, 47 ˚C for 10 minutes, 50 ˚C for 10 minutes, and 85 °C for 5 minutes, and held at 4 ˚C.

Step 2: enzymatic degradation of primers and RNA

Preliminary experiments indicated that the degradation of unused primers using a single-stranded DNA specific 3′–5′ exonuclease, Exonuclease I, reduced primer dimer artifacts in the subsequent PCR amplification, whereas the degradation of RNA by RNase H improved the yield of cDNA library. Furthermore, it is important to degrade TSO because, if unused TSO contaminates the cDNA library after multiplexing, it confounds the library indexing. Because we suspected that the TSO is resistant to Exonuclease I owing to the riboguanosines at the 3′ end, the TSO contains three deoxyuridines (after the adapter sequence, index sequence, and UMI) so that the TSO can be degraded by the combination of an enzyme-cleaving DNA at a deoxyuridine and Exonuclease I. Importantly, this degrades all the TSO except the adapter, which forms a high-melting-temperature duplex with the cDNA, protecting the cDNA from Exonuclease I. All these reactions were performed in a single step by adding 2 µL enzyme mix (1 µL of Thermolabile USER II (New England Biolabs, M5508L), 0.5 µL of Exonuclease I (New England Biolabs, M0293S), and 0.5 µL of RNase H (New England Biolabs, M0297S)) to the sample, which was incubated at 4 ˚C for 1 second, 37 ˚C for 1 hour, 80 ˚C for 20 minutes, and held at 4 ˚C.

Step 3: limited-cycle PCR amplification

Fifteen microliters of PCR reaction mix (1.25 µL of 10 µM of each PCR primer 1 forward/reverse, 12.5 µL of KAPA HiFi HotStart Uracil+ ReadyMix (Roche, KK2802), and 1.25 µL of water) was added to the RT reaction, and limited-cycle PCR amplification was performed by keeping the mixture at 98 ˚C for 3 minutes; 98 °C for 20 seconds, 67 °C for 15 seconds, and 72 °C for 6 minutes (4 cycles); and 72 ˚C for 5 minutes; and the mixture was then held at 4 °C.

Step 4: multiplexing and optimized PCR cycle amplification

After adding 37.5 µL ProNex beads (Promega, NG2002) to each sample, up to 16 samples were multiplexed. The cDNA library was purified according to the manufacturer’s instructions, eluted into 42 µL 10 mM Tris-HCl, pH 7.4, then re-purified using ProNex beads (1.5:1 vol/vol ratio of beads to sample) and eluted into 45 µL of 10 mM Tris-HCl, pH 7.4. The library was reamplified by preparing PCR reaction mix (20 µL of cDNA library, 25 µL of KAPA HiFi HotStart Uracil+ ReadyMix, and 2.5 µL of 10 µM each PCR primer 1 forward/reverse), and the mixture was kept at 98 ˚C for 3 minutes; 98 °C for 20 seconds, 67 °C for 15 seconds, and 72 °C for 6 minutes (4–6 cycles (see beelow)); and 72 ˚C for 5 minutes; and the mixture was then held at 4 °C. The number of PCR cycles for each amplification was determined by a pilot experiment using quantitative PCR (qPCR) to ensure that the amplification was at the early linear phase. The amplified cDNA library was purified using ProNex beads, as above, and eluted into 26 µL of 10 mM Tris-HCl, pH 7.4. The purified cDNA library was quantified using a Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Q32851).

Step 5: tagmentation

Tagmentation with Tn5 transposase was performed on 90-ng aliquots of the cDNA library using an Illumina DNA Prep kit (Illumina, 20018704), according to the manufacturer’s instructions.

Step 6: PCR amplification of mRNA 5′-end library

The ‘tagmented’ library was attached to the beads of an Illumina DNA Prep kit. Limited-cycle PCR amplification was performed by adding 50 µL of the following reaction mix (2.5 µL of 10 µM each of the PCR primer 2 forward/reverse, 20 µL of Enhanced PCR Mix (supplied with an Illumina DNA Prep kit), and 27.5 µL of water) and using a program of 68 ˚C for 3 minutes; 98 ˚C for 3 minutes; 98 ˚C for 45 seconds, 62 ˚C for 30 seconds, and 68 ˚C for 2 minutes (3 cycles); and 68 ˚C for 1 minute; and it was then held at 10 ˚C. The PCR primers used here anneal to the TSO and an adapter added by tagmentation, and thus specifically amplify DNA fragments containing 5′ ends of mRNAs. The amplified mRNA 5′-end library was purified using ProNex beads, as above, and eluted into 25 µL of 10 mM Tris-HCl (pH 7.4).

The mRNA 5′-end library was reamplified by preparing a PCR reaction mix (10 µL of the mRNA 5′-end library, 25 µL KAPA HiFi HotStart ReadyMix, 2.5 µL of 10 µM each of PCR primer 3 forward/reverse (containing i5 and i7 index sequences), and 12.5 µL water), and the mixture was kept at 98 ˚C for 3 minutes; 98 ˚C for 20 seconds, 62 ˚C for 15 seconds, and 72 ˚C for 30 seconds (5 cycles (cycle number determined by a pilot experiment to define the early linear phase, as described above)); and 72 ˚C for 5 minutes; and it was then held at 4 °C. The mRNA 5′-end library was again purified using ProNex beads (1.4:1 vol/vol ratio of beads to sample) according to the manufacturer’s instructions, and eluted into 20 µL of 10 mM Tris-HCl, pH 7.4. The purified mRNA 5′-end libraries were multiplexed again and then sequenced on HiSeq 4000 (Illumina) using paired-end (2×100 cycles) and dual-index mode.

RT–qPCR

RNAs extracted from polysome-fractionated samples were converted into cDNAs using the same protocol as the 5′ end-seq protocol described above, except that the anchored oligonucleotide dT primer (Integrated DNA Technologies, 51-01-15-08) was used, and the TSO was omitted from the reaction. The cDNA was purified using an RNA Clean and concentrator-5 kit (Zymo Research, R1016) according to the manufacturer’s instructions. qPCR was performed using TaqMan Fast Advanced Master Mix (Thermo Fisher Scientific, 4444557) according to the manufacturer’s instructions with the mRNA isoform-specific primers and Taqman probes summarized in Supplementary Data 4. All the primers were synthesized by Integrated DNA Technologies. Quantification of mRNAs in each fraction was normalized to the quantification of ERCC-0002 RNA in the same fraction.

Overview of computational data analyses

Data analyses were performed using R (4.0.0)44 and the following packages (data.table (1.12.8)45, dplyr (1.0.0)46, stringr (1.4.0)47, magrittr (1.5)48, and ggplot2 (3.3.1)49) were used throughout.

The following reference data were used to annotate the data: human genome: hg38, obtained via BSgenome.Hsapiens.UCSC.hg38 (1.4.3)50; human transcripts: RefSeq51 (GRCh38.p13) and GENCODE52 (GENCODE version 34: gencode.v34.annotation.gtf) (these two reference data were combined and redundant GENCODE entries that have a corresponding RefSeq annotation were removed).

Prior to the high-throughput DNA-sequencing data analysis, sequencing data from the technical replicates were concatenated. Data are presented as the mean value of the biological replicates.

TSS boundaries and their associated mRNA isoforms were identified by 5′ end-seq of total (unfractionated) mRNAs. The TSSs assigned to a particular gene were those mapping within 50 base pairs of that gene locus, as specified by RefSeq and GENCODE. The abundance of the mRNA isoform associated with each TSS is the number of reads starting from that TSS. The gene-level mRNA abundance is the sum of these isoforms for the relevant gene.

Statistics

The correlation of two variables was analyzed with the cor.test function of R to calculate statistics on the basis of Pearson’s product moment correlation coefficient or Spearman’s rank correlation coefficient. The difference between two distributions was tested using the two-sided Mann–Whitney U test (for two independent samples) or the two-sided Wilcoxon signed-rank test (for paired samples). To analyze the effect size of the Wilcoxon signed-rank test, the matched-pairs rank biserial correlation coefficient53 was calculated using the wilcoxonPairedRC function of the rcompanion package (2.3.26)54.

Kernel density estimation was performed using the geom_density function of the ggplot2 package with the parameter, bw = SJ.

Sequencing read alignmentRead pre-processing

The sequence at positions 1–22 of read 1 is derived from the TSO and was processed before mapping. First, the UMI located at positions 10–16 was extracted using UMI-tools (1.0.1)55. Note that the UMI was not used in the analyses because we found that the diversity of UMI was not sufficient to uniquely mark non-duplicated reads. Next, the library was demultiplexed using an index sequence located at positions 1–8, after which the constant regions of the TSO located at position 9 and positions 17–22 were removed using Cutadapt (2.10)56 with the parameters, -e 0.2–discard-untrimmed.

Read alignment

The pre-processed reads were first mapped to cytoplasmic rRNAs (NR_023363.1 and NR_046235.1), mitochondrial ribosomal rRNAs (ENSG00000211459 and ENSG00000210082), and ERCC external control RNAs (https://www-s.nist.gov/srmors/certificates/documents/SRM2374_putative_T7_products_NoPolyA_v1.fasta) using Bowtie2 software (2.4.1)57 with the following parameters: -N 1–un-conc-gz. The unmapped reads were then aligned to the human genome (hg38: sequence obtained via BSgenome.Hsapiens.UCSC.hg3850) with the annotation described above using STAR software (2.7.4a) in two-pass mode58 with the following parameters: –outFilterType BySJout–outFilterMultimapNmax 1.

Definition of TSS peaks and boundaries

To define TSS clusters, we considered two widely used peak callers, paraclu (9)59 and decomposition-based peak identification (dpi, beta3)60 software. Our preliminary analysis indicated that paraclu software was more accurate in determining total peak area, whereas dpi was more accurate in resolving peaks within multimodal clusters. To obtain the most accurate resolution and quantification of TSS clusters, we therefore combined the strength of these programs and included information from existing large-scale database using the following four-step procedure.

Step 1: definition of cluster areas

Using the standard workflow of paraclu software on pooled data from normoxic cells, RCC4, RCC4 VHL, 786-O, and 786-O VHL, cluster areas of 5′ termini were identified.

Step 2: definition of TSS clusters within cluster areas

The cluster areas defined above were further resolved by combining above data with FANTOM5 data and using dpi software, as was originally used for FANTOM5, to resolve bona fide subclusters within the data. Internal sub-cluster boundaries were defined as the midpoint between adjacent dpi-identified peaks.

Step 3: quality controls and filters

Artifactual clusters of 5′ termini, potentially generated by internal TSO priming, were filtered on the basis of a low (<15%) proportion of reads bearing non-genomic G between the TSO and mRNA, as the template-switching reaction commonly introduces such bases at the mRNA cap but not following internal priming4. Since mitochondrial mRNAs are not capped, these transcripts were filtered if they did not overlap an annotated site.

A further filter was applied to remove TSS subclusters of low-abundance mRNA isoforms whose biological significance is unclear; low abundance was defined as ≦10% of the most abundant mRNA isoform for the relevant gene in any of the analyses.

Step 4: final assignment of TSS boundaries

To provide the most accurate identification of the TSS peaks and their boundaries, the resolved and filtered peaks from step 3 were mapped back onto the input cluster areas as defined in step 1, and boundaries were set at the midpoint between filtered peaks.

Assignment of transcripts to TSS

To identify mRNA features that might affect translational efficiency, we used base-specific information on 5′ termini and assembled paired-end reads starting from each TSS (StringTie software, 2.1.2 (ref. 61)) to define the primary structure of the 5′ portion of the transcript. We then used homology with this assembly to assign a full-length transcript from RefSeq and GENCODE. The CDS of the assigned transcript was then used for the analysis. In small number of cases, where this TSS was downstream of the start codon, we took the most upstream in-frame AUG sequence to redefine the CDS. The most abundant primary structure from each TSS and its CDS were then used for calculation of the association of mRNA features with mean ribosome load (see below). Details of this process are given in the computational pipeline.

mRNA feature evaluation

Features within the mRNA (for example TOP motif, structure near cap) were evaluated at base-specific resolution using the following formula:

$$\left( } \\ \end} \right) = \mathop \limits_^n }\nolimits_^n }}}$$

where i is a base position within the TSS, n is the linear sequence extent of the TSS, mRNA feature valuei is the value of mRNA feature for the isoform transcribed from position i, and mRNA abundancei is the mRNA abundance of the isoform transcribed from position i. The values were rounded to the nearest integer; a rounded value of 0 being taken as the absence of the feature.

All non-overlapping uORFs, starting from an AUG, were identified using the ORFik package (1.8.1)62. Kozak consensus score was calculated by the kozakSequenceScore function of the ORFik package. Using the mode including G-quadruplex formation, the minimum free energy (MFE) of predicted RNA structures was estimated using RNALfold (ViennaRNA package, 2.3.3)63. The MFE of RNA structures near the cap was that of the first 75 nucleotides. The MFE of the region distal to the cap was that of entire 5′ UTR minus the first 75 nucleotides. The position of a TOP motif was defined as the position of the 5′ most pyrimidine base, and its length was defined as that of the uninterrupted pyrimidine tract from that base.

The effect of HIF-dependent alternate TSS usage on CDS was defined by alteration in the genomic position of the start codon (Extended Data Fig. 8 and Supplementary Data 2). Expressed isoforms of a gene were defined as those with an abundance greater than 10% of that of the most highly expressed isoform of the same gene in either RCC4 VHL or RCC4 cells.

Functional annotation of genesFunctions

Functional classes of genes were defined by KEGG orthology64, as indicated by the following KEGG IDs. Transcription factors: 03000, Transcription machinery: 03021, Messenger RNA biogenesis: 03019, Spliceosome: 03041, Cytoplasmic and mitochondrial ribosome: 03011 (genes with the name starting with MRP and DAP3 were categorized as mitochondrial ribosomes), Translation factors: 03012, Chaperones and folding catalysts: 03110, Membrane trafficking: 04131, Ubiquitin system: 04121, and Proteasome: 03051; Glycolysis: hsa00010, Pentose phosphate pathway: hsa00030, TCA cycle: hsa00020, Fatty acid biosynthesis: hsa00061 and hsa00062, Fatty acid degradation: hsa00071, Oxphos: hsa00190, Nucleotide metabolism: hsa00230 and hsa00240, and Amino acid metabolism: hsa00250, hsa00330, hsa00220, hsa00270, hsa00260, hsa00340, hsa00310, hsa00360, hsa00400, hsa00380, hsa00350, hsa00290, and hsa00280.

Genes associated with angiogenesis or vascular process were defined by referencing to gene ontology (GO)65 database: GO:0003018, vascular process in circulatory system; GO:0001525, angiogenesis.

Analysis of existing literatures describing mTOR targets

In the analyses comparing HP5 data with previously published studies reporting the effects of mTOR inhibition14,21,32,33, we followed the definition of mTOR hypersensitive genes in the original reports; for Hsieh et al. and Larsson et al., the genes showing changes in translation with PP242 were used; for Morita et al., genes described in Fig. 1b of the paper33 were used. Since the data of Thoreen et al. were obtained using mouse cells, we mapped mouse genes to human genes using the gorth function of the gprofiler2 package (0.1.9)66. Since Hsieh et al. did not supply values for changes in translational efficiency for all genes, we took this data from Xiao et al.67, who calculated the relevant values using the data from the original report.

To define known activities of mTOR via any mode of regulation except translational regulation (as indicated in Fig. 2c, first row), we considered review articles by Saxton et al.13 and Morita et al.68. Known systematic translational downregulation by mTOR inhibition (as indicated in Fig. 2c, second row) was defined from previous genome-wide studies listed above14,21,32. A class of targets was defined as systematically regulated if ≥10% of genes in the class were identified as mTOR hypersensitive or resistant in any of these previous studies21,32 or highlighted in the original report.

Analyses of differential mRNA expression upon VHL loss

The identification of differentially expressed genes and the calculation of log2(fold change in mRNA abundance) upon VHL loss were performed using the DESeq2 package (1.28.0)69. Genes with an FDR < 0.1 and either log2(fold change) > log2(1.5) or < –log2(1.5) were defined as upregulated or downregulated, respectively.

HIF-target genes (as considered in Extended Data Fig. 10f) were defined as those upregulated upon VHL loss in RCC4 cells. For this analysis, genes with very low expression in both 786-O and 786-O VHL cells, as identified by the DESeq2 package, were excluded from the analysis.

Analysis of alternative TSS usage upon VHL loss

Genes manifesting alternative TSS usage upon VHL loss were identified using the approach described by Love et al.70. Briefly, TSSs for mRNA isoforms with very low abundance were first filtered out using the dmFilter function of the DRIMSeq package (1.16.0)71 with the parameters min_samps_feature_expr = 2, min_feature_expr = 5, min_samps_feature_prop = 2, min_feature_prop = 0.05, min_samps_gene_expr = 2, min_gene_expr = 20. The usage of a specific TSS relative to all TSSs was then calculated by DRIMSeq with the parameter add_uniform = TRUE.

The significance of changes in TSS usage upon VHL loss for a particular gene was analysed by the DEXSeq package72. The FDR was calculated using the stageR package (1.10.0)73, with a target overall FDR < 0.1. For genes with significant changes in VHL-dependent TSS usage, a VHL-dependent alternative TSS was selected as that showing the largest fold change upon VHL loss (FDR < 0.1), and a base TSS was selected as that showing the highest expression in the presence of VHL. In these calculations, the DESeq2 and apeglm (1.10.0) package74 were used to incorporate data variance to provide a conservative estimate of fold change and standard error.

To provide the highest stringency definition, genes manifesting VHL-dependent alternative TSS usage were further filtered by the proportional change > 5%, the absolute fold change > 1.5, and the significance of the difference in fold change between the alternate TSS and the base TSS (assessed by non-overlapping 95% confidence intervals).

For the comparative analysis of the VHL-dependent alternate TSS usage in various conditions (Extended Data Fig. 7), genes with very low expression that did not meet a criterion of 20 read counts in more than 1 sample were excluded.

Calculation of mean ribosome load

Mean ribosome load was calculated using the following formula:

$$\scriptstyle\nolimits_^8 } \\ \end} \right) \times \left( } \\ \end} \right)} \right\}} }}\nolimits_^8 } \\ \end} \right)} }}}$$

The mRNA abundance values for each polysome fraction were normalized by the read count of the external control using the estimateSizeFactors fraction of the DESeq2 package. Very-low-abundance mRNAs that did not meet a criterion of six read counts in more than six samples were excluded.

Statistical analysis of differences in polysome distributionVHL-dependent alternative TSS mRNA isoforms

To define VHL-dependent alternative mRNA isoforms with a different translational efficiency with reference to all other isoforms from the same gene, the significance of changes in their polysome profile was determined by considering the ratio of mRNA abundances as a function of polysome fraction using the DEXSeq package (1.34.0)72. The false-discovery rate (FDR) was calculated using the stageR package73, with the target overall FDR < 0.1.

Differentially translated mRNA isoforms from the same gene

In analysis of two most differentially translated mRNA isoforms transcribed from the same gene (for Fig. 1f), each of these isoforms was censored for statistically significant differences from all other isoforms of the same gene using the same analysis as above.

Changes in response to mTOR inhibition

To identify genes that were hypersensitive or resistant to mTOR inhibition, genes manifesting a significant change in polysome distribution upon mTOR inhibition, compared to the population average, were first identified using the DESeq2 package72 with the internal library size normalization and the likelihood ratio test. The genes with a significant change (FDR < 0.1) were classified as hypersensitive or resistant to mTOR inhibition if the log2 fold change of the mean ribosome load was lower or higher than the median of all expressed genes.

Simulation of changes in translational efficiency with omitting a parameter

Log2(fold change) in mean ribosome load of a gene upon VHL loss can be expressed by the following formula:

$$log2\left( \nolimits_^n \times }}}\,mRNA\,abundance_)} }}\nolimits_^n \times }}}\,mRNA\,abundance_)} }}} \right)$$

In this formula, i is mRNA isoform i (out of n mRNA isoforms), MRLno VHLorMRLVHL,i is the mean ribosome load of isoform i in RCC4 or RCC4 VHL cells, and % mRNA abundanceno VHLor% mRNA abundanceVHL,i is the percentage abundance of isoform i relative to that of all isoforms in RCC4 or RCC4 VHL cells.

To assess the contribution of alternative TSS usage to changes in mean ribosome load of a gene, we tested a simulation that omitted the VHL-dependent changes in translational efficiency within each mRNA isoform using the following formula:

$$log2\left( \nolimits_^n }}_}},}}} \times \% \,mRNA\,abundance_)} }}\nolimits_^n }}_}},}}} \times \% mRNA\,abundance_)} }}} \right)$$

In this formula, MRLaverage, i is the combined average of MRLno VHL, i and MRLVHL, i as defined above. When values for either of MRLno VHL, i and MRLVHL, i are missing, these values are excluded from the calculation of the average.

To assess the contribution of VHL-dependent changes in translational efficiency within each mRNA isoform to changes in mean ribosome load of a gene, we tested a simulation which omitted the VHL-dependent changes in TSS usage using the following formula:

$$\log 2\left( \nolimits_^n \times \% \,}}\,}}_}}})} }}\nolimits_^n \times \% \,}}\,}}_}}})} }}} \right)$$

In this formula, % mRNA abundanceaverage, i is the combined average of % mRNA abundanceno VHL, i and % mRNA abundanceVHL, i defined above. When values for either of MRLno VHL, i and MRLVHL, i are missing, these genes were excluded from the analysis.

Generalized additive model to predict mean ribosome load

A generalized additive model was used to predict mean ribosome load of mRNAs from the preselected mRNA features. To test the model, a cross-validation approach was deployed to predict the MRL of the top 50% expressed genes on 4 randomly selected chromosomes, which were excluded from the training data used to derive the model. To provide an accurate estimate of the model’s performance, this process was repeated ten times, and the median value of the coefficient of determination (R2) was calculated.

For model construction, the gam function of the mgcv package (1.8-31)75 of R was used, deploying thin-plate regression splines with an additional shrinkage term (with the parameter, bs = ‘ts’) and restricted maximum likelihood for the selection of smoothness (with the parameter, method = ‘REML’). The analysis was restricted to mRNAs with a 5′ UTR length longer than 0 nt and a CDS length longer than 100 nt; 5′ UTR and CDS length were log10-transformed, and the MFE values of RNA structures were normalized by the segment length (nt).

Principal component analysis

Library-size normalization and a variance-stabilizing transformation were applied to the mRNA abundance data using the vst function of the DESeq2 package69 with the parameter, blind = TRUE. Principal component analysis of the transformed data was performed for genes showing the most variance (top 25%) using the plotPCA function of the DESeq2 package.

留言 (0)

沒有登入
gif