RNA stability controlled by m6A methylation contributes to X-to-autosome dosage compensation in mammals

Cell culture

All cell culture was performed in a humidified incubator at 37 °C and 5% CO2. All cell lines were routinely monitored for mycoplasma contamination.

Parental male and female mESCs32,44 were provided by D. Dominissini (Tel Aviv University, Israel) and E. Heard (EMBL Heidelberg, Germany). mESC lines were further authenticated by RNA-seq. Standard tissue culture was performed in 2i/LIF medium. Briefly, 235 ml of each DMEM/F12 and neurobasal (Gibco, 21331020, 21103049) was mixed with 7.5 ml BSA solution (7.5%, Thermo Fisher Scientific, 11500496), 5 ml penicillin–streptomycin (P/S, Thermo Fisher Scientific, 10378016), 2 mM l-glutamine (Thermo Fisher Scientific, 25030024), 100 µM β-mercaptoethanol (Gibco, 21985023), 5 ml mM nonessential amino acids (Gibco, 11140050), 2.5 ml N-2 supplement (Gibco, 17502048), 5 ml B-27 supplement (Gibco, 17504044), 3 µM CHIR99021 (Sigma, SML1046), 1 µM PD 0325901 (Biomol, 13034-1), 10 ng ml–1 LIF (IMB Protein Production Core Facility). Cell culture dishes were coated using 0.1% gelatine (Sigma, ES-006-B). The medium was exchanged every day, and cells were passaged every second day. Single colonies of female mESCs were picked under the microscope using a pipette tip and cultured under standard conditions in 96-well plates until confluency was reached.

HEK293T (ATCC, CRL-3216) and C643 (CLS, RRID: CVCL_5969) cells were cultured in DMEM (Thermo Fisher Scientific, 21969035) supplemented with 10% fetal bovine serum (FBS, Pan Biotech, P40-47500), 1% P/S (Thermo Fisher Scientific, 10378016), and 1% l-glutamine. RPE1 (ATCC, CRL-4000) cells were cultured in DMEM/F12 (Thermo Fisher Scientific, 21331020) supplemented with 10% FBS (Pan Biotech, P40-47500), 1% P/S (Thermo Fisher Scientific, 10378016), 1% l-glutamine, and 0.04% hygromycin B (Thermo Fisher Scientific, 10453982).

Human primary dermal fibroblasts were provided by S. Schweiger (University Medicine Mainz, Germany). Cells were grown in IMDM medium (Thermo Fisher Scientific, 12440053) supplemented with 15% FBS and 1% P/S.

Primary human dermal fibroblasts derivation

Primary human dermal fibroblasts were isolated from skin punch biopsies obtained at the Children’s Hospital of the University Medical Center in Mainz, Germany, as previously described with small adjustments62. Briefly, 4-mm skin biopsies were processed in small pieces and transferred into a 6-well plate coated with 0.1% gelatine. DMEM (Thermo Fisher Scientific, 21969035) supplemented with 20% FBS (Pan Biotech, P40-47500) and 1% P/S (Thermo Fisher Scientific, 10378016) was used for culturing the skin biopsies, and medium was exchanged every other day. After 3–4 weeks, when the 6-well plate was full of dermal fibroblasts that had migrated out of the skin biopsies, cells were transferred to T75 flasks and cultured in standard conditions. Human dermal fibroblasts were further expanded or frozen in liquid nitrogen for long-term storage. Ethical approval by the local ethical committee was obtained (no. 4485), and consent for research use was obtained in an anonymized way.

Mettl3 inhibitor treatment

For acute m6A depletion in mESCs, the Mettl3 inhibitor STM2457 (STORM Therapeutics) was used. Cells were treated with medium supplemented with 20 µM STM2457 in DMSO 0.2% (vol/vol) or with DMSO 0.2% (vol/vol) alone as control. m6A depletion was monitored by liquid chromatography with tandem mass spectrometry (LC–MS/MS). After 3–24 h of treatment, cells were washed twice with ice-cold 1× PBS and collected on ice for further analysis

RNA isolation and poly(A) selection

Cells were washed twice with ice-cold 1× PBS and collected on ice. For total RNA isolation, the RNeasy Plus Mini Kit (Qiagen, 74136) was used, following the manufacturer’s instructions. For poly(A) selection, Oligo d(T)25 Magnetic Beads (Thermo Fisher Scientific, 61002) were used, following the manufacturer’s instructions.

qPCR

For quantification of mRNA levels, 500 ng total RNA was reverse transcribed into complementary DNA (cDNA) using the RevertAid Reverse Transcriptase (Thermo Fisher Scientific, 10161310) using oligo(dT)18 primer (Thermo Fisher Scientific, SO131), following the manufacturer’s instructions. In accordance with the manufacturer’s instructions, qPCR reactions were performed in technical triplicates using the Luminaris HiGreen qPCR Master Mix, low ROX (Thermo Fisher Scientific, K0971), with forward and reverse primer (0.3 µM each) and 2 µl of 1:10 diluted cDNA as template. All qPCR reactions were run on a ViiA 7 Real-Time PCR System (Applied Biosystems). All qPCR primers are listed in Supplementary Table 4.

LC–MS/MS

LC/MS-MS experiments were performed as described in ref. 33. For all samples, quantification involved biological duplicates and averaged values of m6A normalized to A, and the respective s.d. values are shown.

SLAM-seqCell viability for optimization

To determine the 10% maximal inhibitory concentration in a determined time window (IC10,ti), the Cell Viability Titration Module from LeXogen (059.24) was used, following the manufacturer’s recommended protocol. In brief, 5,000 cells were plated in a 96-well plate 1 d prior to the experiment. Cells were incubated for 24 h in media supplemented with varying s4U concentrations. For optimal incorporation, the s4U-supplemented media were exchanged every 3 h. Cell viability was assessed using the CellTiter-Glo Luminescent Cell Viability Assay Kit from Promega (G7570), following the manufacturer’s recommended protocol. The luminescence was measured using Tecan Infinite M200 Pro plate reader. The cell doubling time of male mESCs in the presence of 100 µM s4U was 13.3 h, as determined by cell counting.

SLAM-seq experiment

mRNA half-lives were determined by SLAM-seq using the Catabolic Kinetics LeXogen Kit (062.24). In brief, mESCs were seeded 1 d before the experiment in a 24-well plate to reach full confluency, according to the doubling time, at the time of sample collection. The metabolic labeling was performed by addition of 100 µM s4U to the mESC medium for 24 h. The medium was exchanged every 3 h. After the metabolic labeling, cells were washed twice with 1× PBS, and fresh medium was supplemented with a 100× excess of uridine. At time points increasing at a 1.5× rate, medium was removed and cells were directly lysed in TRIzol (Thermo Fisher Scientific,15596026) reagent in reducing conditions. Total RNA was resuspended in the elution buffer in the Lexogen catabolic kit. The iodoacetamide treatment was performed using 5 µg of RNA. The library preparation for sequencing was performed using the QuantSeq 3′ mRNA-Seq Library Prep Kit for Illumina (FWD) from Lexogen, following the recommended protocol.

For stable m6A depletion, STM2457 or DMSO was supplemented 6 h prior to the uridine chase. The media for the uridine chase were supplemented with STM2457 and DMSO for continuous m6A depletion.

SLAM-seq library preparation

Library preparation for next-generation sequencing was performed with QuantSeq 3′ mRNA-Seq Library Prep Kit FWD (Lexogen, 015), following the manufacturer’s standard protocol (015UG009V0252). Prepared libraries were profiled on a 2100 Bioanalyzer (Agilent Technologies) and quantified using the Qubit dsDNA HS Assay Kit, in a Qubit 2.0 Fluorometer (Life Technologies). All samples were pooled together in an equimolar ratio and sequenced on an Illumina NextSeq 500 sequencing device using three High Output flow cells as 84-nt single-end reads.

Data processing

Published SLAM-seq data were taken from ref. 28. 3′ UTR annotations were taken from ref. 28 and filtered to match the GENCODE annotation63 release M23. Non-overlapping annotations were discarded.

Raw data were quality checked using FastQC (v0.11.8) (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Sequencing data were processed using SLAM-DUNK (v0.4.3)64 with the following parameters: mapping was performed allowing multiple mapping to up to 100 genomic positions for a given read (-n 100). Reads were filtered using SLAM-DUNK -filter with default parameters. For annotation of single-nucleotide polymorphisms (SNPs), all unlabeled samples were merged and SNPs were called using SLAM-DUNK snp with default parameters and -f 0.2. Transition rates were calculated using SLAM-DUNK count with default parameters, providing the SNP annotation of unlabeled samples (-v). If more than one 3′ UTR per gene remained, they were collapsed using SLAM-DUNK collapse64. Only genes on canonical chromosomes 1–19 and X were considered.

Principal component analysis

Principal component analysis (PCA) of SLAM-seq data was performed by estimating size factors on the basis of read counts using the R/Bioconductor package DESeq2 (ref. 65) (v1.26.0) in an R environment (v3.6.0). PCA was based on the number of T-to-C reads per gene for 500 genes with the highest variance, corrected by the estimated size factors.

Incorporation rate

s4U incorporation rates were calculated by dividing the number of T-to-C conversions on Ts for each 3′ UTR by the overall T coverage.

Half-life calculation

To calculate mRNA half-lives, T-to-C background conversion rates (no s4U labeling) were subtracted from T-to-C conversion rates of s4U-labeled data. Only 3′ UTRs with reads covering over 100 Ts (T-coverage > 100) were kept (Extended Data Fig. 2d). For each time point, T-to-C conversion rates were normalized to the time point after 24 h of s4U labelling (that is, the onset of the uridine chase), which corresponds to the highest amount of s4U incorporation in the RNA (24 h s4U labelling, T0) and fitted using an exponential decay model for a first-order reaction using the lm.package (as described in ref. 28, adapted from ref. 66). Half-lives of >18 h (1.5 times of the last time point) and <0.67 h, as well as fitted values with a residual s.e. of >0.3, were filtered out (Extended Data Fig. 2e). Only transcripts with a valid half-life calculation in both conditions were kept for further analysis. For statistical analysis of half-life fold changes, see Supplementary Methods.

RNA-seq library preparation and data processingRNA-seq library preparation

RNA-seq library preparation was performed with Illumina’s Stranded mRNA Prep Ligation Kit following the Stranded mRNA Prep Ligation Reference Guide (June 2020) (document no. 1000000124518 v00). Libraries were profiled on a 2100 Bioanalyzer (Agilent Technologies) and quantified using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Q32851) in a Qubit 2.0 Fluorometer (Life Technologies), following the manufacturer’s recommended protocols. Samples were pooled in equimolar ratios and sequenced on an Illumina NextSeq 500 sequencing device with one or two dark cycles upfront as 79-, 80- or 155-nt single-end reads.

Data processing

Basic quality controls were done for all RNA-seq samples using FastQC (v0.11.8) (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Prior to mapping, possible remaining adapter sequences were trimmed using Cutadapt67 (v1.18). A minimal overlap of 3 nt between read and adapter was required, and only reads with a length of at least 50 nt after trimming (--minimum-length 50) were kept for further analysis. For samples sequenced with only one dark cycle at the start of the reads, an additional 1 nt was trimmed at their 5′ ends (--cut 1).

Reads were mapped using STAR68 (v2.7.3a), allowing up to 4% of the mapped bases to be mismatched (--outFilterMismatchNoverLmax 0.04 --outFilterMismatchNmax 999) and a splice junction overhang (--sjdbOverhang) of 1 nt less than the maximal read length. Genome assembly and annotation of GENCODE63 release 31 (human) or release M23 (mouse) were used during mapping. In the case that ERCC spike-ins were added during library preparation, their sequences and annotation (http://tools.thermofisher.com/content/sfs/manuals/ERCC92.zip) were used in combination with those from GENCODE. Subsequently, secondary hits were removed using SAMtools69 (v1.9). Exonic reads per gene were counted using featureCounts from the Subread tool suite70 (v2.0.0) with non-default parameter --donotsort -s2.

Differential gene expression analysis

Differential gene expression between conditions was performed using the R/Bioconductor package DESeq2 (v1.34.0)65 in the R environment (v4.1.2; https://www.R-project.org/). DESeq2 was used with a significance threshold of adjusted P value < 0.01 (which was also used to optimize independent filtering). Because normalization to total transcript abundance can introduce biases, especially when the majority of genes are affected by the treatment, we included spike-ins in our initial RNA-seq dataset. As an alternative normalization strategy to spike-ins, we tested 100 randomly chosen genes without any m6A sites but noticeable expression (reads per kilobase per million fragments mapped (RPKM) > 10) for normalization. To validate this normalization approach, the calculated fold changes were compared with spike-in-normalized data. Because the correlation between both normalization strategies was very high, we used the 100 genes for normalization in all further analyses (Extended Data Fig. 3b). For RNA-seq expression change analysis, see Supplementary Methods and Supplementary Table 5.

miCLIP2

miCLIP2 experiments were performed as described in ref. 33. For a detailed description of analyses, see Supplementary Methods.

Quantification of m6A sites in transcripts

m6A sites from miCLIP2 for male mESCs, mouse heart samples, mouse macrophages, and human HEK293T and C643 cells were taken from ref. 33 (Gene Expression Omnibus (GEO) accession number GSE163500). m6A sites were predicted using m6Aboost as described in ref. 33. For miCLIP2 mouse heart data, only m6A sites that were predicted by m6Aboost in both datasets (1 µg and 300 ng) were considered for the analysis.

Comparison of m6A sites per transcripts

Numbers of m6A sites were counted for each protein-coding transcript. Only transcripts on canonical chromosomes 1–19 and X were considered. To account for expression differences, transcripts were stratified according to their expression levels on the basis of the respective miCLIP2 data. Expression levels were estimated using htseq-count71 (v0.11.1) and genome annotation of GENCODE63 release M23 on the truncation reads from miCLIP2 data (noC2T reads)33. The derived transcript per million (TPM) values for all replicates (n = 3) were averaged, log10-transformed, and used to stratify all transcripts into 12 equal-width bins (step size of log10(TPM) = 0.25), collecting all transcripts with log10(TPM) < 0.5 or > 3 into the outer bins (Extended Data Fig. 6a). A minimum of TPM > 1 was set. For each expression bin, the mean and 95% confidence interval of the number of m6A sites per transcript were calculated (Fig. 3a–c and Extended Data Fig. 6c). To estimate the fold change of m6A sites per chromosome compared with all other chromosomes (Fig. 3d,f,g), only transcripts with intermediate expression (bins 3–8) were taken into account (mouse). For HEK293T data, bins 4–9 were used, and for C643 data, bins 5–10 were used. For each bin, the difference of m6A levels of a chromosome relative to all chromosomes was calculated. For this, the mean number of m6A sites on transcripts of the chromosome was divided by the mean number of m6A sites on transcripts of all chromosomes in the given bin (for example, orange dots (X chromosome) over gray dots (all transcripts) in Figure 3b). This resulted in a fold change of m6A sites of each chromosome over all chromosomes for each of the six considered bins (Extended Data Fig. 6d). For comparison with other chromosomes (Fig. 3d,f,g), the mean fold change per chromosome over all expression bins was calculated (Extended Data Fig. 6d, red dot).

Control for transcript-length biases

To exclude biases from different transcript lengths, we repeated the analysis using only m6A sites within a 201-nt window (−50 nt to +150 nt) around the stop codon, in which a large fraction of m6A sites accumulate23. To obtain stop codon positions, transcript annotations from GENCODE63 release M23 were filtered for the following parameters: transcript support level ≤ 3, level ≤ 2, and the presence of a Consensus Coding Sequence (CCDS) ID (ccdsid). If more than one transcript per gene remained, the longer isoform was chosen. Repeating the analyses with this subset, as described above, supported our observation that X-chromosomal transcripts harbor fewer m6A sites without being influenced by differences in transcript lengths (Extended Data Fig. 6e).

Subsampling of transcripts in expression bins

To account for potential biases from different numbers of transcripts in the expression bins for each chromosome, we randomly picked 30 genes for each expression bin (using bins 3–5, 90 genes in total) and calculated the fold change of m6A content on transcripts for each chromosome compared with all other chromosomes, as described above. The procedure was repeated 100 times. The distribution of resulting fold change values supports that X-chromosomal transcripts harbor fewer m6A sites, regardless of the number of transcripts considered (Extended Data Fig. 6f).

Statistical analysis of m6A sites in transcripts

See Supplementary Methods and Supplementary Table 6.

Analysis of published m6A-seq2 data

Published m6A-seq2 data for wild-type and Mettl3 KO mESCs were retrieved from ref. 36. We used the ‘gene index,’ that is, the ratio of m6A IP values over IP for whole genes, as a measure of the transcripts methylation level, as described in ref. 36 (Fig. 3e). Chromosome locations of the genes (n = 6,278) were assigned using the provided gene name in the R/Bioconductor package biomaRt in the R environment72,73.

DRACH motif analysesGGACH motifs in mouse transcripts

Mouse transcript annotations from GENCODE63 release M23 were filtered for the following parameters: transcript support level ≤ 3, level ≤ 2, and the presence of a CCDS ID. If more than one transcript annotation remained for a gene, the longest transcript was chosen. Different transcript regions (3′ UTR, 5′ UTR, CDS) were grouped per gene, and GGACH motifs were counted per base pair in different transcript regions, for example, the sum of GGACH motifs in CDS fragments of a given gene was divided by sum of CDS fragment lengths.

GGACH motifs in chicken, opossum, and human orthologs

Orthologs of mouse genes in chicken (Gallus gallus), human (Homo sapiens), and opossum (Monodelphis domestica) were retrieved from the orthologous matrix (OMA) browser74 (accessed on 21 March 2022, for opossum 28 July 22). Only one-to-one orthologs were kept. Genes were filtered to have orthologs in all three species (n = 6,520). Then, numbers of GGACH motifs per base pair of all protein-coding exons were quantified on the basis of GENCODE annotation (release 31)63 for human and ENSEMBL annotation (release 107, genome assembly GRCg6a)75 for chicken and opossum annotation (ASM229v1). GGACH motifs per base pair were quantified and visualized as described above.

Estimation of methylation levels

See Supplementary Methods.

GGACH in gene sets from literature

Independently evolved gene sets and genes with or without an ortholog on the human X chromosome were taken from ref. 39. Escaper genes were taken from ref. 16. Testis-specific genes were taken from ref. 5. Genes from the X-addedXAR and XCR were annotated by identifying X-chromosomal genes in mouse with the location of chicken orthologs on chromosome 1 (XAR) and chromosome 4 (XCR).

ChIP–seq analysis

ChIP–seq peaks were obtained from ref. 37. The numbers of peaks per chromosome were divided by chromosome lengths. To calculate the peak ratio per chromosome compared with all other chromosomes, the normalized peak number per chromosome was divided by the median peak number of all chromosomes.

GO analysis

GO term enrichment was performed using the enrichGO function of clusterProfiler76 (v.4.2.2). Cellular components (ont = ’CC’) were enriched using a P value cut-off of 0.01 and a q value cut-off of 0.05, and P values were corrected using Benjamini–Hochberg correction (pAdjustMethod = ‘BH’).

DNA-seq to determine copy number variation

See Supplementary Methods.

Statistics and reproducibility

All statistical analyses were performed using R. All boxplots in this study are defined as follows: boxes represent quartiles, center lines denote medians, and whiskers extend to most extreme values within 1.5 × interquartile range. All statistical tests performed in this study were two-tailed. All indicated replicate numbers refer to independent biological replicates. No statistical method was used to predetermine sample size. The experiments were not randomized. No data were excluded from the analysis, unless stated otherwise. The investigators were not blinded during allocation in experiments or to outcome assessment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

留言 (0)

沒有登入
gif