scRepli-Seq: A Powerful Tool to Study Replication Timing and Genome Instability

Advances in “omics” technology have made it possible to study a wide range of cellular phenomena at the single-cell level. Recently, we developed single-cell DNA replication sequencing (scRepli-seq), which measures replication timing (RT) from copy number differences between replicated and unreplicated genomic DNA in single replicating mammalian cells. This method has been used to reveal previously unrecognized static and dynamic properties of chromosomal units, called RT domains, that range from several hundred kilobases to a few megabases in size. Because RT domains are highly correlated with A/B compartments detected by Hi-C, scRepli-seq data can be used to predict the 3D organization of the genome in the nuclear space. Because scRepli-seq essentially measures copy number, it can also be applied to study genome instability.

© 2022 S. Karger AG, Basel

Introduction

DNA replication is a fundamental cellular process for proper duplication of genetic information. Early studies in mammalian cells revealed a temporal order of replication along chromosomes [Berezney et al., 2000]. Replication takes place in large (often Mb-sized) chromosomal units, and inactive X chromosomes in female cells replicate later than their active counterparts [Taylor, 1960; Huberman and Riggs, 1968; Berezney et al., 2000]. However, these observations were made mainly by microscopic approaches, which provided only a low-resolution view of genome replication. This changed with the advent of genome-wide analysis methods established in the late 2000s, which allow precise mapping of the temporal order of replication against genomic sequences [Takebayashi et al., 2017; Hulke et al., 2020; Vouzas and Gilbert, 2021]. One such method, called E/L Repli-seq, employs immunoprecipitation of 5-bromo-2ʹ-deoxyuridine (BrdU)-labeled replicating DNA from FACS-sorted early and late S-phase cells and determines the enrichment ratio between the two fractions (early vs. late) along chromosomes using microarray or next-generation sequencing (NGS) technology (Fig. 1a) [Ryba et al., 2011; Marchal et al., 2018; Takebayashi et al., 2018; Zhao et al., 2020; Hayakawa et al., 2021; Rivera-Mulia et al., 2022].

Fig. 1.

Replication timing (RT) domain organization revealed by genome-wide studies. a Experimental overview of E/L Repli-seq analysis. BrdU-labeled cells are sorted into early and late S-phase fractions by FACS, and isolated BrdU-substituted DNA samples from each fraction are subjected to whole-genome amplification followed by next-generation sequencing (NGS). b RT profiles of chromosome 6 in mouse embryonic stem cells (ESCs). The chromosomal position is shown on the x axis and relative enrichment of early- and late-replicating DNA (log2 early/late) is shown on the y axis. Mapped NGS reads from early and late S-phase cells counted separately in sliding windows of 300 kb at 40-kb intervals were used to generate log2 RT data. Early and late RT domains are indicated as blue and yellow regions, respectively. c RT profiles of chromosome 16 before and after mouse ESC differentiation (upper two panels) and their comparison with A/B compartment profiles from Hi-C data (lower two panels). A and B compartments are indicated as green and red regions, respectively. Early-to-late (EtoL) and late-to-early (LtoE) RT changes correlate with A-to-B and B-to-A compartment changes. The data were obtained from GSE108556 and GSE113985.


These novel methods delineate the genome-wide distribution of Mb-sized replication units called early and late replication timing (RT) domains (Fig. 1b) and reveal the cell type-specific distribution of RT domains along chromosomes [Hiratani et al., 2008, 2010; Hansen et al., 2010; Yaffe et al., 2010; Koren et al., 2014; Rivera-Mulia et al., 2015]. Although changes in RT were known to occur at some cell type-specific gene loci, such as β-globin [Dhar et al., 1988; Hiratani et al., 2004], the finding from genome-wide studies that approximately 50% of the genome undergoes RT changes was unexpected [Hiratani et al., 2010; Ryba et al., 2010]. On the basis of the RT domain profile, it is feasible to distinguish mouse embryonic stem cells from a closely related cell type called epiblast stem cells [Ryba et al., 2010]. During cellular reprogramming, reorganization of RT domains from somatic to pluripotent stem cell patterns is induced [Hiratani et al., 2008; Takebayashi et al., 2013]. Several RT domains harboring pluripotency-associated genes appear to be particularly resistant to reprogramming back toward the pluripotent state, because these domains do not acquire pluripotency-specific early RT in partially reprogrammed induced pluripotent stem cells [Hiratani et al., 2010]. However, probably the most important finding about RT domains is that early and late RT domains are largely consistent with the A and B compartments revealed by Hi-C, a chromosome conformation capture technique (Fig. 1c) [Ryba et al., 2010; Takebayashi et al., 2012a; Miura et al., 2019]. Reorganization of RT domains during cell differentiation is closely linked to changes in A/B compartments. Because A and B compartments are associated with transcriptionally active and silent chromatin modifications, respectively [Lieberman-Aiden et al., 2009], reorganization of RT domains is thought to reflect changes in transcriptional competence. The hypothesis that RT domains represent a fundamental unit of mammalian chromosome structure is attractive and is supported by the close correlation between late RT domains and lamina-associated domains, which are chromosomal units that interact with the nuclear lamina [Guelen et al., 2008; Takebayashi et al., 2012b; Miura et al., 2019].

It is important to note that genome-wide RT assays generally require many S-phase cells as starting material (conventional E/L Repli-seq requires several thousand cells for effective BrdU immunoprecipitation). Therefore, it is quite possible that biologically relevant phenomena are hidden in bulk measurements. For example, cell population-based assays have revealed the existence of Mb-sized RT domains and their periodic distribution along chromosome arms, but it is unclear to what extent these averaged views reflect the regulation of RT domains in each individual cell. To address this limitation, we and others have recently developed novel techniques to map genome-wide RT domains in single mammalian cells [Dileep and Gilbert, 2018; Takahashi et al., 2019; Miura et al., 2020; Bartlett et al., 2022]. In this review, we outline single-cell DNA replication sequencing (scRepli-seq) developed by our group and discuss its applications in DNA replication studies and beyond.

Outline of the scRepli-Seq Procedure

We have published a step-by-step scRepli-seq experimental protocol including computational pipelines for quality control and quantification of data [Miura et al., 2020]. Because the protocol provides sufficient details for its broad and easy use in external laboratories, here, we summarize the characteristic features of the method and provide some examples of relevant analyses.

Simple Experimental Workflow

The procedure consists of (1) single cell collection (either live or ethanol-fixed cells are acceptable), (2) whole-genome amplification (WGA) from a single cell, and (3) library preparation and sequencing by NGS. To delineate RT domains, it is ideal to use mid-S-phase cells in which each early RT domain has 2 copies of DNA, whereas each late RT domain has a single copy, and FACS-mediated sorting of mid-S-phase cells by DNA content is the first choice for this purpose (Fig. 2a). WGA of a single cell is a prerequisite to obtain enough DNA for library preparation, and uniform amplification across the entire genome is critical for the success of copy number analysis. The original protocol employs a SeqPlex Enhanced DNA Amplification Kit that amplifies DNA by degenerate oligonucleotide-primed (DOP)-PCR. Library preparation from the amplified DNA for the Illumina sequencing platform can be performed using commercially available kits such as the KAPA LTP Library Preparation Kit and NEBNext Ultra II DNA Library Prep Kit. Several G1-phase cells also need to be processed for scRepli-seq, because they serve as the control to normalize amplification bias as well as mappability bias in the data analysis step.

Fig. 2.

Replication timing (RT) domain organization in single cells revealed by scRepli-seq. a Experimental overview of scRepli-seq analysis. Mid-S-phase cells are collected by FACS, and genomic DNA samples isolated from single cells are subjected to whole-genome amplification followed by next-generation sequencing (NGS). Using mapped reads, copy number differences that arise between replicated and unreplicated DNA in mid-S-phase cells are determined to map RT domains. b RT profiling by copy number analysis. RT profiles of chromosome 11 from 3 single human retinal pigment epithelium (RPE) cells. Mapped NGS reads of mid-S-phase cells were counted in sliding windows of 200 kb at 40-kb intervals. Mappability was corrected using G1 samples, and the numbers were further divided by the median read count (median centering) to generate a log2 [counted reads/genome-wide median of counted reads] (designated as “log2 median”) RT plot. Top panel shows a population RT profile (E/L Repli-seq data) from RPE cells. The data were obtained from GSE108556.
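The log2 median calculation described in this legend can be sketched in a few lines of Python. This is a minimal illustration and not the published pipeline [Miura et al., 2020]: the function names, the use of 40-kb base bins summed five at a time to form 200-kb windows, the simple division by a G1 control, and the pseudocount are all assumptions made for the example.

```python
import math
from statistics import median


def sliding_window_sums(counts, window_bins=5):
    """Sum base-bin read counts over a window that advances one base bin at a time.

    With 40-kb base bins (an assumption of this sketch), window_bins=5 gives
    200-kb windows at 40-kb intervals.
    """
    return [sum(counts[i:i + window_bins])
            for i in range(len(counts) - window_bins + 1)]


def log2_median_profile(s_counts, g1_counts, window_bins=5, pseudocount=1.0):
    """Return log2(corrected count / genome-wide median) for each sliding window.

    s_counts:  per-bin read counts from one mid-S-phase cell
    g1_counts: per-bin read counts from a G1 control (hypothetical pooled control)
    """
    s_win = sliding_window_sums(s_counts, window_bins)
    g1_win = sliding_window_sums(g1_counts, window_bins)
    # Dividing by the G1 control is a simple stand-in for mappability correction;
    # the pseudocount avoids division by zero in poorly covered windows.
    ratios = [(s + pseudocount) / (g + pseudocount) for s, g in zip(s_win, g1_win)]
    med = median(ratios)  # median centering
    return [math.log2(r / med) for r in ratios]


# Toy example: first half replicated (2 copies), second half unreplicated (1 copy).
profile = log2_median_profile([20] * 10 + [10] * 10, [10] * 20)
```

In this toy input, windows in the replicated half receive positive values and windows in the unreplicated half receive negative values, which corresponds to the blue and yellow coloring used for early and late replication in the figures.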

Robust Data Analysis Pipeline

Another important feature is that the scRepli-seq protocol includes an extensive computational approach for quality control and data quantification. Mainly for visual inspection, mapped NGS reads from each scRepli-seq sample are counted in sliding 200-kb windows at 40-kb intervals and divided by their genome-wide median (represented as log2 [counted reads/genome-wide median of counted reads] values in Fig. 2 and 3, and hereafter abbreviated as “log2 median”). For quantitative analysis such as clustering and cell-to-cell heterogeneity detection, NGS reads are counted in non-overlapping 200-kb windows and used to determine the copy number state (replicated or unreplicated) at each genomic bin (represented as “binarized” in Fig. 3). The most likely problems are low-quality samples (e.g., low read coverage) caused by WGA failure and contamination by non-mid-S-phase cells caused by the technical limitations of cell sorting, both of which significantly affect the outcome of experiments. Such problematic samples can be eliminated by computational filtering. For example, the percentage replication score, calculated from binarized data as the percentage of replicated bins among all bins, indicates what percentage of the genome is replicated in each collected single cell. Mid-S-phase cells are defined as cells with percentage replication scores between 40 and 70% (Fig. 3), and cells outside this range can be excluded as non-mid-S-phase cells. Non-mid-S-phase cells and low-quality samples can also be eliminated using the median absolute deviation (MAD), that is, the median of the absolute deviations from the median, which estimates the variability of data points around the median value of a dataset. For each single-cell sample, a MAD score (ranging from 0 to 1) is calculated using NGS reads counted in non-overlapping 200-kb genomic bins. scRepli-seq datasets from mid-S-phase cells have relatively high variability in read numbers across bins (MAD scores of >0.4 and <0.8), whereas those from non-mid-S-phase cells and low-quality samples have low variability (MAD score of <0.3).

Fig. 3.

Haplotype-resolved replication timing (RT) domain mapping by scRepli-seq. CBMS1 ESCs derived from a cross between female CBA (Mus musculus domesticus) and male MSM/M (M. musculus molossinus; hereafter MSM) mice allowed us to identify allele-specific reads by SNP information. Using allele-specific reads, copy-number differences between replicated and unreplicated DNA in mid-S-phase cells were determined to map RT domains. Haplotype-resolved single-cell replication profiles of chromosome 5 before and after binarization are shown (n = 31). In log2 median replication profiles before binarization (sliding windows of 200 kb at 40-kb intervals), blue and yellow represent early (RT score: >0) and late (RT score: <0) replication, respectively. In binarized replication profiles (non-overlapping 100-kb windows), blue and yellow represent replicated and unreplicated regions, respectively. The percentage replication score of each cell (the percentage of replicated bins/total bins, indicating the progress of replication in an individual cell) is shown on the right of the corresponding binarized data, and cells are ordered by their percentage replication scores. RT profiles from population analysis are shown on the top for comparison. Regions with asynchronous replication between homologous chromosomes are highlighted by red rectangles.

Cost Effectiveness

Several million NGS reads per scRepli-seq sample are sufficient to generate early/late RT domain profiles, which enables sequencing of pooled libraries from many single-cell samples. This is because RT domains are regulated as large chromosomal units through synchronous firing of multiple replication origins over several hundred kilobases to a few megabases [Berezney et al., 2000], and copy number analysis by read counting in large genomic bins (200-kb bins for scRepli-seq) is effective even with relatively sparse genome coverage. We have previously shown that simply increasing the read count from 4 million to 18 million does not increase RT profile resolution [Takahashi et al., 2019]. Thus, scRepli-seq is suitable for high-throughput single-cell analysis.
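A back-of-the-envelope calculation helps to show why a few million reads are enough for 200-kb bins. The Python sketch below assumes a genome size of about 3.1 Gb (human), uniformly distributed reads, and ideal Poisson sampling; it ignores mappability and WGA amplification noise, so it describes counting noise only. The function names are hypothetical.

```python
def reads_per_bin(total_reads, genome_size_bp, bin_size_bp):
    """Expected number of reads in one bin if reads were spread uniformly."""
    return total_reads * bin_size_bp / genome_size_bp


def poisson_cv(mean_count):
    """Coefficient of variation of a Poisson count: 1 / sqrt(mean)."""
    return mean_count ** -0.5


GENOME_BP = 3_100_000_000  # assumed approximate human genome size
BIN_BP = 200_000           # 200-kb bins, as used for scRepli-seq

for total in (4_000_000, 18_000_000):
    mean = reads_per_bin(total, GENOME_BP, BIN_BP)
    print(f"{total:>10,d} reads: ~{mean:.0f} reads/bin, "
          f"counting noise ~{100 * poisson_cv(mean):.1f}%")
```

Under these assumptions, 4 million reads already give roughly 258 reads per 200-kb bin, with counting noise of about 6%, which is small relative to the twofold difference between replicated and unreplicated DNA. Deeper sequencing reduces counting noise further but does not address other sources of noise.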

Single-Cell RT Domain Profiling by scRepli-Seq

Figure 2b shows typical scRepli-seq data (log2 median) obtained from single human retinal pigment epithelium (RPE) cells compared with population RT domain data from E/L Repli-seq. The most surprising aspect is that RT domain organization is remarkably conserved among individual cells, and the RT domain organization seen in population data is quite similar to that seen in each individual cell [Takahashi et al., 2019]. Although RT domains are a highly stable structural unit, extensive computational analysis of single-cell data also revealed that cell-to-cell variation tends to be higher in genomic regions undergoing developmental RT changes [Takahashi et al., 2019]. If SNP information is available for the cell line used, it is feasible to perform haplotype-resolved scRepli-seq, which distinguishes maternal and paternal chromosomes within a single cell (Fig. 3) [Takahashi et al., 2019]. Some genomic regions display RT differences between homologous chromosomes, and haplotype-resolved scRepli-seq is sensitive enough to detect such differences (Fig. 3). When the method is applied to differentiated female mammalian cells, the inactive X chromosome is clearly detected as almost entirely late replicating, which is distinct from its active counterpart [Takahashi et al., 2019]. Population analysis has shown that the organization of RT domains is largely unaffected by mutation of several epigenetic modifiers including DNA methyltransferases [Yokochi et al., 2009; Takebayashi et al., 2021]. However, recent scRepli-seq analysis has revealed that cell-to-cell heterogeneity of RT increases in DNA-hypomethylated mutant cells, highlighting the effectiveness of the single-cell approach [Du et al., 2021]. Recently, optical mapping or direct sequencing of replication-labeled single DNA molecules has been used to analyze replication processes such as origin firing and fork progression [Müller et al., 2019; Hennion et al., 2020; Wang et al., 2021]. These approaches allow analysis of intact DNA molecules without any amplification steps, although preparing Mb-sized DNA molecules remains a technical limitation. scRepli-seq, together with these single-molecule approaches, will provide deeper insights into the regulation of genome replication.

Application of scRepli-Seq to Chromosome Instability Analysis

The scRepli-seq methodology relies on DNA copy number measurement, which means that it should be directly applicable to the study of genome instability. As described above, the scRepli-seq data analysis pipeline includes a step in which multiple control G1 cells with identical karyotypes are selected to normalize data from single mid-S-phase cells. This step is performed using the findCNVs command in AneuFinder with read count data, and karyogram plots are obtained as the output [Bakker et al., 2016; Miura et al., 2020]. Figure 4a (top) shows an example of a karyogram plot from a single RPE G1-phase cell. RPE cells have a normal diploid karyotype except for trisomy of 10q, which is clearly detected in the karyogram plot.

Fig. 4.

Analysis of chromosome instability by scRepli-seq. a Karyotyping of RPE cells using scRepli-seq data. Four G1 cells sorted by FACS based on DNA content were subjected to scRepli-seq and karyogram plots were generated by copy number analysis with AneuFinder (https://github.com/ataudt/aneufinder). A typical example of a karyogram from 1 out of 4 single cell samples is shown at the top. A summary of karyotype analysis in which each row represents a single cell (n = 4) is shown at the bottom. All 4 cells display trisomy of chromosome 10q (indicated as red regions). b Karyotyping of IMR90 cells treated with a low dose of aphidicolin (0.4 μM) for 72 h using scRepli-seq data. Sorted G1 cells were used for the analysis as described above. An example of a karyogram displaying partial monosomy of chromosome 3 is shown at the top. A summary of karyotype analysis in which each row represents a single cell (n = 45) is shown at the bottom. Some cells had partial loss of chromosome 3q (monosomy shown as purple regions). Using log2 median values (bottom left, sliding windows of 200 kb at 40-kb intervals), copy number loss was found to occur in the LSAMP gene locus. It should be noted that both RPE and IMR90 cells are female in origin, and therefore, the Y chromosome is classified as zerosomy (gray).


Common fragile sites (CFSs) are specific genomic loci that are prone to form gaps or breaks under replication stress and are hotspots for chromosome instability [Sarni and Kerem, 2016]. These sites are cytogenetically detected as gaps or breaks on metaphase chromosomes prepared from cells exposed to low doses of replication inhibitors such as aphidicolin (a DNA polymerase alpha inhibitor). Giemsa (G)-banding combined with FISH is often performed to identify chromosomes expressing CFSs in metaphase, but this approach is laborious and has low resolution. Conventional cytogenetic analysis of human fetal lung-derived fibroblasts (IMR90) treated with a low dose of aphidicolin revealed that the known CFSs 1p31.1 and 3q13.3 are expressed at rates of 5.3% and 26.8%, respectively [Maccaroni et al., 2020]. When we performed scRepli-seq of G1-phase IMR90 cells treated with a low dose of aphidicolin, these two CFSs were detected as boundaries of copy number change at rates of 2.2% and 11.1%, respectively (purple in Fig. 4b indicates loss of one copy). Careful examination of these boundaries in log2 median data showed that they overlap with very large genes associated with CFSs (NEGR1 and LSAMP). Intriguingly, the sites of the boundaries formed on 3q13.3 appear to differ slightly from cell to cell, which may reflect some variability in breakpoints within the LSAMP gene locus. It should be noted that various aneuploidies (both copy number gain and loss) were also detected in addition to the above two CFSs (Fig. 4b). Although conventional cytogenetic analysis is effective for evaluating chromosomal instability in metaphase chromosome spreads, it has technical limitations in analyzing non-dividing (interphase) cells. Because damage to chromosomal DNA leads to cell cycle arrest at interphase, scRepli-seq offers a considerable advantage in that it can evaluate chromosome instability in interphase cells.
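The single-cell rates quoted above can be related to cell counts with simple arithmetic. The Python sketch below assumes that the 45 cells summarized in Fig. 4b form the denominator for both rates, which is an inference from the figure legend rather than something stated in the text. The Wilson score interval is added here only to illustrate how wide the uncertainty is at this sample size; it is not part of the published analysis, and the function names are hypothetical.

```python
import math


def implied_count(rate_percent, n_cells):
    """Number of cells implied by a reported percentage and a cell total."""
    return round(rate_percent * n_cells / 100.0)


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


N_CELLS = 45  # assumed denominator (n = 45 in Fig. 4b)

for site, rate in (("1p31.1", 2.2), ("3q13.3", 11.1)):
    k = implied_count(rate, N_CELLS)
    low, high = wilson_interval(k, N_CELLS)
    print(f"{site}: {k}/{N_CELLS} cells, "
          f"interval {100 * low:.1f}% to {100 * high:.1f}%")
```

Under this assumption, the reported rates correspond to 1 and 5 of 45 cells, so the intervals are broad and any comparison with the cytogenetic rates should be made with caution.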

Detection of Copy Number Variation during Placental Development by scRepli-Seq

Trophoblast giant cells (TGCs) derived from mouse extraembryonic tissues become polyploid through endoreplication cycles, but previous studies have shown that copy number amplification does not occur uniformly across the genome and that a subset of genomic regions, called underrepresented (UR) domains, are particularly resistant to endoreplication [Hannibal et al., 2014]. Because this is an average view of many cells, we applied scRepli-seq to detect UR domains in single TGCs (Fig. 5). Similar to late RT domains, UR domains are delineated as genomic regions with a low copy number in log2 median data from scRepli-seq analysis. The data indicate that UR domain organization is remarkably conserved between individual cells and is well aligned with the previously reported population UR domain organization (pink highlighted regions in Fig. 5b). The high conservation of the UR domain distribution across single cells suggests the existence of cis-regulatory elements that define UR domain formation, but further studies are necessary to address this point.
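As an illustration of how low-copy regions could be delineated from a log2 median profile, the Python sketch below reports runs of consecutive windows whose value falls below a threshold. It is a simplified stand-in and not the method used to define UR domains in the cited studies: the threshold of 0, the minimum run length, the 40-kb step, and the function name are all assumptions made for the example.

```python
def low_copy_segments(log2_values, step_kb=40, threshold=0.0, min_windows=5):
    """Return (start_kb, end_kb) for runs of windows with values below threshold.

    log2_values: log2 median values for consecutive windows along one chromosome
    step_kb:     distance between window starts (assumed 40 kb)
    min_windows: shortest run that is reported (hypothetical noise filter)
    """
    segments = []
    run_start = None
    for i, value in enumerate(log2_values):
        if value < threshold:
            if run_start is None:
                run_start = i  # a new low-copy run begins here
        else:
            if run_start is not None and i - run_start >= min_windows:
                segments.append((run_start * step_kb, i * step_kb))
            run_start = None
    # Close a run that extends to the end of the chromosome.
    if run_start is not None and len(log2_values) - run_start >= min_windows:
        segments.append((run_start * step_kb, len(log2_values) * step_kb))
    return segments


# Toy profile: one long low-copy run and one short dip that is filtered out.
toy = [0.5] * 3 + [-1.0] * 6 + [0.4] * 2 + [-0.5] * 2 + [0.3]
print(low_copy_segments(toy))
```

Running the same function on each single cell and comparing the resulting intervals would be one simple way to examine how consistently low-copy regions recur from cell to cell.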

Fig. 5.

Analysis of polyploid trophoblast giant cells (TGCs) by scRepli-seq. a Schematic diagram of TGC isolation from mouse extraembryonic tissues. TGCs are isolated by dissecting and trypsinizing extraembryonic tissues of E9.5 embryos. A micromanipulator is used to collect single TGCs, because TGCs are easily distinguishable from other cell types by their size under a microscope. DAPI-stained TGCs in comparison with fetal cells (scale bar, 50 μm) are shown. b In log2 median profiles (sliding windows of 200 kb at 40-kb intervals), blue and yellow represent amplified regions (log2 median values >0) and regions resistant to endoreplication (log2 median values <0), respectively. Pink highlighted regions indicate the location of underrepresented (UR) domains identified by population analysis. Plots below are zoomed to highlight some UR domains.

Future Perspective

Because of its technical simplicity and cost efficiency, scRepli-seq is suitable for genome-wide analysis of RT and genome instability. A possible future direction of scRepli-seq methodology is parallel sequencing of DNA and RNA from the same single cell to connect variation in scRepli-seq data with phenotypic variation. Among the many available scRNA-seq protocols, it would be important to choose one compatible with scRepli-seq methodology. Another future direction is to improve the resolution of copy number analysis. Because the resolution of single-cell copy number analysis is limited to 40–50 kb by the amplification noise of the WGA method (DOP-PCR) used in the scRepli-seq methodology [Chen et al., 2017], it is difficult to detect copy number changes occurring below this resolution, and substantial further improvement in resolution is not expected from simply increasing the read depth beyond the current level. DOP-PCR was among the best options for WGA when we developed the scRepli-seq methodology, but other WGA methods with significantly reduced amplification noise should be effective in improving the resolution. One such candidate is linear amplification via transposon insertion (LIANTI), which has been shown to detect copy number variations in single cells with kilobase resolution [Chen et al., 2017]. Any given technique may soon be replaced by a more effective one. Even if the resolution of scRepli-seq is dramatically improved by novel WGA technologies, the core of the experimental and analytical protocols of scRepli-seq would not change significantly. We are only beginning to understand how much copy number variations, including loss or gain of specific chromosomes, contribute to genetic disease and normal development, but we believe that scRepli-seq will be a powerful tool to address this issue.

Acknowledgements

We thank K. Okumura, T. Hayakawa, and R. Suzuki for their help.

Conflict of Interest Statement

The authors have no conflicts of interest directly relevant to the content of this article.

Funding Sources

This work was supported by a Grant-in-Aid for Scientific Research on Innovative Areas (grant number 22H02599 to S.T.) from the Ministry of Education, Culture, Sports, Science and Technology of Japan and a Grant from the Japan Science and Technology Agency (CREST) to S.T.

Author Contributions

M. Sakamoto, S. Hori, A. Yamamoto, T. Yoneda, K. Kuriya, and S.I. Takebayashi all analyzed data, prepared figures, and wrote the manuscript.

