Massively parallel genomic perturbations with multi-target CRISPR interrogates Cas9 activity and DNA repair at endogenous sites

SpCas9 purification

SpCas9 purification was done using BL21-CodonPlus (DE3)-RIL competent cells (Agilent Technologies 230245) that were transformed with Cas9 plasmid (Addgene, #67881). Bacteria were grown in 1 L of LB medium, induced with isopropyl-β-d-thiogalactoside overnight and then lysed. The supernatant was clarified and then purified using Ni-NTA beads. A detailed description can be found in ref. 56.

Cell culture

HEK293T cells (ATCC® CRL-3216) and HeLa (ATCC CCL-2) cells were cultured at 37 °C under 5% CO2 in Dulbecco’s modified Eagle’s medium (DMEM, Corning) supplemented with 10% FBS (Clontech), 100 units/mL penicillin and 100 µg ml−1 streptomycin (DMEM complete). Cells were tested every month for mycoplasma.

A human iPSC line, WTC11 (ref. 57), was used for all iPSC experiments in this study. We followed the guidelines of Johns Hopkins Medical Institute for the use of this human iPSC line. Briefly, frozen WTC11 cells were first thawed in a 37 °C water bath and washed in Essential 8 Medium (E8; Thermo Fisher Scientific, #A1517001) by centrifugation. After resuspension, WTC11 cells were plated onto a 6 cm cell culture dish pre-coated with human embryonic cell-qualified Matrigel (1:100 dilution, Corning, #354277). Plates were coated for at least 2 h. Subsequently, 10 µM ROCK inhibitor (Y-27632; STEMCELL, #72308) was supplemented into the E8 medium to promote cell growth and survival. For subculture, WTC11 cells were dissociated from the plate using accutase (Sigma, #A6964) and passaged every 2 days. WTC11 cells were maintained in an incubator at 37 °C with 5% CO2.

Electroporation of Cas9 ribonucleoprotein

A Cas9:mgRNA ribonucleoprotein was assembled and electroporated into HEK293T or WTC11 iPSC cells using 4D-Nucleofector Kits (Lonza, SF Cell Line kit for HEK293T and P3 Primary Cell kit for WTC11) following the manufacturer’s instructions. Oligos used for trans-activating CRISPR RNA (tracrRNA) and CRISPR RNAs (crRNAs) are presented in Supplementary Table 2. More details can be found in ref. 56.

Chromatin immunoprecipitation sequencing

The ChIP protocol was adapted from previous literature28. Oligonucleotide sequences for library preparation are in Supplementary Table 3. A detailed protocol can be found in ref. 56. Briefly, protein A beads were washed twice using BSA buffer and incubated with the antibody for 1–3 h with rotation. Bead–antibody mixtures were washed twice with BSA buffer right before ChIP. Cells were collected and fixed with formaldehyde (1% final) at room temperature. The reaction was quenched using glycine (130 mM final). Cells were then lysed sequentially using three different buffers, sonicated and spun down. The supernatant was collected, and the bead–antibody mixture was added. The ChIP reaction was incubated overnight. Bead mixtures were then washed on a magnet seven times, resuspended in reverse crosslink buffer and incubated at 65 °C for at least 6 h. After proteinase K and RNase A treatments, the DNA was column purified. To prepare ChIP–seq libraries, we performed an end repair/dA-tailing reaction, followed by adapter ligation and PCR using PE_i5 and PE_i7XX primer pairs. Final DNA was purified using AMPure beads, quantified via Qubit, pooled and sequenced on a NextSeq 500 (Illumina).

Genome-wide DSB detection with BLISS

The BLISS protocol was adapted from previous literature39. All oligonucleotide sequences are provided in Supplementary Table 4. A detailed protocol can be found in ref. 56. In short, BLISS adapters were annealed and phosphorylated RA3 oligonucleotides were adenylated. In total, 400,000 cells were seeded into a 24-well plate for each reaction, washed once with PBS, fixed with 4% paraformaldehyde for 10 min, then washed three times with PBS. Cells were then subjected to a first round of lysis, followed by a PBS wash, a second round of lysis and two PBS washes. Cells were then washed twice with CutSmart Buffer (NEB), and subjected to DNA end-blunting reaction. Cells were then washed twice with CutSmart Buffer followed by adenylation of DNA ends. Cells were washed twice with CutSmart Buffer and with T4 Ligase Buffer, followed by in situ adapter ligation. Samples were then washed four times with high-salt buffer to remove unligated adapters. DNA was extracted by adding extraction buffer and proteinase K, incubating at 55 °C overnight and column purifying DNA the day after. DNA was then sonicated, in vitro transcribed and purified. RA3 adapter was ligated to the purified RNA, and the product was purified. Samples were reverse transcribed and PCR amplified, and the final DNA was purified using AMPure beads. Samples were pooled, quantified with QuBit, Bioanalyzer and qPCR, then sequenced on a NextSeq 500 using high-output paired sequencing, with 64 bp for read 1 and 36 bp for read 2. Only the subset of reads with the correctly matching 13 bp constant adapter region (CGCCATCACGCCT) in read 1 was used for subsequent analysis.

Measurements of mutations at mgRNA targets

A PiggyBac system was used to transpose HeLa cells with a vector carrying Cas9 under the control of a Tet-On inducible promoter and a puromycin resistance gene. Two days after transposition, clonal cell lines were isolated and grown in presence of 2 μg ml−1 of puromycin. Vectors carrying 10-target or 20-target mgRNAs were made by cloning forward and reverse mgRNA oligos (carrying respectively a 5′-CACCG and a 5′-CAAA and 3′-C overhang; Supplementary Table 5) into the LentiGuide-Hygro plasmid (Addgene #139462). Plasmid was digested using BsmBI-v2 (NEB, #R0739), gel-extracted and then ligated overnight with the pre-annealed phosphorylated forward and reverse mgRNA oligos. Cells (NEB, #C2987) were transformed with the ligation product and plated following the manufacturer’s instructions. The following day, individual colonies were selected and grown in selection media; plasmids were purified the next day using QIAprep Kit (Qiagen, #27106). Correct insertion of the mgRNA was verified via Sanger sequencing. For lentivirus production, Lenti-X 293T cells (takarabio, #632180) were grown in 10 cm dishes up to ~70% confluency. Then, 5.25 μg of transfer plasmid was mixed with 0.75 μg of pMD2.G (Addgene, #12259) and 1 μg of psPAX2 (Addgene, #12260), and with 21 μl of TransIT-Lenti (Mirus, #6603). The mixture was incubated for ~15 min and added dropwise to the cells. The viral supernatant was collected at 36 h, 48 h and 60 h, and filtered and concentrated using Lenti-X Concentrator (takarabio, #631232), according to the manufacturer’s instructions. Doxycycline (Dox)-inducible Cas9 monoclonal cells were grown to ~60% confluency in six-well plates. Cells were exposed to virus carrying mgRNA (~0.3–0.5 multiplicity of infection) and 8 μg ml−1 polybrene for 24 h. Two days after infection, cells were exposed to 100 μg ml−1 hygromycin and kept under such selection conditions for all subsequent experiments.
Death of half of the cells confirmed successful plasmid integration at the estimated multiplicity of infection. An initial set of stably transduced cells were collected before Dox addition as timepoint zero. Cells were then grown in 24-well plates under exposure to 2 μg ml−1 of Dox. At different timepoints after induction, a number of cells were collected during passaging and their gDNA was extracted. For the ten-target mgRNA, a no-Dox control experiment was performed in parallel.

gDNA was extracted using the Qiagen DNeasy kit (Qiagen, #69506), eluted in 60 μl of elution buffer and quantified using QuBit (Thermo). One nanogram of gDNA was amplified via three PCRs: two nested PCRs to amplify the target region and a third, indexing PCR to attach the NGS adapters and indices. PCR-1 was run to 20 cycles using the primers presented in Supplementary Table 6. One microlitre of 1:10 dilution of unpurified PCR-1 product was used for PCR-2, which was run to 20 cycles using the primers presented in Supplementary Table 7. The PCR-2 product was purified using 1× volume of AMPure XP beads (Beckman Coulter) and eluted in 15 μl of IDTE buffer (IDT DNA). One microlitre of this product was used for PCR-3, which was run to seven cycles using the primers from Supplementary Table 8. The final product was purified using 0.8× volume of AMPure XP beads, eluted in 15 μl of IDTE and quantified using QuBit. Products from different samples were pooled and sequenced using a MiSeq (Illumina). We found conditions for pooling primers from different targets that yielded a balanced representation of all the sequenced targets among the NGS reads. For the ten-target mgRNA, we pooled all the PCR-1 primers and all the PCR-2 primers in equimolar amounts to a final concentration of 5 μM per oligo. For the 20-target mgRNA, we made three sets of primers per PCR: set 1 with targets 2–6, set 2 with targets 8–11 and set 3 with targets 1, 7 and 12. Targets were then de-multiplexed during the data analysis (see below).

Determining mutation levels and mutation outcomes of mgRNAs

To determine the mutation levels of the different mgRNA targets, we first de-multiplexed these targets (which were amplified in a multiplexed fashion) by aligning the first 50 bp of each PE read to the genome. A given read was considered to contain an mgRNA target if the PE alignment fell within a window of 1,000 bp from the expected genomic location of the target. A mutation was called if the intact theoretical protospacer sequence was not found in the read.

For classification of the mgRNA target mutations, we defined for each target site two key sequences that were, respectively, 20 bp upstream and downstream of the expected genomic location of the cut site. For each read aligning to a target site, these two key sequences were identified and the distance between them was computed. Reads with distances shorter than the expected value were classified as deletions, while reads with distances longer than expected were classified as insertions. Reads with the expected distance between the key sequences but with mutations in the protospacer were classified as single-nucleotide variants (SNVs).
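The classification rule can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the choice to measure distance between the start positions of the two key sequences, and the "unclassified" and "wild-type" labels are assumptions made for illustration.

```python
def classify_read(read, up_key, down_key, expected_distance, protospacer):
    """Classify one read from the spacing of two flanking key sequences.

    Assumptions (hypothetical, for illustration): distance is measured
    between the start of up_key and the start of down_key, and reads
    missing either key are reported as 'unclassified'.
    """
    i = read.find(up_key)
    if i == -1:
        return "unclassified"
    j = read.find(down_key, i + len(up_key))
    if j == -1:
        return "unclassified"
    distance = j - i
    if distance < expected_distance:
        return "deletion"
    if distance > expected_distance:
        return "insertion"
    # Expected spacing: intact protospacer means no mutation was called.
    return "wild-type" if protospacer in read else "SNV"
```

In practice the key sequences and the expected distance would be derived from the reference sequence around each target site.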

ATAC–seq

ATAC–seq was performed following the Omni-ATAC protocol58 using the amplification protocol and primers described in ref. 59. Primers are also presented in Supplementary Table 9. A detailed protocol can be found in ref. 56. Cells were washed with PBS, collected via scraping and counted. A total of 50,000 cells were used for ATAC. Collected cells were then pelleted, the supernatant was removed and the cells were resuspended in 50 µl of cold lysis buffer, gently mixed and incubated on ice for 3 min. One millilitre of wash buffer was then added and gently mixed. Nuclei were then pelleted, resuspended in 50 µl of transposition reaction and incubated at 37 °C for 30 min. Transposed DNA was column purified and eluted in 21 µl of EB. Samples were pre-amplified, followed by qPCR to determine the number of cycles needed for final amplification (one-third of saturation). Final DNA was purified using AMPure beads and eluted in 32 μl IDTE. Final libraries were quantified using 2% agarose gel, pooled, quantified with QuBit, Bioanalyzer and qPCR, then sequenced on a NovaSeq 500 (Illumina) using paired 2 × 50 bp reads.

CRISPR activation and deactivation

The special cgRNAs or pcRNAs were used in place of normal crRNAs when complexed with tracrRNA. For activation, Cas9/cgRNA was first electroporated into cells, which were plated onto 12-well plates and then incubated for 12 h to allow stable Cas9 binding but not cleavage. Next, cells were exposed to 365 nm light for 1 min from a handheld blacklight (https://www.amazon.com/JAXMAN-Ultraviolet-365nm-Detector-Flashlight/dp/B06XW7S1CS/). Either one, three or six flashlights were used at once. When multiple flashlights were used, they were held together using a 3D-printed flashlight holder (https://github.com/rogerzou/chipseq_pcRNA/blob/master/Jaxman_LED_flashlight_holder_design/files/8zeFECPViSo.stl). Samples were collected without light exposure, or 10 min and 30 min after light exposure.

For deactivation, Cas9/pcRNA was first electroporated into cells, which were plated onto 12-well plates, incubated for 2 h, then exposed to light of the same dose. Samples were collected at the time of light exposure, or at 1 h, 2 h and 4 h after light exposure.

Immunofluorescence microscopy of 53BP1 foci after multi-target Cas9 activation

The number of endogenous 53BP1 foci in cells was evaluated through immunofluorescence microscopy. One hour after Cas9:cgRNA electroporation, we illuminated the cell samples with 365 nm light for 30 s to trigger Cas9 cleavage. The samples were fixed with 4% paraformaldehyde in PBS for 10 min at different times (0 min, 10 min, 30 min, 1 h and 3 h) and quenched with glycine in PBS (0.1 M final) for 10 min. After rinsing with PBS, 0.5% Triton-X was used to permeabilize the cell membrane for 10 min. The sample was passivated with 2% w/v BSA in PBS for 1 h at room temperature. Anti-53BP1 antibody (Novus Biological, NB100-304) was diluted 1:1,000 in PBS and added into the chamber. After 1 h incubation, primary antibody was removed and the sample was washed three times with PBS. Alexa647-conjugated secondary antibody (Thermo Fisher Scientific, A-21235) was diluted 1:1,000 and applied to the sample for 1 h. Finally, the sample was rinsed three times and mounted with Prolong Diamond mounting medium (Thermo Fisher Scientific) overnight. We imaged all cell samples using a Nikon Ti-E fluorescence microscope equipped with a Hamamatsu CMOS camera and a 40× objective. Cell samples were scanned in z-stack with a total depth of 5 μm such that all 53BP1 foci within the cell nuclei (DAPI) were captured. Three-dimensional image datasets were first processed into 2D datasets in FIJI using maximum intensity projection. The number of 53BP1 foci per nucleus was analysed with a custom-built CellProfiler3 pipeline.

Discovery and characterization of mgRNA sequences

Starting from a 280 bp SINE sequence, for all 20 bp substrings in both the forward and the reverse complement directions, we obtained all 20 bp sequences with up to three mismatches from the template, with mismatches restricted to the nine most PAM-proximal nucleotides. GC content was restricted to 40–70%. This resulted in 75,626 unique target sequences. To determine the number of alignments for each target, we outputted each gRNA + PAM into a FASTA file and ran bowtie2 with ‘-k 1000’ mode, which searches up to 1,000 alignments for each entry in the FASTA, that is, each target sequence.

bowtie2 -k 1000 -f -x [path to genome] -U [path to input FASTA file] -S [path to output SAM file]
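The enumeration of candidate target sequences that feeds this bowtie2 step can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the published code: the function names are hypothetical, and the nine PAM-proximal nucleotides are taken to be the 3′ end of the 20 bp protospacer, as is the case for SpCas9.

```python
from itertools import combinations, product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def variants(protospacer, max_mismatches=3, window=9):
    """Yield every sequence with up to max_mismatches substitutions,
    all placed within the last `window` (PAM-proximal) positions."""
    positions = range(len(protospacer) - window, len(protospacer))
    for n in range(max_mismatches + 1):
        for combo in combinations(positions, n):
            choices = [[b for b in "ACGT" if b != protospacer[p]] for p in combo]
            for subs in product(*choices):
                seq = list(protospacer)
                for p, b in zip(combo, subs):
                    seq[p] = b
                yield "".join(seq)

def candidate_targets(template, length=20, gc_min=0.4, gc_max=0.7):
    """Collect unique GC-filtered variants of every substring on both strands."""
    targets = set()
    for strand in (template, reverse_complement(template)):
        for i in range(len(strand) - length + 1):
            for seq in variants(strand[i:i + length]):
                if gc_min <= gc_fraction(seq) <= gc_max:
                    targets.add(seq)
    return targets
```

Each 20 bp substring yields 2,620 variants before GC filtering and de-duplication (1 + 27 + 324 + 2,268 for zero to three substitutions among nine positions).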

We iterated through all alignments (up to 1,000) for each gRNA, then determined whether each alignment was within a RefSeq gene annotation and the ChromHMM epigenetic labelling60. As HEK293T ChromHMM was not available, we curated ChromHMM annotations from A549 (E114), GM12878 (E116), HeLa-S3 (E117) and K562 (E123), and the final ChromHMM annotation for each target was the consensus of these four annotations. Annotation data were obtained from https://egg2.wustl.edu/roadmap/web_portal/index.html.

Ambiguous read proportions from simulated ChIP–seq reads

For gRNA with 100–300 on-target sites in the genome, we simulated 100 PE 200–600-bp-long (uniform distribution) sequencing reads. The reads were randomly chosen to either span the cut site, reside PAM-distal or reside PAM-proximal to the cut. For PAM-distal or PAM-proximal reads, the distance from the edge of the DNA to the cut site was drawn from an exponential distribution. Both 2 × 36 PE reads and 2 × 75 PE reads were simulated.

The PE reads were outputted to FASTA files (read 1 and read 2), and bowtie2 was used to determine up to ten alignments for each simulated read pair:

bowtie2 -f -p 9 --local -k 10 -X 1000 --no-mixed --no-discordant -x [path to genome] -1 [path to read1] -2 [path to read2] -S [path to output SAM]

The code subsequently determines whether the original position of the read pairs matches the best alignment based on bowtie2, and whether this best alignment has the uniquely best alignment score. The proportion of reads that satisfy these requirements represent the proportion of uniquely best alignments. The proportion of ambiguous alignments is 1 minus this value.
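A minimal Python sketch of this bookkeeping is given below. It assumes that the bowtie2 output has already been parsed into, for each simulated read pair, its true origin and a list of reported alignments with their alignment scores (higher is better); the data layout and function names are hypothetical.

```python
def is_uniquely_best(true_position, alignments):
    """true_position: (chrom, pos); alignments: list of (chrom, pos, score).

    Returns True only if the top-scoring alignment sits at the true
    position and no other alignment ties its score.
    """
    if not alignments:
        return False
    ranked = sorted(alignments, key=lambda a: a[2], reverse=True)
    best = ranked[0]
    untied = len(ranked) == 1 or ranked[1][2] < best[2]
    return untied and (best[0], best[1]) == true_position

def ambiguous_proportion(records):
    """records: list of (true_position, alignments) pairs."""
    n_unique = sum(is_uniquely_best(t, a) for t, a in records)
    return 1 - n_unique / len(records)
```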

Ambiguous read proportions from real ChIP–seq reads

We used all dCas9 binding positions for analysis. For each binding position, we converted PE ChIP–seq reads found within a specified window width centred at the Cas9 binding site into FASTA read 1 and read 2 file formats. Then the section ‘Ambiguous read proportions from simulated ChIP–seq reads’ was followed, starting with use of bowtie2. Window widths of 1,500 bp were used for Cas9 ChIP–seq, and 2,500 bp for MRE11.

Nucleotide composition analysis of region surrounding gRNA on-target sites

The local genomic sequences for each expected on-target site for ‘CT’, ‘GG’ and ‘TA’ gRNAs were obtained, then aligned by the Cas9 cut site (PAM oriented downstream of the cut). At each base-pair position relative to the cut site, the nucleotide was tallied and/or displayed. This analysis was performed ±500 bp from cut sites.
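The per-position tally can be sketched in Python as below. The sketch assumes that the sequences have already been extracted, oriented with the PAM downstream of the cut, and trimmed to the same length (twice the radius); the function name and the returned layout are hypothetical.

```python
from collections import Counter

def composition_by_position(sequences, radius):
    """Tally nucleotides at each offset relative to the cut site.

    Each sequence must have length 2 * radius, with the cut site between
    index radius - 1 (offset -1) and index radius (offset 0).
    """
    counts = [Counter() for _ in range(2 * radius)]
    for seq in sequences:
        if len(seq) != 2 * radius:
            raise ValueError("sequence length must equal 2 * radius")
        for index, base in enumerate(seq.upper()):
            counts[index][base] += 1
    return {index - radius: dict(c) for index, c in enumerate(counts)}
```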

General data pre-processing for ChIP–seq, BLISS and ATAC–seq

Reads were demultiplexed after sequencing using bcl2fastq. PE reads were aligned to hg19 or hg38 using bowtie2. Samtools was used to filter for mapping quality ≥25, remove singleton reads, convert to BAM format, remove potential PCR duplicates and index reads.

Calculating enrichment for MRE11, Cas9, γH2AX and 53BP1 ChIP–seq

We determined the reads per million (RPM) in specific window widths centred at all cut sites. We used a window of 200 kb for both 53BP1 and γH2AX, 2,500 bp for MRE11 and 1,500 bp for Cas9. For MRE11 and Cas9, additional code analyses the exact read positions and determines if a PE sequencing read fragment spans the cut site (‘span’), or if a sequenced DNA fragment begins within 5 bp from the cut site (‘abut’). To determine ‘dist + 4’, ‘dist − 4’, ‘prox + 4’, or ‘prox − 4’, we analysed the DNA fragment position according to the rules specified for these read species.
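The window-based RPM calculation and the ‘span’/‘abut’ labelling can be sketched as follows. This is a simplified illustration rather than the published code: fragments are represented as half-open (start, end) intervals on one chromosome, a fragment is counted if it overlaps the window at all, either fragment end within 5 bp of the cut is treated as ‘abut’, and ‘abut’ takes precedence over ‘span’. All of these choices are assumptions.

```python
def rpm_in_window(fragments, cut, width, total_reads):
    """Reads per million among fragments overlapping a window centred at cut."""
    half = width // 2
    n = sum(1 for start, end in fragments
            if start < cut + half and end > cut - half)
    return n * 1e6 / total_reads

def classify_fragment(start, end, cut, abut_tolerance=5):
    """Label a fragment by its position relative to the cut site."""
    if abs(start - cut) <= abut_tolerance or abs(end - cut) <= abut_tolerance:
        return "abut"
    if start < cut < end:
        return "span"
    return "other"
```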

Enrichment profiles for MRE11 and Cas9 ChIP–seq (also spanning ATAC–seq) at base-pair resolution

At each genomic position in a window centred at each cut site, each PE read within this window is retrieved. The number of PE reads that map to each base pair is tallied. The middle region of PE read fragment that is not likely to be sequenced is also included in this tally. We used a window of 2,500 bp for MRE11, 1,500 bp for Cas9 and 3 kb for ATAC–seq.
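A minimal Python sketch of this tally is shown below. It treats each PE read pair as one half-open (start, end) fragment, so the unsequenced middle is counted together with the two mates; the function name and the difference-array implementation are illustrative choices, not taken from the published code.

```python
def fragment_coverage(fragments, cut, width):
    """Per-base fragment coverage in a window of `width` bp around cut.

    Returns a list of length `width`; index 0 corresponds to genomic
    position cut - width // 2.
    """
    win_start = cut - width // 2
    diff = [0] * (width + 1)
    for start, end in fragments:
        lo = max(start, win_start) - win_start
        hi = min(end, win_start + width) - win_start
        if lo < hi:  # fragment overlaps the window
            diff[lo] += 1
            diff[hi] -= 1
    coverage, running = [], 0
    for delta in diff[:width]:
        running += delta
        coverage.append(running)
    return coverage
```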

Enrichment profiles for γH2AX, 53BP1 and ATAC–seq at window widths

To obtain profiles of γH2AX and 53BP1, we calculated the number of sequencing reads (RPM) in each 10 kb window from the cut site, extending to 2 Mb both upstream and downstream of cut sites. For ATAC–seq, we calculated RPM in a 4 bp sliding window incremented every 1 bp, extending to 1.5 kb both upstream and downstream of cut sites.

To determine wider levels of potential ATAC-seq enrichment, we used the same function to calculate RPM in each 1 kb window from the cut site, extending to 50 kb both upstream and downstream of cut sites.
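The fixed-width binning used for these profiles can be sketched in Python as below. The sketch assumes non-overlapping bins and one representative coordinate per read; the function name and arguments are hypothetical, and the 4 bp sliding-window variant is not covered.

```python
def binned_rpm_profile(read_positions, cut, bin_width, span, total_reads):
    """RPM in consecutive bins covering [cut - span, cut + span).

    For example, bin_width=10_000 and span=2_000_000 gives 400 bins.
    """
    n_bins = 2 * span // bin_width
    left = cut - span
    counts = [0] * n_bins
    for pos in read_positions:
        index = (pos - left) // bin_width
        if 0 <= index < n_bins:
            counts[index] += 1
    return [c * 1e6 / total_reads for c in counts]
```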

Genome-wide Cas9 binding from dCas9 ChIP–seq

We used macs2 to find all dCas9 binding peaks, using a no-Cas9 sample for negative control, via the command:

macs2 callpeak -t [path/to/sample] -c [path/to/negctrl] --outdir [path/to/output] --name [name/of/output] -f BAMPE -g hs

Next, for each macs2-discovered peak with fold enrichment ≥4, a custom algorithm attempts to identify the target sequence position for Cas9 binding or cleavage that best explains the peak. This may be problematic for target sites with multiple mismatches. We use the following assumptions to simplify the problem: (1) there is only one correct Cas9 binding/cleavage sequence within the 400 bp window of the macs2-predicted peak centre, and (2) the correct Cas9 binding/cleavage sequence is the one with the fewest mismatches.
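One way to apply these two assumptions is sketched below in Python. This is not the custom algorithm itself: the function names are hypothetical, and requiring an NGG PAM next to each candidate site is an added assumption that the text does not state.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def best_site(window_seq, protospacer, pam_suffix="GG"):
    """Return (mismatches, strand, offset) of the best candidate site.

    Scans both strands of the peak window for positions followed by an
    NGG PAM (assumption) and keeps the one with the fewest mismatches.
    For the '-' strand, offset refers to the reverse-complemented window.
    Returns None if no candidate is found.
    """
    k = len(protospacer)
    best = None
    strands = (("+", window_seq), ("-", reverse_complement(window_seq)))
    for strand, seq in strands:
        for i in range(len(seq) - k - 2):
            if seq[i + k + 1:i + k + 3] != pam_suffix:
                continue
            mismatches = sum(a != b for a, b in zip(protospacer, seq[i:i + k]))
            if best is None or mismatches < best[0]:
                best = (mismatches, strand, i)
    return best
```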

Enrichment measurements of epigenetic markers

Datasets used are indicated in ‘Data availability’. For enrichment, we use a 50 kb radius for RNA-seq, H3K4me1, H3K4me3, H3K9me3, H3K27ac and H3K36me3, a 50 bp radius for DNase I and ATAC–seq, and a 10 bp radius for micrococcal nuclease digestion with deep sequencing (MNase–seq). The number of reads that are found in each specified window width is outputted, normalized by the total RPM.

Machine learning model

We used the random forest regressor from scikit-learn61. For mismatch information, features were obtained from one-hot encoding of mismatch state at each position along the protospacer. For epigenetic information, the RPM enrichment was directly used as features. The predicted output is the level of dCas9 binding or MRE11 enrichment, also measured as RPM. The machine learning model was trained using five-fold cross-validation on a training dataset composed of a random 70% of the total dataset. The remaining 30% was used for evaluation and featured in these figures comparing predicted versus actual values.
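The feature construction and the 70/30 split can be sketched in plain Python as below. The exact encoding is not specified in the text, so a single binary mismatch indicator per protospacer position is an assumption, as are the function names and the fixed random seed; the resulting rows could then be passed to scikit-learn's RandomForestRegressor.

```python
import random

def mismatch_features(protospacer, site_sequence):
    """One binary indicator per position: 1 if the site mismatches the gRNA."""
    return [int(a != b) for a, b in zip(protospacer, site_sequence)]

def build_feature_row(protospacer, site_sequence, epigenetic_rpm):
    """Concatenate mismatch indicators with raw RPM enrichment values."""
    return mismatch_features(protospacer, site_sequence) + list(epigenetic_rpm)

def split_70_30(rows, seed=0):
    """Randomly assign 70% of rows to training and 30% to evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```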

ATAC–seq read length distributions

For each PE ATAC–seq read fragment in a 3 kb window centred at all Cas9 on-target sites, its length was recorded. The distribution of DNA length across all target sites, along with exponential decay curve fitting, was computed in Microsoft Excel.

Statistics and reproducibility

ChIP–seq, ATAC–seq, amplicon sequencing and BLISS experiments were performed in biological replicates. No statistical method was used to pre-determine sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
