Single-cell epigenomic reconstruction of developmental trajectories from pluripotency in human neural organoid systems

The use of human embryonic stem (ES) cells for the generation of brain organoids was approved by the ethics committee of northwest and central Switzerland (2019-01016) and the Swiss federal office of public health. Under the Swiss Human Research Act, research performed with fully anonymized human specimens does not require an institutional review for research as long as consent was approved in the first place.

Cell and organoid culture

Cell lines used in the study were derived from different sources:

409b2-iCRISPR (female), originally from the RIKEN BRC cell bank, modified by the S. Pääbo laboratory51

HOIK1 (female), HipSci resource52

WIBJ2 (female), HipSci resource52

WTC (GM25256) (male), Coriell Institute53

B7 (female), B. Roska laboratory54

For culturing, cells were grown on Matrigel (Corning, 354277) coated six-well dishes in mTeSR Plus (STEMCELL Technologies, 100-0276) supplemented with penicillin–streptomycin (1:200, Gibco, 15140122). To propagate the cells, they were dissociated with TryplE (Gibco, 12605010) or EDTA in DPBS (final concentration 0.5 mM) (Gibco, 15575020) and kept on Rho-associated protein kinase (ROCK) inhibitor Y-27632 (final concentration 5 μM, STEMCELL Technologies, 72302) for 1 d. Cells were stored in liquid nitrogen in mFreSR (STEMCELL Technologies, 05855) and tested for mycoplasma (Venor GeM Classic, Minerva Biolabs) after each thawing cycle.

To generate brain organoids, cells were grown to a confluency of approximately 50% and dissociated with TryplE. In total, 2,000–3,000 cells were aggregated in 96-well ultra-low attachment plates (Corning, CLS7007) to form embryoid bodies (EBs). We followed an unguided protocol to obtain brain organoids55, with a few modifications. EBs were aggregated and cultured in mTeSR Plus, and neural induction medium was added when the EBs had reached around 400–500 µm (usually day 5). Retinoic acid containing neural differentiation medium was added only from day 40 onward37. Cerebral organoids were grown shaking in 6-cm dishes for up to 8 months. To generate retinal organoids, we applied a protocol that allows the simultaneous aggregation of hundreds of EBs in agarose molds54. After neural induction, these are transferred (1 week) to Matrigel-coated six-well plates and allowed to attach. The neural induction medium used in both protocols is highly similar, allowing comparability of the neuroepithelium stage (both media contain DMEM/F12 (Gibco, 31331‒028), 1× N2 supplement (Gibco, 17502‒048), 1% NEAA solution (Sigma-Aldrich, M7145) and 2 mg ml−1 heparin (Sigma-Aldrich, H3149‒50KU) (1 µl ml−1 for brain organoids); in case of brain organoids, 1% GlutaMAX was added). After 2 weeks, neural differentiation medium was added, and retinal structures were scraped off after 4 weeks.

Drug treatment

We targeted the H3K27me3-reader EED56 (A395, MedChemExpress, HY-101512) that prevents allosteric activation of the PRC2 complex. For A395, concentrations between 1 µM and 10 µM were tested, and H3K27me3 was depleted in bulk experiments at 1 µM as shown by western blot. EED was inhibited from days 0–15 (experiment 1) and days 0–18 (experiment 2), by adding the inhibitor when seeding the EBs. This time window coincides with the formation of the neural epithelium. The medium was changed every other day, and a fresh dose of inhibitors was addded. We treated two different cell lines (HOIK1 and WIBJ2). We used single-nucleotide polymorphisms (SNPs) to assign the cells bioinformatically to each cell line. Full blinding of the experiments on inhibitor-treated organoids was not possible due to the phenotypes evident from the development of organoids.

Preparation of single-cell suspensions

Brain organoids were generated in batches. In each batch, five different stem cell lines (409b2-iCRISPR, B7, WTC, HOIK1 and WIBJ2) were used and dissociated together. The cell lines were demultiplexed using SNPs. For retinal organoids, only B7 was used. We sampled developmental transitions from the pluripotent EB (day 5) and neuroepithelium (day 15) to neuronal differentiation in brain (days 35, 60, 120 and 240) and in retina organoids (days 45 and 85). Organoids were cut into pieces using a scalpel and thoroughly washed with HBSS buffer without Ca2+/Mg2+ (STEMCELL Technologies, 37250).

A papain-based neural dissociation kit (Miltenyi Biotec, 130-092-628) was used to obtain single-cell suspensions. Then, 1,900 µl of pre-warmed buffer X with 50 µl of Enzyme P was added to the organoids and incubated for 15 min at 37 °C. A mix of 20 µl of buffer Y and 10 µl of DNase was added to each sample before tituration (10 times with a p1000 wide bore tip). The samples were then incubated twice for another 10 min and titurated with a p1000 and a p200 in between.

The reaction was stopped with HBSS buffer without Ca2+/Mg2+, and the cells were filtered at 30 µm. After an additional wash, the cells were stored in CryoStor CS10 (STEMCELL Technologies, 07930) or processed for further experiments. For all scCUT&Tag experiments, cells were kept from the same cell suspension to perform scRNA-seq. Cell viabilities were between 80% and 95%.

We performed the data collection on independent organoid batches from different cell lines. We demultiplexed them based on SNPs and treated them, therefore, as replicates. For day 60, we processed independent biological replicates, and, for day 120, we used the same cell suspension as a technical replicate. An overview of all the experiments is shown in Extended Data Fig. 1c. No statistical methods were used to pre-determine sample sizes, but our sample sizes are similar to those reported in previous publications17,18,19. No data points were excluded from the analysis. Organoids for each timepoint were picked blinded. First computational analysis and inspection of the data were performed blinded.

Preparation of single-cell suspensions from human fetal brain

Human fetal brain tissue (19 gw) was obtained after elective pregnancy termination and informed written maternal consent from Advanced Bioscience Resources. The donor consented to the research use of the tissue without restrictions. The tissue was fully anonymized, and all experiments were performed in accordance with relevant guidelines and regulations. The estimated age of the fetus was calculated using clinical information, such as the last menstrual period and anatomical data obtained through ultrasound measurements. Dissected fetal brains were kept in DMEM + antibiotics on wet ice until preparation of the single-cell suspension for up to 48 h. Single-cell suspensions were prepared following the same protocol as for organoids. At this developmental timepoint, the cortex has already expanded and comprises the majority of the cells in the population. We tried to enrich cortical material by separating larger pieces from the material. The resulting single-cell suspensions were cryopreserved until further use. Nuclei suspensions for 10x multiome experiments were prepared following the scCUT&Tag protocol, and the nuclei were processed following the manufacturers’ instructions.

Cloning and purification of Tn5

Plasmids no. 123461 (pA/G-MNase) and no. 124601 (3XFlag-pA-Tn5-Fl) were ordered from Addgene. Protein A and Protein G were amplified using the primer pairs (FZ461_ProtA_rev, FZ462_HindIII_ProtA_fw and FZ459_EcoRI_ProtG_rev, FZ460_ProtG_fw) and fused by polymerase chain rection (PCR) (below). Protein A in the original vector (3×Flag-pA-Tn5-Fl, Addgene, 124601 (ref. 57)) was then replaced with the fusion product through EcoRI and HindIII restriction digest.

The final plasmid was transformed into chemically competent Rosetta cells to express the protein. The bacteria were grown to an optical density at 600 nm (OD600) of 0.4–0.6; expression was induced with 0.25 mM IPTG; and the protein was expressed at 18 °C overnight. Cells were harvested and stored at −80 °C until further processing. The purification was performed on Chitin resin (New England Biolabs, S6651S) as described58, with small modifications. The cells were lysed using the Diagenode Bioraptor Plus at the high setting for 15 cycles, 30 s on/30 s off.

After dialysis and concentration using Amicon Ultra-4 Centrifugal filters (Millipore, UFC803024), the protein was diluted to 50% glycerol final and loaded with adapters before use (below).

FZ459_EcoRI_ProtG_revgaattctttatcgtcatctacggctggcgtcaactcagacgcg

FZ460_ProtG_fwaaaaagctaaacgatgctcaagcaccaaaaacaacttataaattagtcatcaacggg

FZ461_ProtA_revaatttataagttgtttttggtgcttgagcatcgtttagctttttagcttctgc

FZ462_HindIII_ProtA_fwccaagcttaaaagatgacccaagccaaagtgctaacc

FZ444_Tn5MErev[phos]CTGTCTCTTATACACATCT

FZ445_Tn5ME-ATCGTCGGCAGCGTCAGATGTGTATAAGAGACAG

FZ445_Tn5ME-BGTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG

scCUT&Tag

Starting from 1.5–3 Mio cells, nuclei were isolated following the 10x Genomics CG000365 Demonstrated Protocol. The 0.1× buffer was used for all experiments, adjusting the final concentration of Digitonin to 0.01% (Thermo Fisher Scientific, BN2006). In general, we followed the bulk CUT&Tag protocol, with a few adjustments57.

After lysis, the nuclei were directly transferred into scCUT&Tag wash buffer (20 mM HEPES (pH 7.5) (Jena Bioscience, CSS-511), 150 mM NaCl (Sigma-Aldrich, S6546), 0.5 mM Spermidine (Sigma-Aldrich, S0266), 1% BSA (Miltenyi Biotec, 30-091-376), 1 mM DTT (Sigma-Aldrich, 10197777001), 5 mM sodium butyrate (Sigma-Aldrich, 303410), Roche Protease Inhibitor (Sigma-Aldrich, 11873580001)) and washed once for 3 min at 300g. The buffer was supplemented with 2 mM EDTA before the antibodies were added (see Supplementary Table 21 for further details). The samples were incubated on a rocking platform at 4 °C overnight.

The next morning, the cells were washed once, and the secondary antibody was added to the suspension for 1 h at 20 °C on a rocking platform or Eppendorf thermomixer. The cells were then washed twice and transferred into scCUT&Tag med buffer (20 mM HEPES (pH 7.5) (Jena Bioscience, CSS-511), 300 mM NaCl (Sigma-Aldrich, S6546), 0.5 mM Spermidine (Sigma-Aldrich, S0266), 1% BSA (Miltenyi Biotec, 130-091-376), 1 mM DTT (Sigma-Aldrich, 10197777001), 5 mM sodium butyrate (Sigma-Aldrich, 303410), Roche Protease Inhibitor (Sigma-Aldrich, 11873580001)) including 2 µg of homemade Tn5. The cells were incubated for 1 h at 20 °C and then washed again twice with scCUT&Tag med buffer. To induce the cutting of the Tn5, 10 mM final MgCl2 (Sigma-Aldrich, M1028) was added, and the sample was incubated for 1 h at 37 °C. After the incubation, the reaction was stopped with 15 mM EDTA final, and the sample was filled up to 600 µl with diluted nuclei buffer of the 10x Genomics scATAC kit v1.1 supplemented with 2% BSA. The nuclei were filtered through a 40-µm Flowmi filter (Sigma-Aldrich, BAH136800040) and washed with diluted nuclei buffer supplemented with 2% BSA.

The final nuclei suspension was quality controlled and counted with a trypan blue assay on the automated cell counter Countess (Thermo Fisher Scientific). Finally, 15,000–20,000 nuclei were loaded per experiment. In cases where fewer nuclei were recovered, all nuclei were loaded. To run the Chromium Chip, 5 µl of cell suspension was mixed with 3 µl of PBS and 7 µl of ATAC buffer from the kit.

Libraries were prepared following the manufacturer’s instructions, except that two PCR cycles were added to the barcoding PCR, and, after 10 cycles of indexing PCR, 5 µl of the library was used to determine the final number of cycles in a Roche LightCycler59. Usually, scCUT&Tag libraries required 12–16 PCR cycles during the indexing. After SPRIselect clean-up (Beckman Coulter, B23318), the libraries were quality controlled and sequenced following the 10x Genomics scATAC v1.1 sequencing recommendations. Usually, 50–100 Mio reads per library were enough to cover the complexity.

H3K27me3 rabbit, Diagenode, C15410195, A0824D

H3K27ac rabbit, Diagenode, C15410196, A1723-0041D

H3K27ac mouse, monoclonal (MABI0309), GeneTex, GTX50903

H3K4me3 rabbit, Diagenode, C15410003, A1052D

H3 mouse, Active Motif, 39763, 20418023

β-Catenin mouse, 1:5,000, BD Biosciences, 610154

Guinea pig anti-rabbit antibodies online, ABIN101961

Alexa Fluor–conjugated antibodies, Thermo Fisher Scientific

HRP-conjugated antibodies, Jackson ImmunoResearch

Bulk CUT&Tag

Starting with 0.1–1 Mio cells after dissociation, cells were transferred into CUT&Tag wash buffer (20 mM HEPES (pH 7.5) (Jena Bioscience, CSS-511), 150 mM NaCl (Sigma-Aldrich, S6546), 0.5 mM Spermidine (Sigma-Aldrich, S0266), 5 mM sodium butyrate (Sigma-Aldrich, 303410), Roche Protease Inhibitor (Sigma-Aldrich, 11873580001)). After 15 µl of BioMag, Concanavalin A beads (Polysciences, 86057-3) in binding buffer (20 mM HEPES (pH 7.5), 10 mM KCl, 1 mM CaCl2, 1 mM MnCl2) were added to the sample and incubated on the wheel for 15 min at room temperature. Subsequently, the cells were collected on a magnet and lysed through the addition of CUT&Tag wash buffer supplemented with 0.01% Digitonin. Lysis was monitored under a microscope with trypan blue staining. After lysis was complete, the nuclei were washed again with CUT&Tag wash buffer. If possible, all samples were split, and H3 or another chromatin mark CUT&Tag was performed on the same starting material, to be used as normalizer. The antibody was added together with 2 mM EDTA final, and the sample was incubated on a rocking platform at 4 °C overnight.

The samples were washed once with CUT&Tag wash buffer, and the secondary antibody was added to the reaction and incubated for 1 h at 20 °C on a rocking platform. After two additional washes, the Tn5 was added (1:100) in CUT&Tag med buffer (20 mM HEPES (pH 7.5) (Jena Bioscience, CSS-511), 300 mM NaCl (Sigma-Aldrich, S6546), 0.5 mM Spermidine (Sigma-Aldrich, S0266), 5 mM sodium butyrate (Sigma-Aldrich, 303410), Roche Protease Inhibitor (Sigma-Aldrich, 11873580001)). Tn5 was allowed to bind for 1 h at 20 °C on a rocking platform. After two additional washes, the cutting was induced through the addition of 10 mM MgCl2 in CUT&Tag med buffer. After 1 h at 37 °C, the reaction was stopped by adding a final of 20 mM EDTA, 0.5% SDS and 10 mg of Proteinase K. The reaction was then incubated at 55 °C for 30 min and finally inactivated at 70 °C for 20 min.

The DNA fragments were purified using the ChIP DNA Clean & Concentrator Kit (Zymo Research, D5205). For the elution from the columns 2 pg of Tn5-digested and purified lambda DNA (New England Biolabs, N3011S) were added to be used as spike-in normalizer for later analysis, when needed. Purified fragments were indexed for 15 cycles (1 × 5 min at 58 °C, 1 × 5 min at 72 °C, 1 × 30 s at 98 °C, 14 × 10 s at 98 °C, 30 s at 63 °C, 1 × 1 min at 72 °C, ∞ at 4 °C) using NEBNext HighFidelty 2× PCR Master Mix (New England Biolabs, M0541S) and Illumina i5 and i7 indices59. The libraries were then purified using AMPure beads (Beckman Coulter, A63881), measured and quality controlled with Qubit DNA HS Assay (Thermo Fisher Scientific, Q32854) and on a TapeStation (Agilent, 5067-4626) and then sequenced (PE, 2 × 50 bp).

Hashing and scRNA-seq

Cells were either processed right after dissociation or recovered after cryostorage. To recover the cells after cryostorage, the cryovials were incubated in a water bath at 37 °C until only a small ice piece was left inside the tube. The cells were then transferred into pre-warmed DMEM/F-12 (Gibco, 31330038) supplemented with 10% FBS final (Merck, ES-009-B). After washing twice with DPBS (Gibco, 14190144) supplemented with 0.5% BSA, the cells were filtered with a 40-µm Flowmi filter and counted using the trypan blue assay on the automated cell counter Countess (Thermo Fisher Scientific). Cell viability was approximately 80–95%. Cells were further diluted to be processed in the 10x Genomics single-cell RNA expression v3.1 assay, strictly following the manufacturer’s guidelines.

To generate single-cell RNA expression libraries of the drug treatment, we turned to cell hashing60 for better comparability of the samples. For hashing, 100,000–300,000 cells were resuspended in 50–100 µl of DPBS + 0.5% BSA. Then, 5 µl of Human TruStain FcX (Fc Receptor Blocking Solution, BioLegend, 422302) was added to the sample and incubated for 10 min on ice. Next, 2 µl of TotalSeq Cell hashing antibodies (BioLegend) was added to the cells and incubated for 30 min on ice with gentle agitation every 10 min.

Replicate 1:

HTO1 - GTCAACTCTTTAGCG - DMSO_ctrl

HTO3 - TTCCGCCTCTCTTTG - 1 µM A395

HTO4 - AGTAAGTTCAGCGTA - 3 µM A395

Replicate 2:

HTO1 - GTCAACTCTTTAGCG - DMSO_ctrl

HTO5 - AAGTATCGTTTCGCA - 1 µM A395

HTO7 - TGTCTTTCCTGCCAG - 3 µM A395

HTO8 - CTCCTCTGCAATTAC - 10 µM A395

After incubation, cells were washed twice with DBPS + 0.5% BSA. Depending on the starting amount, the cells were resuspended in 20–40 µl of DPBS + 0.5% BSA and counted. Then, the cell suspensions were mixed and processed in the 10x Genomics single-cell RNA expression v3.1 assay, strictly following the manufacturer’s guidelines with slight adjustments. A maximum of 20,000 cells were targeted per experiment, and HTO additive primer was added to the cDNA synthesis following the TotalSeq technical protocol (https://www.biolegend.com/en-us/protocols/totalseq-a-antibodies-and-cell-hashing-with-10x-single-cell-3-reagent-kit-v3-3-1-protocol). To generate gene expression and hashing libraries, we followed the protocol CG000206 Chromium Next GEM SingleCell v3.1 Cell Surface Protein and sequenced according to the manufacturer’s guidelines.

Western blot

Next, 1–3 organoids were directly collected into 50 µl of Laemmli sample buffer and homogenized with an electric grinder (Fisherbrand, 12-141-368). DNA was sheared by sonication in the Diagenode Bioraptor Plus (high setting for 15 cycles, 30 s on/30 s off). Samples were subsequently run on SDS-PAGE and transferred to PVDF membrane using Wet-Blot. Then, 2–10 µl of extract was loaded per lane. The ECL signal was recorded using the iBright system (Invitrogen). The signal was compared to antibody stainings of loading controls (H3 and β-Catenin), and the membranes were quality controlled by Ponceau (Sigma-Aldrich, P7170-1L) staining. See Supplementary Table 21 for details on the antibodies.

Immunostaining

Organoids were fixed overnight in 4% paraformaldehyde. The next day, the organoids were washed three times for 5 min with DPBS and then transferred into 30% sucrose in DPBS until they sank to the bottom of the tube. Then, the organoids were transferred into cryomolds (Sakura, 4565) and embedded in Tissue-Tek O.C.T. (Sakura, 16-004004) on dry ice. The organoids were sliced on a Cryostar NX70 (Thermo Fisher Scientific) into 20-µm-thick slices at −17 °C. The slices were transferred to glass slides and washed with PBS. After a quick wash, antigen retrieval was performed for 20 min at 70 °C in 1× pre-heated HistoVT One (Nacalai, 06380). Slides were washed three times for 5 min with PBS + 0.2% Tween and then transferred to blocking and permeabilization (PBS, 0.1% Triton, 5% serum, 0.2% Tween, 0.5% BSA) for 1 h. The antibodies were added to the blocking solution overnight at 4 °C (see Supplementary Table 21 for further details). The next day, the slides were washed again three times for 5 min with PBS + 0.2% Tween, and the secondary antibody was added in PBS supplemented with 2% BSA and 0.2% Tween for 2 h at room temperature. Last, the slides were washed again three times for 5 min with PBS + 0.2% Tween, adding DAPI to the last wash. The slides were then mounted in Prolong Glass Antifade and imaged on a Nikon Ti2 spinning disk or a confocal Zeiss LSM 980 microscope.

Data processing for scRNA-seq

To compute transcript count matrices, sequencing reads were aligned to the human genome and transcriptome (hg38, provided by 10x Genomics) by running Cell Ranger (version 5.0.0) with default parameters. Count matrices were then pre-processed using the Seurat R package (version 3.2)61.

Cells were filtered by unique molecular identifier (UMI), number of detected genes and fraction of mitochondrial genes as follows:

UMIs > 2,000

UMIs < 1.5 × 105

detected genes > 1,000

fraction of mitochondrial reads < 0.2

Transcript counts were normalized by the total number of counts for that cell, multiplied by a scaling factor of 10,000 and subsequently natural-log transformed (NormalizeData()).

Pre-processing and clustering of scCUT&Tag data

We aligned the sequencing reads to the human genome and transcriptome (hg38, provided by 10x Genomics) using Cell Ranger ATAC (version 1.2.0) with default parameters to obtain fragment files and peak calls. The fragment files and the peak count matrices were further pre-processed using Seurat (version 3.2)62 and Signac (version 1.1)61. We removed cells with fewer than 200 (H3K27ac and H3K4me3) or 100 (H3K27me3) fragment counts from the analysis. For quality control, we checked the following metrics using Signac: the transcription start site (TSS) enrichment score (TSSEnrichment()), in particular for activating and promoter marks; the nucleosome signal (NucleosomeSignal()); the percentage of reads in peaks; and the ratio of reads in genomic blacklist regions.

We then created a unified set of peaks from the union of peaks from all samples by merging overlapping and adjacent peaks. The unified set of peaks was re-quantified for each sample using the fragment file (FeatureMatrix()). Peak counts were normalized by term frequency-inverse document frequency (tf-idf) normalization using the Signac functions RunTFIDF(). Latent semantic indexing (LSI) was performed by running SVD (RunSVD()) on the tf-idf-normalized matrix. To visualize the data in two dimensions (2D), UMAP12 was performed on LSI components 2–30. We then called high-resolution clusters using Louvain clustering in each group separately with the following resolutions to obtain similar cluster sizes:

EB: 2

Mid: 5

Late: 10

8 months: 5

Retina: 5

Demultiplexing

We used demuxlet63 to demultiplex cells pooled from different stem cell lines. For B7 and 409B2-iCRISPR, SNPs were called using bcftools based on DNA sequencing52,64 data or downloaded from the HipSci website (HOIK1 and WIBJ2) and the Allen Cell Atlas (WTC). All files were merged using bcftools, and sites with the same genotypes in all samples were filtered out. Demuxlet was run with default settings. For RNA, cells with ambiguous or doublet assignments were removed from the data. Otherwise, the best singlet assignment was considered the lines’ genotype.

Integration and annotation of scRNA-seq data

First, we grouped the dataset into five groups depending on the sample of origin:

EB: 4-day-old brain organoids in EB stage

Mid: 15-day-old brain organoids in neuroepithelium stage

Late: 1–4-month-old brain organoids

8 months: 8-month-old brain organoids

Retina: 6-week-old and 12-week-old retina organoids

Initial integration was based on mid and late groups. We computed the 2,000 most variable features using the Seurat function FindVariableFeatures() and computed cell cycle scores using the Seurat function CellCycleScoring(). Subsequently, the data were z-scaled; cell cycle scores were regressed out (ScaleData()); and principal component analysis (PCA) was performed using the Seurat function RunPCA() based on variable features. We used the first 10 principal components (PCs) to integrate the different timepoints in the dataset using the Cluster Similarity Spectrum (CSS) method65. The missing samples EB, retina and 8 months were then projected into CSS space using the css_project() function. To obtain a 2D representation of the data, we performed UMAP12 using RunUMAP() with spread = 0.8, min.dist = 0.2 and otherwise default parameters.

To annotate the data, we first called high-resolution clusters using Louvain clustering in each group separately with the following resolutions to obtain similar cluster sizes:

EB: 1

Mid: 2

Late: 5

8 months: 2

Retina: 2

Clusters were annotated with cell types and regional identities using VoxHunt66, comparison to reference datasets13,14 and marker expression.

Hashing

Count matrices were generated with CITE-seq-Count (version 1.4.5 running under Python version 3.6.13) and intersected with transcript matrices from Cell Ranger. The hashtag counts were normalized with centered log-ratio (CLR) transformation. Doublets were filtered out using hashtag information.

Calculation of gene activity scores

To enable comparison of gene expression with chromatin modifications in the same feature space, we computed gene activity scores for each gene and chromatin modality. We summarized fragment counts over the gene body plus an extended promoter region (+2 kb). For this, we used the Signac function GeneActivity() with default parameters. Fragment counts representing gene activities were subsequently log normalized with a scaling factor of 10,000.

Annotation of peak regions

To obtain genomic annotations for peak regions from all chromatin modalities, we used the function annotatePeak from the R package CHIPseeker67 with default parameters.

Matching of scRNA and scCUT&Tag

To integrate cell populations between RNA and chromatin modalities, we matched high-resolution clusters based on the correlation of gene expression with gene activity scores. For this, we performed MCMF bipartite matching between the modalities as described in https://github.com/ratschlab/scim (ref. 21). The function get_cost_knn_graph() was used with knn_k = 10, null_cost_percentile = 99 and capacity_method = ‘uniform’. As a distance metric (knn_metric), we used the correlation distance provided by scipy68 for activating marks H3K27ac and H3K4me3 and the negative correlation distance for the repressive mark H3K27me3. Unmatched clusters from either modality were matched based on maximum (or minimum) correlation.

Data representation

In all box plots, the center line denotes the median; boxes denote lower and upper quartiles (Q1 and Q3, respectively); whiskers denote 1.5× the interquartile region below Q1 and above Q3; and points denote outliers. All error bars shown in the manuscript depict the standard deviation. For bar plots, the absolute numbers are given within the plot or the legend.

Graph representation of regional diversification

To visualize differentiation trajectories into regionalized neuronal populations, we constructed a graph representation based on terminal fate probabilities. For this, we first obtained count matrices for the spliced and unspliced transcriptome using kallisto (version 0.46.0)69 by running the command line tool loompy fromfastq from the Python package loompy (version 3.0.6)(https://linnarssonlab.org/loompy/). We subset the dataset to cell populations on the neuronal trajectory from pluripotency (PSCs, neuroepithelium, NPCs and neurons) and computed RNA velocity using scVelo (version 0.2.4)22 and scanpy (version 1.8.2)70. First, 2,000 highly variable features were selected using the function scanpy.pp.highly_variable_genes(). Subsequently, moments were computed in CSS space using the function scvelo.pp.moments() with n_neighbors = 20. RNA velocity was calculated using the function scvelo.tl.velocity() with mode = ‘stochastic’, and a velocity graph was constructed using scvelo.tl.velocity_graph() with default parameters. To order cells in the developmental trajectory, a root cell was chosen randomly from cells of the first timepoint (EB), and velocity pseudotime was computed with scvelo.tl.velocity_pseudotime(). The obtained velocity pseudotime was further rank transformed and divided by the total number of cells in the dataset. Based on the velocity pseudotime, we computed fate probabilities into the following manually annotated terminal cell states: dorsal telencephalon neurons, diencephalon neurons, midbrain neurons, hindbrain neurons and retinal ganglion cells. For this, we used CellRank (version 1.3.0)23. A transition matrix was constructed with a palantir kernel (PalantirKernel()) based on velocity pseudotime. Absorption probabilities for each of the pre-defined terminal states were computed using the GPCCA estimator. Based on the computed fate probabilities, we next constructed a graph representation. We used PAGA to compute the connectivities between clusters (scvelo.tl.paga()) and summarized transition scores for each of the clusters. To find branch points at which the transition probabilities into different fates diverge, we constructed a nearest-neighbor graph between the high-resolution clusters based on their transition scores (k = 10). We further pruned the graph to only retain edges going forward in pseudotime—that is, from a node with a lower velocity pseudotime to a node with a higher velocity pseudotime. Additionally, we removed edges connecting different regional trajectories. The resulting graph is directed with respect to pseudotemporal progression and represents a coarse-grained abstraction of the fate trajectory, connecting groups of cells with both similar transition probabilities to the different trajectories and high connectivities on the transcriptomic manifold.

Differential peak analysis between regional identities

To find peaks with differential enrichment in regional trajectories, we performed differential peak analysis for each chromatin modality. We fit a GLN with binomial noise and logit link for each peak i on binarized peak counts Y with the total number of fragments per cell and the region label as the independent variables:

Y~n_fragments+region

In addition, we fit a null model, where the region label was omitted:

Y~n_fragments

We then used a likelihood ratio test to compare the goodness of fit of the two models using the lmtest R package (version 0.9) (https://cran.r-project.org/web/packages/lmtest/index.html). Multiple testing correction was performed using the Benjamini–Hochberg method.

Analysis of bivalent and switching peaks

To find genomic regions that were marked by both H3K27me3 and H3K4me3 (bivalency), we extended all peaks of both modalities by 2 kb in both directions before intersecting them. For all intersecting peaks, we aimed to find instances where bivalency is resolved during regional diversification. For this, we selected matching peaks that showed differential enrichment in any region with opposite effect sizes. Out of these, we further selected matching peaks where both modalities were detected in more than 5% of cells in any high-resolution cluster in the neuroepithelial stage (bivalent in neuroepithelium). We performed an analogous analysis for H3K27me3 and H3K27ac to find regions where switching occurs upon regional diversification. Here, we selected matching peaks from both modalities for which only one mark was detected in the neuroepithelial stage (>5% detection in any high-resolution cluster).

Embedding and clustering of regulatory regions

From peaks detected in all chromatin modalities, we aimed to reveal and stratify regulatory regions with distinct detection patterns across the developmental timecourse. We first applied rigorous quality control to all peaks in all modalities and retained only peaks that were detected in more than 50 high-resolution clusters and detected in more than 10% of cells in at least one cluster. We then merged peaks from all modalities through intersection and represented them by the detection rate of peaks in high-resolution clusters from all individual modalities. This resulted in a region × cluster matrix, which was z-scaled, and the number of detected clusters was regressed out using the Seurat function ScaleData(). Next, we performed PCA and UMAP embeddings, and Louvain clustering was performed based on 20 PCs. The class of regions was determined through the combination of chromatin marks that were detected in this region across the entire dataset. Regions were labeled as ‘promoters’ if they were within 2 kb of the TSS of a gene and as ‘distal’ if they were farther than 3 kb from a gene body.

Functional enrichment of regulatory regions

Our previous analyses revealed brain region-specific peaks (top 50 differentially enriched peaks per region) and distinct clusters or regulatory regions. To understand if clusters of regulatory regions captured distinct functional enrichments or transcription factor binding sites, we performed functional enrichment with GREAT as well as transcription factor motif enrichment. GREAT enrichment was performed using the R package rGREAT71. We performed the analysis using the local implantation with the ‘GO:BP’ gene set based on TxDB hg38 gene definitions (https://bioconductor.org/packages/release/data/annotation/html/TxDb.Hsapiens.UCSC.hg38.knownGene.html). For transcription factor motif enrichment, we first discovered motifs in each peak using motifmatchr (version 1.14)72 through the Signac function FindMotifs(). Next, we performed a two-sided Fisherʼs exact test to test for differential enrichment of peaks versus background. P values from both analysis were multiple testing corrected using the Benjamini–Hochberg method. As a background for both analyses, we used the combined set of regulatory regions that passed filtering criteria.

Reconstruction of the telencephalic neuron differentiation trajectory from pluripotency

To reconstruct the differentiation trajectory leading up to telencephalic neurons in higher resolution, we first extracted all cells annotated as EB, neuroepithelium, telencephalic progenitors and neurons. We next sought to compute a pseudotime describing the progression along this trajectory for all modalities separately. For all chromatin modalities, we used LSI components 2–10 to compute diffusion maps with the R package destiny35. Ranks along the first diffusion component were used as a pseudotemoral ordering. For RNA, we used the function scvelo.tl.velocity_pseudotime() from scVelo22 to compute a pseudotime based on RNA velocity. To obtain an even distribution of timepoints for all modalities, we next subsampled the trajectory for each timepoint group to the lowest cell number in any modality but a minimum of 100. The subsampled trajectory was then stratified into 50 bins of equal cell number or 20 bins in case of neurogenesis trajectories.

Distribution of chromatin states during differentiation pseudotime

To assess how genomic regions change chromatin states during differentiation, we first selected regions where peaks of the three marks were detected in more than 5% of cells in any pseudotime bin. For each mark, we further determined a detection threshold by computing the median detection for all peaks in these regions in all pseudotime bins. For each modality and chromatin bin, we defined regions with peaks above this detection threshold as detected. If a region was marked by H3K27me3 and H3K4me3 in the same bin, they were annotated as bivalent, and, if they were marked by H3K27me3 and H3K27ac in the same bin, they were annotated as active promoters.

Clustering of pseudotemporal expression patterns

To discover groups of genes with similar pseudotemporal expression patterns, we clustered smoothed expression patterns based on a dynamic time warping distance. For this, genes for clustering were selected by intersecting 6,000 highly variable genes in RNA with genes detected in more than 2% of cells in any pseudotime bin for all chromatin modalities. For these genes, the average log-normalized expression and gene activity was computed for each modality and pseudotime bin. We smoothed the mean expression for each gene’s values using a generalized additive model with a cubic spline (bs = ‘cs’), which was fit using the R package mgcv70. Smoothed expression over pseudotime bins for all modalities was used to compute a dynamic time warping distance between all genes using the R package dtw73, which was further used for k-means clustering with the R package FCPS74. For each of these clusters, we performed GO enrichment against the 6,000 most highly variable genes from RNA. One of these clusters was highly enriched for neuron-related biological processes, and the genes in this cluster were further used in subsequent analyses. Figure 4a shows a subset of the clustering that covers all pseudotemporal patterns. We provide the full list of genes in Supplementary Table 10.

Reconstruction of the neurogenesis trajectory for the 4-month timepoint

To better understand pseudotemoral expression and chromatin modification dynamics during neurogenesis, we sought to further mitigate potential confounding factors, such as distribution of timepoints, data quality and the pseudotime inference procedure. For this, we subset the data to telencephalic NPCs and neurons from the 4-month timepoint and applied the following additional quality filters to remove outliers: H3K27ac: >1,000 peak fragments per cell; H3K4me3: >316 (102.5) peak fragments per cell, <10,000 peak fragments per cell; H3K27me3: <3,162 (103.5) peak fragments per cell. Pseudotime inference was subsequently performed using diffusion maps as described above. For RNA, we re-computed a set of 2,000 variable features, excluding cell cycle genes, further regressed out cell cycle scores (ScaleData()) and performed PCA (RunPCA()). We used the first 20 PCs to compute diffusion maps with the R package destiny35. Analogous to the CUT&Tag data, ranks along the first diffusion component were used as a pseudotemoral ordering. The trajectory was then divided into 10 or 20 bins (depending on the analysis). Supplementary Fig. 3b shows a subset of the clustering that covers the major pseudotemporal patterns. We provide the full list of genes in Supplementary Table 11.

Detection of inflection point for neuronal genes

To examine at which point the CUT&Tag signal at neuronal genes (from clustering analysis described above) changes during neurogenesis relative to their expression, we sought to identify the first pseudotime bin in which the detection rate significantly increases for active histone modifications, chromatin accessibility and RNA expression or decreases for repressive histone modifications (Fig. 4f). For this, we used the neurogenesis pseudotime from the 4-month timepoint divided into 10 bins and defined the first bin as the baseline. For each subsequent bin, we performed differential detection analysis against the baseline using a binomial linear model as described above. Multiple testing correction was performed using the Benjamini–Hochberg method. Based on this, we determined the first bin with false discovery rate (FDR) < 0.05 and log2 fold change > 0.25. We performed a similar analysis in Supplementary Fig. 4i using the multiome data of the 19-gw fetal brain sample. This analysis was performed on all neuronal genes that were selected based on having maximal expression in neurons over NPCs and detection of H3K27ac in more than 5% of cells in any neuronal high-resolution cluster. From this, we computed a ‘pseudotime lag’ as the difference between the first divergent bin in the RNA and chromatin modifications.

Analysis of multiome data of the developing human brain

We aligned the sequencing reads to the human genome and transcriptome (hg38, provided by 10x Genomics) using Cell Ranger Arc (version 2.0.0) with default parameters to obtain fragment files, peak calls and transcript counts. The data were further pre-processed and analyzed using Seurat (version 3.2)62 and Signac (version 1.1)61. Quality control was performed using the following criteria: 1,000–6,000 detected features per cell (RNA), more than 1,500 UMI counts per cell (RNA) and 1,000–50,000 detected peaks per cell (ATAC). Based on RNA expression, we further performed variable feature selection, z-scaling and PCA. The first 20 PCs were used as an input for Louvain clustering and UMAP. The clusters were manually annotated based on expression of marker genes. To specifically analyze the telencephalic neuron trajectory in these data, we extracted clusters corresponding to dorsal telencephalon NPCs, intermediate progenitors and neurons. For this subset of the data, we again performed variable feature selection, z-scaling and PCA and used the first 20 PCs to run UMAP and to compute diffusion maps with destiny35. Ranks along the first diffusion component were used as a pseudotemoral ordering. The velocity pseudotime was further rank transformed and divided by the total number of cells in the dataset.

GRN analysis

To assess how the chromatin modifications shape the GRN during neurogenesis, we first inferred a GRN using Pando based on a dataset with integrated RNA and ATAC modalities over organoid development14. To select genomic regions for GRN inference, we performed region-to-gene linkage using the Signac function Link

留言 (0)

沒有登入
gif