The study was conducted under ethical agreement and permission of the review board in the involved facilities (Riken, Juntendo University, Saitama-Medical University, Chiba Children’s Hospital in Japan). Patients or parents of patients gave written informed consent. All procedures were conducted following the relevant rules and guidelines and in accordance with the Declaration of Helsinki.
In our alliance of multiple hospitals and institute (Juntendo University, Chiba Children’s Hospital, Saitama Medical College Hospital, and Riken) in Japan, we recruited a cohort of 3478 suspected mitochondrial disease patients under an ethical agreement by Institutional Review Board from 2007 to 2024. Patients were of Japanese origin except for one American-Japanese, one Brazilian, one Korean, two Sudanese, and one Vietnamese. Entry and molecular diagnosis were requested by pediatricians and physicians nationwide in Japan with signed informed consents from patients or their parents, motivated by clinical diagnoses of suspected mitochondrial diseases based on clinical characteristics suggestive of mitochondrial diseases, laboratory tests results such as elevated pyruvate or lactate levels in blood and/or cerebrospinal fluid, suggestive pathological findings in muscle biopsy, or compatible imaging abnormalities by brain computed tomography (CT) or magnetic resonance imaging (MRI). Assessments were done in candidate cases by panel sequencing on a Human Gene Mutation Database (HGMD (Supplementary Note 1))-based mitochondrial disease-related in-house list of genes, whole exome sequencing, or whole genome sequencing as well as biochemical evaluation of the oxygen consumption rate (OCR) and enzyme complex activity assays. Of the 2932 undiagnosed patients, 1169 were biochemically proven to have mitochondrial dysfunctions either in OCR or enzyme complex activity assays. As part of our strategy to diagnose such undiagnosed mitochondrial disease cases, 400 cases in total (303 (75.8%) of which were biochemically proven) were investigated by RNA sequencing-based search for candidate causative genes using “OUTRIDER”.
In the undiagnosed mitochondrial disease cohort of 2932 patients, we defined “mitochondrial encephalopathy” patients for this study by filtering our database by clinician’s diagnosis of any kind of “encephalopathy”, “Leigh disease”, or “mitochondrial cytopathy” referring to mitochondrial encephalo-myopathies, but lacking molecular diagnosis after either panel sequencing, whole exome sequencing, or whole genome sequencing. We listed the patients in chronological order and grouped them into two groups because of preparation timing (Group1, n = 314; Group2, n = 147) (We screened the Group1 for the subsequent ultralong PCR and long-read amplicon sequencing) (Table 1). We conducted RP-PCR for both groups along the protocol described in the Methods section. As the controls for the RNA analyses, including NET-CAGE, we utilized data of an NAXE-related mitochondrial encephalopathy patient (Pt2659) verified by panel sequencing to have a novel splicing variant (NM_144772.3:exon6:c.402+3_402+6del) and a known missense variant (NM_144772.3:exon6:c.733 A > C:p.Lys245Gln) in NAXE. Pt2659 was also analyzed by amplicon long-read sequencing as well as western blotting. The two variants were proven to be in different alleles by long-read sequencing (data not shown) and in state of compound heterozygosity. Pt2359 (homozygous for GGGCC repeat expansion) and Pt2659 (compound heterozygous for splicing and missense variants), both having NAXE-related mitochondrial encephalopathy, were unrelated and not from the same local region. For western blotting, two additional patients, namely Pt1615YS (NDUFA4 deficiency) and Pt1753 (NDUFA8 deficiency), who were negative for NAXE pathogenic variants and had normal expression levels, were used as positive controls as well. The family members (the proband (Pt2359), mother, father, and older brother) of Family A, in which NAXE GGGCC repeat expansion was identified, were all subjected to amplicon long-read sequencing and RP-PCR as well.
Patient fibroblast culture and DNA/RNA isolationPatient skin fibroblasts and control fibroblasts (fetal human dermal fibroblast (CA10605f, HDF-fetal, Toyobo, Japan) as well as neonatal normal human dermal fibroblasts (NHDF, Cell Applications, USA)) were cultured in Dulbecco’s modified Eagle’s Medium (DMEM) with 10% fetal bovine serum and 1% penicillin and streptomycin. DNA was purified using the MagAttract HMW DNA kit (Qiagen, Germany). For quality assessment, NanoDrop One (ThermoFisher, USA), Qubit dsDNA kit (ThermoFisher, USA), and TapeStation 4200 (Agilent Technologies, USA) were used. RNA was purified using the Maxwell RSC simplyRNA Cells Kit (Promega, USA).
Short-read whole genome sequencing and data analysisWGS libraries were prepared from 200 ng of genomic DNA using an MGIEasy FS DNA Library Prep Kit v2.0 (MGItech, China) in the proband or v2.1 in the other subjects following the manufacturer’s instructions. Paired-end 150-bp sequencing was performed on DNBSEQ-G400. Single nucleotide variations were assessed along an in-house pipeline. Briefly, after quality check and removal of low-quality reads, the reads were mapped against GRCh38/hg38 using Burrows-Wheeler Aligner40. Recalibration and variant calling was conducted using GATK41. Annotation was carried out using ANNOVAR42. AutoMap was used to detect the homozygous regions of each chromosome. To rule out the pathogenic variants in the genes that cause encephalopathies, the list of genes (Supplementary Table 5) was screened. For evaluation of mutations in the mitochondrial genome, we used Matchclips43 for detecting mitochondrial deletions as well as referring long-read data.
For evaluation of repeat expansion, ExpansionHunter, ExpansionHunter Denovo, and STRling were used. ExpansionHunter was conducted in 366 undiagnosed mitochondrial patients and the four members of Family A. The settings for the calculation were as follows: [LocusID: NAXE, ReferenceRegion: chr1:156591740-156591800, LocusStructure: GGGCC, and VariantType: Repeat]. ExpansionHunter Denovo was conducted in the “case-control mode” comparing Pt2359 and the mother against 10 other patients with undiagnosed mitochondrial disease. STRling was conducted in 366 patients with undiagnosed mitochondrial disease and the four members of Family A.
RNA sequencing and data analysisThe mRNA was purified from total RNA using oligo(dT)-attached magnetic beads. After fragmentation, first-strand cDNA was generated using random hexamers, followed by second-strand cDNA synthesis. The synthesized cDNA was subjected to end repair, 3’ adenylation, and adaptor ligation to complete the library. The dsDNA libraries were heat-denatured and circularized by the splint oligo sequence to generate a single strand circle DNA, followed by rolling circle replication to create DNA nanoballs. Sequencing was performed on an MGI DNBSEQ-T7 platform (BGI, China) using 150-bp paired-end reads. Fastq files were aligned to the GRCh38/hg38 genome using STAR. Gene counting was computed using STAR quantMode as GeneCounts. The outlier mRNA expression analysis was performed using OUTRIDER. Outliers were detected by comparing the mitochondrial disease cases that underwent RNA-seq analysis on the same platform.
Oxygen consumption rate measurementOxygen consumption rate (OCR) was using a with Seahorse XF96 extracellular flux analyzer (Agilent Technologies, USA). Patient skin-derived fibroblasts as well as control fibroblasts were cultured in Dulbefcco’s modified Eagle’s Medium (DMEM) with fetal bovine serum and 1% penicillin and streptomycin. The cells were seeded at 2 × 104 cells/well in a 96-well plate at 37 °C and 5% CO2. After 24 h of culture, the medium was changed for 25 mM glucose or 10 mM galactose- containing DMEM for 1 h. Subsequently, the medium was replaced with an oligomycin (2 μM), carbonyl cyanide 4-(trifluoromethoxy) phenylhydrazone (FCCP, 0.4 μM), and rotenone (1 μM)-containing medium to control the ATP synthesis of the cells to determine the maximal rate of oxygen consumption44.
Measurement of enzyme activity in oxidative phosphorylationOxidative phosphorylation enzyme activities in the fibroblasts were measured using a Cary 300 spectrophotometer (Agilent Technologies, USA) following the manufacturer’s instructions45. The enzyme activities of each complex were presented as the percentage of normal control mean relative to the appropriate reference enzyme activities, such as that of citrate synthase (CS).
Western blottingTo prepare the total cell lysate, cell pellets were lysed with 1× RIPA buffer (08714-04, Nacalai Tesque, Japan) and kept on ice for 15 min. They were then centrifuged at 10,000 g for 10 min at 4 °C and the supernatants were collected. Protein concentration was determined using the BCA Protein Assay Kit (Thermo Fisher Scientific, USA) according to the manufacturer’s instructions. Prepared samples were denatured for 5 min at 95 °C and separated by SDS-PAGE on a 5–20% gradient polyacrylamide gel (EHR-T520L, ATTO, Japan) with the Prestained XL-Ladder Broad (SP-2120, Pharma Foods International, Japan) as a molecular marker. Proteins were transferred to a PVDF membrane and subjected to western blotting. The NAXE (HPA048164, Sigma-Aldrich, USA) and beta-actin (A5441, Sigma-Aldrich, USA) primary antibodies were used.
Repeat-primed PCR (RP-PCR)RP-PCR was carried out as follows: 10 µl of ExPremier 2× premix (TaKaRaBio, Japan), 1 µl of 5´ 6-FAM_reverse primer (10 uM), 1 µl of RP_END primer (10 uM), 1 µl of GGGCC_fwd primer (3.2 uM), 20 ng of template genomic DNA, nuclease-free water up to 20 µl of the total volume in a PCR-tube per sample. Primers were synthesized by Integrated DNA Technologies (IDT, USA) in HPLC purification grade (5´ 6-FAM_reverse primer) in 100 nmol scale or desalt grade (RP_END and GGGCC_fwd primers). The thermal cycles were as follows: 94 °C for 1 min, [98°C for 10 s, 60 °C for 15 s, 68 °C for 1 min] × 35 cycles, 68 °C for 1 min, 4 °C until subsequent dilution for fragment analysis. We used VeritiPro Thermal cycler or GeneAmp PCR system 9700 (Thermo Fisher Scientific, USA). Fragment analysis was conducted by adding HiDi-formamide and GeneScan 600 LIZ dye Size Standard v2.0 (Thermo Fisher Scientific, USA), followed by electrophoresis on GeneticAnalyzer 3500 (Thermo Fisher Scientific, USA). Data analysis was performed using the GeneMapper software (Thermo Fisher Scientific, USA). Primer sequences are described in Supplementary Table 6.
PacBio Sequel2 whole genome long-read sequencing and data analysisLong DNA purified as described above were then sheared at 10-20 kb with Megaruptor 2 (Diagenode, USA) and further used for library construction using the SMARTbell prep kit 3.0 (Pacific bioscience, USA) following the manual’s instructions. The libraries were then sequenced on a PacBio Sequel2 sequencer on 2 single molecule realtime (SMRT) sequencing cells on the proband and 1 on the mother sample. Acquired raw data (subreads.bam) were then transformed into circular consensus (ccs) reads.bam by pbccs 6.4.0 with or without the kinetics information. For the evaluation of CpG methylation status, primrose was used for processing the ccs reads file (.bam). The ccs reads were then mapped against the GRCh38 reference genome. To evaluate methylation in the internal sequences of the repeat stretch, the reads were further mapped against a synthetic sequence derived from pure 244 GGGCC repeats and contiguous sequences. Visual inspection was carried out using IGV. Methylation level at each CpG site was calculated as probability (0 to 100%) and illustrated in blue (0 to 49%) or red (50% to 100%) in IGV. For repeat internal sequence analysis, tandem repeat finder46 was employed. PacBio RepeatAnalysisTools was used to draw the waterfall plot (see Supplementary Note 1).
PacBio Sequel2 amplicon sequencing of the NAXE regionWe used ultralong PCR covering NAXE locus combined with PacBio Sequel2 amplicon sequencing to screen for the candidate pathogenic variants leading to NAXE-related mitochondrial encephalopathy as well as variations in the GGGCC repeat region as candidate risk factors. We aimed to screen 314 samples (Group1) from 461 mitochondrial encephalopathy patients initially examined by RP-PCR. In addition, we added four samples (the proband (Pt2359), older brother, mother, and father) of the NAXE repeat expansion pedigree (Family A), another case (Pt2659) with NAXE-related mitochondrial encephalopathy caused by compound heterozygous variants (missense and splicing; see also the Subjects section), and HapMap sample GM12878 as the control. As a result, 277 out of the 314 Group1 samples, four members of NAXE repeat expanded pedigree, Pt2659, and GM12878 were successfully amplified on either or both of the two 13 to 14 kb-long ultralong PCR amplicons covering the NAXE locus, whereas the other samples were not amplifiable, probably due to slight disintegration of genomic DNA during DNA preservation. A total of 242 samples with 10 or more reads covering the amplicon regions were further analyzed.
We conducted amplicon sequencing using PacBio Sequel2 long-read sequencing with the PacBio M13 targeted amplicon sequencing protocol. This protocol utilizes two-step PCR to add M13 primer sequences in the first PCR and barcode sequences at both ends in the second PCR.
The following amplicons were designed to cover all the gene regions of NAXE based on entry of ENSG00000163382 (Chr1: 156,591,756-156,609,507 on GRCh38.p14): Amplicon 1: chr1:156586942-156601027, 14086 bp, Amplicon 2: chr1:156595861-156609646, 13786 bp.
The first PCR was carried out as follows: ExPremier PCR enzyme (TaKaRaBio, Japan) was used with 10 µl of 2× ExPremier premix, final 0.25 µM of each primer, and 20 ng of template genomic DNA in a total reaction volume of 20 µl, with the following thermal cycles: 94 °C 1 min, [98 °C 10 s, 68°C 7 min] × 30 cycles.
After purification, second step PCR was carried out to add barcodes on both sides using M13_barcode primers as second PCR primers. Each of the M13_bcXXXX_F and _R primers was used to comprise different 16 × 24 ( = 384) sets of primer pairs. The second PCR was carried out as follows: ExPremier PCR enzyme (TaKaRaBio, Japan) was used with 10 µl of 2× ExPremier premix, final 0.25 µM of each primer and 1 ng of template DNA in a total reaction volume of 20 µl, with the following thermal cycles: 94 °C 1 min, [98 °C 10 s, 60 °C 15 s, 68 °C 7 min] × 8 cycles (see Supplementary Table 7 and 8 for information on first PCR and second PCR primers, respectively).
After the second PCR, the PCR products were pooled, purified, repaired, A-tailed, followed by adapter ligation and cleaning up, after which they were nuclease-treated and cleaned up following the PacBio SMARTbell prep kit 3.0 user instruction manual. The completed libraries were pooled and sequenced on 1 single molecule real time (SMRT) cell with the PacBio Sequel2 sequencer (Pacific bioscience, USA). For data analysis, circular consensus reads were produced with the pbccs 6.4.0 tool, demultiplexed, and mapped against GRCh38 using pbmm2. For the analysis of structural variants, sniffles2 was used47. For the analysis of SNVs and small indels, DeepVariant was used, and the variants were annotated using snpEff48.
For single nucleotide variations and small indels, we used DeepVariant49 for variant calling as well as visual inspection by IGV specifically on variants in the GGGCC repeat region of NAXE. We selected the variants with one or more of the following characteristics: 1) “MODARATE” or “HIGH” impact by snpEff annotation, 2) splice region by snpEff annotation, and 3) Delta score of 0.1 or higher in SpliceAI, and 4) variants affecting the NAXE GGGCC repeat sequence. Only variants with allele frequencies of 0.1 or lower in ToMMo54KJPN were considered for those meeting the criteria of 1), 2), and 3). For functional prediction by bioinformatics tools, CADD phred scale score GRCh38, delta score of SpliceAI, and ClinVar classification (if registered) were described based on the Gnomad database (if not registered, hand search by CADD GRCh38-v1.7 Web resources was used). Allele frequencies in public databases (ToMMo54KJPN and Gnomad v4.0) were described. To statistically compare the frequencies of variations in the GGGCC repeat region, chi-square test of independence was conducted by python scripts with significance threshold of p < 0.05.
Southern blottingProbe design: Probes were designed to be >200 bp in length, having no off-target sequence in the human genome by blastn and being within an appropriate fragment, using restriction enzymes (MfeI at chr1:156589300 and BamHI at chr1:156563563). As a result, the following two probes were selected: p2_9F/R: chr1:156591431-156591637 (GRCh38), p3_9F/R: chr1:156592713-156592960 (GRCh38) (For primers for probe preparation, see Supplementary Table 9).
Template preparation: Artificially synthesized respective DNA fragments were cloned in pUC vectors and midi-prepped (Eurofins Japan, Japan). Midi-prepped plasmids were used as templates for probe preparation. PCR DIG Probe Synthesis Kit (Sigma-Aldrich Japan, Japan, Cat. No. 11 636 090 910) was used for DIG-probe preparation from 10 pg of template for each probe following the user instruction manual. PCR products were confirmed by agarose electrophoresis in 1×TBE buffer.
Restriction enzyme digestion, electrophoresis, transfer to membrane, and hybridization: From each sample, 1 µg of genomic DNA was digested by BamHI (NEB) and MfeI-HiFi (NEB) in rCutSmart buffer (NEB) at 37 °C for 2 h and electrophoresed in 1×TAE agarose (0.8%) gel made with SeaKem LE agarose (Lonza, Switzerland) of 14 cm length × 12 cm width format. As molecular markers, 1 kb DNA ladder PLUS (Nippon Genetics, Japan) and DIG-labeled marker VII (Sigma-Aldrich Japan, Japan) were used. After 35 V × 18 h of electrophoresis at room temperature, the gel was taken out and soaked in 0.25 M HCl 200 ml for 40 min for depurination. After two washes in 500 ml MilliQ (Merck-Millipore, USA) water for 3 min each, the gel was soaked in 250 ml of denaturation solution (0.5 M NaOH, 1.5 M NaCl) twice for 15 min each. The gel was then washed with 500 ml of MilliQ water for 5 min and soaked in 250 ml of neutralization solution (0.5 M Tris-Cl pH 7.5, 1.5 M NaCl) for 30 min. For blotting, gravity-assisted blotter (G capillary blotter C set, TAITEC, Japan, cat 0014675-000) was used for overnight blotting at room temperature using 10× SSC buffer to transfer to a positively charged nylon membrane (Sigma-Aldrich, USA). The membrane was then washed in 2× SSC buffer and dried at room temperature. UV crosslinking was conducted for irradiating 120,000 μJ of ultraviolet (UV) light (UV crosslinker CL1000, UVP (Analytik Jena, USA) on the membrane. The membrane was rinsed with 2× SSC and prehybridized in DIG Easy hyb buffer (Sigma-Aldrich, USA) for more than 3 h at 68 °C, followed by hybridization in probes in DIG Easy hyb buffer for 18 h at 42°C. Subsequently, washes with 100 ml of 2× SSC and 0.1% SDS for 15 min twice, 100 ml of 0.5× SSC with 0.1% SDS for 15 min once, and 100 ml of 0.15× SSC with 0.1% SDS for 15 min once were conducted in a hybridization bottle in a rotating incubator set to 68 °C. For detection of hybridized fragments, the membrane was taken out from the hybridization bottle, placed in a hybridization bag, and washed with 10 ml of washing buffer (0.1 M maleic acid, 0.15 M NaCl, pH 7.5, 0.3% Tween20). The buffer was discarded, and 10 ml of 1× blocking solution were used for blocking for 2 h at room temperature. The blocking solution was made from 10× blocking solution containing a blocking reagent (Sigma-Aldrich, USA, cat 11096176001) 10 g in 100 ml of 0.1 M maleic acid, and 0.15 M NaCl; pH 7.5; the mixture was stirred with heating, autoclaved, aliquoted in 15 ml centrifuge tubes, and preserved at -20 °C before use. Anti-DIG-AP Fab fragments (Sigma-Aldrich, USA cat 11093274910) were diluted 1:10,000 in 1× blocking buffer for immunodetection. After 30 min of mild shaking in the antibody solution at room temperature, the membrane was washed three times with 10 ml of washing buffer for 15 min each and rinsed with a detection buffer (0.1 M Tris-HCl, 0.1 M NaCl, pH 9.5) twice. Finally, ready-to-use CDP-star (Sigma-Aldrich, USA, cat 12041677001) was used for chemiluminescence. Images were obtained by FusionSolo S (Vilbert Lourmet, France). The raw image was linearly rescaled between intensities between 4 and 4000 in the software FusionCapt Advance in the gel imager.
NET-CAGE1 × 107cells per sample were used for the extraction of nascently transcribed RNA. Pelleted cells were dissolved in 1 ml of a cell lysis buffer (1527 µl of EZ lysis buffer (Sigma-Aldrich, USA), 40 µl of 1 mM α-amanitin (Fujifilm, Japan), 32 µl of 50× cOmplete protease inhibitor 50× (Sigma-Aldrich, USA), and 1 µl of 20 U/ul SUPERase•In (Thermofisher, USA) for preparation of 1600 µl of cell lysis buffer), incubated on ice for 10 min, and centrifuged for 800 × g 5 min at 4 °C. The pellets were dissolved in 600 µl of the cell lysis buffer and centrifuged for 800 × g 5 min at 4 °C. Nuclear lysis buffer (200 µl; final concentrations of 1% NP-40, 20 mM HEPES, 300 mM NaCl, 2 M urea, 0.2 mM EDTA, 1 mM DTT, 25 µM α-amanitin, 1× cOmplete protease inhibitor, and 20 U SUPERase in nuclease-free water) was added to the pellet and pipet mixed, incubated for 10 min on ice, and centrifuged for 3000 × g 4 min at 4 °C. The pellets were then suspended in 100 µl of nuclease lysis buffer and centrifuged for 3000 × g 2 min at 4 °C, followed by the addition of 50 µl of DNase I mix (5 µl of ×10 DNase buffer (Thermofisher, USA), 5 µl of DNase I (Thermofisher, USA), and 1 µl of 20 U/µl SUPERase•In, and 39 µl of nuclease free water in total volume of 50 µl per sample) and mixed with ThermoMixer (Thermofisher, USA) at 1400 rpm, 30 to 90 min (until pellet was solubilized) at 37 °C with gentle agitation with pipetting. Finally, 600 µl of RLT buffer (QIAGEN, Germany) were added and gently mixed. The resulting solution contained enriched nascently transcribed RNAs (nascent RNA) and was further utilized for the construction of the CAGE library. For the preparation of single strand CAGE library on nascent RNA (3 to 5 µg of nascent RNA per library), a previously described protocol was used (see details in Supplementary Note 1). Libraries were sequenced on Illumina HiSeq 2500 (Illumina, USA) using the 50 bp single-end mode. The sequenced reads were then filtered, mapped against the GRCh38 genome, and normalized using MOIRAI50 and in-house scripts.
留言 (0)