Benchmarking nanopore sequencing and rapid genomics feasibility: validation at a quaternary hospital in New Zealand

Benchmarking samples and truth sets

We acquired CNVPANEL01 as 3 µg genomic DNA (at 100 µg/ml) per sample and GIAB reference samples (i.e. HG002 - HG007) with available truth sets, from the Coriell repository (Coriell Institute for Medical Research, 403 Haddon Avenue Camden, NJ 08103, USA).

Library preparation and nanopore sequencing

DNA samples (1500 ng) were sheared to 10–15 kb using Covaris g-TUBES (Covaris) in a bench-top centrifuge for 1 min at 2000 RCF (room temperature). Nanopore sequencing libraries were prepared according to the genomic DNA Ligation Sequencing Kit V14 (SQK-LSK114) protocol (Oxford Nanopore Technologies [ONT], Oxford Science Park, OX4 4DQ, UK). Prepared libraries were loaded on PromethION flow cells (R10.4) and sequenced to a depth of 24–42X on the PromethION 2 (P2) solo device using Kit 14 chemistry and MinKNOW v23.07.8 (ONT).

Base calling of nanopore reads and variant calling

Base calling of raw ONT signal data was completed using Dorado v0.3.3 (https://github.com/nanoporetech/dorado) with the high accuracy (hac) model (dna_r10.4.1_e8.2_400bps_hac@v4.2.0). The HG002 sample was additionally base called with the super accuracy (sup) model (dna_r10.4.1_e8.2_400bps_sup@v4.2.0). The resulting FASTQ files in the fastq_pass folder, containing reads with a Phred quality score (Q score) > 9, were processed with EPI2ME Labs’ wf-alignment pipeline (https://github.com/epi2me-labs/wf-alignment; v0.5.2), which aligned the reads to the GRCh38 reference genome using minimap2 (v2.26)34. EPI2ME Labs’ wf-human-variation pipeline (https://github.com/epi2me-labs/wf-human-variation; v1.7.0) was subsequently used with default parameters for genomic variant processing, including SNV and small indel calling with Clair3 (v1.0.4)35, SV calling with Sniffles2 (v2.2)36, and CNV calling with QDNAseq (v1.38)37. A VNTR annotation file was provided to support accurate SV identification. Repeat expansions were genotyped using Straglr (https://github.com/philres/straglr)38 as implemented in the same wf-human-variation release.
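To illustrate the read-level Q score threshold, the sketch below computes a mean Q score from a FASTQ quality string and applies the > 9 cut-off. It assumes Phred+33 encoding and that the mean is taken over per-base error probabilities rather than over the Q values themselves. The function names are hypothetical and this is not the base caller's own code.

```python
import math


def mean_qscore(quality_string: str, phred_offset: int = 33) -> float:
    """Mean read Q score, averaged in error-probability space (assumption)."""
    if not quality_string:
        raise ValueError("quality string is empty")
    # ASCII character -> Phred score -> per-base error probability.
    error_probs = [10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string]
    mean_error = sum(error_probs) / len(error_probs)
    # Convert the mean error probability back to the Phred scale.
    return -10 * math.log10(mean_error)


def passes_qscore_filter(quality_string: str, min_q: float = 9.0) -> bool:
    """True when the read's mean Q score is strictly above the threshold."""
    return mean_qscore(quality_string) > min_q
```

Averaging error probabilities penalises reads that contain a few very low-quality bases more than a plain average of Q values would, so a read made of equal parts Q40 and Q0 bases scores close to Q3 rather than Q20.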

Benchmarking of variant calling

Variant comparison tools (https://github.com/ga4gh/benchmarking-tools)24 are integral to genomic benchmarking: they identify variants shared between the ground-truth calls and the comparison results (i.e., true positives [TP]), variants present only in the ground-truth set (i.e., false negatives [FN]), and variants present only in the comparison results (i.e., false positives [FP]). We compared called SNVs and small indels with GIAB ground-truth variants (benchmark version v4.2.1)24 using hap.py v0.3.15 (https://github.com/Illumina/hap.py), and each variant was labelled as TP, FP, or FN. Hap.py also provides precision (positive predictive value [PPV]), recall (sensitivity) and F1 scores (harmonic mean of precision and recall) calculated as follows:

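$$\text{Precision}=\frac{TP}{TP+FP}$$

(1)

$$\text{Recall}=\frac{TP}{TP+FN}$$

(2)

$$F1=\frac{2\times \left(\text{Precision}\times \text{Recall}\right)}{\left(\text{Precision}+\text{Recall}\right)}$$

(3)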
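As a worked illustration, the three metrics named above can be computed from the TP, FP and FN counts as in the sketch below. The function name is hypothetical, and a single TP count is assumed for both precision and recall, which simplifies how a comparison tool may count matches on the truth and query sides.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Precision, recall and F1 from variant comparison counts."""
    # Precision: fraction of called variants that match the truth set.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of truth-set variants that were called.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, 90 TP, 10 FP and 30 FN give a precision of 0.90, a recall of 0.75 and an F1 of about 0.818.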

For SVs, we employed Truvari v4.1.0 (https://github.com/ACEnglish/truvari)39 to benchmark variants with GIAB ground-truth SVs. Each variant was categorized as TP, FP, or FN based on this comparison.

Rarefaction benchmarking analysis

Rarefaction was performed to evaluate the sensitivity and reliability of long-read variant calling across different sequencing depths. Binary Alignment Map (BAM) files were subsampled using Samtools40 by randomly selecting subsets of reads from the original alignment files. The subsampled BAMs were then subjected to variant calling as described in the variant calling section, and SNVs and small indels were benchmarked as detailed in the benchmarking section. Rarefaction curves were generated using Python v3.10.8 and the seaborn v0.12.2 library to illustrate the relationship between sequencing depth and the called variants, enabling variant calling performance and reliability to be evaluated across sequencing depths.
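The subsampling step can be sketched as follows. The exact Samtools invocation is not stated here, so the helper below is an assumption: it derives the fraction of reads to keep from the original and target depths, then builds a `samtools view` command using the `--subsample` and `--subsample-seed` options available in recent Samtools releases. The file names, seed and function names are hypothetical.

```python
def subsample_fraction(original_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep so the expected depth equals target_depth."""
    if original_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    if target_depth > original_depth:
        raise ValueError("cannot subsample above the original depth")
    return target_depth / original_depth


def samtools_subsample_command(
    in_bam: str, out_bam: str, fraction: float, seed: int = 42
) -> list[str]:
    """Build (but do not run) a samtools command that keeps a random subset of reads."""
    return [
        "samtools", "view", "-b",
        "--subsample", f"{fraction:.4f}",   # proportion of alignments to keep
        "--subsample-seed", str(seed),      # fixed seed for reproducibility
        "-o", out_bam,
        in_bam,
    ]
```

For a BAM sequenced at 40X, a target depth of 10X corresponds to a fraction of 0.25. The returned list can be passed to `subprocess.run` once the paths point to real files.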

Benchmarking analysis for challenging clinically relevant genes

We called SNVs and small indels across genomic regions overlapping challenging clinically relevant genes29 using the original BAM files and the pipeline outlined in the variant calling section. Benchmarking for SNVs and small indels was conducted as detailed in the benchmarking section.

Methylation analysis

Raw ONT signal data in POD5 files (https://github.com/nanoporetech/pod5-file-format) was base called (Dorado v0.5.0) using the high accuracy (hac) DNA base modification model (dna_r10.4.1_e8.2_400bps_hac@v4.2.0_5mCG_5hmCG@v2) to detect modified bases (i.e. 5-methylcytosine [5mC] and 5-hydroxymethylcytosine [5hmC]). The modified BAM files (modBAMs) were aligned to the GRCh38 reference genome, and modkit v0.2.3 (https://github.com/nanoporetech/modkit) was used to write genome-wide summary counts of modified and unmodified bases to bedMethyl files. Haplotype-specific 5mC differentially methylated regions (DMRs) in the HG002 genome were identified using ont-methylDMR-kit (https://github.com/NyagaM/ont-methylDMR-kit), which utilizes the Bioconductor DSS (Dispersion Shrinkage for Sequencing) package41. We used the following DSS41 parameters for calling DMRs: delta (threshold for defining DMRs) of 10%, p-value < 0.01, minimum DMR length of 100 bp, and at least 10 CpG sites per DMR. The pipeline supports haplotype-specific analysis, DMR detection between two samples, group methylation analysis, and genomic annotation of significant DMRs.
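The four DMR thresholds can be illustrated with a simple region-level check. This is not the DSS algorithm, which applies its delta and p-value thresholds during statistical testing of individual CpG sites before merging them into regions. The record layout, the region-level p-value and all names below are hypothetical, and the sketch only shows how the stated cut-offs relate to one candidate region.

```python
from dataclasses import dataclass


@dataclass
class CandidateRegion:
    """Hypothetical summary of one candidate region between two haplotypes."""
    chrom: str
    start: int
    end: int
    n_cpg: int
    mean_meth_hap1: float  # mean methylation fraction (0-1) on haplotype 1
    mean_meth_hap2: float  # mean methylation fraction (0-1) on haplotype 2
    p_value: float         # illustrative region-level p-value


def passes_dmr_thresholds(
    region: CandidateRegion,
    delta: float = 0.10,
    p_threshold: float = 0.01,
    min_len: int = 100,
    min_cpg: int = 10,
) -> bool:
    """Apply the delta, p-value, length and CpG-count cut-offs to one region."""
    methylation_difference = abs(region.mean_meth_hap1 - region.mean_meth_hap2)
    length = region.end - region.start
    return (
        methylation_difference >= delta
        and region.p_value < p_threshold
        and length >= min_len
        and region.n_cpg >= min_cpg
    )
```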

Benchmarking results visualization

Plots were generated using seaborn v0.12.2 and matplotlib v3.7.1 in Python v3.10.8.

Newborn Genomics Programme (NBG) study design

This is a research study to determine the medical and economic impacts of rapid whole genome sequencing (rWGS) within the New Zealand health care landscape. Ethics approval was obtained from the Northern B Health and Disability Ethics Committee for the study entitled: Newborn Genomics – Te Ira oo Te Arai (Ethics reference: 2023 FULL 15542). Locality approval was obtained from the Research Review Committee Te Toka Tumai Auckland for the project entitled: Newborn Genomics – Te Ira oo Te Arai (Reference A + 9855 [FULL 15542]). This study is registered in ClinicalTrials.gov (Newborn Genomics Programme; NCT06081075; 2023-10-12). The clinical protocol was adopted and modified as per Lunke et al., 202342.

NBG study participants and recruitment criteria

Children with suspected genetic conditions and their families were recruited into the study from the neonatal and paediatric intensive care units (i.e. NICU and PICU, respectively) and the National Metabolic Service at Te Toka Tumai | Auckland City Hospital (New Zealand) between November 2023 and August 2024. Within NICU, participation was limited to proband-parent trios of critically ill neonates with evidence of a suspected genetic condition, without a clear non-genetic aetiology, or who developed an abnormal response to standard therapy for an underlying condition within the preceding seven days. For infants within PICU or under the care of Metabolic Services, participation was limited to proband-parent trios of children with an acute or chronic illness with evidence of a suspected genetic condition without a clear non-genetic aetiology.

All participants continued to receive the standard of care, irrespective of whether they were included in the study.

Potential participants were referred to the geneticist on-call (by telephone) for a formal genetic review, mainly by a neonatologist or a paediatric intensivist or the lead paediatric subspecialist for the patient when a genetic condition was suspected, or when the aetiology of a condition was unclear and a genetic cause needed to be ruled out to guide further clinical management.

Inclusion and exclusion criteria were modified from Dimmock et al.43 and McKeown et al.44.

The inclusion criteria were:

acutely ill inpatient

admitted to NICU or PICU between April 2023 and March 2026

under the care of the National Metabolic Service between April 2023 and March 2026

within 1 week of hospitalization or within 1 week of developing abnormal response to standard therapy for an underlying condition

suspected genetic condition, without a clear non-genetic aetiology

The exclusion criteria were:

Of note, participants were only considered for the study if they were referred to clinical genetics as a part of their standard of care workup.

Following the referral of potential participants to the study, a multidisciplinary meeting (MDM) was convened via video conference to evaluate the eligibility of the referral based on the study’s inclusion and exclusion criteria. At a minimum, the MDM comprised a clinical geneticist, genetic counsellor, principal investigator, project manager, representative from the genomic analytical team (bioinformatician, variant curator) and the referring clinician. After agreeing to participate, the patients were registered on REDCap, and a study reference was generated. Subsequently, the clinical geneticists and genetic counsellors completed the clinical information, including phenotypic characterization using HPO terms, and facilitated informed consent.

Parents or guardians of the proposed probands were informed of the details of the study using the HDEC-approved Newborn Genomics Programme Participant Information Sheet (Supplementary Note 1), and had the opportunity to ask questions to an on-call geneticist and genetic counsellor from the Genetic Health Service New Zealand. Written informed consent was obtained from parents or guardians before any study-specific processes were undertaken (Newborn Genomics Programme consent form; Supplementary Note 2).

Clinical geneticists and subspecialists performed clinical phenotyping, which was recorded on REDCap using Human Phenotype Ontology (HPO) terms (https://hpo.jax.org/app/) to optimize phenotypic data exchange during the curation stages of the analysis. At the same time, a phenotype-focused gene list was generated using PanelApp (Australia [https://panelapp.agha.umccr.org/] and the UK [https://panelapp.genomicsengland.co.uk/]) and shared with the genomic analytical team for inclusion in the Bayesian AI-based clinical decision support tool (Fabric GEM™ software).

NBG sample collection, DNA extraction, library preparation, sequencing, and variant calling

After obtaining consent, duplicate blood samples were collected: 4 mL EDTA blood samples from the mother and father, and 500 µL EDTA blood samples from the child. One set of samples was sent to the Liggins Institute newborn genomics laboratory for sequencing and variant analysis, while the second set was sent to the clinical laboratory, Victorian Clinical Genetics Services (VCGS) in Melbourne, Australia, for a concurrent, independent, short-read-based analysis as described in Lunke et al. 202342.

High molecular weight DNA was extracted from 300 µL of whole blood using the Puregene DNA extraction Kit (Qiagen) following the manufacturer’s protocol, and the extracted DNA was eluted in nuclease-free water (Thermo Fisher Scientific). The quantity and purity of the DNA samples were assessed using the Qubit system (Thermo Fisher Scientific) and a spectrophotometer (Implen NanoPhotometer). Library preparation and sequencing were carried out as detailed in the library preparation section. Base calling of sequenced reads and variant calling were conducted following the methods described in the base calling and variant calling sections. Of note, because the software for base calling and variant analysis is frequently updated, we have developed a Standard Operating Procedure (SOP) to lock pipelines to specific versions and to outline procedures for updates and upgrades (Supplementary Note 3).

NBG candidate variant prioritization and genomic results reporting

For variant filtering and prioritisation, we used Fabric GEM™ as the primary interpretation platform and QCI Clinical Insights Interpret-translational (QIAGEN Inc., https://digitalinsights.qiagen.com/) as the secondary ‘research-confirmatory’ platform. We retained variants with ≥ 10 reads, a variant allele frequency (VAF) ≥ 20% in the proband, and a frequency ≤ 1% in gnomAD v3.1 that were located in exonic regions or within ±20 bases of exon/intron boundaries. Additionally, intronic variants more than 20 bases from an exon start or end that were predicted to affect splicing by MaxEntScan were also retained.
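The retention rules above can be expressed as a small predicate. The sketch below is illustrative only and is not the filtering logic of either interpretation platform. It assumes that "reads" refers to total read depth at the site, that the depth, VAF and population-frequency thresholds also apply to deep intronic variants, and that a variant absent from gnomAD has a frequency of 0. The `Variant` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary for the proband."""
    depth: int                    # reads covering the site (assumed meaning of "reads")
    vaf: float                    # variant allele frequency, 0-1
    gnomad_af: float = 0.0        # population frequency; 0.0 if absent from gnomAD
    distance_to_exon: int = 0     # 0 if exonic, else bases to the nearest exon boundary
    predicted_splice_altering: bool = False  # e.g. flagged by a splicing predictor


def retain_variant(v: Variant) -> bool:
    """Return True if the variant passes the stated retention thresholds."""
    # Quality and rarity thresholds.
    if v.depth < 10 or v.vaf < 0.20 or v.gnomad_af > 0.01:
        return False
    # Exonic variants and those within 20 bases of an exon/intron boundary.
    if v.distance_to_exon <= 20:
        return True
    # Deeper intronic variants are kept only when predicted to affect splicing.
    return v.predicted_splice_altering
```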

In the initial analysis, we focused on variants in candidate genes, HPOs and panels (PanelApp Australia45) suggested by clinicians. If no pathogenic or likely pathogenic variants were identified (based on the American College of Medical Genetics and Genomics [ACMG] guidelines46), we expanded the analysis to all genes. Genes in the incidentalome (PanelApp Australia Version 0.30845) were excluded, unless they were relevant to the patient’s phenotype as indicated by the clinician. In so doing, we adopt a judicious approach to the reporting of variants of uncertain significance (VUS) in the acute care setting, only including those that are deemed related to the patient’s phenotype and are typically very close to being classified as ‘likely pathogenic’ according to ACMG criteria46. This stratification of VUS is recommended by the Association for Clinical Genomic Science (ACGS) in the UK.

A multidisciplinary review meeting (MDM) was then held to evaluate the results. The review MDM comprised the same clinicians and study representatives who attended the recruitment MDM. During the meeting, the genomic data analysts presented the quality control report and discussed the prioritized variants, and the evidence for pathogenic or likely pathogenic variants, for genotype-phenotype correlation. The VCGS results were not shared with the NBG team, ensuring they remained blinded to the clinically validated results until the variant review MDM. Finally, the clinical geneticist and genetic counsellor disclosed and discussed the molecular diagnosis based on the accredited acute care genomics service (VCGS) results.

Upon completing the variant review meeting, the NBG study team generated a research report. Simultaneously, the clinical laboratory (VCGS) produced its validated clinical report, which was directly returned to the clinical team for disclosure to the families. Finally, the genetic counsellor communicated the study report findings to the study participants and addressed any discrepancies identified in the reports.

Ethics statement

This study was performed in line with the principles of the Declaration of Helsinki. Ethics approval was obtained from the Northern B Health and Disability Ethics Committee for the study entitled: Newborn Genomics – Te Ira oo Te Arai (Ethics reference: 2023 FULL 15542). Locality approval was obtained from the Research Review Committee Te Toka Tumai Auckland for the project entitled: Newborn Genomics – Te Ira oo Te Arai (Reference A + 9855 [FULL 15542]).

Patient consent statement

Parents of the participating newborns provided written informed consent.
