Sergey Koren1,
Zhigui Bao2,3,
Andrea Guarracino4,
Shujun Ou5,
Sara Goodwin6,
Katharine M. Jenike7,
Julian Lucas8,
Brandy McNulty8,
Jimin Park8,
Mikko Rautiainen1,
Arang Rhie1,
Dick Roelofs9,
Harrie Schneiders9,
Ilse Vrijenhoek9,
Koen Nijbroek9,
Olle Nordesjo10,
Sergey Nurk10,
Mike Vella10,
Katherine R. Lawrence10,
Doreen Ware6,11,
Michael C. Schatz7,
Erik Garrison4,
Sanwen Huang3,12,
William Richard McCombie6,
Karen H. Miga8,
Alexander H.J. Wittenberg9 and
Adam M. Phillippy1
1Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National
Institutes of Health, Bethesda, Maryland 20892, USA;
2Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Baden-Württemberg, Germany;
3Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture
and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120,
China;
4Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163,
USA;
5Department of Molecular Genetics, Ohio State University, Columbus, Ohio 43210, USA;
6Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA;
7Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA;
8Genomics Institute, University of California Santa Cruz, Santa Cruz, California 95060, USA;
9KeyGene, 6708 PW Wageningen, Netherlands;
10Oxford Nanopore Technologies, Oxford OX4 4DQ, United Kingdom;
11USDA ARS NEA Plant, Soil and Nutrition Laboratory Research Unit, Ithaca, New York 14853, USA;
12State Key Laboratory of Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou, Hainan 571101,
China
Corresponding authors: sergey.koren@nih.gov, adam.phillippy@nih.gov
Abstract
The combination of ultra-long (UL) Oxford Nanopore Technologies (ONT) sequencing reads with long, accurate Pacific Biosciences
(PacBio) High Fidelity (HiFi) reads has enabled the completion of a human genome and spurred similar efforts to complete the
genomes of many other species. However, this approach for complete, “telomere-to-telomere” genome assembly relies on multiple
sequencing platforms, limiting its accessibility. ONT “Duplex” sequencing reads, where both strands of the DNA are read to
improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely
studied genomes: human HG002, Solanum lycopersicum Heinz 1706 (tomato), and Zea mays B73 (maize). For the diploid, heterozygous HG002 genome, we also used “Pore-C” chromatin contact mapping to completely phase
the haplotypes. We found the accuracy of Duplex data to be similar to that of HiFi sequencing, but with read lengths tens of kilobases
longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and
accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the UL reads,
and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999%
(Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a
viable alternative to HiFi sequencing for de novo genome assembly, and provides a multirun single-instrument solution for
the reconstruction of complete genomes.
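The "99.999% (Q50)" figure above uses the Phred scale, where Q = -10 log10(error rate), so an error rate of 1 in 100,000 bases corresponds to Q50. The following is a minimal sketch of that conversion; the function names are illustrative assumptions and are not taken from the paper or from any assembly tool.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert a per-base accuracy in [0, 1) to a Phred-scaled quality value.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    # Phred definition: Q = -10 * log10(probability of error)
    return -10.0 * math.log10(1.0 - accuracy)


def accuracy_from_phred(q: float) -> float:
    """Convert a Phred-scaled quality value back to a per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # 99.999% accuracy, the threshold quoted in the abstract
    print(round(phred_from_accuracy(0.99999), 2))
    # Inverse direction: Q50 back to an accuracy fraction
    print(accuracy_from_phred(50))
```

Each additional 10 points of Q corresponds to a tenfold reduction in the expected error rate, which is why small differences in percent accuracy near 100% are usually reported on the Q scale instead.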
Received March 15, 2024.
Accepted October 8, 2024.