Pangenome-spanning epistasis and coselection analysis via de Bruijn graphs [METHODS]

Juri Kuronen1, Samuel T. Horsfield2,3, Anna K. Pöntinen1,4, Sudaraka Mallawaarachchi1,5,6, Sergio Arredondo-Alonso1, Harry Thorpe1, Rebecca A. Gladstone1, Rob J.L. Willems7, Stephen D. Bentley8, Nicholas J. Croucher2, Johan Pensar9, John A. Lees3, Gerry Tonkin-Hill1,5,6,10,12 and Jukka Corander1,7,11,12

1 Department of Biostatistics, University of Oslo, 0372 Blindern, Norway
2 MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W12 0BZ, United Kingdom
3 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
4 Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, 9019 Tromsø, Norway
5 Peter MacCallum Cancer Centre, Melbourne, Victoria 3052, Australia
6 Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Victoria 3052, Australia
7 Department of Medical Microbiology, University Medical Center Utrecht, 3584 CX Utrecht, Netherlands
8 Parasites and Microbes, Wellcome Sanger Institute, Cambridge CB10 1RQ, United Kingdom
9 Department of Mathematics, University of Oslo, 0372 Blindern, Norway
10 Department of Microbiology and Immunology, The University of Melbourne, at the Peter Doherty Institute for Infection and Immunity, Melbourne, Victoria 3052, Australia
11 Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, 00014 Helsinki, Finland

12 These authors contributed equally to this work.

Corresponding authors: gerryt@uio.no, jlees@ebi.ac.uk

Abstract

Studies of bacterial adaptation and evolution are hampered by the difficulty of measuring traits such as virulence, drug resistance, and transmissibility in large populations. In contrast, it is now feasible to obtain high-quality complete assemblies of many bacterial genomes thanks to scalable high-accuracy long-read sequencing technologies. To exploit this opportunity, we introduce a phenotype- and alignment-free method for discovering coselected and epistatically interacting genomic variation from genome assemblies covering both core and accessory parts of genomes. Our approach uses a compact colored de Bruijn graph to approximate the intragenome distances between pairs of loci for a collection of bacterial genomes to account for the impacts of linkage disequilibrium (LD). We demonstrate the versatility of our approach to efficiently identify associations between loci linked with drug resistance and adaptation to the hospital niche in the major human bacterial pathogens Streptococcus pneumoniae and Enterococcus faecalis.
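To make the distance idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a plain (uncompacted) k-mer graph rather than a compact colored de Bruijn graph, treats each genome as one "color" with its own edge set, ignores reverse complements and circular chromosomes, and uses breadth-first search to count k-mer steps between two loci. The function names `build_colored_graph`, `kmer_distance`, and `mean_distance` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def build_colored_graph(genomes, k):
    """Return {genome_name: adjacency}, where adjacency maps each k-mer to the
    set of k-mers adjacent to it in that genome (edges treated as undirected).
    Assumption: each genome is a single linear sequence on one strand."""
    colored = {}
    for name, seq in genomes.items():
        adj = defaultdict(set)
        for i in range(len(seq) - k):
            a, b = seq[i:i + k], seq[i + 1:i + 1 + k]
            adj[a].add(b)
            adj[b].add(a)
        colored[name] = adj
    return colored


def kmer_distance(adj, source, target):
    """Shortest path length (in k-mer steps) between two k-mers within one
    genome's edge set, found by breadth-first search; None if unreachable."""
    if source not in adj or target not in adj:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def mean_distance(colored, source, target):
    """Average the per-genome distances over genomes in which both k-mers
    occur and are connected; None if no genome qualifies."""
    dists = [kmer_distance(adj, source, target) for adj in colored.values()]
    dists = [d for d in dists if d is not None]
    return sum(dists) / len(dists) if dists else None


if __name__ == "__main__":
    toy = {"g1": "ACGTTGCA", "g2": "ACGTGCA"}  # made-up sequences
    graph = build_colored_graph(toy, k=3)
    print(mean_distance(graph, "ACG", "GCA"))
```

Restricting the search to one genome's edges at a time is the point of the coloring in this sketch: a path that mixes edges from different genomes could report a shortcut that exists in no single genome. Locus pairs separated by short distances are the ones where LD is the likely explanation for an observed association.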

Footnotes

[Supplemental material is available for this article.]

Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278485.123.

Freely available online through the Genome Research Open Access option.

Received September 7, 2023. Accepted July 25, 2024.
