Identifying genes within pathways in unannotated genomes with PaGeSearch [METHODS]

Sohyoung Won1,2, Jaewoong Yu2,3 and Heebal Kim1,2,4

1Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea, 08826; 2eGnome, Incorporated, Seoul, Republic of Korea, 05836; 3UNGENE, Incorporated, Seoul, Republic of Korea, 14556; 4Department of Agricultural Biotechnology and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea, 08826

Corresponding author: heebal@snu.ac.kr

Abstract

In biological research, identifying and comparing genes within specific pathways across the genomes of various species is invaluable. However, annotating an entire genome is resource intensive, and sequence similarity searches often return hits that are not actually genes. To address these limitations, we introduce Pathway Gene Search (PaGeSearch), a tool designed to identify genes from predefined lists, especially those in specific pathways, within genomes. The tool uses an initial sequence similarity search to identify relevant genomic regions, followed by targeted gene prediction and neural network–based result filtering. PaGeSearch reports the regions most likely to be orthologs of the query genes and is designed for species within five classes: mammals, fish, birds, eudicotyledons, and Liliopsida. Compared with GeMoMa and miniprot, PaGeSearch generally achieves higher sensitivity, positive predictive value, and negative predictive value. The exon coverage of gene models from PaGeSearch is also higher than that of gene models from GeMoMa and miniprot. Although its performance is more variable when applied to actual biological pathways, it nonetheless maintains an acceptable level of accuracy. Evaluating PaGeSearch across different assembly levels (chromosome, scaffold, and contig) shows minimal variation in outcomes, indicating that PaGeSearch is resilient to variations in assembly quality.
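The three metrics used in the comparison above have standard definitions, which the following minimal sketch spells out. The class name, function names, and example counts are illustrative assumptions. They are not taken from PaGeSearch or from the article's evaluation, and the numbers are not results from the paper.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical tally of gene calls checked against a reference annotation."""
    tp: int  # query genes present in the genome and correctly reported
    fp: int  # regions reported as a query gene that are not
    tn: int  # query genes absent from the genome and correctly not reported
    fn: int  # query genes present in the genome but missed


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a metric is undefined.
    return numerator / denominator if denominator else math.nan


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of truly present genes that were found: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def positive_predictive_value(c: ConfusionCounts) -> float:
    """Fraction of reported genes that are correct: TP / (TP + FP)."""
    return _ratio(c.tp, c.tp + c.fp)


def negative_predictive_value(c: ConfusionCounts) -> float:
    """Fraction of 'not found' calls that are correct: TN / (TN + FN)."""
    return _ratio(c.tn, c.tn + c.fn)


# Made-up counts, for illustration only.
example = ConfusionCounts(tp=8, fp=2, tn=85, fn=5)
print(sensitivity(example))
print(positive_predictive_value(example))
print(negative_predictive_value(example))
```

Sensitivity and positive predictive value describe how well present genes are recovered, while negative predictive value describes how trustworthy a "not found" call is, which is why the abstract reports all three.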

Footnotes

[Supplemental material is available for this article.]

Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278566.123.

Freely available online through the Genome Research Open Access option.

Received September 26, 2023. Accepted April 1, 2024.
