Cloud Computing Enabled Big Multi-Omics Data Analytics

