Improvement of multi-task learning by data enrichment: application for drug discovery

Williams AJ, Pence HE (2017) The future of chemical information is now. Chem Int 39(3):9–14. https://doi.org/10.1515/ci-2017-0304

Tetko IV, Engkvist O, Chen H (2016) Does ‘Big Data’ exist in medicinal chemistry, and if so, how can it be harnessed? Future Med Chem 8(15):1801–1806. https://doi.org/10.4155/fmc-2016-0163

Nikitina AA, Orlov AA, Kozlovskaya LI, Palyulin VA, Osolodkin DI (2019) Enhanced taxonomy annotation of antiviral activity data from ChEMBL. Database 2019:139. https://doi.org/10.1093/database/bay139

Sosnin S, Karlov D, Tetko IV, Fedorov MV (2019) Comparative study of multitask toxicity modeling on a broad chemical space. J Chem Inf Model 59(3):1062–1072. https://doi.org/10.1021/acs.jcim.8b00685

Jain S, Siramshetty VB, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Nicklaus MC, Simeonov A, Zakharov AV (2021) Large-scale modeling of multispecies acute toxicity end points using consensus of multitask deep learning methods. J Chem Inf Model 61(2):653–663. https://doi.org/10.1021/acs.jcim.0c01164

Martin EJ, Polyakov VR, Tian L, Perez RC (2017) Profile-QSAR 2.0: kinase virtual screening accuracy comparable to four-concentration IC50s for realistically novel compounds. J Chem Inf Model 57(8):2077–2088. https://doi.org/10.1021/acs.jcim.7b00166

Martin EJ, Polyakov VR, Zhu X-W, Tian L, Mukherjee P, Liu X (2019) All-assay-Max2 pQSAR: activity predictions as accurate as four-concentration IC50s for 8558 Novartis assays. J Chem Inf Model 59(10):4450–4459. https://doi.org/10.1021/acs.jcim.9b00375

Sosnin S, Vashurina M, Withnall M, Karpov P, Fedorov M, Tetko I (2018) A survey of multi-task learning methods in chemoinformatics. Mol Inf. https://doi.org/10.1002/minf.201800108

Joshi A, Karimi S, Sparks R, Paris C, MacIntyre CR (2019) Does multi-task learning always help?: an evaluation on health informatics. In: Proceedings of the 17th annual workshop of the Australasian Language Technology Association. Australasian Language Technology Association, Sydney, pp 151–158

Zhang Y, Yang Q (2021) A survey on multi-task learning. http://arxiv.org/abs/1707.08114 [cs]

Xu Y, Pei J, Lai L (2017) Deep learning based regression and multiclass models for acute oral toxicity prediction with automatic chemical feature extraction. J Chem Inf Model 57(11):2672–2685. https://doi.org/10.1021/acs.jcim.7b00244

Montanari F, Kuhnke L, Ter Laak A, Clevert D-A (2020) Modeling physico-chemical ADMET endpoints with multitask graph convolutional networks. Molecules 25(1):44. https://doi.org/10.3390/molecules25010044

Lenselink EB, ten Dijke N, Bongers B, Papadatos G, van Vlijmen HWT, Kowalczyk W, IJzerman AP, van Westen GJP (2017) Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J Cheminform 9(1):45. https://doi.org/10.1186/s13321-017-0232-0

Yuan H, Paskov I, Paskov H, González AJ, Leslie CS (2016) Multitask learning improves prediction of cancer drug sensitivity. Sci Rep 6(1):31619. https://doi.org/10.1038/srep31619

Kalakoti Y, Yadav S, Sundar D (2022) Deep neural network-assisted drug recommendation systems for identifying potential drug-target interactions. ACS Omega 7(14):12138–12146. https://doi.org/10.1021/acsomega.2c00424

Weaver S, Gleeson MP (2008) The importance of the domain of applicability in QSAR modeling. J Mol Graph Model 26(8):1315–1326. https://doi.org/10.1016/j.jmgm.2008.01.002

Sahigara F, Mansouri K, Ballabio D, Mauri A, Consonni V, Todeschini R (2012) Comparison of different approaches to define the applicability domain of QSAR models. Molecules 17(5):4791–4810. https://doi.org/10.3390/molecules17054791

Rakhimbekova A, Madzhidov TI, Nugmanov RI, Gimadiev TR, Baskin II, Varnek A (2020) Comprehensive analysis of applicability domains of QSPR models for chemical reactions. Int J Mol Sci 21(15):5542. https://doi.org/10.3390/ijms21155542

Kar S, Roy K, Leszczynski J (2018) Applicability domain: a step toward confident predictions and decidability for QSAR modeling. In: Nicolotti O (ed) Computational toxicology: methods and protocols. Methods in molecular biology. Springer, New York, pp 141–169. https://doi.org/10.1007/978-1-4939-7899-1_6

OECD (2014) Guidance document on the validation of (quantitative) structure-activity relationship [(Q)SAR] models. https://doi.org/10.1787/9789264085442-en

Sheridan RP, Feuston BP, Maiorov VN, Kearsley SK (2004) Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR. J Chem Inf Comput Sci 44(6):1912–1928. https://doi.org/10.1021/ci049782w

Kaneko H, Funatsu K (2014) Applicability domain based on ensemble learning in classification and regression analyses. J Chem Inf Model 54(9):2469–2482. https://doi.org/10.1021/ci500364e

Tropsha A, Gramatica P, Gombar VK (2003) The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb Sci 22(1):69–77. https://doi.org/10.1002/qsar.200390007

Hemmateenejad B, Yazdani M (2009) QSPR models for half-wave reduction potential of steroids: a comparative study between feature selection and feature extraction from subsets of or entire set of descriptors. Anal Chim Acta 634(1):27–35. https://doi.org/10.1016/j.aca.2008.11.062

Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40(Database issue):1100–1107. https://doi.org/10.1093/nar/gkr777. Accessed 8 Jan 2023

Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, Krüger FA, Light Y, Mak L, McGlinchey S, Nowotka M, Papadatos G, Santos R, Overington JP (2014) The ChEMBL bioactivity database: an update. Nucleic Acids Res 42(Database issue):1083–1090. https://doi.org/10.1093/nar/gkt1031

Sosnina EA, Sosnin S, Nikitina AA, Nazarov I, Osolodkin DI, Fedorov MV (2020) Recommender systems in antiviral drug discovery. ACS Omega 5(25):15039–15051. https://doi.org/10.1021/acsomega.0c00857

Landrum G (2016) RDKit: open-source cheminformatics software

Zhang L, Tan J, Han D, Zhu H (2017) From machine learning to deep learning: progress in machine intelligence for rational drug discovery. Drug Discov Today 22(11):1680–1685. https://doi.org/10.1016/j.drudis.2017.08.010

Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T (2018) The rise of deep learning in drug discovery. Drug Discov Today 23(6):1241–1250. https://doi.org/10.1016/j.drudis.2018.01.039

Nag S, Baidya ATK, Mandal A, Mathew AT, Das B, Devi B, Kumar R (2022) Deep learning tools for advancing drug discovery and development. 3 Biotech 12(5):110. https://doi.org/10.1007/s13205-022-03165-8

Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) PyTorch: an imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R (eds) Advances in neural information processing systems 32. Curran Associates Inc., Red Hook, pp 8024–8035. https://doi.org/10.48550/arXiv.1912.01703

Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(10):281–305

Sosnina EA, Sosnin S, Fedorov MV (2023) ImprovingMTT. GitHub. https://github.com/ekaterina-sea/ImprovingMTT

Sheridan RP (2013) Time-split cross-validation as a method for estimating the goodness of prospective prediction. J Chem Inf Model 53(4):783–790. https://doi.org/10.1021/ci400084k. Accessed 11 Jan 2023

Schuffenhauer A, Ertl P, Roggo S, Wetzel S, Koch MA, Waldmann H (2007) The scaffold tree—visualization of the scaffold universe by hierarchical scaffold classification. J Chem Inf Model 47(1):47–58. https://doi.org/10.1021/ci600338x. Accessed 11 Jan 2023

Karlov DS, Sosnin S, Tetko IV, Fedorov MV (2019) Chemical space exploration guided by deep neural networks. RSC Adv 9(9):5151–5157. https://doi.org/10.1039/C8RA10182E

Wainer J, Cawley G (2021) Nested cross-validation when selecting classifiers is overzealous for most practical applications. Expert Syst Appl 182:115222. https://doi.org/10.1016/j.eswa.2021.115222

Lika B, Kolomvatsos K, Hadjiefthymiades S (2014) Facing the cold start problem in recommender systems. Expert Syst Appl 41(4, Part 2):2065–2073. https://doi.org/10.1016/j.eswa.2013.09.005

Sethi R, Mehrotra M (2021) Cold start in recommender systems—a survey from domain perspective. In: Hemanth J, Bestak R, Chen JI-Z (eds) Intelligent data communication technologies and internet of things. Lecture notes on data engineering and communications technologies. Springer, Singapore, pp 223–232. https://doi.org/10.1007/978-981-15-9509-7_19

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith NJ, Kern R, Picus M, Hoyer S, van Kerkwijk MH, Brett M, Haldane A, del Río JF, Wiebe M, Peterson P, Gérard-Marchant P, Sheppard K, Reddy T, Weckesser W, Abbasi H, Gohlke C, Oliphant TE (2020) Array programming with NumPy. Nature 585(7825):357–362. https://doi.org/10.1038/s41586-020-2649-2

Safari S, Baratloo A, Elfil M, Negida A (2016) Evidence based emergency medicine; Part 5 receiver operating curve and area under the curve. Emergency (Tehran) 4(2):111–113. https://doi.org/10.22037/aaem.v4i2.232

Chicco D, Warrens MJ, Jurman G (2021) The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput Sci 7:623. https://doi.org/10.7717/peerj-cs.623

Onyutha C (2021) A hydrological model skill score and revised R-squared. Hydrol Res 53(1):51–64. https://doi.org/10.2166/nh.2021.071

Li Z, Kamnitsas K, Glocker B (2021) Anal

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN
