Network pharmacology research relies heavily on a diverse array of databases. In recent years, the rapid advancement of big data and AI has led to the emergence of numerous databases. Most of these databases encompass a vast collection of foundational experimental data, along with potential chemical components and target information predicted through various algorithms [20]. These databases form the bedrock of network pharmacology studies and can be categorized into herbal databases (Table 1), chemical component databases (Table 2), disease databases (Table 3), and network pharmacology analysis platforms (Table 4). This section offers an overview of the key content and functions of frequently utilized databases in network pharmacology research.
Herbal databases
Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP) [21] includes 500 varieties of herbs from 38 categories as outlined in the 2010 edition of the “Chinese Pharmacopoeia”. It provides related chemical components and pharmacokinetic characteristics such as oral bioavailability (OB), drug-likeness (DL), permeability in intestinal epithelial cells, blood–brain barrier permeability, and water solubility, alongside 3,339 potential targets sourced from the DrugBank database. Accessible at https://tcmsp-e.com/, it supports data queries related to herbs, chemical components, targets, and diseases, enabling users to open detailed pages via the corresponding TCMSP ID. One notable feature of TCMSP is its capability to screen and analyze chemical components: users can filter on pharmacokinetic parameters such as OB and DL to identify chemical components of interest and their corresponding targets. The filtered information can then be imported into Cytoscape software for visual network analysis.
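The screening step described above can be sketched in a few lines of Python. This is a minimal illustration, not TCMSP's own implementation: the component records, the field names, and the function names are hypothetical, and the cutoffs (OB ≥ 30% and DL ≥ 0.18) are assumed values that are often used in the literature but are not stated in this section.

```python
import csv
import io

# Hypothetical component records, shaped like a table exported by hand.
# The values are illustrative only and are not taken from TCMSP.
COMPONENTS = [
    {"mol_id": "MOL_A", "ob": 46.4, "dl": 0.28, "targets": ["PTGS2", "AKT1"]},
    {"mol_id": "MOL_B", "ob": 12.1, "dl": 0.35, "targets": ["TNF"]},
    {"mol_id": "MOL_C", "ob": 41.9, "dl": 0.05, "targets": ["IL6"]},
    {"mol_id": "MOL_D", "ob": 36.9, "dl": 0.75, "targets": ["AKT1"]},
]


def screen_components(components, min_ob=30.0, min_dl=0.18):
    """Keep components that pass both assumed cutoffs (OB in %, DL unitless)."""
    return [c for c in components if c["ob"] >= min_ob and c["dl"] >= min_dl]


def to_edge_table(components):
    """Return a tab-separated component-target edge list as a string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target"])
    for component in components:
        for target in component["targets"]:
            writer.writerow([component["mol_id"], target])
    return buffer.getvalue()


kept = screen_components(COMPONENTS)
edge_table = to_edge_table(kept)
print(edge_table)
```

Saved to a file, a two-column table of this kind is one plausible way to hand the filtered component–target pairs to Cytoscape as a network table.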
The Encyclopedia of Traditional Chinese Medicine (ETCM) [22, 23] encompasses 403 kinds of herbs from the 2015 edition of the “Chinese Pharmacopoeia”, along with 3,962 Chinese herbal formulations approved by the China National Medical Product Administration, 7274 chemical components obtained through manual retrieval from the PubChem database, and 3,027 disease entries sourced from HPO, OMIM, DisGeNET, and ORPHANET databases. Available at http://www.tcmip.cn/ETCM/, it offers comprehensive information on herbs, Chinese herbal formulations, chemical components, and corresponding targets, along with GO and KEGG enrichment analysis functions. Users can also predict new drug target information based on known chemical structures, and ETCM includes a systematic analysis function to explore relationships between herbs, formulations, components, gene targets, relevant pathways, and diseases, thereby establishing connected entity networks.
Symptom Mapping (SymMap) [24] includes 499 kinds of herbs from the 2015 edition of the “Chinese Pharmacopoeia”, 1717 manually organized and standardized Chinese medicine symptoms, 961 Western medicine symptoms mapped to Chinese medicine symptoms via the UMLS database, 19,595 drug components from TCMID, TCMSP, and TCM-ID databases, 4302 drug targets from HIT, TCMSP, HPO, DrugBank, and NCBI databases, and 5235 diseases from OMIM, MeSH, and Orphanet databases. SymMap contains 6638 connections between herbs and Chinese medicine symptoms, 2978 connections between Western medicine and TCM symptoms, 12,107 connections between Western medicine symptoms and diseases, and 48,372 connections between herbs and components. Available at http://www.symmap.org/, it serves as an integrative database focused on the interplay between TCM and Western medicine. Users can browse and download entity information and interaction relationships from the homepage. Furthermore, the SymMap database presents an association network between entities, in which associations can be filtered by parameters such as P values, FDR (BH), and FDR (Bonferroni).
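To make the filtering parameters concrete, the following sketch computes Bonferroni and Benjamini-Hochberg (BH) adjusted values from a list of raw P values and applies a cutoff. It uses the standard textbook definitions of both corrections; the association labels, the P values, and the 0.05 cutoff are hypothetical, and nothing here is meant to reproduce how SymMap computes its own statistics.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each P value by the number of tests."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def benjamini_hochberg(p_values):
    """BH adjustment: p * n / rank, made monotone from the largest rank down."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical herb-symptom associations with raw P values.
ASSOCIATIONS = [("herb1-symptomA", 0.001), ("herb1-symptomB", 0.012),
                ("herb2-symptomA", 0.03), ("herb2-symptomC", 0.2)]

names = [name for name, _ in ASSOCIATIONS]
raw = [p for _, p in ASSOCIATIONS]
fdr_bh = benjamini_hochberg(raw)
fdr_bonf = bonferroni(raw)

CUTOFF = 0.05  # assumed significance threshold
kept_bh = [n for n, q in zip(names, fdr_bh) if q < CUTOFF]
kept_bonf = [n for n, q in zip(names, fdr_bonf) if q < CUTOFF]
print(kept_bh)
print(kept_bonf)
```

In this toy input the Bonferroni filter retains fewer associations than the BH filter, which reflects the general property that Bonferroni is the more conservative of the two corrections.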
A Bioinformatics Analysis Tool for Molecular Mechanism of Traditional Chinese Medicine (BATMAN-TCM) [25] features 54,832 Chinese herbal formulations from TCMID, TCM-Suite, LTM-TCM, and ITCM databases, 8404 herbs from TCMID, TCM-Suite, HERB, and ITCM databases, 39,171 chemical components from TCMID, TCM-Suite, HERB, and ITCM databases, 9927 target proteins from the SWISS-PROT database, 217 pathways from the KEGG database, 11,931 functional entries from the Gene Ontology database, 5128 diseases from the OMIM database, and 1504 disease entries from the TTD database. Accessible at http://bionet.ncpsb.org.cn/, it is dedicated to analyzing the action mechanisms of medicine, featuring functions such as chemical component target prediction, functional analysis, and visualization of component–target–disease/pathway networks. Users can input herbal formulations, herb names, PubChem IDs, or InChIKey codes, and BATMAN-TCM will automatically retrieve constituent compounds and targets for further analysis. Additionally, it supports multi-threaded analysis, enabling users to submit multiple tasks, with a Venn diagram displaying target comparisons between tasks and enrichment results shown in a consolidated manner on the functional enrichment analysis page.
Traditional Chinese Medicine Integrative Database (TCMID) [26] compiles 46,914 herbal formulations from literature mining, 8159 herbs from the TCM-ID database and literature sources, 25,210 chemical components from the TCM-ID, HIT, and TCM@Taiwan databases and literature mining, 17,521 targets from HIT, STITCH, OMIM, DrugBank, and literature, 3791 disease entries from the OMIM database, and 6,826 drugs from the DrugBank database. Accessible at http://www.megabionet.org/tcmid/, it aims to establish connections between disease targets and those of herbs, inferring potential therapeutic targets. Its primary focus is on network visualization and predicting unknown drug targets. If two chemical components interact with the same protein or different proteins based on the network, users can infer potential synergistic or antagonistic effects. If a component of an herb interacts with a disease target protein, this suggests a therapeutic mechanism, and if the components share action targets with drugs in the DrugBank database, potential therapeutic targets can be inferred.
A High-throughput Experiment- and Reference-guided Database of Traditional Chinese Medicine (HERB) [27] includes 7263 kinds of herbs and their processed products from SymMap, TCMID, TCMSP, and TCM-ID databases, 49,258 chemical components, 1037 high-throughput sequencing experimental data from the NCBI GEO database, and 1966 references related to herbs and their components from the past decade, along with 12,933 targets from SymMap, HIT, TCMSP, and TCMID databases, and 28,212 diseases from the DisGeNET database. Accessible at http://herb.ac.cn/, it integrates high-throughput experimental data with literature data mining, offering functions like browsing, searching, viewing, and downloading data related to herbs, components, target genes, diseases, high-throughput experiments, and reference data, while displaying GO and KEGG enrichment results. The high-throughput data within HERB is also visualized, allowing users to select different datasets for display.
Similar databases include TCM Database@Taiwan [28], which features 443 herbs, over 20,000 compounds, and their three-dimensional structural information. Accessible at http://tcm.cmu.edu.tw/, it enables docking of small molecule drugs with macromolecules for computer-aided drug design. The Traditional Chinese Medicine Information Database (TCM-ID) [29], developed by the National University of Singapore, includes 1588 herbal formulations, 1313 herbs, 5669 chemical components, and 3725 three-dimensional structures. Available at https://www.bidd.group/TCMID/, it supports drug target enrichment analysis and visualization of gene expression from individual patient samples. Herbal Ingredients’ Targets Database (HIT) [30] features 1250 varieties of herbs, 1237 chemical components, 2208 targets, 10,031 compound-target activity pairs, 1231 therapeutic targets, and 56 micro-RNA targets. Accessible at http://www.badd-cao.net:2345/, it supports drug target prediction alongside querying functions.
Databases such as TCMSP, ETCM, and HERB represent some of the most widely recognized resources for herbal information. TCMSP exclusively catalogs plant-based herbs, omitting data on minerals and animal-derived substances. In contrast, HERB is considered one of the most comprehensive herbal repositories, incorporating a broad spectrum of data on plant-derived, animal-based, and mineral-origin herbs, alongside information on processed herbal products. Furthermore, specialized databases like SymMap, BATMAN-TCM, and TCMID extend beyond basic herbal data, emphasizing the intricate associations between herbs and various diseases.
Table 1 Herb databases related to network pharmacology
Chemical component databases
PubChem (https://pubchem.ncbi.nlm.nih.gov/) [31] is an extensive database that encompasses 118,596,691 compounds, 322,395,335 substances, and 295,360,133 biological activities, along with 41,558,769 literature references, 113,242 gene records, 248,298 protein entries, and 241,163 pathway details. It facilitates the retrieval of compounds through various identifiers such as names and molecular formulas and supports the download of both 2D and 3D structural representations. Users can find detailed information regarding the chemical and physical properties, biological activities, safety and toxicity, patents, and literature citations.
Swiss ADME (http://www.swissadme.ch/) [32] serves as a small molecule drug design platform, enabling researchers to compute physicochemical descriptors and predict ADME parameters, pharmacokinetic properties, drug-like characteristics, and the medicinal chemistry friendliness of single or multiple small molecules, thereby aiding in drug discovery efforts.
ChEMBL (https://www.ebi.ac.uk/) [33] features 2,431,025 distinct compounds, nearly 16,000 targets, and 20,772,701 recorded activities, along with 89,892 publications and 262 deposited datasets. This platform allows users to retrieve comprehensive information about compounds and targets, predict therapeutic targets, and search for structurally similar compounds based on specific structures.
The Drug-Gene Interaction Database (DGIdb) (https://www.dgidb.org/) [34] hosts data on over 10,000 genes and 20,000 drugs linked to nearly 70,000 drug-gene interactions, classified into 43 potentially druggable gene categories. Users can explore drug–gene interactions and input lists of genes to identify all known or potentially druggable genes within their specified sets.
The Comparative Toxicogenomics Database (CTD) (https://ctdbase.org/) [35] includes information on 17,100 chemicals, 54,300 genes, 6100 phenotypes, 7270 diseases, and 202,000 exposure statements. It allows users to investigate the chemical–gene/protein interactions, chemical–disease relationships, and gene-disease relationships.
The Search Tool for Interactions of Chemicals (STITCH) (http://stitch.embl.de/) [36] is designed for predicting interaction relationships between chemicals and genes. Users can input individual or multiple genes/chemicals, chemical structures, or protein sequences to obtain predicted chemical compound–gene interactions.
DrugCentral (https://drugcentral.org/) [37] features 4927 active chemical compounds and 112,359 FDA-approved drugs, providing comprehensive information on active ingredients, chemical entities, drug products, mechanisms of action, indications, and pharmacological properties. It also supports drug similarity searches.
DrugBank (https://go.drugbank.com/) [38] hosts a comprehensive database containing 4563 FDA-approved drugs, 6231 investigational compounds, 6231 drug–drug interactions, 2475 drug–food interactions, and 5236 drug-related targets, alongside thousands of pathways. It serves as a powerful tool for searching information on drugs, targets, pathways, and indications, as well as for querying drug–drug and drug–food interactions.
PubChem is the world’s largest repository of chemical compounds, offering comprehensive access to virtually any desired compound. DrugBank, on the other hand, focuses predominantly on chemical drugs, with limited representation of natural small-molecule compounds. Swiss ADME stands out for its predictive capabilities, enabling researchers to estimate the ADME (Absorption, Distribution, Metabolism, and Excretion) properties of input chemical compounds. Additional databases, including ChEMBL, DGIdb, and CTD, are valuable resources for retrieving detailed information on chemical compounds and their molecular targets. Selecting an appropriate chemical database should align with the specific goals of the research being conducted.
Table 2 Chemical component databases related to network pharmacology
Disease databases
DisGeNet (https://disgenet.cn/) [39] encompasses 5912 diseases, 1915 LncRNAs, 16,065 protein-coding genes, and 2611 MicroRNAs, alongside 447,382 interaction relationships. It enables users to query interactions between diseases and LncRNAs, protein-coding genes, and MicroRNAs while supporting specific disease-gene information retrieval and data downloads.
GeneCards (https://www.genecards.org/) [40] is a comprehensive database featuring 466,227 genes, including 43,850 HGNC-approved genes, 21,612 protein-coding genes, and 291,831 RNA genes, comprising 130,757 lncRNAs, 111,811 piRNAs, and 49,263 other ncRNAs. The database also lists 20,956 disease-associated genes and 128,261 functional elements. GeneCards facilitates the retrieval of disease-related targets, gene-specific information, and provides interactive pathway network maps.
The Online Mendelian Inheritance in Man (OMIM) database [41] features 17,375 gene descriptions, 14 combined gene and phenotype entries, 6895 phenotypes with known molecular bases, 1499 phenotypes with unknown molecular bases, and 1736 entries primarily relating to suspected Mendelian phenotypes. The OMIM database (https://www.omim.org/) allows users to query disease-related targets and gene-specific information.
Human Protein Reference Database (HPRD) [42] offers 30,047 protein entries, 41,327 protein–protein interactions, 93,710 post-translational modifications (PTMs), 112,158 protein expression datasets, 22,490 subcellular localizations, 470 domains, and 453,521 PubMed links. HPRD (http://www.hprd.org/) provides a wealth of protein annotation information, including expression profiles, classifications, and structural domains.
Databases like DisGeNet, GeneCards, and OMIM serve as rich repositories of disease-related target data, though their scope of coverage and data curation standards differ. To ensure a more comprehensive identification of disease-associated targets, researchers typically integrate data from multiple sources, leveraging the unique strengths of DisGeNet, GeneCards, and OMIM.
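The integration step can be illustrated with plain set operations. The sketch below merges disease-associated genes from several sources and intersects them with a compound target list. All gene lists are hypothetical placeholders rather than real query results, and the normalization rule (trimming whitespace and upper-casing symbols) is an assumption about how exported lists might need cleaning.

```python
def normalize(symbols):
    """Trim whitespace and upper-case gene symbols; drop empty entries."""
    return {s.strip().upper() for s in symbols if s.strip()}


# Hypothetical disease-associated genes as they might be exported per source.
DISEASE_SOURCES = {
    "DisGeNet": ["TNF", "IL6", "akt1"],
    "GeneCards": ["IL6", "TP53", "VEGFA "],
    "OMIM": ["TP53", "INS"],
}

# Hypothetical predicted targets of the herbal components under study.
compound_targets = normalize(["AKT1", "PTGS2", "VEGFA", "IL6"])

# Union across sources gives the broadest disease target set.
disease_targets = set().union(*(normalize(v) for v in DISEASE_SOURCES.values()))

# Genes shared by the compounds and the disease are candidate targets.
shared = sorted(disease_targets & compound_targets)

# Record which sources support each shared gene.
support = {
    gene: sorted(name for name, genes in DISEASE_SOURCES.items()
                 if gene in normalize(genes))
    for gene in shared
}
print(shared)
print(support)
```

Keeping the per-gene source list alongside the intersection makes it possible to prioritize targets reported by more than one database.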
Table 3 Diseases databases related to network pharmacology
Network pharmacology analysis platforms
STRING (https://cn.string-db.org/) [43] is a comprehensive database comprising 12,535 organisms and over 59.3 million proteins, encompassing approximately 20 billion documented protein interaction relationships. The platform enables users to retrieve information on individual or multiple proteins, construct protein–protein interaction networks, and conduct Gene Ontology (GO) and KEGG pathway enrichment analyses.
RCSB PDB (https://www.rcsb.org/) [44] hosts 224,931 protein structures and 1,068,577 computationally derived structural models, providing valuable access to the three-dimensional structures of various proteins, which is essential for understanding their functional mechanisms.
DAVID (https://david.ncifcrf.gov/) [45] and Metascape (https://metascape.org/) [46] serve as powerful platforms for GO and KEGG enrichment analyses, offering functionalities for Gene ID conversion alongside comprehensive enrichment analysis capabilities.
The Molecular Interaction Database (MINT) (https://mint.bio.uniroma2.it/) [47] consists of data on 607 species, encompassing 139,547 interaction relationships and 27,756 unique interactors, alongside 6425 related publications. MINT is designed to facilitate the exploration of protein–protein interaction relationships.
The Kyoto Encyclopedia of Genes and Genomes (KEGG) (https://www.kegg.jp/) [48] comprises 573 pathway maps, 201 functional hierarchies, 489 KEGG modules, and 48 reaction modules. This platform allows for extensive pathway retrieval and visualization, enabling users to search for specific pathways that involve particular genes by their gene names.
In network pharmacology research, the STRING and MINT databases are widely utilized for constructing PPI networks. For GO functional enrichment analysis and KEGG pathway enrichment analysis, platforms such as DAVID and Metascape are frequently employed. The RCSB PDB database is indispensable for accessing three-dimensional protein structures, which are crucial for molecular docking studies. Given the diverse functionalities of network pharmacology analysis platforms, researchers are advised to choose the platform that best aligns with their specific research objectives.
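As a small illustration of what is typically done with a PPI network once it has been retrieved, the sketch below builds an undirected network from an edge list and ranks proteins by degree (number of interaction partners). The edge list, the scores, and the 0.4 score cutoff are hypothetical inputs chosen for the example; this is not a call to the STRING or MINT services, and degree is only one of several possible ways to prioritize nodes.

```python
from collections import defaultdict

# Hypothetical interaction pairs: (protein A, protein B, confidence score 0-1).
EDGES = [
    ("AKT1", "TP53", 0.95),
    ("AKT1", "VEGFA", 0.91),
    ("AKT1", "IL6", 0.72),
    ("IL6", "TNF", 0.98),
    ("TP53", "VEGFA", 0.35),
]


def build_network(edges, min_score=0.4):
    """Build an undirected adjacency map, dropping low-confidence edges."""
    neighbors = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def rank_by_degree(neighbors):
    """Sort proteins by number of partners (descending), then by name."""
    degrees = ((node, len(partners)) for node, partners in neighbors.items())
    return sorted(degrees, key=lambda item: (-item[1], item[0]))


network = build_network(EDGES)
ranking = rank_by_degree(network)
print(ranking)
```

Proteins at the top of such a ranking are often treated as candidate hub targets and carried forward to enrichment analysis or molecular docking.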
Table 4 Network pharmacology analysis platforms and databases
Conclusion
The herbal, chemical compound, and disease databases, along with network pharmacology analysis platforms mentioned above, are integral tools in contemporary TCM network pharmacology research. Each database type offers unique data, as detailed in this discussion of their specific contents and functionalities. To enhance the reliability and robustness of network pharmacology outcomes, researchers are encouraged to synthesize data from multiple databases in conjunction with insights from existing literature.