GalaxySagittarius-AF: Predicting Targets for Drug-Like Compounds in the Extended Human 3D Proteome

The introduction of innovative artificial intelligence techniques for predicting protein structures from sequences [1, 2] has led to an expansion of structure databases. Whereas experimental structures cover only 17% of all human protein residues, highly accurate model structures predicted by AlphaFold [3, 4] have more than doubled this structural coverage. The expanded structural coverage of human protein sequences has also enriched the structural representation of protein superfamilies, unveiling novel folds not previously observed experimentally [5]. This expansion of the protein fold space can deepen our understanding of protein molecular function by elucidating structure-function relationships [6, 7]. It is therefore essential to disseminate this extended structure database across various biopharmaceutical research domains. One such direction is to assess the impact of drug-like molecules on the human biological system based on predicted structures, using cost-effective in silico methods.

As a first step in this direction, we have developed a free web service that identifies protein targets with which a given drug-like molecule can interact within the expanded human 3D proteome. This service can contribute to the drug discovery process by suggesting potential novel targets for drug repurposing and by predicting possible off-targets that could cause unintended side effects. Moreover, the web server not only identifies potential interacting proteins but also provides specific three-dimensional interaction models that can help interpret protein-drug interactions at the atomic level.

Unlike other target prediction methods [8, 9, 10] that depend heavily on available activity data, GalaxySagittarius [11] is a target prediction web server that employs structure-based and docking-based methods. As described in detail in our previous paper [11], GalaxySagittarius takes a query compound as input, searches a database of human protein structures, and returns potential protein targets for the compound. During the search, it uses both ligand similarity-based and protein structure-based approaches as efficient screening methods, followed by protein-compound docking for rescoring. The result is a list of predicted protein targets together with their respective protein-compound complex structures.
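The screen-then-rescore search described above can be illustrated with a minimal Python sketch. This is not the GalaxySagittarius implementation: the fingerprint representation (a set of "on" bit positions), the `pocket_score` field standing in for the structure-based screen, the cutoff values, the rule that a target survives if either screen passes, and the `dock` callable are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical fingerprint representation: the set of "on" bit positions.
Fingerprint = FrozenSet[int]


@dataclass
class Target:
    name: str
    known_ligand_fps: List[Fingerprint]  # fingerprints of ligands known to bind this target
    pocket_score: float                  # hypothetical precomputed structure-based screening score


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_targets(
    query_fp: Fingerprint,
    targets: List[Target],
    dock: Callable[[Target], float],  # hypothetical docking function; lower energy is better
    sim_cutoff: float = 0.5,          # assumed ligand-similarity threshold
    pocket_cutoff: float = 0.5,       # assumed structure-based threshold
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    # Stage 1: inexpensive screening. In this sketch a target is kept
    # if either the ligand-similarity or the structure-based filter passes.
    candidates = []
    for t in targets:
        best_sim = max((tanimoto(query_fp, fp) for fp in t.known_ligand_fps), default=0.0)
        if best_sim >= sim_cutoff or t.pocket_score >= pocket_cutoff:
            candidates.append(t)
    # Stage 2: rescore only the surviving candidates with the expensive docking step.
    scored = [(t.name, dock(t)) for t in candidates]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_n]


# Usage with toy data: A passes by ligand similarity, B by pocket score, C by neither.
targets = [
    Target("A", [frozenset({1, 2, 3})], 0.1),
    Target("B", [frozenset({7, 8})], 0.9),
    Target("C", [frozenset({9})], 0.2),
]
energies = {"A": -8.0, "B": -9.5, "C": -5.0}
print(predict_targets(frozenset({1, 2, 3, 4}), targets, dock=lambda t: energies[t.name]))
# prints [('B', -9.5), ('A', -8.0)]
```

Screening first restricts the costly docking calculation to a small candidate set, which is the efficiency rationale given in the text; how the actual server combines its two screening approaches is not specified here.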

One of the crucial components of GalaxySagittarius is the protein target database used in the target prediction process. Because the prediction method includes structure-based methods, it requires atomic protein structures and their binding sites. The ligand similarity-based search additionally requires comprehensive information on binding ligands. It is therefore important that our database contain a large, high-quality collection of protein structures and interaction information covering diverse human proteins.
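As a rough illustration of what such a database must hold, the hypothetical record types below pair each structure with its binding sites and the ligands associated with each site, and build a ligand-to-target index of the kind a ligand similarity search could query. The class names, fields, and placeholder identifiers are assumptions for illustration, not the actual GalaxySagittarius-AF schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BindingSite:
    residues: List[int]  # residue numbers lining the pocket
    ligand_smiles: List[str] = field(default_factory=list)  # ligands associated with this site


@dataclass
class TargetEntry:
    # Hypothetical schema: one entry per protein structure.
    protein_id: str
    structure_id: str  # placeholder for an experimental or predicted structure identifier
    source: str        # "experimental" or "predicted"
    sites: List[BindingSite] = field(default_factory=list)

    def usable_for_structure_search(self) -> bool:
        # Structure-based screening needs at least one binding site.
        return len(self.sites) > 0

    def all_ligands(self) -> List[str]:
        # Ligand similarity search needs the ligands bound at any site.
        return [smi for site in self.sites for smi in site.ligand_smiles]


def build_ligand_index(entries: List[TargetEntry]) -> Dict[str, List[str]]:
    """Map each ligand string to the proteins it is associated with."""
    index: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        for smi in entry.all_ligands():
            index[smi].append(entry.protein_id)
    return dict(index)


# Usage with placeholder entries (identifiers are illustrative only).
entries = [
    TargetEntry("PROT_A", "struct_1", "experimental",
                [BindingSite([10, 11, 12], ["CCO", "c1ccccc1"])]),
    TargetEntry("PROT_B", "struct_2", "predicted", [BindingSite([5, 6], ["CCO"])]),
    TargetEntry("PROT_C", "struct_3", "predicted"),
]
print(build_ligand_index(entries))
# prints {'CCO': ['PROT_A', 'PROT_B'], 'c1ccccc1': ['PROT_A']}
print([e.protein_id for e in entries if e.usable_for_structure_search()])
# prints ['PROT_A', 'PROT_B']
```

In this sketch, an entry without binding-site annotations (PROT_C) contributes to neither search, which mirrors the point that broader binding-site coverage directly enlarges the searchable target set.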

Compared with GalaxySagittarius [11], GalaxySagittarius-AF uses an updated protein structure database that integrates recently resolved experimental structures from the PDB [12] with predicted structures from the AlphaFold Protein Structure Database (AFDB) [13]. For both experimental and predicted structures, the binding sites and ligands were newly predicted using an updated version of GalaxySite, leading to broader coverage of targets with binding information and a wider variety of ligands. These updates greatly expanded the protein target database and improved the target prediction accuracy of GalaxySagittarius-AF.
