A comprehensive survey on protein-ligand binding site prediction

Protein-ligand interactions are essential for various cellular processes, including cellular components, cell signal transduction, and metabolic regulation [1, 2, 3]. These interactions arise from structural complementarity and from noncovalent or covalent bonding at specific sites on proteins, known as ligand binding sites (LBSs). LBS prediction provides valuable insight into the mechanisms of intermolecular interactions and the pathogenesis of diseases, and it serves as a foundation for drug development [4, 5, 6]. Figure 1a illustrates how allosteric and orthosteric drugs work. Figure 1b shows the LBS in the structure of the SARS-CoV-2 main protease bound to its inhibitor CDD-1713 (PDB ID: 7LTN) [7]. The interactions identified by PLIP [8] comprise a covalent bond, a hydrophobic interaction, and hydrogen bonds.

Compared with traditional experimental methods, such as X-ray crystallography, nuclear magnetic resonance, and cryogenic electron microscopy, computational methods offer an effective means of high-throughput prediction, reducing costs and accelerating the discovery of protein-ligand interactions. Since the 1990s, numerous computational methods have been proposed to identify LBSs. Earlier methods relied on manually crafted spatial geometry or energy features to identify large hollows or cavities within protein structures, where interactions often occur. However, because LBSs are also influenced by various bio-physicochemical properties specific to different ligands and protein types, these methods often yield high false positive rates [9]. In recent years, databases such as the Protein Data Bank (PDB) [10], BioLip [11], and DrugBank [12] have brought large-scale dataset-based methods into the mainstream, including template-based, traditional machine learning-based, and deep learning-based methods. Table 1 lists popular large-scale dataset-based LBS prediction methods published in the last five years and indicates a growing focus on deep learning algorithms.
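To make the geometric idea behind the earlier methods concrete, the following minimal Python sketch scores how enclosed a probe point is by counting the directions in which a ray from that point meets a protein atom. It is an illustration under stated assumptions, not the procedure of any method surveyed here: the function name `buriedness`, the 14 ray directions, the atom radius, the distance cutoff, and the toy cube-shaped "protein" are all hypothetical choices.

```python
import math
from itertools import product

# 14 probe directions: 6 axis-aligned plus 8 cube diagonals (illustrative choice).
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3)
              if sum(abs(c) for c in d) in (1, 3)]


def buriedness(point, atoms, atom_radius=2.0, max_dist=10.0, step=0.5):
    """Count the directions in which a ray from `point` hits an atom.

    A ray counts as blocked if any sample along it, up to `max_dist`,
    lies within `atom_radius` of an atom centre. All parameters are
    hypothetical defaults chosen for this toy example.
    """
    blocked = 0
    n_steps = int(max_dist / step)
    for d in DIRECTIONS:
        norm = math.sqrt(sum(c * c for c in d))
        unit = tuple(c / norm for c in d)
        for k in range(1, n_steps + 1):
            probe = tuple(p + u * k * step for p, u in zip(point, unit))
            if any(math.dist(probe, a) <= atom_radius for a in atoms):
                blocked += 1
                break  # this direction is enclosed; move to the next one
    return blocked


# Toy "protein": atoms on the surface of a hollow cube, leaving a cavity inside.
shell = [a for a in product((-4, -2, 0, 2, 4), repeat=3)
         if max(abs(c) for c in a) == 4]

inside = buriedness((0.0, 0.0, 0.0), shell)    # point in the central cavity
outside = buriedness((20.0, 0.0, 0.0), shell)  # point far from the structure
print(inside, outside)
```

A point enclosed in many directions is a cavity candidate, while a point in bulk solvent is not. Because such a score uses shape alone, it cannot tell a cavity that binds a given ligand from one that does not, which is consistent with the high false positive rates noted above.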

In this paper, we begin by summarizing the challenging issues of LBS prediction. Next, we present a comprehensive overview of popular LBS prediction methods organized by their input features, computational algorithms, and ligand types, as depicted in Figure 2. We then introduce the two categories of LBSs, orthosteric and allosteric sites, and discuss the specificity of allosteric site identification. Lastly, we emphasize the need for continued exploration of more effective solutions and draw attention to areas that demand further improvement.
