Targeting SARS-CoV-2 main protease: a comprehensive approach using advanced virtual screening, molecular dynamics, and in vitro validation

Pharmacophore model construction, validation, and screeningLigands collection and clustering

To construct ligand-based pharmacophore models, we obtained inhibitors of the Mpro from the ChEMBL and BindingDB databases, both of which are well-known repositories of biochemical activity data. We selected inhibitors with IC50 values below 20 μM and no more than two violations of Lipinski's rule of five, ensuring the selection of potent and drug-like compounds. This filtering process yielded a final set of 155 compounds deemed to be highly reliable inhibitors of the Mpro target. These compounds were subsequently clustered using the Tanimoto coefficient with a similarity threshold of 0.7. The clustering analysis distributed the 155 compounds into 94 clusters, with 64 single-molecule clusters. The largest cluster comprised 11 molecules, exhibiting a mean internal similarity value of 0.757, calculated based on the Tanimoto distance matrix.

Pharmacophore model construction

The pharmacophore hypothesis is a fundamental concept in drug discovery, delineating a molecule's essential structural characteristics and spatial orientation that are crucial for its biological or pharmacological activity. This hypothesis serves as a foundation for designing novel drug candidates, enabling researchers to predict and emulate the interactions between a target and a ligand, thereby facilitating the development of potential therapeutics [45].

Several pharmacophore models were constructed and evaluated based on the clusters described above for selectivity and specificity. The two most promising models were built from the largest cluster and the cluster with the lowest median IC50 value (61.5 nM). The pharmacophore model from the largest cluster comprised nine features (Fig. 1A1, A2), namely one hydrogen bond donor, three hydrogen bond acceptors, three aromatic rings, and two hydrophobic groups. Whereas the model derived from the cluster with the lowest median IC50 included eight features, namely one hydrogen bond donor, three hydrogen bond acceptors, two aromatic rings, and two hydrophobic groups (Fig. 1B1, B2). These models were designed to identify the critical features necessary for ligand binding. Among the two primary models, the first, based on the largest cluster, exhibited superior selectivity (0.169 compared to 0.084 for the second model) and specificity (0.991 compared to 0.984 for the second model), making it the more effective model for virtual screening applications.

Fig. 1figure 1

3D pharmacophore models shown in two perspectives. A1, A2: pharmacophore model constructed from the largest cluster. B1, B2: pharmacophore model constructed from the lowest median IC50 cluster. Purple arrows are hydrogen bond donors, red spheres are hydrogen bond acceptors, yellow spheres are hydrophobic groups, and double blue circles are aromatic rings

Pharmacophore-based screening of library

To filter potential hits against the constructed pharmacophore model, we conducted virtual screening against several publicly available chemical libraries, namely CHEMBL30, ChemDIv, Chemspace, MCULE_ultimate, MolPort, and ZINC. The selected pharmacophore model was imported to the Pharmit server, where each library was screened using the default parameters.

The large-scale screening was carried out across six databases on the Pharmit webserver. As presented in Table 1, the total number of compounds within databases ranged from 1,456,000 (ChemDiv) to 126,471,000 (MCULE_ultimate). The total conformers generated spanned from 21,562,000 in ChemDiv to 378,880,000 in MCULE_ultimate. Of the six libraries, MolPort displayed the highest number of hits identified, totaling 385, while ChemDiv showed the least with only 5 hits. Interestingly, although MCULE_ultimate had the largest number of compounds and conformers, it did not have the highest number of hits. Instead, it ranked fourth with 31 hits.

Table 1 Screened databases on Pharmit webserver

Following the virtual screening process, 790 matches were found. Any duplicate ligands identified in this process were subsequently removed. Moreover, molecules infringing more than two of Lipinski’s rule of 5 were excluded, too. Upon completion of this process, a final count of 168 unique ligands remained.

Molecular docking analysis

Docking-based virtual screening has been extensively applied in the search for SARS-CoV-2 Mpro inhibitors, aiming to identify promising candidates from vast compound libraries. However, the predictive accuracy is often compromised by weak correlations between docking scores and experimental bioactivity, as well as challenges in distinguishing active from inactive compounds. To enhance the reliability of virtual screening for Mpro inhibitors, rigorous validation protocols, the use of diverse experimental datasets, and the integration of complementary computational methods are essential [46]. Hence, our study implemented advanced virtual screening (AVS), combining ensemble docking using eight different crystallographic structures and implementing two class-leading docking programs: Autodock Vina and ICM Pro. As previously demonstrated, the AVS enhances the trustworthiness of the virtual screening outcomes by employing pre-verified information, collective docking methods, and a consensus approach [24]. This methodology can augment the efficacy of the virtual screening process, especially in the initial identification of potential hit compounds, a critical aspect of real-world screening applications. The results from the two programs were used to calculate the consensus docking score.

Ensemble docking

The first step of the AVS method was ensemble docking. This computational technique uses multiple conformations of a target protein to predict the binding geometry and affinity of candidate ligands. Ensemble docking can account for the receptor flexibility and diversity often neglected in conventional docking methods that use a single static protein structure [46]. It has been shown to improve the accuracy and reliability of pose and affinity prediction in various drug discovery applications. Ensemble docking can also enhance the early enrichment power of the models and reduce the computational cost and false positive pose predictions by selecting the most relevant conformations of the target protein [47].

The following crystallographic structures of Mpro were used for ensemble docking: PDB ID: 7VLP, 7TE0, 7RFS, 7T43, 7DPU, 7DDC, 6WNP, and 5RHE. The selection of proteins was based on a clusterization of 342 MPro structures and representative selection from each cluster to include conformational diversity of the binding pocket's interacting amino acids to improve the ensemble docking performance [48]. To ensure the reliability and accuracy of our docking protocols and parameters, we conducted redocking experiments prior to the primary docking analysis. Native ligands were extracted from the eight crystallographic structures and subsequently redocked using both ICM-Pro and AutoDock Vina software. The RMSD between the native ligand poses and the best-scoring docking poses was calculated to assess the performance of each docking program (Table S2). ICM-Pro yielded a mean RMSD of 1.11 Å, with individual RMSD values ranging from 0.62 Å for the structure 7RFS to 2.59 Å for 7DDC. In comparison, AutoDock Vina produced a mean RMSD of 1.28 Å, with the lowest RMSD of 0.62 Å observed for 5RHE and the highest RMSD of 2.62 Å for 7T43. These results demonstrate that both docking programs successfully reproduced ligand poses that closely resemble their native conformations. The consistent performance across multiple structures validates the robustness of our docking protocols and supports the subsequent docking analyses conducted in this study.

A ranking-based scoring system was employed to evaluate the docking performance of each ligand across the ensemble of crystallographic structures. For each structure, ligands were ranked based on their docking scores, with the highest-scoring ligand assigned a rank of 1 and the lowest-scoring ligand assigned a rank of 168. The ranks for each ligand were then summed across all eight structures, resulting in an overall ensemble score for each ligand. Ligands with lower ensemble scores were considered better performers, as they consistently achieved higher docking scores across the panel of crystallographic structures (Table S3).

Consensus docking

In the consensus-scoring approach, scores for each compound obtained from the individual software for the same representative protein structure were combined using different docking methods. Consensus docking involves running multiple docking simulations using different algorithms and scoring functions and combining the results to obtain a consensus prediction. By leveraging the strengths of different docking methods, consensus docking aims to improve the accuracy and reliability of binding mode predictions, thereby aiding in the design of new drugs [49, 50].

For this study, consensus docking was performed using the ensemble docking results obtained from Autodock Vina and ICM Pro. Since these docking programs utilize distinct scoring functions, score normalization was required prior to combining the results. Each compound was ranked based on its docking scores in each software. By integrating the individual rankings from Autodock Vina and ICM Pro, we established a comprehensive and more reliable ranking of the compounds. Specifically, after performing ensemble docking on the eight structures, the docking results from the two programs were combined using consensus scoring. The best docking scores for each compound from both programs were averaged to calculate an overall score, and the molecules were then ranked based on this average score (Table S4). As demonstrated in the table, the docking results for the 168 ligands from Autodock Vina and ICM Pro were filtered based on their standardized consensus scores (with lower scores indicating better binding affinity). Consequently, the top ligands with superior consensus rankings were selected for further analysis.

MM/PBSA

Quantifying binding free energy is a central concern in assessing ligand-receptor binding affinities. The MM/PBSA method balances computational resource requirements and prediction accuracy. It is one of the most widely adopted approaches in virtual screening applications, as corroborated by previous studies [40]. Based on the consensus docking score, the top 43 compounds were analyzed for binding free energy using MM/PBSA (Table S5). The 7RFS structure was chosen for the analysis based on our previous study, where it consistently demonstrated one of the best performances in virtual screening, both in consensus and ensemble docking [48]. The free energy scores ranged from −5.67 kcal/mol (worst) to −18.61 kcal/mol (best), with a mean value of −10.8 kcal/mol. Given the experimental analysis results, 30 compounds with the best results (mean value: −12.16 kcal/mol) were selected for more in-depth molecular dynamics simulation studies.

Molecular Dynamics

Molecular dynamics (MD) is a computational method widely employed in drug discovery to simulate the temporal evolution of atomic and molecular interactions at an atomistic level. It provides detailed insights into the dynamic behavior of biomolecules, including their conformational flexibility, interaction profiles, and thermodynamic properties, enabling a deeper understanding of the molecular mechanisms underlying biological processes and drug-target interactions [51]. MD simulations play a crucial role in understanding the mechanisms of drug-target interactions and predicting the binding affinity of small molecules to target proteins. By simulating the dynamic behavior of a drug molecule within the target protein's active site, valuable information can be gained about how drugs bind, their stability, and the conformational changes that occur upon binding [52].

After performing the MD analysis on the 30 compounds screened by MM/PBSA, the protein–ligand complexes were examined, and the most stable complexes were selected for further analysis (Fig S1S5). Based on the evidence in the literature [53], we assumed that the stability of complexes, specifically those maintaining stability for a minimum duration of 50 ns, may serve as a reliable indicator of their functional performance in a biological context. Ligands, when bound and retained within the protein's active site for an adequate length of time, could effectively influence and modify the protein's functionality. Accordingly, 16 compounds were selected for in vitro analysis. These compounds demonstrated the highest stability within the protein's active site, maintaining stable interactions for at least 50 ns based on RMSD analysis of the ligands against the protein. This selection criterion ensured that only the most promising candidates, based on their predicted functional performance in a biological context, were advanced for experimental validation.

ADME-Tox prediction

Understanding the absorption, distribution, metabolism, excretion, and toxicity (ADME-Tox) properties is fundamental to drug design and development, as these properties shape therapeutic compounds' pharmacokinetics, efficacy, and potential side effects [54]. Therefore, in addition to our MD analysis, we computationally evaluated the ADME-Tox profiles of the top 16 compounds using the SwissADME and ADMETlab 3.0 online tools. Specifically, we analyzed various physicochemical and biological attributes, including molecular weight (MW), topological polar surface area (TPSA), partition coefficient (LogP), number of rotatable bonds (RB), hydrogen bond acceptors (HBA), and hydrogen bond donors (HBD). These properties, along with the compounds' SMILES annotations and molecular structures, are presented in Table 2. Furthermore, we calculated human intestinal absorption (HIA), blood–brain barrier penetration (BBB-P), P-glycoprotein (P-gp) substrate potential, and cytochrome P450 3A4 (CYP3A4) inhibition profiles (Table S6).

Table 2 ADME properties of top 16 compounds

The results demonstrate that most of the compounds exhibit characteristics typical of drug-like molecules, such as appropriate molecular weights, an optimal balance of lipophilicity and polarity, and suitable numbers of hydrogen bond donors and acceptors. Additionally, these compounds showed no calculated toxicity and good biological membrane penetration. The notable exception is compound ID31, which was predicted to inhibit CYP3A4, has only moderate intestinal absorption, and is a non-substrate for P-gp. Overall, these findings suggest that at least 15 of the top 16 compounds have a high potential to exhibit favorable pharmacokinetics and bioavailability, making them promising candidates for further drug development and testing.

In Vitro IC50 evaluation

Computational screening provides valuable insights into compounds' potential interactions and efficacy. Yet, it is essential to validate these predictions through in vitro experiments to ensure accuracy and reliability. In vitro tests offer tangible evidence of a compounds' activity and possible side effects. This critical step substantiates computational findings and further refines the selection process before advancing to costly and time-consuming in vivo studies or clinical trials [55].

We evaluated the inhibitory activity of the 16 small molecule compounds against the Mpro using a SARS-2 Mpro enzyme assay. The assay was performed with 25 nM of SARS-2 Mpro and a substrate concentration of 25 μM. The tested compounds were measured for their half-maximal inhibitory concentration (IC50) values, which indicate the concentration of the compound required to inhibit 50% of the Mpro activity.

Seven of the 16 compounds tested (ID11, ID20, ID35, ID18, ID32, ID41, and ID14) exhibited IC50 values below 100 μM (Fig. 2), suggesting a potential inhibitory effect on Mpro activity. Out of them, the most potent compounds were ID11 and ID35, with IC50 values of 54.39 μM and 59.39 μM, correspondingly. The remaining nine compounds demonstrated IC50 values greater than 100 μM, indicating limited or no inhibitory effects on Mpro activity. Although our compounds are considerably less potent than the reference (0.013 μM), they may still hold promise for further optimization and development as potential antiviral agents targeting SARS-CoV-2 Mpro. Nevertheless, due to financial constraints, only the 16 out of 30 compounds were subjected to in vitro testing; therefore, we invite other researchers to evaluate the remaining 14 compounds to further explore their potential as Mpro inhibitors.

Fig. 2figure 2

In Vitro IC50 Evaluation of top 16 compounds. REF—Reference

Long molecular dynamics analysis of ID11 and ID35

Long MD simulations, particularly those over extended timescales like 200 ns or more, are vital in drug discovery for understanding protein-drug and protein–ligand interactions. These extended simulations are crucial for capturing the full range of protein motions and conformational changes, providing a comprehensive view of how drugs interact with protein targets. Shorter simulations may miss important conformations and dynamics, leading to less accurate drug design [56].

Accordingly, long MD simulations of 500 ns duration were performed on two compounds with the most promising IC50 values (ID11 and ID35). These simulations aimed to understand better the interactions between the selected compounds and the Mpro. Post-MD analyses calculated parameters such as binding energy, minimum atomic distance, and the number of atomic contacts within the protein–ligand complex (Fig. 3). The binding energy of a protein–ligand complex is a critical determinant of the interaction strength. It is typically estimated by subtracting the aggregate energies of the individual, unbound protein, and ligand entities from the total energy of the bound complex. The minimum atomic distance measures the shortest spatial separation between any pair of atoms or atomic groups within the system. For instance, within the context of protein–ligand interactions, this parameter signifies the smallest distance between any atom within the protein and any atom in the ligand. A smaller minimum distance often implies a more potent or direct interaction. The number of contacts, defined as the total count of atom pairs within a specified cutoff distance, measures the interaction interface between the protein and the ligand. In this study, atom pairs (each pair consisting of an atom from the protein and one from the ligand) closer to a predefined cutoff distance of 0.6 nm were considered "in contact." This metric provides insights into the extent of the interface between the protein and the ligand during their interaction since, up to this distance, various bonds can form, including hydrogen bonds, ionic interactions, van der Waals forces, hydrophobic interactions, π-π stacking, and cation-π interactions, which collectively contribute to the specificity and stability of the complex.

Fig. 3figure 3

The 500 ns MD analysis of ID11 and ID35. RMSD: Root-mean-square deviation of ID11 and ID35. Number of Contacts: Number of contacts between the ligand and the protein. Minimum Distance: Minimum distance between the ligand and the protein. Energies: Interaction energy profiles of ID11 and ID35

As demonstrated in Fig. 3, the analysis of ID11 shows a consistent trend in complex-ligand interaction energy, number of contacts, and minimum distance over the MD simulation. Specifically, the interaction energy decreases, and the number of contacts increases, which indicates strong binding interactions and stability of the complex. Despite some fluctuations in RMSD, the overall pattern suggests that ID11 maintains stable interactions within the active site, aligning well with the in vitro results. For ID35, similar patterns were observed, albeit to a lesser extent. The decrease in interaction energy and increase in the number of contacts are present, though not as pronounced, and the minimum distance remains relatively stable throughout the simulation. Altogether, these observations collectively support the strong interactions between the compounds and Mpro during the simulation, indicating potential stability and efficacy despite the observed RMSD variations. Furthermore, the stabilities of both complexes were confirmed by the visual inspection of the simulation videos, where it is clearly visible that both ID11 and ID35 remain stable in the binding pocket throughout the 500 ns simulations, despite some minor fluctuations (Supp. Videos S1 and S2).

To comprehensively evaluate the interactions of the two ligands with the binding site amino acids, we conducted a detailed analysis to elucidate the molecular interactions underlying their observed bioactivity. MD trajectories were clustered based on the RMSD values of the conformational frames, resulting in the identification of nine distinct clusters for each ligand. From the largest cluster, the pose with the highest average interaction energy was selected for further examination. This pose was then used to generate a detailed 2D interaction map of the ligands within the binding pocket (Fig. 4), highlighting the key residues involved in binding and their specific interactions.

Fig. 4figure 4

3D representations and 2D interaction maps of the most frequent poses of ID11 and ID35 MD simulations. A and B the 3D representation of ID11 and ID35 compounds and the Mpro binding pocket, respectively. C and D 2D interaction maps of ID11 and ID35 with the amino acids of the binding pocket

The interaction analysis revealed that the ligands forms hydrogen bonds with His163 and Asn142, as well as extensive hydrophobic interactions with His163, Asn142, His41, Met49, Phe140, Leu141, Ser144, Cys145, Glu166, and Gln189. These results align with established findings in the literature, particularly regarding the importance of Cys145 and His41, which constitute the catalytic dyad central to the enzymatic activity of Mpro. This detailed interaction profile provides a deeper understanding of the structural basis for ligand binding and its potential contributions to the observed inhibitory activity [16].

The results further confirm our assumption that ID35 and ID11 directly affect the activity of Mpro by interacting with the essential amino acids in the binding pocket. Therefore, these two compounds can serve as important milestones for further optimizations and, subsequently, in finding potent SARS-COV2 medicines.

留言 (0)

沒有登入
gif