FastGrow: on-the-fly growing and its application to DYRK1A

The cross-growing set was regenerated on the PDBbind refined set v.2020 [37]. The procedure was the same as previously described [18], but resulted in more test cases due to the growth of the PDBbind. The previous cross-growing set contained 326 test cases and 155 unique fragments, whereas the new cross-growing set contained 425 test cases and 176 unique fragments.

The shape-based growing has improved slightly since the original publication [18] and recreated \(66.8\pm 4.5\%\) (95% confidence interval) of ligand poses at an RMSD less than 2Å. Restrained JAMDA optimization at \(70.8\pm 4.3\%\) showed minor improvements in RMSD (Fig. S1). As shown in the section below, the changes it performed seemed to be more beneficial to individual interaction geometries than to the pose as a whole.

The small position changes a restrained JAMDA optimization performed at the core atoms occasionally led to slightly higher RMSD. JAMDA was used in a restrained configuration that applied a quadratic penalty term to movement of the core atom. These are assumed to already be in the correct position, but a certain amount of movement was necessary for correct optimization. A full run of the cross-growing set took around half an hour without JAMDA optimization and three and a half hours with optimization, as can be seen in Table 1.

Table 1 Overview of the performance and runtime of FastGrow on the cross-growing setMaintaining interactions

Search points were generated for all 425 test cases of the cross-growing set. 252 of these test cases generated at least one interaction that was stable across both binding site conformations. 85 acceptor search points and 21 donor search points were found in the cross-growing set. There were almost five times as many hydrophobic search points (a total of 532), than there were donor and acceptor search points, even though only terminal hydrophobic groups and “fully hydrophobic” rings were included. This was largely due to the very permissive definition according to ChemScore [24] in comparison to the geometrically constrained hydrogen bonds. The largest contributors to the number of hydrophobic search points were hydrophobic ring systems.

In general, the pose generation performance of growings with search points was better than for those without (\(77.0\pm 5.2\%\) vs. \(64.3\pm 5.9\%\), see Sect. 4 of the Supplementary Information). It is, however, more meaningful to evaluate the interaction geometries themselves that growings with search points produce. Figure 1 shows what percentage of interactions could be maintained by the grown poses. The purely shape-based growing conserved \(62.4\pm 12.0\%\) of the hydrophobic interactions, but had significant difficulties maintaining hydrogen bond acceptor and donor interactions at \(30.6\pm 9.8\%\) and \(28.6\pm 19.4\%\), respectively. Shape-based growing had no way of telling where to place hydrogen bond interactions because it did not receive any search point information and the RVM descriptor does not contain any electrostatic information [18].

Fig. 1figure 1

Percentage of stable interactions that could be maintained in the cross-growing set by different methods. “Growing” represents purely shape-based growing. “Search Points” and “Optimized Search Points” represent growing with search points with or without subsequent restrained JAMDA optimization. The y-axis denotes what percentage of input interactions could be maintained during growing. The three most prominent interaction types are color coded according to the legend. The error bars are 95% confidence intervals

FastGrow using search points and FastGrow using restrained JAMDA optimization as well as search points both outperformed purely shape-based growing in conserving acceptors. FastGrow using search points had a success rate of \(55.3\pm 10.6\%\) and FastGrow using search points with optimization had a success rate of \(68.2\pm 9.9\%\). Furthermore, hydrogen bond donor functionalities, as the most geometrically constrained interactions in this set, seemed to profit the most from restrained JAMDA optimization after a search point query with \(76.2\pm 18.2\%\) of them being conserved. The upper bound to maintaining any interactions was the performance of the pose generation. Search points with JAMDA optimization showed a very similar ability to maintain interactions as the pose generation performance.

Water replacement

Water replacement test cases could be extracted for 162 cross-growing test cases. 81 cases generated one search point for the water replacement query. 58 cases generated two search points, which implied that in these test cases two waters were replaced by the ligand to be grown. The highest number of waters replaced by a ligand was four. Figure 2 shows a few examples of the ligand to be grown overlapped with water in the binding site used for growing. While some of these waters were replaced by hydrophobic substructures, some were in positions that the ligand subsequently occupied to make directed interactions with the binding site.

Fig. 2figure 2

Water replacement test cases that were extracted from the cross-growing set. The replaced waters are shown with the interactions that were used in the query to replace them. Yellow spheres are hydrophobic interactions and green cylinders hydrogen bond interactions. Binding site residues are light blue. The ligands are non-native to the binding site. The darker blue part of a ligand will be grown in a water replacement test case. a The ligand of 5ULT in 3GI6 (Gag-Pol polyprotein). b The ligand of 4AGO in 4AGM (Cellular tumor antigen p53). c The ligand of 6MA4 in 6MA5 (O-GlcNAc transferase). d The ligand of 6PGA in 6PG4 (WD repeat-containing protein 5). All 3D molecule images were made with the NGL viewer[38]

Although there was a difference in pose generation between purely shape-based growing with a \(64.2\pm 7.4\%\) success rate and water replacement with subsequent optimization at \(72.8\pm 6.8\%\), it was not significant. The somewhat small effect may in part be a symptom of the comparatively few test cases. Water replacement does not seem to confer a significant general advantage and may be more useful in specific systems. Figure 3 shows a side chain of a quinolinone-6-sulfonamide derivative crystallized in PDB code: 6MA4 being grown into 6MA5. The purely shape-based growing without water replacement did not know about any of the interactions the carboxylic acid could make with the backbone and turned it away. Using the waters in the binding site led to the side chain fully stretching out and interacting with the backbone.

Fig. 3figure 3

Growing with and without water replacement. The purely shape-based growing is to the left and the one using water replacement is to the right. The binding site is from PDB code: 6MA5. The ligand of 6MA4 in gray is aligned to the binding site and used as a reference for RMSD calculation. The grown ligands are in a darker blue

Ensemble flexibility

Ensemble test cases with at least two binding site conformations, not including the reference PDB, could be generated for 246 cross-growing test cases. Almost half of the ensemble test cases (120) contained five binding site conformations. 29 ensemble cases had only two binding site conformations. Except for five test cases, the average all atom RMSD of the ensemble binding sites to the SIENA query was between 0 and 1.5 Å distributed around a mean RMSD of 0.74 Å.

Automatically generated binding site ensembles did not show a strong statistical improvement of the pose generation performance (\(69.3\pm 5.7\%\) vs. \(75.2\pm 5.4\%\), see Sect. 6 of the Supplementary Information). It is well-known that ensembles tend to cause false positive hits in docking [39] and while many systems profited from ensembles, a similar amount generated poses with higher RMSDs.

When used appropriately, ensemble flexibility can make a significant difference [40]. Conformational changes of binding site residues may obstruct growing of a ligand bound in a different structure of that binding site. Figure 4 shows a shikimate precursor mimicking ligand from PDB code: 3N76 that was grown into 3N7A and an ensemble of 4B6O and 4KIU. FastGrow could not grow the phenolic substituent of 3N76 into the groove that was defined by 3N7A. It needed information about the flexibility of the loop and the movement of a conserved Tyrosine to calculate a successful pose with less than 2 Å RMSD.

Fig. 4figure 4

Growing in a binding site with a highly flexible loop [41]. To the left in purple is PDB code: 3N7A. To the right are 4KIU in orange and 4B6O in blue. All are structures of the 3-dehydroquinate dehydratase. The conserved TYR24 is in light blue. The ligand atoms in darker blue were grown by FastGrow in the 4KIU/4B6O ensemble, which was not possible in 3N7A alone. The reference ligand from 3N76 is in gray

Docking comparison

The pose re-prediction performance of FastGrow was compared to DOCK [20] on the cross-growing set. Only test cases where both methods could generate at least one pose were included in the statistics and only the top poses compared. Anchored docking could not generate poses for 12 test cases and FastGrow could not generate poses for 5 test cases. One of these test cases overlapped, so 16 (4%) of the 425 test cases were excluded. A discussion of why molecules failed can be found in Sect. 7 of the Supplementary Information.

Figure 5 shows the performance of DOCK cross-docking and anchored docking versus FastGrow growing and growing with subsequent restrained JAMDA optimization. Anchored docking with a success rate of \(54.3\pm 4.8\%\) outperformed cross-docking at \(33.3\pm 4.6\%\), which is to be expected. Anchored docking received more input information, i.e., the core of the cross-growing test case, which significantly reduced the degrees of freedom in the system and therefore the potential for error. It is nonetheless an important point to make that using all the input information available can significantly impact the correctness of a prediction.

Fig. 5figure 5

Performance comparison of DOCK in a “Cross Docking” and an “Anchored Docking” to FastGrow with or without subsequent restrained JAMDA optimization. The error bars are 95% confidence intervals

Both growing at \(67.5\pm 4.5\%\) and growing with optimization at \(71.4\pm 4.4\%\) significantly outperformed anchored docking. It was unexpected to see FastGrow outperforming anchored docking. Both systems received the same input information and had been validated for this task. The difference probably arose in how sensitive both systems were to steric clashes with the binding site. FastGrow has a comparatively high clash tolerance in the initial pose generation, which produced many poses a typical docking workflow would have rejected [18]. Restrained JAMDA optimization then resolved these clashes with minor geometry adjustments. The permissiveness of the pose generation may have been able to sample the sometimes uncomfortable fit of a non-native ligand better than a more clash sensitive system.

DYRK1A case study

DYRK1A is one member of the Dual Specificity Tyrosine-phosphorylation-regulated Kinases. Its expression pattern suggests a role in the central nervous system [42], which supports its implication in neurodegenerative disease [43] and Down’s Syndrome [44]. Furthermore, DYRK1A is suspected to be involved in some of the pathways that lead to an increased cancer risk for individuals with Down’s Syndrome [45]. Servier and Vernalis recently published a collection of novel and highly active inhibitors for DYRK1A that were produced in a collaborative FBDD campaign in Walmsley et al. [21] and Weber et al. [22].

Growing a ligand from a fragment

Walmsley et al. discovered fragment 1 (PDB code: 7A4R) as one of their micromolar hits (DYRK1A cKi 1.5 \(\mu\)M) from a fragment screening that was performed on DYRK1A using the Vernalis fragment library [21, 46, 47]. Toward the end of the publication they focused on compound 34 (PDB code: 7A5N), a nanomolar ligand (DYRK1A IC\(_\) 7 nM) that they described as “[...] a potent, in vivo-tolerated, selective inhibitor of DYRK1A kinase” [21]. Both fragment 1 and compound 34 can be seen in Fig. 6. Both fragment 1 and compound 34 exhibited canonical hinge binding [48] and compound 34 retained the central ring core of fragment 1. Compound 34 could therefore be grown from fragment 1 by exchanging the amine close to the hinge to a methyl, growing into the salt-bridge region, and using the vector defined by the carbonyl at the ring of fragment 1 to address the glycine loop. These three steps were performed with FastGrow and DOCK.

Fig. 6figure 6

Fragment 1 and compound 34 in a structure of DYRK1A. The structure and fragment 1 coordinates in gray are from PDB code: 7A4R and compound 34 in orange comes from PDB code: 7A5N and is aligned to the binding site using SIENA [27]. The cartoon for the residues GLU160 to ASP178 was removed for clarity as it is in Walmsley et al. [21]. All 3D molecule images were made with the NGL viewer [38]

Exchanging the amine of fragment 1 to a methyl was an important step towards the selectivity of compound 34. The uncommon position of the LEU241 carbonyl opened up space near the hinge, which was specific to DYRK1A [21]. Computationally, the switch from amine to methyl should be trivial. Figure 7 shows the poses that were generated by DOCK and FastGrow. Both the anchored docking approach using DOCK and FastGrow generated a realistic pose for fragment 1 with a methyl. Cross-docking with DOCK, which was not constrained by the position of the core atoms, generated a pose that drifted out of the pocket. Clearly, not using the available information of the core atoms led to a poorer quality pose. Although the change in RMSD is minor, several interaction geometries shifted into the unphysical range.

Fig. 7figure 7

Amine to methyl exchange of fragment 1. Fragment 1 from PDB code: 7A4R is in grey. To the left are poses that were generated by DOCK cross-docking in light green and anchored docking in yellow. To the right is the pose that was generated by FastGrow in a darker blue

A hydrogen position at the pyrrole substructure of the pyrrolopyrimidine could be used to grow into the salt-bridge region. The pyridine nitrogen of the aminopyridine to be placed there was expected to interact with LYS188, while the amine was expected to interact with GLU203. Figure 8 shows the poses generated by the three methods. Using an anchored docking or a cross-docking configuration led to significantly different poses in this case. The cross-docking generated a pose very close to the eventual orientation the aminopyridine has in compound 34 (i.e., PDB code: 7A5N). Despite receiving the coordinates of the full methylated fragment 1 as an input, the anchored docking optimized its pose out of the pocket. This could have been an overreaction of the underlying scoring function and optimization to minor clashes. Cross-docking did find a good pose, proving that both the scoring function and optimization could support one, however some strong effect perceived by the optimization led to a poorer pose for anchored docking.

All methods initially placed the amine of the aminopyridine away from GLU203. There are structures that support this as at least a reasonable position. Weber et al. modeled the ligand of PDB code: 7AJ2 with multiple conformations one of which has an aminopyridine that points away from GLU203. Most structures reported by Walmsley et al. and Weber et al., however, were modeled so that the aminopyridine pointed toward GLU203. We could achieve a pose more consistent with the other structures by using the position of the water HOH603 to estimate a reasonable position for the amine to interact with GLU203. HOH603 was then removed as usual in the growing process. We could also use the position of the amine in a different structure to achieve the same result. A search point with the correct type at that position guided FastGrow to place the amine near GLU203.

Fig. 8figure 8

Growing the aminopyridine into the salt-bridge region. The core of methylated fragment 1 that was used as an input to FastGrow and anchored docking is in grey. To the left are the poses that were generated by DOCK cross-docking in green and anchored docking in yellow. To the right are two poses produced by FastGrow in darker blue. The pose with the amine towards GLU203 was guided by placing a search point at the position of HOH603

The last growing step involved replacing the carbonyl in the now modified fragment 1 with a difluoro-benzylamine, which interacts with the glycine loop. Figure 9 shows that all methods agreed on this step. The anchored docking, cross-docking, and FastGrow all reproduced the conformation of the difluoro-benzylamine of compound 34 in 7A5N.

Fig. 9figure 9

Poses of compound 34 that were grown from fragment 1 in PDB code: 7A4R. In green and yellow are the DOCK cross-docking and anchored docking poses, respectively. The FastGrow pose is in a darker blue. The compound 34 reference structure aligned to 7A4R is in orange

Each of the three growing vectors considered above resulted in at least one series of compounds in the publication by Walmsley et al. [21]. While this case study was therefore reductive, it demonstrated the three methods: DOCK cross-docking and anchored docking, as well as FastGrow, in a practical application. Cross-docking generated good poses for the two larger modifications but exhibited unnecessary pose drift in the methylation. Anchored docking incorporated template information but seemed to have stability issues when confronted with clashes. FastGrow generated realistic poses for all three parts of the incremental growing. In one case the pose generation could be improved by using a crystallized water or external information to place a search point as guidance.

Fragment growing enrichment

A number of ligands from Walmsley et al. [21] and Weber et al. [22] had common cores that could be used to grow other ligands. The diaminopyridine moiety of the ligand in PDB code: 7AJW was used as the growing seed or core for the enrichment case study. Figure 10 shows the ligand of 7AJW with the diaminopyridine highlighted in green. Any fragment growing from this core would initially point directly to the hinge region.

Fig. 10figure 10

The fragment screening enrichment core generated from PDB code: 7AJW. The diaminopyridine many of the ligands in the data set have in common is in light green. The bond between the gray and the green substructures is cut and the green structure used for growing. The growing direction will therefore be towards the hinge region

Many ligands of DYRK1A, especially those from Weber et al. had a conserved diaminopyridine moiety. Filtering all diaminopyridine-containing ligands down to only the ones with affinities of less than 10 nM resulted in 77 ligands. These 77 ligands were fragmented at the bond to the diaminopyridine. The Rule of Three [35] properties of these fragments were inspected to detect outliers far outside of the property distribution. Nine fragments were discarded as outliers. There was a large group of fragments with exactly the same TPSA, which had to be downsampled to avoid biasing the screening. This lead to a collection of 40 highly active fragments. A comparison of properties before and after the filtering can be found in the Supplementary Information, Sect. 8.

The set of 40 known highly active single-digit nanomolar fragments was combined with a set of generic fragments assumed to be not as active. 2653 property-matched generic fragments were generated using the BioSolveIT fragment set [34]. The fragments were property matched using “Rule of Three” properties [35]. After property matching, sorting by molecular weight was chosen as a comparison baseline. Molecular weight retained some residual discriminative power despite property matching and was therefore chosen as a comparison baseline instead of an idealized null baseline. This is discussed further in Sect. 8 of the Supplementary Information. For evaluation we used the receiver operating characteristic (ROC) and the area under the curve (AUC). The ROC curves and AUCs of the three methods (FastGrow’s pose scoring, FastGrow with restrained JAMDA optimization and anchored docking with DOCK) can be seen in Fig. 11.

Fig. 11figure 11

ROC curves and AUCs of DOCK anchored docking, FastGrow with JAMDA, the FastGrow pose scoring function and sorting by descending molecular weight. The annotated errors on the AUCs describe a 95% confidence interval

All methods outperform the molecular weight baseline. The methods themselves all performed similarly and very well. Most surprising is that the FastGrow pose scoring function, which was never parametrized or evaluated for this purpose, performed almost on par with the other more sophisticated scoring functions. The FastGrow pose scoring function is made up of three terms: filled volume, number of close contacts, and clash [18]. Clash is the only term it shares in common with the other two scoring functions and therefore must be the driving force behind the general high performance. To substantiate this we repeated the experiment with all terms of the pose scoring function set to zero except for clash. Figure S7 shows that the pose scoring function with just clash performed comparably to the other scoring functions. 3D clash, and by extension shape complementarity, seemed to be a very discriminative property in this particular system, which leads to the high performance of all three of these methods.

Table 2 shows the time it takes for each of the three methods to generate poses for all fragments in the enrichment screening and score them. There was a factor five difference in speed between DOCK anchored docking and FastGrow with restrained JAMDA optimization. There was a factor 500 difference in speed between anchored docking and using FastGrow’s pose scoring function. Of the 2 s FastGrow and its pose scoring function spent screening, half was spent in the input-output operations of extracting 2693 fragments from the screening database and writing the hits. The hundred-fold increase in runtime between the simple FastGrow pose scoring function, which performed competitively in this case study, and the more sophisticated scoring functions was disproportional to the apparent gain in performance.

Table 2 Runtimes of fragment growing enrichment for the three methods

留言 (0)

沒有登入
gif