Structure-based pharmacophore modeling 2. Developing a novel framework for structure-based pharmacophore model generation and selection

G protein-coupled receptors (GPCR) are a superfamily of membrane proteins that transmit extracellular signals to intracellular effectors, typically through the binding of an extracellular ligand. These receptors play a role in many physiological pathways (such as blood pressure and immune response regulation), and disruption of their signaling can lead to conditions such as asthma, ulcers, and hypertension [1]. Consequently, GPCR are drug targets of immense interest, with approximately 35% of FDA-approved drugs acting upon these receptors [2]. Though GPCR have proven to be therapeutically important targets, identification of ligands for these receptors (a critical first step in drug discovery) faces a multitude of obstacles. For example, a majority of the known “druggable” GPCR are yet to be targeted by currently approved drugs [3], implying that novel methods of exploiting the therapeutic potential of these understudied GPCR are necessary. Furthermore, ligand discovery for GPCR is often impeded by a lack of knowledge concerning ligand activity and receptor structure. With regard to ligand activity, many GPCR, known as orphan receptors [4], lack known endogenous ligands, hindering exploration of a receptor's function and potential signaling pathways. Many of these orphan receptors also lack synthetic ligands, further obscuring their biochemical and physiological roles. With regard to receptor structure, only 105 of the over 800 known GPCR in the human genome possess experimentally resolved structures in the Protein Data Bank as of October 24, 2022 [5,6], leading many drug discovery workflows to rely on modeled structures. Therefore, new methods of ligand elucidation for understudied GPCR targets are necessary, regardless of whether a three-dimensional structure of the target has been experimentally determined.

As an alternative to costly and time-consuming high-throughput random screening, virtual screening is often employed in GPCR ligand identification workflows to select subsets of screening candidates from large compound libraries. During the virtual screening process, pharmacophore models (spatial arrangements of chemical features capable of making interactions thought to be essential for receptor activity) are frequently used as templates to identify prospective ligands, effectively reducing the number of compounds considered for experimental screening. Pharmacophore models are typically constructed by extracting structural commonalities from sets of known ligands for a target, and are thus termed ligand-based pharmacophore models [7]. While these ligand-based pharmacophore models have exhibited success in prior studies [7], many GPCR lack sufficient numbers of known ligands to make this approach effective for ligand discovery. Alternatively, structure-based pharmacophore models can be established by probing possible interaction points with a three-dimensional structure of a macromolecular target to establish a collection of features thought to be necessary for biological activity [7]. Unlike ligand-based pharmacophore modeling, the only prerequisite for structure-based pharmacophore modeling is a target's three-dimensional structure, whether experimentally determined or modeled. Advances in GPCR structure determination by experimental [8] and modeling [9] methods have also increased the number of publicly available receptor structures, further increasing the applicability of a structure-based pharmacophore modeling workflow to GPCR ligand discovery.

Although past structure-based pharmacophore modeling studies have been successful in identifying active ligands for various targets [10,11,12], these studies often fail to consider cases where a target does not possess known ligands (e.g. orphan GPCR) or an experimentally determined structure. Thus, the work discussed herein describes a method of structure-based pharmacophore model generation that is applicable to any GPCR structure, whether experimentally determined or modeled. Furthermore, a priori knowledge of active ligands is not required, allowing for a truly structure-based method of pharmacophore model generation. Pharmacophore models were generated in experimentally determined structures, as well as homology models generated with our previously benchmarked GPCR modeling workflow [13,14,15], allowing for the assessment of pharmacophore search performance starting from a wider range of structure sources.

As described in the first paper in this two-paper series, our structure-based pharmacophore modeling workflow (Fig. 1) begins with output from a Multiple Copy Simultaneous Search (MCSS), which randomly places numerous copies of varied functional group fragments into a receptor's active site and then energetically minimizes each independently of the others to determine energetically optimal positions for each fragment [16]. The method described here differs from that in the companion paper [17] through application of a “score-based” fragment selection method prior to pharmacophore model generation. In this work, each iteration of pharmacophore model generation considers N+1 fragments placed with MCSS (starting with N = 0) that are first ranked using fragment-receptor interaction scoring and are then subjected to automated fragment selection based on distance cutoffs intended to emulate the placement and end-to-end distances of ligands that typically bind GPCR. This loop of sequentially importing score-sorted fragments and retaining/removing fragments from consideration based on distances continues until the pharmacophore model possesses 7 features, at which point it is considered complete.
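The selection loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the fragment dictionary layout, the function name `select_fragments`, the assumption that lower scores are better, and both distance cutoffs are hypothetical placeholders, and a single pairwise distance window stands in for the placement and end-to-end distance criteria mentioned in the text.

```python
from math import dist


def select_fragments(fragments, max_features=7, min_separation=2.0, max_span=15.0):
    """Greedy, score-ordered fragment selection with pairwise distance cutoffs.

    Assumptions (not taken from the paper): lower scores are better, and the
    two cutoffs (in angstroms) are illustrative placeholder values.
    """
    ranked = sorted(fragments, key=lambda frag: frag["score"])
    selected = []
    for candidate in ranked:  # each pass considers one more score-sorted fragment
        distances = [dist(candidate["xyz"], kept["xyz"]) for kept in selected]
        # Retain the fragment only if it is neither too close to nor too far
        # from every fragment already retained.
        if all(min_separation <= d <= max_span for d in distances):
            selected.append(candidate)
        if len(selected) == max_features:
            break  # the model is treated as complete at 7 features
    return selected


# Toy input: ten fragments, already listed from best to worst score.
fragments = [
    {"name": "f0", "score": -9.0, "xyz": (0.0, 0.0, 0.0)},
    {"name": "f1", "score": -8.5, "xyz": (0.5, 0.0, 0.0)},   # too close to f0
    {"name": "f2", "score": -8.0, "xyz": (3.0, 0.0, 0.0)},
    {"name": "f3", "score": -7.5, "xyz": (30.0, 0.0, 0.0)},  # too far from f0
    {"name": "f4", "score": -7.0, "xyz": (0.0, 3.0, 0.0)},
    {"name": "f5", "score": -6.5, "xyz": (0.0, 0.0, 3.0)},
    {"name": "f6", "score": -6.0, "xyz": (3.0, 3.0, 0.0)},
    {"name": "f7", "score": -5.5, "xyz": (3.0, 0.0, 3.0)},
    {"name": "f8", "score": -5.0, "xyz": (0.0, 3.0, 3.0)},
    {"name": "f9", "score": -4.5, "xyz": (3.0, 3.0, 3.0)},   # never reached
]
selected_names = [frag["name"] for frag in select_fragments(fragments)]
```

In this toy input, `f1` and `f3` fail the distance window and `f9` is never examined because the seventh feature has already been accepted.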

Pharmacophore models were generated in experimentally determined and modeled structures of 13 target GPCR with known active ligands. While known ligands are not a prerequisite for score-based pharmacophore model generation, here they allowed for the calculation of the enrichment factor (EF) and goodness-of-hit (GH) scoring metrics to determine pharmacophore model performance. The first metric, EF, describes how many fold better a given pharmacophore model is at selecting active compounds when compared to random selection [18]. The second metric, GH, determines how well a pharmacophore model prioritizes a high yield of actives and a low false-negative rate when searching a compound database [18]. Though both scoring metrics are useful, we mainly focus on the EF metric since it is the most relevant to our lab's experimental work.
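For orientation, the two metrics can be written out as a short sketch. The formulas below are the commonly cited forms of the enrichment factor and the Güner-Henry goodness-of-hit score; the exact definitions used in [18] may differ in detail, and the function and argument names are illustrative. Here `hits_active` is the number of actives among the hits, `hits_total` the number of compounds returned by the search, `actives_total` the number of actives in the database, and `database_total` the database size.

```python
def enrichment_factor(hits_active, hits_total, actives_total, database_total):
    """EF = (Ha / Ht) / (A / D): hit-list active rate relative to the database rate."""
    if hits_total == 0:
        return 0.0
    return (hits_active / hits_total) / (actives_total / database_total)


def goodness_of_hit(hits_active, hits_total, actives_total, database_total):
    """GH = [Ha (3A + Ht) / (4 Ht A)] * [1 - (Ht - Ha) / (D - A)]."""
    if hits_total == 0:
        return 0.0
    # Weighted mix of yield (Ha / Ht, weight 3/4) and recall (Ha / A, weight 1/4).
    yield_and_recall = (
        hits_active * (3 * actives_total + hits_total)
        / (4 * hits_total * actives_total)
    )
    # Penalty that grows with the fraction of inactives returned as hits.
    inactive_penalty = 1 - (hits_total - hits_active) / (database_total - actives_total)
    return yield_and_recall * inactive_penalty


# Toy search: 1000 compounds, 50 actives, 100 hits of which 20 are active.
ef = enrichment_factor(20, 100, 50, 1000)
gh = goodness_of_hit(20, 100, 50, 1000)
```

With these toy numbers the hit list is 20% active against a 5% background rate, so EF is about 4, while GH comes out near 0.23 because most actives were missed and 80 inactives were returned.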

When using structure-based pharmacophore models to identify screening candidates, the selection of a single pharmacophore model or set of pharmacophore models to use as a search query is a critical step in the virtual screening process. While many publications detailing structure-based pharmacophore modeling protocols assess search performance in the context of protein targets with known ligands, pharmacophore model selection for targets with no known ligands is rarely discussed. Even if generated pharmacophore models identify active ligands for test case receptors (where active ligands are known), how does one select a pharmacophore model to apply to the majority of cases where a target lacks known ligands? For instance, structure-based pharmacophore modeling tools such as AutoPH4 [12] and Catalyst [19] demonstrate the ability to identify active compounds for protein targets with known ligands in artificial virtual screening workflows. However, the application of these structure-based pharmacophore modeling methods to apo protein structures often results in an overabundance of features in generated pharmacophore models, necessitating manual feature pruning that is likely to result in varied virtual screening performance when applied to GPCR with no known ligands [12,20]. For structure-based pharmacophore modeling tools that do implement automated methods of pharmacophore feature refinement (such as FLAP [21]), mixed results have been observed when they are applied to GPCR [22]. Thus, there is a clear need for a reliable method of selecting high-performing pharmacophore models for use in database searches to identify active compounds for GPCR with no known ligands.

Consequently, we explored two distinct methods of selecting score-based pharmacophore models that are applicable to any target. We first assessed whether a specific combination of variables explored during pharmacophore construction consistently produced high-performing pharmacophore models. This first method, herein referred to as progressive variable selection, takes advantage of the range of variables (MCSS fragment set, score type used for sorting, etc.) considered when generating our score-based pharmacophore models. Upon determining that progressive variable selection did not consistently lead to the identification of high-performing pharmacophore models, we applied machine learning methods to the pharmacophore models. Our method of pharmacophore model selection via machine learning is novel in this context and relies on an ensemble machine learning workflow to identify pharmacophore models likely to possess higher enrichment values when applied in a virtual screening context (Fig. 2). In the companion paper, thousands of unique pharmacophore models were generated via the annotation of randomly selected functional group fragments placed with MCSS [17]. These models were used to train an ensemble method of pharmacophore model classification. This ensemble classification utilizes a “cluster-then-predict” workflow that has exhibited success in prior studies [23,24]. The first algorithm used in our cluster-then-predict workflow, K-means clustering, is a method of unsupervised learning used to separate data into k clusters [25]. Instances assigned to each cluster possess similar attributes, allowing for the identification of groups that have not been explicitly labeled in a dataset [25]. The second algorithm in our cluster-then-predict workflow, logistic regression, is a method of binary classification that uses a set of independent variables (predictors) to predict a categorical dependent variable [26]. In practice, logistic regression is used to model the probability of a certain class or event existing, allowing for the classification of observations in a dataset into one of two labeled classes [27]. Consecutive implementation of K-means clustering and logistic regression produced binary classification models capable of accurately identifying pharmacophore models likely to possess higher enrichment values. Since pharmacophore models generated for targets that lack known ligands cannot be scored with the EF metric, using logistic regression to predict score-based pharmacophore model enrichment class based on features of the pharmacophore models allowed for the identification of useful pharmacophore models even when active ligands were not known for a target.
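The cluster-then-predict idea can be sketched without external dependencies as follows. This is an illustration under stated assumptions, not the authors' pipeline: the descriptor vectors, the deterministic farthest-first seeding of K-means, the per-cluster centering of descriptors, and the gradient-descent settings are all choices made for this example, and in practice an established machine learning library would normally be used instead. Each row of `X` stands for a hypothetical descriptor vector of one pharmacophore model, and `y` holds its enrichment class (1 for higher enrichment, 0 otherwise).

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with deterministic farthest-first seeding (an assumption)."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]


def probability(weights, x):
    """Logistic model output; the last weight is the bias term."""
    z = sum(w * v for w, v in zip(weights, x)) + weights[-1]
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))  # clamp avoids overflow


def fit_logistic(X, y, learning_rate=0.1, epochs=500):
    """Batch gradient descent on the mean log loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, target in zip(X, y):
            error = probability(weights, x) - target
            for j, v in enumerate(list(x) + [1.0]):
                grad[j] += error * v
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad)]
    return weights


def center(x, centroid):
    """Express a descriptor vector relative to its cluster centroid."""
    return tuple(v - m for v, m in zip(x, centroid))


def fit_cluster_then_predict(X, y, k):
    """Cluster the descriptors, then fit one logistic model per cluster."""
    centroids, labels = kmeans(X, k)
    models = {}
    for c in range(k):
        rows = [(center(x, centroids[c]), t) for x, t, lab in zip(X, y, labels) if lab == c]
        if rows:
            models[c] = fit_logistic([r[0] for r in rows], [r[1] for r in rows])
    return centroids, models


def predict_class(centroids, models, x):
    """Route `x` to the nearest cluster that has a model, then classify it."""
    c = min(models, key=lambda i: math.dist(x, centroids[i]))
    return int(probability(models[c], center(x, centroids[c])) >= 0.5)


# Toy data: two groups whose class depends on a different descriptor in each.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = [0, 1, 0, 1, 0, 0, 1, 1]
centroids, models = fit_cluster_then_predict(X, y, k=2)
training_predictions = [predict_class(centroids, models, x) for x in X]
```

The toy data are built so that the class depends on the second descriptor in one group and on the first descriptor in the other, which is the kind of situation where fitting a separate classifier per cluster can help relative to a single global model.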

Ultimately, the goal of this research is to develop a method of pharmacophore model generation that can use an experimentally determined or modeled structure of any GPCR target as input, regardless of whether active ligands are known. Score-based pharmacophore models predicted to result in higher enrichment values with this workflow can be used to search databases of commercially available compounds, allowing for the identification of candidate ligands for the many orphan or understudied GPCR. While we exclusively discuss the applications of this work in the context of GPCR, this method of pharmacophore model generation can in principle be applied to any biological target, after appropriate training or validation of the cluster-then-predict classifiers. Overall, this work demonstrates the ability of our score-based pharmacophore modeling and binary classification workflow to generate and accurately select pharmacophore models predicted to result in higher enrichment values in both experimentally determined structures (12 of 13 cases) and homology models (9 of 13 cases). Furthermore, classification of score-based pharmacophore models generated in either structure type with our cluster-then-predict workflow resulted in accurate classification of an average of 82% of all pharmacophore models predicted to result in higher enrichment values, indicating that this workflow identified high proportions of higher enrichment pharmacophore models without guidance from known active ligands.
