Integrating pharmacogenomics and cheminformatics with diverse disease phenotypes for cell type-guided drug discovery

A gene-centric approach to building cell type-specific gene-perturbation networks

The recent large-scale CMap powered by the L1000 assay (referred to as LINCS-CMap or CMap 2.0) [18] contains gene expression measurements across the transcriptome in response to thousands of small molecule perturbations (“perturbagens”). These measurements were taken in a wide array of cell lines and over varying numbers of experiments (“instances”) encompassing different doses, time points, and replicate numbers. Because of this multidimensionality and heterogeneity, the LINCS-CMap data offer unique opportunities and challenges for building global, cell type-specific gene-perturbagen networks for computational drug repositioning and discovery. However, efforts to build such networks while adequately addressing potential data biases remain limited. Notably, two recent approaches that aim to identify differential gene-perturbagen pairs, namely the Moderated Z-score (MODZ) [18] and the Characteristic Direction (CD) measure [35, 44], have sought to form a weighted consensus among experimental replicates in LINCS-CMap. As such, they can be regarded as “perturbagen-centric” approaches that can be extended to build cell type-specific gene-perturbagen networks.

To address the challenges in reliably defining cell type-specific gene-perturbation networks while exploiting the full extent of the next-generation CMap, we used LINCS-CMap Level 4 data. Level 4 data retain the information on individual measurements (i.e., instances), which may be for different doses, time points, or technical replicates, without collapsing them to a single value per perturbagen (Fig. 2A; see Additional File 1: Supplementary Note 1). We devised a gene-centric network-building strategy, which we call QUantile-based Instance Z-score Consensus (QUIZ-C), that identifies the perturbagens that (i) differentially and (ii) consistently modulate the expression of each gene (Methods). Briefly, for each gene, we assigned a “perturbagen z-score,” \(pZS\), to each perturbagen instance against the background of all perturbagen instances on that gene (Fig. 2B). The fraction of genes with few or no significant perturbagens affecting them, a condition that might bias the \(pZS\) values, ranged from 0.05% to 1.5% across all cell lines (Additional File 1: Tables S2-3, Additional File 1: Fig. S4; see Additional File 1: Supplementary Note 2). Most perturbagens in LINCS-CMap have multiple experimental instances (Additional File 1: Fig. S1), which introduces variability among the \(pZS\) values of a perturbagen, arising primarily from experimental replicates and, to a limited extent, from different doses and time points (Additional File 1: Fig. S5; see Additional File 1: Supplementary Note 1). To prevent spurious gene-perturbagen associations and capture high-confidence gene-perturbagen pairs, we required a perturbagen to have a sufficient number of differential instances before considering it to significantly affect a gene. To determine this number, we defined a flexible consensus threshold based on the \(pZS\) distributions (Fig. 2C).
Perturbagens that passed this flexible consensus threshold were considered to differentially affect the expression of the gene in question (Fig. 2C, D). Repeating this procedure for all genes and perturbagens in every cell line and aggregating all the differential gene-perturbagen links (Fig. 2D), we built global cell type-specific gene-perturbagen networks (Additional File 1: Fig. S6, Additional File 1: Tables S4-5; see Additional File 1: Supplementary Note 3 on the topological properties of these networks). To enable a direct comparison with our cell type-specific gene-perturbagen networks (henceforth referred to as QUIZ-C networks) in downstream analyses, we also built cell type-specific gene-perturbagen networks using the MODZ and CD approaches (Additional File 1: Supplementary Methods).
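The per-gene scoring and consensus steps described above can be sketched in Python. This is a minimal illustration, not the QUIZ-C implementation. It assumes that the Level 4 values for a single gene are held in a dictionary keyed by (perturbagen, instance) pairs, and it computes pZS against the background of all instances on that gene. The quantile-based flexible consensus threshold from the Methods is replaced here by two hypothetical fixed parameters, `z_cut` and `min_frac`.

```python
from collections import defaultdict
from statistics import mean, pstdev


def perturbagen_z_scores(expr):
    """Score every instance of ONE gene against all instances on that gene.

    expr: dict mapping (perturbagen, instance_id) -> Level 4 value for the gene.
    Returns a dict with the same keys mapping to each instance's pZS.
    """
    values = list(expr.values())
    mu, sigma = mean(values), pstdev(values)
    return {key: (value - mu) / sigma for key, value in expr.items()}


def consensus_perturbagens(pzs, z_cut=2.0, min_frac=0.6):
    """Keep perturbagens whose instances agree on a differential effect.

    Hypothetical stand-in for the paper's flexible threshold: a perturbagen is
    kept if at least `min_frac` of its instances have pZS >= z_cut (up, +1)
    or pZS <= -z_cut (down, -1). Because min_frac > 0.5, at most one direction
    can pass.
    """
    by_perturbagen = defaultdict(list)
    for (perturbagen, _instance), z in pzs.items():
        by_perturbagen[perturbagen].append(z)

    edges = {}
    for perturbagen, zs in by_perturbagen.items():
        up = sum(z >= z_cut for z in zs) / len(zs)
        down = sum(z <= -z_cut for z in zs) / len(zs)
        if up >= min_frac:
            edges[perturbagen] = +1
        elif down >= min_frac:
            edges[perturbagen] = -1
    return edges
```

To build one cell type-specific network, you would repeat these two calls for every gene and collect the resulting (perturbagen, gene, direction) triples.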

QUIZ-C networks reflect the biological diversity of LINCS-CMap cell lines

QUIZ-C networks include a large proportion of the small-molecule perturbations tested on the CMap cell lines, with 41 out of 70 cell type-specific networks having over 80% of the perturbagens tested in each respective cell line (Fig. 3A). We observed that the largest hub perturbagens (i.e., perturbagens modulating the expression of a large number of genes) are generally cell type-specific and are not shared by many cell lines, whereas perturbagens with fewer targets can be present in a large number of cell lines (Fig. 3B). By contrast, genes targeted by many drugs were observed in a large number of cell lines, while genes targeted by fewer drugs were observed in fewer cell lines (Pearson’s r = 0.62) (Fig. 3C). These findings on hub drugs and targets were recapitulated in CD and MODZ networks (Additional File 1: Figs. S7A-D), indicating that this feature is independent of the specific network-building approach and, rather, reflects the underlying biological differences between distinct cell lines and their transcriptional responses to small-molecule perturbations. Since the L1000 assay relies on the 978 landmark genes to infer the expression of the remaining genes, we tested the enrichment of landmark and inferred genes in our networks and compared their in-degree distributions (Additional File 1: Fig. S8; see Additional File 1: Supplementary Note 3). In 13 cell lines, landmark genes were statistically overrepresented (Additional File 1: Table S6). In terms of individual gene-perturbagen interactions, we found little overlap between pairs of cell types (Additional File 1: Fig. S9; see Additional File 1: Supplementary Note 3). In terms of the direction of effect (i.e., up- or down-regulation of the gene by the perturbagen) of the overlapping edges across cell lines, we observed a high level of agreement between cell lines in QUIZ-C networks, which was supported by MODZ networks and, to some degree, by CD networks (Additional File 1: Fig. S9F). 
This observation suggests that gene-perturbagen pairs that are conserved across cell lines tend to have the same directional effect.
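The gene-level analysis in Fig. 3C can be sketched as follows: for each gene, count the cell-line networks that contain it, average its normalized in-degree, and correlate the two quantities. Two details are assumptions of this sketch, not the paper's definitions:

- In-degree is normalized by the number of perturbagens in the network.
- The mean is taken over the cell lines in which the gene is present.

The sketch is not expected to reproduce the reported r = 0.62.

```python
from collections import Counter, defaultdict
from math import sqrt


def gene_presence_and_degree(networks):
    """networks: dict cell line -> set of (perturbagen, gene) edges.

    Returns gene -> (number of cell lines containing the gene,
    mean normalized in-degree over those cell lines).
    Normalization by the network's perturbagen count is an assumption.
    """
    n_lines = Counter()
    degree_sum = defaultdict(float)
    for edges in networks.values():
        n_perturbagens = len({p for p, _ in edges})
        in_degree = Counter(gene for _, gene in edges)
        for gene, k in in_degree.items():
            n_lines[gene] += 1
            degree_sum[gene] += k / n_perturbagens
    return {g: (n_lines[g], degree_sum[g] / n_lines[g]) for g in n_lines}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```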

Fig. 3

Properties of QUIZ-C networks. A The proportion of LINCS-CMap drugs tested in each cell line that was in the QUIZ-C network of that cell line. B The number of cell type-specific QUIZ-C networks a given drug appears in, as a function of the mean normalized out-degree of each drug across all QUIZ-C networks. Every circle represents a drug, and circle size is proportional to the standard deviation of the normalized out-degree. C The number of cell type-specific QUIZ-C networks a given gene appears in, as a function of the mean normalized in-degree of each gene across all QUIZ-C networks. Every circle represents a gene, and the circle size is proportional to the standard deviation of the normalized in-degree. D Reducibility of QUIZ-C networks. Left panel shows the relative entropy quality function \(q(\cdot)\) of each hierarchical clustering branch, with the maximum value of \(q(\cdot)\) corresponding to the optimal configuration of the aggregation of layers. Right panel shows the hierarchical clustering dendrogram and the optimal clustering threshold. Cell lines belonging to the same cluster are in the same color, and clusters consisting of only one cell line are shown in light grey

To supplement our findings on the structural uniqueness of QUIZ-C networks, we used a recent method that leverages information-theoretic measures to quantify the degree of information redundancy (or “reducibility”) in multilayer networks (Additional File 1: Supplementary Methods) [36]. QUIZ-C, MODZ, and CD networks can each be represented as a multilayer network in which every layer is a cell type-specific network. Within this framework, QUIZ-C networks were largely non-redundant, with a reducibility metric \(\chi\) of 0.32 (Fig. 3D), whereas MODZ and CD were highly reducible, with reducibility values of 0.93 and 0.73, respectively (Additional File 1: Fig. S10). This finding suggests that, while most cell lines in MODZ and CD can be grouped together in terms of their topological similarity, the same cannot be said about QUIZ-C networks. We verified the genetic and ontological diversity of the cell lines involved by quantifying their short tandem repeat (STR) profile similarity (Additional File 1: Fig. S11) and Cell Line Ontology (CLO) similarity (Additional File 1: Fig. S12) (Additional File 1: Supplementary Methods). Given this diversity, the finding suggests that QUIZ-C networks remain sensitive to, and thereby reflect, differences among cell lines, a property that has important implications for disease-specific (rather than endophenotype-specific) drug development.

Overall, these results suggest that QUIZ-C networks have sufficient representation of drugs tested in CMap, are distinct in terms of their strongest perturbagens, capture non-overlapping sets of perturbagen-gene pairs, and contain non-redundant information between cell types. These findings support the cell type-specificity of the QUIZ-C networks and suggest that they can serve as a good substrate on which to carry out in silico drug prediction.

Drug repurposing and discovery potential of QUIZ-C networks

Using the Drug Repurposing Hub (DRH) [42], we examined the mechanisms of action and clinical phases of drugs in LINCS-CMap-derived networks (Additional File 1: Supplementary Methods). More than 50% of drugs in 50 out of 70 QUIZ-C networks had a known mechanism of action, and these percentages were similar in MODZ and CD networks (Additional File 1: Fig. S13A). Drugs with a known mechanism of action were over-represented in QUIZ-C networks, except in the nine “core” cell lines used in the Touchstone dataset [18] (Fig. 4A). By contrast, drugs with a known mechanism of action were under-represented in the MODZ and CD networks (Additional File 1: Figs. S13B-C). These results indicate that the “core” cell lines can be mined for novel therapeutic agents, whereas the other cell lines may be used to identify drugs with repurposing potential. In terms of clinical development, we found a balanced representation of experimental drugs (drugs in the preclinical phase), investigational drugs (drugs in clinical development phases 1 through 3), and approved (launched) drugs (Fig. 4B). This distribution was also observed in MODZ and CD networks (Additional File 1: Fig. S14). This suggests that CMap-based gene-perturbation networks offer both drug repurposing and novel drug discovery opportunities, with drugs represented across all stages of drug development.
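The over- and under-representation calls in Fig. 4A rest on an odds ratio with a 95% confidence interval. Below is a minimal sketch using Woolf's log-based interval. The layout of the 2×2 table (with drugs tested in the cell line as the background) and the choice of Woolf's method are assumptions, not taken from the Methods. Zero cells would need a continuity correction, which is not handled here.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with Woolf's 95% CI for a 2x2 table (assumed layout):

    a = drugs in the network with MoA annotation
    b = drugs in the network without MoA annotation
    c = drugs tested in the cell line but absent from the network, with MoA
    d = drugs tested in the cell line but absent from the network, without MoA
    All four counts must be > 0.
    """
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return (odds_ratio,
            exp(log(odds_ratio) - z * se),
            exp(log(odds_ratio) + z * se))


def classify(lower, upper):
    """Label enrichment from whether the CI excludes 1."""
    if lower > 1:
        return "over-represented"
    if upper < 1:
        return "under-represented"
    return "neither"
```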

Fig. 4

A The odds ratio of enrichment of drugs in each QUIZ-C network that have MoA information in the Drug Repurposing Hub. Error bars indicate 95% confidence intervals. Blue text indicates significantly under-represented, red text indicates significantly over-represented, and black text indicates neither over- nor under-represented. B Clinical development phase breakdown of the drugs in each QUIZ-C network. C The topological drug specificity of QUIZ-C, MODZ, and CD networks. D The odds ratio of enrichment of known drug-target interactions in each QUIZ-C network. Error bars indicate 95% confidence intervals, and cell lines in red indicate a significant enrichment

Finally, combining the number of cell lines and out-degree, we defined “topological drug specificity” as \(1/(N_p \cdot \langle k_p^{\mathrm{out}} \rangle)\), where \(N_p\) is the number of cell lines in which perturbagen \(p\) is present and \(\langle k_p^{\mathrm{out}} \rangle\) is the mean normalized out-degree of the perturbagen over cell lines. Hence, drugs that are exclusive to a few cell types and have few target genes have high topological specificity. QUIZ-C networks had significantly higher topological drug specificity compared to both MODZ and CD networks (two-sided Mann–Whitney U test p-value < \(10^{-12}\) for both) (Fig. 4C).
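Topological drug specificity is the reciprocal of the number of cell lines containing a perturbagen multiplied by its mean normalized out-degree. Here is a minimal sketch. It assumes each network is given as a mapping from perturbagen to its already-normalized out-degree, and that the mean is taken over the cell lines in which the perturbagen appears. Under that assumption the product reduces to the sum of the normalized out-degrees, so the score is simply its reciprocal.

```python
from collections import defaultdict


def topological_specificity(out_degrees):
    """out_degrees: dict cell line -> dict perturbagen -> normalized out-degree.

    Returns perturbagen -> 1 / (N_p * mean normalized out-degree), where N_p is
    the number of cell-line networks containing the perturbagen and the mean
    is taken over those networks (an assumption of this sketch).
    """
    per_drug = defaultdict(list)
    for degrees in out_degrees.values():
        for perturbagen, k in degrees.items():
            per_drug[perturbagen].append(k)
    return {
        p: 1.0 / (len(ks) * (sum(ks) / len(ks)))
        for p, ks in per_drug.items()
    }
```

A drug found in one cell line with a small normalized out-degree scores highest. A drug spread across many cell lines, or hitting many genes, scores lowest.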

QUIZ-C networks have a higher enrichment of drug targets than other CMap-derived networks

We hypothesized that the degree of enrichment of known drug-target interactions would be an additional indicator of the utility of these networks for predicting novel and repurposed drugs. We used known drug-target interactions from the literature to determine the enrichment of known interactions among the gene-perturbagen edges in CMap-derived networks. To maximize coverage of known drug-target interactions, we combined the DRH and the Drug-Gene Interaction Database (DGIdb) [43] (Additional File 1: Supplementary Methods). The addition of DGIdb data increased overall coverage (Additional File 1: Fig. S15A). Forty-four percent (31/70) of QUIZ-C cell type-specific networks had a significant enrichment (two-sided Fisher’s exact p-value < 0.05) of known drug-target interactions (Fig. 4D) compared to 24% (17/70) and 18% (13/70) for MODZ and CD networks, respectively (Additional File 1: Fig. S15B-C). In terms of the difference of log odds-ratios, 73% and 72% of cell lines had a higher enrichment of literature-evidenced drug-target interactions in QUIZ-C compared to MODZ and CD, respectively (Additional File 1: Fig. S15D-E). Together, these results suggest the potential utility of QUIZ-C networks in drug discovery and repurposing.

Using pathophenotypic profiles to determine the congruity between input signatures and perturbagens

The discovery and repositioning of potentially therapeutic drugs using CMap methods have largely focused on the direct comparison of input gene signatures (which can be associated with a disease, treatment, or any other biological context) with perturbagen-induced gene signatures [9]. As such, this approach provides a local view of the correlation between the two gene sets. This view is limited to the biological context of the input gene signature and is, therefore, agnostic to the larger network of gene-perturbagen interactions in other disease contexts. Gene-perturbagen networks derived from LINCS-CMap data, on the other hand, enable us to investigate the global effect of perturbagens on genes that are implicated in disease etiologies other than those of the input genes. Here, we exploited the fact that QUIZ-C and other CMap-derived networks can be used to study simultaneously the transcriptional response of multiple sets of disease-associated genes to a perturbagen in a cell type-dependent manner. In particular, we hypothesized that using a comprehensive network of human disease signatures (pathophenotypes) as a common basis for comparing perturbagens and input signatures would facilitate, and provide additional insights into, the prioritization of novel and repositioned drugs.

Here, we describe the PAthophenotypic COngruity Score (PACOS), which uses CMap-derived gene-perturbagen networks to predict small molecule and cell line combinations that can enhance or repress a given input signature. PACOS is summarized in Fig. 5 and described in detail in the “Methods” section. In brief, we define the Signature Congruity Score (SCS) to quantify the agreement between two gene signatures in terms of the direction of effect (i.e., an increase or decrease in expression). SCS is an enrichment metric akin to the odds ratio or the chi-square statistic (Additional File 1: Fig. S16). It assigns a scalar value in the range [−1, 1] to either of two kinds of pair (Fig. 5A):

- a perturbagen signature (i.e., the gene-perturbagen edges of a given perturbagen in QUIZ-C or other CMap networks) paired with a disease signature;
- an input signature paired with a disease signature.

Extending the SCS calculation to all diseases in a large-scale, multi-sample disease-gene network consisting of 569 disease signatures (Methods), we define Pathophenotypic Congruity Profiles (PCPs), which are vectors of SCS values (Fig. 5B). We then calculate PCPs for each perturbagen in each cell type-specific CMap-derived network, as well as for the input signature. Finally, we calculate PACOS, which quantifies the similarity or dissimilarity between the PCP of an input signature and the PCP of each cell line-perturbagen pair (Fig. 5B). A positive PACOS indicates an “enhancing” perturbagen whose transcriptional effect is congruent with that of the input signature. A negative PACOS indicates a “repressing” perturbagen whose transcriptional effect is incongruent with that of the input signature (Fig. 5B). A sensitivity analysis demonstrated that potential correlations between the intermediary disease signatures do not affect PACOS distributions (Additional File 1: Figs. S17-18, Additional File 1: Supplementary Note 4).
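Here is a minimal sketch of the SCS → PCP → PACOS chain. Only the last step follows the figure description: PACOS as the Spearman correlation between two PCP vectors. The `scs` function is a hypothetical stand-in, because the actual SCS definition is given in the Methods. It represents signatures as gene → ±1 dictionaries and returns the mean directional agreement over shared genes, which also lies in [−1, 1]. Returning 0.0 for a constant profile is likewise an assumption.

```python
from math import sqrt


def scs(sig_a, sig_b):
    """HYPOTHETICAL stand-in for the Signature Congruity Score, in [-1, 1].

    sig_a, sig_b: dict gene -> +1 (up) or -1 (down). Returns the mean
    directional agreement over shared genes (0.0 if no genes are shared).
    """
    shared = sig_a.keys() & sig_b.keys()
    if not shared:
        return 0.0
    return sum(sig_a[g] * sig_b[g] for g in shared) / len(shared)


def pcp(signature, diseases):
    """Pathophenotypic Congruity Profile: one SCS per disease, in a fixed order."""
    return [scs(signature, diseases[name]) for name in sorted(diseases)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def pacos(input_signature, perturbagen_signature, diseases):
    """Positive: enhancing perturbagen; negative: repressing perturbagen."""
    return spearman(pcp(input_signature, diseases),
                    pcp(perturbagen_signature, diseases))
```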
Finally, given an input signature, we rank PACOS values across all perturbagens in each cell line and calculate the area under the receiver operating characteristic curve (AUROC). The positives for this calculation are obtained from the Comparative Toxicogenomics Database (CTD) [30] via a directional enrichment approach (Fig. 5C; see the “Methods” section). These AUROC values are compared with those of permuted rankings to determine an empirical p-value for each cell line. This procedure yields a ranking of both cell lines and perturbagens, whereby the cell lines with statistically significant AUROC values can be queried for the most enhancing or repressing perturbagens (Fig. 5C).
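The per-cell-line evaluation can be sketched as an AUROC computed from the PACOS ranking, plus an empirical p-value from randomly permuted rankings. The sketch makes three assumptions:

- AUROC is computed in its rank-probability (Mann–Whitney) form, with ties counting one half.
- Permutations shuffle the scores across perturbagens.
- The p-value uses the common (hits + 1)/(n_perm + 1) convention.

None of these is confirmed as the paper's exact procedure. For the repressing mode, negate the PACOS values before calling the function, so that the most negative scores rank first.

```python
import random


def auroc(scores, positives):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for drug, s in scores.items() if drug in positives]
    neg = [s for drug, s in scores.items() if drug not in positives]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def empirical_auroc_p(scores, positives, n_perm=1000, seed=0):
    """Observed AUROC and the fraction of permuted rankings doing at least as well."""
    rng = random.Random(seed)
    observed = auroc(scores, positives)
    drugs, values = list(scores), list(scores.values())
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # random reassignment of scores to perturbagens
        if auroc(dict(zip(drugs, values)), positives) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```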

Fig. 5

Overview of the PACOS algorithm. Blue squares indicate perturbagens, black hexagons indicate intermediary disease phenotypes, orange diamonds indicate input signatures, and pink circles indicate genes. A The signature congruity score, \(SCS_{dp}^{c}\), between each disease \(d\) and each perturbagen \(p\) in cell line \(c\) is calculated. The same procedure is repeated for the input signature \(g\) and each disease \(d\), resulting in \(SCS_{gd}\). B Aggregating \(SCS_{dp}^{c}\) and \(SCS_{gd}\) values for all intermediary diseases, we form the \(PCP_{p}^{c}\) and \(PCP_{g}\) vectors, respectively. The Spearman correlation between \(PCP_{p}^{c}\) and \(PCP_{g}\) yields the PACOS value for each perturbagen for the given input signature in the given cell line. C Given the PACOS ranking of perturbagens in each cell line, receiver operating characteristic (ROC) curves are generated for each cell line. This procedure is repeated for randomized ranks to yield an empirical p-value for each cell line, which can in turn be used to rank cell lines and then perturbagens within each cell line. D Workflow summarizing the ranking procedure Pathopticon uses for cell lines and perturbagens within each cell line

Benchmarking Pathopticon

To assess the prediction performance of our framework under various scenarios, we designed a benchmarking strategy. It uses 73 gene signatures from the literature as input and drugs targeting each gene signature from CTD [30] (Fig. 5C, Additional File 1: Table S1; see the “Methods” section). We performed drug prioritization using the Pathopticon framework: we calculated the AUROC values for all benchmark signatures and cell lines, resulting in 73 × 70 AUROC values, and identified the cell lines with statistically significant AUROCs. Across all benchmark signatures, we observed a balanced representation of significant cell lines, suggesting that our prioritization is not biased toward particular cell lines. We found that 68% and 71% of cell lines were significant (empirical p-value < 0.05) for at least one benchmark signature in the repressing and enhancing modes, respectively (Additional File 1: Fig. S19). Moreover, the repressing and enhancing modes were complementary, with non-overlapping sets of significant cell lines.

In our benchmark, we were particularly interested in testing four questions:

- (i) how using QUIZ-C as the underlying gene-perturbagen network compares with using other CMap-derived networks;
- (ii) how PACOS compares with purely cheminformatic, yet input- and cell type-agnostic, measures;
- (iii) how PACOS compares with recent CMap-based prioritization methods such as L1000CDS2 [35], as well as other similar approaches such as DeepDRK [26] and the Multiscale Interactome [22];
- (iv) how each of the above comparisons fares in terms of the chemical diversity of the top predicted drugs.

Below we present the results of these tests.

QUIZ-C networks surpass other CMap-derived networks in predictive performance within the Pathopticon framework

To compare the predictive capability of PACOS across CMap-derived networks, we focused on the best-performing cell line (i.e., the cell line with the highest AUROC value among significant cell lines) for each benchmark input signature. PACOS-QUIZ-C showed, on average, higher maximum AUROCs than both PACOS-MODZ and PACOS-CD in both the “enhancing” and “repressing” modes. The difference was statistically significant (two-sided Wilcoxon signed-rank test p-value < 0.05) in all four comparisons (Fig. 6A). Overall, these data suggest that the underlying gene-perturbagen network derived from CMap plays a non-trivial role in drug prioritization performance. They also suggest that the cell type-specificity of QUIZ-C networks may contribute to their improved prioritization performance relative to other CMap-based networks.
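The best-performing cell line for a benchmark signature is the one with the highest AUROC among cell lines with a significant empirical p-value. The sketch below selects it, then pairs the maxima from two networks signature by signature. The input layout is an assumption. The resulting paired differences are the kind of data a two-sided Wilcoxon signed-rank test (for example, `scipy.stats.wilcoxon`) would take.

```python
def best_cell_line(results, alpha=0.05):
    """results: dict cell line -> (AUROC, empirical p-value) for one signature.

    Returns (cell line, AUROC) with the highest AUROC among cell lines with
    p < alpha, or None if no cell line is significant.
    """
    significant = {cell: auc for cell, (auc, p) in results.items() if p < alpha}
    if not significant:
        return None
    best = max(significant, key=significant.get)
    return best, significant[best]


def paired_differences(max_auc_a, max_auc_b):
    """Per-signature differences in maximum AUROC between two networks,
    restricted to signatures that have a best cell line under both."""
    shared = sorted(max_auc_a.keys() & max_auc_b.keys())
    return [max_auc_a[s] - max_auc_b[s] for s in shared]
```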

Fig. 6

Benchmark results. A Boxplots of the maximum area under the ROC curves across all benchmark input gene sets for different LINCS-CMap networks and methods. Each dot corresponds to a benchmark input gene set. B Boxplots of the number of positives (known drug targets) in the top 50 predictions for QUIZ-C against L1000CDS2. C Boxplots of the maximum area under the ROC curve values for QUIZ-C against DeepDRK and the Multiscale Interactome

Integrating pharmacogenomic and cheminformatic data improves cell type-specific drug prediction performance

Despite being frequently used as reliable predictors of potential novel or repurposed small molecules [45], ligand-based measures such as binding selectivity and tool score [46] derived from bioactivity data [41] largely focus on single protein targets. As such, these measures prioritize compounds by taking single targets rather than multiple targets as input [25]. This approach is at odds with recent bioinformatic findings that drugs have, on average, 32 targets in the proteome [47]. In this sense, these cheminformatic measures are agnostic to disease context in the form of multiple disease-associated genes. They nevertheless provide useful information on the chemical binding properties of a given drug, a feature that is crucial in drug development. In our benchmarks, tool score indeed performed better than random expectation (two-sided Wilcoxon signed-rank test p-value < 0.05) and comparably to PACOS (Fig. 6A). Given the drawbacks of relying solely on pharmacogenomic data in drug discovery [48, 49], we hypothesized that cheminformatic and pharmacogenomic information are complementary and that integrating them would be more beneficial in drug discovery than either approach alone. To test this hypothesis, we used a heuristic measure that optimally combines PACOS with tool score (Methods, Additional File 1: Fig. S3A-B). Supporting our hypothesis, the integration of PACOS and tool score significantly surpassed PACOS alone in all three cases, in both enhancing and repressing modes (two-sided Wilcoxon signed-rank test p-value < \(10^{-12}\) in all cases). The PACOS-QUIZ-C-Tool measure also outperformed both the PACOS-MODZ-Tool and the PACOS-CD-Tool measures in both modes (Fig. 6A).
Moreover, these observations remained valid when we used other intermediary gene sets such as kinase perturbation signatures, transcription factor perturbation signatures, and tissue-specific gene expression signatures in lieu of disease phenotypes, implying that the choice of intermediary gene sets did not have a significant effect on prediction performance (Additional File 1: Fig. S3C). The integration with cheminformatic data also increased the number of significant cell lines identified by Pathopticon (Additional File 1: Fig. S20) compared to LINCS-CMap data alone (Additional File 1: Fig. S19).
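The paper's heuristic for combining PACOS with tool score is described in its Methods and is not reproduced here. As a purely hypothetical illustration of the general idea, the sketch does three things:

1. Min-max scales both scores.
2. Takes a weighted sum with weight `w`, giving drugs without a tool score 0 for the cheminformatic term.
3. Picks `w` from a grid by maximizing the mean AUROC over a set of training signatures.

The scaling, missing-value rule, grid, and objective are all assumptions. For the repressing mode, pass negated PACOS values.

```python
def min_max(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {k: (v - lo) / span for k, v in scores.items()}


def combined_score(pacos, tool, w):
    """HYPOTHETICAL: w * scaled PACOS + (1 - w) * scaled tool score."""
    p = min_max(pacos)
    t = min_max(tool) if tool else {}
    return {drug: w * p[drug] + (1 - w) * t.get(drug, 0.0) for drug in p}


def auroc(scores, positives):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for d, s in scores.items() if d in positives]
    neg = [s for d, s in scores.items() if d not in positives]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def best_weight(training, grid=None):
    """training: list of (pacos, tool, positives) tuples, one per signature.

    Returns (weight, mean AUROC) for the grid weight with the highest mean
    AUROC; on ties, the first weight in grid order wins.
    """
    grid = grid or [i / 10 for i in range(11)]

    def mean_auc(w):
        return sum(auroc(combined_score(p, t, w), pos)
                   for p, t, pos in training) / len(training)

    w = max(grid, key=mean_auc)
    return w, mean_auc(w)
```

In practice, a weight tuned this way would need to be checked on signatures held out from the tuning set, to avoid overfitting the benchmark.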

As an additional external benchmark, we compared PACOS to a state-of-the-art method that also uses LINCS-CMap data to prioritize drugs, namely L1000CDS2 [35], as well as to two other highly relevant approaches, the Multiscale Interactome [22] and DeepDRK [26]. The comparison with L1000CDS2 is important because L1000CDS2 also makes full use of LINCS-CMap data and simultaneously ranks drug candidates along with the relevant cell lines. DeepDRK and the Multiscale Interactome do not incorporate CMap data directly, but they are still informative comparators. DeepDRK integrates gene expression and cheminformatic data, and the Multiscale Interactome incorporates multiple pathways to interpret disease mechanisms. In terms of the number of known targets captured in the top 50 candidates, L1000CDS2 performed significantly better (two-sided Wilcoxon signed-rank test p-value < 0.05) than PACOS-QUIZ-C. However, PACOS-QUIZ-C-Tool significantly surpassed L1000CDS2 in both the enhancing and repressing modes (Fig. 6B, Additional File 1: Table S7). Similarly, PACOS-QUIZ-C and PACOS-QUIZ-C-Tool significantly surpassed (two-sided Wilcoxon signed-rank test p-value < 0.05) both DeepDRK and the Multiscale Interactome in all but one comparison (Fig. 6C, Additional File 1: Table S7).
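The top-50 comparison counts how many known positives appear among the 50 highest-ranked candidates of each method. The sketch below is a minimal version, assuming each method's output is a drug → score mapping in which higher is better. Set `descending=False` for a score where lower is better. Ties keep the input order, which is an arbitrary choice of this sketch.

```python
def positives_in_top_k(scores, positives, k=50, descending=True):
    """Number of known positives among the k best-ranked drugs."""
    ranked = sorted(scores, key=lambda drug: scores[drug], reverse=descending)
    return sum(1 for drug in ranked[:k] if drug in positives)
```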

Together, these results demonstrate the predictive advantage of integrating cheminformatic data with LINCS-CMap data for cell type-specific drug prioritization, and suggest that QUIZ-C networks best complement cheminformatic data compared to the other CMap-based networks.

High chemical diversity of the PACOS-prioritized drugs

While the top-prioritized compounds can be selected as potential candidates for downstream validation studies, a larger set of highly ranked drug candidates can also be thought of as a screening set of compounds. In this case, the drugs chosen for screening should be structurally diverse rather than similar, so that a larger portion of chemical space is explored [25, 45, 50]. We therefore sought to quantify the structural diversity of the top-ranked drugs for each benchmark signature. Chemical structure similarity was measured by the Tanimoto coefficient (Additional File 1: Supplementary Methods). Overall, it was low for all methods, with a median value between 0.2 and 0.4 (Additional File 1: Figs. S21-24). This agrees with a recent report suggesting that a Tanimoto structural similarity threshold of 0.2 is an effective proxy for chemical diversity [25]. QUIZ-C had significantly lower Tanimoto similarity (two-sided Mann–Whitney U p-value ≤ 0.05), and therefore higher structural diversity, in the top-50 predicted drugs than MODZ and CD. This held for more than half of the benchmark signatures, with or without tool scores (Additional File 1: Table S8, Additional File 1: Figs. S21-22). Compared to L1000CDS2, QUIZ-C had a higher number of benchmark signatures with significantly lower Tanimoto similarity in three out of four comparisons (Additional File 1: Table S8, Additional File 1: Figs. S23-24). These results suggest that the top drugs predicted by Pathopticon could be used as cell type-aware compound screening libraries focused on a specific input signature.
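Here is a minimal sketch of the diversity summary: the Tanimoto coefficient between two binary fingerprints, and the median over all pairs in a top-k list. Each fingerprint is assumed to be given as the set of its "on" bit indices, for example from a circular fingerprint computed with a cheminformatics toolkit such as RDKit. The fingerprint type and parameters used in the paper are not specified here. Defining the similarity of two empty fingerprints as 0.0 is an assumption.

```python
from itertools import combinations
from statistics import median


def tanimoto(fp_a, fp_b):
    """|A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def median_pairwise_tanimoto(fingerprints):
    """Median Tanimoto similarity over all unordered pairs of compounds."""
    return median(tanimoto(a, b) for a, b in combinations(fingerprints, 2))
```

Lower medians indicate a more structurally diverse candidate set.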

Pathopticon identifies potential therapeutic candidates targeting inflammation-related pathways involved in vascular diseases

To investigate the utility of Pathopticon for guiding in vitro validation studies with disease-associated omic data, we carried out two proof-of-concept studies on vascular diseases. First, we considered a hypothesis-driven approach in which researchers identify a priori the cell types to be studied based on the pathophysiology of interest. Since Pathopticon was built on gene expression data, we prioritized transcriptomic data as our input data type. We used RNA-seq data from patients with subclinical atherosclerosis [31], a clinically silent disease that is highly prevalent in middle-aged individuals [51]. From these data, we first identified a subclinical atherosclerosis gene signature associated with circulating immune cells, which we then used in Pathopticon to identify compounds capable of regulating this signature (Methods). We then examined the cell types that Pathopticon found to be significant (Fig. 7A). To match the identified subclinical atherosclerosis gene signature, we searched for cell types matching the immune cells used in the original study [31]. We chose U937 monocytes for two reasons: infiltrating monocytes differentiate into macrophages within atherosclerotic plaques, and the inflammatory status of monocyte-derived macrophages is a well-known determinant of atherosclerotic disease progression [52]. The top compounds identified by Pathopticon to regulate the subclinical atherosclerosis disease signature in a repressing manner are shown in Fig. 7B. We chose six of these compounds for experimental validation using a combination of three criteria: (i) commercial availability of compounds, (ii) previously known cytotoxic profile of compounds in monocytes, and (iii) scalability of validation experiments. Of the compounds selected, simvastatin, a widely prescribed drug for patients at risk for atherosclerotic disease, has well-known pleiotropic anti-inflammatory properties [53].
Similarly, cyclosporin A, a commonly used immunosuppressive agent, potentially regulates atherosclerosis risk [54]. AG-1478 is a chemotherapeutic agent with potential protective effects in atherosclerotic mouse models [55]. However, little is known about the contribution of neratinib, GW-408533, and TG-101348 to inflammation associated with atherosclerosis. For each of these top candidates, the intermediary disease signatures that overlapped with the subclinical atherosclerosis signature were highly enriched in pathways related to IL-17 signaling, innate immunity, and toll-like receptor (TLR) signaling (Fig. 7C, Additional File 1: Fig. S25). We then built a network for U937 cells (Fig. 7D, Additional File 1: Fig. S26; see the “Methods” section). It consists of the subclinical atherosclerosis genes, the intermediary pathophenotypes and their associated genes, and the six identified drugs with their targets in the QUIZ-C networks. Genes in the identified pathways had higher eigenvector and closeness centrality in this network than the other genes in the network (Fig. 7E, Additional File 1: Fig. S27). This indicates that these pathways may be the principal mediating axes through which the six identified drugs regulate subclinical atherosclerosis (Methods). Based on these findings, we designed an in vitro validation experiment to test the downstream effects of the six drugs on some of the prototypical markers of these pathways. Lipopolysaccharide (LPS) is a well-known stimulant of TLR signaling in monocytes and macrophages. We therefore stimulated U937 cells with LPS and measured the expression of genes selected from TLR signaling pathways. Five of the six compounds reduced LPS-induced IL-1β gene expression (Fig. 7F). On the other hand, none of the compounds reduced LPS-induced STAT1 gene expression (Fig. 7F).
Interestingly, simvastatin reduced the expression of both NF-κB and its downstream target genes S100A8 and S100A9, which were the genes with the highest centrality (Additional File 3).
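The centrality comparison can be sketched with a breadth-first-search closeness centrality, scaled by the reachable fraction of the graph as in the Wasserman–Faust variant so that nodes in small components are penalized. The sketch treats the combined drug/disease/gene network as undirected, which is an assumption. It is a pure-Python illustration, not the paper's code. NetworkX provides an equivalent `closeness_centrality` with a `wf_improved` option. The per-pathway centralities could then be compared with the remaining genes using a two-sided Mann–Whitney U test (for example, `scipy.stats.mannwhitneyu`).

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) edges; edge direction is ignored."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness_centrality(adj):
    """Closeness from BFS distances, scaled by (reachable - 1) / (n - 1)."""
    n = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total, reached = sum(dist.values()), len(dist)
        result[source] = (0.0 if total == 0
                          else (reached - 1) / total * (reached - 1) / (n - 1))
    return result
```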

Fig. 7

A Ranking of cell lines based on their AUROC values. Dark red represents significant cell lines with empirical p-value < 0.05. B Top ten predictions based on PACOS + Tool combined score (blue circles). PACOS scores only (red circles) and tool scores only (brown triangles) are also shown. C The most highly enriched pathways identified by Pathopticon for each of the six drugs chosen for in vitro experiments. Darker colors indicate higher enrichment (i.e., lower two-tailed Fisher’s exact test p-values) and blank cells indicate not significant (p-value > 0.05). D Combined network showing the input disease (subclinical atherosclerosis) (pink nodes), the six identified drug candidates (teal nodes), the intermediary disease phenotypes (orange nodes), and the genes targeted by each drug/intermediary disease phenotype (purple nodes). Red and blue edges indicate an increase and decrease in gene expression, respectively. E Boxplots showing the closeness centrality distributions of the three identified pathways. ** indicates two-tailed Mann–Whitney U test p-value < 0.01 and *** indicates two-tailed Mann–Whitne
