Network dynamics-based subtyping of Alzheimer’s disease with microglial genetic risk factors

Construction of a microglia molecular regulatory network

We sought to construct a comprehensive model of molecular interactions within microglia to elucidate the intricate processes contributing to the diverse pathological traits observed in AD. To capture the essential functions of microglia implicated in AD pathogenesis, we specifically focused on the phagocytotic, inflammatory, and anti-inflammatory pathways known to be core functions of microglia. In order to systematically analyze the crosstalk between the multiple pathways, we reconstructed a Boolean model for a molecular network of microglia based on published literature and TF-TG relationships based on single-cell RNA sequencing data (See Supplementary Information 1 for comprehensive information about the nodes, links, and Boolean logical rules governing the node states.). We inferred and selected key transcriptional factors, including BHLHE41, CHD2, IKZF1, SF1, XBP1, and YY1, for inclusion in the network (Supplementary Information 2). Additionally, we incorporated known AD risk factors into the microglial network to explore their effects on microglial pathways.

The constructed microglial network comprises 63 nodes and 214 links, each node labeled by its corresponding gene name where applicable (Fig. 2). Nine input nodes (amyloid beta, growth factors, APOE, TGFβ, TNFα, CLU, CX3CL1, and interleukins 1,6 and 4,10) represent various stimuli, activating different network model pathways. Additionally, seven output nodes (TNFα, TGFβ, interleukins 1/6, 4/10, autophagy, phagocytosis, and amyloid beta plaques) represent the phenotypes and cytokines generated by the network model.

Fig. 2figure 2

Microglia network model with risk factors. The constructed microglia network model was plotted using PowerPoint software. The network contains nine input nodes and eight output nodes. The color of the node indicates the pathway of the corresponding node. The input node is indicated with a white square, while the output node is represented by a hexagon. The color or shape of the arrow indicates the direction of the interaction. The final network contains 63 nodes and 214 links. The risk factors included in the network are represented in bold

To validate the inferred network links, we first checked the correlations between the source and target genes of each link in the microglia network using the original dataset. These correlations were much more significant than those between random gene pairs (Supplementary Information 3 A, Fig. S1, S2). We then checked whether the links in the microglia network were also inferred from other independent datasets. For this, we constructed two additional networks using single-cell RNA sequencing data: one from the Human Brain Cell Atlas and another from previously published literature which presents single-nucleus RNA sequencing data from iPSC-derived microglia-like cells, using the same methods as the original microglia network [14, 47]. When we checked the overlap of the links in these networks with the microglia network, we found that almost all the links in the microglia network were present in the newly constructed networks. These results indicate that the microglia network is robust and reliable (Supplementary Information 3B, Table S1).

To validate the accuracy of our constructed network, we conducted simulations by updating random initial states using Boolean logical rules using an unperturbed network state to emulate normal aging conditions. We compared this state to a network configuration that simulates AD by manipulating amyloid beta levels. By using known experimental results, perturbation simulations on nodes were performed, and changes in the model and experimental results were compared to ensure the appropriateness and fidelity of our network model (Supplementary Information 3B, Table S2).

Effects of multiple risk factors in probabilistic boolean simulations

Standard deterministic Boolean networks use fixed Boolean functions to determine the next state of each node, in contrast to probabilistic Boolean networks, which incorporate randomness by assigning probabilities to transitions between states. Deterministic Boolean networks are limited in their use of considering risk alleles due to the fact that the count of risk alleles for each risk factor ranges from zero to two, which cannot be implemented in deterministic Boolean models. Furthermore, due to their deterministic properties in deterministic Boolean models, combinations of risk factors are abrogated. To address this challenge, we devised three illustrative toy models, each implemented with both deterministic and probabilistic Boolean models.

The first toy model illustrated the combinatorial effect of two risk genes on a downstream node (c), where the logic gate “OR” determined the node’s state based on the inputs from upstream nodes (a) and (b). In the scenario where a risk factor upregulated one of the upstream nodes, the deterministic network struggled to discriminate the presence of an additional risk factor. However, the probabilistic framework demonstrated its ability to discern another risk factor’s presence by altering the output node’s average node activity (Fig. 3A). The second toy model showed the influence of two risk genes on a downstream node (c) but employed the logic gate “AND” for node determination. Here, the deterministic network faced difficulty distinguishing the number of risk factors when a downregulating risk factor was introduced into the system. In contrast, the probabilistic Boolean network exhibited greater flexibility in accounting for the impact of risk alleles, even in the presence of downregulation (Fig. 3B). The third toy model introduced a cascading three-node network, where the signal from an upstream node (a) propagated to nodes (b) and (c). In this setup, the deterministic Boolean network overlooked the effect of the upper-risk allele, leading to both networks producing identical outcomes (Fig. 3C).

Fig. 3figure 3

Probabilistic Boolean networks are able to discriminate the types of risk factors. Deterministic Boolean simulations produce the same results, regardless of a second Boolean activating risk allele (above). Probabilistic Boolean simulations provide distinct results and distinguish the presence of the second risk factor. (A) An example of when a third node is tied with the “OR” logic. (B) An example of when the third node is tied with the “AND” logic. (C) Three cascading nodes with two risk alleles in the upstream nodes. (D) A toy network model and its deterministic/probabilistic simulation results. Each risk allele is assumed to activate or inhibit the corresponding node by 50%

In a scenario where a feedback loop is present between two nodes (a) and (b), each with a downregulating risk factor, and an upregulating risk factor, deterministic simulations fix the average node expression levels to zero or a value of one (Fig. 3D). In deterministic simulations, node (c) would be in an “OFF” state and would lead to limited signal transduction, which would be unrealistic due to risk alleles rarely having deleterious effects on the corresponding risk genes.

These findings highlighted the limitations of deterministic Boolean simulations, as they fell short in representing the complex interplay of risk alleles in AD. In contrast, the probabilistic Boolean network emerged as a more promising approach, capable of considering both the location of risk genes and the count of risk alleles, thus offering a more accurate portrayal of the intricate relationships between risk factors in AD. This distinction is crucial, especially considering that the number risk alleles is known to elevate the risk of AD, and that risk alleles rarely induce complete inhibition or activation of its corresponding risk genes. The toy model signifies the importance of adopting a probabilistic framework for a more comprehensive understanding of AD pathogenesis.

Risk variant identification and patient-specific probabilistic boolean network simulations

In our study, we obtained risk alleles for each subject and obtained perturbation probabilities. We then perturbed corresponding risk genes to create subject-specific network models, as the construction and simulation of subject-specific network models allow the identification of differences in network dynamics. These networks were then subjected to probabilistic Boolean simulations to identify the specific combinations of risk factors influencing the cytokine profiles and phenotypes of microglia. Our investigation encompassed a total of 20 risk factors, well-established for their impact on AD risk (Table 1). Information regarding these factors, including risk allele location, frequency, odds ratio, risk allele identity, and reference allele, were sourced from the NHGRI-EBI-GWAS catalog [45]. Among these factors, CLU and APOE are exogenous agents not naturally produced by microglia but were included as input nodes within the microglia network model, given their significant influence on AD risk.

Table 1 Risk genes included in the network and information used risk variants

Subsequently, we generated subject-specific networks with varying logic, incorporating data on the number of risk alleles obtained from VCF files originating from the ROSMAP and MSBB projects (Supplementary Information 3 C) [74, 75]. We then incorporated an activation or inhibition ratio of 5% multiplied by the risk allele count for risk genes for the subject-specific networks. For example, if a risk gene has one risk allele, the activation or inhibition probability would be 5%, while if a risk gene has two risk alleles, the probability would be set to 10%. The selection of a 5% probability was guided by its ability to differentiate clusters without fixing average node activity to binary values of 0 or 1. APOE, owing to its significant role in AD risk, was assigned a higher probability of 15% for its regulatory effects during probabilistic network simulations. This approach, featuring upregulation or downregulation ratios, offered greater precision, acknowledging that risk alleles often induce partial increases or decreases in the expression of risk genes.

After setting the activation or inhibition probabilities, risk genes were stochastically updated based on one of the two logics within our probabilistic Boolean modeling framework. In the case of inhibitory risk alleles, the probability of inhibition was set at 10% when two risk alleles were present. With this defined probability, the risk node had a 10% chance of being turned into a FALSE value in the subsequent update step. Otherwise, the risk node followed the original logic of the node.

Leveraging probabilistic simulations allowed us not only to account for the count of risk alleles but also to simultaneously consider multiple risk genes. We employed 1,000,000 initial states for 500 steps of synchronous updates, with the average node expression at time steps 401 to 500 selected as a suitable measure, signifying sufficient convergence of network expression levels (Supplementary Information 3D, Fig. S3). Such simulations enabled the identification of combinations of risk factors influencing microglial phenotypes and cytokine production.

Classification of subjects based on subject-specific network dynamics

The classification of subjects based on network dynamics hinges on genetic alterations, constituting the most significant source of heterogeneity within microglia, consequently impacting their transcriptional signatures. Hence, for precision medicine, it becomes imperative to perform subject classification that incorporates the genetic backgrounds of individuals, which is possible using network simulations (Fig. S4).

Our approach centered on evaluating the average activity of each node in subjects to facilitate their classification. Based on these average node activities within the network, we successfully identified nine distinct subtypes among the subjects with different clinical phenotypes (Fig. 4A and B, Table S3).

Fig. 4figure 4

Identified patient subtypes and their characteristics. (A) The result of UMAP clustering on the average node expressions, using a resolution of 0.1, and their corresponding disease status. Each point represents a single subject obtained using the average node activity post-simulation. (B) The number of APOE4 alleles and the number of AD subjects are presented for each patient subtype. (C) A heatmap of the expressions with hierarchical clustering. Each row represents the average node activity for each subject obtained via probabilistic Boolean simulations. The columns represent the nodes included in the network

These subtypes can be broadly categorized into two major groups. The first category encompasses subtypes 0, 2, 3, 5, and 8 and is characterized by a pronounced association with inflammatory signatures. Conversely, the second category comprises subtypes 1, 4, 6, and 7, displaying non-inflammatory signatures (Fig. 4C). This classification aligns with earlier findings, which also reported a similar distribution of patients within these two groupings [76]. Of particular note, the high-inflammatory group exhibited lower activity in phagocytosis nodes, higher activity in autophagy nodes, and relatively reduced levels of amyloid beta plaques. The average expression of amyloid beta plaques between the clusters correlated with Braak staging, showing that the clusters appropriately represent clinical phenotypes. (Supplementary Information 3E, Fig. S5)

Further observation of APOE4 allele count revealed that subtype 2 displayed a higher presence of APOE4 alleles, and subtype 0 exhibited a lower presence of APOE4 alleles. However, other than the two subtypes, the overall impact of the APOE4 allele on subtype classification within the microglia network was relatively minor. The cognitive diagnosis score was highest AD in subtype 2 and lowest in subtype 8. Braak staging, on the other hand, was highest in subtype 8, while others showed small differences. CERAD score was lowest in subtype 2 and high in subtypes 0 and 7. Such results show that the subtypes identified through simulations show distinct clinical phenotypes (Fig. 5). These findings are consistent with previous reports that Braak staging does not completely match with AD diagnosis [77].

Fig. 5figure 5

Clinical phenotypes for each identified patient subtype. The clinical phenotypes, including CERAD score, Braak staging, and cognitive diagnosis, are represented (* p-value < 0.05, ** p-value < 0.01, *** p-value < 0.001) for each phenotype. In the case of the APOE4 allele count, the significance levels between subtype 0 and all other subtypes are significant (p-value < 0.001) and are not represented in the diagram for simplicity

To further validate our results, we compared the subtypes identified by our method with those previously identified by Neff et al. using RNA-sequencing data [78]. Neff et al. identified five subtypes, each with distinct characteristics related to amyloid beta binding, tau-related genes, and immune pathways. We found that our network dynamics-based classification results are largely consistent with previous classification results (Supplementary Information 3 F), indicating the robustness and reliability of our subtypes based on genetic risk factors.

Network analysis using an expanded Boolean network reveals core feedbacks for each subtype

Our investigation led to the identification of core feedback mechanisms that underpin the classification of subjects within each subtype, with a particular focus on the involvement of risk genes. In this context, mutations commonly present and instrumental in classifying the subtypes were utilized to discern the feedback loops involving risk alleles using an expanded Boolean network, which integrates the regulatory rules of each node in the original network (Fig. 6A and B) [79]. Three pivotal genes -SPI1, CASS4, and MEF2C- emerged as the key determinants in subject classification.

Fig. 6figure 6

Expanded network representation of core feedbacks for distinguishing subtypes. (A) A toy Boolean model and its corresponding logic. (B) The constructed expanded network. If node 1 has an activating risk factor and node 5 has an inhibiting risk factor, the core feedback is represented in red. The dots in the network represent a composite node, in which all the upstream nodes must be in an “ON” state in order for the composite node to be turned on. Core feedbacks of (C) subtypes 0, 2, 3, (D) subtypes 5, 8, and (E) subtypes 1, 4, and 7 are presented

For subtypes 0, 2, and 3, risk alleles in the SPI1 gene were identified, while no risk variants in the CASS4 gene were observed. It is noteworthy that SPI1 is a well-known master regulator of the inflammatory response in microglia [72, 80, 81]. The presence of risk variants in the SPI1 gene played a central role in creating a positive feedback loop. This loop had cascading effects on key nodes within the microglial network, including ROS, XBP1, and NFκB. Consequently, it activated downstream inflammatory nodes, such as IL1 and TNFα, while concurrently downregulating nodes associated with anti-inflammatory processes, such as IL4 and autophagy. Importantly, the absence of risk variants in CASS4 correlated with an absence of expression increase in the CASS4 node. This lack of expression enhancement was linked to a decrease in SRC expression. Additionally, the upregulation of SPI1 further contributed to a positive feedback loop by downregulating SYK expression, leading to the inhibition of phagocytosis levels, a crucial microglial function. In subtype 3, which lacked risk variants in the ABI3 gene, a distinct positive feedback loop involving ABI3 was present, and downregulated AKT expression was present. Simultaneously, increased activity in SPI1 led to elevated CD33 and PTPN6 node expressions while reducing SYK node expression. This decline in SYK expression further contributed to the positive feedback initiated by CASS4, SRC, LAT2, PLCG, and calcium ion nodes, further decreasing phagocytosis levels (Fig. 6C).

Subtypes 5 and 8 exhibited an upregulation of CASS4 and SPI1 due to the presence of risk alleles. Additionally, PTPN6 displayed an upregulation driven by the cascade of CD33 and PILRA. SRC exhibited an upregulation influenced by both PTPN6 and CASS4. The interplay among CASS4, SRC, and PTPN6 established a positive feedback loop that led to enhanced upregulation of the phagocytosis node. Furthermore, consistent with previous observations in other subtypes, the SPI1 feedback mechanism in subtypes 5 and 8 exerted inhibitory effects on autophagy and IL4 expression. Conversely, this same feedback loop activated the expression of inflammatory markers IL1 and TNF α (Fig. 6D).

In subtypes 1, 4, 6, and 7, no risk variants in the SPI1 gene were detected. Consequently, both nodes were deactivated due to the positive feedback loop between SPI1 and NFκB. This deactivation had cascading effects on downstream nodes, leading to the deactivation of BIN1, INPP5D, IRF1, and IKZF1 nodes while activating the IRF3 node. Specifically, in subtypes 4 and 6, where MEF2C risk alleles were absent, a cooperative effect between IRF3 and MEF2C was observed. Together, they activated the TGFβ node, initiating a positive feedback loop between TGFβ and MEF2C. This led to the deactivation of downstream inflammatory nodes, such as IL1 and TNF α, while activating the anti-inflammatory node IL4. In subtypes 6 and 7, where CASS4 risk variants were present, the activation of PTPN6 triggered a positive feedback loop involving CASS4, SRC, LAT2, PLCG, ERK, CD33, and PTPN6. This loop was further strengthened by the inhibition of PI3K, caused by the activation of PTPN6. Additionally, the inactivated SPI1 node led to the deactivation of the ROS node, which, in turn, further inhibited NFκB. This series of events reinforced the positive feedback loop centered around SPI1 (Fig. 6E).

To further verify the significance of the pivotal risk genes in subtype classification, we employed other independent publicly available gene expression data obtained from 35 subjects to confirm whether the clustering using pivotal risk genes, including SPI1, CASS4, MEF2C, ABI3, and INPP5D, on RNA sequencing data shows similar phenotype patterns to our network-based subtypes [82]. We found that by using the pivotal risk genes, phenotype marker patterns for phagocytosis, autophagy, inflammation, and anti-inflammation show similar patterns to the corresponding predicted node activities. This indicates that the pivotal risk factors play an important role in subject classification (Supplementary Information 3G, Fig. S6).

Subtype classification analysis unveiled intricate feedback loops within the microglial network shaped by the combinations of specific risk variants associated with AD. These feedback loops exert influence on the inflammatory, anti-inflammatory, phagocytotic, and autophagy responses within microglia.

Identification of control targets for each subtype to restore AD to a normal phenotype

Based on the previous results, we embarked on the critical task of identifying control targets within the microglial network that could potentially be manipulated to restore a normal phenotype in the context of AD. Our approach was rooted in network perturbation analysis and the quantification of specific scores to gauge therapeutic effectiveness, which was shown in previous studies that network-based stratification is effective in predicting therapeutic responses.

To assess the impact of network perturbations, we employed a phenotype score that considered the average expression of output nodes (IL1, IL4, TGFβ, TNFα, phagocytosis, autophagy), which represent key phenotypic characteristics. The goal was to calculate a score that could indicate how close a perturbed state was to a nominal AD state (phenotype score close to 1) or to a nominal normal state (phenotype score closer to 0). An AD state was defined for each subtype using the differences in risk allele count (Supplementary Information 3 H). The phenotype score was calculated as the sum of the absolute differences between the average node expression of a nominal state, without any perturbations present, and the average node expression post-perturbations, divided by the sum of the absolute differences between the AD node expression and the nominal node expression of the output nodes as shown below, where E_normal, E_control, and E_AD represents the normal state node expression, node expression post control, and the node expression in an AD state respectively.

$$\:\text\text\text\text\text\text\text\text\text\:\text\text\text\text\text\:\:=\:\frac$$

We selected network perturbations with the smallest phenotype score, using up to two perturbations for each subtype (Fig. 7; Table 2). The network simulations identified the inhibition of PICALM as a single target and the dual inhibition of LAT2 and PICALM for a double-target approach for the largest subtype, subtype 0. These targets were chosen based on their potential to counteract the distinct characteristics of cluster 0. Compared to control subjects, individuals with AD in cluster 0 exhibited elevated levels of phagocytosis and amyloid beta plaques, two key pathological features associated with the disease. Both single and double-target strategies reduced phagocytosis and amyloid beta plaques, effectively reverting the phenotype towards a more normal-like state (Detailed data on the other clusters can be found in Supplementary Information 3I, Table S4, S5). A literature search on the identified targets verified their potential as therapeutic targets (Table S6).

Fig. 7figure 7

AD scores and phenotype node activities for restoring a normal phenotype. (A) A bar plot showing the effectiveness of the restoration targets. A score close to 0 means that the phenotype was closer to that of a normal state. (B) The activities of the phenotype nodes for subtype 2 are represented in a radar chart. The phenotypes of an AD state (left), the effects of a single target (middle), and the effects of a double target (right) are presented, with the normal phenotype being represented in blue, and the AD phenotype and each phenotype after control being represented in red

Table 2 Optimal single and double targets to restore a normal phenotype

Furthermore, our study explored the effects of targets such as AKT and INPP5D. While these targets were not identified as the optimal reversion targets in all clusters, our findings demonstrated that their upregulation or downregulation could effectively reduce amyloid beta plaques in different genetic backgrounds, which aligns with previous research indicating that inhibition and activation of specific factors reduced amyloid beta plaques in different contexts (Supplementary Information 3 J). This underscores the complexity of microglial network regulation and the importance of considering network-based analysis when developing therapeutic strategies for AD.

留言 (0)

沒有登入
gif