In-silico computational approaches to study microbiota impacts on diseases and pharmacotherapy

In recent years, with the rapid development of methods in bioinformatics and the life sciences, a massive quantity of biomedical data has accumulated. On the basis of these data, researchers have developed numerous computational methods to discover potential associations among human microbes, drugs and diseases. This article offers a thorough review of recent developments in identifying possible relationships among microbes, drugs and diseases using biological data and computational models.

Drug–microbe association

Drugs can change the species diversity and function of microbial communities [36], and the number of drug-resistant bacteria is growing. Conversely, microorganisms can play a vital role in reducing the adverse reactions of medications. There is therefore an urgent need to identify possible drug–microbe associations [37]. The rest of this section reviews studies in the literature on predicting microbe–drug relationships.

Graph attention networks

Long, Y. et al. used various sources of biomedical information to construct several microbe and drug graphs. They then developed a novel ensemble framework of graph attention networks with a hierarchical attention mechanism, called Ensembling graph attention networks for human microbe–drug association prediction (EGATMDA), to predict microbe–drug associations from the constructed microbe–drug graphs. Specifically, for each input graph, a graph convolutional network with node-level attention is designed to learn node embeddings (e.g., for microbes and drugs). To effectively integrate the node embeddings from the multiple input graphs, graph-level attention is used to learn the importance of each input graph [38].

Graph convolutional network (GCN)

GCNMDA is a Graph Convolutional Network (GCN)-based framework for predicting human Microbe–Drug Associations (MDA). First, a heterogeneous network is built that combines microbial gene information, drug chemical information, and known microbe–drug interactions. Next, a random walk with restart (RWR)-based preprocessing step extracts informative node features. Finally, a conditional random field (CRF) layer is added to the GCN so that similar drug and microbe nodes receive similar representations, and an attention mechanism in the CRF layer aggregates representations from neighbors more accurately [42].
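To make the graph convolution step of GCN-based models such as GCNMDA concrete, here is a minimal NumPy sketch of one symmetric-normalized GCN propagation step, followed by inner-product scoring of drug–microbe pairs. The toy graph, the random features and weights, and the function name `gcn_layer` are illustrative assumptions. GCNMDA's RWR preprocessing and CRF attention layer are not reproduced.

```python
import numpy as np


def gcn_layer(adj, features, weights):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a_hat.sum(axis=1)                        # degrees are >= 1 after self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt     # symmetric normalization
    return np.maximum(norm_adj @ features @ weights, 0.0)  # ReLU activation


# Hypothetical heterogeneous graph: nodes 0-1 are drugs, nodes 2-4 are microbes.
adj = np.array([
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
], dtype=float)

rng = np.random.default_rng(0)
features = rng.random((5, 4))   # initial node features (random placeholders)
weights = rng.random((4, 2))    # layer weights (learned in a real model)

embeddings = gcn_layer(adj, features, weights)
drug_emb, microbe_emb = embeddings[:2], embeddings[2:]
scores = drug_emb @ microbe_emb.T   # inner-product score for each drug-microbe pair
print(scores.shape)  # (2, 3)
```

In a trained model the weights would be fitted so that known drug–microbe pairs score higher than unknown ones. Here they are random and only show the data flow.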

Heterogeneous network embedding representation

Adjacency matrix

In this approach, experimentally confirmed human microbe–disease (microbe–drug) associations are extracted from the corresponding databases. An adjacency matrix $A \in \mathbb{R}^{n_d \times n_m}$ is then created, where $n_d$ and $n_m$ denote the number of diseases (drugs) and the number of microbes, respectively:

$$a_{ij} = \begin{cases} 1, & \text{if disease (drug) } d_i \text{ is associated with microbe } m_j \\ 0, & \text{otherwise} \end{cases}$$
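Building $A$ from a list of known pairs is straightforward. The sketch below assumes the associations arrive as (disease, microbe) name pairs. The names are placeholders, not real associations.

```python
import numpy as np


def build_adjacency(known_pairs, diseases, microbes):
    """Return A (n_d x n_m) with A[i, j] = 1 if disease (drug) i is associated with microbe j."""
    d_index = {name: i for i, name in enumerate(diseases)}
    m_index = {name: j for j, name in enumerate(microbes)}
    adj = np.zeros((len(diseases), len(microbes)), dtype=int)
    for disease, microbe in known_pairs:
        adj[d_index[disease], m_index[microbe]] = 1   # all other entries stay 0
    return adj


diseases = ["d1", "d2", "d3"]
microbes = ["m1", "m2", "m3", "m4"]
known_pairs = [("d1", "m2"), ("d2", "m1"), ("d2", "m2"), ("d3", "m3")]
A = build_adjacency(known_pairs, diseases, microbes)
print(A)
# [[0 1 0 0]
#  [1 1 0 0]
#  [0 0 1 0]]
```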

Similarity calculation and heterogeneous network

Various computational methods designed to predict microbe–disease (microbe–drug) associations were mentioned in the previous sections. Their similarity calculations fall into two groups: (i) those that compute microbe and disease similarities from known microbe–disease associations, and (ii) those that use additional data.

A method that derives similarity from microbe–disease associations takes the adjacency matrix $A \in \mathbb{R}^{n_d \times n_m}$ as input. It outputs a microbe similarity matrix $S_m \in \mathbb{R}^{n_m \times n_m}$ and a disease similarity matrix $S_d \in \mathbb{R}^{n_d \times n_d}$. The same similarity calculations apply to disease–microbe and drug–microbe data, and they include Gaussian interaction profile (GIP) kernel similarity [43]. The following approaches can also be used:

Cosine similarity: cosine similarity measures the cosine of the angle between two interaction profiles in Euclidean space. Several studies used this approach to obtain microbe and disease similarity matrices [21, 44].

Spearman correlation similarity: Spearman correlation coefficients between pairs of microbes, computed from their values across positions or time points, are used as similarity scores [45].

In a recent study, Wang et al. developed a gene-based disease association approach based on neighbor-dependent similarity estimation. In most studies, after constructing similarity networks for diseases and microbes, researchers have used known microbe–disease associations from databases to build their models [46].

Two researchers proposed a biased bipartite network algorithm to predict the most likely microbe–drug relationships and improve prediction accuracy. Their Heterogeneous Network Embedding Representation framework for Microbe–Drug Association (HNERMDA) combines heterogeneous network embedding via metapath2vec with bipartite network recommendation. To build the heterogeneous network, they used interactions involving microbes and drugs, including known drug–microbe interactions [39].

KATZ measurements

Using known drug–microbe associations, a microbe similarity network is constructed by calculating GIP kernel similarity for microbes. The drug and microbe similarity networks are then combined with the known drug–microbe associations into a heterogeneous drug–microbe network. The HMDAKATZ model is applied to this network to predict drug–microbe associations [40].
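The KATZ measure scores a pair by summing the walks of every length between the two nodes. Each step of a walk is damped by a factor $\beta$, so the score matrix is $S = \sum_{l=1}^{k} \beta^l H^l$ over the heterogeneous adjacency matrix $H$. The following is a minimal sketch. It assumes the heterogeneous matrix is assembled as the block matrix $[[S_r, A], [A^\top, S_m]]$, and it uses $\beta = 0.01$ and $k = 3$ as illustrative defaults. These values are assumptions, not settings taken from HMDAKATZ.

```python
import numpy as np


def katz_scores(assoc, sim_rows, sim_cols, beta=0.01, max_len=3):
    """Truncated KATZ scores between row entities (drugs or diseases) and microbes."""
    n_rows = assoc.shape[0]
    hetero = np.block([[sim_rows, assoc], [assoc.T, sim_cols]]).astype(float)
    total = np.zeros_like(hetero)
    power = np.eye(hetero.shape[0])
    for length in range(1, max_len + 1):
        power = power @ hetero              # entry (i, j) sums weighted walks of this length
        total += beta ** length * power     # longer walks are damped more strongly
    return total[:n_rows, n_rows:]          # block linking row entities to microbes


A = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)          # 2 drugs x 3 microbes
S_drug = np.array([[1.0, 0.5], [0.5, 1.0]])
S_microbe = np.array([[1.0, 0.2, 0.0], [0.2, 1.0, 0.0], [0.0, 0.0, 1.0]])
scores = katz_scores(A, S_drug, S_microbe)
print(np.round(scores, 5))
```

Microbe 3 has no associations and no similar microbes, so no walk reaches it and its scores remain zero.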

Multi-modal variational graph embedding

Graph2MDA, a multi-modal variational graph embedding model for predicting microbe–drug associations, is a recent method that uses a variational graph autoencoder (VGAE) to predict microbe–drug interactions. It builds multi-modal attributed graphs from the molecular structures of drugs, the genetic sequences of microbes, and the function annotations of microbes and drugs. A deep neural network classifier then predicts microbe–drug relationships [47]. Figure 2 shows the architecture for predicting microbe–drug relationships using a convolutional neural network model.

Fig. 2 The architecture of predicting microbe–drug relationships using a convolutional neural network model

Recruited datasets and approaches for prediction of microbe–drug associations

Previous studies on the microbe–drug relationship have used a variety of data. Table 1 lists the data used to predict microbe–drug associations in the studies we reviewed.

Table 1 A list of all the data used in microbe–drug association prediction

In addition, different approaches for predicting microbe–drug relationships are summarized in Table 2.

Table 2 Different methods to predict microbe–drug associations

Comparison and application of models

Because predicting microbe–drug interactions is a new field of study, few computational approaches have been proposed for this critical task. Various link prediction approaches from bioinformatics and existing techniques for microbe–drug interaction prediction have been compared [38, 48–50]. Graph2MDA achieved the highest AUC, followed by LAGCN, while NTSHMDA had the lowest. Deep learning-based methods frequently outperform more traditional machine learning-based ones. The most effective method has several advantages over the other models. It uses multimodal feature graphs built from ontological information, multiple similarities between microbes and drugs, and their known relationships, so it can fully exploit many different kinds of features and links. In addition, incorporating topological structure into the multimodal feature networks lessens the impact of the cold-start problem and can potentially mitigate the effect of noise in the similarity data [38].

Microbe–disease association

According to recent research [51], microbes are increasingly being linked to human diseases. Research on disease-related microbes aims to understand disease processes and to develop novel diagnostic and therapeutic methods. Many theoretical models for predicting disease-related microbes have been proposed. The rest of this section reviews studies in the literature on predicting microbe–disease relationships.

Path-based methods

Path-based approaches examine indirect paths across networks and often use the weight of a candidate path as the score of an unknown association. For example, a weighted meta-graph-based model on a heterogeneous information network (WMGHMDA) has been presented to predict relationships between diseases and microbes. A meta-graph search algorithm is run on the heterogeneous network to count the weighted meta-graph patterns of each disease–microbe pair. Summing the contribution values of the related weighted meta-graphs then yields the likelihood score for each disease–microbe pair [52].

BWNMHMDA (Bidirectional Weighted Network model for Human Microbe–disease Association prediction) predicts microbe–disease associations based on a bidirectional weighted network. The main idea is to build a bidirectional disease–microbe association network and convert it into matrices from which association probabilities are computed. To do this, weights are assigned to the nodes and edges of the integrated network using Gaussian interaction profile kernel similarity [53].

PBHMDA (Path-Based Human Microbe–disease Association prediction) is a path-based model for inferring potential microbe–disease associations. It relies on Gaussian interaction profile kernel similarity for both diseases and microbes. A special depth-first search algorithm was designed in the model to ensure that no node is visited twice in a path [54].
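The depth-first search idea can be illustrated with a small sketch. It enumerates the simple paths (paths with no repeated nodes) of at most three edges between a disease and a microbe, and it sums the products of the edge weights along each path. This scoring rule, the dictionary-based graph, and the unit weight on known associations are simplifying assumptions. PBHMDA's exact scoring function is not reproduced here.

```python
def path_score(graph, source, target, max_len=3):
    """Sum of edge-weight products over simple source->target paths with <= max_len edges."""
    total = 0.0

    def dfs(node, visited, weight, depth):
        nonlocal total
        if node == target:          # path complete: add its weight and stop extending it
            total += weight
            return
        if depth == max_len:        # longer paths are not considered
            return
        for neighbor, edge_weight in graph.get(node, {}).items():
            if neighbor not in visited:   # never revisit a node on the current path
                dfs(neighbor, visited | {neighbor}, weight * edge_weight, depth + 1)

    dfs(source, {source}, 1.0, 0)
    return total


# Undirected toy network: disease-disease and microbe-microbe edges carry similarity
# weights; known disease-microbe associations carry weight 1.0 (an assumption).
graph = {
    "d1": {"d2": 0.6, "m1": 1.0},
    "d2": {"d1": 0.6, "m2": 1.0},
    "m1": {"d1": 1.0, "m2": 0.5},
    "m2": {"d2": 1.0, "m1": 0.5},
}
print(path_score(graph, "d1", "m2"))  # paths d1-d2-m2 (0.6) and d1-m1-m2 (0.5)
```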

The KATZ measure was used to predict Human Microbe–disease Associations (KATZHMDA). The researchers combined the number of walks between nodes and their lengths into an index of the likelihood of interaction between microbes and diseases. The model operates on a graph built from the known microbe–disease association network, the microbe similarity network, and the disease similarity network [48].

Fan et al. developed a method for predicting disease–microbe associations by integrating multiple data sources and path-based HeteSim scores, called Multiple Data sources and Path-based HeteSim scores for Human Microbe–disease Associations (MDPH_HMDA). Microbe similarity was calculated by combining microbial functional scores with Gaussian interaction profile kernel similarity, and disease similarity was calculated from symptom-based similarity scores. The HeteSim measure was then used to obtain a normalized relevance score for each disease–microbe pair [55].

Random walk methods

Random walk methods iterate a walk over a graph using a transition probability matrix. Niu et al. built a higher-order hypergraph model to accurately capture the intrinsic associations between microbes and human diseases. From it they developed a random walk on hypergraph model for microbe–disease association prediction (RWHMDA), which ranks all candidate microbes for each investigated human disease. Hypergraphs can mitigate the information loss that occurs with ordinary graph representations. The hypergraph integrated Gaussian interaction profile kernel similarity and known microbe–disease associations from the HMDAD database, and a random walk was then performed on it [56].

A heterogeneous network was built by combining the Gaussian interaction profile microbe similarity network, the Gaussian interaction profile disease similarity network, and the network of known microbe–disease associations. A new method for predicting potential microbe–disease associations, based on an extended, optimized random walk that introduces network topological similarity (NTSHMDA), was then proposed [49].

Zou et al. combined a microbe similarity network and a disease similarity network into a heterogeneous network. Both similarity networks were derived from Gaussian interaction profile kernel similarity with a logistic transformation. They then ran a bi-random walk algorithm on this network, yielding a computational model for predicting potential human microbe–disease associations called BiRWHMDA [57].
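A bi-random walk alternates walks on the disease-side and microbe-side similarity networks, each time restarting toward the known associations. The sketch below is a simplified version. Several details are illustrative assumptions: the symmetric normalization of the similarity matrices, the single restart weight `alpha`, the equal averaging of the two walk directions, and the step counts. The logistic transformation used by BiRWHMDA is omitted.

```python
import numpy as np


def _sym_normalize(sim):
    """D^-1/2 S D^-1/2 (assumes every node has positive degree)."""
    d_inv_sqrt = 1.0 / np.sqrt(sim.sum(axis=1))
    return sim * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def bi_random_walk(assoc, sim_d, sim_m, alpha=0.8, left_steps=4, right_steps=4):
    """Score matrix (diseases x microbes) from walks on both similarity networks."""
    w_d, w_m = _sym_normalize(sim_d), _sym_normalize(sim_m)
    r0 = assoc / assoc.sum()                 # initial probabilities from known associations
    scores = r0.copy()
    for step in range(1, max(left_steps, right_steps) + 1):
        walks = []
        if step <= left_steps:               # one more step on the disease network
            walks.append(alpha * w_d @ scores + (1 - alpha) * r0)
        if step <= right_steps:              # one more step on the microbe network
            walks.append(alpha * scores @ w_m + (1 - alpha) * r0)
        scores = sum(walks) / len(walks)
    return scores


A = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)   # 3 diseases x 3 microbes
S_d = np.array([[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]])
S_m = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.0], [0.1, 0.0, 1.0]])
R = bi_random_walk(A, S_d, S_m)
print(np.round(R, 4))
```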

Zhang et al. proposed the bi-direction similarity integration label propagation (BDSILP) method for predicting microbe–disease associations. They calculated disease semantic similarity using MeSH, as well as microbe functional similarity. From the integrated disease similarity and the integrated microbe similarity they built two graphs, and BDSILP performs label propagation on each graph to score disease–microbe pairs. The weighted mean of the two scores is taken as the final prediction [58].
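Label propagation on a similarity graph iterates $F \leftarrow \alpha W F + (1-\alpha) Y$, where $W$ is the normalized similarity matrix and $Y$ holds the known associations. The following minimal sketch propagates on both the disease graph and the microbe graph and takes a weighted mean, in the spirit of BDSILP. The symmetric normalization, `alpha`, and the equal weighting are assumptions, not the published settings.

```python
import numpy as np


def label_propagation(sim, labels, alpha=0.5, n_iter=1000, tol=1e-10):
    """Iterate F <- alpha * W F + (1 - alpha) * Y with W = D^-1/2 S D^-1/2."""
    d_inv_sqrt = 1.0 / np.sqrt(sim.sum(axis=1))
    w = sim * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    y = labels.astype(float)
    scores = y.copy()
    for _ in range(n_iter):
        updated = alpha * w @ scores + (1 - alpha) * y
        if np.abs(updated - scores).max() < tol:   # converged
            return updated
        scores = updated
    return scores


def bidirectional_label_propagation(assoc, sim_d, sim_m, weight=0.5, alpha=0.5):
    """Weighted mean of propagation over the disease graph and over the microbe graph."""
    from_diseases = label_propagation(sim_d, assoc, alpha)
    from_microbes = label_propagation(sim_m, assoc.T, alpha).T
    return weight * from_diseases + (1 - weight) * from_microbes


A = np.array([[1, 0], [0, 1], [0, 0]])                          # 3 diseases x 2 microbes
S_d = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]])
S_m = np.array([[1.0, 0.3], [0.3, 1.0]])
print(np.round(bidirectional_label_propagation(A, S_d, S_m), 4))
```

Because the normalized $W$ has spectral radius at most 1 and $\alpha < 1$, the iteration converges to the closed form $(1-\alpha)(I - \alpha W)^{-1} Y$.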

Symptom-based disease similarity is calculated from the co-occurrence of diseases and symptom terms. Microbe GIP kernel similarity is computed from known microbe–disease associations and then adjusted with a logistic function. For diseases, the symptom-based similarity and the GIP kernel similarity computed from known microbe–disease associations are fused with the Similarity Network Fusion (SNF) method. The resulting microbe and disease networks are connected through the known microbe–disease associations. BRWMDA (Bi-random walk for microbe–disease associations) uses this network to predict potential new microbe–disease relationships through random walks with different numbers of steps on the microbe and disease networks [59].

After extracting information about diseases and microbes, Shen et al. built a microbe network using Spearman correlation and a disease network based on symptoms, and combined them into a heterogeneous disease–microbe network. They then applied a random walk with restart (RWR) algorithm to this heterogeneous network to reveal latent relationships between diseases and microbes, using the target disease and its corresponding microbes as seed nodes [60].
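Random walk with restart repeatedly spreads probability along the edges of the network and, with probability $r$, jumps back to the seed nodes: $p_{t+1} = (1-r)\,T p_t + r\,p_0$, where $T$ is the column-normalized adjacency matrix. The following is a minimal sketch. The restart probability, the toy network, and the uniform weighting of the seeds are assumptions.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Stationary RWR probabilities for every node, starting from the given seed nodes."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0             # isolated nodes: avoid division by zero
    transition = adj / col_sums                # column-stochastic transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)         # restart distribution spread over the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical heterogeneous network: nodes 0-1 are diseases, nodes 2-4 are microbes.
adj = np.array([
    [0.0, 0.7, 1.0, 0.0, 0.0],
    [0.7, 0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.4, 0.2],
    [0.0, 1.0, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.0],
])
p = random_walk_with_restart(adj, seeds=[0, 2])    # target disease 0 and its known microbe 2
ranking = [2 + int(i) for i in np.argsort(-p[2:])] # microbe node ids ranked by probability
print(ranking)
```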

Wu et al. proposed PRWHMDA, an extended random walk with restart model optimized by particle swarm optimization (PSO), for predicting human microbe–disease associations. They used cosine similarity to measure the similarity of diseases and of microbes, and combined the resulting networks into a heterogeneous network. They then applied the RWR method to identify strong associations [44].

Wang et al. proposed a computational model based on bidirectional label propagation to predict potential human microbe–disease associations (NBLPIHMDA). Gaussian interaction profile kernel similarity was used to compute the disease similarity matrix and the microbe similarity matrix, which determine the edge weights in the two networks. Bidirectional label propagation was then used to obtain the disease–microbe association score matrix [61].

Microbe networks, disease networks, and microbe–disease networks were created from known connections in databases. A heterogeneous network was then constructed from the known microbe–disease associations, the microbe network, and the disease network. On this interconnected network, Wang et al. predicted novel microbe–disease associations with a method called the double-ended random walk with restart human microbe–disease association model (DRWHMDA) [62].

Bipartite local models

Bipartite local models make predictions independently from the microbe side and from the disease side of a microbe–disease pair, and then combine the two results into a final prediction. The final score matrix combines the probability scores from user-based and item-based collaborative filtering [63].

Zou et al. proposed NGRHMDA, a model for human microbe–disease association prediction that combines a neighborhood-based prediction model with a graph-based recommendation model. The graph-based model applies a two-step diffusion process on the microbe–disease bipartite graph. To account for microbe and disease similarities, two integrated matrices were constructed, based on symptom-based disease similarity and on Gaussian kernel-based microbe similarity [64].
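The two-step diffusion on a bipartite graph can be sketched with the classic mass-diffusion scheme. In the first step, each microbe associated with a disease spreads a unit of resource equally to its associated diseases. In the second step, each disease spreads what it received equally back to its microbes. This particular scheme is an assumed stand-in for the graph-based part of NGRHMDA. NGRHMDA's exact weighting and its neighborhood-based component are not shown.

```python
import numpy as np


def two_step_diffusion(assoc):
    """Mass diffusion on a disease-microbe bipartite graph (assoc: diseases x microbes)."""
    assoc = np.asarray(assoc, dtype=float)
    k_d = assoc.sum(axis=1)                      # degree of each disease
    k_m = assoc.sum(axis=0)                      # degree of each microbe
    k_d = np.where(k_d > 0, k_d, 1.0)            # avoid division by zero
    k_m = np.where(k_m > 0, k_m, 1.0)
    # transfer[i, j]: fraction of microbe j's resource that returns to microbe i
    transfer = (assoc.T / k_d) @ assoc / k_m
    return assoc @ transfer.T                    # row d: diffused scores for disease d


A = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 1]])   # 3 diseases x 4 microbes
scores = two_step_diffusion(A)
print(np.round(scores, 3))
```

Mass diffusion conserves resource, so each disease's row of scores sums to its number of known microbes. Unknown pairs with high scores are the candidate associations.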

GIP kernel similarities for microbes and diseases were extracted from the microbe–disease association network. A cost function was then constructed and minimized to obtain optimal classifiers in the microbe space and in the disease space, which were combined into an integrated classifier. Wang et al. proposed this semi-supervised computational model, Laplacian Regularized Least Squares for Human Microbe–Disease Association (LRLSHMDA), to predict disease–microbe relationships [65].

Bao et al. built a heterogeneous network from the known microbe–disease associations in the HMDAD database together with GIP kernel similarities for diseases and microbes. On this basis they proposed the Network Consistency Projection for Human Microbe–disease Association prediction model (NCPHMDA) to discover potential disease–microbe associations [66].

Li et al. designed the KATZBNRA model, which, like KATZHMDA, uses the KATZ measure and GIP kernel similarities for diseases and microbes computed from known associations. In addition, they used a Bipartite Network Recommendation (BNR) algorithm to improve prediction accuracy beyond that of KATZHMDA [67].
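The Laplacian regularized least squares step described for LRLSHMDA has a closed-form solution in each space: $F_d = S_d (S_d + \beta_d L_d S_d)^{-1} A$, and analogously for microbes. The two predictions are then combined. The following is a minimal sketch. It assumes normalized graph Laplacians, illustrative $\beta$ values, and an equal-weight average of the two spaces. It is not the published LRLSHMDA code.

```python
import numpy as np


def normalized_laplacian(sim):
    """L = D^-1/2 (D - S) D^-1/2 for a similarity matrix S with positive degrees."""
    deg = sim.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ (np.diag(deg) - sim) @ d_inv_sqrt


def laprls_scores(assoc, sim_d, sim_m, beta_d=0.3, beta_m=0.3):
    """Average of the disease-space and microbe-space LapRLS predictions (diseases x microbes)."""
    lap_d = normalized_laplacian(sim_d)
    lap_m = normalized_laplacian(sim_m)
    # S (S + beta L S)^-1 Y, computed with a linear solve instead of an explicit inverse
    f_d = sim_d @ np.linalg.solve(sim_d + beta_d * lap_d @ sim_d, assoc)
    f_m = sim_m @ np.linalg.solve(sim_m + beta_m * lap_m @ sim_m, assoc.T)
    return (f_d + f_m.T) / 2.0


A = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)               # 3 diseases x 2 microbes
S_d = np.array([[1.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 1.0]])
S_m = np.array([[1.0, 0.4], [0.4, 1.0]])
print(np.round(laprls_scores(A, S_d, S_m), 3))
```

With $\beta = 0$ both spaces return $A$ unchanged. Increasing $\beta$ smooths the predictions toward similar diseases and microbes.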

Matrix factorization methods

Matrix factorization assumes that the input matrix can be decomposed into two low-dimensional matrices whose product approximates it [68, 69]. Wu et al. derived disease features by combining Gaussian kernel-based and symptom-based similarities, and computed microbe features from Gaussian kernel similarity. They presented a matrix completion-based computational model for predicting human microbe–disease associations (MHMDA) [70]. Chen et al. introduced a method for predicting microbe–disease associations based on Kernelized Bayesian Matrix Factorization (KBMF), which relies on Gaussian interaction profile kernel similarity for microbes and diseases [71]. To compute microbe and disease similarities, Liu et al. used GIP kernel similarity and applied a logistic function to adjust the disease similarity. Based on known microbe–disease associations, they proposed a graph regularized non-negative matrix factorization model for predicting microbe–disease associations (NMFMDA) [72].
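A minimal sketch of the underlying idea factors the non-negative association matrix as $A \approx WH$ using the standard multiplicative update rules for the Frobenius loss, then reads candidate association scores from $WH$. The rank, the iteration count, and the random initialization are assumptions. The graph regularization and similarity integration used by NMFMDA, GRNMFHMDA and related models are omitted.

```python
import numpy as np


def nmf(matrix, rank, n_iter=500, seed=0, eps=1e-10):
    """Non-negative factorization matrix ~= W @ H via multiplicative updates."""
    x = np.asarray(matrix, dtype=float)
    rng = np.random.default_rng(seed)
    w = rng.random((x.shape[0], rank))
    h = rng.random((rank, x.shape[1]))
    for _ in range(n_iter):
        h *= (w.T @ x) / (w.T @ w @ h + eps)     # update H with W fixed
        w *= (x @ h.T) / (w @ h @ h.T + eps)     # update W with H fixed
    return w, h


A = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]])
W, H = nmf(A, rank=2)
scores = W @ H            # unknown pairs (zeros in A) are ranked by their reconstructed value
print(np.round(scores, 2))
```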

By combining known disease–microbe associations with GIP kernel similarity, Shen et al. proposed the Collaborative Matrix Factorization for Human Microbe–disease Association Prediction (CMFHMDA) model [73].

He et al. developed a predictive model based on graph regularized non-negative matrix factorization for human microbe–disease association prediction (GRNMFHMDA). Microbe and disease similarities were first calculated using symptom-based disease similarity and Gaussian interaction profile kernel similarity for microbes and diseases. To avoid a negative effect on prediction results, a preprocessing step assigned association probability scores to unknown microbe–disease pairs. Finally, graph regularized non-negative matrix factorization was used to identify potential associations for all diseases simultaneously [74].

Qu et al. introduced a statistical model of matrix decomposition and label propagation for Human Microbe–disease Association prediction (MDLPHMDA). It integrates verified microbe–disease associations from the HMDAD database, disease symptom similarity, and Gaussian interaction profile kernel similarity for microbes and diseases. A sparse learning method (SLM) was applied to the original HMDAD association data to build a new microbe–disease adjacency matrix. Potential microbe–disease associations were then predicted with the label propagation algorithm (LPA) [75].

Liu et al. proposed a Deep Matrix Factorization Prediction model (DMFMDA) for microbe–disease associations that does not require microbe or disease similarity networks. It is based on deep neural networks and combines the linear modeling advantages of matrix factorization with the non-linear modeling advantages of a multi-layer perceptron [76].

Network-based methods

Graph attention networks

Long et al. presented GATMDA, a graph attention network-based model for microbe–disease association prediction on a bipartite network that incorporates inductive matrix completion (IMC). They used microbe functional similarity, disease functional similarity, and Gaussian kernel similarity to obtain comprehensive features for microbes and diseases. A graph attention network (GAT) with talking-heads attention then learns node representations, which helps produce more informative representations [77].
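The attention step that GAT-based models build on can be sketched for a single head. Each node scores its neighbors with a shared attention vector, normalizes the scores with a softmax over its neighborhood, and aggregates the projected neighbor features. This is the standard single-head formulation with the output nonlinearity omitted. The toy graph, the random parameters, and the names are assumptions. GATMDA's talking-heads attention and inductive matrix completion are not shown.

```python
import numpy as np


def gat_layer(adj, features, weight, attn, slope=0.2):
    """Single-head graph attention: returns aggregated features and attention coefficients."""
    z = features @ weight                        # projected features W h_i, shape (n, f)
    f = z.shape[1]
    # e_ij = LeakyReLU(a^T [W h_i || W h_j]) split into source and neighbor parts
    logits = (z @ attn[:f])[:, None] + (z @ attn[f:])[None, :]
    logits = np.where(logits > 0, logits, slope * logits)
    mask = (adj + np.eye(adj.shape[0])) > 0      # attend to neighbors and to self
    logits = np.where(mask, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    coeffs = np.exp(logits)                      # exp(-inf) = 0 for non-neighbors
    coeffs = coeffs / coeffs.sum(axis=1, keepdims=True)   # softmax over each neighborhood
    return coeffs @ z, coeffs


# Hypothetical bipartite graph: nodes 0-1 are microbes, nodes 2-3 are diseases.
adj = np.array([[0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]], dtype=float)
rng = np.random.default_rng(1)
out, coeffs = gat_layer(adj, rng.random((4, 3)), rng.random((3, 2)), rng.random(4))
print(np.round(coeffs, 3))
```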

Liu et al. proposed MGATMDA, a multi-component graph attention network-based system for predicting microbe–disease associations. Using a node-level attention mechanism, a decomposer first decomposes the edges of a bipartite network to discover latent components. A combiner then uses component-level attention to recombine these latent components into a unified embedding for prediction. Finally, a fully connected network predicts known and unknown associations between microbes and diseases [78].

Models based on neural networks

Ma et al. developed NinimHMDA, a method based on neural integration of neighborhood information in a multiplex heterogeneous network (MHEN), for predicting different types of human microbe–disease associations [79]. It draws on microbial taxonomic similarity, microbe interaction profile similarity, disease interaction profile similarity, disease semantic similarity, disease symptom similarity, and known disease–microbe associations.

Li et al. proposed a back-propagation neural network model for predicting microbe–disease associations (BPNNHMDA). The model's input is the matrix of known microbe–disease associations, and its output is a matrix of potential microbe–disease association probabilities. An activation function based on the hyperbolic tangent is used in the hidden and output layers. Gaussian interaction profile kernel similarity for microbes is used to optimize the initial connection weights and increase training speed [80].

Network consistency projection and multi-data integration

Fan et al. combined the similarity matrices constructed for microbes and for diseases using a linear network integration method, obtaining integrated similarity matrices to which network consistency projection was applied. Disease–microbe associations were detected by computing network consistency projection scores and ranking the candidate pairs by these scores. The resulting network-based method, Human Microbe–Disease Associations Prediction (HMDA-Pred), integrates multiple similarity networks through linear network integration and predicts disease-related microbes with the Network Consistency Projection (NCP) algorithm [81].
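Network consistency projection scores a disease–microbe pair with two projections. The disease's similarity vector is projected onto the microbe's association profile, and the microbe's similarity vector is projected onto the disease's association profile. The sum is then normalized by the lengths of the two similarity vectors. The following is a minimal sketch of this commonly used NCP formulation. It assumes the integrated similarity matrices are already available and omits HMDA-Pred's linear network integration step.

```python
import numpy as np


def ncp_scores(assoc, sim_d, sim_m):
    """Network consistency projection scores for a diseases x microbes association matrix."""
    assoc = np.asarray(assoc, dtype=float)
    col_norms = np.linalg.norm(assoc, axis=0)          # |A_j|: length of microbe j's profile
    row_norms = np.linalg.norm(assoc, axis=1)          # |A_i|: length of disease i's profile
    col_norms = np.where(col_norms > 0, col_norms, 1.0)  # zero profiles give zero projections
    row_norms = np.where(row_norms > 0, row_norms, 1.0)
    ncp_d = (sim_d @ assoc) / col_norms[None, :]       # disease-space projection
    ncp_m = (assoc @ sim_m) / row_norms[:, None]       # microbe-space projection
    denom = np.linalg.norm(sim_d, axis=1)[:, None] + np.linalg.norm(sim_m, axis=0)[None, :]
    return (ncp_d + ncp_m) / denom


A = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 0]])
S_d = np.array([[1.0, 0.6, 0.0], [0.6, 1.0, 0.0], [0.0, 0.0, 1.0]])
S_m = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]])
scores = ncp_scores(A, S_d, S_m)
print(np.round(scores, 3))
```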

Link propagation based on node information

Peng et al. proposed LPHMDA, a computational model of node information-based link propagation for human microbe–disease association prediction, to prioritize disease-associated microbes. Microbe similarity was computed as Gaussian interaction profile kernel similarity from known microbe–disease associations, and a disease similarity matrix was formed by incorporating disease symptom information [82].

Machine learning-based methods

Xu et al. proposed MDAKRLS, a machine learning method based on Kronecker regularized least squares, to identify potential microbe–disease associations. To measure microbe and disease similarities, they introduced Hamming interaction profile similarity. Based on known associations, they then constructed two types of Kronecker similarities for microbe–disease pairs, including one derived from the Hamming interaction profile similarities. To obtain prediction scores, they designed Kronecker regularized least squares models with the different Kronecker similarities, and they obtained the final prediction by integrating the contributions of the different similarities [83]. The architecture for predicting microbe–disease relationships is shown in Fig. 3.
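Kronecker regularized least squares treats each disease–microbe pair as a sample whose kernel is the product of a disease kernel and a microbe kernel. This makes the closed-form solution cheap to compute via eigendecompositions of the two small kernels. The following minimal sketch uses Hamming interaction profile similarity as the kernel on both sides and assumes a regularization parameter `sigma = 1`. MDAKRLS's other Kronecker similarity and its integration of the resulting predictions are not reproduced. Averaging the score matrices from different kernels would be one simple option.

```python
import numpy as np


def hamming_similarity(profiles):
    """Fraction of positions at which two binary interaction profiles agree."""
    x = np.asarray(profiles, dtype=float)
    matches = x @ x.T + (1 - x) @ (1 - x).T      # positions where both are 1 or both are 0
    return matches / x.shape[1]


def kron_rls(assoc, kernel_d, kernel_m, sigma=1.0):
    """KronRLS predictions F = U_d [lam/(lam + sigma) * (U_d^T Y U_m)] U_m^T."""
    lam_d, u_d = np.linalg.eigh(kernel_d)
    lam_m, u_m = np.linalg.eigh(kernel_m)
    lam = np.outer(lam_d, lam_m)                 # eigenvalues of the Kronecker product kernel
    shrink = lam / (lam + sigma)                 # spectral filter of the ridge solution
    return u_d @ (shrink * (u_d.T @ assoc @ u_m)) @ u_m.T


A = np.array([[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)  # 3 diseases x 4 microbes
K_d = hamming_similarity(A)          # disease profiles are the rows of A
K_m = hamming_similarity(A.T)        # microbe profiles are the columns of A
scores = kron_rls(A, K_d, K_m)
print(np.round(scores, 3))
```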

Fig. 3 The overall architecture of predicting microbe–disease relationship

Other methods

Some methods in the literature do not fit into any of the groups above, and they are discussed in this section.

Microbe similarity was calculated as Gaussian Interaction Profile (GIP) kernel similarity based on known microbe–disease associations. Disease similarity was calculated as the mean of GIP similarity, symptom-based similarity, and disease functional similarity. A matrix completion method based on the Singular Value Thresholding (SVT) algorithm was then used to compute scores for unknown microbe–disease pairs. The resulting low-rank matrix completion model is called MCHMDA [84].

Shi et al. proposed BMCMDA, a predictive method based on Binary Matrix Completion, to predict possible microbe–noninfectious disease associations (MDAs). The method arranges the known microbe–disease associations into a binary microbe–disease association matrix. It assumes that the observed, incomplete association matrix is the sum of a latent parameterizing matrix and a noise matrix. It also uses a binomial model for the observations, which are assumed to be sampled independently from the microbe–disease association matrix [85].
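The singular value thresholding iteration used in MCHMDA alternates two operations: a shrinkage of the singular values and a gradient step on the observed entries. The following is a minimal sketch. It assumes a boolean mask marks which entries are treated as observed. It uses the heuristic defaults $\tau = 5\sqrt{n_1 n_2}$ and $\delta = 1.2\, n_1 n_2 / |\Omega|$, which are assumptions here and not MCHMDA's settings.

```python
import numpy as np


def svt_complete(matrix, observed, tau=None, step=None, tol=1e-4, max_iter=500):
    """Singular value thresholding for matrix completion from the entries where observed is True."""
    m = np.asarray(matrix, dtype=float)
    n1, n2 = m.shape
    tau = 5.0 * np.sqrt(n1 * n2) if tau is None else tau
    step = 1.2 * n1 * n2 / observed.sum() if step is None else step
    y = np.zeros_like(m)
    x = y
    obs_norm = np.linalg.norm(np.where(observed, m, 0.0))
    for _ in range(max_iter):
        u, s, vt = np.linalg.svd(y, full_matrices=False)
        x = (u * np.maximum(s - tau, 0.0)) @ vt            # shrink singular values by tau
        residual = np.where(observed, m - x, 0.0)          # error on observed entries only
        if np.linalg.norm(residual) <= tol * obs_norm:
            break
        y = y + step * residual
    return x


M = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])     # a rank-1 toy matrix
mask = np.ones_like(M, dtype=bool)
mask[2, 0] = False                                  # pretend this entry is unknown
completed = svt_complete(M, mask)
print(np.round(completed, 2))
```

In an association setting, the completed value at an unobserved position serves as the predicted score for that microbe–disease pair.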

The adaptive boosting model for human microbe–disease association prediction (ABHMDA) was developed to explore relationships between diseases and microbes. Because sufficient information is lacking, the combination of microbe GIP kernel similarity and symptom-based disease similarity is used as the feature vector of each sample. Unknown associations were used as negative examples alongside the known positive examples, keeping the two classes balanced during decision tree training [86].

Lei et al. proposed LGRSH, a microbe–disease association prediction model based on graph representation learning and a modified scoring mechanism on a heterogeneous network. The heterogeneous network was formed by combining microbe similarity networks, disease similarity networks, and known microbe–disease associations [87].

Recruited datasets and approaches for prediction of microbe–disease associations

Previous studies on the microbe–disease relationship have used a variety of data sources. Table 3 summarizes the datasets used to predict microbe–disease associations in the studies we reviewed.

Table 3 List of all the data that was utilized in the microbe–disease prediction

In addition, different approaches for predicting the relationship between microbes and diseases are summarized in Table 4.

Table 4 Various approaches for predicting the relationship between microbes and diseases

Advantages and disadvantages

The KATZ measure can reconstruct probable links across a large network simultaneously, but the computation of GIP kernel similarity always introduces a bias towards the known relationships. Label propagation and random walk algorithms are effective and simple to use, but most prediction techniques built on them capture less detailed information. For network embedding-based methods, training the embeddings becomes more challenging as more data are added to the network. Weighted network-based and HeteSim-based methods are good at capturing subtle potential semantic associations, but they cannot make predictions for a microbe (drug, disease) that has no known associations. Matrix factorization-based methods can mine deeper potential connections and have relatively low space complexity because the low-dimensional factors save storage, but selecting the optimal parameters is more challenging. GCN extends the translation invariance of convolution to non-grid, graph-structured data, but it has poor flexibility and scalability. GAT can effectively enhance aggregation in graph neural networks, but it has difficulty aggregating higher-order neighbors. Finally, pooling layers can lose much valuable information and ignore the correlation between local and global structure.

Challenges and prospects

Based on the existing studies, some suggestions for further improving predictive performance are provided below.

Integrating multiple types of data for a single task

In this review, we briefly summarized the advanced and widely used datasets and computational methods for the microbe–disease and microbe–drug prediction problems. The most basic idea for improving prediction performance is to combine all of these commonly used databases to address any single problem, because they are all closely related. In addition, other types of data have been introduced, for example chemical structure-based and phenotype-based data, which are widely used in predictions [88,89,90], as well as symptom-based disease similarity and disease semantic similarity [48, 55]. However, rationally integrating different types of bioinformatics data for a specific prediction task in a way that improves model performance remains a challenge.

Introducing new mechanisms

Most currently available computational methods improve their performance by incorporating more entity similarities than previous algorithms. In addition to this strategy, many other approaches, such as heterogeneous graph neural network (GCN) and attention mechanisms [91,92,
