MD-GNN: A mechanism-data-driven graph neural network for molecular properties prediction and new material discovery

Molecular materials are widely used in medicine and healthcare, food, the daily chemical industry, and other fields, so accelerating the discovery of new molecular materials is important for the development of science and society [1]. At present, research on molecular materials is very time-consuming: determining a target property and optimizing the synthesis conditions of a molecule both require a great deal of effort. Theoretical high-throughput computation [2] is commonly used to predict the properties of molecules. Such mechanism-driven calculation models are interpretable and can effectively accelerate the discovery of new materials. However, a mechanism-driven calculation model is a theoretical model with simplified parameters. It ignores factors such as material defects, the real environment, facilities, and the skills of researchers, and these factors may make its predictions inaccurate. Recently, big-data-driven artificial intelligence methods have been widely used in computer vision [3], natural language processing (NLP) [4], medical science [5], social science [6], and transportation [7]. Owing to the strong non-linear modeling ability of these methods and the availability of large molecular datasets, the prediction of material properties with machine learning and deep learning has received significant attention from researchers. At present, artificial intelligence methods in materials research fall into two main categories. One is descriptor-based machine learning prediction [1], [8], [9], [10], [11], [12], [13], [14], which requires finding descriptors that correlate strongly with the target properties. The other is the graph neural network-based end-to-end deep learning model [21], [22], [23], [24], [25], [26], [27], a type of neural network that takes the molecular graph structure as input and extracts abstract information from it to map to the target properties.
However, graph neural networks share a weakness with other machine learning methods: they generalize poorly and tend to be reliable only within the range of the training data. In new material discovery especially, the predictions of deep learning methods may be far off. In addition, most graph neural networks are end-to-end models that take graphs as input [21], [22], [23], [24], [25], [26] without making full use of multi-modal data. When a real molecule is abstracted as a graph structure, part of its three-dimensional structural information and extranuclear electronic information is lost, which leads to inaccurate predictions. To address these problems, we propose a Mechanism-Data-Driven Graph Neural Network (MD-GNN) as the main network to improve the accuracy of molecular property prediction and to accelerate the discovery of new materials. The contributions are as follows.

1. We propose a general mechanism-data-driven framework, MD-GNN, as the main network for molecular property prediction and new material discovery, with high interpretability, generality, and accuracy.

2. Attention-based message passing layers are proposed to extract information from the molecular structure, and feature fusion layers are proposed to fuse the extracted information with numerical features to improve prediction accuracy.

3. A correction block is constructed in which the mechanism-driven model is fused with the data-driven model by modulating the output, integrating calculation results with experimental data. This makes molecular property prediction more interpretable and improves the generalization of the deep learning model.
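
The three components listed above can be pictured with a toy sketch. The text here does not give layer definitions, so everything below is an assumption made for illustration and should not be read as the MD-GNN architecture itself: dot-product attention over neighbors, mean pooling as the readout, concatenation as the fusion step, and a scale-and-shift form for the correction block. All function and parameter names (`attention_message_pass`, `fuse`, `corrected_prediction`, `scale_w`, `shift_w`) are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_message_pass(node_feats, neighbors):
    """One synchronous round of attention-weighted neighbor aggregation.

    Assumption: attention scores are plain dot products between atom vectors.
    """
    updated = []
    for i, h_i in enumerate(node_feats):
        nbrs = neighbors[i]
        if not nbrs:
            updated.append(list(h_i))
            continue
        weights = softmax([dot(h_i, node_feats[j]) for j in nbrs])
        message = [
            sum(w * node_feats[j][k] for w, j in zip(weights, nbrs))
            for k in range(len(h_i))
        ]
        # Residual update: keep the atom's own vector and add the message.
        updated.append([a + b for a, b in zip(h_i, message)])
    return updated


def readout(node_feats):
    """Mean-pool atom vectors into a single molecule vector (assumed readout)."""
    n = len(node_feats)
    return [sum(h[k] for h in node_feats) / n for k in range(len(node_feats[0]))]


def fuse(graph_vec, numeric_feats):
    """Fuse graph-derived and numerical features by concatenation (assumed)."""
    return list(graph_vec) + list(numeric_feats)


def corrected_prediction(mechanism_value, fused, scale_w, shift_w):
    """Modulate a mechanism-driven value with data-driven scale and shift terms.

    Assumption: with all-zero weights the mechanism-driven value passes through
    unchanged, so the learned part acts only as a correction.
    """
    scale = 1.0 + math.tanh(dot(scale_w, fused))
    shift = dot(shift_w, fused)
    return scale * mechanism_value + shift


# Toy three-atom graph: atom 0 is bonded to atoms 1 and 2.
node_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
neighbors = [[1, 2], [0], [0]]

hidden = attention_message_pass(node_feats, neighbors)
fused = fuse(readout(hidden), [0.5])  # 0.5 stands in for a numerical feature
prediction = corrected_prediction(2.0, fused, [0.0, 0.0, 0.0], [0.1, 0.0, 0.0])
```

In this sketch the mechanism-driven value (here the placeholder `2.0`) stays the backbone of the prediction and the network only adjusts it, which is one plausible reading of "modulating the output"; a trained model would learn `scale_w` and `shift_w` from experimental data.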
