Graph neural networks for clinical risk prediction based on electronic health records: A survey

Electronic health records (EHRs) are extensive, heterogeneous, and longitudinal repositories that document patients’ health, including symptoms, prescriptions, clinical notes, and medical images. With the increase in EHR data collection, there is growing interest in leveraging this information to improve patient care, especially in the context of clinical risk prediction [1]. Recent machine learning approaches focused on predicting events such as disease diagnoses, mortality, and hospital readmissions have been relevant to this endeavor [2], [3].

Despite the rich information present in EHRs, translating it into actionable insights presents challenges due to data-related problems such as heterogeneity (multiple types of medical attributes describing a patient), high dimensionality (a large number of attributes associated with a patient), quality (missing values and inconsistencies) and temporal dynamics (numerous patient encounters and timestamped clinical events) [1], [4], [5], [6], [7]. Considering that the success of machine learning models depends largely on an adequate representation of the input data, studies in representation learning — the process of learning expressive representations of the input data for improved performance of predictors [8] — are paramount for effectively transforming patient data from the raw EHR format into adequate representations that fully capture their health status [1].

Recent deep learning techniques have effectively addressed these challenges. Unlike traditional machine learning approaches, which rely heavily on expert-driven feature engineering, deep learning models can automatically extract meaningful latent feature representations from complex raw data [9], [10], [11]. Among these, graph neural networks (GNNs) stand out. The goal of graph representation learning is to encode graphs into a low-dimensional vector space while preserving topology and node properties [12]. In this sense, GNNs are particularly adept at representing EHRs because they can capture the intricate relationships and dependencies between medical entities to generate rich, context-aware embeddings for further downstream tasks [13], [14]. This is a promising feature in contrast to other machine learning and deep learning algorithms, which often treat medical concepts as a flat “bag of features”, disregarding structural information and variable interdependencies during model development [15], [16]. Furthermore, GNNs are well suited to the high sparsity and frequent missing values found in EHR data: propagating information through the graph structure densifies sparse representations, and missing features can be inferred from the attributes of neighboring nodes in the medical graph [17], [18], [19].
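To make the propagation idea concrete, the following is a minimal sketch of one round of mean-neighbor aggregation, written with NumPy. It is an illustration under stated assumptions, not the method of any surveyed work: the function name `mean_aggregate`, the four-node toy graph, and the feature values are all hypothetical, and no learned weights or nonlinearities are included.

```python
import numpy as np

def mean_aggregate(adj, features):
    """One round of mean-neighbor aggregation (no learned weights).

    Each node's new feature vector is the average of its own features
    and those of its direct neighbors.
    """
    adj_self = adj + np.eye(adj.shape[0])        # add self-loops
    degree = adj_self.sum(axis=1, keepdims=True)  # neighborhood size per node
    return (adj_self @ features) / degree

# Hypothetical toy graph with 4 nodes (e.g. medical entities).
# Nodes 2 and 3 start with all-zero (unobserved) features.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
features = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
    [0.0, 0.0],
])

round_1 = mean_aggregate(adj, features)  # node 2 receives signal from nodes 0 and 1
round_2 = mean_aggregate(adj, round_1)   # node 3 receives signal via node 2
```

After one round, node 2 has a nonzero representation because it borders nodes 0 and 1; node 3, which only touches node 2, needs a second round before any information reaches it. Practical GNN layers add trainable transformations on top of this aggregation step.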

The strength of GNNs lies in their capability to navigate the intricacies of non-Euclidean spaces [13], [20]. Unlike grid-based data structures such as images, which have inherent locality and consistent neighboring relationships, graphs often lack a natural node ordering, and the spatial proximity of nodes does not determine their relationships, making it more challenging to apply key operations such as convolutions [21], [22]. For example, EHR graphs can represent a dense web of patient histories, diagnoses, treatments, and other clinical outcomes, with heterogeneity in node types, nodes with varied degrees, and edges indicating co-occurrence, causality, hierarchical relations, and other relevant interactions, resulting in complex topological structures (Fig. 1). In this sense, GNNs offer the necessary flexibility to capture and exploit these relationships, yielding thorough representations and, consequently, improved interpretability and efficiency compared to other deep learning models [23].
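The lack of a natural node ordering can be illustrated with a short NumPy sketch. It checks that a simple neighbor-sum aggregation is permutation-equivariant: relabeling the nodes of a graph only reorders the outputs and does not change them. The function `sum_aggregate`, the toy adjacency matrix, and the chosen permutation are hypothetical and serve only as an illustration.

```python
import numpy as np

def sum_aggregate(adj, x):
    """Sum the feature vectors of each node's neighbors."""
    return adj @ x

rng = np.random.default_rng(0)

# Hypothetical undirected graph with 4 nodes and 3 features per node.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
x = rng.normal(size=(4, 3))

# Relabel the nodes with an arbitrary permutation.
perm = np.array([2, 0, 3, 1])
P = np.eye(4)[perm]        # permutation matrix: (P @ x)[i] == x[perm[i]]
adj_perm = P @ adj @ P.T   # same graph, nodes renumbered
x_perm = P @ x

out = sum_aggregate(adj, x)
out_perm = sum_aggregate(adj_perm, x_perm)

# Aggregating on the relabeled graph equals relabeling the original output.
is_equivariant = np.allclose(out_perm, P @ out)
```

A standard image convolution has no such guarantee, because its kernel assumes a fixed spatial arrangement of neighbors; aggregation functions such as sum or mean avoid this dependence on node order.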

Early studies on GNNs for clinical risk prediction aimed to take advantage of hierarchical medical information, structured as ontologies and knowledge graphs, as distant supervision [24]. They introduced label information through structured knowledge graph propagation, learning correlations between medical codes and paralleling them with codes observed in patients to obtain better predictions [24]. This approach enabled more accurate predictions than other deep learning baseline models [25]. Subsequent approaches started to prioritize the learning of novel graph representations based on EHRs rather than the integration of knowledge graphs. Some of these representations include patient similarity, patient-medication interactions, and temporal relations between medical events [26], [27], [28]. Today, given the plethora of multimodal information available in EHRs, heterogeneous graphs have also been used to incorporate clinical notes, disease codes, medical images, and lab results into the learned embeddings, enriching the representations of the data used for critical health predictions [29], [30].

The manifold use of GNNs in EHRs represents a transformative paradigm in the landscape of clinical prediction tasks. GNNs, as powerful tools for modeling complex relationships within graph-structured data, have demonstrated remarkable efficacy in capturing intricate dependencies inherent in healthcare systems and will likely support future disruptive advances in this domain. Notably, recent studies have discussed the applications of deep learning in electronic health records. However, none have explicitly focused on using GNNs for clinical risk prediction based on EHRs. For example, [7] concentrated on temporal patient representation, while [1], [11] focused on general deep learning techniques for EHRs. The most recent studies concerning graph representation have either limited their scope to diagnosis prediction [31] or did not focus on GNNs [32].

Thus, while preceding review papers have explored the broader landscape of deep learning applications in EHRs, a dedicated review addressing the intricacies of utilizing GNNs for clinical risk prediction based on EHRs remains an unexplored niche in the literature. The narrative review presented in this paper aims to bridge this gap by offering a targeted exploration of the advancements, open challenges, and potential future research directions in this specific application domain. Inspired by systematic protocols, the intention is to summarize the current scope and depth of the available literature, while also setting the stage for future systematic reviews as the field grows, providing a fundamental overview essential for advancing research in the area (see Table 1).
