Converting OMOP CDM to phenopackets: A model alignment and patient data representation evaluation

Electronic health records (EHR) offer valuable patient data for clinical and translational research, but their heterogeneous and inconsistent nature poses challenges to large-scale computational research [1], [2], [3]. Data integration challenges hinder the interoperability and portability of EHR-based methods across institutional settings or when integrating multiple modalities [4], [5]. More limited still is the ability to integrate external biomedical knowledge with EHR data to derive disease- and patient-level insights that enable precision medicine [6], [7]. Several efforts have sought to normalize patient data to facilitate efficient sharing and secondary use of EHR data [7], [8]. Common data models (CDMs) do so by defining rules and expectations for how clinical data should be stored and represented. They specify, for example, how related data elements should be organized, which vocabularies are supported for various data types, and how missing data are handled. These design decisions are informed by the purpose of the data model, the types of information it is intended to store, and the analyses it aims to support.

The Observational Medical Outcomes Partnership (OMOP) CDM was designed to enable population-wide, large-scale observational research by mapping various clinical terminologies to standard concepts [1], [9]. While OMOP is a powerful research tool, further integration is needed to make OMOP data available for end-user applications such as real-time clinical decision support, for example by layering an application programming interface over an OMOP database [10]. Additionally, further work is required to bridge OMOP data with external biomedical knowledge sources [6], [7]. Just as harmonizing standardized terminologies has proven valuable in OMOP, computable ontologies have been touted as the missing piece for integrating clinical data with broader biomedical knowledge to enable precision medicine [6]. Integrating external knowledge such as disease etiology or genomics with clinical data can enable computational efforts such as discriminating between phenotypes and disease trajectories [6], [7], [11].

Phenopackets is an emerging data standard built on ontologies that offers the infrastructure both to integrate external biomedical knowledge and to support real-time integration at the point of care. Phenopackets was designed by the Global Alliance for Genomics and Health (GA4GH) to standardize the representation of a single patient’s phenotypic data, with a particular focus on data sharing and analysis in genomic and rare diseases [12]. Phenopackets provides a portable patient-level representation that can support multimodal data integration, patient similarity analyses, and other biomedical research efforts that deepen our understanding of disease [13], [14]. By connecting to ontologies that organize biomedical knowledge, such a tool links disease knowledge to the patient-level data available in a clinical context to enable precision medicine [6], [13], [15].

While Phenopackets is a promising tool, it has largely been used in purely translational contexts, and its alignment with clinical data warrants further exploration to validate its potential in care [12], [15], [16]. Prior work has explored conversion between OMOP and Phenopackets at the schema level, but the mappings remain incomplete, and that work does not evaluate the gaps between these models or the transformation process when working with real-world data [16]. In particular, an in-depth evaluation of model alignment is necessary to ensure data consistency across different representations. This study aims to provide the first comprehensive gap analysis between two widely used CDMs, OMOP and Phenopackets.

The first goal of this paper is to explore model alignment between the OMOP CDM and Phenopackets. We develop mappings between their respective domains (high-level categories such as conditions and procedures) as well as their fields, which store the relevant values for each domain. This entails identifying which data elements can be mapped between the data models and what is needed to conform to model specifications, such as ensuring compatibility of particular data types or the presence of required fields. Further, in cases where alignment between the models is ambiguous, we incorporate tools provided by the Unified Medical Language System (UMLS) to differentiate between semantic use cases of these data and thereby inform the mapping [17], [18]. The second goal is to apply the developed mappings in a transformation pipeline, using clinical data stored according to the OMOP model, to experimentally evaluate the transformation of real-world patient data to Phenopackets, an expert-knowledge-driven schema. We examine how much information this conversion loses in real-world application. By converting the OMOP-oriented data for Alzheimer’s disease (AD) patients to their Phenopackets representation, we aim to evaluate how Phenopackets as a patient data representation extends to common diseases, as opposed to the special use cases for which it was largely developed (e.g. rare diseases, genetic diseases, or cancers).
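To make the idea of a field-level mapping concrete, the following is a minimal illustrative sketch and not the pipeline evaluated in this study. It assumes OMOP `person`, `condition_occurrence`, and `concept` rows are available as plain dictionaries, maps every condition to a Phenopackets `diseases` entry (sidestepping the disease-versus-phenotype ambiguity discussed above), treats the start date as midnight UTC, and omits the `metaData` block that a valid phenopacket requires. The helper names, the prefix rule, and the sample values are all hypothetical.

```python
from datetime import date

# OMOP standard gender concepts mapped to the Phenopackets Sex enum.
# Anything else falls back to UNKNOWN_SEX (an assumption of this sketch).
OMOP_GENDER_TO_SEX = {8507: "MALE", 8532: "FEMALE"}


def concept_to_ontology_class(concept):
    """Build an OntologyClass-like dict from an OMOP concept row.

    Assumption: the OMOP vocabulary_id is reused directly as the CURIE prefix.
    """
    curie = f"{concept['vocabulary_id']}:{concept['concept_code']}"
    return {"id": curie, "label": concept["concept_name"]}


def condition_to_disease(row, concepts):
    """Map one condition_occurrence row to a Disease-like dict, or None if unmapped."""
    concept = concepts.get(row["condition_concept_id"])
    if concept is None:
        return None  # no usable concept: this row cannot be represented
    disease = {"term": concept_to_ontology_class(concept)}
    start = row.get("condition_start_date")
    if start is not None:
        # Assumption: a date-only value is rendered as midnight UTC.
        disease["onset"] = {"timestamp": f"{start.isoformat()}T00:00:00Z"}
    return disease


def build_phenopacket(person, condition_rows, concepts):
    """Return (phenopacket-like dict, ids of condition rows that were dropped)."""
    diseases, dropped = [], []
    for row in condition_rows:
        disease = condition_to_disease(row, concepts)
        if disease is None:
            dropped.append(row["condition_occurrence_id"])
        else:
            diseases.append(disease)
    packet = {
        "id": f"omop-person-{person['person_id']}",
        "subject": {
            "id": str(person["person_id"]),
            "sex": OMOP_GENDER_TO_SEX.get(person["gender_concept_id"], "UNKNOWN_SEX"),
        },
        "diseases": diseases,
    }
    return packet, dropped


# Illustrative sample data (not taken from the study).
concepts = {
    378419: {
        "concept_name": "Alzheimer's disease",
        "vocabulary_id": "SNOMED",
        "concept_code": "26929004",
    }
}
person = {"person_id": 42, "gender_concept_id": 8532}
condition_rows = [
    {"condition_occurrence_id": 1, "condition_concept_id": 378419,
     "condition_start_date": date(2019, 3, 14)},
    # concept_id 0 marks a source value with no matching standard concept.
    {"condition_occurrence_id": 2, "condition_concept_id": 0,
     "condition_start_date": date(2020, 1, 5)},
]

packet, dropped = build_phenopacket(person, condition_rows, concepts)
```

Returning the dropped row identifiers alongside the packet is one simple way to make information loss countable per patient, in the spirit of the evaluation described above.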
