FedFSA: Hybrid and federated framework for functional status ascertainment across institutions

Functional status describes a patient's independence in performing activities of daily living (ADL), including basic ADLs (bADL) such as bathing, dressing, and transferring, and more complex instrumental ADLs (iADL) such as managing finances and meal preparation. Measures of functional status are important because of their previously reported associations with mortality and/or their likely impact on aging-related outcomes in older adults and cancer survivors [1], [2]. For example, Narain et al. have shown that patients with lower functional status have as admitting diagnosis, decreased mental status, and nursing home admission compared to those with higher functional status [3]. Both the National Cancer Institute and the National Institute on Aging have emphasized functional status as a primary clinical measure of aging [4], [5].

Because functional status can be measured both subjectively, through surveys and questionnaire instruments, and objectively, through grip strength, gait speed, chair stands, and balance, the information documented in electronic health records (EHRs) largely remains in semi-structured or free-text format [6]. There is therefore a strong need for computational approaches such as natural language processing (NLP) to facilitate and accelerate the otherwise manual extraction and curation of functional status information. Researchers have developed NLP methods to identify functional status from clinical free text in EHRs. Bales et al. adapted an existing NLP system to enable automated assignment of selected International Classification of Functioning, Disability and Health (ICF) codes through lexicon and coding table modifications [7]. Agaronnik et al. used the ClinicalRegex NLP software to search EHRs for functional status documentation, employing an ontology comprising 5 keyword categories to identify relevant information [8]. More recently, Newman-Griffis et al. employed NLP methods to analyze patient functioning information in clinical documents related to federal disability benefit claims from the U.S. Social Security Administration. The final system achieved robust performance, with F1-scores above 80% on two datasets [9]. However, the tasks formulated in existing studies neither targeted specific individual ADL categories (e.g., bathing, dressing, and transportation) nor indicated the patients' impairment status (impaired vs. non-impaired). Building upon existing efforts, our goal is to expand the scope of the task by identifying both generic ADL concepts and all 15 specific ADL concepts, while predicting the corresponding impairment status. This expansion will be accomplished by leveraging real-world EHR data from four different healthcare institutions.

Identifying patients' impaired functional status across multiple unique categories from different institutions requires NLP solutions to simultaneously address multiple issues, including 1) multitasking, 2) word sense disambiguation, 3) documentation heterogeneity (e.g., different low-frequency concepts across different sites), and 4) instrument-assisted documentation (i-Doc; i.e., language derived from unstructured or semi-structured instruments including templates, questionnaires, assessment forms, and smart forms). Based on the ICF standard, there are a total of 15 unique ADL categories (6 bADLs and 9 iADLs). These categories present heterogeneous semantics and require substantial human effort or training data in order to develop a comprehensive and robust NLP system. In addition, each ADL appears with various synonyms and semantic variations in clinical notes. For example, "bandage dressing" and "dressing supplies" are commonly documented in clinical notes but give no indication of a patient's dressing ability (i.e., the act of a person getting dressed). In our previous study, which compared functional status documentation across three distinct EHR systems, we identified considerable information heterogeneity in the documentation of functional status. This included varying levels of context-specific textual information and documentation patterns across the three institutions [6]. The differences in workflow and in the instruments used to assess and document functional status resulted in diverse textual documentation patterns across institutions, which adds a further challenge to the task.
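To make the word sense disambiguation issue concrete, the following minimal Python sketch shows how a symbolic rule might separate ADL uses of "dressing" from wound-care uses and then assign an impairment status. The cue lists, the function name `classify_dressing`, and the two-way labels are illustrative assumptions only; they are not the lexicon or rules used in this study.

```python
import re

# Hypothetical cue lists for illustration; not the study's actual lexicon.
DRESSING = re.compile(r"\bdressing\b", re.IGNORECASE)
# Wound-care context suggests "dressing" is not an ADL mention.
NON_ADL_CONTEXT = re.compile(
    r"\b(bandage|wound|gauze|sterile|supplies)\b", re.IGNORECASE
)
# Cues suggesting the patient cannot perform the activity independently.
IMPAIRED_CUES = re.compile(
    r"\b(needs? (help|assistance)|unable|dependent|difficulty)\b",
    re.IGNORECASE,
)


def classify_dressing(sentence):
    """Return None when the sentence has no ADL 'dressing' mention,
    otherwise 'impaired' or 'non-impaired'."""
    if not DRESSING.search(sentence):
        return None
    if NON_ADL_CONTEXT.search(sentence):
        return None  # e.g. "bandage dressing", "dressing supplies"
    if IMPAIRED_CUES.search(sentence):
        return "impaired"
    return "non-impaired"


if __name__ == "__main__":
    for text in [
        "Patient needs assistance with dressing and bathing.",
        "Dressing supplies ordered for home care.",
        "Independent with dressing.",
    ]:
        print(text, "->", classify_dressing(text))
```

Hand-written rules of this kind are easy to customize per site, but they cover only the patterns someone anticipated, which is one motivation for pairing them with learned components.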

Our study aims to address these issues through a hybrid and federated NLP framework. A hybrid model is an NLP approach that integrates both rule-based and machine-learning-based approaches into one system. This design offers learning advantages and improves customizability [10]. Federated learning (FL), in turn, is a decentralized approach that enables institutions to collaboratively train a shared machine learning model while keeping data sources local. Beyond its privacy-preserving benefits, the FL approach can potentially enhance data quality and diversity by drawing on data sources from various healthcare institutions, thereby yielding a more robust model that is representative of diverse population characteristics. Our study aims to leverage the advantages of both approaches to accurately detect patients' impaired functional status from multiple EHR systems. Key contributions of the study include:

Proposing a generalizable framework that combines customizable symbolic methods with state-of-the-art pretrained language model components to achieve distributed federation for clinical NLP tasks.

Demonstrating an end-to-end implementation process in real-world multisite settings and highlighting unique pragmatic implications and lessons learned.

Conducting comparative experiments to validate the advantages of FL over non-FL modeling, along with in-depth analyses summarizing lessons applicable to other use cases beyond functional status ascertainment.
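As a rough illustration of the federated idea described above, the sketch below implements sample-weighted parameter averaging (commonly known as federated averaging, or FedAvg) in plain Python. This is an assumption made for illustration: the text here does not state which aggregation scheme the framework uses, and the function name `federated_average`, the parameter vectors, and the sample counts are all hypothetical. Only model parameters and sample counts leave each site; the clinical notes stay local.

```python
def federated_average(site_updates):
    """Combine per-site model parameters into one shared parameter vector.

    site_updates: list of (weights, n_samples) pairs, where weights is a
    list of floats produced by local training at one institution.
    Each site's contribution is weighted by its number of training samples.
    """
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(site_updates[0][0])
    averaged = [0.0] * dim
    for weights, n in site_updates:
        if len(weights) != dim:
            raise ValueError("all sites must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


if __name__ == "__main__":
    # Hypothetical updates from two sites with different data volumes.
    updates = [([1.0, 2.0], 1), ([3.0, 4.0], 3)]
    print(federated_average(updates))
```

In a full training loop, the averaged parameters would be sent back to every site for another round of local training, and the cycle would repeat until the shared model converges.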
