A Roadmap for Using Causal Inference and Machine Learning to Personalize Asthma Medication Selection


Introduction

Asthma is a chronic disease characterized by inflammation, narrowing, and hyperactivity of the airways causing shortness of breath, chest tightness, coughing, and wheezing []. Asthma affects about 25 million people in the United States []. In 2021, there were 9.8 million exacerbations of asthma symptoms (or asthma attacks) leading to over 980,000 emergency room (ER) visits and over 94,500 hospitalizations []. Asthma costs the US economy over US $80 billion in health care expenses each year, work and school absenteeism, and deaths [].

Inhaled corticosteroid (ICS) is a mainstay treatment for controlling asthma and preventing exacerbations in patients with persistent asthma [] accounting for over 60% of people with asthma [,]. Many types of ICS drugs are used, either alone like fluticasone (Flovent, Arnuity, and Aller-flo), budesonide (Pulmicort, Entocort, and Rhinocort), mometasone (Asmanex), beclomethasone (Beclovent, Qvar, Vancenase, Beconase, Vanceril, and Qnasl), ciclesonide (Alvesco), and so forth, or in combination with a long-acting beta2 agonist like fluticasone/salmeterol (Advair), budesonide/formoterol (Symbicort), mometasone/formoterol (Dulera), and fluticasone/vilanterol (Breo), and so forth []. Regular use of appropriate ICSs improves asthma control and reduces airway inflammation, symptoms, exacerbations, ER visits, and inpatient stays [-].

Despite the widespread use of ICSs, asthma control remains suboptimal in many people with asthma [-] including 44% of children and 60% of adults based on asthma exacerbations in the past year [,], 72% of patients based on asthma control test [], 53% of children and 44% of adults based on asthma attacks in the past year [], and 59% of children based on the 2007-2013 Medical Expenditure Panel Survey []. Suboptimal control leads to recurrent exacerbations, causes frequent ER visits and inpatient stays, and is projected to have an economic burden of US $963.5 billion over the next 20 years []. Suboptimal control is due to multiple factors [-] including (1) failure to recognize and act on early signs of declining control [,], (2) lack of self-management skills, (3) nonadherence to therapy [], and (4) inappropriate ICS choice for the patient [-]. While interventions targeting other factors exist, less attention has been given to inappropriate ICS choice.

Asthma is heterogeneous with variable profiles in terms of clinical presentations (phenotypes) and underlying mechanisms (endotypes) [,]. Molecular techniques have revealed a few phenotype and endotype relationships, allowing the categorization of asthma into two main groups (1) T-helper type 2 (Th2)-high (eg, atopic and late onset) and (2) Th2-low (eg, nonatopic, smoking-related, and obesity-related) [,]. It is known that within the 2 groups, there are many subgroups [,] with different biomarker expressions (eg, immunoglobulin E [IgE], fractional exhaled nitric oxide [FeNO], interleukin [IL]-4, IL-5, and IL-13) []. So far, only a few biomarkers have been characterized for use in clinical practice. Despite a few successes using biomarkers for targeted therapy, ICS choice, especially in the primary care setting, is largely by trial and error and many patients remain uncontrolled [-].

Besides patient nonadherence and environmental factors, response to ICS treatment is affected by genetic variations in ICS metabolizing enzymes [,], regardless of whether the ICS is used alone or is combined with another asthma medication like a long-acting beta2 agonist. Single nucleotide polymorphisms in cap methyltransferase 1 (CMTR1), tripartite motif containing 24 (TRIM24), and membrane associated guanylate kinase, WW and PDZ domain containing 2 (MAGI2) genes were found to be associated with variability in asthma exacerbations []. Additional evidence supports that these genes also cause variability in ICS response []. Due to genetic variations in cytochrome P (CYP) 450 enzymes that metabolize over 80% of drugs including ICS, up to 50% of people with asthma have altered metabolism to certain ICSs [-] impacting asthma control [,]. CYP3A5*3/*3 and CYP3A4*22 genotypes were found to be linked to ICS response [,]. These studies provide evidence that genetic variations greatly affect ICS responsiveness, although the exact relationships between genetic variations and ICS response remain largely unknown [,,]. Currently, many candidate genes are being studied, and pharmacogenetics has not yet reached routine clinical practice in asthma care.

ICS choice for patients is often dictated by insurance reimbursement, organizational policies, or cost, leading to a one-size-fits-all approach [-]. Some insurers require patients to first fail on a cheaper ICS before authorizing a more expensive ICS []. Nonmedical switch due to preferred drug formulary change is common and leads to bad outcomes, with 70% of patients reporting more exacerbations after the switch []. Patients also often report that they tried a few different ICSs before ending up with the drug that gave them the most relief, with 60% reporting it was hard for their providers to find the effective drug [-]. Cycling through various ICSs delays the start of an effective ICS and is neither efficient nor cost-effective []. New strategies are needed to allow a faster and more efficient way to tailor ICS selection to each patient’s characteristics [].

While the biologic heterogeneity of asthma is vast, few, if any, biomarkers or genotypes can currently be used to systematically profile all patients with asthma and predict ICS response [,,]. Readily available electronic health record (EHR) data collected during clinical care offer a low-cost, reliable, and more holistic way to profile all patients [,]. With a high accuracy of 87%-95% [], machine learning models using EHR data have been used to profile patients in various areas, for example, to develop a phenotype for patients with Turner syndrome [], identify low medication adherence profiles [], find variable COVID-19 treatment response profiles [], and predict hypertension treatment response []. Yet, while machine learning has helped find various asthma profiles [-], no prior study has predicted ICS response. Also, prior studies are mostly from single centers with small sample sizes and have not moved the needle of precision treatment for asthma [,].

A decision support tool is greatly needed, especially in the primary care setting, to guide providers to select at the point of care the ICS that will most likely and quickly ease patient symptoms and improve asthma control. Forecasting which patient will respond well to which ICS is the first step toward creating this tool, but no prior study has predicted ICS response, forming a gap.

To shift asthma care from one-size-fits-all to personalized care, improve outcomes, and save health care resources, we make three contributions in this paper, supplying a roadmap for future research: (1) we point out the above-mentioned need for creating a decision support tool to guide ICS selection; (2) we point out the above-mentioned gap in fulfilling this need; and (3) to close this gap, we outline an approach to create a machine learning model and apply causal inference to predict a patient’s ICS response in the next year based on the patient’s characteristics. We present the central ideas of this approach in the following sections.


Creating a Machine Learning Model and Applying Causal Inference to Predict ICS Response

Overview of Our Approach

We use EHR data from a large health care system to develop a machine learning model and apply causal inference to predict a patient’s ICS response based on the patient’s characteristics. As endotyping or genotyping all patients is infeasible, our model uses EHR data to characterize all patients and extract patterns that could mirror endotype or genotype. Our model is trained on historical data, and can then be applied to new patients to guide ICS selection during an initial or early encounter for asthma care. The optimal ICS choice identified by our approach can be either an ICS (generic name and dosage) alone or an ICS combined with another asthma medication like a long-acting beta2 agonist.

Both pediatric and adult patients with asthma are treated by primary care providers (PCPs), who are mostly generalists, and by asthma specialists, including allergists, immunologists, and pulmonologists. Large differences exist between PCPs and specialists in terms of knowledge, care patterns, and asthma outcomes, with asthma specialists adhering more often to guideline recommendations [-]. A greater difference exists between PCPs and specialists in controller medication use []. Compared to PCPs, asthma specialists tend to achieve better outcomes [], including higher physical functioning [], better patient-reported care [], and fewer ER visits and inpatient stays [-]. As over 60% of people with asthma are cared for by PCPs [], our machine learning model primarily targets PCPs, although asthma specialists could also benefit from this model.

The asthma medication ratio (AMR) is the total number of units of asthma controller medications dispensed divided by the total number of units of asthma medications (controllers + relievers) dispensed [,]. Higher AMR (≥0.5) is associated with less oral corticosteroid use (a surrogate measure for asthma exacerbations), fewer ER visits and inpatient stays, and lower costs [-]. Lower AMR (<0.5) is associated with more exacerbations, ER visits, and inpatient stays [,]. Approved by Healthcare Effectiveness Data and Information Set (HEDIS) as a quality measure, AMR is widely used by health care systems []. AMR is a reliable reflection of asthma control and gives an accurate assessment of asthma exacerbation risk []. We use change in AMR as the prediction target of our model for predicting ICS response, as AMR can be calculated on all patients. In comparison, neither asthma control nor acute outcomes (eg, ER visits, inpatient stays, or oral corticosteroid use) is used as the prediction target, as the former is often missing in EHRs and the latter does not occur in all patients. An effective ICS will lead to less reliever use and increased AMR. An ineffective ICS will lead to more reliever use and reduced AMR. We formerly used EHR data to build accurate models to predict hospital use (ER visit or inpatient stay) for asthma [-]. We expect EHR data to have great predictive power for AMR, which is associated with hospital use for asthma [-]. Using the AMR can facilitate the dissemination of our approach across health care systems.
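As a minimal sketch of the AMR definition above, the Python snippet below computes the AMR from dispensed unit counts and the change in AMR between two 1-year periods. The function names, the handling of a patient with no dispensed medication, and the example counts are illustrative assumptions rather than part of our approach.

```python
def asthma_medication_ratio(controller_units, reliever_units):
    """AMR = controller units / (controller units + reliever units)."""
    total_units = controller_units + reliever_units
    if total_units == 0:
        # Assumption: AMR is treated as undefined when nothing was dispensed.
        return None
    return controller_units / total_units


def amr_change(baseline_amr, follow_up_amr):
    """Change in AMR; a positive value means a shift toward controller use."""
    return follow_up_amr - baseline_amr


# Hypothetical counts: 4 controller and 4 reliever units in the prior year,
# then 9 controller and 3 reliever units in the following year.
baseline = asthma_medication_ratio(4, 4)
follow_up = asthma_medication_ratio(9, 3)
change = amr_change(baseline, follow_up)
```

In this hypothetical example, the AMR rises from 0.5 to 0.75, which is the direction we would expect under an effective ICS.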

We outline the individual steps of our approach in the following sections.

Step 1: Building a Machine Learning Model to Predict a Patient’s ICS Response Defined by Changes in AMR

We focus on patients with persistent asthma, for whom ICSs are mainly used. We use the HEDIS case definition of persistent asthma [,], which is already validated [] and is the most commonly used administrative data marker of persistent asthma []. A patient is deemed to have persistent asthma if in each of 2 consecutive years, the patient meets at least one of the following criteria: (1) at least 1 ER visit or inpatient stay with a principal diagnosis code of asthma (ICD-9 [International Classification of Diseases, Ninth Revision] 493.0x, 493.1x, 493.8x, 493.9x; ICD-10 [International Classification of Diseases, Tenth Revision] J45.x), (2) at least 2 asthma medication dispensing events and at least 4 outpatient visits, each with a diagnosis code of asthma, or (3) at least 4 asthma medication dispensing events. In the rest of this paper, we use “patients with asthma” to refer to patients with persistent asthma. The prediction target or outcome is the amount of change in a patient’s AMR after 1 year. The AMR is computed over a 1-year period [,].
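The case definition above can be sketched as a simple rule check. The snippet below assumes that the yearly counts have already been derived from encounter and dispensing records using the listed diagnosis codes. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class YearlyCounts:
    # Counts for one calendar year, assumed to be precomputed from EHR data.
    er_or_inpatient_asthma: int   # ER visits or inpatient stays, principal asthma diagnosis
    outpatient_asthma: int        # outpatient visits with an asthma diagnosis code
    medication_dispensings: int   # asthma medication dispensing events


def meets_yearly_criteria(counts):
    """True if at least one of the three yearly criteria is met."""
    return (
        counts.er_or_inpatient_asthma >= 1
        or (counts.medication_dispensings >= 2 and counts.outpatient_asthma >= 4)
        or counts.medication_dispensings >= 4
    )


def has_persistent_asthma(counts_by_year):
    """True if the criteria are met in each of 2 consecutive years."""
    return any(
        year + 1 in counts_by_year
        and meets_yearly_criteria(counts_by_year[year])
        and meets_yearly_criteria(counts_by_year[year + 1])
        for year in counts_by_year
    )


# Hypothetical patient histories keyed by calendar year.
er_year = YearlyCounts(1, 0, 0)          # meets criterion 1
visits_year = YearlyCounts(0, 4, 2)      # meets criterion 2
quiet_year = YearlyCounts(0, 1, 3)       # meets none
refill_year = YearlyCounts(0, 0, 4)      # meets criterion 3

consecutive = has_persistent_asthma({2020: er_year, 2021: visits_year})
second_year_fails = has_persistent_asthma({2021: visits_year, 2022: quiet_year})
gap_year = has_persistent_asthma({2020: er_year, 2022: refill_year})
```

Only the first history qualifies: the second fails the criteria in one year, and the third has qualifying years that are not consecutive.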

We combine patient, air quality, and weather features computed on the raw variables to build the model to predict ICS response. Existing predictive models for asthma outcomes [-,-] rarely use air quality and weather variables, but these variables impact asthma outcomes [-] (eg, short-term exposure to air pollution, even if measured at the regional level, is associated with asthma exacerbations [-]). For each such variable, we examine multiple features (eg, mean, maximum, SD, and slope). We examine over 200 patient features listed in our papers’ [-] appendices and formerly used to predict hospital use for asthma, which is associated with AMR [-]. Several examples of these features are comorbidities, allergies, the number of the patient’s asthma-related ER visits in the prior 12 months, the total number of units of systemic corticosteroids ordered for the patient in the prior 12 months, and the number of primary or principal asthma diagnoses of the patient in the prior 12 months. We also use as features the patient’s current AMR computed over the prior 12 months [,], the generic name and the dosage of the ICS that the patient currently uses, and those of the long-acting beta2 agonist, leukotriene receptor antagonist, biologic or another asthma medication, if any, that is combined with the ICS.
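To illustrate the kind of summary features mentioned above (mean, maximum, SD, and slope), the sketch below computes them for one raw variable observed over time. It assumes equally spaced observations indexed 0, 1, 2, and so on, and a least-squares slope. The function name and the example readings are hypothetical.

```python
from statistics import mean, stdev


def summary_features(values):
    """Mean, maximum, sample SD, and least-squares slope of a series.

    Assumes at least one value and equally spaced observations.
    """
    n = len(values)
    time_index = list(range(n))
    t_bar, v_bar = mean(time_index), mean(values)
    sum_sq_t = sum((t - t_bar) ** 2 for t in time_index)
    if sum_sq_t == 0:
        slope = 0.0  # a single observation carries no trend
    else:
        slope = sum(
            (t - t_bar) * (v - v_bar) for t, v in zip(time_index, values)
        ) / sum_sq_t
    return {
        "mean": v_bar,
        "max": max(values),
        "sd": stdev(values) if n > 1 else 0.0,
        "slope": slope,
    }


# Hypothetical daily readings of an air quality variable.
features = summary_features([1.0, 2.0, 3.0, 4.0])
```

The same function could be applied to each air quality or weather variable, and to the patient-reported series discussed in step 3.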

Step 2: Conducting Causal Machine Learning to Identify Optimal ICS Choice

Our goal is to integrate machine learning and G-computation to develop a method to estimate the causal effects of various ICS choices on AMR for patients with specific characteristics. This causal machine learning method [] processes large data sets by capturing complex nonlinear relationships between features, thereby revealing the cause-and-effect relationships between ICS choice and change in AMR. We use the machine learning model built in step 1. Using G-computation [,], an imputation-based causal inference method, we estimate the potential effects of hypothetical ICS choices with specific dosages on changes in AMR after 1 year. G-computation builds on the machine learning model of the outcome as a function of ICS indicators, ICS dosages, and other features to predict AMR outcomes under different counterfactual ICS choice scenarios. CIs are estimated from 10,000 bootstrap resamples drawn with replacement [].

We apply causal machine learning to estimate the impact of ICS choices on patients with specific characteristics by averaging predicted AMR after 1 year for a given ICS and these characteristics across all participants. This estimate is contrasted with the averaged predicted outcome in the absence of any ICS choice. The ICS choice with the highest statistically significant contrast estimate is identified as the optimal choice for patients with these characteristics. All hypotheses can be tested at a significance level of .05.
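The G-computation procedure described in this step can be sketched as follows. This is a toy illustration under stated assumptions, not our implementation: a cell-mean outcome model over a single hypothetical characteristic ("stratum") stands in for the step 1 machine learning model, percentile intervals are one possible way to form bootstrap CIs, and 200 resamples are used instead of 10,000 to keep the example fast. All names and data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def fit_outcome_model(rows):
    """Toy outcome model: mean AMR change per (ICS, stratum) cell."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row["ics"], row["stratum"])].append(row["amr_change"])
    cell_means = {cell: mean(values) for cell, values in cells.items()}
    overall_mean = mean(row["amr_change"] for row in rows)

    def predict(ics, stratum):
        # Fall back to the overall mean for cells absent from the data.
        return cell_means.get((ics, stratum), overall_mean)

    return predict


def g_computation(rows, ics_options, reference):
    """Contrast each counterfactual ICS choice against the reference."""
    predict = fit_outcome_model(rows)

    def average_outcome(ics):
        # Assign every patient the same hypothetical ICS, then average.
        return mean(predict(ics, row["stratum"]) for row in rows)

    reference_mean = average_outcome(reference)
    return {ics: average_outcome(ics) - reference_mean for ics in ics_options}


def bootstrap_intervals(rows, ics_options, reference,
                        n_boot=200, alpha=0.05, seed=0):
    """Percentile CIs from resampling patients with replacement."""
    rng = random.Random(seed)
    draws = {ics: [] for ics in ics_options}
    for _ in range(n_boot):
        sample = rng.choices(rows, k=len(rows))
        estimates = g_computation(sample, ics_options, reference)
        for ics, contrast in estimates.items():
            draws[ics].append(contrast)
    intervals = {}
    for ics, values in draws.items():
        values.sort()
        lower = values[int((alpha / 2) * (len(values) - 1))]
        upper = values[int((1 - alpha / 2) * (len(values) - 1))]
        intervals[ics] = (lower, upper)
    return intervals


def pick_optimal(contrasts, intervals):
    """Highest contrast among choices whose CI lies above zero, if any."""
    significant = {ics: value for ics, value in contrasts.items()
                   if intervals[ics][0] > 0}
    return max(significant, key=significant.get) if significant else None


# Hypothetical data: 10 patients per (ICS, stratum) cell.
effects = {("none", "A"): 0.0, ("none", "B"): 0.0,
           ("ics_x", "A"): 0.2, ("ics_x", "B"): 0.1,
           ("ics_y", "A"): 0.1, ("ics_y", "B"): 0.3}
rows = [{"ics": ics, "stratum": stratum, "amr_change": effect}
        for (ics, stratum), effect in effects.items() for _ in range(10)]

contrasts = g_computation(rows, ["ics_x", "ics_y"], reference="none")
intervals = bootstrap_intervals(rows, ["ics_x", "ics_y"], reference="none")
best = pick_optimal(contrasts, intervals)
```

Each bootstrap replicate refits the outcome model before recomputing the contrasts, so the intervals reflect uncertainty in the model as well as in the patient mix. In a real analysis, the cell-mean model would be replaced by the step 1 model, and the features would include the confounders identified during model development.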

Step 3: Assessing the Impact of Adding External Patient-Reported Asthma Control and ICS Use Adherence Data on the Model’s Predictions

EHRs have limitations regarding patient-reported data with extra predictive power such as asthma control and ICS use adherence. For asthma, asthma control and ICS use adherence are critical variables, as (1) a patient’s asthma control fluctuates over time and drives the provider’s decision to prescribe or adjust ICSs and (2) ICS use adherence impacts the patient’s asthma control and helps assess whether the patient is actually responding to an ICS. However, despite their high predictive power for patient outcomes, these variables are not routinely collected or included in EHRs in clinical practice. At Intermountain Healthcare, the largest health care system in Utah, we pioneered the electronic AsthmaTracker, a mobile health (mHealth) app used weekly to assess, collect, and monitor patients’ asthma control and actual ICS use adherence []. Like most patient-reported data, these patient-reported variables have been collected on only a small proportion of patients with asthma. To date, 1380 patients with asthma have used the app and produced about 45,000 records of weekly asthma control scores and ICS use adherence data (eg, the ICS’ name and the number of days an ICS is actually used by the patient in that week). If we train a predictive model using EHR and patient-reported data limited to this small proportion of patients, the model will be inaccurate due to insufficient training data. Yet, for these patients, combining their patient-reported data with the outputs of a model built on all patients’ EHR data can help raise the prediction accuracy for them. To realize this, we propose the first method to combine external patient-reported data available on a small proportion of patients with the outputs of a model built on all patients’ EHR data to raise prediction accuracy for the small proportion of patients while maintaining prediction accuracy for the other patients.

To illustrate how our method works, we consider the case that the model created in step 1 is built using Intermountain Healthcare EHR data. The weekly asthma control scores and ICS use adherence data collected from the 1380 patients with asthma are unused in step 1. Now we add features (eg, mean, SD, and slope) computed on patient-reported asthma control and ICS use adherence data to raise prediction accuracy for these patients. Among all patients with asthma, only 1% have asthma control and ICS use adherence data. We use the method shown in Figure 1 to combine the asthma control and ICS use adherence data from this small proportion of patients with the outputs of a model trained on EHR, air quality, and weather data of all patients with asthma. We start from the original model built in step 1. This model is reasonably accurate, as it is trained using EHR, air quality, and weather data of all patients with asthma and all features excluding those computed on asthma control and ICS use adherence data. For each patient with asthma control and ICS use adherence data, we apply the model to the patient, obtain a prediction result, and use this result as a feature. We then combine this new feature with the features computed on asthma control and ICS use adherence data to train a second model for these patients using their data. The second model builds upon the original model and thus tends to be more accurate than the original model for these patients. The original model is used for the other patients. Our method is general, works for all kinds of features, and is not limited to any specific disease, prediction target, cohort, or health care system. Whenever a small proportion of patients have extra predictive variables, we could use this method to raise prediction accuracy for these patients while maintaining prediction accuracy for the other patients.
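A minimal sketch of this two-model scheme is given below. It assumes, purely for illustration, that the second model is an ordinary least squares regression fitted in pure Python and that the original model is available as a function. Neither choice is prescribed by our method, and all names and data are hypothetical.

```python
def fit_linear(feature_rows, outcomes):
    """Ordinary least squares via the normal equations (small problems only)."""
    rows = [[1.0] + list(features) for features in feature_rows]  # intercept
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    system = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, outcomes))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(system[k][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for k in range(p):
            if k != col:
                factor = system[k][col] / system[col][col]
                system[k] = [a - factor * b
                             for a, b in zip(system[k], system[col])]
    coefficients = [system[i][p] / system[i][i] for i in range(p)]

    def predict(features):
        return coefficients[0] + sum(
            c * v for c, v in zip(coefficients[1:], features))

    return predict


def build_combined_predictor(original_predict, ehr_rows, reported_rows, outcomes):
    """Train the second model on [original prediction + reported features]."""
    stacked_rows = [[original_predict(ehr)] + list(reported)
                    for ehr, reported in zip(ehr_rows, reported_rows)]
    second_predict = fit_linear(stacked_rows, outcomes)

    def predict(ehr, reported=None):
        if reported is None:
            # Patients without patient-reported data keep the original model.
            return original_predict(ehr)
        return second_predict([original_predict(ehr)] + list(reported))

    return predict


# Hypothetical original model and data for 4 patients with reported data.
def original_model(ehr):
    return 0.5 * ehr[0]

ehr_rows = [[0.2], [0.4], [0.6], [0.8]]
reported_rows = [[1.0], [0.0], [2.0], [1.0]]
outcomes = [0.2, 0.2, 0.5, 0.5]  # original prediction + 0.1 * reported feature

combined = build_combined_predictor(original_model, ehr_rows, reported_rows, outcomes)
with_reported = combined([1.0], [3.0])
without_reported = combined([1.0])
```

Because the original model's output enters the second model as a feature, the second model can fall back on that output when the patient-reported features add nothing, which is why predictions for the other patients are unaffected.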

For the patients with asthma control and ICS use adherence data, we compare the mean squared and the mean absolute prediction errors obtained by the model built in step 1 and the second model built here. We expect adding asthma control and ICS use adherence data to the model to lower both prediction errors. The error drop rates help reveal the value of routinely collecting asthma control and ICS use adherence data in clinical care to lower prediction errors. Currently, such data are rarely collected.
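The comparison above can be sketched with the two error measures and a drop rate. Here we assume that the error drop rate is the relative reduction in error, which is one reasonable reading of the term. The example values are hypothetical.

```python
def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def error_drop_rate(original_error, new_error):
    """Relative reduction in error; assumes original_error > 0."""
    return (original_error - new_error) / original_error


# Hypothetical AMR changes and predictions for 4 patients.
actual = [0.10, -0.05, 0.20, 0.00]
original_predictions = [0.00, 0.05, 0.10, 0.10]
second_predictions = [0.05, 0.00, 0.15, 0.05]

mse_drop = error_drop_rate(
    mean_squared_error(actual, original_predictions),
    mean_squared_error(actual, second_predictions),
)
mae_drop = error_drop_rate(
    mean_absolute_error(actual, original_predictions),
    mean_absolute_error(actual, second_predictions),
)
```

In this hypothetical example, every absolute error is halved, so the mean absolute error drops by about 50% and the mean squared error by about 75%.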

Figure 1. Our method to raise prediction accuracy for the small proportion of patients with asthma and asthma control and ICS use adherence data while maintaining prediction accuracy for the other patients with asthma. EHR: electronic health record; ICS: inhaled corticosteroid.

Discussion

Principal Findings

Besides the variables mentioned in the “Step 1: Building a machine learning model to predict a patient’s ICS response defined by changes in AMR” section, environmental variables beyond air quality and weather and many other factors can impact patient outcomes. Moreover, there are almost infinite possible features. For a first study along the direction pointed out in this paper, a realistic goal is to show that our methods can build decent models and improve asthma care, rather than to exhaust all possible useful variables and features and obtain the theoretically highest possible model performance. Not accounting for all possible factors limits the generalizability of these models to medication selection for other diseases.

We use the G-computation method to conduct causal inference. This method relies heavily on correctly specifying the predictive model for ICS response, including accurately identifying all relevant confounders and interactions and incorporating them into the model. Misspecification of the model can lead to biased estimated effects of various ICS choices on AMR. To address this issue, we can adopt several preventive strategies during model development. We engage with subject matter experts to ensure that the model includes all relevant variables and reflects the underlying process. To guide model development and help identify potential sources of bias, we construct a directed acyclic graph that lays out the relationships among the independent and dependent variables. We use machine learning techniques that provide flexible modeling approaches to capture complex relationships among variables. When reporting our findings, we remain transparent about the final model specification and the rationale behind our model building process. We believe using these strategies will mitigate the risk of model misspecification and strengthen the reliability of our estimated effects of various ICS choices on AMR.

AMR is reported to be a reliable reflection of asthma control and of asthma exacerbation risk []. In a future study that we plan to do along the direction pointed out in this paper, we can use Intermountain Healthcare data to validate this relationship. Specifically, we use multivariable linear regression to assess the relationship between the AMR computed on EHR data and the patient’s asthma control level obtained from the external patient-reported data, while controlling for other factors. We expect to see a strong and positive association between the AMR and the patient’s asthma control level.

When creating the model in step 1, we can include medication persistence measures computed on insurance claim data [], such as the proportion of days covered for ICS, as features. However, this does not obviate the need to examine patient-reported ICS use adherence data in step 3. ICS persistence measures give information on the possession of ICS, but not on actual use of ICS. Each ICS persistence measure is computed at a coarse time granularity as an average value over a long period. In comparison, our patient-reported ICS use adherence data offer information on the actual use of ICS. The data are at a fine time granularity, with 1 set of values per week for a patient. This enables us to compute features on various patterns and trends that can be useful for making predictions.
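The contrast between the two kinds of measures can be sketched as follows. The proportion of days covered is computed here in a simplified form that counts each calendar day at most once and does not shift overlapping fills forward, while the weekly adherence is simply the reported number of days of use divided by 7. The function names, fill records, and weekly counts are hypothetical.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, start, end):
    """Share of days in [start, end] covered by at least one fill.

    fills: list of (fill_date, days_supply) pairs from claim data.
    """
    covered_days = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day <= end:
                covered_days.add(day)
    return len(covered_days) / ((end - start).days + 1)


def weekly_adherence(days_used_per_week):
    """Fraction of days per week the patient reports actually using the ICS."""
    return [days_used / 7 for days_used in days_used_per_week]


# Hypothetical claims: two 30-day fills with a 10-day overlap.
fills = [(date(2023, 1, 1), 30), (date(2023, 1, 21), 30)]
pdc = proportion_of_days_covered(fills, date(2023, 1, 1), date(2023, 3, 1))

# Hypothetical patient-reported days of ICS use for 3 consecutive weeks.
adherence = weekly_adherence([7, 5, 0])
```

Here the single coverage value summarizes possession over the 60-day window, whereas the weekly series shows a decline in actual use that the coverage value cannot reveal.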

Conclusions

In asthma care, ICS choice is largely by trial and error and often made by a one-size-fits-all approach with many patients not achieving optimal outcomes. In this paper, we point out the need for creating a decision support tool to guide ICS selection and a gap in fulfilling this need. Then we outline an approach to close this gap via creating a machine learning model and applying causal inference to predict a patient’s ICS response in the next year based on the patient’s characteristics. This supplies a roadmap for future research.

FLN and GL are co-senior authors mainly responsible for the paper. They conceptualized the presentation approach, performed literature review, and wrote the paper. BLS provided feedback on various medical issues, contributed to conceptualizing the presentation, and revised the paper. YZ wrote the causal inference section. All authors read and approved the final paper.

GL is an editorial board member of JMIR AI. The other authors declare no conflicts of interest.

Edited by A Benis; submitted 24.01.24; peer-reviewed by H Tibble, A Kaplan; comments to author 01.03.24; revised version received 12.03.24; accepted 25.03.24; published 17.04.24.

©Flory L Nkoy, Bryan L Stone, Yue Zhang, Gang Luo. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 17.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
