Self-reported health status and latent health dynamics

Dynamic structural models of individuals often concern decisions for which health status is an important state variable. For example, DeNardi et al. (2010) examine how the strong serial correlation of medical expenses motivates the saving behavior of the elderly, and Aizawa (2019) models how workers sort between jobs depending on whether they offer employer-sponsored insurance. Papers in this vein often model health as a discrete state and use categorical self-reported health status (SRHS) as their sole empirical measure of health.1 Despite its simplicity, SRHS has been shown to be a good predictor of mortality and medical expenses (Idler and Benyamini, 1997) as well as labor supply decisions (Bound, 1991), and is strongly correlated with both clinical measures of health (LaRue et al., 1979) and itemized self-reports of more specific aspects of health (Blundell et al., 2017).

In this paper, I argue that the literature has often erred in its interpretation of SRHS and the treatment of its dynamics. The traditional approach treats the coarse categorization as an accurate representation of “true health”– it takes SRHS literally. Modelers often assume that the discrete health state follows a Markov(1) process, with the distribution of subsequent states determined only by current SRHS, demographic variables, and (potentially) health inputs. They estimate this process using one-wave-ahead transitions of SRHS from panel data, often by simple frequency counts or reduced form methods.

However, as observed by DeNardi et al. (2018) (inter alia), SRHS is not actually Markovian in the data. Consequently, the predicted distribution of health states generated by such “simple dynamics” swiftly deviates from the empirical distribution more than one wave ahead. To the extent that model agents’ optimal behavior depends on their beliefs about their future health prospects (e.g. via the need to maintain a buffer of assets against catastrophic medical expenses), counterfactual simulation of these models is unreliable if the long run dynamics of individual health are incorrectly specified.

Rather than treating categorical SRHS as an accurate measure of true health, I provide strong evidence that SRHS is better interpreted as a noisy signal of a continuous latent health state, as in Bound et al. (2010); further, latent health itself has Markovian dynamics. When a respondent is asked for her SRHS, her latent health is projected onto a discrete space of possible replies, subject to reporting error. That is, there is some partition of the space of “true health” and an individual’s SRHS represents a noisy report of which subset her health falls in. Under this interpretation, a change in an individual’s self-report of categorical health from one wave of a survey to the next represents a combination of transitory reporting error (e.g. the respondent’s mood) and a true change in her health.
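As a minimal sketch of this interpretation, the following code maps a continuous latent health value onto one of five SRHS categories after adding a transitory reporting shock. The Gaussian shock, the threshold values, and the function name report_srhs are illustrative assumptions, not the specification presented in Section 2.

```python
import random
from bisect import bisect_right

CATEGORIES = ["poor", "fair", "good", "very good", "excellent"]

def report_srhs(latent_health, cutoffs, noise_sd, rng):
    """Return a noisy categorical report of a continuous latent health value."""
    # Perceived health is true latent health plus a transitory reporting shock
    # (assumed Gaussian here purely for illustration).
    perceived = latent_health + rng.gauss(0.0, noise_sd)
    # The cutoffs partition the real line; bisect_right finds the cell
    # that contains the perceived value.
    return CATEGORIES[bisect_right(cutoffs, perceived)]

rng = random.Random(0)
cutoffs = [-1.5, -0.5, 0.5, 1.5]  # hypothetical thresholds, not estimates

# The same true health reported five times: answers can differ across draws
# even though latent health has not changed.
reports = [report_srhs(0.4, cutoffs, noise_sd=0.5, rng=rng) for _ in range(5)]
print(reports)
```

With noise_sd set to zero the report is a deterministic function of latent health, which corresponds to taking SRHS literally.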

Section 2 presents a fairly parsimonious model of this data-generating process for SRHS. When estimated by maximum likelihood on long panel data, the model is able to reproduce conditional distributions of SRHS up to twenty years ahead while still matching short run, one-wave transitions and mortality conditional on demographics. Health is much more persistent in the estimated latent health model than in the usual approach that does not account for reporting error in SRHS. The estimated model reveals a significant degree of heteroskedasticity in reporting errors across individuals– a small fraction of survey respondents account for a disproportionate share of SRHS transitions, due to inaccurate self-reports rather than changes in their true health status.

In Section 4, I present estimation results for a dataset that combines two long panel surveys, the Panel Study of Income Dynamics and the Health & Retirement Study, but similar results are achieved from each dataset independently, or when estimated on short panel data from the Medical Expenditure Panel Survey. In addition to matching the future distribution of SRHS (conditional on its state in the baseline period), the estimated model reproduces the extent of duration dependence in empirical SRHS transition probabilities, as well as the distribution of the frequency of reports of bad health over an extended period.

In recent work, DeNardi et al. (2018) propose modeling binary SRHS dynamics as Markov(2), directly accounting for duration dependence by making transition probabilities dependent on the previous period’s SRHS along with its current value. They show that their model is able to reproduce the empirical dynamics of binary SRHS, including the extent of duration dependence and the distribution of the frequency of unhealthy periods across the population. In Section 4.3, I show that the latent health model reproduces the same features of SRHS dynamics as targeted in DeNardi et al. (2018). Further, the Markov(2) binary health model predicts that, conditional on the current state, lagged SRHS should not be predictive of health-motivated outcomes like medical spending, retirement, and mortality. However, I show that this is not the case in panel data, and that the latent health model fairly accurately reproduces these empirical patterns. That is, by recognizing that SRHS is a noisy measure of an unobserved state that usually moves sluggishly, both current and lagged SRHS are informative signals of true health, and thus predictive of health-driven outcomes. The present model explains the non-Markovian dynamics of SRHS, rather than assuming them as a feature of health itself.

The estimation method for the latent health model exploits long sequences of SRHS observations for each respondent, rather than just one-wave-ahead transitions. The key goal of the estimation is to separate transitory reporting shocks from the underlying dynamics of latent health by accounting for the econometrician’s uncertainty about a respondent’s latent health, tracking its nonparametric distribution. That is, each observation of SRHS improves the econometrician’s confidence about the true value of latent health, while the passage of time from one survey wave to the next introduces unobserved shocks to the latent health stock, expanding uncertainty. Likewise, observing the respondent still alive in a subsequent period provides a small bit of information about her true health state, as she was more likely to have survived if she had been healthier in the first place.
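The logic of tracking the econometrician's uncertainty can be sketched as a forward recursion over a discretized latent health grid, as in a standard hidden Markov filter. This is only an illustration under assumed inputs: the three-point grid, the binary report, every probability below, and the function name filter_latent_health are hypothetical, and the estimator described in Section 2 may differ in its details.

```python
import math

def filter_latent_health(prior, transition, survival, report_prob, reports):
    """Track the distribution of discretized latent health over a report sequence.

    prior[i]          : P(latent state i) before the first report
    transition[i][j]  : P(state j next wave | state i now, survived)
    survival[i]       : P(alive next wave | state i now)
    report_prob[i][r] : P(reporting category r | latent state i)
    reports           : observed category indices, one per wave (alive throughout)
    Returns (posterior after the last report, log-likelihood of the sequence).
    """
    n = len(prior)
    belief = list(prior)
    loglik = 0.0
    for t, r in enumerate(reports):
        if t > 0:
            # Survival update: being observed alive favors healthier states.
            alive = [belief[i] * survival[i] for i in range(n)]
            p_alive = sum(alive)
            loglik += math.log(p_alive)
            alive = [a / p_alive for a in alive]
            # Prediction step: unobserved health shocks spread the belief out.
            belief = [sum(alive[i] * transition[i][j] for i in range(n))
                      for j in range(n)]
        # Observation update: weight each state by how likely it makes the report.
        joint = [belief[i] * report_prob[i][r] for i in range(n)]
        p_report = sum(joint)
        loglik += math.log(p_report)
        belief = [x / p_report for x in joint]
    return belief, loglik

# Hypothetical inputs: latent states 0 (worst) to 2 (best); reports 0 = unhealthy, 1 = healthy.
prior = [1 / 3, 1 / 3, 1 / 3]
transition = [[0.90, 0.10, 0.00], [0.05, 0.90, 0.05], [0.00, 0.10, 0.90]]
survival = [0.90, 0.97, 0.99]
report_prob = [[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]]

healthy_belief, healthy_ll = filter_latent_health(
    prior, transition, survival, report_prob, [1, 1, 1])
unhealthy_belief, unhealthy_ll = filter_latent_health(
    prior, transition, survival, report_prob, [0, 0, 0])
print(healthy_belief, unhealthy_belief)
```

Summing the returned log-likelihood across respondents would give an objective that a maximum likelihood routine could evaluate for candidate parameters; that outer step is omitted here.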

In contrast to the latent health model, models that treat SRHS as the true health state significantly underestimate the stability of health over the course of an individual’s life. By interpreting a change in SRHS as a true change in health and considering only one-wave-ahead transitions, the longer run dynamics of health are not accurately reproduced in such models. Consider two respondents with identical observed data, both of whom reported “good” health in the most recent wave of a panel study; they differ only in that one reported “fair” health in the prior wave, while the other reported “very good” health. A model that takes SRHS literally would predict that the two respondents have the same likelihood of reporting “very good” health in the next wave, contrary to empirical evidence. Alternatively, a model that incorporates exogenously heterogeneous health dynamics across individuals (e.g. through genetic variation) but treats SRHS as “true health” might predict different probabilities over subsequent SRHS for the two respondents, but based on a potentially spurious inference about differences in their health process.

The modeling choice to use reporting error to account for short run volatility in SRHS, rather than transitory shocks to true health,2 is motivated by Crossley and Kennedy (2002), who find that 28% of respondents change their SRHS if asked twice in the same interview. The extent of churn in SRHS before and after taking a relatively short survey about specific health conditions suggests that reporting error is present in SRHS responses. Likewise, the Medical Expenditure Panel Survey (MEPS) mails a short survey to respondents after waves 2 and 4 of their panel, the Self-Administered Questionnaire (SAQ), which includes a nearly identical SRHS question. Empirical transition probabilities in the two-week span between the MEPS interview and the SAQ are similar to those of a six-month gap between interview waves, suggesting that the reported changes do not reflect the dynamics of health itself. To provide support for the modeling assumption, I show in Section 4.3 that the respondents whom the latent health model judges to be least accurate in reporting their SRHS also exhibit the lowest correlation between SRHS and an objective measure of health (and vice versa).

The model presented here follows very recent work by Hosseini et al. (2021a), who construct a “frailty index” by summing indicators of adverse health– a composite measure based on objective reports. They compare the dynamics of the frailty index with those of SRHS, finding that health is more persistent than indicated by SRHS changes, analogous to the implications of the latent health model. They estimate the dynamics of the frailty index using three panel surveys with fairly rich data on health conditions and medical history, generating a more refined and precise measure of each respondent’s health at a point in time. The latent health model is estimated on data from the same three surveys, but uses only SRHS to gauge a respondent’s true health. It can thus be applied to panel datasets with only cursory health information but still extract a more refined view of health dynamics.

To make it easy for other researchers to incorporate the latent health model into their own work, this project includes a lightweight software archive that can generate two kinds of data files. First, it produces a “filtration dataset” that can be merged into panel data to add the model’s prediction of each respondent’s latent health (in distribution) and probabilities over their reporting type; the filtration conditions on the respondent’s sex, age, and sequence of recently reported SRHS. Second, it can produce a discretized latent health process (to the user’s specifications) based on the estimated parameters reported here and in the Online Appendix, or a custom parameter set. The latent health transition matrices, survival probabilities, and SRHS reporting probabilities it calculates can be easily imported into a structural model, so that simulated outcomes reproduce the complex empirical patterns of SRHS (so that the moments targeted in the estimation can be conditioned on a data-analogous feature) while better capturing underlying health dynamics. In principle, such a model would be no more difficult to solve than an analogous version using SRHS as the health state: it merely swaps one discrete state for another, and the difference between latent health versus SRHS is accounted for by a purely transitory shock.

The paper proceeds as follows. The remainder of this section discusses how health dynamics have been modeled in the structural literature, with a focus on models that treat SRHS as true health, then briefly shows how commonly used assumptions deviate from empirical evidence. Section 2 introduces my latent health model and describes how its parameters are identified through maximum likelihood estimation. Section 3 describes the two panel datasets that are used to estimate the main specification of the model, while Section 4 presents results of the estimation, including a discussion of the model’s fit to short- and long-run features of the data. I discuss how the latent health model can be applied in empirical or structural research in Section 5, and Section 6 concludes.

Almost universally in the dynamic structural literature, representations of individual health have at most one continuous dimension3; more typically, health is represented as a discrete (often binary) state, usually taken directly from SRHS data. The question eliciting SRHS is straightforward and general, leading to very low rates of refusal or other non-response. Most commonly, the top three categories (excellent, very good, and good) are combined into a “healthy” state and the bottom two categories (fair and poor) into an “unhealthy” state.

In many papers, the probability of transitioning between the two health states is calculated by simple fractions of the data (usually conditional on age, sex, education, and other observables) or estimated in reduced form as a logit or probit on observable characteristics. Whether two or five discrete states are used, the health state is assumed to follow a Markov(1) process. In models in which health dynamics are exogenous to agent choices (i.e. there are no endogenous health inputs), these probabilities are often calibrated outside of the main estimation by reduced form methods, based only on one-wave-ahead transitions.4
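The frequency-count approach can be sketched in a few lines. The function name transition_frequencies and the toy panel are hypothetical, and the sketch ignores the conditioning on age, sex, and other observables that the papers discussed here typically apply.

```python
from collections import Counter

def transition_frequencies(sequences, n_states):
    """Estimate a Markov(1) transition matrix from one-wave-ahead frequency counts."""
    counts = Counter()
    for seq in sequences:
        # Every adjacent pair of waves contributes one observed transition.
        for now, nxt in zip(seq, seq[1:]):
            counts[(now, nxt)] += 1
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        # Leave a row of zeros if state i is never observed as an origin.
        matrix.append([counts[(i, j)] / row_total if row_total else 0.0
                       for j in range(n_states)])
    return matrix

# Invented toy panel: 0 = healthy, 1 = unhealthy, one list per respondent.
panel = [[0, 0, 1, 1], [1, 1, 1, 0], [0, 1, 0, 0]]
P = transition_frequencies(panel, n_states=2)
print(P)
```

Because only adjacent waves enter the counts, nothing in this estimate is disciplined by how SRHS evolves over longer horizons.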

This practice dates to at least Rust and Phelan (1997), who model the health state of a living individual as binary, with bad health indicated by answering affirmatively to either of two questions about disability and daily activity. Likewise, Low and Pistaferri (2015) model transitions among three levels of work limitations (none, moderate, and severe), using data from the Panel Study of Income Dynamics. Both papers estimate the dynamics of disability level directly from one-wave-ahead transitions in panel data.

Three papers by a well-known research team all employ the same approach to modeling health dynamics. Contemporaneous work by French and Jones (2011) and DeNardi et al. (2010) estimates one-wave-ahead transition probabilities between binary health states based on SRHS. In later research, DeNardi et al. (2016) add a third discrete health state for individuals in a nursing home; the transition probabilities are estimated as a multinomial logit. Similarly, Blau and Gilleskie (2006) utilize binary health and estimate its age-conditional transitions based on logit-style probabilities. No fit of longer run transitions is presented in any of these papers.

In the past decade, several dynamic structural papers have focused on health insurance reform, spurred by the passage of the Affordable Care Act in 2010. Pashchenko and Porapakkarm (2013) model a discrete health state that is fully coincident with medical expenses by sorting observations into five medical expense “bins” for each age, with the lower three bins corresponding to “good” health. They calibrate transition probabilities among the bins using simple frequency counts on observed one-wave-ahead transitions. Ferreira and Gomes (2017) model individual health as binary using SRHS, with Markov(1) dynamics estimated using a logit specification on one-wave-ahead transitions.

Aizawa (2019) specifies health as binary, partitioning SRHS categories to separate the “healthy” from the “unhealthy” in the MEPS and the Survey of Income and Program Participation (SIPP). Aizawa and Fang (2020) extend the standard approach and model two-dimensional binary health, with one component observed by the econometrician and the other unobserved (representing permanent heterogeneity). The observed component of the health state follows a Markov process; as usual, only one-wave-ahead SRHS transitions are used to estimate the dynamics of health.

Other models incorporate agents’ health behaviors (e.g. smoking, drinking, and exercise) and medical care into health dynamics and/or model permanent unobserved heterogeneity among respondents, so that transition probabilities among health states cannot be calculated separately from the structural estimation. However, most papers in this vein still take SRHS as a literal measure of health and treat its dynamics as representing changes in true health. Thus, while more accurately capturing short run SRHS dynamics by conditioning on more factors, such models might incorrectly estimate causal effects by ignoring reporting errors.

In this vein, Blau and Gilleskie (2008) specify the health state as binary and based on SRHS, estimating its dynamics using data from the HRS while accounting for permanent unobserved heterogeneity in health dynamics and the role of medical inputs (physician visits and hospital stays). Khwaja (2010) presents another dynamic discrete choice model estimated by maximum likelihood on HRS data, estimating transitions among the five SRHS categories as a multinomial logit; the transition probabilities depend on the agent’s extent of smoking, drinking, exercise, and medical utilization. In each paper, SRHS is treated as a representation of true health, and there is no reference to the model’s ability to fit transitions more than one wave ahead.

There are a few counterexamples in the structural literature to the typical specification of health as a univariate discrete state. Both Yang et al. (2009) and Darden et al. (2018) model multiple chronic health conditions, so that an individual’s health state is represented as a vector of binary values. Jung and Tran (2016) model health as a continuous variable with stochastic depreciation, using the SF-12v2 physical health index in the MEPS as their empirical measure. Using data from the HRS, White (2018) constructs a continuous measure of health by estimating an ordered probit of SRHS on specific health outcomes. In current work, the estimated dynamics of the frailty index are used as a model input by the same authors in Hosseini et al. (2021b). By using multiple objective measures to represent the health state, the health dynamics estimated in these papers are much less susceptible to bias from omitted measurement error.

I am not the first to propose latent health as underlying the data-generating process for SRHS. Indeed, Bound et al. (2010) estimate the dynamics of latent health as expressed through noisy self-reported health, based on econometric considerations discussed in Bound (1991). However, the work presented here is the first to estimate the dynamics of latent health while explicitly handling the econometrician’s uncertainty about an individual’s true health, to account for heteroskedasticity of reporting errors, and to present comprehensive evidence for the model’s ability to fit long-run features of the data.

Latent health dynamics have been examined in other studies. In a working paper, Lange and McKee (2012) employ a two-step method, first recovering the distribution of latent health from self-reports of health outcomes as well as clinical measures, then estimating the dynamics of health using the simulated method of moments (SMM). The second stage estimation uses only one-wave-ahead distributional features as moments to fit, and thus does not account for longer run transitions. Halliday (2011) estimates a dynamic latent health model using SMM to target the distribution of SRHS sequences occurring within several age ranges. While more fully utilizing the panel structure of the data, the estimation only seeks to fit the frequencies of the three most commonly observed sequences, without presenting evidence about other sequences or long run patterns.

Other work on SRHS and latent health focuses on particular features of the data or is not concerned with the dynamic process. Contoyannis et al. (2004) address how socioeconomic status affects SRHS transitions, and investigate the extent to which selective attrition might bias estimates of transition probabilities. A working paper by van Ooijen et al. (2015) augments Dutch panel survey data with administrative health records to construct a univariate health index; the authors focus on one-wave transitions (annual frequency). Poterba et al. (2017) construct a health index using principal component analysis on HRS data, then investigate how this measure affects the stock of wealth as respondents age; they do not estimate dynamics of the health index. Recent work by Blundell et al. (2017) explores whether using several objective measures of health, and potentially a two-factor representation of latent health, provides significant benefit to predicting the timing of retirement, relative to only using SRHS. The paper focuses on comparing methodologies for estimating labor supply models, rather than constructing a dynamic process for latent health.

Although structural models commonly assume otherwise, the empirical dynamics of SRHS do not actually follow a simple process. A truly Markovian variable would show no dependence on lagged observations of itself more than one period in the past, but this is not the case with SRHS. Moreover, the one-wave transition probabilities of a Markov(1) process should be able to be applied sequentially to accurately predict the distribution of a discrete variable several waves ahead, conditional on its value in the present; SRHS also fails this test.

As a simple illustrative example, consider women aged 40 to 45 years old in the Medical Expenditure Panel Survey (panels 1–20), who have been interviewed five times over a two year span (every six months). Conditional on reporting being unhealthy (fair or poor health) in the fourth wave, 61.2% of women also report being unhealthy in the fifth and final wave. For women who were unhealthy in both the third and fourth waves, 75.9% are also unhealthy in the fifth wave; this increases to 80.8% among women who have been unhealthy since the second wave, and 84.1% for those who were unhealthy in all of the first four waves. SRHS thus exhibits duration dependence, a pattern inconsistent with a Markov(1) process.5
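The conditional shares in this example are simple to compute from binary report sequences. The sketch below uses an invented seven-person toy panel, not MEPS data, and the function name persistence_by_duration is hypothetical.

```python
def persistence_by_duration(sequences, max_run):
    """Share unhealthy in the final wave, given unhealthy in each of the k prior waves.

    sequences: lists of 0/1 reports per respondent, where 1 = unhealthy.
    Returns a dict mapping k (1..max_run) to the conditional share.
    """
    result = {}
    for k in range(1, max_run + 1):
        # Keep respondents whose k waves just before the final wave were all unhealthy.
        eligible = [s for s in sequences
                    if len(s) > k and all(x == 1 for x in s[-1 - k:-1])]
        if eligible:
            result[k] = sum(s[-1] for s in eligible) / len(eligible)
    return result

# Invented five-wave toy panel (1 = fair or poor health).
toy_panel = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
shares = persistence_by_duration(toy_panel, max_run=4)
print(shares)
```

Under a Markov(1) process these shares would be equal for every k in large samples; shares that rise with k indicate duration dependence.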

Panel data also reveal that one-wave-ahead transition probabilities between healthy and unhealthy SRHS categories do not “chain up” correctly: the distribution of SRHS two (or more) waves ahead does not agree with the prediction of compounding the one-wave-ahead transition probabilities. Consider respondents aged 23 to 110 in the Panel Study of Income Dynamics (PSID) and Health & Retirement Study (HRS). I compute simple transition probabilities among the five SRHS categories from the observed wave-to-wave transitions in the PSID & HRS, conditional on age and sex, with a two-year wide smoothing kernel for age.

Fig. 1 plots the empirical distribution of SRHS (conditional on survival) one to six waves after reporting “very good” health in the baseline period versus the distribution implied by sequentially applying the age-appropriate simple probabilities, accounting for mortality risk. By definition, the one-wave-ahead simple distribution exactly matches the data (Fig. 1(a)). For longer run transitions, however, the simple model predicts too few people reporting “very good” health and too many in “fair” or “good” health; the problem is exacerbated with each successive wave.6 In general, the empirical distribution of SRHS indicates significantly more long run persistence of health than suggested by one-wave-ahead transitions– knowing someone reported “very good” health ten years ago is almost as informative as observing the same report two years ago.
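The comparison underlying this exercise can be sketched as follows: compound a one-wave transition matrix to obtain the predicted distribution several waves ahead, then set it against the empirical distribution at the same horizon. For brevity the sketch assumes a single time-invariant two-state matrix and ignores mortality and age, unlike the age-specific probabilities described above; the function names and all numbers are hypothetical.

```python
def predicted_distribution(start_state, transition, waves):
    """Distribution over states after `waves` steps of a Markov(1) chain."""
    n = len(transition)
    dist = [1.0 if i == start_state else 0.0 for i in range(n)]
    for _ in range(waves):
        # One application of the one-wave-ahead transition probabilities.
        dist = [sum(dist[i] * transition[i][j] for i in range(n))
                for j in range(n)]
    return dist

def empirical_distribution(sequences, start_state, waves, n_states):
    """Observed distribution `waves` ahead among respondents starting in start_state."""
    counts = [0] * n_states
    for seq in sequences:
        if len(seq) > waves and seq[0] == start_state:
            counts[seq[waves]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else None

# Hypothetical one-wave matrix (0 = healthy, 1 = unhealthy) and toy panel.
P = [[0.8, 0.2], [0.4, 0.6]]
toy_panel = [[0, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]

predicted = predicted_distribution(0, P, waves=2)
observed = empirical_distribution(toy_panel, 0, waves=2, n_states=2)
print(predicted, observed)
```

A gap between the two vectors that widens with the horizon is the failure to “chain up” described above.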

These patterns can be explained by a model in which SRHS is a noisy measure of a latent health state. Under this interpretation, duration dependence in SRHS arises because many reported transitions among categorical health states are spurious, occurring only due to transitory reporting error rather than genuine changes to health.7 Likewise, transition probabilities among categorical health states being nearly identical when observations are taken two versus ten years apart is consistent with true health moving sluggishly (i.e. high serial correlation), with transitory reporting shocks driving most changes in SRHS. This process is analogous to the dynamics of household income, which are often modeled as being subject to both permanent and transitory shocks.
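A small simulation illustrates the mechanism. It assumes, purely for illustration, that latent health follows a Gaussian AR(1) process, that reporting error is Gaussian, and that a single cutoff separates healthy from unhealthy reports; none of the parameter values or function names are taken from the estimated model.

```python
import random

def simulate_reports(n_people, n_waves, rho, shock_sd, noise_sd, cutoff, seed):
    """Simulate binary SRHS (1 = unhealthy) as noisy reports of AR(1) latent health."""
    rng = random.Random(seed)
    # Standard deviation of the stationary distribution of the AR(1) process.
    stationary_sd = shock_sd / (1.0 - rho ** 2) ** 0.5
    panel = []
    for _ in range(n_people):
        health = rng.gauss(0.0, stationary_sd)
        reports = []
        for _ in range(n_waves):
            # The report depends on latent health plus a purely transitory shock.
            perceived = health + rng.gauss(0.0, noise_sd)
            reports.append(1 if perceived < cutoff else 0)
            # Latent health itself is Markov(1) and moves sluggishly.
            health = rho * health + rng.gauss(0.0, shock_sd)
        panel.append(reports)
    return panel

def share_unhealthy_after_run(panel, run_length):
    """Share unhealthy in the final wave among those unhealthy in the prior run_length waves."""
    eligible = [s for s in panel if all(x == 1 for x in s[-1 - run_length:-1])]
    return sum(s[-1] for s in eligible) / len(eligible)

panel = simulate_reports(n_people=20000, n_waves=5, rho=0.95, shock_sd=0.3,
                         noise_sd=0.7, cutoff=-0.8, seed=0)
after_one = share_unhealthy_after_run(panel, 1)
after_three = share_unhealthy_after_run(panel, 3)
print(after_one, after_three)
```

Although latent health is Markov(1) in this simulation, the reported series shows duration dependence: a longer run of unhealthy reports is stronger evidence of low latent health, so the next report is more likely to be unhealthy as well.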
