Discrete Choice Modeling for the Quantification of Health States: The Case of the EQ‐5D

Introduction

Composite measures of health outcomes such as “quality-adjusted life-years” (QALYs) require weights or values attached to different health states that reflect the levels of health associated with these states. The standard gamble (SG) and time trade-off (TTO), which have emerged from health economics research, are frequently used to assign values to health states 1. Psychology has contributed another technique, the visual analog scale (VAS) 2. Unfortunately, there are theoretical and empirical drawbacks to all of these techniques 3. Responses to the SG and TTO are likely to be influenced by factors extraneous to judgments about health levels, such as risk aversion or time preference. Moreover, empirical violations of the normative axioms supporting the use of these techniques have been noted. Regarding VAS, critics question its interval properties and point to its lack of a relation to economic theory. In the literature on health state valuation, arguments are raised for and against different techniques, but this debate has not led to consensus 4. Given this lack of consensus and the diverging empirical results, continued work on improving the methods is warranted.

Probabilistic discrete choice (DC) modeling offers an alternative approach for exploring people's values, although this approach is also not without problems and criticism 5-7. Such DC models can be used to analyze data obtained through approaches involving choices, ranks, or matches between alternatives, as defined by attributes and levels 8. The DC models were initially developed for the analysis of real-world data, but researchers quickly became aware of their potential for analyzing stated preference data, which allow exploration of a broader range of preference-driven behaviors than is possible on the basis of real-world data 9. This strategy was first developed in transport economics and marketing. There, instead of modeling people's actual choices (revealed preferences), Louviere et al. modeled the choices made by subjects in carefully constructed experimental studies based on stated preferences: discrete choice experiments (DCEs) 9. The term DCE refers to an experiment that is constructed to collect stated preference data that are consistent with the requirements for DC modeling. Recognizing that the DCE framework offers a conceptual basis for the evaluation of the benefits of health programs, the technique is now being used to extend economic evaluations in health care with information about the value of nonhealth outcomes such as waiting time, location of treatment, and type of care 10-12. More recently, DCEs and accompanying DC models have also been considered for health state valuation 13-17.

DC modeling has good prospects for health state valuation. The statistical literature classifies it among the probabilistic choice models that are grounded in modern measurement theory and consistent with economic theory (i.e., the random utility model). All DC models have in common that they can establish the relative merit of one phenomenon with respect to others. If the phenomena are characterized by specific attributes with certain levels, extended probabilistic choice models would permit estimating the relative importance of the attributes and their associated levels, and even estimating overall values for different combinations of attribute levels. A promising feature of DC models is that the derived values only relate to the attractiveness of a health state; they are not expressed in trade-offs between improved health and something else, as in TTO and SG. Bias as a result of these extraneous factors may therefore be prevented. Moreover, DC models have a practical advantage: when conducting DCEs, health states may be evaluated in a self-completion format. The scope for valuation research is thereby widened as compared to existing TTO protocols for deriving values for health state measurement instruments such as EQ-5D.

But DC models are not without problems when used for health state valuation. The analytical procedure on which analysis of DCE data is based assumes that the difference in values between choice options (e.g., two health states) can be inferred from the proportion of respondents that chose one option over the other. This implies that the relative position of all health states on the latent scale would lie between the “best” and the “worst” health states. For the estimation of QALYs, however, those values need to be scaled on the full health–dead scale. If DC modeling is used to value health, a way must be found to link the derived values under this model to the scale required to calculate QALYs. Yet, there is no consensus on what is the best way to handle the arbitrarily scaled DC values obtained, so it remains uncertain just how valid and informative DC-based values are.

One strategy for rescaling DC values is to anchor them on values obtained for the best and worst health state using other valuation techniques, such as TTO or SG. Nevertheless, the rationale for this approach is unclear when part of the motivation to explore the DC model as a potential candidate to produce health state values comes from the limitations of existing valuation methods. Alternatively, the DCE may be designed in such a way that the derived health state values can be related to the value of the state “dead.” A simple way to achieve this is a DCE design in which respondents are presented with one bad health state at a time and asked whether they consider it better or worse than being dead. The value difference between these bad health states and being dead would then be estimated from the observed choice probabilities between the bad health states and being dead. Nevertheless, Flynn et al. 18 have asserted that the precision of the final estimates for the health states, in particular the region around “dead,” may be largely based on the presence of respondents who consider none of the presented health states to be worse than dead. The problem is that the DC model will then not accurately capture the error distribution and will therefore produce biased estimates. Furthermore, under random utility theory, responses of those who consider all life worth living are perceived to reflect an infinite value difference between health states and dead. This is not necessarily an accurate representation of their preferences, and it causes an estimation problem. The values derived from the DC model will then depend on the proportion of respondents who exhibit this preference.

These problems in estimating DC models are less likely to arise in studies comparing health states to each other rather than to being dead. By mixing these two designs, the ability to relate the health state values to being dead may be maintained, while limiting (not omitting) the effect of the aforementioned biases. The procedure has been demonstrated by McCabe et al. 16 and Salomon 5. These authors mixed the state “dead” in the choice set as a health state, so that a parameter for the state “dead” is estimated as part of the model.

Because none of the various methods to anchor DC-derived values on the full health–dead scale required for QALY computation is without problems, it is hard to say which strategy should be used. Experimentation with the various anchoring strategies is therefore required, and convergence with alternative methods for health state valuation needs to be explored, both to give advice on this matter and to see whether any of the proposed strategies can produce health state values that the research community may accept.

This article considers the application of DC modeling for deriving health state values. Research on novel, enhanced, and feasible measurement tools is conducted by the EuroQol group to support improvement of the group's health status measurement instrument, the EQ-5D. This work is motivated by the perceived limitations of the traditional valuation techniques and by the prospects of DC models for health state valuation. We analyzed congruence across methods (DC, rank, VAS, and TTO) and across samples with the aim of determining whether DC modeling produces value estimates that are comparable to traditional methods. The main focus of the study was to compare DC values to values elicited with the standard TTO technique.

Methods

EQ-5D States

The EuroQol EQ-5D is a generic measurement instrument to describe and value health states 19. The EQ-5D classification describes health states according to five attributes: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each attribute has three levels: “no problems,” “some problems,” and “severe problems.” Health state descriptions are constructed by taking one level for each attribute, thus defining 243 (3^5) distinct health states, where “11111” represents the best and “33333” the worst state. An EQ-5D health state may be converted to a single summary index by applying a formula that essentially attaches weights to each of the levels in each dimension. This formula reflects the values of EQ-5D health states as obtained from respondents in a sample of interest. Usually, this is a representative sample of the general population, but in the current study, both a student sample and a general population sample were used.

Not all EQ-5D states were included in the experiment. We constructed a DCE of 60 pairs of EQ-5D states, following the methodology described below. For the three other judgmental tasks in our study protocol, a set of 17 EQ-5D health states was selected. The set comprised five very mild, four mild, four moderate, three severe states, and state “33333.” The 17 states are: 11112, 11113, 11121, 11131, 11133, 11211, 11312, 12111, 13311, 21111, 22222, 23232, 32211, 32223, 32313, 33323, and 33333. The same 17 states were used in the Dutch EQ-5D TTO valuation study 20.

Respondents

For practical reasons, this study included a general population sample (target N = 400) and a student sample (target N = 200). The comparisons across valuation methods and of strategies for anchoring values obtained using DC models relative to dead and full health were done on the basis of student data. DC responses were also collected from the general population in order to draw tentative conclusions about the possibility of extrapolating results from the student sample to the general population.

Students were recruited at Erasmus University in Rotterdam, The Netherlands. Each student was offered €20 for participating. The general population sample consisted of members of an Internet panel. This panel included approximately 104,000 people. Stratified sampling was used to select a research sample from the panel that was representative of the Dutch general population in terms of age, sex, and education. The stratified sampling procedure was performed in three rounds, so that the final round allowed for over- or undersampling of specific groups if the desired distribution over the strata had not yet been attained. The incentive offered to the panel members consisted of a €2.50 donation to a charity chosen by the respondent and a chance to win gift certificates or other prizes in a lottery.

People in the general population sample were only administered the DCE. The students completed (in this order) the DCE, ranking, VAS, and TTO task in the presence of one of the researchers or a research assistant. To become familiar with the type of health state descriptions, all respondents were administered the EQ-5D prior to the judgmental tasks.

Judgmental Tasks

DCE. In the DCE, all respondents were presented with a forced choice between two EQ-5D states. After this paired comparison task, the students were prompted to answer a second question related to each of the two health states separately. This extra question offered “dead” as a choice, phrased as, “Would you rather be dead than living in this health state?” In the remainder of the article, we will refer to the two outcomes as DCE data and DCEdead data, respectively.

The DCE was programmed as a computer experiment. The respondents logged in to a Web site where they were presented with a number of choices between two EQ-5D states that were randomly selected from the choice set. Our general population sample received nine DCs; students received 18 DCs, and thus compared 36 states to being dead. It was a pragmatic decision to select choices randomly for each individual rather than to use a blocked design. Level balance was not a criterion for design construction either, and we were confident that systematic effects would be filtered out given the large number of questions and the large sample size.

Ranking, VAS, and TTO. The ranking, VAS, and TTO tasks were performed as described in Lamers et al. 20. The valuation procedure may be summarized as follows. First, students rank-ordered the 17 EQ-5D states selected for these tasks, supplemented with “dead” and state “11111,” by putting the card with the “best” health state on top and the “worst” one at the bottom. Next, students valued the rank-ordered health states on the EuroQol VAS using a bisection method that specified the order in which various states needed to be valued. The TTO valuation task followed the VAS valuation. TTO was executed using a computer-assisted personal interviewing method that followed standard TTO protocols based on the original UK study protocol 21. This implies that the health states were presented in random order, that the TTO task was facilitated by a visual aid, and that the respondents were led by a process of outward titration to select a length of time t in state “11111” (perfect health) that they regarded as equivalent to 10 years in the target state (for states better than dead) or to select a length of time (10-t) in the target state followed by t years in state “11111” (for states worse than dead).

Experimental Design of the DCE

The DCE design was constructed using a Bayesian efficient approach, which to our knowledge has not been applied in health economics before. Most DCEs in health economics have applied orthogonal designs. These allow the uncorrelated estimation of main effects, assuming that all interactions are negligible. A limitation of orthogonal designs is that orthogonality is compromised if, for the purpose of data analysis, categorical multilevel variables need to be transformed into a set of dummy variables. Moreover, in optimal orthogonal designs, the efficiency of the design is optimized for the situation that choices are made randomly. This is true under the restrictive assumption that the estimates of the parameters in the utility model are equal to zero (β = 0). This implies that two choice options within a pair have a 50% probability of being preferred, irrespective of their attribute levels. If β = 0 does not hold, the design will not be optimally efficient for producing information in regard to the true parameter effects 22, 23. Both issues with orthogonal designs apply to EQ-5D valuation, so we decided to look elsewhere.
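The β = 0 argument can be illustrated with a minimal sketch of the binary logit choice probability. The function name, the two-attribute coding, and the coefficient values are hypothetical and serve only to show the mechanism; they are not taken from the study.

```python
import math

def choice_probability(beta, x1, x2):
    """Binary logit probability that option 1 is chosen over option 2."""
    v1 = sum(b * x for b, x in zip(beta, x1))
    v2 = sum(b * x for b, x in zip(beta, x2))
    return 1.0 / (1.0 + math.exp(v2 - v1))

# Hypothetical two-attribute coding (1 = problem present, 0 = absent).
mild = [1, 0]
severe = [1, 1]

# With all parameters equal to zero, attribute levels do not matter.
print(choice_probability([0.0, 0.0], mild, severe))    # 0.5

# With nonzero (illustrative) parameters, the milder option is favored.
print(choice_probability([-0.1, -0.4], mild, severe))
```

Under β = 0 every pair is equally informative, which is why a design optimized for that case can lose efficiency once the true parameters differ from zero.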

To construct a Bayesian efficient design, a computer algorithm was used (see Appendix at: http://www.ispor.org/Publications/value/ViHsupplementary/ViH13i8_Stolk.asp) that was obtained at that time from Rose and Bliemer, and described in 24, but which is publicly available now in the software package nGENE. The algorithm entailed an iterative procedure whereby a great many designs, each with the desired number of choice situations, were randomly selected from the full factorial design and compared by their D-error, which was computed on the basis of expected values of the model parameters. In the Bayesian framework, these expected values are known as priors. Because the priors were not perfectly known, they were included as distributions from which they were sampled rather than as point estimates in the design algorithm. This way, when priors deviate from their expected values, the impact on the efficiency of the design is minimized. To that end, the Bayesian efficient design algorithm uses nested Monte Carlo simulation. The best design remaining after 2000 iterations, each containing 1000 draws for the priors, was selected for this study. The probability that this design is the optimal one is small because a more efficient design is likely to exist. Even if not optimal, the design will still be efficient, given the large number of iterations in the Monte Carlo simulation.
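The nested simulation can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the Rose and Bliemer algorithm or the nGENE implementation: it uses only the 10 main-effect priors from Table 1 (no N3 term, constant, or interactions), assumes normally distributed priors with a standard deviation of 20% of the mean, evaluates the D-error of a binary logit model, and runs far fewer iterations and draws than the 2000 and 1000 reported above. All function names are hypothetical.

```python
import numpy as np

# Table 1 priors for the 10 main-effect dummies, ordered MO2, MO3, ..., AD2, AD3.
PRIOR_MEANS = np.array([-0.108, -0.434, -0.140, -0.346, -0.090,
                        -0.240, -0.147, -0.463, -0.119, -0.354])

def encode(state):
    """Dummy-code a five-digit EQ-5D state; level 1 is the reference."""
    x = np.zeros(10)
    for dim, level in enumerate(int(c) for c in state):
        if level > 1:
            x[2 * dim + level - 2] = 1.0
    return x

def d_error(diffs, beta):
    """D-error of a paired design under a binary logit model."""
    p = 1.0 / (1.0 + np.exp(-diffs @ beta))              # P(first option chosen)
    info = diffs.T @ ((p * (1.0 - p))[:, None] * diffs)  # Fisher information
    sign, logdet = np.linalg.slogdet(info)
    if sign <= 0:
        return np.inf                                    # parameters not identified
    return float(np.exp(-logdet / len(beta)))            # det(info^-1)^(1/K)

def bayesian_d_error(design, rng, n_draws=200, rel_sd=0.20):
    """Average D-error over draws from the (assumed normal) prior distributions."""
    diffs = np.array([encode(a) - encode(b) for a, b in design])
    draws = rng.normal(PRIOR_MEANS, rel_sd * np.abs(PRIOR_MEANS),
                       size=(n_draws, PRIOR_MEANS.size))
    return float(np.mean([d_error(diffs, beta) for beta in draws]))

def random_design(rng, n_pairs=60):
    """Draw pairs of states at random (with replacement) from the full factorial."""
    states = ["".join(str(d) for d in rng.integers(1, 4, size=5))
              for _ in range(2 * n_pairs)]
    return list(zip(states[::2], states[1::2]))

rng = np.random.default_rng(0)
best_design, best_error = None, np.inf
for _ in range(50):                                  # outer loop: candidate designs
    candidate = random_design(rng)
    error = bayesian_d_error(candidate, rng)         # inner loop: prior draws
    if error < best_error:
        best_design, best_error = candidate, error
print(round(best_error, 3))
```

Averaging the D-error over prior draws, rather than evaluating it at the prior means only, is what makes the selected design robust to misspecified priors.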

The DC model we intended to estimate included main effect terms for the five categorical three-level EQ-5D domains (transformed into a set of 10 dummies) and the so-called N3 term. This is a nonmultiplicative interaction term that is frequently used in EuroQol valuation models. It allows for measuring the “extra” disutility when reporting severe (level 3) problems on at least one EQ domain 19. In addition, it was considered that the model would need to include an alternative specific constant as recommended in the literature 25 to control for unobserved systematic effects on choices, such as a tendency to always choose the same option. Accordingly, based on degrees of freedom, a minimum number of 12 pairs are required to estimate all model parameters. It was decided to increase this number to 60 pairs to allow for extension of the model with interaction terms, if relevant.
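The coding of a health state into the 10 main-effect dummies and the N3 term can be sketched as below. The function and variable names are hypothetical; the dummy labels follow Table 1.

```python
DOMAINS = ["MO", "SC", "UA", "PD", "AD"]

def design_row(state):
    """Return the 10 main-effect dummies plus N3 for a five-digit EQ-5D state."""
    levels = [int(c) for c in state]
    row = {}
    for name, level in zip(DOMAINS, levels):
        row[name + "2"] = int(level == 2)   # "some problems"
        row[name + "3"] = int(level == 3)   # "severe problems"
    row["N3"] = int(3 in levels)            # any domain at level 3
    return row

row = design_row("32211")
print([name for name, value in row.items() if value])  # ['MO3', 'SC2', 'UA2', 'N3']
```

These 11 terms plus the alternative specific constant give the 12 parameters that determine the minimum number of pairs.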

The priors for the main effects were obtained by taking the weighted average of the parameter estimates from three TTO-based EQ-5D studies 20, 21, 26. We used a standard error of 20% surrounding these priors to account for the possibility that parameter estimates modeled on the basis of DCE data might be different from those elicited with TTO. The prior parameter estimates of the interactions were set to 0 (Table 1).

Table 1. Model parameters for the Bayesian efficient design

Main effects*   Priors for main effects   Interactions (priors = 0)
MO2             −0.108                    MO2*SC2   SC2*UA2   UA2*PD2   PD2*AD2
MO3             −0.434                    MO2*SC3   SC2*UA3   UA2*PD3   PD2*AD3
SC2             −0.140                    MO2*UA2   SC2*PD2   UA2*AD2   PD3*AD2
SC3             −0.346                    MO2*UA3   SC2*PD3   UA2*AD3   PD3*AD3
UA2             −0.090                    MO2*PD2   SC2*AD2   UA3*PD2
UA3             −0.240                    MO2*PD3   SC2*AD3   UA3*PD3
PD2             −0.147                    MO2*AD2   SC3*UA2   UA3*AD2
PD3             −0.463                    MO2*AD3   SC3*UA3   UA3*AD3
AD2             −0.119                    MO3*SC2   SC3*PD2
AD3             −0.354                    MO3*SC3   SC3*PD3
                                          MO3*UA2   SC3*AD2
                                          MO3*UA3   SC3*AD3
                                          MO3*PD2
                                          MO3*PD3
                                          MO3*AD2
                                          MO3*AD3

* The abbreviations MO2 to AD3 represent the five categorical three-level EQ-5D domains transformed into a set of 10 dummies. The first level (no problems) was used as reference category.

The algorithm produced a design of 60 pair-wise comparisons of two EQ-5D states. To further improve the design, we identified and altered dominant choices in which logical consistency predicts that one alternative will always be preferred. Nine dominant choices were identified. In five pairs, the worse state was improved to escape from dominance; in the other four, the better state was made worse. The alterations were made randomly, but in accordance with the following rules: 1) the D-efficiency of the design was improved with the alterations; and 2) the new health state was not yet included in the choice set. This strategy resulted in a choice set of 60 pairs including 106 unique health states (94 states were included once, 10 twice, and 2 three times). The final set of 60 pairs is presented in Table 2. The D-error of this design was 1.11.
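A dominance check of the kind described here can be sketched in a few lines. The helper names are hypothetical; the first example pair is choice 3 from Table 2, and the second is an invented dominant pair.

```python
def dominates(a, b):
    """True if state a is at least as good as b on every domain and better on one."""
    pairs = list(zip(a, b))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)

def is_dominant_pair(a, b):
    """True if logical consistency predicts the same choice for every respondent."""
    return dominates(a, b) or dominates(b, a)

print(is_dominant_pair("11112", "12221"))  # False: each state is better on some domain
print(is_dominant_pair("21111", "22312"))  # True: the first state is never worse
```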

Table 2. Final set of 60 pairs of EQ-5D health states for the discrete choice experiment (asterisk marking the nine states that were manually altered)

Choice  Option 1  Option 2    Choice  Option 1  Option 2
1       21231     22323       31      13211     21233
2       23223     31113       32      33311     22133
3       11112     12221       33      32112     23312
4       33322     23312       34      21112     22111
5       22331     23233       35      32211     13333
6       32133     22312       36      13131     13113
7       33123*    22233*      37      22313     23231
8       23212     32121       38      31313     32231
9       32322     33131       39      12123     33321
10      11231     32111*      40      22311     32123
11      33222     11312       41      11133     21123
12      13122     21212       42      31311     21313
13      22221     13212       43      21212     32213
14      22312     11212       44      11121     22112*
15      22132     12321       45      13313     31221
16      12332     31333       46      21321*    12111
17      22333     33332       47      33323     23122
18      31222     12112       48      11223     32321
19      31131     13111       49      23313     32222
20      12233     13132       50      31323     22321
21      31131     12121       51      33113*    32332
22      33131     21323       52      22131     21212
23      33122     31132       53      23222     31113
24      11133     32211*      54      12222     33121
25      12231     21121       55      31132     21333
26      12312     13131       56      12213     31232
27      21111*    11311       57      23312     13123
28      11223     12313*      58      21211     32313
29      13231     31231       59      31133     21331
30      31123     12212       60      13321     13231

Analysis

Observed values derived from rank, VAS, and TTO responses for 17 states. The rank data were analyzed using the “law of comparative judgment” (LCJ) model, as introduced by Thurstone 27, 28. To model the rankings within the Thurstonian framework, the rankings are transformed (“exploded”) into paired comparisons. The analytical procedure assumes that the difference in value between two health states can be inferred from the proportion (i.e., probability) of respondents who preferred one health state to another. The resulting matrix of probabilities is subsequently transformed into Z values (i.e., standard normal deviates). The LCJ values are obtained by taking the mean of each column of the Z matrix, as described by Krabbe 28.
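The LCJ steps (explode rankings, compute proportions, transform to Z values, average the columns) can be sketched as follows. This is a minimal illustration rather than the procedure of Krabbe 28 in detail: the function name is hypothetical, the rankings are invented, and the clipping of proportions of 0 and 1 is an assumption made here so that the normal deviates stay finite.

```python
from itertools import combinations
from statistics import NormalDist

def lcj_values(rankings, states, clip=0.01):
    """Scale values from rankings (best first) via exploded paired comparisons."""
    n, total = len(states), len(rankings)
    index = {s: i for i, s in enumerate(states)}
    # above[i][j]: number of respondents who ranked column state j above row state i
    above = [[0] * n for _ in range(n)]
    for ranking in rankings:
        for better, worse in combinations(ranking, 2):   # "explode" the ranking
            above[index[worse]][index[better]] += 1
    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Clip proportions of 0 or 1 so the normal deviate stays finite.
                p = min(max(above[i][j] / total, clip), 1.0 - clip)
                z[i][j] = NormalDist().inv_cdf(p)
    # The value of each state is the mean of its column in the Z matrix.
    return {s: sum(z[i][index[s]] for i in range(n)) / n for s in states}

# Hypothetical rankings of three states by four respondents (not study data).
rankings = [["A", "B", "C"], ["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
values = lcj_values(rankings, ["A", "B", "C"])
print({s: round(v, 2) for s, v in values.items()})
```

The resulting values lie on an interval scale with an arbitrary origin, so only differences between states are meaningful.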

Mean VAS and TTO values were obtained with approaches commonly used in EQ-5D valuation studies (described, e.g., in 20, 21). Observed VAS values were obtained on a scale with the end points “best imaginable health” (= 100) and “worst imaginable health” (= 0). To use these values in health state valuation, they need to be rescaled such that state “11111” has a value of 1 and being dead has a value of 0. Rescaling was performed at the respondent level on the basis of the observed VAS scores for the various health states, and the scores that were recorded for “dead” and “perfect health,” using the following equation 19:

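\[
V_{\text{rescaled}}(s) = \frac{V_{\text{VAS}}(s) - V_{\text{VAS}}(\text{dead})}{V_{\text{VAS}}(11111) - V_{\text{VAS}}(\text{dead})}
\]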

The same procedure that was applied in the Dutch valuation study 20 was used for estimating values from TTO responses. For states regarded as better than dead, the TTO value is t/10; for states worse than dead, values are computed as −t/(10 − t). These negative values were subsequently bounded at −1 with the commonly used transformation v′ = v/(1 − v). Linear regression analysis was used to interpolate values for all EQ-5D states from the values for the 17 states that were observed.
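The TTO scoring rules above can be written out as a short sketch. The function name and the handling of t = 10 (taken as the limit of the transformation) are assumptions made for illustration.

```python
def tto_value(t, better_than_dead):
    """TTO value from the indifference time t (years, 10-year horizon)."""
    if better_than_dead:
        return t / 10.0
    if t >= 10.0:
        return -1.0                # limit of the transformation as t approaches 10
    raw = -t / (10.0 - t)          # unbounded value for a state worse than dead
    return raw / (1.0 - raw)       # v' = v / (1 - v), bounded at -1

print(tto_value(7.5, True))    # 0.75
print(tto_value(5.0, False))   # -0.5
```

Substituting v = −t/(10 − t) into v′ = v/(1 − v) simplifies to −t/10, which makes clear why the transformed values cannot fall below −1.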

Estimated value prediction models and rescaling methods. For the TTO task, the predicted values for all 243 EQ-5D states were derived after interpolation from the values for the 17 states that were included in the TTO task. The TTO model included an intercept, interpreted as any deviation from full health, as well as dummy variables for the 10 main effects and for the N3 parameter.

We modeled and rescaled DCE-derived values in two different ways. The applied DC models were a conditional logit model (estimated only on the DCE data, Stata: clogit) and a rank-ordered logit model (estimated on DCE and DCEdead data, Stata: rologit), as explained below.

Neither the TTO nor the DC model adjusts for the fact that there are several observations per respondent.

First, we used the conditional logit model to analyze the DCE data obtained from the 60 pair-wise comparisons of EQ-5D states. The model included dummy variables for the 10 main effects and the N3 parameter. The values derived from this model are on an undefined scale. To link the DCE-derived health state values to the QALY scale, we used TTO values for the worst health state (33333) and the best health state (11111) as anchor points for rescaling. For the general population, we used TTO values obtained from the Dutch EQ-5D valuation study (i.e., −0.329 for state 33333, as reported in 20). For the student sample, we used the empirical TTO values derived in this study. We will refer to the resulting values as the DC values.
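One way to read this anchoring step is as a linear interpolation between the two anchor states, sketched below. The function name, the latent utilities, and the assumption that state 11111 is anchored at 1 are illustrative; only the value −0.329 for state 33333 comes from the text.

```python
def rescale_to_anchors(u, u_best, u_worst, v_best=1.0, v_worst=-0.329):
    """Map a latent utility onto the QALY scale by linear interpolation."""
    share = (u - u_worst) / (u_best - u_worst)   # 0 at state 33333, 1 at state 11111
    return v_worst + share * (v_best - v_worst)

# Hypothetical latent utilities from a conditional logit model (not study results).
u_11111, u_22222, u_33333 = 0.0, -2.0, -6.0

print(round(rescale_to_anchors(u_11111, u_11111, u_33333), 3))  # 1.0
print(round(rescale_to_anchors(u_22222, u_11111, u_33333), 3))  # 0.557
print(round(rescale_to_anchors(u_33333, u_11111, u_33333), 3))  # -0.329
```

The transformation preserves the relative spacing of the DC-derived utilities and borrows only the two end points from TTO.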

Alternatively, we derived health state values from the DCE data on the QALY scale by anchoring the values on the value for being dead (thus: 0). For this purpose, we modeled the information obtained from both the DCE and DCEdead data. The data of these two response tasks were combined to infer how the respondent would have rank ordered the two EQ-5D states and “dead” from most to least preferred. These rank orderings were analyzed using a rank-ordered logit model. Besides the dummy variables for the 10 main effects and the N3 parameter, this model also includes a parameter for the state of being dead, which can be used to rescale the values and put them on the full health–dead (1–0) scale, as demonstrated by McCabe et al. 16. The value for being dead is anchored at zero by dividing all coefficients by the coefficient for “dead.” By additionally restricting the value of full health to 1, values are produced in the 0 to 1 range for states better than dead, and negative values for states worse than dead. We will refer to the resulting value set as DCdead.
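The rescaling on the coefficient for “dead” can be sketched as follows. This is one reading of the procedure, assuming that state 11111 has a latent utility of 0 (all dummies zero) and that the coefficient for “dead” is negative; the function name and all numbers are hypothetical.

```python
def dcdead_value(latent_utility, beta_dead):
    """Value on the full health-dead scale: 1 for state 11111, 0 for dead."""
    # latent_utility: sum of the state's coefficients (0 for state 11111).
    # beta_dead: the (negative) coefficient estimated for "dead".
    return 1.0 - latent_utility / beta_dead

# Hypothetical estimates (not study results).
beta_dead = -4.0
print(dcdead_value(0.0, beta_dead))    # 1.0   (state 11111)
print(dcdead_value(-4.0, beta_dead))   # 0.0   (as bad as dead)
print(dcdead_value(-5.0, beta_dead))   # -0.25 (worse than dead)
```

Because the scale is fixed by the estimated coefficient for “dead,” any bias in that coefficient carries over to every rescaled value.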

Across-method and across-sample comparison. Intraclass correlation coefficients (mixed model, average measures) and mean absolute differences were computed to estimate the degree of correspondence between different methods. The intraclass correlation coefficients were also used to compare the DC-derived values of students and the general population. Except for the DC model (Stata 10 SE), all statistical analyses were performed in SPSS (V. 17.0; Chicago, IL).
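Both agreement statistics can be sketched for a pair of methods as below. The text does not state whether the intraclass correlation was of the consistency or absolute agreement type; the sketch assumes the consistency form, ICC(3,k), for two methods. Function names and values are hypothetical.

```python
def mean_absolute_difference(a, b):
    """Mean absolute difference between two sets of values for the same states."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def icc_consistency_average(a, b):
    """ICC(3,k) for k = 2 methods: two-way mixed, consistency, average measures."""
    n, k = len(a), 2
    rows = list(zip(a, b))
    grand = (sum(a) + sum(b)) / (n * k)
    ss_rows = k * sum((sum(r) / k - grand) ** 2 for r in rows)      # between states
    ss_cols = n * sum((sum(c) / n - grand) ** 2 for c in (a, b))    # between methods
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows

# Hypothetical values for four states under two methods (not study results).
tto = [0.90, 0.70, 0.40, -0.10]
dc = [0.85, 0.75, 0.35, -0.05]
print(round(mean_absolute_difference(tto, dc), 3))
print(round(icc_consistency_average(tto, dc), 3))
```

The two statistics are complementary: the intraclass correlation reflects how similarly the methods order and space the states, while the mean absolute difference reflects how far apart the values are in absolute terms.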

Results

Respondents

Data were elicited in a sample of 444 persons in the general population and 209 students. The general population sample was representative in terms of sex, age, and level of education (Table 3). All students completed the rank, VAS, and TTO tasks. They also completed the DC task, but because of a problem with data storage, responses of five students were not saved. DC responses of those who continually chose only one option were removed from the data set. This applied to six people in the general population sample and none of the students. The DC model was therefore estimated on responses of 204 students and 438 people in the general public. Their responses included no missing values.

Table 3. Characteristics of the two samples

                        Sample (N = 444)   General population norms* (%)   Students (N = 209)
Male, % (N)             48.2 (214)         50.1                            30.6 (64)
  18–24                 3.8 (17)           5.9                             79.7 (51)
  25–34                 7.9 (35)           9.0                             18.8 (12)
  35–44                 10.8 (48)          11.3                            1.5 (1)
  45–54                 9.7 (43)           10.1                            —
  55–64                 10.4 (46)          8.6                             —
  65–74                 5.6 (25)           5.2                             —
Female, % (N)           51.8 (230)         50.0                            69.4 (145)
  18–24                 4.7 (21)           5.8                             82.7 (120)
  25–34                 9.2 (41)           9.0                             16.5 (24)
  35–44                 11.5 (51)          11.1                            0.8 (1)
  45–54                 10.4 (46)          9.9                             —
  55–64                 10.1 (45)          8.5                             —
  65–74                 5.9 (26)           5.7                             —
Marital status, % (N)
