Conventional economic evaluations in health typically employ the quality-adjusted life year (QALY) framework, in which the value of a chronic health state is defined as the product of life expectancy and health-related quality of life. Health states are valued on a scale where 1 represents full health and 0 represents death or health states considered equivalent to death. This framework is useful for comparing interventions whose benefits manifest in different ways. The assumptions required for this framework are widely known and are testable (Bleichrodt et al., 1997). For example, the constant proportional trade-off assumption has been studied widely, with recent studies exploring methods of adjusting for non-linearity of utility with respect to time (Craig et al., 2018; Jonker et al., 2018a).
There are a variety of methods that have been used to estimate tariffs for health states, the key sacrifice-based methods being Time Trade-Off (TTO), Standard Gamble, and Discrete Choice Experiments (DCEs) (Brazier et al., 2007). The use of DCEs has grown significantly in health generally (Soekhai et al., 2019), and in the valuation of health states specifically (Mulhern et al., 2019). A key reason for this growth is that they produce similar results irrespective of mode of administration, a result which does not hold for TTO (Mulhern et al., 2013; Norman et al., 2010). Thus, they can be administered online, usually without an interviewer, reducing cost and increasing potential geographical spread of valuation surveys.
Discrete choice experiment studies can be subdivided into those that present health states with different life expectancies and those that either have a fixed life expectancy or do not state a life expectancy. An example of the latter is the DCE that is included as part of the EuroQol Group's standard protocol for the valuation of health states in the EQ-5D-5L (Devlin et al., 2018; Ramos-Goni et al., 2017). An issue with an approach without durations or with fixed durations is that, while it provides values on a latent scale, it does not easily anchor on the required 0-1 scale. The common solution to this, as initially advocated by Flynn et al. and subsequently introduced into the health economics literature by Bansback et al., is to include a duration attribute (Bansback et al., 2012; Flynn, 2010). This allows a quantification of the trade-offs that respondents make both between dimensions of quality of life and between quality of life and length of life; trade-offs that are essential for the estimation of QALY tariffs.
In the most commonly used DCE duration (DCEd) elicitation format (also known as DCE with duration, or DCETTO), two different impaired health states are presented with different levels of duration (cf. Figure 1). Respondents are required to base their choice between the two health states on the overall utility of each choice option, which is defined in the QALY framework as the product of the respondent's utility for each health state and the corresponding duration. Without such a multiplicative utility function, it is not possible to calculate theoretically appropriate QALY tariffs. However, a multiplicative utility function requires respondents to be able and willing to perform a series of complicated evaluations. That is, in each choice task, respondents have to evaluate the relative attractiveness of the health states, multiply each by different duration levels, and subsequently choose the option that gives them the highest overall utility. From a theoretical perspective, respondents could easily simplify the choice tasks by avoiding the required multiplication with duration and instead treating duration as a standard, additive attribute. Such a linear additive utility function would correctly take into account that a longer duration of life has positive utility to the respondent and that health problems have negative utility. Moreover, there is nothing in the choice format that prevents respondents from adopting a linear additive utility function. Hence, the use of a linear additive utility function does not violate the question that is asked of respondents. It does, however, contradict the assumption built into the QALY framework, and analyzing data under the assumption of a multiplicative utility function when respondents in fact used a linear additive utility function could induce substantial bias in the resulting QALY tariffs.
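To illustrate why the two utility functions can lead to different choices, the short Python sketch below evaluates one hypothetical choice task under both. All numbers (the health state quality weights, the health decrements, and the additive duration utilities) are assumptions made up for illustration; they are not estimates from this paper.

```python
# Hypothetical choice task: two impaired health states with different durations.
# All numbers are illustrative assumptions, not estimates from the paper.

def multiplicative_utility(quality, years):
    """QALY-style utility: health state quality multiplied by duration."""
    return quality * years

def additive_utility(health_decrement, duration_utility):
    """Linear additive utility: duration is treated as a standard attribute."""
    return health_decrement + duration_utility

# Option A: mild problems for 5 years; option B: severe problems for 10 years.
options = {
    "A": {"quality": 0.9, "years": 5, "decrement": -0.1, "duration_utility": 0.5},
    "B": {"quality": 0.4, "years": 10, "decrement": -0.6, "duration_utility": 1.2},
}

mult = {k: multiplicative_utility(o["quality"], o["years"]) for k, o in options.items()}
add = {k: additive_utility(o["decrement"], o["duration_utility"]) for k, o in options.items()}

# Each respondent type picks the option with the highest overall utility.
choice_mult = max(mult, key=mult.get)
choice_add = max(add, key=add.get)
print(choice_mult, choice_add)
```

With these assumed values, the multiplicative respondent prefers option A (4.5 versus 4.0 QALYs), whereas the additive respondent prefers option B, so the same task yields different observed choices depending on the utility function used.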
FIGURE 1. Example discrete choice experiment (DCE) duration choice task. Note that immediate death is included as an alternative-specific, third choice option in both datasets that are analyzed in this paper. Including an immediate death state is optional and not universally recommended for DCE duration valuation studies. The estimates of the immediate death parameter are not used in the main text; please see the online supplemental for an assessment of the impact of anchoring the quality-adjusted life year (QALY) tariffs based on the immediate death parameters.
Thus far, there is no evidence that substantiates that respondents actually use the theoretically imposed multiplicative utility function. At the same time, there is also no evidence that respondents are not using the theoretically required utility function, yet there is ample evidence that at least a subset of respondents in DCE research tends to simplify complicated choice tasks, for example by only focusing on a few instead of all of the included attributes (i.e., so-called attribute non-attendance, see e.g., Hole et al., 2016, Jonker et al., 2018b). Accordingly, this paper aims to establish the type of utility function that respondents most likely use in DCE-duration datasets, and aims to provide a quantitative assessment of the impact that respondents who simplify the choice tasks can have on the resulting QALY tariffs. Hence the two key purposes of this work are to (1) establish whether or not people actually make choices that match the theoretically required multiplicative utility function (as opposed to resorting to a simpler yet equally feasible linear additive utility function) and (2) estimate the impact on (i.e., bias of) the estimated QALY tariffs when not adequately taking into account that some respondents may not have used the theoretically required multiplicative utility function.
2 METHODS

2.1 Modeling approach

The general idea of the modeling approach used in this paper is that the overall utility ($U_{ijt}$) that respondent $i$ obtains from alternative $j$ in choice task $t$ can be derived from one of two a priori equally reasonable and theoretically sound utility functions, $U^{M}_{ijt}$ and $U^{A}_{ijt}$, that is,

$$U_{ijt} = \delta_i \, U^{M}_{ijt} + (1 - \delta_i) \, U^{A}_{ijt} + \varepsilon_{ijt} \qquad (1)$$

In Equation (1), the selection parameter ($\delta_i$) reflects the respondents' probabilities of having used the first utility function as opposed to the second, and the error term ($\varepsilon_{ijt}$) is assumed to be independently and identically Gumbel distributed.

The first utility function is the standard multiplicative utility function, defined as the quality of the health state ($H_{ijt}$) multiplied by the duration of life ($D_{ijt}$) in years, that is,

$$U^{M}_{ijt} = H_{ijt} \times D_{ijt} \qquad (2)$$

The quality of the health states ($H_{ijt}$) is defined as the dot product of the $K$ dummy-coded health state characteristics ($X_{ijtk}$) and preference coefficients ($\beta_k$), that is,

$$H_{ijt} = \sum_{k=1}^{K} X_{ijtk} \, \beta_k \qquad (3)$$

with the first element of $X_{ijt}$ equal to 1 and the last equal to 0 if $j = 1, 2$, and the opposite if $j = 3$. Accordingly, $\beta_1$ is defined as the perfect health intercept and $\beta_K$ as the immediate death intercept, although the latter only needs to be included if the DCE design includes immediate death as an alternative-specific, third choice option (cf. Figure 1).

The second utility function is the additive utility function, defined as the sum of the health state quality ($\tilde{H}_{ijt}$) and the utility attributed to the dummy-coded duration of life levels ($T_{ijt}$), that is,

$$U^{A}_{ijt} = \tilde{H}_{ijt} + T_{ijt} \qquad (4)$$

As before, the quality of the health states ($\tilde{H}_{ijt}$) is defined as the dot product of the $K$ dummy-coded health state characteristics excluding the perfect health intercept, which is only relevant in the multiplicative utility function ($X_{ijtk}$, $k = 2, \ldots, K$), and the associated preference coefficients ($\gamma_k$):

$$\tilde{H}_{ijt} = \sum_{k=2}^{K} X_{ijtk} \, \gamma_k \qquad (5)$$

whereas $T_{ijt}$ is defined as the dot product of the $M$ dummy-coded duration levels ($Q_{ijtm}$) and associated preference coefficients ($\lambda_m$), that is,

$$T_{ijt} = \sum_{m=1}^{M} Q_{ijtm} \, \lambda_m \qquad (6)$$

In the default model specification, all $\delta_i$ are fixed at 1. This implies that a standard multiplicative utility function is used for all respondents, which is the default modeling option for QALY tariff estimations using DCE duration data. To test the hypothesis that there is at least a subgroup of respondents that is more likely to use a linear additive utility function than the theoretically required multiplicative utility function, the alternative model specification includes the $\delta_i$ as model parameters to be estimated.
The second specification is thus a latent class model with two classes, in which the first class captures the theoretically required multiplicative utility function and the second class the linear additive utility function.
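As a rough illustration of how the two utility functions of Equations (2) to (6) enter the choice probabilities, the Python sketch below computes conditional logit probabilities under each utility function for a single choice task and mixes them with a class probability. This is not the BUGS code used for the analyses: the function names, the design values, the coefficient values, and the class probability are all hypothetical, and duration is coded with a single dummy for brevity.

```python
import numpy as np

def logit_probs(utilities):
    """Conditional logit choice probabilities (Gumbel errors) for one task."""
    expu = np.exp(utilities - np.max(utilities))  # subtract max for stability
    return expu / expu.sum()

def multiplicative_utilities(X, D, beta):
    """Health state quality (X @ beta) multiplied by duration in years."""
    return (X @ beta) * D

def additive_utilities(X, Q, gamma, lam):
    """Health decrements plus duration utility.
    The perfect health intercept (first column of X) is dropped."""
    return X[:, 1:] @ gamma + Q @ lam

# Hypothetical task with two alternatives; columns of X: intercept, two dummies.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
D = np.array([5.0, 10.0])               # durations in years
Q = np.array([[0.0], [1.0]])            # one duration dummy: 1 if 10 years
beta = np.array([0.5, -0.05, -0.30])    # assumed multiplicative coefficients
gamma = np.array([-0.2, -1.5])          # assumed additive health coefficients
lam = np.array([1.0])                   # assumed additive duration coefficient

p_mult = logit_probs(multiplicative_utilities(X, D, beta))
p_add = logit_probs(additive_utilities(X, Q, gamma, lam))

# Mixture with an assumed probability of belonging to the multiplicative class.
pi_mult = 0.25
p_mix = pi_mult * p_mult + (1 - pi_mult) * p_add
print(p_mult.round(3), p_add.round(3), p_mix.round(3))
```

In a full latent class model the mixing is applied to the likelihood of each respondent's complete sequence of choices rather than to a single task; the single-task version is used here only to keep the example short.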
In latent class models, each respondent is assigned to the different classes with his or her own probability. To establish the fraction of respondents who were more likely to have used the simpler linear additive utility function rather than the multiplicative utility function, respondents' mean estimates of the selection parameter are used. Under the default cut-off rule, respondents whose mean estimate falls below the cut-off value are considered to have used a linear additive utility function, whereas respondents whose mean estimate falls above it are considered to have used a multiplicative utility function. Additionally, to obtain an estimate of the sensitivity of the presented results, all respondents were also classified into one of three groups. These groups were (1) respondents who almost certainly used a multiplicative utility function, (2) respondents who almost certainly used a linear additive utility function, and (3) respondents for whom the observed choice tasks provide insufficient information for a sufficiently reliable classification. Four pairs of cut-off values were used to assign respondents to these groups, each more conservative than the previous one and all more conservative than the default cut-off rule.
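The classification step can be sketched in a few lines of Python. The cut-off values and the mean estimates below are placeholders chosen for illustration only, not the values used in this paper; the sketch assumes that a low mean selection parameter indicates the additive utility function, in line with the definition of the selection parameter as the probability of having used the multiplicative function.

```python
def classify(mean_selection, lower, upper):
    """Assign a respondent to a class from the mean selection parameter estimate.

    mean_selection: mean estimate of the probability of having used the
        multiplicative utility function.
    lower, upper: hypothetical cut-offs; means in between stay unclassified.
    """
    if mean_selection < lower:
        return "additive"
    if mean_selection > upper:
        return "multiplicative"
    return "unclear"

def class_shares(means, lower, upper):
    """Percentage of respondents per class, rounded to whole percentages."""
    labels = [classify(m, lower, upper) for m in means]
    return {c: round(100 * labels.count(c) / len(labels))
            for c in ("additive", "unclear", "multiplicative")}

# Made-up mean estimates for eight respondents (not data from the paper).
means = [0.05, 0.10, 0.30, 0.45, 0.55, 0.70, 0.90, 0.20]
shares = class_shares(means, lower=0.4, upper=0.6)
print(shares)
```

Widening the gap between the two cut-offs moves respondents from both outer classes into the "unclear" group, which is the mechanism behind the more conservative classification rules.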
All model specifications were programmed in the BUGS language, which means that Bayesian Markov Chain Monte Carlo (MCMC) methods were used to fit the model parameters. The following prior distributions were used:
Specification 1: The multiplicative logit model
, ∼Normal(0,0.01)
Specification 2: The latent class logit model
, ∼Normal(0,0.01), ∼Normal(0,0.01)
Both models were fitted in OpenBUGS using two MCMC chains of 20,000 draws for the multiplicative logit model and of 40,000 draws for the latent class logit model. Half of the draws were discarded as burn-in iterations, and convergence was evaluated based on a visual inspection of the MCMC chains and the diagnostics implemented in the OpenBUGS software.
Because the parameter estimates of specifications 1 and 2 are on different (latent) utility scales, they are not directly comparable. The parameter estimates on the utility scale are included in the online supplemental, whereas parameter estimates transformed onto the QALY scale are included in the main text. These QALY tariffs were calculated by dividing all elements of the preference coefficient vector of the multiplicative utility function by its first element, which represents the full-health intercept, and are directly comparable. Based on the position of the immediate death parameter it was possible to further re-scale the QALY tariffs, the estimates of which are reported in Appendix B as part of the sensitivity analysis. Note that it is not possible to calculate QALY tariffs from the linear additive specifications.
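The rescaling onto the QALY scale amounts to a single division, as the minimal Python sketch below shows. The coefficient values are hypothetical and the function name `to_qaly_scale` is introduced here for illustration only.

```python
import numpy as np

def to_qaly_scale(beta):
    """Rescale utility-scale coefficients so the full-health intercept equals 1."""
    beta = np.asarray(beta, dtype=float)
    return beta / beta[0]

# Hypothetical utility-scale estimates: full-health intercept first,
# followed by two health state decrements.
beta = [0.40, -0.04, -0.14]
tariffs = to_qaly_scale(beta)
print(tariffs.round(2))
```

Because the scale of the latent utility enters the numerator and the denominator alike, it cancels in the ratio, which is why the rescaled tariffs can be compared across model specifications.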
2.2 Datasets used

The two datasets both come from Australia and consider preferences for health states in two different instruments, specifically the EQ-5D-5L (Norman et al., 2013) and the SF-6D (Norman et al., 2014). The details of the data collection, survey design, and base case analysis are given elsewhere. Briefly, both studies conducted general population valuation studies using an online panel of respondents who had previously stated a willingness to participate in such research. Both studies asked respondents to state a preference between combinations of health states and duration and used an efficient DCE design in which different impaired health states were combined with different durations of life. Both studies had a relatively large sample size (973 for the EQ-5D-5L and 1017 for the SF-6D), and both assumed that respondents used a multiplicative utility function (i.e., that all selection parameters equal 1). The studies differed in the number of choice tasks per respondent (10 for the EQ-5D-5L and 15 for the SF-6D).
3 RESULTS

Table 1 presents the aggregated class membership percentages calculated using the various cut-off values based on the mean estimates of the selection parameter. When all respondents are assigned to either the additive or the multiplicative utility function, 76 and 71 percent of the respondents in the latent class conditional logit models are considered to have used an additive utility function in the EQ-5D and SF-6D data, respectively. Conversely, only 24 and 29 percent used the multiplicative utility function that is required for the QALY tariff calculations.
TABLE 1. Aggregated class membership (in percentages), by cut-off value

Dataset   Cut-off values             Additive utility   Unclear   Multiplicative utility
EQ-5D     Default                    76                 0         24
EQ-5D     Rule 1                     74                 5         21
EQ-5D     Rule 2                     73                 7         20
EQ-5D     Rule 3                     70                 11        19
EQ-5D     Rule 4 (most stringent)    64                 18        18
SF-6D     Default                    71                 0         29
SF-6D     Rule 1                     66                 9         25
SF-6D     Rule 2                     63                 14        23
SF-6D     Rule 3                     56                 24        20
SF-6D     Rule 4 (most stringent)    48                 33        19

When more conservative cut-off values are used and, consequently, an “uncertain” category of respondents that could have used either utility function is introduced, the percentage of respondents who are thought to have definitely used a linear additive utility function decreases. As shown in Table 1, the share of the linear additive utility function reduces from 76 to 64 and from 71 to 48 percent if the most stringent cut-off rule is applied. But with more stringent cut-off rules the percentage of respondents who are designated as having used a multiplicative utility function also decreases. Hence, irrespective of the cut-off values used, the percentage of respondents assigned to the linear additive utility function is larger than the percentage of respondents assigned to the multiplicative utility function.
Tables 2 and 3 present the calculated QALY tariffs from the EQ-5D and SF-6D datasets, respectively. Whereas the full sample results are based on all respondents, irrespective of whether they used a multiplicative or linear additive utility function, the multiplicative latent class results reflect the QALY tariffs solely derived from those who used the theoretically required multiplicative utility function. As shown, most Bayesian 95% credible intervals do not include zero and almost all parameters have the expected sign. For both datasets, the latent class QALY weight estimates are very different from those derived from the entire sample, which confirms that a model-based correction of the QALY tariff estimates for the influence of respondents who used a non-multiplicative utility function has a strong effect on the calculated QALY tariffs.
TABLE 2. EQ-5D-5L quality-adjusted life year weights*

Attributes/levels        Entire sample            Multiplicative class only
Full health              1.00 (n/a)               1.00 (n/a)
Mobility 2               −0.08 (−0.13, −0.04)     −0.15 (−0.27, −0.03)
Mobility 3               −0.10 (−0.14, −0.05)     −0.15 (−0.26, −0.03)
Mobility 4               −0.28 (−0.33, −0.24)     −0.36 (−0.50, −0.23)
Mobility 5               −0.37 (−0.42, −0.32)     −0.40 (−0.54, −0.28)
Self-care 2              −0.07 (−0.11, −0.02)     −0.03 (−0.15, 0.11)
Self-care 3              −0.10 (−0.14, −0.05)     −0.13 (−0.25, −0.00)
Self-care 4              −0.24 (−0.28, −0.20)     −0.30 (−0.42, −0.18)
Self-care 5              −0.34 (−0.39, −0.30)     −0.44 (−0.58, −0.31)
Usual activities 2       −0.11 (−0.15, −0.06)     −0.07 (−0.20, 0.07)
Usual activities 3       −0.13 (−0.17, −0.08)     −0.17 (−0.29, −0.05)
Usual activities 4       −0.29 (−0.34, −0.25)     −0.25 (−0.39, −0.12)
Usual activities 5       −0.31 (−0.35, −0.26)     −0.24 (−0.37, −0.11)
Pain/discomfort 2        −0.07 (−0.12, −0.03)     −0.13 (−0.24, −0.01)
Pain/discomfort 3        −0.08 (−0.12, −0.03)     −0.15 (−0.25, −0.03)
Pain/discomfort 4        −0.27 (−0.31, −0.22)     −0.39 (−0.52, −0.27)
Pain/discomfort 5        −0.35 (−0.40, −0.31)     −0.61 (−0.78, −0.46)
Anxiety/depression 2     −0.16 (−0.20, −0.11)     −0.30 (−0.42, −0.19)
Anxiety/depression 3     −0.24 (−0.28, −0.20)     −0.34 (−0.46, −0.23)
Anxiety/depression 4     −0.44 (−0.49, −0.39)     −0.66 (−0.84, −0.52)
Anxiety/depression 5     −0.42 (−0.47, −0.37)     −0.72 (−0.89, −0.57)
“Pits” (5-5-5-5-5)       −0.79 (−0.90, −0.69)     −1.40 (−1.80, −1.08)
Sample/class size (%)    100%                     24%

* 95% Bayesian credible intervals in parentheses.

TABLE 3. SF-6D quality-adjusted life year weights*

Attributes/levels          Entire sample            Multiplicative class only
Full health                1.00 (n/a)               1.00 (n/a)
Physical functioning 2     −0.04 (−0.07, −0.01)     −0.14 (−0.24, −0.04)
Physical functioning 3     −0.08 (−0.10, −0.05)     −0.13 (−0.23, −0.03)
Physical functioning 4     −0.14 (−0.16, −0.11)     −0.27 (−0.38, −0.17)
Physical functioning 5     −0.15 (−0.17, −0.12)     −0.36 (−0.47, −0.25)
Physical functioning 6     −0.31 (−0.34, −0.28)     −0.48 (−0.61, −0.37)
Role limitations 2         −0.09 (−0.12, −0.07)     −0.09 (−0.19, 0.01)
Role limitations 3         −0.06 (−0.09, −0.04)     −0.07 (−0.16, 0.03)
Role limitations 4         −0.13 (−0.15, −0.10)     −0.14 (−0.23, −0.05)
Social functioning 2       −0.02 (−0.05, 0.01)      0.05 (−0.06, 0.17)
Social functioning 3       −0.03 (−0.05, −0.00)     −0.01 (−0.10, 0.10)
Social functioning 4       −0.11 (−0.14, −0.09)     −0.05 (−0.14, 0.04)
Social functioning 5       −0.12 (−0.14, −0.09)     −0.17 (−0.26, −0.08)
Pain 2                     −0.08 (−0.11, −0.05)     −0.09 (−0.19, 0.02)
Pain 3                     −0.18 (−0.21, −0.16)     −0.21 (−0.31, −0.11)
Pain 4                     −0.21 (−0.24, −0.18)     −0.23 (−0.33, −0.12)
Pain 5                     −0.30 (−0.32, −0.27)     −0.39 (−0.51, −0.28)
Pain 6                     −0.29 (−0.32, −0.26)     −0.39 (−0.52, −0.27)
Mental health 2            −0.07 (−0.09, −0.04)     −0.12 (−0.22, −0.03)
Mental health 3            −0.08 (−0.11, −0.05)     −0.15 (−0.24, −0.06)
Mental health 4            −0.19 (−0.22, −0.16)     −0.33 (−0.44, −0.23)
Mental health 5            −0.29 (−0.31, −0.26)     −0.36 (−0.47, −0.27)
Vitality 2                 −0.01 (−0.03, 0.02)      −0.02 (−0.11, 0.08)
Vitality 3                 −0.04 (−0.07, −0.01)     −0.07 (−0.17, 0.04)
Vitality 4                 −0.21 (−0.24, −0.19)     −0.22 (−0.32, −0.13)
Vitality 5                 −0.26 (−0.28, −0.23)     −0.29 (−0.38, −0.19)
“Pits” (6-4-5-6-5-5)       −0.38 (−0.44, −0.32)     −0.83 (−1.07, −0.62)
Sample/class size (%)      100%                     29%

* 95% Bayesian credible intervals in parentheses.

4 DISCUSSION

4.1 Overview of results

The analyses reported here give a strong indication that, in two different datasets using different instruments, the majority of participating respondents did not use the multiplicative utility function that is required for the construction of QALY tariffs.
These respondents did not necessarily simplify the choice tasks because they were paying insufficient attention or were using a heuristic. Rather, the additive utility function correctly takes into account that more health problems are a bad thing and that more duration is a good thing, with the overall attractiveness of the profile being determined by the sum of the two (cf. Equations 4–6). Still, the presented results indicate that these respondents violated a key assumption of the QALY framework. This is an important finding because tariffs that are based on the entire sample differ substantially from tariffs that are derived from the subset of respondents who used the theoretically required multiplicative utility function. More specifically, including respondents who used a linear additive utility function in the QALY tariff calculations by assuming they used a multiplicative approach results in a sizable upward bias in the tariff, with smaller (i.e., less negative) decrements.
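The size of this difference can be illustrated with the worst EQ-5D-5L health state. The Python sketch below assumes that the QALY weight of a state equals the full-health value of 1 plus the sum of the relevant level decrements; under that assumption, the level-5 decrements from Table 2 reproduce the reported “Pits” values up to rounding of the published decrements.

```python
# Level-5 decrements copied from Table 2 (EQ-5D-5L), one per dimension:
# mobility, self-care, usual activities, pain/discomfort, anxiety/depression.
entire_sample = [-0.37, -0.34, -0.31, -0.35, -0.42]
multiplicative_class = [-0.40, -0.44, -0.24, -0.61, -0.72]

def state_value(decrements, full_health=1.0):
    """QALY weight of a health state: full health plus the level decrements.
    Assumes a main-effects tariff without interaction terms."""
    return full_health + sum(decrements)

pits_entire = round(state_value(entire_sample), 2)
pits_mult = round(state_value(multiplicative_class), 2)
print(pits_entire, pits_mult)  # -0.79 -1.41
```

The entire-sample value matches the −0.79 reported in Table 2, and the multiplicative-class value of −1.41 differs from the reported −1.40 only because the inputs are rounded to two decimals. The worst state is thus valued roughly 0.6 QALY weight units lower once respondents assigned to the additive class are excluded.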
The study has a number of strengths. The findings translate across multiple datasets, suggesting that the problem is not unique to a single instrument. Moreover, as shown in the online supplemental, the presented results are robust to the accommodation of preference heterogeneity in the modeling approach, remain robust when the estimates of the alternative-specific immediate death health state are used to anchor the calculated QALY tariffs at zero, and were confirmed, using Monte Carlo simulations, to be based on identified latent class logit models with sufficient statistical power.
The study also has several potential limitations. First, fitting latent class logit models requires adequate information to be able to distinguish, at the individual level, between the additive and multiplicative utility functions. Unlike models with a single, fixed utility function, there is limited opportunity to borrow strength from the population-level estimates, meaning that the parameters crucially rely on the information obtained from the individual-level data. In this respect, being able to fit the models as included in this paper was possible because of the efficiently optimized DCE designs in both the EQ-5D and SF-6D datasets. However, neither of the datasets that were used was specifically optimized to be able to distinguish between different utility functions. With more appropriately optimized DCE designs and/or with a larger number of choice tasks per respondent it seems reasonable to assume that fewer respondents would be classified in the intermediate “uncertain” category.
Second, respondents who neither used a multiplicative nor additive utility function are always assigned to one o