EQ-5D-5L measurement properties are superior to EQ-5D-3L across the continuum of health using US value sets

Theoretical value set characteristics

Of the three available US value sets for the EQ-5D, the 5L value set had the largest range of scale, 1.573 (vs. 1.109 for both the 3L and crosswalk value sets). The 5L value set also assigned the largest percentage of health states index values less than 0, i.e., worse than dead (WTD): 19.8% of 5L health states were WTD, compared with only 1.2% and 4.1% of health states in the crosswalk and 3L value sets, respectively (Table 2). The 5L value set also had the smallest utility difference between 11111 and the health state with the next highest utility value: 0.057, compared with 0.112 for the crosswalk and 0.140 for the 3L value set. The mean single-level transition across all health states was largest for the 3L value set, with a mean (SD) of 0.111 (0.029). The crosswalk had the smallest mean single-level transition, 0.061 (0.017), whereas the corresponding value for the 5L was 0.078 (0.014).
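
The Table 2 characteristics can be computed directly from a value set. The sketch below is a minimal illustration, assuming a purely additive 5L-style value set with made-up (hypothetical) decrements in `DECREMENTS`; published value sets may include constants or interaction terms that it omits, so its outputs are not the Table 2 values. It enumerates all 3125 states and reports the range of scale, the percentage of WTD states, and the gap between 11111 and the next-best state.

```python
from itertools import product

# HYPOTHETICAL additive value set for illustration only (not a published
# US value set). DECREMENTS[d][k] is the utility decrement for level k + 1
# on dimension d; level 1 carries no decrement.
DECREMENTS = [
    [0.00, 0.05, 0.10, 0.25, 0.30],  # mobility
    [0.00, 0.05, 0.09, 0.20, 0.25],  # self-care
    [0.00, 0.04, 0.08, 0.20, 0.22],  # usual activities
    [0.00, 0.06, 0.10, 0.30, 0.35],  # pain/discomfort
    [0.00, 0.07, 0.12, 0.32, 0.35],  # anxiety/depression
]
FULL_HEALTH = (1, 1, 1, 1, 1)


def utility(state):
    """Index value of a 5L state given as a tuple of levels, e.g. (1, 2, 1, 3, 1)."""
    return 1.0 - sum(DECREMENTS[d][level - 1] for d, level in enumerate(state))


def value_set_characteristics():
    """Range of scale, % worse than dead (WTD), and gap below 11111."""
    values = {s: utility(s) for s in product(range(1, 6), repeat=5)}
    best = values[FULL_HEALTH]
    worst = min(values.values())
    next_best = max(v for s, v in values.items() if s != FULL_HEALTH)
    n_wtd = sum(1 for v in values.values() if v < 0)
    return {
        "n_states": len(values),  # 5**5 = 3125 states
        "range_of_scale": best - worst,
        "pct_wtd": 100.0 * n_wtd / len(values),
        "gap_below_11111": best - next_best,
    }


print(value_set_characteristics())
```

With these invented decrements the worst state (55555) scores 1 - 1.47 = -0.47, so the range of scale is 1.47, and the gap below 11111 equals the smallest level-2 decrement (0.04).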

Table 2 Theoretical value set characteristics

Fig. 1 plots the mean single-level transition from each health state in each value set against the EQ-5D index of the starting health state. In these scatterplots, the 5L value set demonstrated better interval measurement properties: its mean single-level transitions were closest to the overall mean and remained consistent throughout the range of health state severity (as measured by level sum score). The 3L and crosswalk value sets each have a clear outlier at the mildest health state (11111), caused by the relatively large distance between 11111 and the next best health state: 0.140 for the 3L and 0.112 for the crosswalk (Table 2).
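
A single-level transition can be sketched as follows, assuming (the section does not spell this out) that a transition moves one level worse on exactly one dimension. The hypothetical function `single_level_transitions` returns both the per-state means of the kind plotted in Fig. 1 and the pooled mean across all transitions used for the horizontal reference line; the toy two-dimension, three-level value set is invented for illustration.

```python
from statistics import mean


def single_level_transitions(values, n_levels):
    """Return (per-state mean transition, pooled mean over all transitions).

    ASSUMPTION: a single-level transition moves one level worse on exactly
    one dimension. `values` maps level tuples to index values.
    """
    per_state, pooled = {}, []
    for state, u in values.items():
        diffs = []
        for d, level in enumerate(state):
            if level < n_levels:  # the worst level has no worse neighbour
                worse = state[:d] + (level + 1,) + state[d + 1:]
                diffs.append(abs(u - values[worse]))
        if diffs:  # the all-worst state has no transitions and is skipped
            per_state[state] = mean(diffs)
            pooled.extend(diffs)
    return per_state, mean(pooled)


# Toy 2-dimension, 3-level additive value set (hypothetical numbers).
D0, D1 = [0.0, 0.1, 0.3], [0.0, 0.2, 0.25]
toy = {(a, b): 1.0 - D0[a - 1] - D1[b - 1] for a in (1, 2, 3) for b in (1, 2, 3)}
per_state, overall = single_level_transitions(toy, n_levels=3)
print(per_state[(1, 1)], overall)
```

Plotting `per_state` against the starting index value gives a scatterplot of the same form as Fig. 1; an unusually large step out of the best state shows up as an outlier at the top of the index range.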

Fig. 1

Mean single-level transitions by utility of starting health state. Each panel depicts the scatterplot of mean single-level transitions for one US EQ-5D value set. The horizontal line in each panel is the mean single-level transition across all single-level transitions for the plotted value set.

The smoothed kernel density plots further indicated the potential for interval measurement properties (Fig. 2). The 5L value set distribution was closest to a normal distribution, with a single maximum, whereas the 3L value set had multiple local maxima. The crosswalk value set also had only a single maximum, but its distribution was skewed.
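
Counting local maxima of a kernel density estimate is one way to make the "single maximum" versus "multiple local maxima" comparison explicit. This is a plain-Python sketch with a Gaussian kernel and a normal-reference bandwidth; the kernel and bandwidth actually used for Fig. 2 are not stated here, so both are assumptions, and the toy data are hypothetical.

```python
import math
from statistics import stdev


def gaussian_kde(values, grid, bandwidth=None):
    """Gaussian kernel density estimate of `values` evaluated at `grid` points."""
    n = len(values)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * sd * n^(-1/5)
        bandwidth = 1.06 * stdev(values) * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        scale * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def count_local_maxima(density):
    """Number of interior grid points strictly higher than both neighbours."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )


grid = [i / 100 for i in range(-50, 101)]  # index values from -0.5 to 1.0
bimodal = [0.2 + k / 100 for k in range(-3, 4)] + [0.8 + k / 100 for k in range(-3, 4)]
print(count_local_maxima(gaussian_kde(bimodal, grid, bandwidth=0.05)))
```

The count is sensitive to the bandwidth: a very small bandwidth can split one mode into several, so any comparison between value sets should hold the smoothing rule fixed.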

Fig. 2

Kernel density plots. 5L: EQ-5D-5L value set (3125 health states); 3L: EQ-5D-3L value set (243 health states); crosswalk: matched 5L–3L value set (3125 health states)

Empirical value set comparison

Discriminative ability: respondent characteristics

In terms of statistical efficiency, in the US valuation data the 5L value set tended to be more discriminative across levels of general health than the crosswalk (F-statistic ratio 1.111; 95% CI 0.989–1.240) and 3L value sets (F-statistic ratio 1.102; 95% CI 0.861–1.383), although both confidence intervals include 1 (Fig. 3). Across categorical groupings of EQ VAS, the 5L value set was the most discriminative (F-statistic ratios 1.050–1.430) in both the US valuation and the parallel fielding datasets (Fig. 3).
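
The F-statistic ratio compares one-way ANOVA F statistics for the same respondents grouped the same way (e.g., by self-reported general health), with each value set supplying the index values; a ratio above 1 favours the 5L value set. The sketch below assumes this standard relative-efficiency construction and uses hypothetical index values. How the reported 95% CIs were obtained is not described in this section (bootstrapping respondents is one common option), so that step is left out.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [v for g in groups for v in g]
    grand, k, n = mean(values), len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def f_statistic_ratio(groups_5l, groups_comparator):
    """Relative efficiency; > 1 favours the 5L (same respondents, same groups)."""
    return anova_f(groups_5l) / anova_f(groups_comparator)


# Hypothetical index values for the same six respondents in two health groups.
groups_5l = [[0.5, 0.6, 0.7], [0.8, 0.9, 1.0]]
groups_3l = [[0.5, 0.6, 0.7], [0.6, 0.7, 0.8]]
print(anova_f(groups_5l), anova_f(groups_3l), f_statistic_ratio(groups_5l, groups_3l))
```

Here both value sets have the same within-group spread, so the ratio reflects only how far apart each value set places the group means.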

Fig. 3

F-statistic ratios by EQ VAS and self-reported health state in parallel fielding data

Within disease states, the 5L value set was also consistently more discriminative than the 3L and crosswalk value sets across EQ VAS groupings, with few exceptions (Fig. 3). The crosswalk value set was more discriminative than the 5L value set in diabetes, rheumatoid arthritis/arthritis, and stroke, with F-statistic ratios of 0.981, 0.935, and 0.962, respectively. The remaining F-statistic ratios ranged from 1.077 to 1.513, indicating greater relative efficiency of the 5L value set over the crosswalk and 3L value sets.

Responsiveness: simulated utility values by EQ VAS

In the US valuation dataset of general population respondents, the simulated utility values of the three compared value sets were similar across the entire range of EQ VAS values (0–100) (Fig. 4a). The mean 5L utility value ranged from 0.749 (95% CI 0.732–0.764) to 0.876 (95% CI 0.866–0.885), compared with 0.790 (95% CI 0.780–0.800) to 0.871 (95% CI 0.864–0.878) for the crosswalk and 0.806 (95% CI 0.795–0.815) to 0.889 (95% CI 0.882–0.897) for the 3L (Additional file 2: Appendix B). These simulated index values are shown as ribbon plots in Fig. 4. For each value set, the dark solid line represents the mean simulated index value at a given EQ VAS, and the medium and light shading represent the interquartile range and the 95% confidence interval of the simulated index values, respectively.
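
The ribbon-plot summary described above can be sketched for one EQ VAS value at a time. The simulation that generates the draws is not described in this section, so the sketch takes the draws as given; treating the 95% band as the 2.5th to 97.5th percentiles of the draws is an assumption, and `ribbon_summary` and `ribbons` are hypothetical names.

```python
from statistics import mean, quantiles


def ribbon_summary(draws):
    """Summarise simulated index values at one EQ VAS value: mean line,
    interquartile band, and central 95% band (assumed percentile interval)."""
    quartiles = quantiles(draws, n=4, method="inclusive")
    pct = quantiles(draws, n=40, method="inclusive")  # cut points every 2.5%
    return {
        "mean": mean(draws),
        "iqr": (quartiles[0], quartiles[2]),
        "band95": (pct[0], pct[-1]),  # 2.5th and 97.5th percentiles
    }


def ribbons(draws_by_vas):
    """Map each EQ VAS value to its summary; `draws_by_vas` is {vas: [draws]}."""
    return {vas: ribbon_summary(d) for vas, d in sorted(draws_by_vas.items())}
```

Joining the per-VAS means gives the solid line, and joining the lower and upper ends of each band gives the two shaded ribbons.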

Fig. 4

Ribbon plots of simulated mean utility index values and 95% confidence intervals, by value set and EQ VAS value

Larger utility differences between value sets were observed in the parallel fielding dataset. The mean 5L utility value ranged from 0.489 (95% CI 0.465–0.512) to 0.734 (95% CI 0.716–0.750), compared with 0.630 (95% CI 0.616–0.645) to 0.783 (95% CI 0.771–0.793) for the crosswalk and 0.625 (95% CI 0.609–0.641) to 0.784 (95% CI 0.772–0.795) for the US 3L (Additional file 2: Appendix B). In the student group of the parallel fielding dataset, the three value sets produced closer utility values across the EQ VAS spectrum (Fig. 4c).

For all health conditions in the parallel fielding dataset, the 5L value set produced lower utility values than the 3L and crosswalk value sets at all EQ VAS values (Figs. 4d–k). For health conditions such as rheumatoid arthritis/arthritis, cardiovascular disease, and depression, the 5L value set may be more discriminative across different levels of health and/or more responsive to changes in health. In most health conditions, the 5L index values changed more rapidly with EQ VAS (i.e., had a steeper slope) between EQ VAS values of 25 and 75. This trend was less evident in stroke and personality disorders (Additional file 2: Appendix B; Fig. 4).
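
To put a number on a "steeper slope between EQ VAS values of 25 and 75", one option is a least-squares slope of the mean simulated utility on EQ VAS inside that window. This is an illustrative assumption rather than the authors' stated procedure, and the toy curves (flat outside the window, linear inside it) are hypothetical.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_in_vas_window(mean_utility_by_vas, lo=25, hi=75):
    """Slope of mean simulated utility per EQ VAS point within [lo, hi]."""
    points = sorted((v, u) for v, u in mean_utility_by_vas.items() if lo <= v <= hi)
    xs, ys = zip(*points)
    return ols_slope(xs, ys)


def toy_curve(slope):
    """Hypothetical mean-utility curve: flat below 25 and above 75."""
    return {v: 0.4 + slope * (min(max(v, 25), 75) - 25) for v in range(0, 101)}


print(slope_in_vas_window(toy_curve(0.006)), slope_in_vas_window(toy_curve(0.004)))
```

A slope of 0.006 per EQ VAS point corresponds to a 0.06 change in index value for a 10-point VAS change, which is the sense in which a steeper curve implies greater responsiveness.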

Discussion

This study is a key addition to the literature comparing the available US EQ-5D value sets, and it introduces a novel simulation method for comparing empirical responsiveness across the entire spectrum of health using cross-sectional data. These results demonstrated that the US 5L value set had more desirable theoretical and empirical measurement properties than the US 3L and crosswalk value sets. The improved interval measurement properties of the 5L were supported by the scatterplots of mean single-level transitions (Fig. 1) and the kernel density plots of index values (Fig. 2). These figures highlight a key benefit of the 5L value set: consistent, predictable transitions between adjacent health states across the entire scale. The crosswalk value set had the smallest mean single-level transition of the three value sets, but this can be attributed to its large number of health states (3125) spread over a shorter range of scale (1.109). Consistent with both its wider range of scale and the additional severity levels in its descriptive system, the US 5L value set was generally more discriminative than the 3L and crosswalk value sets in both datasets.

The 5L was also the most responsive of the three value sets. In the simulation analyses, differences in responsiveness between value sets were most distinct between EQ VAS values of 25 and 75, where the steeper slope of the 5L value set indicated greater responsiveness. The slopes of the compared value sets were similar at low (0–25) and high (75–100) EQ VAS values, so responsiveness distinctions were less conclusive in patients with poor and good health, respectively. However, if discriminative ability is used as a proxy for responsiveness, the 5L was more discriminative, in terms of F-statistic ratios, in both the student group of the parallel fielding data and the US valuation respondents. Both can be considered healthy groups, similar to patients with EQ VAS values greater than 75. Therefore, a key shortcoming of the 3L in healthier patients (i.e., decreased sensitivity to change) may be addressed by the 5L and its corresponding value set [9]. An evidence gap remains in understanding the measurement properties of US value sets in patients with very poor health. This could not be pursued in the current analyses because few severely ill patients (i.e., with EQ VAS values < 50) were included in the empirical datasets.

This study builds upon the study by Law et al. by applying the official US value sets and a novel method to compare instrument and value set performance [16]. The increased discriminative ability of the 5L identified in this study is generally consistent with findings from other countries and studies, including a recent empirical head-to-head comparison of value sets for multiple countries [7, 16, 22]. However, previous evidence comparing responsiveness to change between value sets is mixed: some studies reported that the 5L had improved responsiveness, while others found no difference or even reduced responsiveness [12, 23,24,25]. These discrimination and responsiveness findings may depend on disease state and/or geography [26, 27]. Further evaluations of value set responsiveness in specific disease states using longitudinal data may be limited by the scarcity of such data. The novel simulation-based method outlined in this study can be applied to cross-sectional data to investigate the responsiveness of value sets across the entire health spectrum (e.g., EQ VAS 0–100). This method offers broader insight than previous studies by showing the relative performance of measures and value sets across a wide range of levels of health. In this way, our results, and future applications of this method to other datasets, can help inform the choice of measure and value set before a clinical trial begins. The method may also be extended to comparisons of other instruments if a health anchor external to the instruments' descriptive systems is collected alongside the instruments being compared.

Based on these findings, some general consequences of the choice of descriptive system and/or value set for health measurement and cost-effectiveness can be identified. The 5L instrument and its US value set can better distinguish between patients with different levels of health. Additionally, when anchored on EQ VAS changes, changes in 5L index values over time may be greater than changes measured using the 3L and crosswalk value sets; in other words, 5L index values are more sensitive to changes or differences in health. These larger utility differences for improvements in health may also result in a lower incremental cost-effectiveness ratio if survival benefits are similar between comparators.
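
The cost-effectiveness point follows from ICER = ΔC / ΔQALY: with the same incremental cost and identical survival, ΔQALY is simply the utility difference multiplied by the years lived, so a larger utility difference gives more QALYs gained and therefore a lower ICER. A minimal undiscounted sketch with hypothetical numbers (the cost, duration, and utility gains are invented):

```python
def qaly_gain(annual_utility_gain, years):
    """QALYs gained when survival is identical in both arms (undiscounted)."""
    return annual_utility_gain * years


def icer(incremental_cost, incremental_qalys):
    """Incremental cost-effectiveness ratio: cost per QALY gained."""
    return incremental_cost / incremental_qalys


# Hypothetical: same $10,000 incremental cost and 2 years of equal survival,
# but the value sets being compared yield different utility gains.
cost = 10_000
for label, gain in [("larger utility gain", 0.05), ("smaller utility gain", 0.04)]:
    print(label, round(icer(cost, qaly_gain(gain, years=2))))
```

With these made-up inputs, the larger utility gain yields 0.10 QALYs and an ICER of about $100,000 per QALY, versus 0.08 QALYs and about $125,000 per QALY for the smaller gain.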

This study was limited by the small number of available datasets in which the same respondents completed both the 3L and the 5L. The analyses were also not conducted using trial or longitudinal data; however, analyses of such datasets would be constrained to the observed changes, whereas these analyses provide evidence on how changes in underlying health may be reflected in index values, and on the potential implications for QALYs, across the entire spectrum of observed health. The responsiveness analyses used only EQ VAS as an anchor; additional evaluations using other measures of health are needed to confirm these findings. All analyses used the EQ VAS administered after the 5L descriptive system, so an “order effect” may be present, in which the EQ VAS value was influenced by the descriptive system administered immediately before it [11]. However, the 3L EQ VAS was not available in either dataset, so sensitivity analyses could not be conducted.
