Comparison of preference-based health-related quality of life measures for chronic neck pain: a pooled analysis of data from three RCTs

STRENGTHS AND LIMITATIONS OF THIS STUDY

Our investigation indicates that the EuroQol-5 Dimension 5 Levels might marginally enhance the assessment of functional status in individuals suffering from chronic neck pain.

The outcomes of this research could aid scholars in executing health economic evaluations of musculoskeletal disorders, notably severe neck pain.

Although the three randomised controlled trials encompassed participants diagnosed with chronic neck pain, there existed a subtle variance in the inclusion criteria across studies.

A potential context effect, in which preceding items or experiences influence responses in multi-item questionnaires, may have affected participants’ answers.

Background

Musculoskeletal pain impacts patients’ health-related quality of life (HRQOL) in terms of physical health and function.1 Neck pain not only imposes a substantial socioeconomic burden on individuals’ daily lives but also disproportionately affects young and productive population groups.2 In 2017, there were 65 300 incident cases of neck pain globally, with the number of prevalent cases reaching 2.887 million.3 Consequently, there is a pressing need for an instrument that enables close and accurate evaluation and reflection of HRQOL in patients with neck pain.

HRQOL instruments, such as the EuroQol-5 Dimension (EQ-5D) or Short Form 6 Dimension (SF-6D), are particularly important in economic evaluations as they generate utility values by translating health states into index values (health utilities) used in these assessments. Given the increasing importance of economic evaluations in healthcare resources allocation, health technology assessment guidelines recommend quality-adjusted life years (QALYs) as a primary outcome due to their ability to compare different health conditions.4 5
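As a rough illustration of how index values feed into QALYs, the sketch below computes QALYs as the area under a utility-over-time curve using the trapezoidal rule. The function name `qalys` and the utility values in the example are hypothetical and are not taken from the trials analysed here.

```python
def qalys(times_in_years, utilities):
    """Area under the utility curve (trapezoidal rule), expressed in QALYs."""
    if len(times_in_years) != len(utilities) or len(times_in_years) < 2:
        raise ValueError("need at least two matching time/utility points")
    total = 0.0
    for i in range(1, len(times_in_years)):
        dt = times_in_years[i] - times_in_years[i - 1]
        if dt < 0:
            raise ValueError("time points must be in ascending order")
        # Mean utility over the interval, weighted by the interval length
        total += dt * (utilities[i] + utilities[i - 1]) / 2
    return total


# Hypothetical patient: utility 0.70 at baseline, 0.80 at 6 months and at 1 year
example_qalys = qalys([0.0, 0.5, 1.0], [0.70, 0.80, 0.80])
print(round(example_qalys, 3))
```

Because the utility values enter the sum directly, two instruments that assign different index values to the same patient will produce different QALY estimates.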

Among preference-based HRQOL measures, the EQ-5D and SF-6D are widely used. The SF-6D version 1 (SF-6Dv1), derived from the 36-Item Short Form Survey, evaluates six dimensions: physical function, role limitation, social function, pain, mental health and vitality.6 The EQ-5D-3L (EQ-5D 3 Levels), one of the most widely used generic preference-based measures, assesses HRQOL across five dimensions with three response levels, providing 243 distinct health states. Its updated version, the EQ-5D-5L (EQ-5D 5 Levels), improves granularity with five response levels per dimension, offering 3125 distinct health states.7
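The health-state counts quoted above follow from the number of response levels raised to the power of the number of dimensions (3^5 = 243 and 5^5 = 3125). A minimal sketch that enumerates the profiles is shown below; the function name `eq5d_profiles` is illustrative only.

```python
from itertools import product


def eq5d_profiles(levels):
    """All five-digit profiles, from '11111' (no problems) to the worst state."""
    digits = [str(level) for level in range(1, levels + 1)]
    return ["".join(profile) for profile in product(digits, repeat=5)]


profiles_3l = eq5d_profiles(3)
profiles_5l = eq5d_profiles(5)
print(len(profiles_3l), len(profiles_5l))  # 243 3125
```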

Researchers frequently encounter challenges when choosing between the SF-6D and EQ-5D for disease-related studies, as interventions usually offer only partial relief, and the selection of the instrument may result in variations in QALY gain. Differential psychometric properties between EQ-5D-5L and SF-6Dv1 have been previously explored in patients with low back pain (LBP) in China.8 Additionally, studies comparing the SF-6Dv1 and EQ-5D have reported inconsistencies in utility scores across different patient groups,9 particularly in East and Southeast Asian populations, where studies indicate that utility scores derived from the SF-6Dv1 and EQ-5D are not always consistent, further complicating measure selection.10 These observed inconsistencies highlight the need for further investigation to determine which preference-based HRQOL measure best reflects disease-specific characteristics in different patient populations.

In light of these measurement challenges, this study aimed to identify the preference-based HRQOL measure that best reflects disease-specific features in patients with neck pain by comparing the characteristics of different instruments, ultimately providing guidance for selecting the most appropriate HRQOL measure for future health economic analysis in this population.

Methods

Data from three multicentre randomised controlled clinical trials (RCTs) on neck pain were included for data analysis in this study. All three RCTs were conducted between 2017 and 2020 in Korea, and patients were recruited from four hospitals designated by the Ministry of Health and Welfare and one university teaching hospital. The first RCT compared treatment effectiveness between Chuna manual therapy and usual care (oral medication and physical therapy) groups for patients with chronic neck pain persisting for more than 3 months and a Visual Analogue Scale (VAS) score ≥5 (ClinicalTrials.gov NCT03294785).11 The second RCT compared the effectiveness of acupuncture combined with Doin therapy versus acupuncture alone in patients with chronic neck pain persisting for more than 6 months with a VAS score >5 (ClinicalTrials.gov NCT03558178).12 The third RCT was a comparative study on the effectiveness of pharmacopuncture and physical therapy in patients with chronic neck pain persisting for >6 months with a VAS score >5 (ClinicalTrials.gov NCT04035018).13 This pooled analysis was conducted by coauthors who were also researchers in the three original RCTs. As the analysis used only de-identified data and aligned with the original objectives of the clinical trials, additional institutional review board approval was not required. For comparative analysis between disease-specific instruments and instruments for the assessment of HRQOL, the following outcomes were collected from patients: the Numeric Rating Scale (NRS) for neck pain, VAS for neck pain, Neck Disability Index (NDI), Northwick Park Questionnaire (NPQ), EQ-5D-5L and 12-Item Short Form Health Survey (SF-12) scores. In total, data from 313 patients were collected across the three RCTs described above. In this study, the values measured at baseline and at week 5 (primary endpoint after treatment) were used. More details can be found in the study protocols and publications regarding the results of these RCTs.11–13

Pain and function outcomes

NRS and VAS scores were assessed for neck pain outcomes, and NDI and NPQ scores were assessed for function outcomes.

HRQOL outcomes

EQ-5D-5L

The EQ-5D is the most widely used preference-based measure of HRQOL. It describes health states across multiple dimensions and assigns each state a pre-determined preference score. The EQ-5D-5L questionnaire consists of five dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), each rated on five levels.14 The Korean-adapted version, which has been shown to be reliable and valid, was used in the present study,15 and the utility scores were calculated based on the EQ-5D-5L valuation study conducted on the Korean population.16

SF-6D

The utility scores of SF-6Dv1 were calculated based on data collected through the SF-12 health survey.17 The SF-12 consists of 12 items that assess eight health domains. Physical health-related domains include general health, physical functioning, role physical and body pain. Mental health-related domains include vitality, social functioning, role emotional and mental health.17 The RCTs included in this study used the Korean version of the SF-12, the validity and reliability of which have been verified.18 The SF-6Dv1 consists of six dimensions: physical functioning (six levels), role limitations (four levels), social functioning (five levels), pain (six levels), mental health (five levels) and vitality (five levels).6 However, when the SF-6Dv1 is derived from the SF-12, the dimensions differ, with physical functioning having three levels and pain having five levels.17
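To make the effect of the reduced level counts concrete, the sketch below multiplies the number of levels per dimension to obtain the number of health states each SF-6Dv1 variant can describe. The level counts are those stated above; the dictionary names are illustrative.

```python
from math import prod

# Levels per dimension for the SF-6Dv1 as described in the text
SF6D_LEVELS_SF36 = {
    "physical functioning": 6,
    "role limitations": 4,
    "social functioning": 5,
    "pain": 6,
    "mental health": 5,
    "vitality": 5,
}

# The SF-12-derived variant has fewer levels for physical functioning and pain
SF6D_LEVELS_SF12 = {**SF6D_LEVELS_SF36, "physical functioning": 3, "pain": 5}

states_sf36 = prod(SF6D_LEVELS_SF36.values())
states_sf12 = prod(SF6D_LEVELS_SF12.values())
print(states_sf36, states_sf12)  # 18000 7500
```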

Statistical analysis

Continuous variables are presented as mean and SD, and categorical variables as frequency and percentage (%). Taking into account the normality assumptions of the outcome variables (VAS, NRS, NDI and NPQ scores) and the utility scores (EQ-5D-5L and SF-6Dv1), Spearman’s rank correlation test was performed to assess the correlations between them.
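For readers unfamiliar with the statistic, the following is a minimal standard-library sketch of Spearman’s rank correlation, computed as the Pearson correlation of average ranks. It is not the code used in this study, which was run in R, and the example scores are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical disability scores and utility values for five patients
ndi_scores = [10, 20, 30, 40, 50]
utility_scores = [0.90, 0.80, 0.85, 0.60, 0.50]
rho = spearman_rho(ndi_scores, utility_scores)
print(round(rho, 3))
```

A negative coefficient, as in this toy example, means that higher disability scores tend to accompany lower utility values.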

R software packages (R V.4.1.3, www.r-project.org; R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analysis.

Patient and public involvement

Patients and the public were not involved in the design or conduct of this study.

Results

General characteristics

The data of 313 patients with neck pain were analysed. Among them, 66.77% (n=209) were women; the mean age was 43.55 years, which corresponds to that of the productive population. The mean VAS score was 59.84±11.64, the mean NDI score was 32.96±11.08 and the mean EQ-5D-5L and SF-6Dv1 scores were 0.76±0.11 and 0.68±0.12, respectively, indicating slightly higher EQ-5D-5L scores (table 1).

Table 1

Basic characteristics

Spearman’s correlation

Spearman’s correlation analyses between the pain outcomes (VAS and NRS scores) and the EQ-5D-5L score revealed negative correlation coefficients of −0.277 and −0.262, respectively, at baseline. Likewise, VAS and NRS scores were negatively correlated with the SF-6Dv1 score, showing correlation coefficients of −0.207 and −0.182, respectively, indicating a slightly stronger correlation of the pain scale scores with the EQ-5D-5L score than with the SF-6Dv1 score. The correlations were weak at baseline, when pain was severe, and became stronger as pain levels decreased.

In terms of correlations with functional outcomes, when the values at baseline and week 5 were comprehensively considered, SF-6Dv1 showed moderate negative correlations ranging from −0.487 to −0.579 with the function-related instruments NPQ and NDI, while EQ-5D-5L exhibited stronger negative correlations, ranging from −0.636 to −0.711 (table 2). The correlation between differences in function-related outcomes and utility measures was relatively lower for both instruments.

Table 2

Spearman’s correlation between pain and function scores and utility scores

Distribution differences of the utility score (comparison of distribution)

At baseline, the mean difference in scores between EQ-5D-5L and SF-6Dv1 was 0.074; an arrowhead (∧) distribution pattern was observed in the Bland-Altman plot (figure 1). When the average utility score was between 0.6 and 0.8, EQ-5D-5L scores were higher than SF-6Dv1 scores; however, when the average utility score was ≥0.8, SF-6Dv1 scores were higher than EQ-5D-5L scores. SF-6Dv1 scores showed a relatively wide distribution across the average utility score; for utility scores lower than 0.7, the difference between SF-6Dv1 and EQ-5D-5L scores increased. At week 5, the mean difference between the EQ-5D-5L and SF-6Dv1 scores decreased (mean, 0.065). In the distribution of values, EQ-5D-5L scores were still higher around the median value of the average utility score. The distribution showed a concentrated pattern in the section where the average utility score was 0.6 or higher. Notably, at week 5, the distribution of scores in the left section where the average utility score was 0.6 or lower almost disappeared, indicating that the distribution of SF-6Dv1 scores shifted to patients showing high HRQOL.
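The quantities behind a Bland-Altman plot can be sketched as follows: for each patient, the mean of the two utility scores (x-axis) and their difference (y-axis), plus the mean difference across patients. The 95% limits of agreement (mean difference ± 1.96 SD) are the conventional addition and are an assumption here, as they are not reported in the text; the example values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(scores_a, scores_b):
    """Per-patient means and differences, the bias and 95% limits of agreement."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    means = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)      # mean difference between the two instruments
    sd = stdev(diffs)       # sample SD of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
    }


# Hypothetical EQ-5D-5L and SF-6Dv1 utility scores for four patients
eq5d = [0.80, 0.70, 0.90, 0.60]
sf6d = [0.70, 0.65, 0.80, 0.55]
agreement = bland_altman(eq5d, sf6d)
print(round(agreement["bias"], 3))
```

Plotting `diffs` against `means` shows whether the gap between instruments changes across the utility range, which is how a pattern such as the arrowhead shape described above becomes visible.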

Figure 1

Bland-Altman plot for EQ-5D-5L and SF-6Dv1 scores at baseline and week 5. EQ-5D-5L, EuroQol 5-Dimension 5 Levels; SF-6Dv1, Short-Form 6-Dimensions version 1.

Item differences of the utility score (comparison of items between HRQOL-related instruments)

In the case of SF-6Dv1, at baseline, the ratio of SF-6Dv1 values of level 3 or higher was 68.37% in the domain of role limitation and 70.93% in the domain of pain (online supplemental figure 1 and table 3). At week 5, the ratio of SF-6Dv1 values of level 3 or higher was 53.03% in the role limitation domain and 32.9% in the pain domain; thus, among all the SF-6Dv1 domains, the greatest improvement was observed in terms of pain. The role limitation and pain domains exhibited a greater impact on HRQOL than did the other domains. When deriving the SF-6Dv1 scores from the collected SF-12 responses, the number of response levels in the survey items changed, making it difficult to directly compare the floor and ceiling effect (table 3).

Table 3

Distribution of levels across EQ-5D-5L and SF-6Dv1 dimensions at baseline and week 5

As shown in table 3 and online supplemental figure 2, among the five domains of EQ-5D-5L, the pain/discomfort domain had a low ceiling effect of 1.6% and 11.82% at baseline and week 5, respectively, indicating that pain/discomfort is the most prevalent problem for patients with neck pain. The self-care domain demonstrated a high ceiling effect of 69.97% and 77.32% at baseline and week 5, respectively, indicating that self-care is the domain causing the fewest problems among the domains of EQ-5D-5L. Among the 313 patients, at baseline, 1.28% of the patients responded that they had no problems in any of the five domains of EQ-5D-5L, and at week 5, 9.90% of the patients responded that they had no problems. No floor effect was observed in the assessment using EQ-5D-5L (table 3).
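A minimal sketch of how ceiling and floor effects can be computed for a single dimension is given below. It assumes that level 1 denotes no problems (ceiling) and the highest level denotes the most severe problems (floor); the function name and the response data are hypothetical.

```python
def ceiling_floor(responses, worst_level=5):
    """Percentage of responses at the best (1) and worst level of a dimension."""
    n = len(responses)
    if n == 0:
        raise ValueError("no responses supplied")
    ceiling_pct = 100 * sum(r == 1 for r in responses) / n
    floor_pct = 100 * sum(r == worst_level for r in responses) / n
    return ceiling_pct, floor_pct


# Hypothetical self-care responses from ten patients (1 = no problems)
self_care = [1, 1, 2, 3, 1, 2, 5, 1, 2, 1]
ceiling_pct, floor_pct = ceiling_floor(self_care)
print(ceiling_pct, floor_pct)  # 50.0 10.0
```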

Discussion

EQ-5D and SF-6D are the two most commonly used instruments for economic evaluation across various diseases, with several studies comparing their use in different conditions.19 20 Both measures employ different health-state valuation methods: EQ-5D uses the time trade-off (TTO) method, whereas SF-6Dv1, derived from SF-12, uses the standard gamble method.17 21 With the introduction of SF-6Dv2, the valuation methods have expanded to include both the TTO method and discrete-choice experiments with a duration dimension.22 23 This study aimed to identify which preference-based HRQOL measure, EQ-5D or SF-6Dv1, best reflects disease-specific characteristics in patients with chronic neck pain, providing guidance for future health economic analysis.

In this study, both health utility scores were more sensitive to functional outcomes (NDI and NPQ) than to pain outcomes (VAS and NRS). While both measures have a pain dimension, they also include dimensions related to role limitation (usual activity in EQ-5D) and physical function (mobility in EQ-5D), which may explain the stronger correlation with functional outcomes than with pain. Treatments that improve functional outcomes may lead to higher utility scores compared with those that primarily address pain, potentially resulting in more favourable outcomes in health economic analysis.

This study also examined which instrument better captured disease-specific characteristics of neck pain from baseline to week 5, including all levels of neck pain severity. Unlike previous studies, such as one conducted in China comparing EQ-5D-5L and SF-6Dv1 scores at baseline for patients with LBP,24 this study compared scores at baseline and at week 5. Regardless of the HRQOL tool used, stronger negative correlations were observed in functional outcomes at the primary endpoint compared with baseline and weaker correlations in pain outcomes, with EQ-5D-5L showing slightly stronger negative correlations at all time points compared with SF-6Dv1. At baseline, when neck pain was severe, the correlation was weak; however, by week 5, when pain had reduced, the correlation for function scores (NDI and NPQ) significantly increased. This suggests that HRQOL is better reflected in patients whose pain has improved, whereas HRQOL measures may not fully capture severe neck pain.

Although there was a correlation between these utility measures and functional scores, EQ-5D-5L and SF-6Dv1 were not interchangeable due to differences in their distribution. This finding is consistent with results reported in previous LBP studies24 and systematic reviews.10 The discrepancies arise from differences in health state classifications and valuation methods, particularly in dimensions such as physical functioning, self-care and usual activities. Moreover, the inclusion of a vitality dimension in SF-6Dv1 could contribute to the distribution differences. Our study confirms that the distribution of health utility values between the two instruments varies depending on disease severity.

While a previous study compared EQ-5D-3L and SF-6Dv1 in patients with non-specific neck pain, our study differs in that we performed a head-to-head comparison between EQ-5D-5L and SF-6Dv1 at multiple time points. In the earlier study,25 EQ-5D-3L showed a wider range of values, while in our study, SF-6Dv1 exhibited a slightly broader distribution. Additionally, whereas previous research25 indicated that EQ-5D had a lower mean score compared with SF-6Dv1, our study found that EQ-5D-5L had a slightly higher mean score. Previous studies using EQ-5D-3L also identified a ceiling effect,25 and while our study anticipated potential differences with the use of the five-level version, some indications of a ceiling effect were still observed in EQ-5D-5L.

This study has several limitations. Although the three RCTs included in this study all involved patients with chronic neck pain, there were slight differences in the inclusion criteria. In two RCTs, participants had chronic neck pain for more than 3 months, while in the other RCT, patients had pain for more than 6 months. Recent publications suggest that neck pain lasting more than 12 weeks (3 months) is significant, while pain persisting for more than 6 months is considered a more stringent criterion.26

Another limitation is the potential context effect, which refers to how preceding items or experiences influence responses in multi-item questionnaires. Participants in this study may have been influenced by the context effect when answering consecutively presented questions.27 Additionally, this study used SF-6Dv1, derived from the SF-12, which has fewer levels for physical functioning and pain dimensions compared with EQ-5D-5L. SF-6Dv2,28 published in 2020, introduced methodological updates, but as the RCTs in this study were conducted between 2017 and 2020, SF-6Dv1 was the most current version available. Future studies should explore the differences between EQ-5D-5L and SF-6Dv2 to better assess health-related quality of life in patients with chronic neck pain.

Considering the correlations between functional outcomes, pain and improvements due to treatment, EQ-5D-5L appears to be a viable option for health economic evaluation in chronic neck pain. However, this study focused on patients with severe and chronic neck pain lasting more than 3 months and with NRS scores above 5, so caution is needed when applying these findings to mild neck pain. For patients with mild pain, the potential ceiling effect of EQ-5D-5L may need to be taken into account, as it may limit the detection of subtle changes in utility at baseline, possibly leading to a slight underestimation of treatment effects.

Conclusion

In conclusion, this study demonstrates that both EQ-5D-5L and SF-6Dv1 offer unique insights into health-related quality of life in patients with chronic neck pain. While both instruments are sensitive to functional improvements, EQ-5D-5L displayed a slightly stronger correlation with functional outcomes, suggesting that it may be particularly useful in economic evaluations focused on interventions that enhance physical function. However, the observed differences in properties between EQ-5D-5L and SF-6Dv1, including the potential ceiling effect in EQ-5D-5L, underscore the importance of carefully considering instrument selection based on specific disease characteristics and patient profiles.

The findings from this study offer a valuable foundation for researchers selecting preference-based utility measures in health economic analyses for musculoskeletal diseases impacting HRQOL, such as chronic neck pain. By providing clarity on the strengths and limitations of EQ-5D-5L and SF-6Dv1, this research aids in guiding instrument choice to better capture disease-specific HRQOL aspects, ultimately supporting more informed decision-making in healthcare resource allocation. Additionally, further research comparing EQ-5D-5L and SF-6Dv2 may refine HRQOL assessment, enhancing the reliability of economic evaluations in chronic neck pain and related musculoskeletal conditions.

Data availability statement

Data are available upon reasonable request from the corresponding author. The data generated during the study are not publicly available due to privacy/ethical restrictions.

Ethics statements

Patient consent for publication

Ethics approval

This study was approved by the institutional review board (IRB) of Jaseng Hospital of Korean Medicine and Kyung Hee University Korean Medicine Hospital at Gangdong before patient enrolment (JASENG 2016-09-008, JASENG 2017-08-006, JASENG 2017-08-007, JASENG 2017-08-008, KHNMCOH 2016-10-004, JASENG 2018-04-005, JASENG 2018-05-006, JASENG 2018-05-007, JASENG 2018-05-008, KHNMCOH-2018-03-003, JASENG 2019-06-008, JASENG 2019-06-009, JASENG 2019-06-010 and JASENG 2019-06-011). Each participant completed an informed consent form after receiving information regarding the trial during the first visit. To ensure the protection of trial participant data, all investigators were trained to follow the Declaration of Helsinki, Korean Good Clinical Practice Guidelines, study protocol and standard operating procedure. This pooled analysis was conducted by coauthors who were also researchers in the three original randomised controlled trials. As the analysis used only de-identified data and aligned with the original objectives of the clinical trials, additional IRB approval was not required. Participants gave informed consent to participate in the study before taking part.
