Responsiveness of the Oswestry Disability Index and Zurich Claudication Questionnaire in patients with lumbar spinal stenosis: evaluation of surgically treated patients from the NORDSTEN study

In the present study, ODI and ZCQ showed good responsiveness for assessing clinical outcomes in patients treated surgically for LSS with or without degenerative spondylolisthesis. Internal responsiveness was good for both PROMs. External responsiveness and the ability to discriminate between “success” and “non-success” were strong for follow-up and relative change scores and moderate for absolute change scores. The 30% threshold for ODI relative change used in the NORDSTEN trials was within the range of cut-off values with high accuracy.

The two PROMs, ODI and ZCQ, are both specific to spinal conditions. Because ODI was originally developed for patients with low back pain, whereas ZCQ was developed for LSS patients, the finding that both showed similar, good responsiveness in surgically treated LSS patients is noteworthy.

Follow-up score, absolute change score, and relative change score are alternative response parameters used in PROM evaluations. The relative change (percentage) score has been recommended to account for the influence of the baseline score on the outcome [17,18,19,20]. In the present study, the follow-up score and the relative change score performed better than the absolute change score, in accordance with a previous study from the Norwegian Spine Registry [20].
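
To make the difference between the three response parameters concrete, the sketch below computes them for a lower-is-better PROM such as ODI (0 to 100). It assumes improvement is coded as a positive change, so that relative change = (baseline − follow-up) / baseline × 100; the function name `response_parameters` and the two example patients are hypothetical and are not taken from the NORDSTEN data.

```python
def response_parameters(baseline: float, follow_up: float) -> dict:
    """Follow-up, absolute change and relative change (%) for a lower-is-better
    PROM such as ODI. Assumption: improvement is coded as a positive change."""
    if baseline <= 0:
        raise ValueError("relative change is undefined for a baseline score of 0")
    absolute = baseline - follow_up
    relative = 100.0 * absolute / baseline
    return {
        "follow_up": follow_up,
        "absolute_change": absolute,
        "relative_change_pct": relative,
    }


# Two hypothetical patients with the same absolute improvement of 15 points
for base, fu in [(60, 45), (30, 15)]:
    r = response_parameters(base, fu)
    print(f"baseline {base}: absolute {r['absolute_change']}, "
          f"relative {r['relative_change_pct']:.1f}%")
```

Both hypothetical patients improve by 15 points in absolute terms, but by 25% and 50% in relative terms, so only the second would reach a 30% relative-change threshold. This baseline dependence is what the relative change score is meant to address.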

Internal responsiveness was good. We found large effect sizes and SRMs for both ODI and ZCQ, of the same magnitude as reported by Fujimori et al. for patients operated on for LSS at one-year follow-up [11]. In the present study, external responsiveness evaluated by Spearman correlation coefficients was higher than that reported by Fujimori et al. (coefficients around 0.50), whereas the ROC test accuracy was similar.
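
The two internal responsiveness statistics can be sketched as follows, assuming the common definitions: effect size (ES) as the mean change divided by the standard deviation of the baseline scores, and SRM as the mean change divided by the standard deviation of the change scores. The helper names and the three-patient data are hypothetical. By Cohen's conventions, values of about 0.2, 0.5, and 0.8 are often read as small, moderate, and large.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Per-patient change; positive = improvement for a lower-is-better PROM."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up scores must be paired")
    return [b - f for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical ODI scores for three patients
baseline = [40, 50, 60]
follow_up = [20, 30, 30]
print(round(effect_size(baseline, follow_up), 2))                 # 2.33
print(round(standardized_response_mean(baseline, follow_up), 2))  # 4.04
```

The two statistics share a numerator and differ only in the denominator, which is why a cohort with heterogeneous baseline scores but consistent improvement can show a larger SRM than ES.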

Unlike a previously published article on this topic [11], we do not provide a definitive ranking of the instruments compared. Such rankings may be sensitive to random variation and can be misleading because they invite too much confidence in one instrument being better than the others. We intentionally focus on the similarities between ODI and ZCQ, both in numerical results and in clinical relevance and usefulness.

Previously published cut-off values for ODI and ZCQ defining a clinically important, minimal important, or substantial important difference have been calculated with various methods and based on follow-up, absolute, or relative change scores. Different external anchors have been used; however, the patient's perceived global assessment of outcome or satisfaction (GPE scales) is the most common. Patients have responded on a five- or seven-point Likert scale, and the anchor has been defined as the two or three best response options. The calculations have been based on different study designs, such as clinical or registry studies, and the length of follow-up has varied. In addition, most studies have used heterogeneous cohorts with various spine conditions. Therefore, a wide range of cut-off values has been proposed, and comparison is difficult. The present results demonstrate that a range of cut-off values gave similar proportions of correctly classified patients, and the choice among them must be balanced against sensitivity and specificity.
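
The trade-off between overall accuracy and sensitivity/specificity can be illustrated with a small sketch that, for several candidate cut-offs, classifies a patient as a “success” when the score change meets the cut-off and compares this with a dichotomized anchor. The data, the function name `classification_metrics`, and the rule “score ≥ cut-off means success” are assumptions for illustration only, not the ROC procedure used in the study.

```python
def classification_metrics(scores, anchor_success, cutoff):
    """Sensitivity, specificity and proportion correctly classified when
    'score >= cutoff' is taken to predict anchor-defined success (assumed rule)."""
    tp = fp = tn = fn = 0
    for score, success in zip(scores, anchor_success):
        predicted = score >= cutoff
        if predicted and success:
            tp += 1
        elif predicted and not success:
            fp += 1
        elif not predicted and success:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(scores)
    return sensitivity, specificity, accuracy


# Hypothetical ODI relative changes (%) and anchor outcomes (True = success)
scores = [80, 65, 50, 45, 35, 28, 20, 10, 5, -10]
success = [True, True, True, True, True, False, True, False, False, False]

for cutoff in (20, 30, 40):
    sens, spec, acc = classification_metrics(scores, success, cutoff)
    print(f"cut-off {cutoff}%: sensitivity {sens:.2f}, "
          f"specificity {spec:.2f}, correctly classified {acc:.2f}")
```

In this toy example, cut-offs of 20% and 30% classify the same proportion of patients correctly (0.90), but the first has higher sensitivity and the second higher specificity, so accuracy alone does not settle the choice.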

Previously reported clinically important cut-off values for ODI in surgically treated LSS patients were within the range of possible cut-off values found in the present study, both for follow-up scores [20, 21] and for absolute change scores [11, 17, 20]. A reported cut-off value for the absolute change score of ZCQ symptom severity and physical function [11] appeared lower than in the present study; a possible explanation is that a five-point rather than a seven-point Likert scale was used for the anchor. Others have suggested relative change cut-offs for ODI between 10 and 40% [19, 20]. The 30% threshold for defining treatment as a clinical “success”, recommended and predefined in the NORDSTEN trials, is based on a registry study with one-year follow-up [20] and a “gathering to consensus” paper [17], and it was within the range of cut-offs with high accuracy. In the registry study [20], the anchor was “completely recovered”/“much improved”. Because of this strict anchor (which excluded “slightly improved”), high sensitivity was favoured to ensure the detection of true positive “successes”. The present study showed that the 30% ODI cut-off was within the interval giving a high rate of correct classification. However, the percentage of correctly classified patients would also be high with cut-offs above 30%. When planning future comparative studies, one should consider not relying on a single cut-off but also performing sensitivity analyses with different cut-offs of high accuracy. It is also plausible that patients' expectations are higher during a clinical study with follow-up than in a registry, making them more likely to be disappointed and to answer “slightly improved”. A registry will also include a more heterogeneous patient population.
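
A sensitivity analysis over several cut-offs for a comparative study could be sketched as below. The two treatment arms, their relative ODI changes, the function names, and the cut-offs of 20, 30, and 40% are invented for illustration.

```python
def success_rate(relative_changes, cutoff):
    """Proportion of patients whose relative change (%) meets the cut-off."""
    return sum(rc >= cutoff for rc in relative_changes) / len(relative_changes)


def sensitivity_analysis(arms, cutoffs):
    """Success proportion per arm for each candidate cut-off."""
    return {
        cutoff: {name: success_rate(changes, cutoff) for name, changes in arms.items()}
        for cutoff in cutoffs
    }


# Hypothetical relative ODI changes (%) in two treatment arms
arms = {
    "A": [55, 40, 32, 25, 10, 70, 31, 15],
    "B": [45, 35, 28, 22, 5, 60, 29, 12],
}

for cutoff, rates in sensitivity_analysis(arms, (20, 30, 40)).items():
    diff = rates["A"] - rates["B"]
    print(f"cut-off {cutoff}%: A {rates['A']:.3f}, B {rates['B']:.3f}, "
          f"difference {diff:+.3f}")
```

With these invented data, the two arms have identical success proportions at 20% (0.75 each) but differ by 25 percentage points at 30%, showing how a between-arm conclusion can depend on the chosen cut-off.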

Strengths and limitations

The present analyses were based on a large cohort of surgical patients with a high follow-up rate. International guidelines for outcome measures were followed, and translated and validated PROMs were used.

There is limited consensus about the best anchor for measuring changes in disease severity with PROMs. We selected the GPE scale because it has been commonly used and recommended [18, 22]. It has been advised to use a seven-point rating scale of change and to set the cut-off for clinically relevant improvement between “much improved” and “slightly improved” [19, 22]. The GPE scale in the present paper addressed outcome, but some studies have used satisfaction instead [21]. Despite its common use, the GPE scale has several weaknesses [16, 19, 20]: there is possible recall bias when responding two years after surgery; the scale is not domain-specific; it is unknown what kind of deterioration patients had in mind, whether other diseases interfered, or whether patients were more satisfied with care than with treatment; and variation in mood may influence the response. Another critical concern is that another subjective measure (the GPE scale) was used to evaluate the PROMs (ODI, ZCQ). In the present study, both PROMs correlated moderately to strongly with the GPE scale, as expected. Fujimori et al. found a discrepancy between improvement on the questionnaires and on the GPE scale [11]; because they used a five-point Likert scale for GPE, improvement may have been harder to detect. Recall bias and individual expectations of outcome may also be shaped by cultural context.
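
The correlation between PROM change and the GPE anchor can be sketched with a rank-based Spearman coefficient, computed as the Pearson correlation of average ranks so that ties on an ordinal GPE scale are handled. It assumes GPE is coded 1 (best) to 7 (worst); with improvement coded as a positive PROM change, a responsive PROM then gives a negative coefficient. The coding, the helper names, and the example data are assumptions; in practice a library routine such as `scipy.stats.spearmanr` would typically be used.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ODI improvement (points) and GPE answers (1 = completely recovered)
odi_change = [30, 25, 20, 10, 5, -5]
gpe = [1, 2, 2, 3, 4, 5]
print(round(spearman(odi_change, gpe), 2))  # -0.99
```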

Strictly speaking, we did not ask the patients whether the change was clinically important or a “success”. Still, we considered the anchor answers “completely recovered” and “much improved” to indicate a significant improvement at follow-up. These concepts have been discussed in recent studies [23, 24].
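
The dichotomization of the anchor can be written down explicitly, as in the sketch below. The seven response labels are an assumption about the wording of a seven-point GPE scale (the exact wording used in NORDSTEN is not given here); only “completely recovered” and “much improved” are taken from the text as indicating success.

```python
# Assumed wording of a seven-point GPE scale, ordered best to worst (hypothetical labels).
GPE_LABELS = (
    "completely recovered",
    "much improved",
    "slightly improved",
    "unchanged",
    "slightly worse",
    "much worse",
    "worse than ever",
)

# Anchor answers treated as "success" in the text.
SUCCESS_LABELS = frozenset({"completely recovered", "much improved"})


def anchor_success(response: str) -> bool:
    """Map a GPE response to the binary success/non-success anchor."""
    label = response.strip().lower()
    if label not in GPE_LABELS:
        raise ValueError(f"unknown GPE response: {response!r}")
    return label in SUCCESS_LABELS


print([anchor_success(r) for r in ("Much improved", "slightly improved")])
```

Keeping the mapping in one place makes it straightforward to rerun an analysis with a looser anchor (for example, also counting “slightly improved”) as a sensitivity check.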

Patients lost to follow-up represent a potential source of bias. In the present study, fewer than 10% of patients were lost to follow-up, making a high risk of bias unlikely. In addition, a recent study based on data from the Norwegian Spine Registry showed that non-respondents had clinical outcomes similar to those of respondents [25].

The present analyses corroborate the responsiveness of two commonly used outcome measures in a large sample of surgically treated LSS patients. Even though responsiveness was comparable, one should remember when choosing an instrument for a study that these PROMs were developed for different purposes. Our results for surgically treated LSS patients may not be reproduced in, for instance, patients receiving conservative treatment. That ODI was as responsive as ZCQ in the present study might be related to surgery preceding the observed change in scores. Furthermore, ZCQ focuses on all lower-limb symptoms and walking difficulty, whereas ODI measures the influence of back and leg pain on daily life function.
