A consecutive sample of patients (n = 258) recruited from the Department of Orthopaedic Surgery at the Singapore General Hospital between August and December 2005 completed the questionnaires. All patients were diagnosed with knee OA by their attending physicians based on clinical and radiographic features. Each subject was interviewed by a trained interviewer using the WOMAC and the EQ-5D. The Institutional Review Board of the hospital approved this study.
Instruments
The WOMAC, a 24-item disease-specific measure of functioning, consists of three domains: pain (5 items), stiffness (2 items), and physical function (17 items). Each item is rated either on a five-point Likert scale or on a 100-mm visual analog scale 6, 36. In this study, we used the Likert scale WOMAC (version LK 3.0). Items are scored from 0 to 4 (i.e., no, mild, moderate, severe, and extreme problems). Domain scores are calculated by summing the constituent item scores (i.e., the pain score ranges from 0 to 20, stiffness from 0 to 8, and physical function from 0 to 68). The total score is the sum of the three domain scores (range 0–96), with higher scores reflecting worse pain, stiffness, and physical function.
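The summation rule for the Likert version can be sketched as follows. This is an illustrative sketch, not the official WOMAC scoring software. It assumes the 24 responses arrive as integers 0–4 in questionnaire order (pain, then stiffness, then physical function), and it does not handle missing items. The function name `womac_scores` is hypothetical.

```python
from typing import Dict, Sequence

# Assumed item order: 5 pain items, 2 stiffness items, 17 physical function items.
DOMAIN_SIZES = {"pain": 5, "stiffness": 2, "function": 17}


def womac_scores(items: Sequence[int]) -> Dict[str, int]:
    """Sum 24 Likert responses (0-4) into domain scores and a total score."""
    if len(items) != 24:
        raise ValueError("expected 24 WOMAC item responses")
    if any(not 0 <= x <= 4 for x in items):
        raise ValueError("each item must be scored 0-4")
    scores: Dict[str, int] = {}
    start = 0
    for domain, size in DOMAIN_SIZES.items():
        # Domain score = sum of its constituent item scores.
        scores[domain] = sum(items[start:start + size])
        start += size
    # Total score = sum of the three domain scores (range 0-96).
    scores["total"] = scores["pain"] + scores["stiffness"] + scores["function"]
    return scores
```

For a respondent who reports extreme problems on every item, `womac_scores([4] * 24)` gives the maximum domain scores of 20, 8, and 68, and a total of 96.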
The EQ-5D measures HRQoL using a self-classifier. The self-classifier consists of a five-item descriptive system and assesses health status in the domains of mobility, self-care, usual activities, pain/discomfort, and anxiety/depression 37. Each item has three response levels (i.e., no problems, some problems, and extreme problems). Its psychometric properties have been established in patients with OA 30-35.
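With five items and three levels each, the descriptive system defines 3^5 = 243 distinct health states. These are conventionally written as five-digit profiles, with levels coded 1–3. The sketch below shows that encoding. The helper name `health_state` is hypothetical. Converting a profile into an index score requires a country-specific value set, such as the Japanese algorithm used in this study, which is not reproduced here.

```python
from itertools import product

DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS = {1: "no problems", 2: "some problems", 3: "extreme problems"}


def health_state(levels):
    """Encode five responses (1-3), one per dimension, as a profile such as '11221'."""
    if len(levels) != len(DIMENSIONS):
        raise ValueError("expected one response per dimension")
    if any(level not in LEVELS for level in levels):
        raise ValueError("responses must be 1, 2, or 3")
    return "".join(str(level) for level in levels)


# Every possible profile, from '11111' (full health) to '33333'.
ALL_STATES = ["".join(map(str, p)) for p in product((1, 2, 3), repeat=5)]
```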
Statistical Analysis
First, regression analyses were conducted using methods suitable for health utility data, which are often not normally distributed 30, 35 and have a ceiling value of 1.0. Regression models fitted by ordinary least squares (OLS) yield consistent estimates without requiring the outcome to follow any particular distribution, and they have been used in previous studies 23, 28, 29. Nevertheless, some researchers prefer the censored least absolute deviations (CLAD) estimator to OLS, arguing that a CLAD model accounts for the ceiling value 22, 26, 38. The present study used both the OLS and CLAD estimators.
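The difference between the two estimators can be stated through their objective functions. In the sketch below, y_i is the EQ-5D score of patient i and x_i is that patient's vector of WOMAC-based regressors. The notation is ours, and we assume the ceiling of 1.0 is treated as an upper censoring point, following the general form of Powell's CLAD estimator.

```latex
\hat{\beta}_{\mathrm{OLS}} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2,
\qquad
\hat{\beta}_{\mathrm{CLAD}} = \arg\min_{\beta} \sum_{i=1}^{n} \left| y_i - \min\!\left( x_i^{\top}\beta,\ 1 \right) \right|
```

OLS targets the conditional mean and places no cap on fitted values. The CLAD objective truncates the linear predictor at 1.0 and uses absolute rather than squared deviations, so it targets the conditional median in the presence of the ceiling.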
Second, several alternative representations of the WOMAC were considered as explanatory variables in the regression analyses:
1) the WOMAC total score;
2) the WOMAC domain scores (i.e., pain, stiffness, and physical function);
3) the WOMAC domain scores plus pairwise interaction and squared terms (i.e., pain × stiffness, pain × function, stiffness × function, pain × pain, stiffness × stiffness, and function × function) to account for possible nonlinearities; and
4) the WOMAC individual item scores, with stepwise model selection applied in the OLS model but not in the CLAD model.
Published studies have differed in their reasons for including or excluding demographics in such regression models. To maintain consistency with other published studies mapping disease-specific instruments to the EQ-5D 22, 23, 26, demographics were not included in the analysis. Nevertheless, the effect of including demographics on predicted utilities is an important question for future studies. The outcome variable was the EQ-5D index score, calculated using the Japanese scoring algorithm 39.
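Representation 3 can be made concrete by building its design matrix. The sketch below assumes uncentered domain scores and an explicit intercept column. The paper does not state whether scores were centered, and the function name `design_matrix` is hypothetical.

```python
import numpy as np


def design_matrix(pain, stiffness, function):
    """Regressors for representation 3: intercept, the three domain scores,
    their pairwise products, and their squares (no centering applied)."""
    p = np.asarray(pain, dtype=float)
    s = np.asarray(stiffness, dtype=float)
    f = np.asarray(function, dtype=float)
    columns = [
        np.ones_like(p),       # intercept
        p, s, f,               # main effects
        p * s, p * f, s * f,   # pairwise interactions
        p * p, s * s, f * f,   # squared terms capturing curvature
    ]
    return np.column_stack(columns)
```

Each patient contributes one row of 10 columns. Representations 1 and 2 correspond to keeping only the intercept plus the total score, or only the first four columns, respectively.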
A number of criteria were used to examine the goodness of fit of each model 29. Mean absolute error (MAE) is the average absolute difference between observed and predicted values. In the present study, MAE was chosen as the primary goodness-of-fit criterion because it is easy to interpret directly. We also reported the root mean squared error (RMSE), the positive square root of the average squared prediction error. In contrast to MAE, RMSE gives greater weight to larger errors. To account for variability in these goodness-of-fit diagnostics, the iterative random sampling procedure proposed by Grootendorst et al. 29 was used. Specifically, the whole sample was randomly split into two groups, one for estimation and the other for validation. The estimation sample was used to fit each candidate model, and the validation sample was used to calculate the MAE and RMSE. This process was repeated 500 times, each with a new random split, generating 500 MAEs and 500 RMSEs. The mean MAE, mean RMSE, and corresponding 95% confidence intervals (CIs) were then calculated. The lower the mean MAE and RMSE, the better the goodness of fit of a model, and the preferred model was the one with the best goodness of fit. Results from one random split are presented as an illustration.
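The repeated split-sample procedure can be sketched for an OLS model with NumPy. Several details are our assumptions, because the text does not specify them: a 50/50 split, 95% CIs taken as the 2.5th and 97.5th percentiles of the replicate values, and a fixed seed. The CLAD fit is not shown, and the helper names `mae`, `rmse`, and `repeated_split` are hypothetical.

```python
import numpy as np


def mae(y, yhat):
    """Mean absolute error between observed and predicted values."""
    return float(np.mean(np.abs(y - yhat)))


def rmse(y, yhat):
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def repeated_split(X, y, n_reps=500, frac=0.5, seed=0):
    """Fit OLS on a random estimation subsample, score it on the held-out
    validation subsample, and repeat with a fresh split each time."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_est = int(round(frac * n))  # assumed 50/50 split by default
    maes, rmses = [], []
    for _ in range(n_reps):
        idx = rng.permutation(n)
        est, val = idx[:n_est], idx[n_est:]
        beta, *_ = np.linalg.lstsq(X[est], y[est], rcond=None)
        pred = X[val] @ beta
        maes.append(mae(y[val], pred))
        rmses.append(rmse(y[val], pred))
    maes, rmses = np.array(maes), np.array(rmses)
    return {
        "mean_mae": maes.mean(),
        "mae_ci": tuple(np.percentile(maes, [2.5, 97.5])),  # assumed percentile CI
        "mean_rmse": rmses.mean(),
        "rmse_ci": tuple(np.percentile(rmses, [2.5, 97.5])),
    }
```

To compare candidate models, call `repeated_split` once per design matrix and prefer the model with the lowest `mean_mae`. Within every split RMSE is at least as large as MAE, so the same ordering holds for their means.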
Finally, the coefficients of the preferred model were estimated using the whole study sample. The precision of this preferred model was examined at two levels. At the individual level, the prediction error was computed as the difference between the observed and predicted EQ-5D scores for each of the 258 patients. At the group level, the prediction error was estimated using a nonparametric bootstrap (resampling with replacement) 29. Specifically, groups of patients of various sizes (n = 50, 100, 200, and 400) were randomly sampled. For example, a patient was randomly chosen from the original data set, and his/her predicted EQ-5D score and prediction error were recorded. This patient was then returned to the data set (hence the term “with replacement”). This process was repeated until the target group size (n = 50, 100, 200, or 400) was reached. For each group, the mean predicted EQ-5D score and the mean prediction error were calculated, forming one bootstrap replicate. Repeating this process 5000 times generated a distribution of group mean predicted EQ-5D scores and group mean prediction errors for each group size. The 2.5th and 97.5th percentiles of this distribution were used as the 95% CI for the prediction error.
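The group-level bootstrap can be sketched with NumPy. Drawing all indices for a replicate at once with `rng.integers` is equivalent to picking patients one at a time with replacement. We assume the prediction error is defined as observed minus predicted, following the wording above. The function name `bootstrap_group_error` and the seed are hypothetical.

```python
import numpy as np


def bootstrap_group_error(predicted, observed, group_sizes=(50, 100, 200, 400),
                          n_boot=5000, seed=0):
    """Bootstrap distribution of group mean predicted scores and group mean
    prediction errors, with percentile 95% CIs for the error."""
    rng = np.random.default_rng(seed)
    predicted = np.asarray(predicted, dtype=float)
    error = np.asarray(observed, dtype=float) - predicted  # assumed sign: observed - predicted
    results = {}
    for size in group_sizes:
        # Each row is one bootstrap replicate of `size` patients drawn with replacement.
        idx = rng.integers(0, len(predicted), size=(n_boot, size))
        mean_pred = predicted[idx].mean(axis=1)
        mean_err = error[idx].mean(axis=1)
        lo, hi = np.percentile(mean_err, [2.5, 97.5])
        results[size] = {
            "mean_pred": float(mean_pred.mean()),
            "mean_error": float(mean_err.mean()),
            "ci": (float(lo), float(hi)),
        }
    return results
```

Because sampling is with replacement, group sizes larger than the 258 available patients (such as n = 400) pose no problem. The CI for the mean error narrows as the group size grows.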
All statistical tests were two-sided and conducted at the 5% significance level. Data were analyzed using R version 2.4.1 (the R Development Core Team).