This study is reported in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement (Collins et al., 2015). Direct response mapping was used to estimate disability weights based on the WHODAS 2.0–36 items and the WHODAS 2.0–12 items. In direct mapping, a mapping function (such as a regression equation) is estimated to predict disability weights using scores on the WHODAS 2.0 items as predictors. Such a mapping function can then be applied to a new dataset to convert the source measure into the target measure, under the assumption that the associations of the WHODAS 2.0 items with disability weights are generalizable (Wijnen et al., 2018).
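As a minimal, purely illustrative sketch of direct response mapping (not the procedure used in this study, which is described below): assuming a hypothetical source data frame src containing WHODAS 2.0 item scores and derived disability weights dw, and a new data frame new_data containing the same items, a mapping function is estimated once and then applied to the new data. All object and column names are assumptions made for the example.

```r
# Hypothetical illustration of direct response mapping: estimate a regression-
# based mapping function on source data, then apply it to convert item
# responses in a new dataset into predicted disability weights
whodas_items <- c("d1_1", "d1_2", "d2_1")     # hypothetical WHODAS item columns
mapping_fit  <- lm(dw ~ ., data = src[, c("dw", whodas_items)])
new_data$dw_predicted <- predict(mapping_fit, newdata = new_data[, whodas_items])
```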
2.1 Data
The “household mode” of the MCSS was administered to nationally representative samples (male and female adults aged above 18 years, non-institutionalized and living in private households) in the following 14 countries: China, Colombia, Egypt, Georgia, Indonesia, India, Iran, Lebanon, Mexico, Nigeria, Singapore, Slovakia, Syria, and Turkey. These countries had a combined total of 92,006 observations, with sample sizes varying from n = 1,183 (Slovakia) to n = 9,994 (Indonesia) (Üstün et al., 2001). The MCSS survey included only 19 of the 36 items that comprise the WHODAS 2.0 full version and eight of the 12 items of the WHODAS 2.0 short version, because the MCSS health state description section had to accommodate several additional items given the scope of that study (see Appendix 1 for a list of the WHODAS 2.0 items included in the MCSS survey). Hence, the mapping was performed using both these 19 items and these eight items as predictors of disability weights. Furthermore, the MCSS survey data included the demographic variables age, gender, marital status, educational level, and work status.
The MCSS data were used to estimate disability weights with a health state valuation function based on a set of six core domain levels (i.e. mobility, self-care, usual activities, pain, affect, cognition), each rated on a 1 to 5 Likert scale in which 1 indicated “No difficulty” and 5 indicated “Extreme difficulty/cannot do” (Murray & Evans, 2003a, 2003b). Respondents were provided with descriptions of hypothetical health states along this set of core domains and asked to evaluate these health states using the visual analogue scale (VAS). Next, 18 different regression models, varying in characteristics such as the level of interactions considered, were estimated in which VAS scores were related to vignette-adjusted levels on the six domains (Murray & Evans, 2003a, 2003b). The corresponding VAS results were adjusted using a scale distortion parameter, based on the multi-method exercises included in the MCSS (i.e. the time trade-off, standard gamble, and person trade-off methods), to correct for end-aversion bias in the visual analogue scale (Murray & Evans, 2003a, 2003b). Raw VAS scores were thus transformed into adjusted VAS scores representing the health state valuation, using the transformation given by Murray and Evans (2003a, 2003b), in which the constant 0.64 was determined to be the optimal value for the scale distortion parameter.
Out of the 18 available regression models estimating raw VAS scores from the six core domains reported by Murray and Evans (2003a, 2003b), we applied the main effects model based on the assumption of a normal distribution, for reasons of transparency and interpretability. This decision was supported by evidence that adding interaction terms to the health state valuation function had only a minimal impact (Murray & Evans, 2003a, 2003b). Moreover, the regression models with interactions contained large coefficients with wide confidence intervals, which might have been due to overfitting (Sayak, 2018). Lastly, the model assuming a normal distribution performed better in the mid-range of observed VAS values (Murray & Evans, 2003a, 2003b), which is the range most relevant to the clinical populations to which we intend the mapping functions to be applied. As the health state valuation function was based on the six core domains (see above), we only included respondents who completed all six corresponding questions and for whom a raw VAS score could therefore be estimated. Adjusted VAS scores, rescaled to disability weights, were then used as the dependent variable (see Figure 1).
Figure 1. Schematic overview of the derivation of disability weights.
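The final step in Figure 1 can be sketched as follows, assuming that adjusted VAS scores are expressed on a 0 to 100 scale with 100 representing full health, and that a disability weight is the complement of the health state valuation on a 0 to 1 scale; the variable names and example values are hypothetical.

```r
# Hypothetical rescaling of adjusted VAS scores (assumed 0-100, 100 = full
# health) to 0-1 health state valuations, with disability weight = 1 - valuation
adjusted_vas <- c(95, 60, 20)        # example adjusted VAS scores
valuation    <- adjusted_vas / 100   # health state valuation on a 0-1 scale
dw           <- 1 - valuation        # disability weights used as the outcome
```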
2.2 Analysis
The mapping algorithm was constructed using a machine learning approach (Wiens & Shenoy, 2018). The best performing mapping function (algorithm) was constructed using the following sequential steps:
1) Selecting predictors. Models were run using information on the WHODAS 2.0–36 and WHODAS 2.0–12 and/or demographics and/or individual countries as predictors, resulting in the following 10 sets of predictors whose ability to predict disability weights was compared: (1) all available individual WHODAS 2.0–36 items; (2) all available individual WHODAS 2.0–36 items and demographics; (3) WHODAS 2.0–36 domain scores; (4) WHODAS 2.0–36 domain scores and demographics; (5) all available WHODAS 2.0–36 items with demographics and country dummies; (6) all available WHODAS 2.0–36 items with demographics, country dummies, and country interactions (with all other variables). As the WHODAS 2.0–12 does not use domain scores, only the models considering individual items were included: (7) all available individual WHODAS 2.0–12 items; (8) all available individual WHODAS 2.0–12 items and demographics; (9) all available WHODAS 2.0–12 items with demographics and country dummies; (10) all available WHODAS 2.0–12 items with demographics, country dummies, and country interactions (with all other variables).
2) Splitting the data into a training and a test set. The dataset was split into a training set used for model selection and a test (hold-out) set used for model assessment, by selecting a random 75% of the data for training and the remaining 25% for testing within each country in the dataset (an illustrative code sketch of steps 2 to 5 is given after this list).
3) Data preparation. Before model fitting, missing data were imputed. Missing demographic information was imputed using the median (for continuous variables) or a “missing” label (for categorical variables). In line with the Manual for the WHO Disability Assessment Schedule, missing items were imputed using the mean of the other items within the same domain (Üstün et al., 2010). If these were not available, missing items were imputed using the participant's mean score across all available domains. If no domains were available for a participant, column means were used for imputation per missing item (see below for information on the frequency of missingness). In the Singaporean questionnaire, age was recorded in categories; this variable was therefore converted into a numeric variable by imputing the median age of each category.
4) Fitting various statistical learning models (i.e. mapping algorithms) on the training set. To maximize interpretability, linear regression (i.e. ordinary least squares) and least absolute shrinkage and selection operator (LASSO) regression were used as statistical learning methods. LASSO augments the linear regression approach of minimizing the sum of squared errors with a penalty term proportional to the sum of the absolute standardized coefficients, weighted by a shrinkage parameter (lambda); this can shrink coefficients to exactly zero, effectively excluding variables from the model and thus leading to a simpler model (James et al., 2013). The hyperparameter of the LASSO regressions, being the size of the penalty term, was tuned by means of a grid search ranging from practically no penalty (lambda = 10⁻³) to a large penalty (lambda = 10³). To optimize the fit of the various models while preventing overfitting on the training set, 10-fold cross-validation was used. The root mean squared error (RMSE) and R-squared were considered for each model, and the RMSE was used to determine the best performing model. The RMSE represents the standard deviation of the prediction errors, such that the lower the RMSE, the better the model fit. The R-squared expresses how much variance is explained by the model relative to how much variance there is to explain (Field, 2013); the higher the R-squared, the better the model fit. Analyses were done in R (version 4.0.3), a statistical programming language (Chambers, 2008). The caret package was used for the machine learning analyses, including cross-validation and hyperparameter tuning (Kuhn, 2012).
5) Evaluating the model on the test set. For both the WHODAS 2.0–36 and the WHODAS 2.0–12, the models with the best cross-validated performance on the training set, with and without country as a predictor (i.e. to obtain both a generic and a country-specific mapping algorithm), were assessed by evaluating their performance on the test set (after applying the same rule-based data preparation steps to the test set). Moreover, to provide an estimate of how well the model generalizes to countries not included in this study, an analysis was performed in which a model was trained on 13 of the 14 country-specific datasets, with the remaining country-specific dataset serving as the test set. Model training on the 13 countries was done using leave-one-country-out cross-validation.
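The following is a minimal sketch of steps 2 to 5, not the authors' actual syntax. It assumes a data frame dat with one row per respondent, hypothetical WHODAS item columns named by domain (d1_*, ..., d6_*), a country factor, demographics without remaining missing values, and the derived disability weight dw; glmnet with alpha = 1 is used here as the LASSO implementation, and the seed, the lambda grid resolution, and the example held-out country are arbitrary choices.

```r
library(caret)
set.seed(123)                                  # arbitrary seed for reproducibility

## Step 2: random 75/25 train/test split within each country
train_idx <- unlist(lapply(split(seq_len(nrow(dat)), dat$country),
                           function(i) sample(i, size = round(0.75 * length(i)))))
train_df <- dat[train_idx, ]
test_df  <- dat[-train_idx, ]

## Step 3 (simplified): impute missing items with the respondent's mean of the
## other items in the same domain; the further fallbacks (mean of the other
## domains, then column means) are omitted here for brevity
impute_domain <- function(items) {
  row_means <- rowMeans(items, na.rm = TRUE)
  for (j in seq_along(items)) {
    miss <- is.na(items[[j]]) & !is.nan(row_means)
    items[[j]][miss] <- row_means[miss]
  }
  items
}
domain_prefixes <- paste0("^d", 1:6, "_")      # hypothetical item naming scheme
for (p in domain_prefixes) {
  cols <- grep(p, names(dat), value = TRUE)
  train_df[cols] <- impute_domain(train_df[cols])
  test_df[cols]  <- impute_domain(test_df[cols])   # same rule-based preparation
}

## Step 4: 10-fold cross-validated LASSO, tuning lambda from 1e-3 to 1e3
ctrl <- trainControl(method = "cv", number = 10)
grid <- expand.grid(alpha = 1, lambda = 10^seq(-3, 3, length.out = 25))
lasso_fit <- train(dw ~ ., data = train_df, method = "glmnet",
                   trControl = ctrl, tuneGrid = grid, metric = "RMSE")

## Step 5: evaluate the selected model on the held-out test set (RMSE, R-squared)
postResample(predict(lasso_fit, newdata = test_df), test_df$dw)

## Generalizability check: train on 13 countries (leave-one-country-out
## cross-validation, country itself not used as a predictor) and evaluate on
## the fully held-out 14th country
dat_imp  <- rbind(train_df, test_df)
held_out <- "Slovakia"                         # example held-out country
train_13 <- subset(dat_imp, country != held_out)
test_cty <- subset(dat_imp, country == held_out)
loco_ctrl <- trainControl(method = "cv",
                          index = groupKFold(train_13$country,
                                             k = length(unique(train_13$country))))
loco_fit <- train(dw ~ . - country, data = train_13, method = "glmnet",
                  trControl = loco_ctrl, tuneGrid = grid, metric = "RMSE")
postResample(predict(loco_fit, newdata = test_cty), test_cty$dw)
```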
The syntax of the analyses is available upon request from the corresponding author.
2.3 Sensitivity analyses
Model performance on the test set was determined using alternative strategies for handling missing data: (1) by considering only records with no missing data; (2) by imputing missing WHODAS domain scores using the mean of those domain scores across other respondents, instead of the mean of the other domain scores within the same respondent; and (3) by imputation using the k-nearest neighbours (kNN) algorithm as implemented in the caret package (using the default of five nearest neighbours). kNN imputation replaces missing values with values derived from the k observations most similar to the observation with missing data. In addition, models were estimated for each country individually.
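As a brief sketch of sensitivity strategies (1) and (3), reusing the hypothetical objects from the sketch above (for strategy (3), the rule-based item imputation of step 3 would be skipped); the restriction to numeric predictor columns is an assumption made for the example.

```r
## (1) Complete cases only: evaluate the model on records without missing data
test_complete <- na.omit(test_df)

## (3) kNN imputation (default k = 5) via caret's preProcess, fitted on the
## training predictors and applied to both sets; note that caret's knnImpute
## also centres and scales the predictors
num_cols  <- setdiff(names(dat)[sapply(dat, is.numeric)], "dw")
pp_knn    <- preProcess(train_df[num_cols], method = "knnImpute", k = 5)
train_knn <- predict(pp_knn, train_df[num_cols])
test_knn  <- predict(pp_knn, test_df[num_cols])
```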