Selecting PedsQL items to derive the PedsUtil health state classification system to measure health utilities in children

The PedsQL

The PedsQL 4.0 Generic Core Scales is a validated instrument that assesses HRQoL across four dimensions: 1) Physical Functioning (8 items), 2) Emotional Functioning (5 items), 3) Social Functioning (5 items), and 4) School Functioning (3–5 items depending on age group) (Appendix Table 1) [15, 16]. Both child self-report (5–18 years) and parent proxy-report (2–18 years) versions are available. The items in the different versions are very similar and differ only in developmentally appropriate vocabulary and first- or third-person tense. For each item, respondents are asked to choose from a series of five severity levels: 0 = Never, 1 = Almost never, 2 = Sometimes, 3 = Often, 4 = Almost always. Level responses are converted to non-preference-based HRQoL scores and can be reported in terms of domain scores, a Physical Health Summary Score, Psychosocial Health Summary Score, and overall Total Score [16].
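The conversion from level responses to scores is not detailed in this section. As a minimal sketch, assuming the published PedsQL scoring convention (items are reverse-scored and linearly transformed to a 0–100 scale, and a scale score is the mean of answered items unless more than half are missing), the calculation could look like the following. The function names are hypothetical and are not part of any official scoring software.

```python
from typing import Optional, Sequence

def transform_item(response: int) -> float:
    """Reverse-score a 0-4 response onto 0-100 (0 -> 100, 4 -> 0)."""
    return 100.0 - 25.0 * response

def scale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of transformed items; None if more than 50% of items are missing."""
    answered = [r for r in responses if r is not None]
    # Assumed missing-data rule: no score when over half the items are missing.
    if len(answered) * 2 < len(responses):
        return None
    return sum(transform_item(r) for r in answered) / len(answered)

# Hypothetical responses to a five-item dimension.
print(scale_score([0, 1, 2, 3, 4]))
```

Higher transformed scores indicate better HRQoL under this convention, which is the reverse of the raw 0–4 problem-frequency coding.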

Overview of analysis

With 23 items, each with five severity levels ranging from “Never” to “Almost always”, the PedsQL defines \(5^{23}\) (approximately \(1.2 \times 10^{16}\)) unique health states. It is therefore necessary to reduce the length of the PedsQL to construct a HSCS that is feasible for preference valuation methods. One useful technique to help inform which items to include or exclude from a HSCS is Rasch analysis [23]. Rasch analysis can be used to evaluate measurement functioning and psychometric properties of existing instruments by providing empirical evidence on how well items in a dimension measure the construct of interest (e.g., physical functioning) [24, 25]. In this study, a two-step process was used to select items to include in each dimension of the PedsUtil HSCS (Fig. 1). The first step was to exclude any poorly functioning items in each dimension by examining various Rasch criteria. The second step was to then select a single item to represent each dimension among the remaining items based on Rasch and other psychometric criteria, as well as input from child health experts and parents. This study was granted an exempt determination by the University of Michigan Institutional Review Board (IRBMED # HUM00182088).
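The scale of the problem can be checked with a short calculation. The sketch below counts the health states described by the full 23-item instrument and, for comparison, by a hypothetical seven-item system with one item per dimension. The second figure assumes every item keeps all five levels, which may not hold if response levels are collapsed.

```python
def count_health_states(n_items: int, n_levels: int = 5) -> int:
    """Distinct response patterns when each item has the same number of levels."""
    return n_levels ** n_items

full_pedsql = count_health_states(23)           # all 23 PedsQL items
one_item_per_dimension = count_health_states(7)  # assumed: 7 items, 5 levels each

print(f"{full_pedsql:,}")             # 11,920,928,955,078,125
print(f"{one_item_per_dimension:,}")  # 78,125
```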

Fig. 1 Steps to Constructing the PedsUtil Health State Classification System and Scoring System

Data source

All secondary analyses were conducted using data from the Longitudinal Study of Australian Children (LSAC), a national-level population-representative study that collects data from 10,000 children and families every two years [26]. The LSAC delivers a comprehensive dataset on the development of children over time and is one of the very few large-scale nationally representative studies of children in the world. A nationally representative sample, which includes a wide spectrum of healthy and unwell children, was used for data analysis to ensure that the resulting HSCS can be applied to such populations. The LSAC sampling design is detailed elsewhere [27]. The LSAC was approved by the Australian Institute of Family Studies Ethics Committee, and families provided written informed consent to participate [28].

This study used data from the first seven waves (2003–04 to 2015–16) of the LSAC (n = 45,207) (Appendix Table 2). This dataset contains fully completed responses to the parent proxy-report version of the PedsQL at each wave of data collection for the same children at different ages from 2–17 years. The exception is children aged 2–3 years, for whom the LSAC administered only 19 of the 21 PedsQL items (the two items on school absence were omitted), so only 19 items were included in the dataset for this age group. This dataset also included information on child special healthcare needs status (yes/no), defined as “a condition which has lasted or is expected to last for at least 12 months, which causes the child to use medicine prescribed by a doctor, other than vitamins, or use more medical care, mental health or educational services” [29]. Child special healthcare needs status was determined for each child using data from the last available wave, since younger children are less likely to be identified as having special healthcare needs because not enough time may have passed for their symptoms to have fully manifested or been recognized.

Data analysis – confirmatory factor analysis

Prior to item selection, the dimension structure of the PedsUtil HSCS must be established. Confirmatory factor analysis was previously conducted using data from the LSAC to establish this core dimension structure; technical details of this analysis are reported elsewhere [30]. The findings from this study supported a 7-dimension structure of the PedsUtil HSCS: 1) Physical Functioning (6 items); 2) Pain (1 item); 3) Fatigue (1 item); 4) Emotional Functioning (5 items); 5) Social Functioning (5 items); 6) School Functioning (3 items); and 7) School Absence (2 items). Following dimension identification, a single item was selected to represent each dimension of the HSCS using the methods described below; single-item dimensions (i.e., Pain and Fatigue) were not empirically evaluated in the item selection process as they were already represented by one item.

Data analysis – Step 1: item exclusion

The purpose of the first step in the item selection process was to eliminate unsuitable items based on their poor psychometric performance. Data were fitted to the Rasch partial credit model to test how well the observed data meet expectations of the measurement model. If there was any misfit, adjustments were made until a well-fitting model was achieved, but items that exhibited misfit were considered for exclusion. Since Rasch models assume unidimensionality, a separate model was estimated for each multi-item dimension using RUMM2030 [31]. Analyses were stratified by age group (i.e., 2–5 years, 6–13 years, and 14–17 years) to select items that would be applicable across a wide range of ages. These specific age groupings were selected to represent the different developmental stages of children, as well as to reflect the study design of the LSAC. Three main Rasch criteria were used to assess item performance and are briefly described below. Refer to Appendix A for more details of each criterion.
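For readers unfamiliar with the partial credit model, the sketch below computes the category probabilities it implies for a single item, given a person location and the item's thresholds. It illustrates the response function only. The study's models were estimated in RUMM2030, and the threshold values used here are hypothetical.

```python
import math

def pcm_probabilities(theta: float, thresholds: list) -> list:
    """Category probabilities under the Rasch partial credit model.

    theta: person location in logits.
    thresholds: delta_1..delta_m for one item (m + 1 response categories).
    """
    # Category x has exponent sum_{k<=x} (theta - delta_k); category 0 has 0.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    # Subtract the largest exponent before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical thresholds for a five-level item (four thresholds).
probs = pcm_probabilities(theta=0.0, thresholds=[-2.0, -0.5, 0.5, 2.0])
print([round(p, 3) for p in probs])
```

With ordered thresholds, each response level is the most probable one over some interval of the person location, which is the property examined under item level ordering.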

Item level ordering

Item-threshold probability curves were first examined to determine if disordering was present [32]. For items that exhibited disordered thresholds, ordering was achieved by collapsing adjacent response levels. If there was more than one possible combination to merge levels, the combination that demonstrated the best overall fit while also achieving a more balanced distribution across levels was selected. Disordered items were evaluated for exclusion as they failed to respond to the full range of severity across the dimension.
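The check and the recoding step can be sketched as follows. This is an illustration under stated assumptions, not the RUMM2030 procedure: the threshold estimates and the merge mapping are hypothetical, and in practice the model is re-estimated after each recoding.

```python
from collections import Counter

def has_disordered_thresholds(thresholds):
    """True if any threshold is not strictly greater than the one before it."""
    return any(later <= earlier
               for earlier, later in zip(thresholds, thresholds[1:]))

def collapse_levels(responses, mapping):
    """Recode responses so that adjacent levels share one collapsed level."""
    return [mapping[r] for r in responses]

def level_shares(responses):
    """Proportion of responses at each level, to judge how balanced a recoding is."""
    counts = Counter(responses)
    n = len(responses)
    return {level: counts[level] / n for level in sorted(counts)}

# Hypothetical example: the second and third thresholds are reversed.
print(has_disordered_thresholds([-1.0, 0.4, 0.1, 1.5]))  # True

# Hypothetical recoding that merges "Often" (3) and "Almost always" (4).
merge_top_two = {0: 0, 1: 1, 2: 2, 3: 3, 4: 3}
recoded = collapse_levels([0, 4, 3, 2, 0, 1], merge_top_two)
print(level_shares(recoded))
```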

Differential item functioning (DIF)

Once all items were ordered, DIF by sex and child special healthcare needs status was examined since the PedsUtil HSCS needs to apply across diverse pediatric populations. Both uniform and nonuniform DIF were tested using analysis of variance [33]. Items exhibiting DIF were split by the relevant person factor and the Rasch model was refit. If splitting the item did not improve model fit, the item was considered for removal from the Rasch model. Items exhibiting DIF were assessed for exclusion as they threaten construct validity and are of limited value for making cross-population comparisons.

Rasch model goodness-of-fit

After issues of disordered thresholds and DIF were resolved, overall model fit was assessed by examining the item-trait interaction statistic, reported as a \(\chi^2\) statistic. If overall model fit was poor (i.e., p-value < 0.01 with a Bonferroni correction), the fit of the individual items was examined. Items with fit residuals greater than the standard cutoff \(\pm 2.5\) and with statistically significant individual \(\chi^2\) statistics were dropped from the model sequentially, beginning with the worst fitting item [32]. This procedure was repeated until only well-fitting items remained and the overall item-trait interaction statistic was nonsignificant. Items that were dropped from the Rasch model poorly represent the underlying dimension being measured, thus were considered for exclusion.
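One pass of this screening rule can be sketched as below. The sketch makes two assumptions that the text does not confirm: the Bonferroni correction divides the 0.01 level by the number of items in the model, and the "worst fitting" item is the one with the largest absolute fit residual. The item names and fit statistics are hypothetical, and the model would need to be refitted after each removal.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ItemFit:
    name: str
    fit_residual: float
    chi_square_p: float

def next_item_to_drop(items: List[ItemFit], alpha: float = 0.01,
                      residual_cutoff: float = 2.5) -> Optional[ItemFit]:
    """Return the worst item failing both criteria, or None if all items fit."""
    adjusted_alpha = alpha / len(items)  # assumed Bonferroni correction
    misfits = [item for item in items
               if abs(item.fit_residual) > residual_cutoff
               and item.chi_square_p < adjusted_alpha]
    if not misfits:
        return None
    # Assumption: rank misfit by the absolute size of the fit residual.
    return max(misfits, key=lambda item: abs(item.fit_residual))

# Hypothetical fit statistics for a three-item dimension.
fits = [ItemFit("item_a", 1.2, 0.40),
        ItemFit("item_b", 3.1, 0.0001),
        ItemFit("item_c", -4.0, 0.0005)]
print(next_item_to_drop(fits).name)  # item_c
```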

Robustness check

To enhance robustness, Rasch analysis was conducted on five subsamples of the LSAC dataset for each age group, for a total of 15 subsamples. Stratified random sampling was used to obtain subsamples of approximately 500 responses, which is the recommended sample size for Rasch analysis [34]. Sampling was stratified on child sex, age, and special healthcare needs status (Appendix Table 2). Each item per age group was given a total score (out of five) indicating the number of subsamples in which the item performed well on all Rasch criteria. In general, any item that performed poorly across all five subsamples in any age group (i.e., score of 0/5) or was the worst fitting item in any age group (i.e., lowest total score) was excluded from the PedsUtil HSCS.
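The exclusion rule can be expressed as a small function. This is one possible reading rather than the authors' exact procedure: "lowest total score" is interpreted as the lowest score within an age group, and no item is treated as worst when all items in a group tie. The text also says the rule applied "in general", so it was not purely mechanical. All item names and scores below are hypothetical.

```python
def items_to_exclude(scores):
    """scores: {age_group: {item: subsamples (0-5) in which the item met all criteria}}."""
    excluded = set()
    for item_scores in scores.values():
        lowest = min(item_scores.values())
        highest = max(item_scores.values())
        for item, score in item_scores.items():
            failed_everywhere = score == 0
            # Assumption: "worst" is only meaningful when scores differ in the group.
            worst_in_group = score == lowest and lowest < highest
            if failed_everywhere or worst_in_group:
                excluded.add(item)
    return excluded

# Hypothetical robustness scores for three items in one dimension.
example_scores = {
    "2-5 years":   {"item_a": 5, "item_b": 3, "item_c": 5},
    "6-13 years":  {"item_a": 5, "item_b": 4, "item_c": 0},
    "14-17 years": {"item_a": 5, "item_b": 5, "item_c": 5},
}
print(sorted(items_to_exclude(example_scores)))  # ['item_b', 'item_c']
```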

Data analysis – Step 2: item selection

Following Step 1, a single best item was selected for each dimension among the remaining items. A range of criteria (described below) was considered for item selection.

Rasch analysis

Individual item goodness-of-fit statistics were assessed, and the item with the better fit to the Rasch model was generally considered to be the better item to represent the dimension. The spread of item thresholds was also examined. An item that covers a wider severity range was considered to better represent the dimension than an item that covers a narrow range.

Other psychometric criteria

Internal consistency (i.e., correlation of an item score and its dimension score) and floor and ceiling effects were examined. Items with low correlation were considered not to be representative of the dimension, and items exhibiting large floor or ceiling effects were regarded as poor candidates as they may respond poorly to the full severity range of the dimension. These criteria were evaluated in relative terms between items, as done in previous studies [18, 19, 21], rather than by applying strict thresholds.
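Both criteria are simple to compute from raw responses. The sketch below assumes the dimension score is the sum of all items in the dimension (including the item being assessed) and reports the share of responses at each extreme level. Which extreme counts as the floor and which as the ceiling depends on the scoring direction. The response data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def item_dimension_correlation(rows, item_index):
    """Correlate one item with the dimension score (assumed: sum of all items)."""
    item = [row[item_index] for row in rows]
    dimension = [sum(row) for row in rows]
    return pearson(item, dimension)

def extreme_level_shares(item, lowest=0, highest=4):
    """Share of responses at the lowest and highest response levels."""
    n = len(item)
    return item.count(lowest) / n, item.count(highest) / n

# Hypothetical responses: one row per child, one column per item (0-4 coding).
rows = [[0, 0, 1], [1, 2, 1], [3, 4, 4], [0, 1, 0]]
print(round(item_dimension_correlation(rows, 0), 3))
print(extreme_level_shares([row[0] for row in rows]))
```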

Expert and parent opinion

Expert and parent opinion were collected to supplement Rasch and psychometric analyses as statistical analyses alone may not be able to identify the single best item for each dimension. Moreover, stakeholder engagement was used to assess content and face validity of the PedsUtil HSCS. Previous studies have similarly engaged with various stakeholders to aid in item selection [20, 35].

A US-based convenience sample of five pediatricians and one clinical trialist was recruited to provide input on item selection for all age groups, and 12 parents were recruited to provide input on item selection for each age group of their children. The clinicians included general pediatricians and specialists. The parents included parents of children with special healthcare needs (e.g., diabetes, asthma, musculoskeletal conditions, depression, anxiety, and ADHD) and of typically functioning children from ages 2–17 years (Appendix Table 3). Participants were asked to select which item best represents each dimension of the PedsUtil HSCS and to provide justifications for their choices. Refer to Appendix B for more details.

Final item selection

The research team evaluated results from all criteria to make the final decisions for item selection. The final PedsUtil HSCS was reviewed with an external health status measurement expert to ensure that the items selected were cohesive and amenable to preference valuation methods required to construct the PedsUtil scoring system.
