The Design and Methodology for a Pilot Study of Home and Community-Based Services Outcome Measures

High-quality, psychometrically sound measurement is essential to obtaining useful information about the impact of Home- and Community-Based Services (HCBS) on recipients. HCBS outcome measures are psychometrically sound when evidence supports the argument that the data obtained from them are both reliable and valid1. Without sufficient evidence of reliability and validity, the decisions those measures inform may be poorly grounded. In the absence of reliable, valid outcomes, federal and state agencies cannot make accurate judgments with respect to compliance with regulations, states are unable to make informed decisions about appropriate funding allocations, and provider agencies are left in the dark when addressing areas in need of quality improvement. At the individual level, HCBS recipients are left with inaccurate and potentially misleading information when deciding which provider(s) or service(s) to select.

The National Core Indicators In-Person Survey2 (NCI IPS), the Council on Quality and Leadership (CQL) Personal Outcome Measures3 (POMs), and the Consumer Assessment of Healthcare Providers and Systems Home and Community-Based Services Survey4 (HCBS CAHPS) are three popular approaches to assessing HCBS outcomes. The NCI IPS was built for state-level assessment and is useful for quality-assurance monitoring. The CQL POMs is a well-developed, validated tool with good psychometric properties that is also relevant to a wide range of disability populations. Both the CQL POMs and the HCBS CAHPS have the advantage of being developed for use at the individual and provider levels5.

A shared limitation of many existing outcome measures is that many of their items cannot be considered person-centered6 (i.e., answered by the person with a disability and eliciting the extent to which their individual need or desired level of an outcome is being met; see Roberts & Abery5 for a nuanced discussion of person-centeredness). Another shared limitation is the lack of evidence supporting longitudinal use or sensitivity to change5. In fact, the Human Services Research Institute, which administers the NCI program, did not develop the measures for longitudinal use and explicitly states that they are not to be used in this manner. In addition, states have historically employed only cross-sectional sampling methods when administering the NCI, so there is no empirical evidence supporting its longitudinal use. Moreover, the NCI was developed explicitly for state-level data and is not applicable at the individual or provider level.

The HCBS CAHPS, while ostensibly developed for use at the individual and provider level, lacks sufficient psychometric evidence to justify its use7. The HCBS CAHPS final report4 raises several concerns: reliability coefficients fell well below recommended thresholds for scientific acceptability; exploratory factor analyses were reported as confirmatory; factor-analysis results (e.g., factor loadings) went unreported; and no external validity testing was performed, despite claims to the contrary. In addition, items were removed and revised after psychometric testing had concluded, so no reliability or validity data exist for the final recommended version of the measure.
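
To make "reliability coefficient" concrete, the sketch below computes Cronbach's alpha, one of the most commonly reported internal-consistency coefficients. The toy data and the acceptability threshold noted in the comment are purely illustrative assumptions on our part, not figures from the CAHPS report.

```python
import numpy as np

def cronbachs_alpha(items: np.ndarray) -> float:
    """Internal-consistency reliability for an (n_respondents, n_items) matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # per-item sample variances
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Toy data: 5 respondents answering 4 Likert-type items coded 1-4.
responses = np.array([
    [4, 4, 3, 4],
    [2, 2, 2, 3],
    [3, 3, 3, 3],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
])

# Values below roughly 0.70 are often judged unacceptable for group-level use.
print(f"alpha = {cronbachs_alpha(responses):.2f}")
```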

The Rehabilitation Research and Training Center on HCBS Outcome Measurement (RTC/OM), funded by the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR) under the Administration for Community Living (ACL), was tasked with developing person-centered HCBS quality and outcome measures based on the National Quality Forum's (NQF) HCBS Framework8. To accomplish this goal, the RTC/OM first validated the importance of the NQF framework's domains and subdomains with a national sample of HCBS recipients, family members, providers, and program administrators/policy makers using a participatory planning and decision-making (PPDM) process6. This process gathered participant input on the importance of each NQF domain and subdomain. RTC/OM staff then used these data, in conjunction with a gap analysis, to revise the NQF framework. Two changes were made: transportation was elevated to a subdomain within the community inclusion domain, and employment was elevated from a subdomain within the community inclusion domain to a domain of its own. Subsequently, technical expert panels composed of content experts, providers, family members, and people with disabilities prioritized the HCBS quality and outcome measures for development. The six measures produced by this effort focus on the following outcomes: (1) Meaningful Activity, (2) Social Connectedness, (3) Choice and Control, (4) Employment, (5) Transportation, and (6) Freedom from Abuse and Neglect (see Table 1). The measures were also designed to have features lacking in the previously discussed measures (i.e., the ability to be used longitudinally, a modular format, and free availability).

For each measure, the team developed blueprints summarizing the background literature for each construct and outlining the structure of each concept. To develop specific items, the team first drafted guiding questions for each measure. Guiding questions specify the intended inferences and supporting assumptions for each measure (i.e., the questions the measure is intended to answer), including specific inferences related to each measure domain and subdomain. Items from existing HCBS instruments were reviewed to ensure full coverage of the domains and subdomains. Based on this review, the RTC/OM team created initial items to capture the domains and subdomains identified in the blueprint for each measure concept. Once draft items were complete, a panel of content and measurement experts provided ratings and qualitative feedback on whether each item measured its intended domain. Measures and conceptual definitions were revised following the expert panel review. An adapted cognitive testing protocol9, 10 based on the Cognitive Aspects of Survey Methodology (CASM) model11, 12 was also completed with five members of each target population for each measure, and the resulting data informed final measure revisions prior to the pilot study. A recent technical report13 provides more information on the expert panel and cognitive testing protocols and results.

Each measure was conceptualized as having two tiers. Tier 1 (i.e., global items) consists of a small number of questions assessing broad aspects of the construct; these items are intended to capture a general sense of participants' impressions of the concept. Tier 2 (i.e., specific items) contains granular items that provide more detailed and, in some cases, actionable information. Measures were constructed so that multiple specific questions in Tier 2 were "clustered" with a global question in Tier 1. Items within a cluster were hypothesized to have stronger relationships with each other (quantified by linear correlation) than with items outside the cluster. Broadly, this two-tiered, clustered approach was used to: 1) test the validity of the constructs, 2) gather insight into how specific items relate to the broad measure construct, and 3) potentially reduce the length of the measures13.
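
One way to operationalize the clustering hypothesis is to compare average within-cluster and between-cluster item correlations. The sketch below is a minimal illustration of that check; the item names are hypothetical and the responses are random placeholders standing in for pilot data.

```python
import numpy as np
import pandas as pd

def mean_corr(corr: pd.DataFrame, cols_a: list, cols_b: list) -> float:
    """Average pairwise correlation between two item sets, excluding self-correlations."""
    block = corr.loc[cols_a, cols_b].to_numpy(dtype=float)
    if cols_a == cols_b:
        mask = ~np.eye(len(cols_a), dtype=bool)  # drop the diagonal (r = 1.0 with itself)
        return block[mask].mean()
    return block.mean()

# Hypothetical names: one Tier 1 global item plus its Tier 2 cluster,
# and two items belonging to a different cluster.
cluster = ["global_1", "specific_1a", "specific_1b", "specific_1c"]
outside = ["specific_2a", "specific_2b"]

# Placeholder 4-point responses (coded 1-4) for 200 respondents.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(1, 5, size=(200, 6)), columns=cluster + outside)

corr = data.corr()  # Pearson correlations among item responses
within = mean_corr(corr, cluster, cluster)
between = mean_corr(corr, cluster, outside)

# The clustering hypothesis predicts within > between on real pilot data.
print(f"within-cluster r = {within:.2f}, between-cluster r = {between:.2f}")
```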

Item responses on the RTC/OM measures were predominantly structured to be scalable. Scalable items have ordered response formats; they define the underlying construct and are collectively used as multiple indicators to "sample" a person's level on that construct. The benefit of this approach is that combining item responses (e.g., into sum scores) has a meaningful interpretation. Three scalable item formats were used across the measures: a four-point, bipolar agreement scale (Strongly Disagree, Disagree, Agree, Strongly Agree) and two four-point, unipolar frequency scales (Never, Sometimes, Most of the Time, Always; None, Rarely, Some of the Time, Most of the Time). Items that were not scalable (e.g., demographic items) were either rekeyed into a composite variable or excluded from psychometric analysis.
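
As a minimal sketch of this scoring logic, the code below maps the three response formats described above onto ordered integer codes and forms a sum score. The response labels mirror the scales listed in the text, but the column names and toy responses are our own assumptions.

```python
import pandas as pd

# Ordered integer codes (0-3) for each scalable response format.
AGREEMENT = {"Strongly Disagree": 0, "Disagree": 1, "Agree": 2, "Strongly Agree": 3}
FREQUENCY_A = {"Never": 0, "Sometimes": 1, "Most of the Time": 2, "Always": 3}
FREQUENCY_B = {"None": 0, "Rarely": 1, "Some of the Time": 2, "Most of the Time": 3}

# Hypothetical raw responses: two agreement items and one frequency item.
raw = pd.DataFrame({
    "ma_1": ["Agree", "Strongly Agree", "Disagree"],
    "ma_2": ["Strongly Agree", "Agree", "Strongly Disagree"],
    "ma_3": ["Always", "Sometimes", "Never"],
})

scored = pd.DataFrame({
    "ma_1": raw["ma_1"].map(AGREEMENT),
    "ma_2": raw["ma_2"].map(AGREEMENT),
    "ma_3": raw["ma_3"].map(FREQUENCY_A),
})

# Because the categories are ordered, the sum across items is interpretable
# as an indicator of the respondent's level on the construct.
scored["sum_score"] = scored.sum(axis=1)
print(scored)
```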

The measures were constructed primarily of positively worded items (e.g., "I participate in activities that are meaningful to me"). This approach has downsides. Measures composed only of positively worded items can mask problematic response styles, most notably acquiescence responding (always responding "positively" regardless of content). When participants acquiesce, it is difficult to distinguish them from participants whose positive responses are genuine, or to control for such response styles with methods such as factor analysis14. A balance of positively and negatively worded items might therefore seem ideal, but previous research has shown that negatively worded items can cause confusion and lead to errant responding, especially among people with intellectual or developmental disabilities15, 16, 17, 18. Moreover, researchers have suggested that negatively worded items either tap substantively distinct constructs (not merely antonyms of the positive items) or give rise to extraneous "method factors," the latter being related to respondents' reading ability19 (pp. 113-115).
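
A small simulation, purely illustrative and of our own construction, shows why acquiescence is hard to detect with all-positive wording: a shared style component inflates inter-item correlations in the same direction as genuine agreement, so the two are confounded without reversed items.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

trait = rng.normal(size=n)  # true level on the construct
style = rng.normal(size=n)  # acquiescence tendency, independent of the trait

def likert(latent: np.ndarray) -> np.ndarray:
    """Discretize a latent score into a 4-point ordered response (0-3)."""
    return np.digitize(latent, [-1.0, 0.0, 1.0])

# Two positively worded items: responses reflect trait + style + noise.
item1 = likert(trait + style + rng.normal(scale=0.5, size=n))
item2 = likert(trait + style + rng.normal(scale=0.5, size=n))

# The same items with the style component removed.
clean1 = likert(trait + rng.normal(scale=0.5, size=n))
clean2 = likert(trait + rng.normal(scale=0.5, size=n))

# The style component inflates the inter-item correlation, and because it
# pushes every positively worded item the same direction, it is
# indistinguishable from genuine agreement without reversed items.
print(f"r with acquiescence:    {np.corrcoef(item1, item2)[0, 1]:.2f}")
print(f"r without acquiescence: {np.corrcoef(clean1, clean2)[0, 1]:.2f}")
```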
