Initial evaluation of measurement properties of the Work Environment Impact Questionnaire (WEIQ) - using Rasch analysis

Data collection and respondents

The data were collected through convenience sampling in three work settings, representing different kinds of work environments with different work tasks. The data collection was conducted in the following three settings: among assisted living personnel at a public service organisation (group 1, in short, assisted living personnel); vocational rehabilitation personnel at public service health organisations (group 2, in short, rehabilitation personnel); and personnel at a governmental research institute (group 3, in short, researchers). The main work tasks for the assisted living personnel (group 1) are to support individuals in their homes with personal hygiene and daily chores such as cleaning, shopping, and cooking. Rehabilitation personnel (group 2) mainly operate within rehabilitation settings, where their primary responsibilities involve assessing work ability and implementing rehabilitative interventions for individuals with injuries or illnesses, aiming to assist them in restoring or sustaining their work ability. The researchers (group 3) worked mainly in offices, with computer tasks and meetings both in person and online, but their work could also include external activities.

Group 1 answered the WEIQ by paper and pencil and used a return envelope at their workplace. Group 2 answered the WEIQ either by paper and pencil or online in connection with their participation in a customized course focusing on assessment of work ability, and group 3 answered the WEIQ online after requesting participation via e-mail. All prospective respondents received an information letter about the study, explaining that participation was voluntary and that their individual answers would be de-identified after completion of the WEIQ. All study procedures were performed in accordance with the ethical standards of the 1964 Helsinki declaration and its later amendments. Informed consent was obtained from all individual participants, who completed either the paper-and-pencil or the online questionnaire after reading the information letter. Return of a completed questionnaire was regarded as informed consent, and persons who did not want to participate in the study simply did not answer the WEIQ. No personally identifiable information or code was requested in the questionnaire, which made the collected data anonymous, i.e., the answers to a questionnaire could not be traced back to any individual.

In total, WEIQ responses from 288 respondents were received and included, of which 221 (77%) were women. The number of responses from each group was as follows: group 1, n = 81 (77% women); group 2, n = 125 (88% women); and group 3, n = 82 (60% women).

Measurement

For each of the 33 items in the WEIQ, which assesses self-perceived work ability, a 4-point Likert scale is used. The respondents are asked to rate their degree of satisfaction, where 0 corresponds to the rating dissatisfied, 1 corresponds to the rating partly dissatisfied, 2 corresponds to the rating satisfied, and 3 corresponds to the rating very satisfied.

As satisfaction is rated, one might be tempted to assume that the coupled person and item attributes are leniency and quality [27]. However, the respondent’s ratings of satisfaction reflect the perceived demands in relation to his or her ability, i.e., the person-environment fit. Thus, the latent construct in the WEIQ is associated with the persons’ abilities, and the corresponding item attribute is task difficulty.

Data analysis

The Rasch model enables separate measures for the individual (i.e., the person ability, θ-value) and the item (i.e., the task difficulty, δ-value) on a conjoint interval scale corresponding to the measurement continuum of self-perceived work ability. With Rasch analysis, one assesses whether the requirements for internal validity and invariance are met by examining the extent to which observed data accord with the expected values defined by the measurement model. Both statistical and graphical tests are used to detect possible differences between observed data and expected values [28], and these must be considered in an iterative process together with the theoretical underpinnings of the construct purported to be measured. The analyses of the WEIQ were conducted in RUMM2030 and structured around three central questions outlined by Hobart & Cano [23]:
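All model estimation in this study was performed in RUMM2030. Purely as an illustration of the model's core relationship between person ability (θ) and task difficulty (δ), the dichotomous form of the Rasch model can be sketched as follows (the function name is our own; the WEIQ itself uses a polytomous extension of this model):

```python
import math

def rasch_probability(theta, delta):
    """Dichotomous Rasch model: probability of endorsing an item,
    given person ability theta and item (task) difficulty delta,
    both on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

# When ability equals difficulty, the endorsement probability is 0.5;
# higher ability relative to difficulty raises the probability.
p_matched = rasch_probability(theta=1.2, delta=1.2)   # 0.5
p_easier = rasch_probability(theta=1.2, delta=-0.5)   # > 0.5
```

This conjoint parameterisation is what places persons and items on the same interval (logit) continuum, enabling the targeting and fit evaluations described below.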

Is the scale-to-sample targeting adequate for making judgements about the performance of the scale and the measurement of people?

The Rasch analysis provides items and persons hierarchically ordered according to their relative difficulty (i.e., task difficulty, δ-values) and relative ability (i.e., person ability, θ-values) on the same interval logit continuum. Thus, the item and person value distributions were examined individually as well as in relation to each other, both numerically and graphically. The better the person values match the item values, the greater the potential for precise measurement [23].

Has a measurement ruler been successfully constructed?

To examine the extent to which observed data accord with the expected values, several tests are required, and these tests are considered in an iterative process. First, in a polytomous scale, monotonicity of items is expected, i.e., the thresholds should be sequentially ordered [29]. If disordered thresholds occur, collapsing categories may resolve this [24]. Likewise, collapsing categories can help if very few respondents use a given response option. Second, the item values should provide a meaningful story of what it means to go from lower to higher difficulty. As there is no ordinal theory underpinning the items in the WEIQ, the item values were judged according to their clinically logical order and linked to relevant known qualitative aspects. Third, how well the items statistically fit the model was assessed by fit residuals, chi-square tests, and item characteristic curves (ICC). Ideally, the individual item fit residuals should be between − 2.5 and + 2.5; the chi-square values should not be statistically significant; and the dots of the class intervals should follow the ICC to support good fit [23]. Fourth, local dependency (LD) was evaluated by comparing item residual correlations against a relative cut-off; that is, residual correlations greater than 0.20 above the average correlation indicate local dependency [30]. To handle LD, testlets of subsets of items were created [31]. Fifth, to ensure invariance across groups, it is crucial that item estimates do not differ between groups. Thus, tests for Differential Item Functioning (DIF) were statistically evaluated between the three work settings, followed by stepwise item splits and repeated analyses where DIF was present [32]. Due to multiple tests, Bonferroni correction was applied. When DIF was present, it was also assessed in qualitative terms to provide further clinical justification for the required splits.
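The relative cut-off used for local dependency (fourth step above) can be sketched in code. This is an illustration only, not the RUMM2030 implementation; the function names and the toy residual data are our own, and the rule shown is simply "flag item pairs whose residual correlation exceeds the average inter-item residual correlation by more than 0.20":

```python
from itertools import combinations

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def flag_local_dependency(residuals, margin=0.20):
    """residuals: dict mapping item name -> list of person residuals.
    Returns item pairs whose residual correlation exceeds the
    average inter-item residual correlation by more than `margin`."""
    pairs = list(combinations(residuals, 2))
    corrs = {p: pearson(residuals[p[0]], residuals[p[1]]) for p in pairs}
    avg = sum(corrs.values()) / len(corrs)
    return [p for p, r in corrs.items() if r > avg + margin]
```

Pairs flagged in this way would then be combined into testlets, as described above.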
Last, unidimensionality was tested according to Smith’s method [33], i.e., the loading pattern of the first factor in the principal component analysis (PCA) of the residuals was used to define two subsets of items: the positively and the negatively correlated items. Subsequently, person abilities (θ-values) were estimated for each subtest and compared using independent t-tests. To support unidimensionality, it is recommended that the proportion of persons with t-values outside ± 1.96 should not exceed 5%.
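The per-person comparison in Smith's method can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: for each person, the two subtest estimates are compared with a t-value formed from their difference and standard errors, and the proportion of |t| > 1.96 is reported (the function name is our own):

```python
def proportion_significant(theta_a, theta_b, se_a, se_b, critical=1.96):
    """For each person, compute t = (theta_a - theta_b) / sqrt(se_a^2 + se_b^2)
    from the two subtest ability estimates, and return the proportion of
    persons whose |t| exceeds the critical value (1.96 at the 5% level)."""
    outside = 0
    for t1, t2, s1, s2 in zip(theta_a, theta_b, se_a, se_b):
        t = (t1 - t2) / (s1 ** 2 + s2 ** 2) ** 0.5
        if abs(t) > critical:
            outside += 1
    return outside / len(theta_a)
```

A proportion at or below 5% would, under this criterion, support treating the scale as unidimensional.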

Are the people in the sample measured successfully?

To assess whether the persons were successfully measured, item-person distributions, reliability, and person fit residuals were evaluated. The mean person values indicate whether the sample is centred or off-centred on the items. Skewed person value distributions imply less measurement precision. The person separation index (PSI) is a reliability indicator, where 0 implies all error and 1 implies no error; it should ideally exceed 0.8 (i.e., corresponding to a separation ratio of G = 2) [34], which implies that the measurement uncertainty is no larger than half the object standard deviation [35]. Low person fit residuals indicate that a respondent’s responses are characterised by low variability, while high person fit residuals indicate irregular responses: high ratings on difficult items and low ratings on easy items. Ideally, for reliable individual assessments, the person fit residuals should lie within − 2.5 to + 2.5 [23].
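The correspondence between the PSI threshold of 0.8 and a separation ratio of G = 2 follows from the standard relation G = sqrt(PSI / (1 − PSI)). As a small worked illustration (function names are our own):

```python
def separation_ratio(psi):
    """Separation ratio G = sqrt(PSI / (1 - PSI)),
    the ratio of true spread to measurement error."""
    return (psi / (1.0 - psi)) ** 0.5

def psi_from_separation(g):
    """Inverse relation: PSI = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)

# PSI = 0.8 corresponds exactly to G = 2, i.e. the error standard
# deviation is half the person (object) standard deviation.
g = separation_ratio(0.8)  # 2.0
```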
