COPSOQ III in Germany: validation of a standard instrument to measure psychosocial factors at work

Non-statistical checks and sample characteristics

It is essential to know to what extent a questionnaire actually measures what it is supposed to measure. Content validity is not necessarily a matter of statistics. It is a matter of certainty that the items and scales really cover their subjects and, of course, that the subjects themselves were wisely selected. Selecting the 32 international core items, and thereby ensuring the completeness and relevance of their subjects, was the international network’s task. With this goal, the steering committee and all network members discussed the substantive and statistical value of all items and scales [1]. In a first Delphi-like phase beginning in 2013, the steering committee asked network members and external researchers to propose favoured subjects together with related scales and items, to comment on the emerging collection, to make new proposals, and so on. A preliminary questionnaire was then introduced at the network conference in Paris in 2015, followed by empirical testing until 2017. Final decisions and agreements were then made, and the result was presented as COPSOQ III at a conference in Santiago de Chile in 2019.

This process makes clear that the content validity of COPSOQ III rests predominantly on the literature and on expert knowledge, not on studies among potential survey participants. It is a principle of the international network that core items and any national additions must be based on tried and tested, and thus already validated, questions. Additional file 1 shows that the German questionnaire follows this specification. Thus, there was no need to check consistency by comparing the answers to a question with the answers to other questions of identical content in the same survey.

In terms of test-retest reliability, the criterion of stability would usually mean counting how often respondents give identical answers over a certain period of time. This check was omitted, first for practical reasons: it would have been time-consuming, expensive, and difficult to explain in a company setting why participants should fill in the questionnaire several times at short intervals. Second, it seemed unnecessary, as almost all items and scales had already been tested before their inclusion in earlier versions of the COPSOQ. Stability in terms of inter-rater reliability was not checked, as this criterion does not apply to an instrument based on self-report.

The degree to which results are independent of the way the questionnaire is administered can be defined as its objectivity. In this sense, the process of asking questions, giving answers, and analysing data is highly objective. First, the questionnaire is always completed by the participants themselves, so there is no interviewer who could influence their answers. Second, the surveys in all companies follow the same fixed scheme. Third, the procedures for transforming answers into statistical data are predefined and invariant.
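To illustrate what a predefined, invariant transformation of answers into scale scores can look like, here is a minimal Python sketch. Its details are assumptions, not specifications taken from this paper: five-point response scales mapped linearly onto the 0–100 metric used for all scales, items already coded in the same direction, and a scale score computed as the mean of the answered items if at least half of the items were answered. The function names `item_to_0_100` and `scale_score` are hypothetical.

```python
import math

import numpy as np


def item_to_0_100(response: np.ndarray, n_categories: int = 5) -> np.ndarray:
    """Map responses coded 1..n_categories linearly onto 0..100 (assumed coding)."""
    return (response - 1) / (n_categories - 1) * 100


def scale_score(responses, n_categories: int = 5, min_share: float = 0.5) -> float:
    """Mean of the answered items on the 0-100 metric.

    NaN marks an unanswered item. Returns NaN if fewer than `min_share`
    of the items were answered (assumed missing-value rule).
    """
    r = np.asarray(responses, dtype=float)
    answered = ~np.isnan(r)
    if answered.mean() < min_share:
        return math.nan
    return float(item_to_0_100(r[answered], n_categories).mean())


# The same answers always yield the same score, whoever runs the analysis.
print(scale_score([1, 3, 5]))              # items 0, 50, 100 -> 50.0
print(scale_score([5, np.nan]))            # half answered -> 100.0
print(scale_score([5, np.nan, np.nan]))    # a third answered -> nan
```

Because the mapping is a fixed function of the raw answers, no analyst judgement enters between data collection and the scale scores, which is the sense of objectivity described above.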

We regard the usability criteria as met, since contact persons in the companies where surveys were conducted consistently confirm that the surveys run smoothly. Data protection complies with the General Data Protection Regulation (GDPR) of the EU. The online version is easy to find and access, and the paper-and-pencil version can be reliably distributed and returned by mail. In a free-text field on practicability included in all German surveys, respondents usually describe the questions and answers as simple and easy to understand. In the analysed sample, the average time to complete the questionnaire online was 24 min (median 20 min). The average participation rate was 61.4% (median 62.2%). Unfortunately, no information is available to distinguish between those who took part in the surveys and those who declined.

Table 1 gives an overview of the sample. In socio-demographic terms, 48.3% of the participants were female and 51.7% male. A third response option was introduced in 2019 and was therefore not yet offered in most of the included surveys, so its results are not displayed. These values are quite close to the official figures for the working population in Germany (46.6% female, 53.4% male) [18]. Concerning age, the group up to 24 years was the smallest, comprising 6.2%. The three groups aged 25 to 54 years ranged in size between 20 and 30%, while the group aged 55 years and older comprised 17.7%. Compared with official figures, the oldest age group is underrepresented by 5.8 percentage points, while all other age groups showed smaller differences of 2 to 4 percentage points.

Table 1 Study sample: socio-demographic and occupational characteristics

A look at work contracts and working hours showed that a vast majority of participants (87.0%) had permanent contracts and that 73.9% worked full time. These rates are quite similar to the official rates of 91.1 and 70.2%, respectively. Working in the evening or at night was typical for 26.1% of the participants, and 44.1% worked on bank holidays or weekends. Regarding hierarchical position, 19.8% worked as supervisors. Among the occupational areas according to KldB 2010 (the German Classification of Occupations), “Business organisation, accounting, law and administration” was the largest with 20.9%, probably because it aggregates administrative work in the public sector as well as in companies. Almost as large were “Health care, the social sector, teaching and education” with 19.5% and “Production of raw materials and goods, and manufacturing” with 19.3%; the latter reflects Germany’s industrial tradition. Correspondingly, “Agriculture, forestry, farming, and gardening” played a minor role with 2.1%.

Descriptive analysis, correlations, and exploratory factor analysis of scales

All 31 scales of the questionnaire are presented in Table 2. For each scale, the mean, the standard deviation, and the proportions of ceiling, floor, and missing values were calculated to check sensitivity and variation. Covering the years 2015–2020, around 250,000 cases were available for 25 of the 31 scales. The other 6 scales were integrated step by step until 2017, so the case numbers for them were necessarily lower. From a complete-case perspective, counting only cases without any missing value, the total number of cases was 134,896. From a single-scale perspective, the average rate of missing values was 3.6% (range 0.7–8.4%), and 22 of the 31 scales had less than 5% missing values.
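The difference between the complete-case count and the per-scale missing rates can be made concrete with a small Python sketch. It assumes a respondents × scales matrix of scale scores in which NaN marks a missing score; the function name `missing_summary` is hypothetical. The sketch treats every NaN alike, whereas in this study the scales integrated step by step until 2017 were simply not offered in earlier surveys, which a real analysis would need to separate from item non-response.

```python
import numpy as np


def missing_summary(scores: np.ndarray) -> dict:
    """Summarise missing values in a respondents x scales matrix (NaN = missing)."""
    missing = np.isnan(scores)
    per_scale_pct = missing.mean(axis=0) * 100   # single-scale perspective
    complete = ~missing.any(axis=1)              # complete-case perspective
    return {
        "complete_cases": int(complete.sum()),
        "mean_missing_pct": float(per_scale_pct.mean()),
        "range_missing_pct": (float(per_scale_pct.min()), float(per_scale_pct.max())),
        "scales_below_5pct": int((per_scale_pct < 5).sum()),
    }


# Toy data: 4 respondents, 3 scales (not study data).
scores = np.array([
    [50.0, np.nan, 75.0],
    [25.0, 50.0, 100.0],
    [np.nan, 75.0, 50.0],
    [75.0, 25.0, 0.0],
])
print(missing_summary(scores))
```

Even modest per-scale missing rates remove many respondents in the complete-case view, because a case is dropped as soon as any one of its scales is missing.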

Table 2 Descriptive statistics and reliability of scales

The mean values of the scales of the German COPSOQ III version ranged from 20.0 points for “Intention to leave Profession / Job” to 77.0 points for “Sense of Community”. The standard deviations ranged from 16.9 to 28.4 points. These values cannot be interpreted normatively. Of course, it would be favourable for a company if, for example, few persons intended to leave (low is positive) and many enjoyed working with each other (high is positive). But the COPSOQ guidelines do not fix any cut-off values, however well justified, as “the true” values. Thus, the really important information is that even the lowest and the highest mean values are at least 20 points away from 0 and 100, the extreme ends of the possible value range. Floor effects, defined here as the percentage of answers coded zero, ranged from 0.2 to 48.2%. Five scales had 20% or more in this category (“Insecurity over Working Conditions”, “Dissolution”, “Job Insecurity”, “Intention to leave Profession / Job”, “Unfair Treatment”), while 16 scales had less than 5% of answers at this extreme. Ceiling effects, defined as the percentage of answers coded 100, ranged from 0.4 to 25.8%. Two scales exceeded 20% (“Sense of Community”, “Meaning of Work”), while 18 scales had less than 5% of answers in this extreme category.
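The floor and ceiling definitions above translate directly into code. The following is a minimal Python sketch, assuming scale scores already on the 0–100 metric with NaN for missing values; the function name `scale_descriptives` is hypothetical. It uses exact comparison with 0 and 100, assuming that a scale score built by averaging item scores reaches an extreme only when every answered item sits at that extreme. It also reports the distance of the mean from the nearer end of the range, the quantity discussed above.

```python
import numpy as np


def scale_descriptives(score) -> dict:
    """Mean, SD, floor/ceiling shares (%) and distance from 0/100 for one scale."""
    s = np.asarray(score, dtype=float)
    s = s[~np.isnan(s)]                      # ignore missing scale scores
    mean = float(s.mean())
    return {
        "mean": mean,
        "sd": float(s.std(ddof=1)),          # sample standard deviation
        "floor_pct": float((s == 0).mean() * 100),
        "ceiling_pct": float((s == 100).mean() * 100),
        "distance_from_extremes": min(mean, 100 - mean),
    }


# Toy data (not study data): floor 25%, ceiling 50%, mean 62.5.
d = scale_descriptives([0, 50, 100, 100, np.nan])
print(d)
```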

Cronbach’s α and the intraclass correlation coefficient (ICC) were calculated to assess the reliability and homogeneity of multi-item scales. There is broad consensus that α ≥ 0.7 indicates acceptable reliability and that ICC ≥ 0.5 indicates an acceptable degree of congruence. In total, 28 scales showed good or even very good reliability in terms of Cronbach’s α, and 24 scales showed satisfactory or even good ICC values. Only three scales had both a low α and a low ICC: “Dissolution”, “Degrees of Freedom”, and “Feedback”.
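Both coefficients can be computed from a respondents × items matrix. The text does not state which ICC variant was used, so the Python sketch below assumes the two-way consistency ICC for single measures, ICC(3,1). This variant is linked to Cronbach’s α by the Spearman–Brown formula, α = k·ICC / (1 + (k − 1)·ICC) for k items. The function names are hypothetical, and the data are simulated, not study data.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items matrix (complete cases only)."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)


def icc_consistency_single(items: np.ndarray) -> float:
    """Two-way consistency ICC for single measures, ICC(3,1), via ANOVA mean squares."""
    n, k = items.shape
    grand = items.mean()
    ss_rows = k * ((items.mean(axis=1) - grand) ** 2).sum()   # between respondents
    ss_cols = n * ((items.mean(axis=0) - grand) ** 2).sum()   # between items
    ss_err = ((items - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


rng = np.random.default_rng(1)
trait = rng.normal(size=(500, 1))
items = trait + rng.normal(scale=0.8, size=(500, 4))   # four noisy indicators of one trait
alpha, icc = cronbach_alpha(items), icc_consistency_single(items)
print(f"alpha = {alpha:.2f} (>= 0.7?), ICC = {icc:.2f} (>= 0.5?)")
```

α grows with the number of items even when the inter-item agreement stays the same, whereas ICC(3,1) describes agreement per item. Reporting both, as above, separates scale length from homogeneity.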

It is considered a sign of good psychometric quality when items of the same scale are closely (but not too closely) related. For relations between scales, the opposite applies: it is important to know to what degree different scales represent different work factors. Additional file 2 presents the internal validity and distinctiveness of the scales in terms of correlation coefficients (Pearson’s r). A correlation with |r| below 0.1 is usually said to be negligible. Values up to |0.29| are considered weak, values up to |0.49| moderate, and values of |0.5| and more strong. By this classification, out of a total of 465 correlations, r was weak in 318 cases (68.4%), moderate in 125 cases (26.9%), and strong in 22 cases (4.7%), with 0.64 as the highest value.
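The classification above can be sketched in Python for a respondents × scales matrix. With 31 scales there are 31 · 30 / 2 = 465 distinct pairs, so only the upper triangle of the correlation matrix is counted. Several points are assumptions: complete cases (no pairwise deletion), negligible correlations counted as weak so that the three categories cover all pairs, and the hypothetical function name `classify_scale_correlations`. The data are random, not study data.

```python
import numpy as np


def classify_scale_correlations(scores: np.ndarray):
    """Count scale pairs by strength of Pearson's |r| (respondents x scales, complete cases)."""
    r = np.corrcoef(scores, rowvar=False)
    pair_r = np.abs(r[np.triu_indices_from(r, k=1)])   # each unordered pair once
    counts = {
        "weak (incl. negligible)": int((pair_r < 0.3).sum()),
        "moderate": int(((pair_r >= 0.3) & (pair_r < 0.5)).sum()),
        "strong": int((pair_r >= 0.5).sum()),
    }
    return counts, float(pair_r.max())


rng = np.random.default_rng(2)
scores = rng.normal(size=(1000, 31))
counts, max_r = classify_scale_correlations(scores)
print(counts, sum(counts.values()))   # the counts always sum to 465 for 31 scales
```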

The strong correlations among work factors and effects are not difficult to explain. For example, “Burnout Symptoms” can be regarded as a health aspect and, as such, are tied to “General Health”. High ratings on “Quantitative Demands” can often mean having to work overtime, which makes it difficult to balance work and free time; in other words, they create “Work Privacy Conflicts”. Because they all address typical aspects of leadership, “Trust and Justice”, “Recognition”, “Quality of Leadership”, “Support at Work”, and “Predictability of Work” are linked with each other. In terms of content, for example, the question of how well a superior resolves conflicts (an item in “Quality of Leadership”) is closely related to the question of whether conflicts are resolved fairly by the management (part of “Trust and Justice”). Similarly, the extent to which a superior is good at work planning (an item in “Quality of Leadership”) is partly a question of whether all information needed to do the work well is received (also a question in “Predictability of Work”).

Exploratory factor analysis (EFA) is an appropriate means of checking statistical relations among a multitude of scales. Tables 3 and 4 show the results of two EFAs (extraction method: principal component analysis; rotation method: varimax with Kaiser normalization; retention criterion: eigenvalue of at least 1), which treat work factors and effects separately in accordance with the generalised model of cause and effect [11]. For better readability, all factor loadings below |0.4| are hidden in the tables.

Table 3 EFA on psychosocial work factors: rotated factor matrix

Table 4 EFA on effects: rotated factor matrix

In Table 3, five components were extracted from the 24 psychosocial work factors, with the sum of squared loadings explaining 56.2% of the total variance. Table 4 shows that 2 components were extracted from the 7 effect scales, covering 61.3% of the total variance. These results were satisfactory, as widespread rules of thumb require an acceptable model to explain at least half of the total variance and the ratio of scales to extracted factors to be no less than 3:1.
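The extraction and rotation settings named above can be sketched in Python with NumPy alone. This is a minimal illustration under assumptions, not the software used in the study. It takes principal components of the correlation matrix, retains components with an eigenvalue of at least 1, and applies the widely used SVD-based varimax iteration to Kaiser-normalised loadings. It also checks the two rules of thumb. The function names are hypothetical, and the simulated data (two latent factors behind six variables) are not study data.

```python
import numpy as np


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation using the common SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        previous, criterion = criterion, s.sum()
        if previous != 0 and criterion / previous < 1 + tol:
            break
    return loadings @ rotation


def pca_varimax(data: np.ndarray):
    """PCA on the correlation matrix, eigenvalue >= 1 retention, Kaiser-normalised varimax."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)                 # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    m = int((eigvals >= 1.0).sum())                         # Kaiser criterion
    loadings = eigvecs[:, :m] * np.sqrt(eigvals[:m])
    h = np.sqrt((loadings**2).sum(axis=1, keepdims=True))   # square roots of communalities
    rotated = varimax(loadings / h) * h                     # Kaiser normalisation
    explained = eigvals[:m].sum() / corr.shape[0]           # unchanged by rotation
    return rotated, explained


def meets_rules_of_thumb(n_scales: int, n_factors: int, explained: float) -> bool:
    """At least half of the variance explained and at least 3 scales per factor."""
    return explained >= 0.5 and n_scales / n_factors >= 3


rng = np.random.default_rng(3)
latent = rng.normal(size=(2000, 2))
data = 0.8 * np.repeat(latent, 3, axis=1) + 0.6 * rng.normal(size=(2000, 6))
rotated, explained = pca_varimax(data)
print(np.where(np.abs(rotated) < 0.4, 0.0, rotated.round(2)))   # hide |loading| < 0.4
print(explained, meets_rules_of_thumb(6, rotated.shape[1], explained))
```

An orthogonal rotation only redistributes variance among the retained components. The share of variance explained is therefore fixed at extraction, and the rotation affects only how interpretable the loading pattern is.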

Table 5 Regression models on satisfaction and health effects

In Table 3, factors 1–3 combined a larger number of scales than factors 4 and 5. Component 1 showed high loadings for “Meaning of Work”, “Commitment to Workplace”, and “Possibilities for Development”, as well as a weaker loading for “Influence at Work”. It could therefore be called “Influence and Possibilities for Development”, in line with the dimensions in Fig. 1. Factor 3 strongly connected “Support at Work”, “Sense of Community”, “Quality of Leadership”, and “Feedback”, and could represent the dimension “Social Relations and Leadership”. There is evidently a certain fuzziness between components 1 and 3, as “Quality of Leadership” and “Trust and Justice” load on both.

From this perspective, the clear correspondence of factors 2 and 4 with Fig. 1 should be highlighted. Component 2 combined the “Demands” scales “Emotional Demands”, “Work Privacy Conflicts”, “Quantitative Demands”, “Hiding Emotions”, and “Dissolution”, while component 4 represented the “Additional Factors”, namely “Insecurity over Working Conditions”, “Job Insecurity”, and “Work Environment / Physical Demands”. Finally, factor 5 seemed to connect scales from different dimensions, belonging to either “Influence” or “Social Relations” in the a priori model.

The 7 effect scales show a high degree of distinctiveness. Table 4 shows that each scale loaded highly on one of the two extracted factors. Factor 1 represents (dis)satisfaction with working conditions, combining “Work Engagement”, “Job Satisfaction”, and “Intention to leave Profession / Job”. Factor 2 indicates work-related health status with “Presenteeism”, “Inability to Relax”, “Burnout Symptoms”, and, with a weaker tie to work, “General Health”.

Regression models and group analysis

However much psychological models and theories may differ, they all distinguish between causes and effects, which are related insofar as the former shape the latter. This general consideration is important for anyone who wants to influence satisfaction or health by applying improvement or prevention strategies in workplaces. If a given situation can be understood as a reaction (effect) to a specific kind of working conditions (causes), this helps to identify effective starting points for interventions. This idea leads to a statistical analysis of relationships between scales by means of linear regression. Table 5 shows the results of 7 multiple linear regression models (variables entered stepwise). Each satisfaction and health scale is defined as an outcome variable to be predicted by the 24 work factors plus gender and age group as independent variables. Because of the large number of scales, the results are presented in compressed form. First, the table gives the explained variance (model fit, coefficient of determination R²) of a model including all statistically significant independent variables (out of the 24 workplace factors plus age group and gender). It then shows the five most relevant workplace factors, plus gender or age group if either entered the model among the first five workplace factors. Further workplace factors are not shown, because statistical significance alone is not a helpful criterion: given the size of the sample, it emerges even for small effects.
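Stepwise variable entry, and the point that significance emerges even for small effects in large samples, can both be illustrated in Python with NumPy and SciPy. The sketch assumes forward-only selection with an entry threshold of p < 0.05 based on the partial F-test. The text does not give the exact stepwise settings, and common stepwise procedures can also remove variables again, so this is a simplified stand-in, not the study’s procedure. `p_value_for_r` shows that a correlation of only r = 0.01 is significant at a sample size of the order of the 134,896 complete cases reported for this study. All function and variable names are hypothetical, and the data are simulated.

```python
import numpy as np
from scipy import stats


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Coefficient of determination of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def forward_stepwise(X, y, names, p_enter=0.05):
    """Forward selection: add the predictor with the largest R^2 while its
    partial F-test p-value is below p_enter. Returns (selected names, final R^2)."""
    n = len(y)
    selected, remaining, r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        candidates = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        best = max(candidates, key=candidates.get)
        df_resid = n - len(selected) - 2          # n - (predictors incl. new one) - 1
        f_stat = (candidates[best] - r2) / ((1.0 - candidates[best]) / df_resid)
        if stats.f.sf(f_stat, 1, df_resid) >= p_enter:
            break
        selected.append(best)
        remaining.remove(best)
        r2 = candidates[best]
    return [names[j] for j in selected], r2


def p_value_for_r(r: float, n: int) -> float:
    """Two-sided p-value for a Pearson correlation r observed in n cases."""
    t = r * np.sqrt((n - 2) / (1.0 - r**2))
    return float(2 * stats.t.sf(abs(t), df=n - 2))


rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 5))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=1000)
names = ["factor_a", "factor_b", "factor_c", "factor_d", "factor_e"]   # hypothetical
print(forward_stepwise(X, y, names))
print(p_value_for_r(0.01, 134_896), p_value_for_r(0.01, 100))
```

The second line of output contrasts a tiny correlation that is highly significant in a very large sample with the same correlation in 100 cases. This is why the table ranks predictors by relevance instead of listing every significant one.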

“Job Satisfaction” (R² = max. 0.60, meaning that 60% of its variance is explained by the model) was predicted much better than all other effects. It was followed by “Work Engagement” (R² = max. 0.44), “Burnout Symptoms” (R² = max. 0.37), and “Intention to leave Profession / Job” (R² = max. 0.34), with moderate explained variances. “General Health” (R² = max. 0.22), “Presenteeism” (R² = max. 0.19), and “Inability to Relax” (R² = max. 0.19) had relatively low values.

Of the 24 possible scales, 17 were included as predictors in at least one of the models. Gender and age group were also included in some of them. The most frequent independent variables were “Work Privacy Conflicts” and “Unfair Treatment”, which appeared in 7 and 5 models, respectively. All other factors appeared in up to 4 models. Furthermore, “Work Privacy Conflicts” ranked first in all health models. The satisfaction models tended to depend on scales of the dimension “Influence and Possibilities”, such as “Commitment to Workplace” or “Meaning of Work”, and hardly on scales from “Additional Factors” or on gender or age group. Conversely, “Insecurity over Working Conditions”, “Job Insecurity”, gender, and age group frequently appeared in the health models. The newly included scale “Dissolution” contributed to predicting “Inability to Relax”, which is deemed plausible.

Inferring effects from causes raises the question of whether COPSOQ scales can identify general types of working conditions. Diagnosticity / sensitivity was checked by examining the degree to which well-known occupational groups with fixed activity patterns, and thus “stress profiles”, could be mapped. For this purpose, exemplary analyses of variance (ANOVA) were carried out for “Emotional Demands” and “Quality of Leadership”. The variance of these two scales was to be explained by occupational area according to KldB 2010, with η² as the measure of discrimination. Figure 2 depicts the mean values of the two scales for 9 occupational areas, sorted by the mean values for “Emotional Demands”.

Fig. 2 Emotional Demands vs. Quality of Leadership by occupational areas

Different occupational groups do indeed face different “Emotional Demands” (total mean = 47.7; SD = 27.9). The rounded mean for agriculture, forestry, farming, and gardening was 38 points. Commercial services, trading, sales, the hotel business, and tourism were in the middle with 45 points, and health care, the social sector, teaching, and education faced 69 points. This is a span of 31 points between the minimum and maximum values, and occupational group explains 15% of the variance (η² = 0.15). For “Quality of Leadership” (total mean = 52.9; SD = 25.4), it explains only 1% (η² = 0.01). The range of mean values for this scale is narrow, with a span of only 9 points between construction, architecture, surveying, and technical building services at 48 points (rounded) and, at 57 points, both traffic, logistics, safety and security and commercial services, trading, sales, the hotel business, and tourism.
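In a one-way ANOVA, η² is the between-group sum of squares divided by the total sum of squares, so η² = 0.15 means that occupational area accounts for 15% of the variance of a scale. A minimal Python sketch with a hypothetical function name and toy data (not study data):

```python
import numpy as np


def eta_squared(values, groups) -> float:
    """Share of total variance explained by group membership (one-way ANOVA)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand = values.mean()
    ss_total = ((values - grand) ** 2).sum()
    ss_between = sum(
        (groups == g).sum() * (values[groups == g].mean() - grand) ** 2
        for g in np.unique(groups)
    )
    return float(ss_between / ss_total)


print(eta_squared([1, 2, 1, 2], ["a", "a", "b", "b"]))   # equal group means -> 0.0
print(eta_squared([1, 1, 3, 3], ["a", "a", "b", "b"]))   # groups explain everything -> 1.0
```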
