The Egyptian EQ-5D-5L Extensive Pilot Study: Lessons Learned

Data Source

This study used cTTO data and QC reports from the Egyptian EQ-5D-5L valuation study [15]. A total of 1,303 interviews were conducted between July 2019 and March 2020 by 12 interviewers and two principal investigators (PIs). Ten interviews were test interviews conducted by the PIs. Once interviewers were recruited and trained, they conducted pilot interviews until the study team judged, based on the QC tool, that they had acquired the expertise needed to deliver good-quality interviews. Three interviewers (113 interviews) were excluded because of interviewer effects observed in the data. The final analysis of this study therefore included 206 pilot interviews and the 974 actual interviews used to calculate the Egyptian tariff [15]. Members of the general public were recruited from different Egyptian governorates using multi-stratified quota sampling to obtain a sample representative in terms of age, sex and geographical distribution. Each participant was interviewed face to face by a trained interviewer using the Egyptian translated version of the EQ-VT-2.1 protocol [2]. Interviews took place at the interviewers' office or at the participants' home, workplace or other public places, according to the participants' preferences. Interviewer training proceeded in four stages: interviews of the candidate interviewers by the PIs, initial training, pilot interviewing and retraining [16].

Quality Control (QC)

The QC reports cover two main aspects, namely protocol compliance and interviewer effects, in addition to other metadata such as the number of iteration steps and the time spent on the better than dead (BTD) and worse than dead (WTD) sections of the cTTO task [10]. Protocol compliance was assessed against four criteria: at least 3 min spent on the wheelchair (WC) example, at least 5 min spent on the actual cTTO tasks, no clear inconsistency in the cTTO ratings, and use of the lead time in the WC example. An interview was flagged if the interviewer violated any of these criteria. A conservative threshold of four flagged interviews out of ten was set as the trigger to stop and retrain the interviewer; if four or more of the next ten interviews by the same interviewer were again flagged, the interviewer was to be excluded from data collection [10]. Interviewer effects were assessed by comparing each interviewer's cTTO value distribution with the overall distribution across all interviewers, looking for unusual clustering or distributions. The QC reports were discussed in periodic online meetings between the Egyptian team and the EQ-VT support team: weekly during the pilot phase (every five interviews per interviewer) and every 2 weeks during actual data collection (every ten interviews per interviewer); the feedback received was then discussed with all interviewers. All 12 interviewers met the minimum requirements of the protocol. However, three interviewers, along with the interviews they had conducted, were excluded from data collection and data analysis because of strong clustering and inconsistent cTTO value distributions that persisted despite retraining and close monitoring, which could indicate poor engagement in the valuation tasks and interviewer effects.
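
For illustration, the flagging rules described above could be encoded as in the following sketch (Python; the `Interview` structure and its field names are our own illustrative assumptions, not part of the EQ-VT software):

```python
from dataclasses import dataclass

@dataclass
class Interview:
    wc_example_minutes: float   # time spent on the wheelchair (WC) example
    ctto_minutes: float         # time spent on the actual cTTO tasks
    clear_inconsistency: bool   # clear inconsistency in the cTTO ratings
    lead_time_used_in_wc: bool  # lead time used in the WC example

def is_flagged(iv: Interview) -> bool:
    """An interview is flagged if any protocol-compliance criterion is violated."""
    return (
        iv.wc_example_minutes < 3      # less than 3 min on the WC example
        or iv.ctto_minutes < 5         # less than 5 min on the actual cTTO tasks
        or iv.clear_inconsistency
        or not iv.lead_time_used_in_wc
    )

def retraining_needed(batch: list[Interview], threshold: int = 4) -> bool:
    """Stop and retrain when four or more of a batch of ten interviews are flagged."""
    return sum(is_flagged(iv) for iv in batch) >= threshold
```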

Data Analysis

Analyses of sample demographics and QC indicators were conducted using IBM SPSS Statistics for Windows, Version 22.0 (IBM Corp., Armonk, NY, USA); Stata version 14 was used to test protocol compliance, interviewer effects, clustering and predictive accuracy.

Sample Demographic Characteristics and QC Tool Indicators

Descriptive statistics were presented for sample socio-demographics and the QC tool indicators: percentages for discrete variables, and means and standard deviations for continuous variables.

Protocol Compliance, Interviewer Effects and Clustering

Data were divided into batches of ten interviews per interviewer. We compared the rate of flagged interviews between the pilot phase and the actual data collection phase, and calculated the flag rate per interviewer, to assess the effect of the pilot phase on protocol compliance and to investigate whether the rate of flagged interviews kept decreasing beyond the pilot phase or levelled off within it. This allowed us to determine whether flagged interviews showed a decreasing trend over the course of the study; a sketch of the calculation follows.
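
A minimal sketch of this batching and flag-rate calculation, assuming a table with columns `interviewer`, `sequence` (interview order per interviewer) and `flagged` (0/1 from the QC criteria); the file and column names are hypothetical:

```python
import pandas as pd

df = pd.read_csv("qc_interviews.csv")  # hypothetical input file

# Batches of ten interviews per interviewer, in the order they were conducted.
df["batch"] = (df.groupby("interviewer")["sequence"].rank(method="first") - 1) // 10

# Flag rate per interviewer per batch; a decreasing trend across batches
# suggests improving protocol compliance over the course of the study.
flag_rates = df.groupby(["interviewer", "batch"])["flagged"].mean().unstack("batch")
print(flag_rates.round(2))
```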

To test whether interviewer effects were reduced during the pilot phase and subsequent rounds of cTTO data collection, three-level mixed models were estimated in which the variance in values was partitioned into variance attributed to responses, respondents and interviewers, with responses nested in respondents and respondents nested in interviewers, for each subsample of ten interviews per interviewer per batch. Intraclass correlation coefficients (ICCs) were calculated to investigate whether the share of variance attributed to interviewers showed a decreasing trend over the collected rounds of data.
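
This variance partition could be approximated in Python with a random interviewer intercept plus a respondent variance component, for example via `statsmodels` as sketched below (the column names are assumptions, and the paper's Stata specification may differ):

```python
import statsmodels.formula.api as smf

def interviewer_icc(batch_df):
    """Share of cTTO value variance attributed to interviewers for one batch.

    Assumes columns 'value' (cTTO response), 'respondent' and 'interviewer';
    responses are nested in respondents, respondents in interviewers.
    """
    fit = smf.mixedlm(
        "value ~ 1",
        batch_df,
        groups=batch_df["interviewer"],                  # level 3: interviewer
        re_formula="1",                                  # interviewer random intercept
        vc_formula={"respondent": "0 + C(respondent)"},  # level 2: respondent
    ).fit()
    var_interviewer = float(fit.cov_re.iloc[0, 0])
    var_respondent = float(fit.vcomp[0])
    var_residual = fit.scale                             # level 1: responses
    return var_interviewer / (var_interviewer + var_respondent + var_residual)
```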

A reduction in clustering on easily obtained values (− 1, − 0.5, 0, 0.5 and 1) was taken as an initial indication of quality improvement. Scatter plots were used to investigate whether clustering decreased over the rounds of collected data.
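
The degree of clustering can be summarized as the share of responses landing exactly on these focal values, as in this sketch:

```python
import numpy as np

FOCAL_VALUES = (-1.0, -0.5, 0.0, 0.5, 1.0)

def focal_share(values):
    """Proportion of cTTO values equal to one of the focal values;
    a falling share over rounds of data indicates reduced clustering."""
    values = np.asarray(values, dtype=float)
    return np.isclose(values[:, None], FOCAL_VALUES).any(axis=1).mean()
```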

Predictive Accuracy

To test whether the pilot phase had a significant effect on the aggregate predictive accuracy of the models employed in the value set calculation, two samples were compared: the sample used for the value set calculation (n = 974) and a sample of equal size comprising the pilot data (n = 206) and the first 768 actual interviews. The omission of actual interviews from the second sample was balanced by interviewer, with the number of actual interviews excluded for each interviewer equal to the number of their pilot interviews. First, we applied the Egyptian value set to all health states valued in the pilot and actual data [15]. For each of the two samples, the mean absolute error (MAE) was computed as the mean absolute difference between the values assigned by respondents and the index values. As a comparison, we randomly drew two further samples of similar size from all collected data (pilot and actual) and compared their performance with that of the two constructed samples. The random draws were repeated 10,000 times to ensure the robustness of the sample selection.
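
The MAE computation and the repeated random draws could be sketched as follows (a simplified illustration drawing respondent-level absolute errors; the variable names are hypothetical):

```python
import numpy as np

def mae(observed, index_values):
    """Mean absolute error between respondents' cTTO values and the
    value-set index values for the same health states."""
    return np.mean(np.abs(np.asarray(observed) - np.asarray(index_values)))

def random_draw_maes(abs_errors_all, sample_size=974, n_draws=10_000, seed=1):
    """MAE distribution over repeated equal-sized random draws from all
    collected data (pilot + actual), as a robustness check on sample selection.
    `abs_errors_all` holds |observed - index| per respondent-level record."""
    rng = np.random.default_rng(seed)
    abs_errors_all = np.asarray(abs_errors_all)
    return np.array([
        rng.choice(abs_errors_all, size=sample_size, replace=False).mean()
        for _ in range(n_draws)
    ])
```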

To determine whether the pilot data led to better predictive accuracy at the interviewer level as more interviews were completed, the Egyptian value set was applied to the valuation data. We then calculated the MAE within each interview (ten responses per interview) as the mean absolute difference between the index values and the values provided by the respondents. Subsequently, decreasing trends in the MAE over time were visualized with scatter plots of the within-interview MAEs for each interviewer over the interviewing sequence.
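
Plotting the within-interview MAE against the interview sequence per interviewer might look like the following sketch (the file and column names are hypothetical):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical input: one row per response with columns 'interviewer',
# 'sequence' (interview order per interviewer) and 'abs_error'
# (|respondent value - index value|); ten responses per interview.
df = pd.read_csv("valuation_errors.csv")

per_interview = (
    df.groupby(["interviewer", "sequence"])["abs_error"]
    .mean()                      # within-interview MAE over ten responses
    .rename("mae")
    .reset_index()
)

fig, axes = plt.subplots(3, 4, figsize=(14, 8), sharey=True)
for ax, (name, grp) in zip(axes.flat, per_interview.groupby("interviewer")):
    ax.scatter(grp["sequence"], grp["mae"], s=10)
    ax.set_title(f"Interviewer {name}")
fig.supxlabel("Interview sequence")
fig.supylabel("Within-interview MAE")
fig.tight_layout()
plt.show()
```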

Ordinary least squares (OLS) regression analyses, with the respondent-level MAE as the dependent variable and the rank order in which the interviews were conducted by the interviewer (Time) as the independent variable, were conducted for each interviewer separately (Eq. 1). This allowed us to test whether the MAE improved as interviewers completed more interviews; in other words, whether the outcomes of a cTTO interview became more similar to the results of the final value set. In addition, we explored models that included a dummy variable (Pilot) indicating whether the data were pilot data (Pilot = 1) or non-pilot data (Pilot = 0) (Eq. 2), as well as the interaction between the pilot dummy and the interview sequence (Time × Pilot) (Eq. 3). For each of these variables, p-values were calculated to test the significance of their relationship with the respondent-level MAE. A significant parameter estimate for the dummy variable would show that the MAE was larger or smaller in the pilot than in the actual data, and the interaction term would show whether the improvement in predictive error was larger in the pilot phase:

$$\mathrm{MAE}_{i}= \beta_{0}+\beta_{1}\mathrm{Time}_{i}+\varepsilon_{i},$$

(1)

$$\mathrm{MAE}_{i}= \beta_{0}+\beta_{1}\mathrm{Time}_{i}+\beta_{2}\mathrm{Pilot}_{i}+\varepsilon_{i},$$

(2)

$$\mathrm{MAE}_{i}= \beta_{0}+\beta_{1}\mathrm{Time}_{i}+\beta_{2}\mathrm{Pilot}_{i}+\beta_{3}\mathrm{Time}_{i}\times \mathrm{Pilot}_{i}+\varepsilon_{i}.$$

(3)

In Eqs. (1), (2) and (3), \(\mathrm{MAE}_{i}\) represents the mean absolute error for the interview conducted with respondent \(i\). \(\beta_{0}\) represents the regression intercept, while \(\beta_{1}\mathrm{Time}_{i}\) represents the effect of interview sequence. \(\beta_{2}\mathrm{Pilot}_{i}\) and \(\beta_{3}\mathrm{Time}_{i}\times \mathrm{Pilot}_{i}\) represent the effect of the pilot phase and the interaction between the pilot phase and interview sequence, respectively. \(\varepsilon_{i}\) is the residual error term.
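
Fitting Eqs. (1)-(3) separately per interviewer could be done as in the following sketch (using `statsmodels` OLS; the file and column names are assumptions):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical input: one row per respondent with columns 'interviewer',
# 'mae' (respondent-level MAE), 'time' (rank order of the interview within
# interviewer) and 'pilot' (1 = pilot interview, 0 = actual interview).
df = pd.read_csv("respondent_mae.csv")

for name, grp in df.groupby("interviewer"):
    eq1 = smf.ols("mae ~ time", data=grp).fit()                       # Eq. (1)
    eq2 = smf.ols("mae ~ time + pilot", data=grp).fit()               # Eq. (2)
    eq3 = smf.ols("mae ~ time + pilot + time:pilot", data=grp).fit()  # Eq. (3)
    print(name, eq3.pvalues.round(3).to_dict())
```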
