Evaluating conceptual model measurement and psychometric properties of Oral health-related quality of life instruments available for older adults: a systematic review

Evaluation of OHRQoL plays an important role in clinical practice. As a result, several instruments have been developed to evaluate the functional, social and psychological impact of oral diseases and conditions [17]. In this study, we identified and evaluated 14 instruments designed to measure OHRQoL in older adults. Of these, only six exceeded the minimum EMPRO score (50.0) required to be recommended for administration in older patients (EORTC QLQ-OH-15, GOHAI, IPQ-RDE, OHIDL, OHIP, QoLIP-10). EORTC QLQ-OH-15 was the instrument rated highest by the experts, followed by OHIP, GOHAI, and OHIDL.

EORTC QLQ-OH-15 is a supplementary module of the EORTC QLQ-C30 for assessing OHRQoL in cancer patients, addressing aspects such as pain, sensitivity to food and drink, saliva, information received, and use of dentures [18, 19]. It was developed for the adult and older adult populations, and it has been validated for different populations and languages.

OHIP, GOHAI, and OHIDL are generic instruments for evaluating OHRQoL in patients with oral diseases [2, 17]. Applying OHIP may involve a greater respondent burden than GOHAI, so a shorter version of the instrument, such as OHIP-14 or OHIP-EDENT, is a possible option. However, shorter versions of OHIP place more weight on psychological or behavioural aspects, while GOHAI prioritises aspects related to functional limitations and pain [17]. Previous studies have compared the psychometric properties of GOHAI and OHIP-14 for the older adult population. It was found that both instruments are suitable for evaluating the impact of oral pathologies on OHRQoL; however, GOHAI is better than the short forms of OHIP at detecting problems in oral function [17, 20].

IPQ-RDE is a generic instrument for detecting single and multiple dental conditions in older adults [21]. It measures aspects different from those measured by EORTC QLQ-OH-15, OHIDL, GOHAI and OHIP, such as the chronology of the disease, control of the symptoms, treatment burden and prioritisation of the disease. IPQ-RDE is a promising instrument, and it is probable that when new evidence is available, with more studies and improvements in some of its attributes, this instrument will prove to be an excellent option for measuring OHRQoL in older adults.

The majority of the instruments for evaluating OHRQoL in older adults are not suitable for detecting changes in oral health, since Responsiveness was assessed for only five of the 14 instruments (EORTC QLQ-OH-15, GOHAI, OHIDL, OHIP and OIDP). OHIP showed the best performance for Responsiveness, followed by GOHAI and OHIDL, so these instruments can be recommended for longitudinal studies and clinical trials. Responsiveness is essential for ensuring that the changes reported are real and not the result of measurement error. OIDP also obtained a good score for Responsiveness; however, it had poor internal consistency and inadequate Reproducibility coefficients, which may affect the data when the instrument is used in longitudinal studies. OIDP is a generic, self-administered instrument that has been translated into five languages other than the original. It evaluates serious oral impacts on daily performance [22]. The evaluation of OIDP could be improved by developing strategies to make score interpretation easier, by describing the burden (respondent and administrative) and by increasing internal consistency and reproducibility.

A generic instrument can detect the impact of oral or orofacial diseases, allowing comparisons across diseases and conditions [17]. On the other hand, generic instruments may be less sensitive, specific or useful for evaluating a specific disease [17]. Previous studies have shown that the EMPRO score is higher for generic than for specific instruments [23], which is consistent with the findings of our study. Evaluation by experts showed that only two (EORTC QLQ-OH-15 and QoLIP-10) of the six specific instruments obtained a score higher than 50.0. The EORTC QLQ-OH-15 showed the highest overall score and good performance in most domains; however, generic instruments such as the GOHAI, OHIDL and OHIP showed better performance in domains such as reliability and validity.

Evaluation by the EMPRO tool is based on the quantity and quality of the evidence published for each instrument. The absence of information for some attributes in EMPRO evaluation penalises the scores since the missing information is given the lowest possible score [23]. One factor which could have affected the performance of these instruments is the fact that only one or two studies per instrument were evaluated, with poor or missing information for some attributes.

The overall score was not calculated for DSQ and OHAI, as information was missing for at least half of the attributes evaluated by EMPRO. In the case of DSQ, not only was there no information for many attributes, but those evaluated obtained very low scores. All aspects of this instrument need to be improved. OHAI obtained a good score for Conceptual and measurement model (score = 63.1) and ease of use (respondent burden: 83.3; administrative burden: 75.0); however, there were insufficient data for evaluation of Reliability, Validity, Interpretability and Responsiveness.
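The rule described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the overall score is the plain mean of the available metric attribute scores; the text only says that the global score summarises five metric attributes, so the aggregation, the attribute keys and the function names below are hypothetical and not taken from the EMPRO tool itself. The 50.0 threshold and the rule of withholding the overall score when at least half of the attributes lack information come from the text.

```python
from statistics import mean
from typing import Mapping, Optional

# Hypothetical keys for the five metric attributes summarised by the overall score.
METRIC_ATTRIBUTES = (
    "conceptual_and_measurement_model",
    "reliability",
    "validity",
    "responsiveness",
    "interpretability",
)

THRESHOLD = 50.0  # minimum score cited in the text for recommending an instrument


def overall_score(scores: Mapping[str, Optional[float]]) -> Optional[float]:
    """Mean of the available attribute scores (assumed aggregation).

    Returns None when at least half of the attributes have no score,
    mirroring the rule applied to DSQ and OHAI in the text.
    """
    available = [scores[a] for a in METRIC_ATTRIBUTES if scores.get(a) is not None]
    missing = len(METRIC_ATTRIBUTES) - len(available)
    if missing * 2 >= len(METRIC_ATTRIBUTES):
        return None
    return mean(available)


def is_recommended(score: Optional[float]) -> bool:
    """An instrument is recommended only if an overall score exists and exceeds 50.0."""
    return score is not None and score > THRESHOLD


# OHAI as described in the text: only the conceptual and measurement model
# attribute (63.1) could be scored, so no overall score is produced.
ohai = {"conceptual_and_measurement_model": 63.1}
print(overall_score(ohai), is_recommended(overall_score(ohai)))
```

In this sketch an instrument with a single scored attribute, such as OHAI, yields no overall score and is therefore not recommended, regardless of how high that single attribute is.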

Apart from EORTC QLQ-OH-15, IPQ-RDE, OHAI and OHQoL-UK-W, all the instruments were developed for self-administration. The mode of administration may influence the quality of the data and the way in which older adults answer the instrument. Self-administered instruments may require greater physical and cognitive capabilities in the respondents [24]. This reflects the need for the clinician/investigator to consider the patient’s condition before selecting the most appropriate instrument for evaluating OHRQoL in the older adult population.

Strengths and limitations

The main strength of this study is that we also included instruments that were not explicitly developed for older adults but are currently used by clinicians and researchers in this population. Excluding them would have introduced a selection bias, omitting valuable information on the validity, reliability and responsiveness of instruments currently in use in this population.

The use of EMPRO is another strength of our study since it is designed to evaluate the performance of an instrument based on what is reported by all the studies that assessed a specific health problem. EMPRO has been shown to have high internal consistency, inter-rater agreement, and positive associations consistent with a priori hypotheses between EMPRO attribute scores and bibliometric quality indicators. In addition, according to the FDA (US Food and Drug Administration) guideline for patient-reported outcome measures [25], it is essential that the reliability, validity, sensitivity to change and the choice of interpretation method of an instrument be evaluated before use in the measurement of treatment benefit or risk in medical product clinical trials; all these properties are assessed attributes in EMPRO.

Our study has several limitations. First, it is possible that we did not identify all the OHRQoL instruments for older adults. To minimise this risk, we used a sensitive search strategy complemented by a manual search of the references and of two online PRO databases, as well as a duplicated review process. In addition, our systematic review is limited by language restrictions. We attempted to include research in various languages, including English, Spanish, Portuguese, French, German, and Italian. However, it is possible that some studies in other languages were not captured by our inclusion criteria, introducing selection bias. Furthermore, the instrument development studies were included regardless of the age range of the participants in order to identify all the available information. Second, the cut-off point established as the threshold for considering EMPRO scores acceptable is questionable. This threshold was obtained with data from the first two EMPRO studies [12, 16]: the area under the receiver operating characteristic (ROC) curve evaluating the agreement between EMPRO attribute scores and the reviewers’ global recommendations was 0.87 (data not shown but available upon request). The threshold should therefore be used only as a guideline for identifying gaps in the instruments. Third, the EMPRO evaluations may be biased by the individual experience of the evaluators; however, the evaluations were carried out by researchers with experience in the evaluation of PROMs, and at least one of the two evaluators belonged to the team that manages the EMPRO tool, minimising this bias. Fourth, it is also important to bear in mind that the EMPRO criteria assess both the methodological quality of the studies and the results for the metric properties of the instrument, so there is a risk that studies with adequate methodologies but poor results may obtain EMPRO scores above 50.
However, to mitigate this potential risk, there are more EMPRO criteria focused on results than on methodological characteristics: 5 vs 2 in the conceptual and measurement model, 2–3 vs 1 for internal consistency, 2 vs 2 for reproducibility, 2–4 vs 2 for validity, 2 vs 1 for responsiveness, and 2 vs 1 for interpretability. Furthermore, in our EMPRO evaluation, all instruments with scores over 50 also had a good rating in the results criteria. Fifth, the EMPRO global score is a summary of the five metric attributes assessed, which facilitates synthesis; however, the scores for each of these five attributes should be considered separately, according to the purpose for which the instrument is applied. Sixth, because the EMPRO tool is based on the quantity and quality of the evidence published for each instrument, recently developed instruments, for which little evidence is available, may have been penalised. On the other hand, no overall score was calculated for instruments that did not present information for at least half of the attributes, in order not to penalise them too heavily for lack of information. Finally, we did not perform a meta-analysis, since EMPRO relies on a qualitative, consensus-based evaluation of each OHRQoL instrument by experts, who consider the variability of the data reported in the different studies when making a judgment, rather than just the average, as would be the case with meta-analysis. In addition, the variability between studies in the characteristics of the population and in the methods used to measure the different psychometric properties could generate significant heterogeneity, affecting the certainty of the estimate obtained with meta-analysis.
