Patient-reported outcome measures for primary hyperparathyroidism: a systematic review of measurement properties

Protocol and registration

This systematic review was conducted according to the COSMIN Methodology for Systematic Reviews of PROMs [20,21,22] and reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist. The protocol was registered on PROSPERO (CRD42023438287) and did not require ethics approval.

Search strategy and eligibility criteria

The databases Medline, EMBASE, CINAHL Complete, Web of Science, PsycINFO, and Cochrane Trials were systematically searched on 2 July 2023, and the search was updated on 8 December 2023. The search strategy was developed in consultation with a clinical librarian (CW) to identify all primary research articles using any PROM in adult patients aged 18 years or older with PHPT (Additional File 1) [20]. Following the COSMIN search recommendations excluded articles that were known a priori to be eligible for this review. Therefore, a different search strategy was developed in consultation with our clinical librarian and clinical experts. To ensure high search sensitivity and that no relevant studies were missed, it included treatment strategies for primary, secondary, and tertiary hyperparathyroidism. No date restrictions were applied. We included any full-text articles published in English investigating PROM development, pilot studies, or evaluation of at least one PROM measurement property. At least one of the aims of the article had to be the development of a PROM or the evaluation of one or more measurement properties of a PROM for use in adults with PHPT. In articles including other conditions, patients with PHPT had to comprise 50% or more of the sample, or subgroup analyses on PHPT-specific data had to be available. All forms of PHPT (i.e., classic, normocalcemic, normohormonal, hereditary, etc.) were included.

Studies that only used the PROM as an outcome measure or studies in which the PROM was used in a validation study of another instrument were excluded [20]. Articles that used PROMs without the intention of studying PHPT itself were also excluded; examples of such studies include quality improvement studies (e.g., enhanced recovery after surgery, opioid-minimizing perioperative pathways) and studies of surgical or anesthetic techniques. Case reports, conference abstracts, editorials, trial protocols, and theses were excluded. Review articles, consensus statements, and practice guidelines were also excluded, but their bibliographies were searched to identify additional potentially eligible studies that were not identified through the database search.

Study selection and data collection

We used Covidence (Melbourne, Victoria, Australia) to screen articles for inclusion. Two independent reviewers (GL, JBL) screened all titles and abstracts for potential full-text review. Disagreements were resolved through discussion. If a consensus could not be reached, the full-text article was retrieved. Two independent reviewers (GL, JBL) then screened full-text articles for inclusion. Disagreements at this stage were resolved by a third reviewer (MK) or by discussion among the reviewers (MK, GL, JBL).

Extracted information for each article included study characteristics (author, year, country of origin, language, patient characteristics, disease characteristics, setting, response rates), PROM characteristics (construct[s] measured, target population, mode of administration, recall period, subscales, number of items, response options, scoring), and the measurement properties of the PROMs. Following the COSMIN methodology and definitions [20, 21], articles were searched for studies on (1) PROM development, (2) content validity, (3) structural validity, (4) internal consistency, (5) cross-cultural validity/measurement invariance, (6) reliability, (7) measurement error, (8) construct validity, and (9) responsiveness. Criterion validity was not considered as there is no known “gold standard” available for measuring the construct(s) of interest in the PHPT population.

Methodological quality and risk of bias

The methodological quality of each single study on a measurement property was extracted sequentially and assessed using the COSMIN Risk of Bias checklist by two independent reviewers (MK, JBL) [22, 23]. Each study was rated as very good, adequate, doubtful, or inadequate following the worst score counts principle. Disagreements were resolved through discussion.
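The worst score counts principle described above can be illustrated with a minimal sketch. The function name `worst_score_counts` and the example item ratings are hypothetical and are not taken from the review; handling of checklist items rated as not applicable is omitted for brevity.

```python
# Ratings ordered from best to worst, matching the four levels named in the text.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(item_ratings):
    """Return the overall study rating: the lowest rating among its checklist items."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # The rating with the highest index in RATING_ORDER is the worst one.
    return max(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one study on one measurement property.
overall = worst_score_counts(["very good", "adequate", "doubtful", "very good"])
print(overall)  # doubtful
```

Under this principle, a single low-rated item determines the overall rating of the study, regardless of how the remaining items were rated.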

The COSMIN Methodology for Assessing the Content Validity of PROMs was followed to evaluate PROM development and content validity for each PROM [21]. Existing ratings of the quality of PROM development were used when available [24,25,26]. Reviewer ratings were considered additional to the available evidence from the literature and were weighted less than the evidence from available development and content validity studies [21]. If there were no content validity studies, or only content validity studies of inadequate quality, and the PROM development was of inadequate quality, the reviewers' ratings determined the overall ratings. Indirect evidence, when available, was considered for content validity only and not for other measurement properties.

Prior to evaluating structural validity, internal consistency, and cross-cultural validity/measurement invariance, each PROM’s measurement model was determined to be reflective or formative to ensure appropriate interpretations [20, 27, 28]. A “thought test” was performed to determine which model was used if one was not reported. If the PROM contained a mix of reflective and formative items, the PROM was assumed to be based on a reflective model and related measurement properties were evaluated.

In this review, a construct approach was taken to evaluate hypothesis testing for construct validity and responsiveness. Any construct known to be clinically relevant to PHPT was considered, such as fatigue, sleep disturbance, depression, and anxiety [7, 10, 12,13,14,15,16,17, 29]. Hypothesis testing criteria were adapted from the COSMIN manual [20]. For construct validity, these included: (1) correlation coefficients between the investigated PROM and a comparator instrument measuring the same or similar construct(s) are 0.50 or more, (2) correlation coefficients between the investigated PROM and a comparator instrument measuring different construct(s) are 0.30 or less, and (3) effect sizes (e.g., standardized mean differences) between the scores of the investigated PROM in patients with PHPT and in patients with a different, unrelated condition are 0.8 or more. In consultation with clinical experts, patients were expected to improve at least moderately three to four weeks after definitive surgical treatment (i.e., resection of the abnormal gland(s)). Therefore, for responsiveness, hypotheses included: (1) effect sizes of the investigated PROM are 0.30 or more, and (2) effect sizes of the investigated PROM and the comparator instrument both measuring the same or similar construct(s) are 0.30 or more.
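The thresholds listed above can be expressed as simple checks. The following is a minimal sketch: the function names and the labels "same_construct", "different_construct", and "unrelated_condition" are hypothetical, and comparing absolute values (so that the direction of a correlation or effect is ignored) is an assumption not stated in the text.

```python
def construct_validity_hypothesis_met(kind, value):
    """Check one construct validity result against the thresholds in the text.

    kind: "same_construct", "different_construct", or "unrelated_condition"
          (hypothetical labels for hypotheses 1 to 3).
    value: a correlation coefficient (hypotheses 1 and 2) or an effect size (3).
    """
    magnitude = abs(value)  # assumption: only the magnitude matters
    if kind == "same_construct":
        return magnitude >= 0.50
    if kind == "different_construct":
        return magnitude <= 0.30
    if kind == "unrelated_condition":
        return magnitude >= 0.80
    raise ValueError(f"unknown hypothesis kind: {kind}")


def responsiveness_hypothesis_met(prom_effect_size, comparator_effect_size=None):
    """Check responsiveness: effect sizes of 0.30 or more.

    With a comparator measuring the same or similar construct, both effect
    sizes must reach the threshold (hypothesis 2); otherwise only the
    investigated PROM is checked (hypothesis 1).
    """
    threshold = 0.30
    if abs(prom_effect_size) < threshold:
        return False
    if comparator_effect_size is None:
        return True
    return abs(comparator_effect_size) >= threshold


# Hypothetical values, for illustration only.
print(construct_validity_hypothesis_met("same_construct", 0.62))       # True
print(construct_validity_hypothesis_met("different_construct", 0.45))  # False
print(responsiveness_hypothesis_met(0.41, 0.12))                       # False
```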

Evaluation of measurement properties

The results of each study on a measurement property were evaluated against the Updated Criteria for Good Measurement Properties and rated as either sufficient, insufficient, or indeterminate [20, 23]. Results from individual studies were then qualitatively summarized per measurement property per PROM. The overall result was then rated against the Updated Criteria for Good Measurement Properties to derive an overall rating of sufficient, insufficient, indeterminate, or inconsistent for the measurement property per PROM. Inconsistent results were summarized and presented separately when explanations were available. Otherwise, the conclusion was based on the majority of consistent results.

Certainty of evidence

COSMIN’s modified Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach was used to grade the certainty of evidence considering the methodological quality of studies, total sample size, and consistency of results [20]. Specifically, the certainty of evidence was downgraded based on the risk of bias, imprecision, inconsistency, and/or indirectness, where applicable. For content validity, imprecision was not taken into account. The certainty of evidence was rated as high, moderate, low, or very low. For example, if no content validity studies were available for a PROM and PROM development was inadequate, the certainty of evidence was rated as very low. If only one study of inadequate methodological quality based on the COSMIN Risk of Bias Checklist was available, the certainty of evidence was downgraded from high to very low [20, 22]. For internal consistency, the certainty of evidence started at the level of structural validity. Following others, the certainty of the evidence was not graded for studies when the overall rating was indeterminate [23].

Recommendations for use

Each PROM was categorized following the COSMIN methodology as: category A, recommended for use; category B, potential to be recommended for use but requires further validation; or category C, should not be recommended for use [20]. PROMs categorized as A have evidence for sufficient content validity (any level) and at least low certainty evidence for sufficient internal consistency; results obtained from these measures are considered trustworthy. PROMs based on a formative model were categorized as A if they had evidence for sufficient content validity (any level) and at least low certainty evidence for sufficient reliability. PROMs categorized as C have high certainty evidence for an insufficient measurement property. PROMs categorized as B are those not in A or C.
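The categorization rules above can be sketched as a small decision function. The function and parameter names are hypothetical. The text does not state which category takes precedence if a PROM meets the criteria for both A and C, so checking category C first is an assumption made here for illustration.

```python
# Certainty levels ordered from lowest to highest, as named in the text.
CERTAINTY_LEVELS = ["very low", "low", "moderate", "high"]


def categorize_prom(content_validity_sufficient,
                    key_property_sufficient,
                    key_property_certainty,
                    has_high_certainty_insufficient_property):
    """Return "A", "B", or "C" for one PROM.

    key_property_* refers to internal consistency for a reflective model
    and to reliability for a formative model.
    """
    # Assumption: category C is checked first (precedence is not stated in the text).
    if has_high_certainty_insufficient_property:
        return "C"
    at_least_low = (CERTAINTY_LEVELS.index(key_property_certainty)
                    >= CERTAINTY_LEVELS.index("low"))
    if content_validity_sufficient and key_property_sufficient and at_least_low:
        return "A"
    # Everything that is neither A nor C falls into category B.
    return "B"


# Hypothetical inputs, for illustration only.
print(categorize_prom(True, True, "moderate", False))   # A
print(categorize_prom(True, True, "very low", False))   # B
print(categorize_prom(False, False, "low", True))       # C
```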
