A review of the psychometric properties and implications for the use of the fertility quality of life tool

One hundred thirty-two articles were initially retrieved from PubMed, 77 from CINAHL, and 45 from PsycInfo, for a total of 254 results. After the removal of 101 duplicates, 153 articles were available to screen. Following title and abstract screening, 26 articles were excluded, leaving 127 for full review. After the inclusion and exclusion criteria were applied, a further 74 articles were excluded: 65 did not report any psychometric properties of the FertiQoL questionnaire in their study sample, four were not empirical research studies (reviews and books), three were only published as abstracts, one used an ineligible patient population, and one did not use the FertiQoL to measure fertility quality of life. Fifty-three articles were ultimately included in the current review (see Fig. 1 for the PRISMA diagram).

Fig. 1 PRISMA diagram for fertility quality of life

The majority of the articles collected data using a paper version of the FertiQoL instrument (n = 29), followed by online collection (n = 10) or a combination of paper and online data collection methods (n = 6). Eight articles did not specify whether data collection was completed using the paper or online version. Thirty-three studies were conducted using a female sample, two were male-specific, 11 sampled female and male dyads, and seven sampled uncoupled males and females, with a mean participant age of 34.3 years across all studies. Twenty-one countries were represented in the study results, with 19 studies originating from East Asia, 18 from Europe, 11 from the Middle East, seven from North America, and one each from Australia and New Zealand. Additionally, six studies were multisite studies with participants from more than one country. See Fig. 2 for a map of countries represented.

Fig. 2 Global distribution of participants

Fertility quality of life tool development

The FertiQoL was published in 2011 as a 36-item self-report questionnaire designed to measure the impact of fertility problems on quality of life in both men and women experiencing infertility [15]. The development of the FertiQoL was a collaborative effort among the European Society of Human Reproduction and Embryology, the American Society for Reproductive Medicine, and Merck-Serono. It was led by 1) psychology professor and researcher, Jacky Boivin; 2) clinical health psychologist and assistant professor, Janet Takefman; and 3) clinical professor and psychologist, Andrea Braverman [18]. Two questions rate overall quality of life and physical health, 24 core questions assess the impact of infertility on quality of life, and an optional treatment-specific module contains 10 questions for participants pursuing infertility treatments [19]. While it is condition-specific (infertility), it is not specific to underlying causes of infertility, such as endometriosis or polycystic ovarian syndrome. It is acceptable for use in both men and women experiencing infertility, whether or not they are pursuing treatment. Except for the optional treatment section, the FertiQoL is a static measurement where everyone completes the same number of questions [19, 20].

While no theoretical framework was specified for the development of the FertiQoL, authors mirrored the development protocol of the World Health Organization Quality of Life (WHOQOL) measure that emphasizes quality of life as a multidimensional concept consisting of a person’s perception of their physical and psychological health, level of independence, social relationships, environment, and personal beliefs [15, 21]. The FertiQoL was designed using classical test theory in collaboration with international psychosocial experts in reproductive health and a steering committee [15]. After conducting a literature review to generate an initial pool of 302 items dispersed among 14 domains, the pool was then reduced to 116 items after eliminating redundant and irrelevant items. Seventeen focus groups in five countries were conducted with infertility patients, excluding an additional 14 items, for a total of 102 items. The feasibility and acceptability survey exposed any problematic questions, and the item pool was reduced to the final measurement structure: 24 core items, two overall health items, and ten optional treatment items [15]. Psychometric evaluations, exploratory factor analyses, and factor loadings of the items revealed mostly high reliability and sensitivity for both the subscales and the total scales [15].

Data collection and scoring

The FertiQoL self-report questionnaire is available in both paper and electronic formats. While free to administer, no alterations can be made to the questionnaire, and creators should be acknowledged in any publication [22]. Scores, sample size, means, and standard deviations should be sent to the FertiQoL authors for publication on their website [22]. The paper format is available in 48 languages, and the electronic is available in 11. The only instructions necessary for completing the survey are: 1) select the response that most reflects how you feel in your current thoughts and feelings, and 2) only complete the questions with an asterisk if you have a partner [19].

Overall, minimal training is required to administer and score the FertiQoL questionnaire. Scoring is automatic when the FertiQoL is completed online. Participants can provide a clinic name, identification number, and email address where they would like the results sent. When the paper format is administered, scores must instead be computed manually or using an Excel algorithm, with five core and two treatment questions requiring reverse scoring before the raw subscale and total scores are scaled. There are two Excel options for scoring the FertiQoL: 1) the researcher can manually enter scores for each participant into the corresponding question box and score it themselves, or 2) scores can be entered for each question, with the Excel algorithm providing the raw and scaled subscale and total scores for each person. Individuals collecting and processing survey data need a basic understanding of Excel and its functions, mainly the ability to enter scores into the corresponding cells. If participants complete the online version and provide an email address, the results obtained from the online FertiQoL will also be delivered to their email in Excel format. Participant data can then be combined into one Excel datasheet to view answers to individual questions, subscale scores, and total scores within one file.

There are certain risks to privacy that could be encountered when completing the FertiQoL online because individuals are required to provide initials, date of birth, country of residence, and gender, increasing the ability to identify a participant. Without specific protocols preventing the collection of internet protocol (IP) addresses with an electronic survey, individuals may be at an increased risk of privacy breach. However, survey results can be de-identified and protected once data collection is complete. Because of the risk of privacy breach, individuals should be made aware of the measures taken by researchers and clinicians to protect their identity and personal health information.

Scoring the complete FertiQoL, including the optional treatment module, produces six subscales and three total scores [23]. The subscales include four core scales with six questions each (Emotional, Mind/Body, Relational, and Social) and an optional treatment module with two subscales, Environment and Tolerability, with six and four questions, respectively. The four required scales comprise the Core FertiQoL score, while the two optional scales make up the Treatment score. These two scores combine to provide the total quality of life score.

Items are measured as continuous variables on a Likert scale that produces a value between zero and four. Likert scale options include very poor (0) to very good (4), always (0) to never (4), and an extreme amount (0) to not at all (4), with some items requiring reverse scoring [23]. Values are summed and scaled to provide subscale and total scores. Both total and subscale scores range from zero to 100, with higher scores indicating better quality of life. While scores are left to interpretation because of a lack of guidelines, the instrument creators provide access to a compilation of published means and standard deviations of total and subscales scores using the FertiQoL tool [24].
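The scoring steps described above (reverse scoring, summing, and rescaling to a range of zero to 100) can be illustrated with a minimal sketch. It assumes a simple linear rescaling in which the maximum possible sum maps to 100; the item labels and the choice of reversed item are hypothetical and do not reproduce the official FertiQoL item key or Excel algorithm.

```python
def reverse_score(value, max_value=4):
    """Reverse a 0-4 Likert response so that higher always means better."""
    return max_value - value


def scaled_score(responses, reverse_items=(), max_value=4):
    """Sum item responses and rescale the sum to the range 0-100.

    responses: dict mapping an item label to its raw response (0..max_value)
    reverse_items: labels of items that are reverse scored before summing
    Assumption: linear rescaling, so the maximum possible sum maps to 100.
    """
    if not responses:
        raise ValueError("at least one response is required")
    total = 0
    for label, value in responses.items():
        if not 0 <= value <= max_value:
            raise ValueError(f"response for {label!r} out of range: {value}")
        if label in reverse_items:
            value = reverse_score(value, max_value)
        total += value
    return total * 100 / (max_value * len(responses))


# Hypothetical six-item subscale; labels and the reversed item are made up.
example = {"a": 4, "b": 3, "c": 2, "d": 0, "e": 4, "f": 1}
score = scaled_score(example, reverse_items={"d"})
print(score)
```

In this hypothetical example the reverse-scored item contributes 4 rather than 0, so the raw sum is 18 out of a possible 24.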

Validity

Validity is the ability of an instrument to accurately measure the construct that it intends to measure [20]. The three main types of validity are content and face validity, criterion validity, and construct validity, each consisting of several aspects. Criterion validity refers to the degree to which scores on a focal measurement adequately reflect those of a gold standard [20, 25]. Since there is currently no gold standard measurement for infertility-specific quality of life, criterion validity has not been measured for the FertiQoL and will not be addressed in this review. Rather, this review will report on the content and face validity and construct validity of the FertiQoL.

Face and content validity

Face and content validity are subjective evaluations that ensure an instrument reflects the construct it intends to measure [20]. Providers and patients can assess face validity to ensure that an instrument appears to measure its intended construct. Face validity is often critical when developing disease-specific measurements, like the FertiQoL, because general measures may not seem relevant to participants, reducing the likelihood that a generalized tool will be completed fully and accurately [20, 25]. Content validity, by contrast, is usually assessed by field experts, like clinicians and researchers, who ensure the entire construct is being measured [20].

The development of the FertiQoL instrument included extensive integration of results from several focus groups and debriefings comprised of the FertiQoL steering committee and psychosocial reproductive health experts from 11 countries (psychologists, counselors, social workers, researchers, patient user groups, physicians, and nurses), alongside individuals with infertility, where questionnaire items were assessed and deemed both relevant and comprehensive [15, 17]. Vital feedback from the focus groups and debriefings improved face and content validity by correcting wording and eliminating redundant items. An acceptability and feasibility study was also conducted and included 525 participants in 10 countries, with results further supporting prior assertions of face validity and acceptability by individuals with infertility [15].

Construct validity

Convergent validity

Convergent validity is the degree to which scores on a measurement correlate with scores on a measure with which there is a hypothesized relationship [20, 25]. In the absence of a “gold standard” measurement for a construct such as fertility-specific quality of life, instruments assessing constructs with expected conceptual convergence, such as general quality of life, relational satisfaction, anxiety, and depression, may be used instead [20]. To assess convergent validity using a generic quality of life instrument, Heredia et al. [26] used Spearman’s rho (ρ) to measure correlations between the FertiQoL and the Short Form 36 (SF-36) questionnaire for general physical and mental health, whereas Hekmatzadeh et al. [27] used the shorter adaptation of the instrument, the 12-item Short Form Health Survey (SF-12), and Pearson’s r. Correlations were considered weak (\(<0.3\)), moderate (\(\ge 0.3\) and \(<0.7\)), or strong (\(\ge 0.7\)), and statistically significant at p < 0.05.
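As an illustration of the two coefficients and the strength bands named above, the following is a minimal sketch in plain Python. The function names are hypothetical, no significance test is computed, and applying the bands to the absolute value of the coefficient is an assumption made so that negative correlations can be labeled in the same way.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


def strength(coefficient):
    """Label a coefficient using the bands stated in the text.

    Assumption: the bands are applied to the absolute value.
    """
    size = abs(coefficient)
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"
```

For example, `strength(0.49)` returns "moderate", which matches how a coefficient of that size is described in this review.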

The Core and Total scores of the FertiQoL were moderately positively associated with social functioning and mental health subscales of the SF-36 [26]. Both instruments (SF-12 and SF-36) exhibited agreement with moderate positive correlations between the FertiQoL Emotional subscale and mental health, role limitations from emotional problems, and vitality. Additionally, the SF-36 indicated a moderate positive correlation with social functioning. However, the FertiQoL Social and Mind/Body subscales showed more correlational variability with the two adaptations of the Short Form Health Survey, with the SF-12 exhibiting stronger convergence with the FertiQoL Social subscale and between the Mind/Body subscale and physical problems than the SF-36. More specifically, results from the SF-12 indicated a moderate positive correlation between the Social subscale of the FertiQoL and the social functioning domain (r = 0.49, p < 0.001), while the SF-36 found no significant correlation with the social domain (ρ = 0.117), but rather, a moderate positive correlation between the FertiQoL Social subscale and the SF-36 general health domain (ρ = 0.360, p < 0.05). Additionally, there was a moderate positive correlation between the Mind/Body subscale and role limitations from physical problems (r = 0.47, p < 0.001) and physical functioning (r = 0.68, p < 0.001) with the SF-12, but no significant correlations were found with physical functioning (ρ = 0.080), physical role limitations (ρ = 0.127), or bodily pain (ρ = 0.256) on the SF-36. However, results did suggest moderate correlations between the Mind/Body subscale and social functioning (ρ = 0.497), mental health (ρ = 0.524), vitality (ρ = 0.417), and emotional role (ρ = 0.417) on the SF-36. Although the two studies vary in correlational significance on certain subscales, overall results provide evidence of adequate convergent validity between measurements of general quality of life and the disease specific FertiQoL.

Since depression and anxiety are two well-known consequences of infertility, the Hospital Anxiety and Depression Scale (HADS; [28]) is often used to confirm convergent validity using correlation coefficients [29]. It has been used in multiple populations, including Iranian [27, 30], Turkish [31, 32], and Dutch women with infertility [33]. As hypothesized, significant negative correlations were found between the core total and subscales of the FertiQoL and the HADS-Anxiety (HADS-A) and HADS-Depression (HADS-D) scales, with fertility quality of life increasing as depression and anxiety decrease. Weak to moderate associations have been found between the Relational subscale and the HADS-A (r = -0.2 to -0.49) and HADS-D (r = -0.32 to -0.50). Similar results have been found between the Relational subscale and multiple measurements of relationship quality. In a validation study, Donarelli et al. [34] found weak to moderate positive correlations between the FertiQoL Relational subscale and marital satisfaction (ρ = 0.31 to 0.36) and dyadic adjustment (ρ = 0.28 to 0.31), while moderate negative associations were found with sexual stress (ρ = -0.48) and marital commitment (ρ = -0.30 to -0.37). All other core subscales had moderate correlations with anxiety and depression. Moderate correlations exist between the core total and HADS-A (r = -0.56 to -0.64) and HADS-D (r = -0.51 to -0.67). Moderate correlations were reported for the Mind/Body subscale with the HADS-A (r = -0.48 to -0.65) and HADS-D (r = -0.38 to -0.66), the Social subscale with the HADS-A (r = -0.44 to -0.55) and HADS-D (r = -0.46 to -0.56), and the Emotional subscale with the HADS-A (r = -0.50 to -0.62) and HADS-D (r = -0.49 to -0.54). See Table 1 for a summary of correlation coefficients from the studies reporting on the HADS and FertiQoL convergent validity.

Table 1 Pearson's correlations between FertiQoL and HADS

Structural validity

Structural validity is a measurement of how well an instrument captures the hypothesized dimensionality of a complex construct using multiple subscales [20]. Structural validity is most commonly assessed using confirmatory factor analyses (CFA) or exploratory factor analyses (EFA). During the development of the FertiQoL, authors used EFA to explore subscale structure and corroborate the conceptual model [15, 20]. Aside from Hekmatzadeh et al. [27], subsequent studies used CFA to assess structural validity [20]. Donarelli et al. [34] reported a CFA using chi-square, comparative fit (CFI), goodness of fit (GFI), and root mean square error of approximation (RMSEA) indices for the FertiQoL with a good fit for the four-factor model and Relational subscale in 589 infertile Italian men and women. Maroufizadeh et al. [30] also used CFA, reporting chi-square, CFI, RMSEA, and standardized root mean square residual indices to determine goodness of fit of the Persian FertiQoL using a sample of 155 infertile Iranian women. Both studies confirmed goodness of fit with acceptable factor loadings on all items except for one question asking whether infertility had strengthened partner commitment [30, 34]. Alternatively, Hekmatzadeh et al. [27] confirmed the six underlying factors present in the complete Iranian version of the FertiQoL tool (Emotional, Mind/Body, Relational, Social, Environmental, and Tolerability) with a sample of 300 women with infertility in Iran. Results from the EFA with principal component factor analysis indicated all factor loadings were greater than 0.30 and all original questions remained. The FertiQoL has demonstrated structural validity, with studies confirming that the subscales adequately reflect the hypothesized underlying factors.

Reliability

Reliability refers to a measurement's ability to provide consistent and stable scores that are free from error or variation after repeated measurements, under different circumstances, by different persons, or using different measurement versions [20]. Efforts to determine the reliability of the FertiQoL are mostly limited to assessments of internal consistency because of the potential for low temporal stability of psychological states [20]. The cycle of hope and despair experienced with each menstrual or treatment cycle failure makes test–retest reliability problematic [8, 20, 35]. However, while a previous review found no evidence supporting the stability of the FertiQoL over time [17], a recent study by Chan et al. [36] investigated decisional conflict, regret, anxiety, depression, and fertility quality of life in 151 women in Hong Kong notified of an unsuccessful IVF cycle (T0). Participants completed the questionnaire again during their consultation 2–3 weeks later (T1) and finally, three months later (T2). Descriptive statistics suggested relative stability over time, with Core scores of 63.99 (T0), 64.67 (T1), and 63.96 (T2), Treatment scores of 62.03 (T0), 61.70 (T1), and 60.80 (T2), and overall FertiQoL scores of 63.34 (T0), 63.77 (T1), and 62.91 (T2). While the FertiQoL shows potentially adequate test–retest reliability, additional studies are needed to support the currently limited findings.

Internal consistency

Internal consistency, a measurement of reliability related to the homogeneity of items on a scale or subscale [20], has been extensively documented in multiple studies and compiled by the original authors on the Fertility Quality of Life website [24], as well as by Koert et al. [37] in a recent systematic review that summarizes the updated psychometric properties of the FertiQoL. Internal consistency has been reported using Cronbach’s alpha coefficients in all studies using FertiQoL. Internal consistency was tested during the generation of the FertiQoL [15, 33] and subsequently in multiple countries to determine the reliability of different translations and use of the measure with individuals of multiple ethnicities and cultures. Internal consistencies were available for populations with infertility in the U.S., Canada, China, Denmark, Italy, Germany, Hong Kong, Hungary, Iran, Japan, Jordan, Korea, Netherlands, Poland, Portugal, Switzerland, Taiwan, and Turkey. See Tables 2 and 3 for updated internal consistencies with a description of the population sample and country of origin.

Table 2 Internal consistency reported by FertiQoL studies with subscales

Table 3 Reported internal consistency subscale ranges and overall totals

Previous studies indicated that the FertiQoL is generally reliable in diverse populations of men and women with infertility. Internal consistency alpha coefficients range from 0.43 to 0.92 for the four subscales included in the Core (Emotional, Mind/Body, Social, and Relational) and from 0.78 to 0.92 for the Core total (combined core subscales). While only some studies reported internal consistency for the optional Treatment module, those indicated moderate reliability, with coefficients ranging from 0.67 to 0.84 for the Environment subscale, 0.64 to 0.79 for the Tolerability subscale, and 0.69 to 0.91 for the overall Treatment total. The internal consistency for the complete FertiQoL total ranges from 0.78 to 0.94. While no specific rules exist defining satisfactory internal consistency, many agree that an alpha greater than 0.70 to 0.75 is acceptable [20, 80].
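For readers less familiar with the statistic, the following is a minimal sketch of how Cronbach's alpha is computed from item-level responses, using the standard formula \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)\), where \(k\) is the number of items, \(s_i^2\) is the variance of item \(i\), and \(s_T^2\) is the variance of the summed score. The function name and the data are hypothetical and are not taken from any study in this review.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items table.

    item_scores: list of respondents, each a list of k item responses.
    Uses sample variances for both the items and the summed score.
    """
    if len(item_scores) < 2:
        raise ValueError("at least two respondents are required")
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("every respondent needs the same k >= 2 items")
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("summed scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from three people on a two-item scale.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
print(round(alpha, 2))
```

Values produced this way would then be compared against the commonly cited threshold of roughly 0.70 mentioned above.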

Overall, the four subscales that make up the core score of the FertiQoL showed moderate to high reliability. The Emotional (Cronbach’s \(\alpha\)=0.71–0.90) and Mind/Body (Cronbach’s \(\alpha\)=0.78–0.89) subscales showed high reliability, with all alpha coefficients greater than 0.70. Aside from one study reporting low reliability (\(\alpha\)=0.43) in women with infertility from the U.S. and Canada trying to conceive between 12 and 48 months without medical intervention [40], the Social subscale (Cronbach’s \(\alpha\)=0.61–0.84; 4/19 studies \(\alpha\) <0.70) showed moderate reliability. Additionally, the Relational subscale has shown slightly lower reliability in several studies, with alphas ranging from 0.60 to 0.80 (9/19 studies \(\alpha\)=0.60–0.68). Furthermore, two studies reported lower reliability of the Relational subscale in men. Donarelli et al. [34] described lower reliability of the Relational subscale in Italian men (0.61 vs. women: 0.68), and Sexty et al. [46] corroborated these results with lower reliability in German men (0.65 vs. women: 0.70), suggesting the need for caution when interpreting FertiQoL results for this subscale, particularly with men. Despite the slightly lower reliability of the Relational subscale, the internal consistencies reported indicate that the majority of the FertiQoL has demonstrated acceptable reliability, suggesting that the subscale items reliably measure the same underlying latent trait.

Implications for practice

Currently, the FertiQoL scores are open to interpretation by the individual administering the instrument or those taking the assessment online. Although a previous review found no evidence of test–retest reliability and a lack of clinically important cutoff scores [17], recent studies have suggested that core FertiQoL scores may correspond to clinically significant thresholds, including anxiety (< 55 to 59) and depression (< 51 to 52) in Dutch and Turkish individuals [31, 32], and marital dysfunction (< 74) in Italian men and women with infertility [34]. Healthcare providers, including physicians, physician assistants, nurses, nurse practitioners, and medical trainees (medical students, undergraduate and graduate nursing students), should be educated on the potential impact that infertility can have on an individual’s quality of life. While it was not specifically designed to detect pathological st
