Multimodal cognitive and behavioral interventions for patients with MCI: a systematic review and meta-analysis on cognition and mood

1 Introduction

1.1 Behavioral interventions for mild cognitive impairment

Mild cognitive impairment (MCI) is a prodromal stage of Alzheimer’s disease (AD) and other types of dementia. In patients with MCI (pwMCI), circumscribed cognitive abilities are commonly below age expectation despite generally intact daily functioning (Petersen, 2004; Smith and Bondi, 2013). However, while pwMCI remain independent in primary daily activities, they may encounter difficulties performing complex functional activities (e.g., managing finances, medications, or shopping) and request increased caregiver attention (Albert et al., 2011). MCI is associated with an approximate 12% annual conversion rate to dementia while the comparable normal control group rate is only 1–2% (Petersen et al., 1997, 2001; Shah et al., 2000). In longer-term follow-up studies approximately 80% of pwMCI converted to dementia within six years (Petersen et al., 1999).

While neurodegenerative forms of dementia are irreversible, non-pharmacological interventions (i.e., behavioral interventions such as physical exercise, note taking, social engagement, and computerized cognitive training) administered at an early stage (e.g., MCI) can preserve functional independence, slow cognitive decline, and thereby delay the onset of dementia (Gauthier, 2005; Levy et al., 2022). A review by Chandler et al. (2016) revealed the benefits of behavioral interventions in improving mood (k = 26, Cohen’s d = 0.16, 95% CI = [0.03–0.28]), functional ability (k = 31, d = 0.23, 95% CI = [0.16–0.47]), and metacognition (k = 26, d = 0.30, 95% CI = [0.15–0.58]) in pwMCI. Since that review, numerous additional multicomponent interventions have been reported in pwMCI or other at-risk groups. Large multicomponent behavioral interventions such as Vivifrail, which consisted of physical resistance, balance, flexibility, and gait-retraining exercises for three months, have shown significant improvements in functional capacity, cognitive function, and depression (Casas-Herrero et al., 2022). Alternative interventions including lifestyle training might also play an essential role in mood and functional improvement (Gale et al., 2019; Yu et al., 2019).

These observed benefits have led to the hypothesis that repeated cross-domain training might stimulate “compensatory scaffolding” and neuroplastic reorganization (Sherman et al., 2017). In other words, combining several approaches in a multicomponent treatment program that targets multiple domains may yield additive efficacy. In one systematic review, only multicomponent (k = 16, Hedges’ g = 0.40, 95% CI = [0.16, 0.63]) and multidomain-focused cognitive training (k = 13, g = 0.23, 95% CI = [0.108, 0.352]) yielded statistically significant improvement in cognitive outcomes post-intervention in pwMCI when compared to MCI controls (Sherman et al., 2017). Thus, combining multiple interventions has been increasingly emphasized as a tool to facilitate functional retention. Previous systematic reviews and meta-analyses have reported benefits in combining physical exercises with cognitively challenging activities in both clinical and non-clinical older adults (Zhu et al., 2016; Gheysen et al., 2018; Gavelin et al., 2021). In one meta-analysis, combined cognitive-physical interventions showed small-to-medium positive effects (k = 10, standardized mean difference (SMD) = 0.32, 95% CI = [0.17–0.47]) on global cognitive function and moderate-to-large effects (k = 4, SMD = 0.65, 95% CI = [0.09–1.21]) on activities of daily living (ADL) in MCI or dementia patients (Karssemeijer et al., 2017). In contrast, despite the significant benefits evidenced in most studies, a recent systematic review found no difference between combined cognitive-physical training and interventions with isolated elements in executive function, processing speed, attention, mood, and cardiorespiratory fitness (Yang et al., 2020). However, the review focused primarily on cognitive outcomes, which might not reflect the overarching efficacy of multimodal interventions across domains (e.g., quality of life and independent daily functioning).

1.2 Gaps in current systematic literature review and meta-analysis

A few limitations were identified in existing systematic literature reviews and meta-analyses. First of all, while the effects of combined interventions have been extensively studied in the past decade (see Supplementary material A), research has focused predominantly on comparative effectiveness analysis (Amofa et al., 2021; Levy et al., 2022), a tool commonly used to explore the additive effect of a specific arm instead of the changes an overall program has exerted. For example, Imaoka et al. (2019) used comparative effectiveness analysis to investigate the additive effect of soy peptide as a supplement to memory exercise in pwMCI but did not study the overall efficacy of both when compared to an untreated control group. Secondly, some studies and reviews have mixed samples of pwMCI with healthy older adults or early dementia patients (Li et al., 2011; Straubmeier et al., 2017; Bruderer-Hofstetter et al., 2018; Stephen et al., 2019; Santos Lopes da Silva et al., 2023) due to the small amount of available literature (Gheysen et al., 2018, k = 9; Han et al., 2022, k = 3; Karssemeijer et al., 2017, k = 5). Nevertheless, primary preventions in cognitively healthy older adults can serve distinctive roles from interventions for those with known risk of decline (i.e., secondary preventions). Secondary preventions usually incorporate compensation training and adjustment-related treatments to slow or prevent further decline (Smith, 2016). On the other hand, tertiary preventions for those with dementia diagnoses rely heavily on participants’ capacity to grasp the ideas, which might include differential strategies and evaluation systems from interventions designed for pwMCI. Thus, an essential question regarding the effectiveness of multimodal intervention as a secondary prevention in pwMCI remains unanswered. Thirdly, there is a lack of consensus on targeted outcomes. Some studies focused primarily on mobility (Kiper et al., 2022; Mai Ba and Kim, 2022) while others focused on cognition (Yan et al., 2022). Lastly, while one meta-analysis (Meng et al., 2022) has synthesized clinical trials combining cognitive intervention and physical exercise on multiple cognitive domains in pwMCI, this meta-analysis excluded behavioral interventions other than physical exercise and included single intervention comparisons to study the additive effects instead of the overall impact of multimodal interventions. Furthermore, this study also suffered from a limited number of reports (k = 8) of randomized controlled trials (RCTs).

In addition, the definition of “multimodal” varied across studies and was often mixed with terms including “multicomponent” or “multifaceted.” For example, a combination of different physical exercises (Lau et al., 2015; Trautwein et al., 2020; Barisch-Fritz et al., 2022) or cognitive training targeting multiple domains (Tsolaki et al., 2011; Olchik et al., 2013) were treated as multimodal in several studies. While these interventions have included multiple strategies, the target was often limited to one area of concern instead of a comprehensive approach that can target multiple interrelated areas of concern simultaneously. Studies have also used the term “multimodal” to describe treatments conducted in different settings (e.g., home vs. clinic) or through different delivery methods (e.g., computer vs. paper). To establish an operational definition and delineate the targeted treatment types for this review, multimodal interventions generally refer to combining several training approaches that target different outcome domains in a treatment program (Giusti et al., 2017).

In summary, we believe that examining truly multimodal interventions that focus on or at least partition pwMCI for separate analysis might assist future explorations of comprehensive and efficient intervention programs for persons at the highest risk for dementia. Therefore, the aims of the current systematic review and meta-analysis are (1) to perform a synthesis of existing research of multimodal interventions on cognition and mood for individuals who meet the criteria of MCI and (2) to investigate the clinical implications and limitations of these results for future treatment planning.

2 Methods

2.1 Eligibility criteria

The eligibility criteria are consistent with the PICO criteria and the PRISMA 2020 reporting guidelines (Page et al., 2021a), and incorporate participants, interventions, comparators, and outcomes. Only RCTs were included in the review with no restrictions on cohort studies, longitudinal studies, and crossover designs.

2.1.1 Participants

Participants included patients with a clinical diagnosis of MCI due to any underlying etiology (e.g., MCI due to AD or Parkinson’s disease), regardless of age, gender, or cultural background. Samples of mixed MCI and healthy or demented older adults were excluded unless an independent analysis was undertaken to evaluate the effect on pwMCI. Because cognitive impairment with no dementia (CIND) was commonly used interchangeably with MCI, participants with CIND were also included. In addition, the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) (American Psychiatric Association, 2013) introduced the term mild neurocognitive disorder (mNCD) to describe acquired cognitive impairments of all causes at all ages before proceeding to identify the etiology. In mNCD, individuals can report slight difficulty performing everyday activities while remaining functionally independent and demonstrate deficits in one or more cognitive domains, which corresponds to MCI symptoms. Therefore, patients with mNCD were also included in the review. However, prodromal AD or other cognitive states (e.g., a score below certain AD risk scales) were excluded due to the potential inconsistency when compared to pwMCI.

2.1.2 Intervention

Intervention eligibility criteria included multimodal behavioral or cognitive interventions to delay or prevent dementia in pwMCI. Any combination of behavioral or cognitive intervention with a pharmacological treatment was excluded unless it was used to compare with a nonpharmacological intervention program. Elective surgical procedures, such as deep brain stimulation, were also excluded. In addition, interventions with variations of the same treatment type (e.g., different physical exercises) were not considered multimodal and excluded. While studies with no cognitive or behavioral interventions or treatment were excluded, a combination of both cognitive and behavioral interventions was not required for inclusion. For example, cognitive training and cognitive rehabilitation were defined as two independent training methods that serve distinctive purposes in patients with dementia (Clare et al., 2003). Specifically, cognitive training consists of guided practice on tasks targeting particular cognitive functions while cognitive rehabilitation focuses on strategies compensating for functional difficulties in daily life. Therefore, interventions with cognitive training and compensatory rehabilitation were included. In a previous systematic review, Gavelin et al. (2021) introduced the concept of exergaming, which referred to video games that provided simultaneous training of different modalities (e.g., cybercycling, a videogame that requires both cycling and navigation strategies). Studies with exergaming were included if multiple modalities were identified.

2.1.3 Comparator

Eligible comparators included nontreatment control groups and alternative multimodal or single-modality treatments. However, comparative effectiveness analyses that aim to investigate the effect of a single intervention arm by adding or withdrawing one of the arms from a multimodal program were excluded due to the lack of an appropriate comparison to demonstrate the effect of the overall intervention program. In addition, a direct comparison between targeted multimodal intervention programs and a control group or a group with completely different treatments was required for data extraction.

2.1.4 Outcome measures

To synthesize outcome domains, we referenced two patient-related latent factors derived from our multimodal intervention trial (Smith et al., 2017). Using exploratory factor analysis, Defeis et al. (2021) suggested that common outcome measures in behavioral and cognitive intervention programs for pwMCI could be synthesized into a three-factor model that consisted of patient impairment, patient adjustment, and partner adjustment. This model has been examined and confirmed in a separate MCI intervention sample with high factor loadings and an almost identical structure (Defeis et al., 2021). Therefore, to evaluate the effects of multimodal interventions on patients, the primary outcomes of the current study were organized into patient impairment and patient adjustment categories with their highest loading and most assessed items—cognition and mood. While the quality of life and independent daily functioning outcomes were initially assessed, these outcomes were dropped due to the insufficient number of reports (k < 6) and low statistical power.

2.2 Information sources

This review only included published studies and abstracts written in or translated into English. PubMed, Embase, and Cochrane Library were searched for articles published before January 1st, 2024. In addition, references from relevant publications and symposiums were examined and manually searched as an additional source of literature. Please see Supplementary material B for the search terms.

2.2.1 Data management

Search results were imported into Mendeley Reference Manager (Mendeley Support Team, 2011), a software that allows the references to be saved in separate collections and compared for duplicates. The results were then imported to Covidence (Veritas Health Innovation, 2017), an online software with live updates of the collaborative progress and discrepancy for screening and data extraction. Two authors (GY and APL) independently reviewed and evaluated all the records and data in the software.

2.3 Data collection process

Targeted variables and measures were identified and extracted by GY and APL independently to an Excel spreadsheet and compared to ensure no errors. Outcomes included changes in cognition and mood. Outcomes were identified by searching the specific terms in the report regardless of measuring tools. Authors were not contacted when information regarding the primary outcome was not available in the text.

2.4 Data items

Participant age, study attrition rate, diagnostic criteria, specific multimodal intervention strategies and characteristics (duration, frequency, and follow-up duration), comparator characteristics (no treatment vs. alternative treatment), outcome measures, effect sizes for each outcome, and results reported by the authors were extracted and documented for all eligible publications.

2.5 Risk of bias in individual studies

The revised Cochrane risk-of-bias tool for randomized trials (RoB 2) (Sterne et al., 2019) was employed to assess the risk of bias in RCTs in the current review. Detailed criteria of focus in each domain can be found in the Cochrane Handbook Chapter 8.2 (Higgins et al., 2019). Risk-of-bias judgments were obtained for individual domains and overall by both GY and APL. A consensus meeting was arranged to resolve any discrepancies during the process. Results of the risk-of-bias assessment were then visualized through the web-based R package robvis (McGuinness and Higgins, 2021). Because several studies included both targeted outcomes, each outcome was assessed separately and weighted equally in the evaluation. Figure 1 depicts the results of 26 parallel design evaluations conducted for 15 clinical trials.

Figure 1. Risk of bias traffic plot.

Comparisons of the baseline characteristics were employed to evaluate any effects raised by the randomization process. Studies that failed to report any differences between the intervention and control groups regarding demographic variables (e.g., age, gender, etc.) or targeted outcomes (e.g., cognition) raised concern about whether an appropriate analysis was used to estimate the effect of assignment (Domain 2) and whether baseline differences suggested a problem with randomization (Domain 1). “No information” on the randomization process (Domain 1) was given to a few studies, which led to a rating of “some concerns,” due to a failure to clarify the sequence allocation method. In addition, “probably no” was given to one study using consecutive recruitment with no information on the randomization strategy (Kurz et al., 2009). Studies with a larger than 5% dropout rate, according to the guidelines, were rated as “probably not” for whether the outcomes were provided for almost all the participants (Domain 3). If the reasons for attrition were provided and were irrelevant to participants’ cognitive functioning, the overall rating for the domain remained “low risk.”

2.6 Meta-analysis

The goal of a meta-analysis is to estimate the overall effect of treatments across studies. However, because studies vary in the quantity and quality of information, a different weight is assigned to each study (e.g., higher weight assigned to larger studies) to calculate a combined effect. Due to the variability of sample sizes and characteristics among the reports included in the current study, we used the random-effects model of meta-analysis, which assumes that each study is estimating a different effect size, to estimate the mean of a distribution of true effects for each outcome.
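As a hedged illustration of the weighting described above, a standard two-level random-effects model can be written as follows. The symbols $g_i$ (effect size of study $i$), $v_i$ (its sampling variance), $\tau^2$ (between-study variance), and $k$ (number of studies) are notation assumed here for illustration; they are not taken from the original analysis.

```latex
w_i = \frac{1}{v_i + \tau^2}, \qquad
\hat{\mu} = \frac{\sum_{i=1}^{k} w_i \, g_i}{\sum_{i=1}^{k} w_i}, \qquad
SE(\hat{\mu}) = \sqrt{\frac{1}{\sum_{i=1}^{k} w_i}}
```

Larger studies have smaller $v_i$ and therefore larger weights, while a larger $\tau^2$ makes the weights more nearly equal across studies.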

2.6.1 Effect measures

Effect sizes were assessed through standardized mean differences (SMDs) estimated by Hedges’ g, which is less biased by small sample sizes compared to Cohen’s d (Hedges, 1981; Lin and Aloe, 2021). Similar to Cohen’s d, Hedges’ g is conventionally interpreted on multiple levels: small (0–0.2), small-to-medium (0.2–0.5), medium-to-large (0.5–0.8), and large effects (>0.8). Hedges’ g was collected as the primary effect measure when available or calculated manually when it was not originally reported. The following formula was employed for the calculation: $g = \dfrac{\mu_1 - \mu_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}}$, where $\mu$ denotes the change in mean during the time frame, $s$ denotes the standard deviation of change for each group, and $n$ stands for the sample size of each group. The change-from-baseline standard deviation was imputed through the following formula extracted from the Cochrane Handbook (Higgins, 2008): $SD_{E,change} = \sqrt{SD_{E,baseline}^2 + SD_{E,final}^2 - 2 \times Corr \times SD_{E,baseline} \times SD_{E,final}}$, where Corr was calculated using the following steps from studies with an available change-from-baseline standard deviation for the same measure. To calculate the Corr for a specific outcome measure, we obtained (1) the correlation for the experimental group, $Corr_E = \dfrac{SD_{E,baseline}^2 + SD_{E,final}^2 - SD_{E,change}^2}{2 \times SD_{E,baseline} \times SD_{E,final}}$, (2) the correlation for the control group, and finally (3) the standard deviation of change for each group using the corresponding correlation. In studies with only Cohen’s d, bias-correction was applied: $g = \left(1 - \dfrac{3}{4(n_1 + n_2 - 2) - 1}\right) \times d$ (originally from Hedges, 1981 but later adjusted by Borenstein et al., 2009). For studies with solely F-statistics, g was calculated using the R package ESC (Lüdecke et al., 2019). Due to the heterogeneity and dependency of effects among measurements in the cognitive domain, a multilevel meta-analysis was performed. Specifically, results for each outcome measure (level 1) were clustered by study (level 2) to create a pooled effect size for each study (level 3). Aggregated effect sizes and confidence intervals were then calculated through the between and within cluster variances via the R package Metafor (Harrer et al., 2021).
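The effect-size steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' script: the function names and all input numbers are hypothetical, and the functions simply transcribe the pooled-SD standardized mean difference, the Cochrane Handbook imputation of the change-score SD, and the small-sample correction as written in the text.

```python
import math


def smd_from_change(mean_change_1, mean_change_2, sd_change_1, sd_change_2, n1, n2):
    """Standardized mean difference of change scores using the pooled SD."""
    pooled_var = ((n1 - 1) * sd_change_1 ** 2 + (n2 - 1) * sd_change_2 ** 2) / (n1 + n2 - 2)
    return (mean_change_1 - mean_change_2) / math.sqrt(pooled_var)


def corr_from_sds(sd_baseline, sd_final, sd_change):
    """Pre-post correlation recovered from a study reporting all three SDs."""
    return (sd_baseline ** 2 + sd_final ** 2 - sd_change ** 2) / (2 * sd_baseline * sd_final)


def impute_sd_change(sd_baseline, sd_final, corr):
    """SD of change imputed from baseline and final SDs and a borrowed correlation."""
    return math.sqrt(sd_baseline ** 2 + sd_final ** 2 - 2 * corr * sd_baseline * sd_final)


def bias_corrected_g(d, n1, n2):
    """Small-sample correction applied to a reported Cohen's d."""
    return (1 - 3 / (4 * (n1 + n2 - 2) - 1)) * d


# Hypothetical numbers, for illustration only (not data from any included trial).
corr = corr_from_sds(sd_baseline=4.0, sd_final=5.0, sd_change=3.0)
sd_change = impute_sd_change(sd_baseline=4.0, sd_final=5.0, corr=corr)
smd = smd_from_change(2.0, 0.5, sd_change, sd_change, n1=20, n2=20)
g = bias_corrected_g(smd, n1=20, n2=20)
```

In practice the correlation would be taken from a different study that reports the change-score SD for the same measure and then reused where only baseline and final SDs are available.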

Results were reported primarily via changes from baseline or group-by-time interactions to indicate different trajectories between groups. Effect sizes were calculated manually for most outcomes by the primary reviewer (GY) to reflect between group differences in changes and to perform standardized comparisons among studies. An average effect size was employed for cognition in each report due to the heterogeneity of assessments. Because higher scores on the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-cog) and the Trail Making Test (TMT) reflect greater impairment, changes in these scales were reversed during calculation. For mood outcomes, score changes were reversed for anxiety/depression outcomes. General study and intervention characteristics are summarized in Table 1. A summary of intervention components, which were synthesized into physical exercise, social skills, cognitive training, cognitive stimulation, and others, is presented in Table 2. Results and measures were synthesized into different outcomes and factors and are presented in Table 3 and Figures 2A,B.

Table 1. Characteristics of multimodal intervention studies for patients with mild cognitive impairment.

Table 2. Multimodal intervention components.

Table 3. Summary of findings of multimodal interventions on primary outcomes.

Figure 2. (A) Forest plot for cognition outcomes. (B) Forest plot for mood outcomes.

According to the AMSTAR 2 guidelines, studies with a high risk of bias were excluded from the meta-analysis. The overall systematic review and meta-analysis were rated as “high quality” in AMSTAR 2 (Shea et al., 2017).

2.7 Heterogeneity

Between-study variance, τ², was calculated through total variance (Cochran’s Q), which denotes the weighted sum of squared deviations of each study from the combined mean, and the degrees of freedom (df). Due to the small sample size and heterogeneity across study populations, the random-effects model with maximum likelihood (Borenstein et al., 2007) was employed to compute the heterogeneity and combined effect of the studies. For cognition, the aggregated model was used to indicate heterogeneity attributed to the variance across studies. In addition, to account for the impact of sample size on Q, we calculated the total proportion of variance owing to heterogeneity (I²) for each outcome (Higgins et al., 2013). In general, I² values of about 25%, 50%, and 75% are interpreted as low, moderate, and substantial heterogeneity, respectively. The analyses were performed in Metafor (Viechtbauer, 2010).
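The relationship between Q, df, the between-study variance, and the I² statistic can be sketched as below. This is an illustrative simplification: it uses the closed-form DerSimonian-Laird estimator of the between-study variance rather than the maximum likelihood estimator named in the text, and the function names and input values are hypothetical.

```python
def i_squared(q, df):
    """Percentage of total variability attributed to heterogeneity."""
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0


def heterogeneity_summary(effects, variances):
    """Cochran's Q, DerSimonian-Laird tau^2, I^2, and the random-effects mean."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Q is the weighted sum of squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return {"Q": q, "df": df, "tau2": tau2, "I2": i_squared(q, df), "pooled": pooled}


# Hypothetical effect sizes and sampling variances, for illustration only.
summary = heterogeneity_summary([0.1, 0.5, 0.9], [0.04, 0.04, 0.04])
```

Under this definition, a Q of 21.71 on 17 degrees of freedom corresponds to an I² of roughly 21.7%; estimates from a maximum likelihood fit can differ somewhat from this closed-form value.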

2.8 Publication bias

Publication bias generally refers to the probability of bias stemming from unpublished results of studies with non-significant data (Borenstein et al., 2009). A common way of assessing publication bias is through the level of symmetry of a funnel plot, which depicts the relationship between effect sizes and standard error in each study. Because small studies are more likely to generate non-significant results and have a larger standard error, they are less likely to be published. The funnel plot inverts the y-axis (standard error) to position these smaller studies at the bottom while placing the larger ones at the top. Thus, when there is no publication bias, the top of the funnel should cluster closely around the mean effect size whereas the bottom should scatter widely on both the left and right sides (the shape of a funnel). Aside from the graph, we also used the modified Egger’s regression test by Pustejovsky (Egger et al., 1997; Pustejovsky and Rodgers, 2019) to assess asymmetry of the funnel plots, incorporating the standard error of the between-group SMD using the following formula: $SE^{*}_{SMD_{between}} = \sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}$. The resulting value is equivalent to a z-score with a similar rejection range above 1.96 or below −1.96 for a significance level of 0.05. These tests were all performed through Metafor and Dmetar (Viechtbauer, 2010; Harrer et al., 2021) in R (R Core Team, 2014).
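To make the asymmetry test more concrete, the sketch below regresses the standardized effect on precision, using the sample-size-based standard error given in the text, and tests whether the intercept departs from zero. It is a simplified, unweighted ordinary least squares version written for illustration; it is not the Metafor or Dmetar implementation, and all function names and inputs are hypothetical.

```python
import math


def modified_se(n1, n2):
    """Sample-size-based standard error of the between-group SMD."""
    return math.sqrt((n1 + n2) / (n1 * n2))


def ols_intercept_test(x, y):
    """Simple linear regression of y on x with a t statistic for the intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(sse / (n - 2) * (1 / n + x_bar ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "slope": slope, "t": t, "df": n - 2}


def egger_test(effects, n1s, n2s):
    """Egger-type regression: standardized effect (g / SE) on precision (1 / SE)."""
    ses = [modified_se(a, b) for a, b in zip(n1s, n2s)]
    precision = [1 / se for se in ses]
    standardized = [g / se for g, se in zip(effects, ses)]
    return ols_intercept_test(precision, standardized)


# Hypothetical effect sizes and group sizes, for illustration only.
result = egger_test([0.2, 0.5, 0.8, 0.3], [10, 20, 40, 80], [10, 20, 40, 80])
```

An intercept whose test statistic falls outside the rejection range suggests funnel plot asymmetry. At least three studies with differing sample sizes are needed for this regression to be defined.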

3 Results

3.1 Study selection

A total of 482 results were identified after a systematic search of PubMed (k = 126), Embase (k = 106), and Cochrane Library database (k = 250). Among them, 105 duplicates were removed prior to screening, which yielded 377 results for review. A preliminary abstract/title review excluded 356 articles, of which the majority were study protocols or interventions targeting combined MCI and dementia populations. In the remaining 21 reports, 10 were excluded after a full-text review. A list of excluded reports was provided in Supplementary material C. Specifically, three studies were excluded due to a lack of multimodal intervention. Two studies used comparative effectiveness analysis. In addition, four studies were excluded because the group receiving multimodal interventions was not directly compared to the double-sham control group but to other single-modal interventions, and one study lacked randomized groups. In the end, 11 clinical trials were included from the databases for review.

Manual citation searching from previous literature reviews (Chandler et al., 2016; Karssemeijer et al., 2017; Gheysen et al., 2018; Gavelin et al., 2021; Han et al., 2022; Meng et al., 2022) found 28 results that did not overlap with the primary database search. After abstract/title screening, 12 remained for full-text screening. Of those clinical trials, two multi-group studies with no direct comparison between the multimodal and control groups, one study with a wrong comparator (i.e., the control group received mixed interventions), and one report with a mixed sample of MCI and dementia patients were removed. As a result, eight studies were included in the final review. A detailed PRISMA 2020 flowchart is demonstrated in Figure 3 (Page et al., 2021b).

Figure 3. PRISMA flow diagram.
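The study-selection counts reported above can be checked with simple arithmetic. The short sketch below only restates the numbers given in the text; the variable names are chosen here for illustration.

```python
# Database search, counts as reported in the text.
database_hits = {"PubMed": 126, "Embase": 106, "Cochrane Library": 250}
identified = sum(database_hits.values())
screened = identified - 105        # duplicates removed before screening
full_text_db = screened - 356      # excluded at abstract/title review

full_text_exclusions = {
    "no multimodal intervention": 3,
    "comparative effectiveness analysis": 2,
    "no direct comparison with a control group": 4,
    "no randomized groups": 1,
}
included_db = full_text_db - sum(full_text_exclusions.values())

# Manual citation search, counts as reported in the text.
citation_full_text = 12
citation_exclusions = {
    "no direct comparison": 2,
    "wrong comparator": 1,
    "mixed MCI and dementia sample": 1,
}
included_citation = citation_full_text - sum(citation_exclusions.values())

total_included = included_db + included_citation
```

The totals reproduce the 11 database reports and eight citation-search reports described in this section.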

3.2 Overview

Overall, 19 journal articles were eligible for the final review, and 18 were included in the meta-analysis. One report (Troyer et al., 2008) was excluded due to the high risk of bias (Figure 1). Of these, 18 reports of cognition (n = 1,555, mean age = 73.54 years old) and seven reports of mood (n = 343, mean age = 72.08 years old) were identified. A few reports failed to include an effect size or a p value for nonsignificant results, for which certain outcomes were not included in data extraction.

Participants’ mean ages were obtained from baseline characteristics for most of the studies except for Griffiths et al. (2020), which only included the number of participants in two age groups (60–69 years) and (70–79 years). Mean ages ranged from 67.82 to 87.20 with a standard deviation of 4.26. The attrition rate ranged from 0 to 34.78% with three studies (Kounti et al., 2011; Rojas et al., 2013; Bae et al., 2019) reporting above 30% dropout rates at the end of the intervention. Details regarding age, attrition rate, intervention methods, sample size, country, and follow-up durations are presented in Table 1.

3.3 Risk of bias

Some concerns were reported for most of the studies due to the lack of published protocols for a proper comparison between the actual analysis and an analysis plan before unblinded outcome data were available (Domain 5). Other common concerning criteria included whether participants were aware of their assigned intervention during the trial (Domain 2) and whether the allocation sequence was concealed from participants until enrollment (Domain 1). Maffei et al. (2017), Lam et al. (2015), Xu et al. (2020), and Montero-Odasso et al. (2023) were the only studies that explicitly stated that participants were not informed of their group assignment until the beginning of the intervention. Troyer et al. (2008)’s randomization process (Domain 1) was rated “high risk” due to missing information regarding allocation concealment and significant group differences favoring the control group on cognitive functioning at baseline. Therefore, the study was not included in the final meta-analysis. In the end, only two studies (Lam et al., 2015; Montero-Odasso et al., 2023) received an overall rating of “low risk.”

3.4 Heterogeneity

In general, heterogeneity was low for mood (τ² = 0.046, Q(7) = 10.33, p = 0.17, I² = 29.7%) and minimal to low for the aggregated cognition outcomes (τ² = 0.040, Q(17) = 21.71, p = 0.20, I² = 21.7%). Study characteristics such as sample size, education, frequency of intervention, and intervention modalities might serve as potential sources of heterogeneity. Heterogeneity as indicated by I² represented between-study variability regardless of the number of studies. In this case, studies involving either mood or cognition outcomes appeared to differ mainly by sampling error, which did not appear to impact the overall aggregated meta-analysis model.

3.5 Publication bias

Egger’s test with adjustment did not indicate asymmetry in the funnel plot for cognition (bias = −1.59, intercept = 0.62, t(16) = −1.29, p = 0.216) or mood (bias = 0.46, intercept = 0.51, t(6) = 0.37, p = 0.727), which suggests no evidence of publication bias in either outcome (Figures 4, 5).

Figure 4. Funnel plot for cognition outcomes.

Figure 5. Funnel plot for mood outcomes.

3.6 Primary outcomes

3.6.1 Cognition

Overall, the average effect sizes for cognition ranged from −0.20 (Rapp et al., 2002) to 1.88 (Yang et al., 2022). The pooled effect size was small to medium (g = 0.44, 95% CI = [0.21–0.67]). Notably, Rapp et al. (2002) and Xu et al. (2020) were the only two studies that reported no differential cognitive improvement between groups. In addition, minimal to small improvement was found in four reports (Kounti et al., 2011; Lam et al., 2015; Bae et al., 2019; Xu et al., 2020), small to medium effect (0.20 < d < 0.50) was found in five reports (Kurz et al., 2009; Tsolaki et al., 2011; Delbroek et al., 2017; Shimada et al., 2018; Griffiths et al., 2020), and medium to large effect (0.50 < d < 0.80) was reported in six studies (Buschert et al., 2011; Rojas et al., 2013; Maffei et al., 2017; Donnezan et al., 2018; Park et al., 2019; Montero-Odasso et al., 2023). Large effects (d > 0.80) were demonstrated in the two latest studies that were both conducted in Asia (Jeong et al., 2021; Yang et al., 2022).

3.6.2 Mood

The pooled effect size for mood was medium to large (g = 0.65, 95% CI = [0.37–0.93]). While mood was commonly measured at baseline to examine group balance post randomization, it was not consistently used as an outcome at follow-up. Among all the included studies, depression was the only mood outcome evaluated post-intervention except in Xu et al. (2020), which also demonstrated a greater reduction (p = 0.026) in anxiety with multimodal interventions. Notably, that study did not find any benefits of multimodal intervention in reducing depression. Effect sizes ranged from 0 (Xu et al., 2020) to 0.98 (Kurz et al., 2009) for depressive symptoms, and the effect for anxiety was large (g = 1.86).

4 Discussion

The purpose of this systematic review and meta-analysis was to summarize and synthesize results from current literature on the effects of multimodal cognitive and behavioral interventions on cognition and mood for pwMCI. A systematic search of three databases (PubMed, Embase, and Cochrane Library) and reference lists revealed 19 journal articles for the review, 18 of which were included in the meta-analysis (Figure 3). Unfortunately, most studies involved some risk of bias according to the Cochrane RoB 2 tool for parallel designs (Figure 1) due to a lack of statistical plans in a preexisting protocol or a missing blinding process. These standards are high, however, for behavioral trials. Behavioral trials have only recently adopted standards regarding registration of protocols and data analysis plans. Such standards have historically been ‘optional’ for behavioral trials while regulatory organizations (e.g., the Food and Drug Administration) have required them for medication trials. Similarly, blinding is a real challenge for behavioral trials. It is impossible to blind a person to treatment when that treatment requires active engagement in physical exercise, cognitive training, psychotherapy, or the like. Rather, behavioral trials must attempt to contend with expectancy (aka placebo) and practice effects by using active control groups and/or contact-time controls, as was done in a few of the trials described above. Our preference for ‘untreated’ controls in systematic reviews and meta-analyses may therefore invite higher estimates of bias in behavioral studies. All the studies included cognition as an outcome variable while seven studies reported findings on mood. Results indicated low heterogeneity in mood and in cognition, even after nesting outcomes within studies. Funnel plots and the adjusted Egger’s test both suggested a lack of publication bias in both outcomes. However, since there were fewer than 10 reports for mood, the test for that outcome might lack sufficient power.

Overall, multimodal cognitive and behavioral interventions for pwMCI had a small to medium effect (k = 18, g = 0.41, 95% CI = [0.21–0.67]) on cognition. Due to the complexity and diversity of cognitive outcomes, effect sizes were aggregated from available cognitive scores. Therefore, a post hoc analysis of focused cognitive domains was conducted. Specifically, global cognition improved in most of the studies that assessed it (k = 14), with the exception of Delbroek et al. (2017), Xu et al. (2020), and Bae et al. (2019). A subgroup meta-analysis demonstrated a small to moderate effect on global cognition (k = 14, g = 0.31, 95% CI = [0.09, 0.52]) (Figure 6A). However, benefits observed by the end of treatment might not be preserved in the long term. In their follow-up study, Buschert et al. (2012) noted that the significant main effect on the MMSE (F(1,18) = 8.50, p < 0.01, η2 = 0.23) observed in Buschert et al. (2011) was attenuated at the 15- and 28-month follow-ups (F(1,16) = 4.91, p = 0.041, η2 = 0.23), whereas the improvement on the ADAS-cog remained stable (F(1,18) = 6.38, p = 0.021, η2 = 0.26).

Figure 6. (A) Forest plot for global cognition. (B) Forest plot for executive function. (C) Forest plot for verbal memory. (D) Forest plot for non-verbal memory. (E) Forest plot for visuospatial ability. (F) Forest plot for semantic fluency.

Verbal (k = 8) and non-verbal memory (k = 2) were also commonly measured. Similarly, small to moderate effects were found in each domain, although both confidence intervals included zero: verbal memory (g = 0.20, 95% CI = [−0.03, 0.44]) and non-verbal memory (g = 0.45, 95% CI = [−0.24, 1.15]). See Figures 6C,D. In general, almost all the studies that included cognitive training also included memory as one of the major targeted training domains. Therefore, it was not surprising to observe improvement in verbal and non-verbal memory tests across studies, with only one exception (Rojas et al., 2013). Instead of traditional memory training, Rojas et al. (2013) emphasized episodic memory encoding strategies via visual imagery, semantic knowledge, and executive control. This approach was commonly used to improve the speed of processing, attention, and useful memory instead of verbal memory. Aside from cognitive stimulation, the cognitive training provided in this intervention involved theoretically motivated cognitive strategies to improve metacognition and self-efficacy in taking control of cognition. Thus, while the authors did not explain the lack of improvement in verbal memory, a potential reason might be a reduced capability to sufficiently exploit learned memory skills due to declines in executive function and semantic ability.

Benefits in other cognitive domains have also been demonstrated repeatedly across studies, e.g., executive function (k = 9, g = 0.30, 95% CI = [0.09, 0.51]) and visuospatial skills (k = 4, g = 0.28, 95% CI = [−0.25, 0.81]). See Figures 6B,E. Training using dual-task games (e.g., playing memory games while pedaling) revealed significant improvements in executive function including speed of processing, reasoning, and inhibition. For example, Jeong et al. (2021) asked participants to complete cognitive tasks such as speaking and counting while doing fifty minutes of aerobic exercise and found improvement in processing speed, particularly in fast switching between letters and numbers (TMT-B; p < 0.01) and in matching symbols to numbers according to a key (Digit Symbol Substitution Test; p < 0.01). The authors attributed this improvement to increased regular physical exercise and argued that changes in executive function were important for dementia prevention because both executive function and attention were significant predictors of AD in pwMCI (Jacobs et al., 1995).

Verbal fluency, measured through semantic and category fluency tests, had one of the highest pooled effect sizes among the domains (g = 0.45, 95% CI = [0.18, 0.73]) (Figure 6F). Among the studies that assessed changes in verbal fluency and confrontational naming, two (Shimada et al., 2018; Griffiths et al., 2020) reported significant improvements while one noted comparable changes in both groups (Rojas et al., 2013; control: mean change = 2.40, p < 0.01; intervention: mean change = 2.40, p < 0.01). As with executive function, lower verbal fluency scores in older adults with MCI could predict progression to AD. Thus, while only a few studies investigated the interaction between group and time (Kounti et al., 2011; Lam et al., 2015; Shimada et al., 2018), the superior beneficial effects supported the importance of multimodal intervention in delaying AD progression. However, longitudinal follow-up is still warranted in these domains.
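To compare the six domain-level estimates reported above on a common footing, the short Python sketch below back-calculates an approximate standard error and z statistic from each pooled g and its 95% CI, and flags whether the interval excludes zero. It assumes the intervals are symmetric normal-approximation (Wald) intervals, which the text does not confirm; the helper name summarize is hypothetical, and the derived SE, z, and p values are approximations rather than figures reported by the review.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # critical value for a two-sided 95% interval

# Pooled Hedges' g with lower and upper 95% CI bounds, as reported in the text.
DOMAINS = {
    "global cognition":     (0.31,  0.09, 0.52),
    "verbal memory":        (0.20, -0.03, 0.44),
    "non-verbal memory":    (0.45, -0.24, 1.15),
    "executive function":   (0.30,  0.09, 0.51),
    "visuospatial ability": (0.28, -0.25, 0.81),
    "semantic fluency":     (0.45,  0.18, 0.73),
}


def summarize(g, lo, hi):
    """Approximate SE, z, two-sided p, and whether the CI excludes zero.

    Assumes a symmetric Wald interval: CI = g +/- Z95 * SE.
    """
    se = (hi - lo) / (2 * Z95)
    z = g / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    excludes_zero = lo > 0 or hi < 0
    return se, z, p, excludes_zero


for name, (g, lo, hi) in DOMAINS.items():
    se, z, p, excl = summarize(g, lo, hi)
    print(f"{name:22s} g={g:.2f} SE~{se:.3f} z~{z:.2f} p~{p:.3f} CI excludes 0: {excl}")
```

Under this approximation, the intervals for verbal memory, non-verbal memory, and visuospatial ability include zero, so those pooled estimates are less certain than their point values alone suggest.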

The pooled effect size for mood was medium to large (k = 7, g = 0.65, 95% CI = [0.37–0.93]). Two studies found no significant improvements in depressive symptoms (Bae et al., 2019; Xu et al., 2020). Notably, the improvement in mood observed in Buschert et al. (2011) was not seen at either the 15- or 28-month follow-up (Buschert et al., 2012). While multiple potential explanations were postulated by the authors, social engagement in the controls seemed to play an essential role in the studies that failed to demonstrate changes in depressive symptoms. For example, after providing group-based health education classes to the control group, Bae et al. (2019) found no between-group differences in mood at the end of the intervention, which might be related to increased social engagement in both groups. Yang et al. (2022) also mentioned the comforting and supportive environment that group-based interventions provide to patients, which might also benefit their mood symptoms. Aside from social connections, using elements of psychotherapy also appeared to improve mood in pwMCI. For instance, Kurz et al. (2009) offered extensive psychotherapy training including self-assertiveness and stress management and found a 50% reduction of depressive symptoms in the intervention group with a large effect size (g = 0.98). Another factor that might help explain the variable results in mood was concentration difficulties. Items regarding concentration and activity level are commonly included in depression scales and could in turn be affected by existing cognitive deficits. Thus, Buschert et al. (2011) removed these items from their analysis and indicated that an improvement in depression might also improve the speed of processing or sustained attention. While depression improvement was not clinically significant in several reports, studies suggested that it might reflect enhancement of self-esteem and well-being, which can further benefit cognitive performance (Buschert et al., 2011).

4.1 Clinical implications

In the past decade, clinical trials of pharmacological interventions have not demonstrated improvement in cognition for pwMCI (Ströhle et al., 2015; Fink et al., 2018). While the FDA has recently approved aducanumab for early stages of AD, findings did not support cognitive benefits in pwMCI (Knopman et al., 2021). Even in RCTs that showed cognitive improvement with donepezil (SMD = -0.90), the benefit was rather subtle (a 1-point between-group difference on the 89-item ADAS-cog scale) (Doody et al., 2009). In addition, research has emphasized the frequent treatment-emergent adverse events such as diarrhea, nausea, abnormal dreams, and even increased mortality in the treatment group (Winblad et al., 2008; Doody et al., 2009). A meta-analysis of 41 RCTs has suggested small to moderate effect sizes of cholinesterase inhibitors on cognitive function (SMD = 0.10–0.46) (Cooper et al., 2013). Thus, results from this meta-analysis showed generally comparable or larger effects of multimodal nonpharmacological interventions on cognition and mood, which is consistent with previous reports (Sherman et al., 2017) and further supports the utility of these interventions to maintain functionality and facilitate adjustment to cognitive changes.

4.2 Limitations of the studies

Studies did not report the race and ethnicity of participants, mainly because of the homogeneity of the populations. Impacts of racial/ethnic background on the effects of multimodal or single-modal interventions have not yet been studied. Another limitation of the studies pertains to the absence of control for repeated-testing (practice) effects, except in Kurz et al. (2009). Because most interventions were conducted within a short time frame, a repeated-testing effect at the end of the intervention, especially in cognitive tasks, might have contributed to the observed changes post-intervention (Roediger and Payne, 1982). Furthermore, only one report included dementia conversion rate as an outcome (Rojas et al., 2013). Conversion to dementia was seen in one trained and three non-trained patients at the 12-month follow-up, and significant declines in global cognition were seen in the non-trained group at the six-month follow-up assessment (Rojas et al., 2013). However, since the conversion rate was low in both groups and no significant improvement was observed in the intervention group immediately after the intervention, the results need further examination to determine whether long-term effects were present. Thus, a longitudinal analysis of whether these multimodal interventions delay dementia progression is needed.

The Yang et al. (2022) study was an outlier on the funnel plot, indicating potential heterogeneity/publication bias. Findings in the study suggested significant cognitive improvement in the intervention group but a decline in untreated controls. Despite the observed deviations from other studies, further evaluation of the study population, methodology, interventions, and outcomes did not demonstrate evidence of bias or poor data quality. Therefore, we speculated that the distinctive results might stem from the relatively intense schedule over a long intervention period (6 months). The study was also unique in its short and frequent follow-ups (1-, 3-, and 6-month follow-ups). However, these hypotheses might not completely explain the deviation, and the results of Yang et al. (2022) should be interpreted with caution.

4.3 Limitations of the review

One of the limitations of this review is the lack of consensus in MCI diagnostic criteria across reports. Most studies included older adults with an MCI diagnosis regardless of subtype. However, four reports included only single or multidomain aMCI (Buschert et al., 2011; Park et al., 2019; Jeong et al., 2021) and one used the terms mNCD and MCI interchangeably (Griffiths et al., 2020). Additionally, this study did not investigate the effects of different modes of delivery (simultaneous vs. sequential). Sequential designs were defined as delivering intervention modalities in separate sessions during the same period (e.g., exercise followed by cognitive training). In contrast, simultaneous designs were usually delivered by asking participants to perform certain cognitive tasks while exercising or by using exergaming. Most of the interventions in the current review delivered different modalities through a sequential design, whereas several dual-task training programs were administered using exergaming (Delbroek et al., 2017; Donnezan et al., 2018; Shimada et al., 2018; Park et al., 2019). In healthy and cognitively impaired older adults, simultaneous training was found to be more efficacious for cognition than sequential combinations of physical exercise and cognitive training (g = 0.32–0.38) (Zhu et al., 2016; Gheysen et al., 2018; Gavelin et al., 2021). However, whether simultaneous or sequential delivery is superior in pwMCI has yet to be studied. An analysis comparing the modes of delivery was beyond the scope of this review. Future research could focus on differences in efficacy associated with modes of delivery.

Another limitation pertains to the number of databases searched in the study. We searched only three major databases. However, research shows that using Embase combined with PubMed can cover approximately 88% of the available literature (Frandsen et al., 2021). Previous studies have also indicated high coverage rates when combining the Cochrane Library and Embase (88% in a hypertension systematic review) (Rathbone et al., 2016) or all three databases (97% in orthopedic research) (Slobogean et al., 2009). Additional bibliographic databases did not provide unique records when two or three of the above databases were searched, owing to significant overlaps across databases (Royle and Milne, 2003; Hirt et al., 2021). The Cochrane Library was also found to have the highest precision rate in literature reviews and to be sensitive in identifying RCTs (Royle and Milne, 2003). Therefore, a combination of these three databases and a manual reference search was considered sufficient to identify all the studies meeting our inclusion criteria.

4.4 Conclusion and future research

Studies of multimodal cognitive and behavioral interventions in pwMCI demonstrated small to medium positive effects on cognition and medium to large positive effects on mood. A few directions for future research are proposed: (1) including long-term follow-ups to evaluate adherence and efficacy in delaying dementia conversion, (2) comparing effects of similar interventions in patients from diverse racial/ethnic backgrounds to inform adjustment in designs, and (3) considering simultaneous vs. sequential modes of delivery.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

GY: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. AP-L: Methodology, Resources, Validation, Writing – review & editing. MM: Data curation, Formal analysis, Methodology, Supervision, Writing – review & editing. S-AL: Supervision, Writing – review & editing. GS: Conceptualization, Funding acquisition, Methodology, Resources, Supervision, Validation, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was supported in part by NIA grant P30AG066506.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnagi.2024.1390699/full#supplementary-material

References

Albert, M. S., DeKosky, S. T., Dickson, D., Dubois, B., Feldman, H. H., Fox, N. C., et al. (2011). The diagnosis of mild cognitive impairment due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 7, 270–279. doi: 10.1016/j.jalz.2011.03.008

American Psychiatric Association (2013). DSM 5 diagnostic and statistical manual of mental disorders. Washington, DC and London, England: American Psychiatric Publishing, 947.

Amofa, P. A., Locke, D. E. C., Chandler, M., Crook, J. E., Ball, C. T., Phatak, V., et al. (2021). Comparative effectiveness of behavioral interventions to prevent or delay dementia: one-year partner outcomes. J. Prev. Alzheimers Dis. 8, 33–40. doi: 10.14283/jpad.2020.59

Artero, S., Petersen, R., Touchon, J., and Ritchie, K. (2006). Revised criteria for mild cognitive impairment: validation within a longitudinal population study. Dement. Geriatr. Cogn. Disord. 22, 465–470.

Bae, S., Lee, S., Lee, S., Jung, S., Makino, K., Harada, K., et al. (2019). The effect of a multicomponent intervention to promote community activity on cognitive function in older adults with mild cognitive impairment: a randomized controlled trial. Complement. Ther. Med. 42, 164–16