Psychometric evaluation of the computerized battery for neuropsychological evaluation of children (BENCI) among school aged children in the context of HIV in an urban Kenyan setting

The two test batteries were administered to 274 children living with HIV and 330 children without HIV, with a mean age of 9.48 years (SD = 1.31); roughly half were male. Table 2 summarizes the demographics of participants in the two groups. The second assessment of the BENCI included 38 children not living with HIV, of whom 21 were female, with a mean age of 9.18 years (SD = 1.21).

Table 2 Socio-demographic information

Scale attenuation effects

Using correlational and descriptive statistics, including histograms, we evaluated attenuation patterns in the BENCI tests. Eight of the BENCI subtests exhibited ceiling or floor effects, which tend to suppress correlations and reliabilities. Specifically, on Verbal Comprehension Figures, 30% (N = 181) of the sample obtained the highest possible score of 8 hits, while on Verbal Comprehension Images, 51% (N = 308) did so. Other subtests with ceiling effects included Continuous Performance hits, Go No Go hits, Working Memory hits, and Spatial Stroop. Verbal Memory Recognition (13.4%, N = 16) and Visual Memory Recognition (16.7%, N = 20) showed some ceiling effects, meaning that the number of participants with the highest scores was almost equal to the number with average scores. At the same time, floor effects were evident on the Planning Time of First Option and Spatial Stroop errors scores. Semantic Fluency hits (13.4%, N = 16), Phonetic Fluency hits (16%, N = 19), Verbal Memory Delayed hits (16.8%, N = 20), and Planning total time (33.9%, N = 38) showed some floor effects, meaning that the number of participants with the lowest scores was almost equal to the number with average scores. These floor and ceiling effects indicate that the psychometric functioning of these subtests could be improved by adding easier and more difficult items, respectively, in future revisions of the BENCI. The remaining BENCI subtests showed no such attenuation effects.
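As a rough illustration of how ceiling and floor effects can be screened, the sketch below computes the share of a sample sitting at the lowest and highest possible scores. The function name and the example scores are hypothetical and are not taken from the study data; only the two whole-sample counts at the end (181 and 308 out of 604) come from the text above.

```python
def extreme_score_shares(scores, min_score, max_score):
    """Return (floor_share, ceiling_share): the proportions of scores at the
    lowest and highest possible values of a subtest."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return floor_share, ceiling_share


# Hypothetical hit counts (0 to 8) for 20 children; NOT study data.
example_scores = [8] * 6 + [7] * 5 + [6] * 4 + [5] * 3 + [3] * 2
floor_share, ceiling_share = extreme_score_shares(example_scores, 0, 8)
print(f"floor: {floor_share:.0%}, ceiling: {ceiling_share:.0%}")

# Counts reported in the text for the pooled sample of 604 children.
figures_pct = round(100 * 181 / 604, 1)  # Verbal Comprehension Figures
images_pct = round(100 * 308 / 604, 1)   # Verbal Comprehension Images
print(figures_pct, images_pct)
```

A large share at either extreme suggests that the subtest cannot distinguish among the highest or lowest performers, which is the mechanism by which correlations and reliabilities are suppressed.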

Internal consistency of the BENCI

We computed Cronbach’s alphas (KR-20s) for seven of the subtests with dichotomous item scores. Internal consistency varied from poor to excellent. As shown in Table 3, the Language Comprehension tests, Verbal Comprehension Images and Figures, had the fewest items (N = 8) and the lowest alphas among the BENCI subtests (0.49 < α < 0.68). Low alphas tend to suppress correlations, but most of the BENCI subtests had high alphas. The Abstract Reasoning, Planning, Go No Go, Spatial Stroop, and Processing Speed tests showed high internal consistency (alpha range from 0.75 to 0.97), indicating little random measurement error.
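For readers unfamiliar with the statistic, the following minimal sketch shows how Cronbach’s alpha can be computed from an item-by-person score matrix. With dichotomous (0/1) items and population variances, this is numerically the same as KR-20. The function name and the small response matrix are illustrative assumptions, not the study’s data or software.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row per person, one column
    per item). For 0/1 items this equals KR-20, because the population
    variance of a 0/1 item is p * (1 - p)."""
    k = len(item_scores[0])                       # number of items
    columns = list(zip(*item_scores))             # one tuple per item
    item_var_sum = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses of 5 children to 4 dichotomous items; NOT study data.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Because alpha depends on the variance of the total score, a subtest on which most children obtain the maximum score has little total variance to work with, which is one reason short subtests with ceiling effects tend to show low alphas.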

Table 3 BENCI items internal consistency

The Verbal Comprehension Figures and Images tests showed higher internal consistencies among children living with HIV (0.57 < α < 0.68) than among children not living with HIV (0.35 < α < 0.56), possibly because the lower mean scores of the former group made the ceiling effect less severe. In the Abstract Reasoning, Planning, Go No Go, Spatial Stroop, and Processing Speed subtests, the items had acceptable to excellent internal consistency (alpha range from 0.76 to 0.97), showing that the tests are reliable for both children living with HIV and those not living with HIV, as shown in Table 4. The alphas in the latter tests were higher in the lower-scoring sample of children living with HIV than in children not living with HIV, due to less severe attenuation effects in the former group.

Table 4 Reliability test–retest of the BENCI battery

Test–retest reliability of the BENCI

Table 4 presents the intraclass correlations (ICC) of the test and retest scores of the BENCI and the Pearson correlations between the repeated measurements among the 38 children not living with HIV. The intraclass correlations for specific tests ranged from -0.34 to 0.81. The coefficients were rather high in Sustained Attention RT, Immediate Visual Memory, and Alternate Visuo-motor Coordination (ICC range from 0.74 to 0.81, r from 0.68 to 0.62). Moderate correlations were found in Immediate Verbal Memory, Delayed Visual Memory, and Visual Recognition Memory (ICC range from 0.52 to 0.58, r from 0.39 to 0.38). Test–retest reliability was poor for Go/No-Go (RT), Sustained Attention CA, and Reasoning (ICC range from 0.14 to -0.34, r from 0.08 to -0.15).

The test–retest reliability results showed that most of the tests were consistent across the two occasions (2 months between t1 and t2). There were clear, significant gains in performance, as expected with increasing test familiarity and maturation, for fifteen of the nineteen subtests; the exceptions were Sustained Attention CPT, Verbal Recognition Memory (CA), Reasoning (CA), and Go/No-Go (RT), which showed no clear improvements in mean performance.
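To illustrate how an intraclass correlation behaves with two measurement occasions, the sketch below computes a two-way, single-measure, consistency-type ICC from ANOVA mean squares. Which ICC form was used in the study is not stated in this section, so the choice of this form is an assumption, and the scores are invented for illustration.

```python
def icc_consistency(test, retest):
    """Two-way, single-measure, consistency ICC for two occasions:
    (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error), with k = 2."""
    n, k = len(test), 2
    rows = list(zip(test, retest))
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(test) / n, sum(retest) / n]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical scores; NOT study data.
t1 = [1, 2, 3, 4]
gain = [2, 3, 4, 5]        # every child improves by one point
reversed_t2 = [4, 3, 2, 1]  # rank order flips at retest

icc_gain = icc_consistency(t1, gain)
icc_reversed = icc_consistency(t1, reversed_t2)
print(round(icc_gain, 3), round(icc_reversed, 3))
```

Under this consistency definition, a uniform practice gain leaves the ICC untouched because the rank order of children is preserved, whereas a reversal of rank order yields a negative value. This is one way in which mean gains at retest and high test–retest coefficients can coexist.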

Convergent validity

Table 5 presents the correlations between corresponding BENCI and Kilifi toolkit tests. The attention, memory, inhibition/planning, reasoning, and flexibility tests in the BENCI and Kilifi were expected to correlate. However, some of these tests did not correlate as expected due to attenuation effects, while others correlated as expected despite the attenuation effects.

Table 5 BENCI – Kilifi toolkit convergent validity

In the reasoning domain, and for several inhibition and a few memory-related tests, BENCI scores were positively correlated with the corresponding Kilifi toolkit tests, supporting convergent validity across these domains. The BENCI’s Working Memory test was expected to correlate with Kilifi’s Self-Ordered Pointing Test (SOPT) because both measure working memory. However, the correlation between the two was not significant. This could be because the BENCI Working Memory test showed ceiling effects and might have been too easy for most test takers.

Kilifi’s Verbal List Learning Test (VLL) and Nonverbal Selective Reminding Memory Test (NVSRT) were expected to correlate with the BENCI’s Verbal Memory and Visual Memory tests because they all measure memory. However, none of the BENCI’s memory tests had a significant correlation with Kilifi’s NVSRT. Moreover, the BENCI’s Verbal Memory Recognition and Visual Memory Recognition tests had no significant correlation with any of Kilifi’s memory tests. This outcome could be because the BENCI’s Verbal Memory Recognition and Visual Memory Recognition tests had some ceiling effects while Kilifi’s NVSRT had floor effects. In contrast, the BENCI’s Verbal Memory Immediate hits had a significant correlation with Kilifi’s VLL Immediate Memory Span (r = 0.37), Level of Learning (r = 0.40) and Total correct answers (r = 0.41), and the BENCI’s Verbal Memory Delayed Trial was significantly correlated with the VLL Immediate Memory Span (r = 0.21). Similarly, the BENCI’s Visual Memory Immediate hits had a significant correlation with the VLL Immediate Memory Span (r = 0.23), Level of Learning (r = 0.34) and Total correct answers (r = 0.32), and the BENCI’s Visual Memory Delayed Trial was significantly correlated with the VLL Level of Learning (r = 0.23) and Total correct answers (r = 0.25). These significant correlations were found despite the BENCI’s Verbal Memory Delayed showing some floor effects. The rest of the memory tests in the BENCI and Kilifi had no ceiling or floor effects. The correlation between Kilifi’s VLL Level of Learning and Total correct answers and the BENCI’s Reasoning test was not expected. As expected, the BENCI Abstract Reasoning test significantly correlated with Kilifi’s Raven’s Progressive Matrices (RPM) (r = 0.21). Both reasoning tests had no attenuation effects.

Kilifi’s People Search test and FNRT test were expected to correlate with BENCI’s Continuous Performance test and Spatial Stroop Attention test because they all measure attention. Among the attention tests, the BENCI sustained attention test (Continuous Performance hits and reaction time) did not have a significant correlation with Kilifi’s visual sustained and selective attention test, People Search (r = -0.10; r = 0.12), or with the auditory sustained and selective attention test, Forward Digit Span total score (r = -0.14; r = 0.07). The People Search test had floor effects while Continuous Performance hits had ceiling effects. In contrast, two BENCI tests that contain an attention component, Reasoning (r = -0.37) and Working Memory (r = 0.19), were significantly correlated with Kilifi’s People Search. These correlations were unexpected, as these BENCI tests are not primarily meant to measure attention.

BENCI’s Spatial Stroop was expected to correlate with Kilifi’s Contingency Naming Test (CNT) because they both measure flexibility. However, the Spatial Stroop test had no significant correlation with the CNT (r = 0.03). The Spatial Stroop test showed ceiling effects while the CNT had no attenuation effects.

Kilifi’s Tower Test was expected to correlate with the BENCI’s Planning test because they both measure inhibition. This was indeed the case, as the BENCI Planning Total Time test had a significant association with Kilifi’s Tower Test (r = -0.21). However, BENCI’s Planning Time of First Option test had no significant association with Kilifi’s Tower Test (r = -0.11). These results should be interpreted cautiously because the BENCI’s Planning Total Time test showed some floor effects and the Planning Time of First Option showed floor effects, indicating that the items were relatively difficult for our test takers.

Overall, convergence between the BENCI and the Kilifi toolkit was well supported in the reasoning domain, whereas in the memory and inhibition domains there was only partial convergence. Subtests in the flexibility, attention, and working memory domains showed little convergent validity with the Kilifi toolkit, mostly because of attenuation effects.

The BENCI functionality in age and HIV groups

As can be seen in Table 6, children not living with HIV outperformed those living with HIV on all BENCI tests. However, the mean group difference was significant in all subtests except Continuous Performance Test hits and reaction time, Go No Go hits, Verbal Memory Recognition hits, and Planning total time.

Table 6 Mean group differences in BENCI subtests responses

We checked whether performance on the BENCI subtests aligned with the expectation from developmental models that cognitive performance grows as children age, and report Pearson correlations between age in years and BENCI subtest performance separately for children living with HIV and those not living with HIV in Table 7. We hypothesized that children not living with HIV would significantly outperform those living with HIV. Among the children living with HIV, there were significant associations in the expected direction between age and Verbal Comprehension Images hits, Verbal Memory hits, Verbal Memory Recognition hits, Planning total time, Planning Time of First Option, Abstract Reasoning hits, Visual Memory Immediate hits, Visual Memory Recognition hits and Spatial Stroop omission errors. Among children not living with HIV, there was a significant association between age and Continuous Performance reaction time, Processing Speed reaction time, Verbal Memory hits, Abstract Reasoning hits, and Visual Memory Delayed hits. The lack of significant correlations between some cognitive indicators and age could be because of attenuation effects, but might also relate to sampling issues (e.g., older participants appearing in the sample because of delayed development and the repeating of grades in school).

Table 7 Age correlations in BENCI subtests responses

Confirmatory factor analyses

We tested the construct validity of Executive Functioning as proposed by Diamond for normal development [30]. According to this model, the subtests that measure inhibition, flexibility, reasoning, memory, and fluency together constitute executive functioning [30]. These are tests that evaluate the ability to make decisions, exercise self-control, pay attention, be creative, solve problems, and plan towards having good health and success in life. These are considered core functions of the brain, hence the name executive functions. We fitted a confirmatory factor analysis model that had previously been fitted successfully to the Arabic version of the BENCI [20] and sought to adjust the model slightly to improve fit if necessary.

A second-order model with Executive Functioning as a second-order latent factor and five first-order latent factors (i.e., Fluency, Reasoning, Memory, Inhibition and Flexibility) measured by the specific BENCI subtests (Fig. 2) was specified and tested with the pooled sample, with missingness handled by Full Information Maximum Likelihood. The model fit indexes suggested a good fitting model (χ2 (100, N = 604) = 245.55, p < 0.001, RMSEA = 0.049, CFI = 0.908, TLI = 0.875). However, this model had several issues. First, the Fluency factor was estimated to have a negative residual variance, which we fixed at zero. Second, in this revised model, the Verbal Memory factor also yielded a negative estimated residual variance, which we treated similarly by fixing it at zero. Third, in the third model, the residual variance of the Alternate Visuo-motor total time also needed to be fixed to zero. Next, we considered modification indices and found that the model could be improved if we included a covariance between the residuals of Reasoning and Flexibility and between the residuals of Semantic Fluency correct answers and Verbal Memory Recognition correct answers. This further modified model showed an acceptable fit (χ2 (101, N = 604) = 205.73, p < 0.001, RMSEA = 0.041, CFI = 0.934, TLI = 0.911). Figure 2 presents the standardized factor loadings. An inspection of this model showed that not all indicators of Inhibition (Go No Go RT, λ = -0.46; Go No Go CA, λ = 0.74) had significant loadings on their respective factor, indicating that these specific tests did not measure Inhibition as intended (Fig. 2). It also showed that the latent factor of Inhibition did not load on the Executive Functioning factor. Therefore, we removed the Inhibition factor together with its indicators and tested a second-order model with only four first-order factors. This model fitted well (χ2 (51, N = 604) = 135.57, p < 0.001, RMSEA = 0.052, CFI = 0.944, TLI = 0.914). Figure 2 presents the factor loadings of this model. Thus, the five components of Executive Functioning as validated before did not all show up in the Kenyan sample, whereas Executive Functioning comprising fluency, reasoning, verbal memory, and flexibility was found to fit well. The final model, with four factors each measuring executive functioning, supports the construct validity of the BENCI battery, despite Heywood cases on the Alternate Visuo-motor subtest.

Fig. 2

1: Five Factor Executive Function Model (χ2 (100, N = 604) = 245.55, p < .001, RMSEA = .049, CFI = .908, TLI = .875). 2: Five Factor Executive Function Model (χ2 (101, N = 604) = 205.73, p < .001, RMSEA = .041, CFI = .934, TLI = .911); ns – not significant. 3: Four Factor Executive Function Model (χ2 (51, N = 604) = 135.57, p < .001, RMSEA = .052, CFI = .944, TLI = .914)

AMOS treats missing data using full information maximum likelihood, which is considered a robust method for treating missing data. However, we checked whether model fit would be affected when using a dataset with no missing data. On running the model with no missing data, the model fit was excellent (χ2 (51, N = 327) = 64.07, p > 0.05, RMSEA = 0.028, CFI = 0.968, TLI = 0.958). This shows that the BENCI does have good construct validity though some changes in some test items and instructions are needed in future revisions of some subtests.
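The RMSEA values reported for these models can be checked against the chi-square, degrees of freedom, and sample size using one common point-estimate formula, RMSEA = sqrt(max(χ2 − df, 0) / (df · (N − 1))). The sketch below is a hand check under that assumed formula; it is not a description of how AMOS computes the index internally, and the dictionary labels are ours.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# (chi-square, df, N, RMSEA as reported in the text)
reported_models = {
    "five-factor, initial": (245.55, 100, 604, 0.049),
    "five-factor, modified": (205.73, 101, 604, 0.041),
    "four-factor": (135.57, 51, 604, 0.052),
    "four-factor, complete cases": (64.07, 51, 327, 0.028),
}

for label, (chi2, df, n, reported) in reported_models.items():
    print(f"{label}: computed {rmsea(chi2, df, n):.3f}, reported {reported:.3f}")
```

The formula makes visible why RMSEA rewards parsimony: the excess of chi-square over its degrees of freedom is divided by df, so a model with more degrees of freedom is penalized less for the same amount of misfit.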

Measurement invariance

We set out to test whether the BENCI behaves the same way across the HIV-positive (N = 274) and HIV-negative (N = 330) groups using measurement invariance testing with multi-group confirmatory factor analysis. As the basis, we used the factor model identified as having an excellent fit in the pooled sample and modified it to have only the four correlated first-order factors (i.e., Fluency, Reasoning, Memory, and Flexibility, each with its observed indicators) but no second-order factor (which is not required for testing measurement invariance). The model fit was excellent (χ2 (47, N = 604) = 107.76, p < 0.001, RMSEA = 0.046, CFI = 0.960, TLI = 0.933), as shown in Fig. 3.

Fig. 3

Four Factor First Order Model (χ2 (47, N = 604) = 107.76, p < .001, RMSEA = .046, CFI = .960, TLI = .933)

We first tested for configural invariance, where all factor loadings, item intercepts, and residual parameters were freely estimated. The model fit indexes suggested a well-fitting model (χ2 (94, N = 604) = 175.09, p < 0.001, RMSEA = 0.038, CFI = 0.941, TLI = 0.902). The factor loadings of all the indicators in both groups were significant.

We then specified a model for metric invariance where all the factor loadings were constrained to be the same across the two groups and all the other parameters were freely estimated. This model had a good fit (χ2 (102, N = 604) = 198.35, p < 0.001, RMSEA = 0.040, CFI = 0.930, TLI = 0.893). On comparing the configural to the metric invariance model, the chi-square difference was significant at the 0.01 level but not at the 0.001 level (Δχ2 = 23.26, DF = 8, p = 0.003); judged against the stricter 0.001 criterion, metric invariance was supported. This meant that the factor loadings were invariant and that the indicator items have the same associations with the latent constructs across groups. The change in CFI from the configural to the metric model was also small (ΔCFI ≈ 0.01), indicating that metric invariance was tenable.

A scalar invariance model was then specified where the item intercepts and factor loadings were constrained to be the same across groups, while the latent means of the factors in the HIV-positive group were freed (with the aim of checking latent mean differences in flexibility, fluency, verbal memory, and reasoning). This model had a poorer fit than the metric invariance model (χ2 (110, N = 604) = 245.12, p < 0.001, RMSEA = 0.045, CFI = 0.901, TLI = 0.860). Constraining the intercepts worsened the fit: the chi-square difference between the scalar and metric invariance models was statistically significant (Δχ2 = 46.77, DF = 8, p < 0.001). The CFI difference also showed that scalar invariance did not hold across all subtests (ΔCFI = 0.029). This indicates that some intercepts were not invariant and that these subtests are uniformly biased.

Using modification indices, we then specified a partial scalar invariance model, constraining the intercept of one indicator at a time and testing whether the constraint resulted in a significant chi-square difference. For Verbal Comprehension (figures) CA and Visual Memory Delayed CA, the tests showed a significant chi-square difference, hence we freely estimated these two intercepts across groups while holding the rest of the intercepts and the factor loadings equal across groups. This partially invariant model fitted well (χ2 (108, N = 604) = 218.38, p < 0.001, RMSEA = 0.041, CFI = 0.920, TLI = 0.884). The fit of the partial scalar invariance model was better than that of the full scalar invariance model, and the chi-square difference between this model and the metric invariance model was not significant at the 0.001 level (Δχ2 = 20.03, DF = 6, p > 0.001), indicating that partial scalar invariance fits reasonably well. The CFI difference also showed that partial scalar invariance was tenable (ΔCFI = 0.010).
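The nested-model comparisons in this section can be re-derived from the reported fit statistics. The sketch below computes Δχ2, ΔDF, the associated p-value, and ΔCFI for each step. To stay dependency-free it uses the closed-form chi-square tail probability that holds for even degrees of freedom only, which is sufficient here because the differences have 8 and 6 degrees of freedom; the function and dictionary names are ours.

```python
from math import exp, factorial


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, valid for even df:
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!"""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


# (chi-square, df, CFI) as reported in the text.
fits = {
    "configural": (175.09, 94, 0.941),
    "metric": (198.35, 102, 0.930),
    "scalar": (245.12, 110, 0.901),
    "partial scalar": (218.38, 108, 0.920),
}


def compare(restricted, baseline):
    """Return (delta chi-square, delta df, p-value, delta CFI)."""
    chi2_r, df_r, cfi_r = fits[restricted]
    chi2_b, df_b, cfi_b = fits[baseline]
    d_chi2, d_df = chi2_r - chi2_b, df_r - df_b
    p_value = chi2_sf_even_df(d_chi2, d_df)
    return round(d_chi2, 2), d_df, p_value, round(cfi_b - cfi_r, 3)


for restricted, baseline in [("metric", "configural"),
                             ("scalar", "metric"),
                             ("partial scalar", "metric")]:
    d_chi2, d_df, p_value, d_cfi = compare(restricted, baseline)
    print(f"{restricted} vs {baseline}: d_chi2={d_chi2}, d_df={d_df}, "
          f"p={p_value:.4f}, d_CFI={d_cfi}")
```

Under this calculation, the configural-to-metric and metric-to-partial-scalar differences both have p-values of roughly 0.003, so whether they count as significant depends on the threshold adopted, whereas the metric-to-scalar difference is far below 0.001.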

To summarize the series of measurement invariance tests, we conclude that metric invariance is achieved, indicating that the factor loadings of the BENCI are comparable across the HIV-positive and HIV-negative samples. The association of the BENCI with other invariant constructs can therefore be compared across the two groups, but group means cannot be compared for Verbal Comprehension (figures) CA and Visual Memory Delayed CA, as these subtests are not well calibrated. A partially scalar invariant model fitted the data reasonably well, meaning that mean differences can be compared for most of the subtests, with caution for Verbal Comprehension (figures) CA and Visual Memory Delayed CA.
