Clustering of antipsychotic-naïve patients with schizophrenia based on functional connectivity from resting-state electroencephalography

Participants

We included 37 antipsychotic-naïve, first-episode patients with schizophrenia and 97 matched HC recruited in two consecutive multimodal cohorts: The Pan European Collaboration on Antipsychotic Naïve Schizophrenia (PECANS, ClinicalTrials.gov Identifier: NCT01154829) and the Pan European Collaboration on Antipsychotic Naïve Schizophrenia II (PECANSII, ClinicalTrials.gov Identifier: NCT02339844) as previously described in [10, 11]. For full description, see www.ClinicalTrials.gov. The studies were approved by the Regional Danish Committee on Health Research Ethics (H-D-2008-088, H-3-2013-149). The patients were referred from in- and outpatient clinics in the Capital Region of Denmark. HC were recruited from the community of the Capital Region through online advertisement and were matched with respect to sex, age, and parental socioeconomic status. Inclusion criteria for patients were a diagnosis of psychosis spectrum according to ICD-10 or DSM-IVR and lifetime naïve to antipsychotic exposure. However, in the current study, we only included the patients with a F20.X diagnosis. Exclusion criteria for the patients were current drug-dependence (except for nicotine), organic brain damage, previous impact-related unconsciousness, contraindications for antipsychotic treatment, and intellectual disability (IQ < 70). Exclusion criteria for the HC were psychiatric diagnosis, psychiatric diagnosis in first-degree relatives, current drug abuse, and intellectual disability. All participants provided written informed consent.

Psychopathology and cognition

Patients’ psychopathology was assessed by trained raters using the Positive and Negative Syndrome Scale (PANSS) [16]. One patient was not assessed with PANSS, resulting in 36 patients in the analyses including PANSS.

All participants were assessed with a comprehensive neurocognitive test battery by trained raters. Premorbid and current intelligence (IQ) were estimated using the Danish version of the National Adult Reading Test (DART) [17] and four subtests (vocabulary, similarities, block design, and matrix reasoning) from Wechsler Adult Intelligence Scale (WAIS-III) [18]. The Brief Assessment of Cognition in Schizophrenia (BACS) [19] was used to measure verbal fluency, working memory, verbal memory, motor skills, processing speed, and planning. Moreover, subtests from the Cambridge Neuropsychological Test Automated Battery (CANTAB) [20] were used to examine spatial span (SSP), spatial working memory (SWM), planning (Stockings of Cambridge [SOC]), mental flexibility (Intra-Extra Dimensional set shifting [IED]), sustained attention (Rapid Visual Information Processing [RVP]), and Reaction Times (RTI); see, e.g., [21,22,23,24,25,26].

EEG recordings

As part of the large multimodal studies, the participants were examined with the Copenhagen Psychophysiology Test Battery [27,28,29]. After the evoked-related paradigms, participants underwent 10 min of continuous resting-state EEG (rsEEG). The rsEEG was recorded with a BioSemi ActiveTwo system (BioSemi B.V., Amsterdam, The Netherlands) with 64 active electrodes arranged according to the extended 10–20 system and a sampling frequency of 2048 Hz. During acquisition, participants were seated in a comfortable armchair in a sound-insulated cabin (40 dB). Participants were instructed to sit still, relax, and keep their eyes closed. To minimize acute and withdrawal effects of caffeine and nicotine, participants were required to abstain from coffee on the test day and from smoking for 1 h before the assessment. A urine sample was used to screen for cannabis, cocaine, opiates, and amphetamines (Syva® RapidTest d.a.u® 4). Participants with a positive urine screening or intake of benzodiazepines on the test day were excluded.

EEG preparation

Preprocessing

The preprocessing of the raw data was carried out in Matlab (version 9.6.0.1072779 (R2019a), The MathWorks Inc., Natick, Massachusetts, USA) using the EEGLAB environment (version 2019.1) [30]. A description of the preprocessing is provided in Supplementary Material and an overview of the preprocessing steps is provided in Supplementary Figure S1.

Source localization

The source localization was carried out in Python (version 3.7.7) using the MNE package (version 0.19.2) [31]. First, the forward solution was computed using FreeSurfer's average head model [32]. Second, the inverse model was computed for the preprocessed data using exact Low Resolution Brain Electromagnetic Tomography (eLORETA) [33]. The inverse solution was used to reconstruct the EEG sources, which resulted in a high-dimensional matrix of source time series. To reduce dimensionality, the source time series were mapped to regions of interest (ROIs) using PCA-flip, resulting in a single signal per region. The ROIs were chosen as key regions within the default mode network (DMN) and were defined by the CONN network parcellation, which is based on data from the Human Connectome Project and available in the CONN toolbox [34]. The six ROIs were the Medial Prefrontal Cortex (MPFC), Precuneus cortex (PCC), and Lateral Parietal (LP) cortex in each hemisphere.

Connectivity measures

The connectivity within the DMN was quantified by the Phase Lag Index (PLI) [35], calculated using the Python module dyconnmap (version 1.0.2) [36]. PLI discards zero-lag phase differences and thereby effectively eliminates spurious coupling caused by volume conduction, which is a common problem when estimating functional connectivity from EEG [35]. The data were divided into epochs of 8 s, and the PLI was calculated for each of the first 40 epochs and then averaged. The epoch duration was fixed, and the same number of epochs was used for each participant, as recommended [37]. The frequency bands analyzed were delta (1.5–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), and beta (12–30 Hz). The gamma band was excluded from the analyses because of the risk of contamination from muscle signals [38]. The connectivity estimated by PLI is undirected; hence, the connectivity matrices are symmetric, and we therefore only consider the upper triangle. With six ROIs, this yields \(\frac{6\,(6-1)}{2}=15\) connectivity values per frequency band per participant.
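The PLI definition and the pair count can be illustrated with a minimal sketch. It is not the dyconnmap implementation used in the study: it assumes that instantaneous phases (in radians) have already been extracted for each ROI and epoch, and the ROI labels and function names are hypothetical.

```python
import math
from itertools import combinations

# Illustrative labels for the six DMN regions (naming is an assumption).
ROIS = ["MPFC_L", "MPFC_R", "PCC_L", "PCC_R", "LP_L", "LP_R"]


def phase_lag_index(phase_a, phase_b):
    """PLI = |mean(sign(sin(phase_a - phase_b)))| over time samples."""
    if len(phase_a) != len(phase_b) or not phase_a:
        raise ValueError("phase series must be non-empty and equally long")
    signs = []
    for a, b in zip(phase_a, phase_b):
        s = math.sin(a - b)
        signs.append((s > 0) - (s < 0))  # +1, 0 or -1
    return abs(sum(signs) / len(signs))


def mean_pli_over_epochs(epochs_a, epochs_b):
    """Average the per-epoch PLI across equally long epochs."""
    values = [phase_lag_index(a, b) for a, b in zip(epochs_a, epochs_b)]
    return sum(values) / len(values)


# Upper triangle of the symmetric 6 x 6 matrix: one entry per unordered pair.
roi_pairs = list(combinations(ROIS, 2))
n_pairs = len(roi_pairs)
```

A constant non-zero phase lag gives a PLI of 1, whereas identical phases (zero lag, as expected from pure volume conduction) give a PLI of 0.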

Modeling

All modeling was carried out in Matlab (version 9.6.0.1072779 (R2019a), The MathWorks Inc., Natick, Massachusetts, USA). A flowchart of the study is provided in Fig. 1.

Fig. 1

Flowchart illustrating the different steps in the analysis pipeline. rsEEG: resting-state electroencephalography; SZ: schizophrenia patients; HC: healthy controls; DMN: Default Mode Network; PLI: Phase Lag Index; PCA: Principal Component Analysis; PCs: Principal Components; SVM: Support Vector Machine; LOOCV: leave-one-out cross-validation; PANSS: Positive and Negative Syndrome Scale; GMM: Gaussian Mixture Model.

Principal component analysis

Principal Component Analysis (PCA) was applied to reduce the dimensionality of the feature space (15 connectivity measures per frequency band per participant). The PCA was performed on each frequency band separately, using the full data set (37 patients and 97 HC; N = 134). The frequency bands were not combined in a single analysis because of the sample size-to-variable ratio: a common recommendation for PCA is that the sample size should be at least five times the number of variables [39]. The optimal number of components to describe the data was determined by Akaike's Information Criterion (AIC) [40].
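As a worked check of this rule of thumb, using only the numbers given in the text (N = 134 participants, 15 variables per band, four bands), the ratio for a single band compared with the ratio for all bands pooled is:

\[
\frac{134}{15} \approx 8.9 \ge 5, \qquad \frac{134}{4 \times 15} = \frac{134}{60} \approx 2.2 < 5 .
\]

A per-band PCA therefore meets the recommendation, whereas a pooled analysis of all 60 variables would not.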

Gaussian mixture model

The Gaussian Mixture Model (GMM) is an unsupervised clustering algorithm that fits a selected number of Gaussian clusters to the data. As the true number of clusters in the data is unknown, the GMM was fitted with 1 to 8 clusters. For each frequency band separately, the GMM was used to optimize subgroup attribution within the schizophrenia group based on the principal components retained at the lowest Akaike Information Criterion (AIC) score. Because we aimed to identify subgroups of schizophrenia, only the 37 patients were used in the unsupervised clustering. There is no general agreement about the minimum sample size in cluster analyses, but Formann suggested a minimum of \(2^{d}\) samples, where \(d\) is the number of clustering variables. To follow this recommendation with 37 patients, the maximum number of clustering variables in our study should be 5 [41, 42].
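The limit of 5 clustering variables can be checked with a short sketch. It assumes the rule is read as a minimum of 2^d samples, which is the reading consistent with the stated maximum of 5 for 37 patients; the function name is hypothetical.

```python
def max_clustering_variables(n_samples):
    """Largest d such that 2**d <= n_samples."""
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    d = 0
    while 2 ** (d + 1) <= n_samples:
        d += 1
    return d


n_patients = 37
# 2**5 = 32 <= 37 < 64 = 2**6
max_d = max_clustering_variables(n_patients)
```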

The regularization of the GMM and the number of clusters were optimized using leave-one-out cross-validation (LOOCV). To ensure robustness of our results, the GMM procedure was repeated 50 times [43]. A solution was considered stable if the same number of clusters was optimal in at least 90% of the runs. Following the optimization, the GMM was refitted with the optimized regularization parameter and the optimal number of clusters. For more details, see Supplementary Material.
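The stability criterion can be sketched as follows. The sketch covers only the 90% rule applied to the per-run optimal cluster counts and does not fit any GMM. The function name and the example run outcomes are hypothetical and are not results from the study.

```python
from collections import Counter


def stable_cluster_count(optimal_k_per_run, min_fraction=0.9):
    """Return the most frequent optimal cluster count if it wins in at
    least min_fraction of the runs; otherwise return None."""
    if not optimal_k_per_run:
        raise ValueError("need at least one run")
    k, wins = Counter(optimal_k_per_run).most_common(1)[0]
    if wins / len(optimal_k_per_run) >= min_fraction:
        return k
    return None


# Hypothetical outcome of 50 repetitions: two clusters optimal in 46 runs.
example_runs = [2] * 46 + [3] * 4
stable_k = stable_cluster_count(example_runs)
```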

Group differences between the subgroups (i.e., the clusters found by the GMM) were assessed with respect to demography, connectivity, cognition, and psychopathology using the χ² test, two-sample t test, or Mann–Whitney U test, as appropriate. P values were corrected for multiple comparisons using the false discovery rate (FDR) [44] within each modality, and results were considered significant at pFDR < 0.05. Post hoc comparisons between each of the subgroups and the HC were performed to explore the proximity of the subgroups to the HC.
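The text does not state which FDR procedure was applied. The sketch below assumes the widely used Benjamini-Hochberg step-up procedure and shows how adjusted p values could be obtained for the tests within one modality. The function name and the raw p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that the adjusted values
    # stay monotone in the rank of the raw p values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values from five tests within one modality.
p_raw = [0.001, 0.008, 0.039, 0.041, 0.20]
p_fdr = benjamini_hochberg(p_raw)
significant = [p < 0.05 for p in p_fdr]
```

In this example, only the two smallest p values remain below 0.05 after correction.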

Support vector machine

Besides assessing univariate subgroup differences, we further explored the clinical relevance of the detected subgroups by testing whether the patterns in the cognitive and psychopathological profiles, respectively, could predict subgroup membership. To quantify the relation between subgroup labels and patterns in cognition and psychopathology, a linear Support Vector Machine (SVM) was used to predict the frequency-specific, DMN connectivity-based subgroup labels from either PANSS sub-scores (PANSS positive, PANSS negative, and PANSS general) or cognitive measures. To avoid overfitting, only five a priori selected cognitive tests were used as predictors in the SVM. The tests were selected to cover a broad range of cognitive domains and were based on subgroup differences found in our previous work [26]. They included verbal IQ (estimated from vocabulary and similarities from WAIS-III), verbal memory (list learning from BACS), verbal fluency (F-words from BACS), mental flexibility (IED total errors adjusted), and reaction time (RTI five-choice reaction time). The features most important for the prediction were identified by inspecting the feature weights estimated by the classifier [45]. The accuracy of the SVM was estimated using LOOCV. The significance of the SVM was tested using a permutation test with 1000 permutations, and the significance level was calculated as the Monte Carlo permutation p value [46].
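The permutation test can be sketched as follows. The sketch assumes the common Monte Carlo estimator that counts the observed statistic among the permutations (the "+1" convention); the exact formula used in [46] is not given in the text. The function names and the `score_fn` argument are hypothetical. In the actual analysis, `score_fn` would rerun the full LOOCV of the SVM on the shuffled subgroup labels.

```python
import random


def permutation_p_value(observed, permuted_scores):
    """Monte Carlo p value: (1 + #{permuted >= observed}) / (1 + n_perm)."""
    n_at_least = sum(1 for s in permuted_scores if s >= observed)
    return (1 + n_at_least) / (1 + len(permuted_scores))


def permutation_null(labels, score_fn, n_permutations=1000, seed=0):
    """Score n_permutations random shufflings of the labels.

    score_fn takes a shuffled label list and returns a score such as
    the cross-validated accuracy obtained with those labels.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        scores.append(score_fn(shuffled))
    return scores
```

With 1000 permutations, the smallest attainable p value under this estimator is 1/1001, which is reached when no permuted score matches or exceeds the observed accuracy.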
