An attention-based deep learning approach for the classification of subjective cognitive decline and mild cognitive impairment using resting-state EEG

Alzheimer's disease (AD) is the most common cause of dementia in the elderly population, accounting for up to 80% of cases worldwide [1]. In AD, neurodegeneration is already well established by the time the disease becomes symptomatic. This might represent a considerable limitation for the development of disease-modifying therapies, since the majority of interventions have been tested in cohorts with substantial synaptic and neuronal damage [2]. Thus, further investigation of the initial stages of AD is needed not only for prognostic purposes, but also to define populations with still sufficient functional compensation to be targeted in early clinical trials [3].

According to recent neuroimaging, neuropathological and biochemical investigations, the pathophysiological process of AD can begin decades before cognitive impairment [4]. It is now known that biological markers, such as amyloid-beta (Aβ) protein accumulation, which is distinctive in AD, may be found in the brain up to 20 years before the stage of dementia [5].

This evidence led to a new biological definition of the disease, which assumes that the cognitive decline in AD occurs over a long period [6] and develops as a continuum rather than as distinct, clinically-defined entities [7]. On this continuum, three broad phases can be distinguished: preclinical AD, mild cognitive impairment (MCI) due to AD and dementia due to AD [8].

While MCI refers to a well-defined, intermediate stage between normal ageing and pathological status [9], many patients experience a subjective cognitive decline (SCD) in memory and other cognitive domains prior to demonstrable impairment. SCD is not linked to a particular disease status itself [10]. However, it has been shown that the subjective decline, even at the stage of normal cognitive performance on mental tests, is associated with an increased risk of positive biomarkers for Alzheimer's disease and later conversion to dementia [11–14]. In this context, it has been established that SCD can occur at late stages of preclinical AD, before MCI is reached. This phase can also be referred to as pre-MCI or pre-prodromal AD. In particular, since new diagnostic guidelines have been released, SCD individuals with pathological Aβ levels in cerebrospinal fluid could be considered to be in the AD continuum [15]. Nonetheless, SCD constitutes a heterogeneous group, as it can be related to conditions such as normal aging, personality traits, psychiatric conditions, neurological and medical disorders, substance use, and medication [16].

Following these principles, many studies have focused on recognizing biomarkers to characterize and identify SCD at risk of progression to objective cognitive decline. Recently, Viviano and Damoiseaux reviewed several works discriminating between SCD and healthy controls (HCs) using functional neuroimaging biomarkers, and also proposed a model to integrate common features found in subjects affected by SCD [17]. It has also been noted that SCD individuals have a pattern of brain atrophy similar to that measured in AD pathology when compared to HCs without SCD [18]. Moreover, using 18F-fluorodeoxyglucose positron emission tomography (PET), it was found that subjects with SCD show glucose and neuronal hypometabolism with respect to HCs, which correlates with decline in the memory domain [14]. Altered activation of the prefrontal cortex in SCD patients was detected using functional magnetic resonance imaging (MRI), even though there were no changes in verbal episodic memory encoding [14]. Additionally, results of longitudinal studies have shown that it is possible to use these markers to predict which patients with SCD and MCI will convert to AD [19, 20]. Since Aβ burden is distinctive in the progression of AD from early stages, Maserejian et al developed a statistical framework based on multimodal data, including apolipoprotein E genotype status, to predict Aβ positivity in subjects with SCD and MCI [20]. Results on separate validation datasets indicated an estimated probability of Aβ positivity of up to 0.75 for patients with MCI and of 0.60 for SCD subjects.

Although the task of classifying SCD and MCI subjects from HCs has been addressed in several studies [21, 22], the discrimination between SCD and MCI conditions from a functional point of view is still poorly investigated in the literature, since anatomical and functional changes in the brain between the two classes are subtler, making this a more challenging task [23]. Moreover, the intricacy of brain alterations in the early stages of AD makes it difficult to recognize patterns and develop accurate indicators for diagnosing and monitoring the development of AD on an individual basis [24, 25]. Furthermore, whilst advanced neuroimaging methods like PET and MRI make it possible to capture relevant modifications in brain processes related to AD, their use is limited in clinical settings due to cost, invasiveness and time consumption [26].

In this respect, electroencephalography (EEG) can represent an alternative technique that is both non-invasive and cost-effective, and much more practical for clinical applications [26, 27]. Since EEG signals reflect functional changes in the cerebral cortex, EEG-based biomarkers can be used to assess neuronal degeneration caused by AD progression long before actual tissue loss or behavioral symptoms appear.

Several studies have proposed resting-state EEG (rsEEG) rhythms as candidate biomarkers of AD [28–31]. A more comprehensive review of research in this field can be found in the work by Babiloni et al [32]. Cassani et al summarized EEG changes related to AD progression into four main categories: slowing, complexity reduction, synchronization decrement and neuromodulatory deficit [33]. At the MCI stage, such EEG abnormalities were found to be intermediate between HCs and dementia patients, and more severe compared to subjects with SCD [34]. Changes in the relative and absolute power of the theta (θ) frequency band appear to be significant among AD, MCI and HC at the individual level [35]. Significantly higher global delta (δ) and theta power, lower global alpha (α) power and a higher global peak frequency have also been found in patients with SCD who progressed to MCI and dementia [34]. Hence, measures of EEG-recorded brain activity can represent sensitive, non-invasive markers in the prediction of the clinical development of AD. This assumption holds true also when comparing EEG to other neuroimaging methods, both structural and functional [36].
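As an illustration of the kind of spectral measure involved, the sketch below computes the relative power of a frequency band from a single EEG channel using Welch's method. It is purely didactic and not part of the pipeline described in this study; the band edges and signal are placeholders.

```python
# Illustrative only: relative band power of one EEG channel via Welch's method.
import numpy as np
from scipy.signal import welch

def relative_band_power(x, fs=512, band=(4.0, 8.0), total=(0.1, 30.0)):
    """Relative power of `band` with respect to the `total` range for signal x."""
    f, pxx = welch(x, fs=fs, nperseg=4 * fs)          # 4 s Welch segments
    band_idx = (f >= band[0]) & (f <= band[1])
    total_idx = (f >= total[0]) & (f <= total[1])
    return pxx[band_idx].sum() / pxx[total_idx].sum()  # ratio of summed PSD bins

# Example: theta relative power of 60 s of synthetic data
x = np.random.randn(60 * 512)
print(relative_band_power(x))
```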

Research into the application of deep learning (DL) models based on EEG signals is growing thanks to the increasing availability of larger EEG datasets. DL enables end-to-end learning from raw inputs, thus overcoming the limitation of processing high-dimensional volumes of data encountered by traditional machine learning (ML) approaches. In the field of EEG data processing, DL has been used to improve and extend existing methods, reducing the need for domain-specific processing and feature extraction pipelines [37]. Compared with convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which are the most extensively used architectures for time-series data classification [38], transformers [39] have shown higher ability to deal with long-range dependencies and recognize patterns in sequences of data [40], as well as employ more interpretable decision-making processes [40, 41]. Although transformers have become the standard models in natural language processing (NLP), recent efforts in exploring their applications on time-series data, such as EEG or electromyography signals, are showing interesting results [42].

In this work, we propose a DL framework based on the transformer model for the binary classification of resting-state EEG signals of SCD and MCI patients. To the best of our knowledge, this is the first study applying DL to EEG in a well-characterized clinical population at risk for AD, namely SCD and MCI patients. The same framework is then employed to perform a multiclass classification among HC, SCD and MCI. Figure 1 shows the implemented workflow. We exploited the power of the self-attention mechanism to extract relevant information from the signal in the temporal domain. Specifically, we first preprocessed the EEG signals with a well-consolidated, standardized pipeline [43]. Then, we filtered the clean EEG signals to extract four main frequency bands, each used to build a new dataset from the original one. An additional dataset was obtained by filtering the signals in the full delta-to-beta range [0.1–30] Hz. Employing a leave-one-subject-out cross-validation (LOSOCV) approach, we trained and tested a model on each dataset to label the subjects based on their brain activity. Then, we compared the obtained results with three CNN-based models, both for the binary and multiclass classification tasks.

Figure 1. Workflow of the proposed method. Firstly, raw EEG signals are preprocessed. Then, 19 channels are selected and 4 frequency bands (δ, θ, α and β) are extracted, obtaining four distinct datasets and a fifth one using the δ-to-β range. After epoching the signals, we perform both two- and three-class classifications of epochs through LOSOCV. Finally, a majority voting approach is used to label each subject in the test set either as HC, SCD or MCI based on the class assigned to its epochs.
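The subject-level decision described in the caption reduces to a majority vote over the predictions assigned to a subject's epochs. The snippet below is a minimal sketch of that idea (not the authors' code); labels and values are illustrative.

```python
# Minimal sketch of subject-level majority voting over epoch predictions.
from collections import Counter

def label_subject(epoch_predictions):
    """epoch_predictions: list of per-epoch class labels, e.g. ['SCD', 'MCI', ...]."""
    return Counter(epoch_predictions).most_common(1)[0][0]

print(label_subject(['MCI', 'SCD', 'MCI', 'MCI']))  # -> 'MCI'
```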


The rest of this paper is organized as follows: section 2 introduces the state-of-the-art about studies dealing with the discrimination between SCD and MCI based on biomarkers; section 3 describes the EEG acquisition protocol, the dataset and the preprocessing pipeline; section 4 shows the classification methodology, detailing the transformer model and its application to rsEEG signals; section 5 reports the classification performances which are then discussed in section 6. Lastly, section 7 shows the conclusions of this work, highlighting the limitations and proposing improvements that could be made in future studies.

Although longitudinal studies have established the increased risk for both SCD and MCI patients of developing Alzheimer's dementia, to the best of our knowledge only a limited number of works have investigated changes in distinctive biomarkers to differentiate early AD stages.

Yue et al evaluated the extent of asymmetry of hippocampus and amygdala volumes from MRI scans in HC, SCD and MCI subjects [44]. They found significant differences between the latter two groups only when considering asymmetry of the hippocampus, indicating that this marker could help the diagnosis of early AD stages. On the other hand, they found significant differences between HC and SCD in the volume of the right hippocampus, right amygdala and asymmetry of the amygdala, and those differences were reflected in the comparison of HC and MCI. In a recent study by Li et al, an approach based on ML models was applied to features extracted from MRI data to predict the scores of cognitive tests, i.e. the Mini-Mental State Examination (MMSE) or the Montreal Cognitive Assessment, of HC, SCD and MCI subjects. Results showed that volumetric imaging features of the brain were more correlated with the scores of cognitive tests than individual features extracted from brain subregions, such as the hippocampal area [45]. Although such neuroimaging-based studies characterize SCD and MCI effectively, they still require time-consuming and expensive acquisition techniques and are therefore not easily replicable.

A study by Scheijbeler et al [46] used magnetoencephalography (MEG) data to compute brain network interactions in SCD and MCI patients by means of a permutation index, called inverted joint permutation entropy, which was used to train a logistic regression model. The area under the ROC curve (AUC) obtained with this index (0.784 for SCD-MCI classification) was higher than that of other MEG markers. However, only 18 SCD and 18 MCI subjects were included, so a replication of their method on larger samples is needed.

Even fewer works have focused on the role of EEG-derived biomarkers in the classification of SCD and MCI, although a lot of work has been done in discriminating AD subjects from both MCI and HC [31, 47], also employing DL models [48–50].

Recently, quantitative EEG was used by Engedal et al to predict the conversion to dementia from a large dataset composed of 200 HC, SCD and MCI subjects for whom follow-up information was available [51]. Spectral features were extracted from the signal to calculate a dementia index, and a statistical pattern recognition method was employed to evaluate the predictive power of the index, reaching an accuracy of 69% in discriminating converters from non-converters. However, Engedal et al predicted conversion to dementia from EEG data of subjects already diagnosed. Lazarou et al [25] investigated the power of graph metrics derived from high-density EEG (HD-EEG) to discriminate among HC, SCD, MCI and AD individuals. They expected to find differences in brain connectivity in terms of correlation matrices constructed from the EEG activity. The statistical analyses showed that SCD individuals present network values intermediate between HC and MCI, suggesting a common disconnection pattern of the brain connectome in SCD, although not to the same extent as in MCI. Nonetheless, in the SCD vs MCI comparison, the classification performance of both local and global network measures, evaluated with AUC values, was lower than 60%. Similarly, Abazid et al investigated connectivity links in the brain networks derived from rsEEG of SCD, MCI and AD patients by exploiting measures of statistical entropy and a support vector machine to discriminate the classes of patients. They demonstrated the effectiveness of the entropy measure in identifying different stages of cognitive dysfunction when considering different graph parameters, reaching high accuracy levels, over 90% [52]. However, these results depend on several stages of signal manipulation (e.g. feature extraction, thresholding and selection), which can strongly affect the classification performance.

Indeed, none of the above-cited studies addressed the SCD vs MCI classification task using DL approaches. Thus, we adapted an end-to-end model mainly employed in NLP, the transformer, along with its self-attention mechanism, to classify resting-state EEG signals in a dataset of HC, SCD and MCI subjects by focusing on the global patterns of the brain's oscillatory activity.

Resting-state EEG recordings of 17 HC, 56 SCD and 45 MCI subjects were collected at IRCCS Don Carlo Gnocchi in Florence, Italy. Table 1 reports clinical-demographic information of the study population. Patients with SCD and MCI who self-referred to the Regional Reference Center for Alzheimer's Disease and Cognitive Disorders of Careggi Hospital, Florence were enrolled in the 'PRedicting the EVolution of SubjectIvE Cognitive Decline to Alzheimer's Disease With machine learning (PREVIEW)' project, an ongoing prospective cohort study started in October 2020.

Table 1. Clinical-demographic characteristics of the study population. HC: healthy controls; SCD: subjective cognitive decline; MCI: mild cognitive impairment; MMSE: Mini-Mental State Examination; TIB: Italian Brief Intelligence Test; SD: standard deviation.

Characteristic                       HC (n = 17)       SCD (n = 56)        MCI (n = 45)
Age (mean ± SD)                      64.29 ± 4.77      66.26 ± 8.72        74.26 ± 8.20
Females (%)                          41.2              78.3                54.3
Age of onset (mean ± SD)             —                 55.15 ± 8.04        62.09 ± 9.97
Years of education (mean ± SD)       15.50 ± 3.78      12.58 ± 3.47        10.18 ± 4.17
MMSE (mean ± SD)                     28.92 ± 1.19      27.48 ± 2.28        27.52 ± 2.13
TIB (mean ± SD)                      —                 107.22 ± 20.48      111.00 ± 6.01

Patients were classified as SCD according to the terminology proposed by the SCD initiative working group [10], which requires the subject to self-experience a persistent decline in cognitive capacity in comparison with a previously normal status and unrelated to an acute event, as well as normal age-, gender-, and education-adjusted performance on standardized cognitive tests. Patients were classified as MCI according to the National Institute on Aging-Alzheimer's Association workgroups criteria for the diagnosis of MCI [9], specifically requiring: cognitive concern reflecting a change in cognition reported by the clinician or the patient, objective evidence of impairment in one or more cognitive domains (all patients underwent an extensive neuropsychological investigation, with estimation of premorbid intelligence and assessment of depression), preservation of independence in functional abilities and no signs of dementia. The study was approved by a local ethics committee and individual informed consent was obtained. Experimental procedures conformed to the Declaration of Helsinki and national guidelines.

Data were acquired using EBNeuro's GalNt system (EBNeuro, Florence, Italy) with 64 channels digitized at a sampling rate of 512 Hz. Among the 64 electrodes, 61 covered the whole scalp to record EEG, while the remaining ones recorded electrooculographic and electrocardiographic activity and thus were not considered for further analysis. The electrodes were placed according to the 10–10 montage system and electrode-skin impedance was kept below 5 kΩ. Subjects were seated in a reclined chair for approximately 20 min.

The acquisition protocol was structured to include both eyes-closed and eyes-open conditions. Specifically, each subject was asked to open the eyes at irregular intervals and whenever the signal showed signs of drowsiness. However, since the two conditions are known to exhibit very different signal properties (e.g. higher alpha band power in the eyes-closed (EC) condition), and in line with previous studies such as Lazarou et al [25], we extracted and employed only the EC epochs of the original signal for all subjects (mean length = $15.03 \pm 1.41$ min), which represent the largest part of the protocol.

Raw data were preprocessed offline using Matlab R2019b (The Mathworks, Natick, MA, USA) and EEGLAB toolbox v.2021.0. Even though it is still not clear whether heavy signal preprocessing is needed when employing DL methods [53], a systematic review on EEG classification carried out by Roy et al pointed out how most published papers in this field still preprocess the EEG data before feeding it to deep models [37]. In this work, a standardized pipeline, the PREP pipeline [43], was adapted and employed as a first step to clean the signal. This pipeline uses a robust re-referencing algorithm to interpolate noisy channels and leverages routines from the cleanline method to remove line noise components [43]. Although the biggest advantage of this approach is that it removes only deterministic line components, while preserving substantial spectral energy, it can present some drawbacks due to the assumption of signal stationarity [43]. To overcome these limitations, a 50 Hz notch filter was further applied to ensure line noise cleaning. This method can be safely applied on our data since high frequencies of the signal, which could be distorted, were not analyzed [43].
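As a purely illustrative counterpart to the MATLAB/EEGLAB step described above, the sketch below applies a zero-phase 50 Hz IIR notch filter in Python with SciPy. The study itself used the PREP pipeline followed by EEGLAB routines, so this is only an analogue under assumed parameters (e.g. the quality factor and the placeholder data).

```python
# Hedged sketch: zero-phase 50 Hz notch filtering, analogous to the step above.
import numpy as np
from scipy.signal import iirnotch, filtfilt

fs = 512.0                                 # sampling rate used in the study
b, a = iirnotch(w0=50.0, Q=30.0, fs=fs)    # 50 Hz line-noise notch (Q is an assumption)
eeg = np.random.randn(19, 10 * int(fs))    # placeholder data (channels x samples)
eeg_clean = filtfilt(b, a, eeg, axis=-1)   # forward-backward filtering (zero phase)
```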

The EEG data recorded from scalp electrodes can be considered a summation of genuine EEG signals and artifacts, which are independent of each other. Independent component analysis has been widely used to remove EEG artifacts, such as eye blinks and muscle activity [54]. Thus, a semi-automatic method employing EEGLAB's ICLabel [55] and manual selection of the independent components to retain was then applied to the signals. Lastly, epochs with excessive noise or artifacts were visually identified and removed. Figure 2 shows an example of the EEG signal of the first subject before and after applying the preprocessing pipeline.
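A rough Python/MNE analogue of this semi-automatic ICA step is sketched below; the study used EEGLAB with ICLabel, so the file name, number of components and excluded component indices here are entirely hypothetical.

```python
# Illustrative MNE-Python analogue of the ICA-based artifact removal step.
import mne

raw = mne.io.read_raw_fif("subject01_prep.fif", preload=True)  # hypothetical file
ica = mne.preprocessing.ICA(n_components=20, random_state=97)
ica.fit(raw)
ica.exclude = [0, 3]   # e.g. blink and muscle components chosen after visual inspection
ica.apply(raw)         # reconstruct the signal without the excluded components
```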

Figure 2. Sample EEG recordings of subject 1. (top) 5 s window of raw EEG signal. (bottom) Same window after preprocessing.


A cluster of 19 channels, namely Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2, was then selected. Since these channels evenly cover the scalp, this montage is the most widely employed in the literature for similar studies [56] and has been shown to ensure sufficient signal quality while allowing comparison with previous rsEEG findings from other projects [30]. Subsequently, the signals were bandpass filtered between 0.1 Hz and 45 Hz.

Four main frequency bands, namely delta (δ) [0.1–4] Hz, theta (θ) [4–8] Hz, alpha (α) [8–13] Hz and beta (β) [13–30] Hz, were extracted from each EEG signal using purpose-designed bandpass filters, and a dataset was created for each band. Furthermore, in order to assess which frequency band was the most distinctive in the classification of HC, SCD and MCI, we also filtered the signals in the entire range [0.1–30] Hz, generating an additional dataset (all-band). The gamma (γ) band [30–70] Hz was excluded from the analysis since the EEG signal in this band can be significantly contaminated by muscle artifacts [57]. To design the filters, we used the pop_eegfiltnew function from EEGLAB, which includes a heuristic for automatically determining the filter length and order. This function employs a zero-phase Hamming-windowed sinc finite impulse response filter [58].

Hence, five different datasets were constructed from the original one, each corresponding to a specific frequency interval.
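The construction of the five datasets can be sketched as follows. The study used EEGLAB's pop_eegfiltnew in MATLAB; this illustrative Python/MNE version relies on MNE's default zero-phase windowed-sinc FIR filter, which is a comparable design. The file name is hypothetical.

```python
# Sketch of building the five band-limited datasets from one cleaned recording.
import mne

bands = {"delta": (0.1, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0),
         "all_band": (0.1, 30.0)}

raw = mne.io.read_raw_fif("subject01_clean.fif", preload=True)  # hypothetical file
datasets = {name: raw.copy().filter(l_freq=lo, h_freq=hi)       # zero-phase FIR filter
            for name, (lo, hi) in bands.items()}
```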

To address the clinical problem of discriminating early stages of AD, we propose an EEG classification method based on the self-attention mechanism and the transformer architecture [39].

Although firstly employed in NLP, transformers have also been proven effective in computer vision tasks [59–62], offering a valid alternative to CNNs and RNNs. In this context, a model called vision transformer (ViT), proposed by Dosovitskiy et al in 2020 [63], has yielded interesting results on multiple image recognition benchmarks when compared to state-of-the-art models [64].

Following this path, recent studies have also partly introduced the attention mechanism to EEG decoding [65–69]. In the work by Wei et al, the integration of an attention module downstream of a CNN improved the classification accuracy of MCI and HC [70]. However, these approaches work on hybrid architectures that still heavily rely on CNNs and RNNs to learn discriminative information from EEG, thus not exploiting the computational advantages of transformers to their fullest. Song et al [71] implemented a variant of the ViT called spatial-temporal tiny transformer (S3T) to convert the input EEG signal into a discernible representation for motor imagery EEG (MI-EEG) classification purposes. In their model, both spatial and temporal features were captured by applying attention first on the EEG channels and then on the EEG time series. The output was a new representation of the data that could be classified using a fully-connected layer. Like ViT, S3T almost completely disengages from using convolutional or recurrent layers and relies on the attention mechanism to learn informative features from raw EEG signals. One single convolution operation is kept to learn global positional dependencies of signal segments. Compared to baseline DL models, the authors reached state-of-the-art results using models with a smaller number of parameters, thus alleviating the computational burden and improving scalability.

4.1. Transformer

As shown in figure 3, the core of a transformer consists of an encoder and a decoder with several blocks of the same type. The encoder generates encodings of the inputs, while the decoder generates the output sequence from the encodings. Each transformer block is composed of an attention layer, a feed-forward neural network, shortcut connections and layer normalization. The attention layer is based on the concept of self-attention, which computes an attention function of the inputs to retrieve the dependencies of each element on the others.

Figure 3. Original transformer architecture. Reproduced with permission from [39].


Specifically, the input vector is first transformed into three different vectors: the query vector q, the key vector k and the value vector v with dimensions $d_q = d_k = d_v$. Vectors derived from different inputs are then merged into three matrices, namely Q, K and V. Subsequently, the attention function between different input vectors is calculated according to equation (1)

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$    (1)

The function computes scores between each pair of inputs, and these values determine how much attention is given to the other inputs when encoding the current one. The scores are scaled for gradient stability and then converted into probabilities using the softmax function. Finally, each value vector is weighted by its corresponding probability and the weighted vectors are summed, so that subsequent layers focus on the vectors with higher probability.
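For concreteness, a minimal NumPy transcription of equation (1) is given below; shapes and values are arbitrary and serve only to make the computation explicit.

```python
# Minimal NumPy version of scaled dot-product attention, equation (1).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # probability-weighted sum of values

Q = np.random.randn(5, 8); K = np.random.randn(5, 8); V = np.random.randn(5, 8)
print(scaled_dot_product_attention(Q, K, V).shape)    # (5, 8)
```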

The original transformer employs layers of multi-head attention (MHA), which generalize the concept of attention by computing different representation subspaces using H randomly initialized sets of query, key and value matrices, where H is the chosen number of heads. These representations are then concatenated and linearly projected to form the layer output. This method allows the model to focus on one or more specific input positions without diminishing the attention given to other, equally important positions at the same time.
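A short PyTorch illustration of multi-head self-attention is shown below. It is not the authors' implementation; the embedding dimension, sequence length and H = 3 heads are chosen only to match the values reported later for the proposed model.

```python
# Hedged illustration of multi-head self-attention with H = 3 heads.
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=6, num_heads=3, batch_first=True)
x = torch.randn(4, 166, 6)        # (batch, sequence length, embedding dimension)
out, attn_weights = mha(x, x, x)  # self-attention: query = key = value = x
print(out.shape)                  # torch.Size([4, 166, 6])
```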

ViT directly applies the MHA mechanism to sequences of image patches for image classification tasks [63]. Few modifications are made to the original architecture, although only the transformer encoder module is kept. In this model, sequences of image patches are treated like sequences of words in NLP. 2D images are reshaped into a sequence of flattened patches $x_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$, where C is the number of image channels, (P, P) is the resolution of each image patch, and N is the total number of resulting patches. A similar approach was proposed by Cordonnier et al, but images were divided into patches of dimension $2\times2$ pixels, thus limiting its use to small-resolution images [72].

The sequence of patches is then flattened and linearly projected to obtain a sequence of patch embeddings, to which an extra learnable embedding and positional embeddings are added before being fed to the encoder stack. Since MHA is permutation-equivariant with respect to its inputs, the positional embeddings are used to retain spatial information on the position of each patch in the original image.

Lastly, a multilayer perceptron head performs the classification of the resulting encoded representation.
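The ViT-style input pipeline (patch flattening, linear projection, class token and positional embeddings) can be sketched as follows; all dimensions are illustrative and unrelated to the EEG data used in this work.

```python
# Conceptual sketch of the ViT input pipeline; dimensions are illustrative only.
import torch
import torch.nn as nn

B, C, H, W, P, D = 2, 3, 32, 32, 8, 64           # batch, channels, image size, patch size, embed dim
N = (H // P) * (W // P)                           # number of patches

img = torch.randn(B, C, H, W)
patches = img.unfold(2, P, P).unfold(3, P, P)     # (B, C, H/P, W/P, P, P)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, N, C * P * P)

proj = nn.Linear(C * P * P, D)                    # linear projection of flattened patches
cls_token = nn.Parameter(torch.zeros(1, 1, D))    # extra learnable class embedding
pos_embed = nn.Parameter(torch.zeros(1, N + 1, D))

tokens = torch.cat([cls_token.expand(B, -1, -1), proj(patches)], dim=1) + pos_embed
print(tokens.shape)                               # torch.Size([2, 17, 64])
```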

4.2. Proposed model

Following the work by Song et al [71], we implemented a pipeline to classify EEG epochs, as shown in figure 4, by designing and training a modified version of the S3T on EC rsEEG signals of SCD and MCI subjects. The same pipeline was followed for the classification of HC, SCD and MCI. For this second task, the last fully connected layer was composed of three output units.

Figure 4. EEG epoch classification pipeline. Each EEG segment of C = 19 channels and D = 5120 datapoints is used as input to our model, which uses a convolutional layer to compress the signal, extract slices and embed the information. k = 31 is the size of the kernel, $\mathrm{emb} = 6$ is the embedding dimension and CLS is the classification token prepended to the input. The attention mechanism is then applied on the temporal domain and, after global average pooling, a linear layer is used to classify the input EEG epoch.


The major difference between the two architectures concerns the way attention is applied to the signals. The proposed model dismisses the spatial attention module, which is used to weight the information encoded by each EEG channel, and prioritizes the temporal domain of the signal. This difference stems from the fact that the objective of this work is to classify resting-state signals rather than MI signals as in Song et al [71]. In fact, while different MI processes activate different areas of the cerebral cortex, making spatial channel information of fundamental importance for MI classification tasks [73, 74], resting-state signals reflect spontaneous brain activity, for which there is no established spatial correlate, even when investigating the cognitive decline associated with AD [75].

Consequently, our model aims to exploit MHA to understand if temporal dependencies of the EEG sequences can highlight discriminative patterns among HC, SCD and MCI subjects. The MHA layer is included in an encoder block, which combines it with a feed-forward module, a normalization layer and dropout. The encoder block is replicated a number of times specified by the depth parameter, which was set to 2, whereas the number of heads was set to 3. It is worth noting that this configuration is of low complexity and reduced computational cost since it requires fewer parameters than traditional CNNs and RNNs. A graphical representation of the implemented transformer model is shown in figure 5 with reference to SCD vs MCI classification.

Figure 5. Proposed transformer architecture. $\mathrm{CLS}$ is the classification token, h = 3 is the number of heads used by multi-head attention and $\mathrm{depth} = 2$ indicates the number of times the transformer encoder block is repeated. A legend for uncaptioned blocks is provided in the bottom right corner.


Similarly to the original transformer architecture, the proposed model also needs information on the position of the inputs in the time series. Song et al achieve this by using a convolutional layer on the time dimension before compression, rather than positional encodings as in the original model [71]. Instead, we use a convolutional layer to embed the channels' information, compressing it to a single-channel representation, and to extract slices from the EEG sequences, as shown in figure 4. Then, we encode the positions of all slices in the sequence, and the vector of positions is added linearly to the input. Furthermore, we prepend an extra learnable classification token to each input sequence, which is used to predict the final class after being updated by attention, as in ViT [63]. Compared to the original S3T model, this position encoding method requires fewer parameters and avoids the use of an additional convolutional layer, which would increase the complexity of the model. After global average pooling, a classification head composed of a fully-connected layer preceded by layer normalization is used to classify the new representation of the input.
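A hedged PyTorch reconstruction of the described architecture is sketched below. It follows the text and figure 4 where values are stated (19 channels, 5120 samples, kernel size 31, embedding dimension 6, 3 heads, depth 2), but the convolution stride, dropout rate and feed-forward width are assumptions, and the code is not the authors' implementation.

```python
# Hedged sketch of the proposed EEG transformer; hyperparameters without a
# stated value in the text (stride, dropout, feed-forward width) are assumed.
import torch
import torch.nn as nn

class EEGTransformer(nn.Module):
    def __init__(self, n_channels=19, n_samples=5120, emb=6, kernel=31,
                 stride=31, heads=3, depth=2, n_classes=2):
        super().__init__()
        # one convolution compresses the channel dimension and embeds temporal slices
        self.embed = nn.Conv1d(n_channels, emb, kernel_size=kernel, stride=stride)
        n_slices = (n_samples - kernel) // stride + 1
        self.cls = nn.Parameter(torch.zeros(1, 1, emb))           # classification token
        self.pos = nn.Parameter(torch.zeros(1, n_slices + 1, emb))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model=emb, nhead=heads,
                                           dim_feedforward=4 * emb,
                                           dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Sequential(nn.LayerNorm(emb), nn.Linear(emb, n_classes))

    def forward(self, x):                       # x: (batch, 19, 5120)
        z = self.embed(x).transpose(1, 2)       # (batch, n_slices, emb)
        z = torch.cat([self.cls.expand(z.size(0), -1, -1), z], dim=1) + self.pos
        z = self.encoder(z)
        return self.head(z.mean(dim=1))         # global average pooling + linear head

model = EEGTransformer()                        # n_classes=3 for the multiclass task
print(model(torch.randn(8, 19, 5120)).shape)    # torch.Size([8, 2])
```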

4.3. Experimental details

After preprocessing, LOSOCV was performed on each dataset, namely the all-band dataset and the delta, theta, alpha and beta datasets, meaning that in each fold all subjects except one were used to train the model and the remaining subject was used to test it. This cross-validation strategy is the most widely used across studies that employ rsEEG for AD diagnosis and progression analysis [33].
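Leave-one-subject-out cross-validation can be expressed with scikit-learn's LeaveOneGroupOut, using the subject identifier of each epoch as the group. The sketch below uses placeholder data and is not the authors' code.

```python
# Hedged sketch of leave-one-subject-out cross-validation over epochs.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.random.randn(300, 19, 5120)              # epochs (placeholder data)
y = np.random.randint(0, 2, size=300)           # epoch labels, e.g. SCD = 0, MCI = 1
subjects = np.random.randint(0, 30, size=300)   # subject id of each epoch

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    pass  # train on train_idx epochs, test on the epochs of the held-out subject
```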

We split the EEG signal of each subject into epochs of 10 s, meaning our models were trained on windows of N = 10 s · 512 Hz = 5120 data points. Each epoch was associated with the label of the corresponding subject. Since the duration of the recording was different for each subject, the number of epochs generated per subject was variable. However, in order to improve the learning capabilities of the model, the number of EEG epochs of the majority classes, i.e. SCD in the two-way and both SCD and MCI in the multiclass classification, was reduced by random undersampling to match the number of epochs of the minority class in the training set.
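A minimal sketch of the epoching and class-balancing steps is given below, under the assumption that balancing is performed by randomly discarding epochs of the larger class(es) in the training set; it is illustrative and not the authors' code.

```python
# Sketch of 10 s epoching and random undersampling of the majority class(es).
import numpy as np

def make_epochs(signal, fs=512, win_s=10):
    """signal: (channels, samples) -> (n_epochs, channels, win_s * fs)."""
    win = win_s * fs
    n_epochs = signal.shape[1] // win
    return signal[:, :n_epochs * win].reshape(signal.shape[0], n_epochs, win).transpose(1, 0, 2)

def undersample(X, y, rng=np.random.default_rng(0)):
    """Randomly drop majority-class epochs so every class has the same count."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([rng.choice(np.where(y == c)[0], n_min, replace=False)
                           for c in classes])
    return X[keep], y[keep]
```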

Furthermore, all the epochs were normalized using z-score normalization, which has been shown to be an optimal normalization technique for enabling models to classify data across an inter-subject population [76].
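A minimal sketch of per-epoch z-score normalization is shown below; whether standardization is applied per channel within each epoch is an assumption made here for illustration.

```python
# Sketch of z-score normalization, assumed here to be per channel and per epoch.
import numpy as np

def zscore_epochs(X, eps=1e-8):
    """X: (n_epochs, channels, samples) -> standardized copy of X."""
    mean = X.mean(axis=-1, keepdims=True)
    std = X.std(axis=-1, keepdims=True)
    return (X - mean) / (std + eps)
```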
