High-Order Temporal Convolutional Network for Improving Classification Performance of SSVEP-EEG

Modern brain-computer interfaces (BCIs) have developed into an important branch of brain science by connecting the nervous system with external artificial systems to assist, enhance, and repair human sensory-motor functions or to upgrade human-computer interaction [1]. During operation, users do not need to perform any muscle activity; control commands are generated solely by interpreting the intrinsic meaning of the brain's electrical signals. Among these signals, electroencephalography (EEG) is one of the most common input modalities owing to its non-invasiveness and high temporal resolution [2]. Steady-state visual evoked potential (SSVEP) stands out among the many EEG paradigms because of its advantages in information transfer rate (ITR) and training time.

SSVEP refers to the potential changes that occur in the visual cortex or the posterior occipital region when the eyes are exposed to visual stimuli flickering at fixed frequencies. In the SSVEP-BCI experimental paradigm, each target is presented on the display device with a unique flickering frequency, and the system achieves precise control of external devices by parsing the SSVEP signal elicited when the user focuses on a specific target. Flicker stimulation frequencies can be divided into a low frequency band (4–12 Hz), a medium frequency band (12–30 Hz), and a high frequency band (above 30 Hz) [3], [4]. Although the experimental procedure is more comfortable in the high frequency band, the corresponding SSVEP signals are harder to identify [5]. The quality of the SSVEP signals obtained is higher in the low and medium frequency bands, although these bands increase the risk of visual fatigue or epilepsy [3]. Therefore, to achieve superior SSVEP-BCI performance, most current studies have focused on visual stimuli in the low and medium frequency bands [6], [7], [8].

As with any human-computer interaction system, the goal of SSVEP-BCI is to trade off signal duration against recognition accuracy to achieve the maximum ITR. Previous studies have categorized feature extraction and classification algorithms into two types: machine learning (ML) based and deep learning (DL) based methods. ML algorithms consist of manually selecting features and constructing classifiers, of which the most commonly used method is canonical correlation analysis (CCA) [9]. CCA realizes classification by capturing the correlation between EEG data and the sinusoidal reference templates corresponding to the stimulus frequencies. Based on this principle, Taejun et al. [10] proposed a novel adaptive time window method, namely filter bank canonical correlation analysis (FBCCA-DW) based on analysis of covariance (ANCOVA). This method dynamically adjusts the time window length to obtain the highest SSVEP classification performance; FBCCA-DW achieved an ITR of 146.81 bits/min on the first public dataset using 1.53 s of data and 119.01 bits/min on the second dataset using 0.65 s of data. To address the low accuracy of SSVEP recognition over short time windows, Sun et al. [11] proposed a temporal-domain enhancement method based on time-weighted CCA (TWCCA). This method integrates all features rather than targeting a specific number of targets, and thus does not require excessive calibration data; compared with traditional CCA, TWCCA achieved a 3.86% improvement in classification accuracy. Isler et al. [12], [13], [14], [15] proposed prioritizing the energy, entropy, and variance features computed from the Haar mother wavelet when applying the wavelet transform. By using these feature vectors individually or in combination, together with other nonlinear parameters exhibiting significant statistical characteristics, superior results can be achieved in various classification tasks. Xu et al. [16] enhanced the SSVEP characteristics and reduced the dependence on sample size by refining the CST method, which makes full use of supervised learning information from training samples and unsupervised learning information from test samples. CST was tested in a real-time SSVEP-BCI system, achieving an online ITR of 236.19 bits/min using training samples that required only 36 s of calibration time per category. However, ML-based methods perform poorly on signals containing many artifacts, and manual feature selection depends heavily on the knowledge and experience of the researchers [17].
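To make the CCA principle above concrete, the following is a minimal sketch of CCA-based frequency recognition against sinusoidal reference templates. The stimulus frequencies, sampling rate, number of harmonics, and data shapes are illustrative assumptions rather than settings taken from the cited studies.

```python
# Minimal sketch of CCA-based SSVEP frequency recognition (illustrative only).
import numpy as np
from sklearn.cross_decomposition import CCA

def reference_signals(freq, n_harmonics, fs, n_samples):
    """Build the sinusoidal reference template for one stimulus frequency."""
    t = np.arange(n_samples) / fs
    refs = []
    for h in range(1, n_harmonics + 1):
        refs.append(np.sin(2 * np.pi * h * freq * t))
        refs.append(np.cos(2 * np.pi * h * freq * t))
    return np.stack(refs, axis=1)                 # (n_samples, 2 * n_harmonics)

def cca_classify(eeg, stim_freqs, fs, n_harmonics=3):
    """eeg: (n_samples, n_channels). Return the index of the most correlated frequency."""
    scores = []
    for f in stim_freqs:
        ref = reference_signals(f, n_harmonics, fs, eeg.shape[0])
        cca = CCA(n_components=1)
        x_c, y_c = cca.fit_transform(eeg, ref)    # first pair of canonical variates
        scores.append(np.corrcoef(x_c[:, 0], y_c[:, 0])[0, 1])
    return int(np.argmax(scores))

# Example usage with random data: 8 channels, 1 s at 250 Hz, 4 assumed stimulus frequencies.
eeg = np.random.randn(250, 8)
print(cca_classify(eeg, stim_freqs=[8.0, 10.0, 12.0, 15.0], fs=250))
```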

By contrast, DL has attracted wide attention in the BCI field owing to its powerful nonlinear computation and automatic feature extraction capabilities. DL methods can automatically learn multi-dimensional features from the raw EEG signal without the limitations of manual feature engineering [18]. Li et al. [19] proposed a convolutional correlation analysis DL model (Conv-CA), drawing on the concept of the correlation coefficient from ML. The method extracts features from the EEG signal and 3D reference signals through two parallel convolutional neural network (CNN) branches, then computes the correlation coefficients between the two sets of features in a correlation layer, and finally uses a fully connected layer to perform classification. This methodology provided a fresh idea for the construction of subsequent classification models and achieved highly competitive results. Zhao et al. [20] proposed a filter bank CNN (FBCNN), which covers multiple harmonics using filters in different frequency bands to obtain correlation information. FBCNN performed well on public datasets and verified that harmonic correlation enhances classification. Chen et al. [21] first introduced the popular Transformer architecture to SSVEP classification tasks and constructed a filter bank-based SSVEPformer (FB-SSVEPformer). The method uses the complex spectrum of the SSVEP data as input, making the model focus on extracting frequency domain features. FB-SSVEPformer achieved classification accuracies of 88.37% and 83.19%, and ITRs of 145.98 bits/min and 155.55 bits/min on two public datasets, respectively.
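As a simplified illustration of the correlation-layer idea (not the original Conv-CA implementation), the sketch below computes Pearson correlations between a trial's learned feature vector and per-class reference features; all tensor shapes and the 40-class setting are assumptions chosen for illustration.

```python
# Simplified illustration of a correlation layer for SSVEP classification (illustrative only).
import torch

def correlation_layer(eeg_feat, ref_feat, eps=1e-8):
    """
    eeg_feat: (batch, time)             one learned feature vector per trial
    ref_feat: (batch, n_classes, time)  one learned reference vector per class
    Returns Pearson correlation of each trial with each class, shape (batch, n_classes);
    these scores would feed a final fully connected layer.
    """
    x = eeg_feat.unsqueeze(1)                              # (batch, 1, time)
    x = x - x.mean(dim=-1, keepdim=True)
    r = ref_feat - ref_feat.mean(dim=-1, keepdim=True)
    num = (x * r).sum(dim=-1)
    den = x.norm(dim=-1) * r.norm(dim=-1) + eps
    return num / den

# Example: batch of 4 trials, 250 time samples, 40 assumed stimulus classes.
scores = correlation_layer(torch.randn(4, 250), torch.randn(4, 40, 250))
print(scores.shape)  # torch.Size([4, 40])
```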

SSVEP is a bioelectric signal that is continuous in the temporal domain, and its main characteristic is that it is driven by the harmonics of the input flicker frequencies. Therefore, it has a significant feature representation in the temporal domain. However, previous studies have been clearly inadequate at extracting features in the temporal domain. To begin with, during SSVEP acquisition, users are usually focused during one time period and relatively distracted during others, and the period of focus varies from user to user. Therefore, to achieve a high-performance SSVEP decoding method, it is necessary to enhance the characteristics of the time period in which the user's attention is focused while reducing the weights of the other time periods. On the other hand, conventional CNNs scale poorly to large receptive fields, and capturing dependencies between distant sampling points in EEG has proven difficult [22]. In summary, it remains a pressing challenge to efficiently exploit attention-weighted local temporal features within a fixed time window and to fully extract global features by enlarging the receptive field of the computation.
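The receptive-field limitation can be quantified with a short calculation; the kernel size and depths below are assumed values chosen only to illustrate how dilation changes the scaling behavior, not settings used in this work.

```python
# Illustrative receptive-field arithmetic: plain stacked convolutions grow the
# receptive field linearly with depth, while dilations of 1, 2, 4, ... grow it
# exponentially, which is what makes long-range temporal dependencies reachable.
def receptive_field(kernel_size, dilations):
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

depths = range(1, 6)
plain   = [receptive_field(3, [1] * n) for n in depths]                      # 3, 5, 7, 9, 11
dilated = [receptive_field(3, [2 ** i for i in range(n)]) for n in depths]   # 3, 7, 15, 31, 63
print(plain, dilated)
```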

To address the above problems, we propose a high-order temporal convolutional neural network (HOT-CNN), comprising a time-slice attention (TSA) module, a temporal convolutional network (TCN) module, and a feature fusion and classification module. The TSA module splits the raw SSVEP-EEG signals into multiple equal-length time slices along the temporal dimension and assigns higher weights to the SSVEP-correlated time series through an attention mechanism. In this way, the trend of the SSVEP-EEG can be represented by the changes of consecutive sampling points within one time slice. The TCN module uses dilated causal convolution to expand the receptive field of the network and extract complex patterns and features of the EEG signals at the global level, thereby improving the feature extraction capability. It also adopts a novel backpropagation path design, which effectively avoids gradient explosion and vanishing. Together, the TSA and TCN modules reduce noise in the temporal domain features and create a novel compact feature representation of the SSVEP-EEG signals, which improves decoding accuracy and ITR.
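For readers unfamiliar with dilated causal convolution, the following PyTorch sketch shows a generic TCN residual block of the kind such a TCN module builds on; it is not the exact HOT-CNN architecture, and all layer sizes, kernel sizes, and dilation rates are assumptions.

```python
# Generic TCN residual block with dilated causal convolution (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1-D convolution padded on the left so outputs never see future samples."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                       # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))

class TCNBlock(nn.Module):
    """Two dilated causal convolutions plus a residual connection; the short
    residual path is what keeps gradients from exploding or vanishing."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            CausalConv1d(in_ch, out_ch, kernel_size, dilation), nn.ReLU(), nn.Dropout(dropout),
            CausalConv1d(out_ch, out_ch, kernel_size, dilation), nn.ReLU(), nn.Dropout(dropout),
        )
        self.downsample = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return torch.relu(self.net(x) + self.downsample(x))

# Example: 8-channel EEG, 250 samples, three blocks with dilations 1, 2, 4 (assumed values).
x = torch.randn(4, 8, 250)
blocks = nn.Sequential(TCNBlock(8, 16, dilation=1),
                       TCNBlock(16, 16, dilation=2),
                       TCNBlock(16, 16, dilation=4))
print(blocks(x).shape)  # torch.Size([4, 16, 250])
```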
