A Novel Real-time Phase Prediction Network in EEG Rhythm

Phase Prediction Frameworks

Optimizing the TMS strategy based on the instantaneous EEG phase is a hot area in closed-loop neuromodulation, which needs to be balanced between accuracy and speed. In real-time situations, only narrow-band EEG signals can be acquired before stimulation; this leads to the instantaneous phase being distorted by the causal filter. There are several models for restoring the instantaneous EEG phase.

Optimized Multi-layer Filter Architecture (MLOF)

The key feature of this method is to combine two fully connected (FC) layers, one of which uses a Softmax operation, via a dot-multiplying operation to capture multiple frequency representations and be able to gate between them. The machine learning architecture is designed to make a causal mapping from raw EEG data to the instantaneous phase. A detailed description can be found in [33].

Auto-regress Model (AR)

Since the edge of a narrow-band EEG signal is often distorted by a causal filter, the AR model regresses out the points so that the phase information can be subsequently calculated by the Hilbert transform in a single filtered EEG data trial [27, 38].

The EEG rhythm data can be fitted using Equation (1):

$$ X_ = a_ X_ + a_ X_ + \ldots + a_ X_ + \varepsilon_ $$

(1)

where \(X_\) is the filtered EEG rhythm data at time t, \(p\) is the order, \(\varepsilon_\) is the error at time t, and \(a\) is the factor of the AR model. The factor \(a\) is usually calculated using the Yule-Walker method and is updated every time new data are collected. The order p is usually set empirically.

Educated Temporal Prediction (ETP)

The distorted instantaneous phase can be linearly interpolated by using the points that are unaffected or little affected by the causal filter in the narrow EEG signal. For example, ETP calculates the frequency cycle of the individual alpha frequency (IAF) specific to each subject based on the pre-acquired resting EEG data [34]. Then the peaks are extracted from the filtered narrow-band rhythm and treated as the 0-degree Hilbert phase. The mean absolute phase bias of the peaks to the ground truth phases is then calibrated through statistical learning and added into the IAF cycle. Finally, the target phrase of interest can be obtained by interpolation based on the nearest peak location according to Equation (2).

$$ T_}^ }} = \frac}}} }}}} $$

(2)

where \(T_}^ }}\) is the new value of the individual alpha frequency cycle and is added to the last detected alpha peak. \(T_}}}\) is the IAF cycle learned and bias-adjusted from the pre-recorded resting EEG data. \(\theta \) is the target phase frequency for TMS stimulation.

EEG Phase Prediction Network (EPN)

EEG is non-stationary, so approximation to a steady-state signal introduces prediction errors, and a cross-trial approach is needed to learn the rhythmic characteristics of narrow-band EEG signals. We thought that combining conventional filtering with machine learning could significantly reduce the phase prediction variance and increase the prediction accuracy. Therefore, we developed the EPN model, which can detect the non-linear EEG features and directly predict the instantaneous EEG phase.

As shown in Fig. 1D, the EPN consists of five layers: Input layer, Average pooling layer, Fully connected layer, Dropout layer, and Output layer.

1. Input Layer

This layer receives the input EEG data with a sampling window of 250 (500 ms at a sampling rate of 500 Hz). Its function is to pass the input data to the subsequent layers for processing.

2. Average Pooling Layer (Avg Pool)

The average pooling layer function is similar to a moving average filter for time series data, suppressing noise and enhancing the robustness of the model. For a given EEG sequence \( x\), the output of the average pooling layer is:

$$ A\left( n \right) = \frac \mathop \sum \limits_ = n}}^ x\left( i \right) $$

(3)

where \(s\) is the kernel size (here, we set \(s\) to 13), \(A\left( n \right)\) represents the output of the Avg Pool at EEG sampling index \(n\), which is the mean value of the EEG signal in the window from \(n \,}o\,n + s\). To ensure that the number of EEG data remained unchanged after average pooling, we used the repeat method to pad the input signal.

3. Fully Connected Layer (FC)

Following the Avg Pool layer, the output features were further processed and expanded into higher-dimensional features by FC-1, with an output dimension of 500. To enhance the model’s non-linear learning capability, we applied the ReLU non-linear activation function to the output of FC-1.

$$ F\left( n \right) = \mathop \sum \limits_ = 1}}^ W\left[ i \right] \times A\left[ i \right] + B\left[ i \right] $$

(4)

where \(F\) represents the output of FC-1, \(A\) is the output of Avg Pool, \(W\) is the weight of FC-1, and \(B\) is the bias coefficients of FC-1. The bias coefficients and weights of the fully connected layer were both obtained through training.

4. Dropout Layer

To prevent overfitting, we introduced a dropout layer where neurons were randomly dropped with a probability of \(P = 0.9\). Dropout could also be used as a method to introduce noise to the model, which helps to enhance the model’s generalizability by preventing it from relying too heavily on specific local features.

5. Output Layer

This fully connected layer serves as the output layer with a dimension of 2, mapping the features to the complex domain and providing a complex expression of μ rhythm signals \(Z\).

$$ Z\left( n \right) = \mathop \sum \limits_ = 1}}^ U\left[ i \right] \times C\left[ i \right] + D\left[ i \right] $$

(5)

\(Z\) represents the output of the EPN, \(C\) is the output of the dropout layer, \(U\) is the weight of the output layer, and \(D\) is the bias coefficients of the output layer. The bias coefficients and weights of the output layer were also learned through training.

Dataset Generation and ProcedurePre-recorded EEG Dataset

We studied existing data from twenty subjects (6 female) who received single-pulse TMS of the motor-hand area. Before the experiment, all the subjects provided written informed consent, which was approved by the National Institutes of Health Combined Neuroscience Section Institutional Review Board. The subjects were not intentionally preselected for μ rhythm spectral features in their EEG signals, as was done in another study [39, 40]. The electrophysiological signals and related navigation data are published in a database at https://openneuro.org/datasets/ds002094 created by Hussain et al. [29]. The scalp EEG signal was recorded at a 5 kHz sampling rate by a 32-channel TMS-compatible amplifier (BrainAmp MR+, Brain Vision, Gilching, Germany). At the beginning of the TMS, an additional 5-min resting-state EEG was recorded. This was used to train and test different phase prediction frameworks.

To generate the EPN, MLOF, and AR training datasets shown in Fig. 2A, we divided each subject’s 5-min resting data into a 4-min training and validation set and a 1-min test set and then resampled to 500 Hz followed by a 5-point C4-centered sum-difference Hjorth-Laplace transform as a spatial filter to capture the more local μ rhythm information [41]. The training and testing datasets were then separately segmented into windows 6.16 s long (3,080 sampling points) with a series of overlapping window lengths: 0 s (no overlap), 0.128 s (64 sampling points), 0.256 s (128 sampling points), 0.384 s (192 sampling points), 0.512 s (256 sampling points), 0.640 s (320 sampling points), 0.768 s (384 sampling points), 0.896 s (448 sampling points), 1.024 s (512 sampling points), 1.152 s (576 sampling points), and 1.280 s (640 sampling points). The ground truth of the μ rhythm was defined as the raw EEG data after applying a two-pass zero-phase finite impulse response (FIR) bandpass filter (8 Hz–13 Hz bandpass, 769 order). As Fig. 2B shows, 0 s is the center of a 6.16 s interval, which is treated as the onset event for the stimulation, the non-causally filtered data between −1.54 s and 1.54 s is not affected by the filter, and the Hilbert phase of the filtered data at time 0 s is defined as the phase ground truth. The instantaneous phase at the simulation onset at time 0 s was defined as the output label of the three models. The backward difference \(y\left( n \right) = x\left( \right) - x\left( n \right)\) of the raw data from −0.5 s to 0 s, which was used to remove large electrode drifts and facilitate machine learning training speed [31], was defined as the inputs to the EPN and MLOF models, where \(x\left( n \right)\) was the raw EEG data at index \(n\). The raw data from −1.0 s to 0 s before stimulation onset was defined as the input to the AR model [27].

Fig. 2figure 2

The generation of datasets and illustration of the instantaneous phase definition for the EPN, MLOF, and AR models. A The pipeline of generating training and testing datasets. The 5-min resting data is divided into a 4-min resting set for training and validation and a 1-min resting dataset. The details of dataset generation are described in the Dataset Generation and Procedure section. B Definition of the instantaneous phase label. The instantaneous phase (stimulation onset) is defined as the center of the EEG signal in the 6.16 s window after non-causal filtering and Hilbert transformation.

For the ETP model, each subject’s 5-min resting data was divided into a 4-min training and a 1-min test set and then resampled to 1000 Hz followed by a 5-point C4-centered sum-difference Hjorth-Laplace transform. The μ rhythm cycle of the frequency that is specific to each subject was determined by the first 4-min resting data, as detailed in [34]. Like the EPN, MLOF, and AR models, the next 1-min test set was also segmented by a window length of 3.08 s (3,080 sampling points) with a series of overlapping window lengths: 0 s (no overlap), 0.64 s (64 sampling points), 0.128 s (128 sampling points), 0.192 s (192 sampling points), 0.256 s (256 sampling points), 0.32 s (320 sampling points), 0.384 s (384 sampling points), 0.448 s (448 sampling points), 0.512 s (512 sampling points), 0.576 s (576 sampling points), and 0.64 s (640 sampling points). The filter order and ground truth definition were the same as in the EPN, MLOF, ETP, and AR models. The prediction phase of the ETP at time 0 s was interpolated using Equation (2).

Given that the EPN and MLOF models require appropriate training for phase prediction, we implemented and trained these models using the PyTorch framework. We applied the Adam optimizer with an initial learning rate of 10−5 to optimize the model’s hyperparameters. The mean squared error function was used as the loss function, and the models were trained for 3,000 epochs with a batch size of 512.

Simulated EEG Dataset

The ground truth phase of the μ rhythm is often defined as the Hilbert phase of non-causal filtering. However, because of EEG \(1/f\) background neural noise, the non-causal phase estimate still has limitations with respect to obtaining the phase of the actual rhythm. To verify the performance of the EPN and other models with respect to the underlying real truth, we used the fitting oscillations & one over f (FOOOF) toolbox [42] to fit the background noise and the Kuramoto model [43] to model the EEG signal. The real truth phase was defined as the non-causally recovered phase after applying a Hilbert-transform, prior to the addition of noise. A detailed description can be found in [33].

Real-time Phase Prediction ProcedureHuman Subjects

Six healthy male subjects signed an informed consent document and participated in this study, which was approved by the Ethical Committee of the Institute of Automation, Chinese Academy of Sciences. They were asked to get a good rest and not to use any psychotropic substances or coffee before the experiment. Due to technical problems, one participant was not included in the subsequent analyses.

Experimental Protocol

The subjects were instructed to sit quietly with their eyes open during the recording. Initially, 5 min of resting EEG data was collected. We tested the performance of real-time μ rhythm phase prediction in four models (EPN, MLOF, AR, and ETP) at the rising edge (0°), peak (90°), falling edge (180°), and trough (270°) with 100 replications collected in each of the four sessions. The order of the algorithms in each session and the target phase selection were shuffled to obtain random order. There was a 5-min break after each session.

EEG Recording

EEG data were recorded with a 16-channel EEG amplifier (BRE-100, CASIA Brain, Beijing, China) and a 32-channel EEG cap (GT Cap PRO, Greentech, Wuhan, China). FC1, FC5, CP1, and CP5 electrodes around the C3 electrode were attached to the head (the FCz electrode was the reference and AFz was the ground electrode), and the impedance of each electrode was kept below 10 kΩ. The sampling rate was 1,000 Hz and the collected data were sent to a computer by USB cable.

Real-time Signal Processing and Model Deployment

The AR and ETP were implemented in custom MATLAB scripts (https://github.com/OpitzLab/CL-phase), and the EPN and MLOF models were trained by custom Python scripts and exported to an Open Neural Network Exchange file to share and deploy the models (https://github.com/onnx). The real-time EEG data were streamed to a processing computer (Z620, Hewlett-Packard, California, USA) by Lab Streaming Layer (https://github.com/sccn/labstreaminglayer). To reduce the computational latency of the EPN and MLOF machine learning models, we used Qt C++ (https://github.com/qt/qt5) and the CUDA Deep Neural Network library (https://github.com/NVIDIA/cudnn-frontend) to deploy these models into graphics processing unit (Geforce RTX 2060, Nvidia, California, USA) for real-time computation. Trigger signals were sent to the TMS machine (Super Rapid 2, Magstim, Carmarthenshire, UK) via the PCI parallel port (MCS9865, MosChip Semiconductor Technology, Telangana, India) which took only 1 ms–2 ms, and the latency was stable over time. We focused on the model performance of the phase prediction rather than on the output of the TMS stimulation, so the TMS machine was shut down and only the trigger time was labeled on the data stream. As a result, the real-time EEG data was not distorted, which helped us to build the non-causal toolchain and restore the ground truth of the μ rhythm [34].

Evaluation

To obtain a comprehensive measure of each model’s performance, we applied the circular mean (MEAN), circular variance (VAR), standard circular deviation (SD), mean absolute circular error (MACE) [33, 44], and accuracy (ACC) [34]:

$$ } = }\left( \mathop \sum \limits_^ }^}}} - i\theta_}}} }} } \right) $$

(6)

$$ } = 1 - \left| \mathop \sum \limits_^ }^}}} - i\theta_}}} }} } \right| $$

(7)

$$ } = \frac\mathop \sum \limits_^ \left| }\left( }^}}} - i\theta_}}} }} } \right)} \right| $$

(9)

$$ } = 1 - \frac}^ \left( ^ }^}}} - i\theta_}}} }} } \right|} \right) $$

(10)

where N is the EEG block number, \(\theta_}}}\) is the prediction phase in radians, and \(\theta_}}}\) is the true phase in radians. The closer the value of MEAN is to 0, the smaller the phase prediction deviation. The smaller the SD, the less the spread of the measured distribution. The closer the accuracy approaches 1, the more accurate the phase prediction becomes.

To compare the performance of the four models, we applied the Wilcoxon signed-rank test and the Bonferroni correction for multiple comparisons. We further compared the effect of the signal-to-noise ratio (SNR) on the prediction accuracy of the different models. We performed the following steps: (1) The power spectral density (PSD) was first estimated by Welch’s method over 1-s epoched data with a Hann window. (2) The SNR was calculated as the average PSD from 8 Hz to 13 Hz divided by the total PSD (2 Hz–45 Hz). (3) A linear regression model with accuracy was used as the dependent variable, and SNR was the independent variable.

Data Availability

All relevant data and computation codes are available from the corresponding authors upon request.

留言 (0)

沒有登入
gif