An Approach to Emotion Recognition Using Brain Rhythm Sequencing and Asymmetric Features

In this work, the proposed methodology was implemented in MATLAB R2021a, and all results on the various test sets were obtained from these computations. The performances of seven similarity measure methods combined with three classifiers are then discussed to identify the most appropriate method for measuring the similarity levels of rhythm sequences and the most suitable classifier for the asymmetric features. In addition, the representative symmetrical channels used for recognizing specific emotional factors are analyzed based on the optimal features found. Because the conditions differ between DEAP and MER, the results and discussion are separated into two subsections. Finally, a performance comparison with existing works that consider symmetrical spatial features is carried out.

Results From the DEAP Dataset

The average classification accuracies of the asymmetric features extracted by the seven similarity measure methods are presented in Tables 2, 3, and 4, respectively, in which the first column lists the methods and the remaining columns display the accuracies on set-A and set-V using the asymmetric features extracted from sequence data over the first 30 s (F30 s), last 30 s (L30 s), and full 60 s (A60 s). To calculate the classification accuracy of each subject, 40 experimental trials are classified; the results thus come from 40 simulation runs per subject and are then averaged over 32 subjects. The best result for each case is underlined.
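The aggregation described above (per-subject accuracy over 40 trials, then mean ± standard deviation over 32 subjects) can be sketched as follows. The original work used MATLAB; this is a minimal Python equivalent with hypothetical trial outcomes, not the authors' code.

```python
from statistics import mean, stdev

def subject_accuracy(trial_correct):
    """Fraction of a subject's trials (40 in DEAP) classified correctly."""
    return sum(trial_correct) / len(trial_correct)

def dataset_summary(all_subjects):
    """Mean and sample standard deviation of per-subject accuracies, in %."""
    accs = [100 * subject_accuracy(t) for t in all_subjects]
    return mean(accs), stdev(accs)

# Hypothetical per-trial correctness flags (1 = correct) for three subjects
subjects = [[1] * 30 + [0] * 10, [1] * 40, [1] * 20 + [0] * 20]
m, s = dataset_summary(subjects)  # per-subject accuracies: 75%, 100%, 50%
```

The table entries "mean ± standard deviation %" in Tables 2–4 correspond to the pair returned by `dataset_summary`.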

Table 2 The average classification accuracies (mean ± standard deviation %) of the asymmetric features using k-NN, DEAP dataset

Table 3 The average classification accuracies (mean ± standard deviation %) of the asymmetric features using SVM, DEAP dataset

Table 4 The average classification accuracies (mean ± standard deviation %) of the asymmetric features using LDA, DEAP dataset

Meanwhile, for illustration, Fig. 7 depicts a comparative histogram of the average accuracies of the three classifiers with different similarity measure methods on set-A of the DEAP dataset, where the data source is the L30 s period. As observed, the performance of SVM is similar to that of LDA, while k-NN yields better results; similar trends are found in the other scenarios. The main reason may lie in the properties of the classifiers. Both SVM and LDA classify by constructing a hyperplane that separates the training set between the two classes, with SVM placing its hyperplane at the margin-maximizing frontier. In contrast, k-NN classifies the testing data through the cluster of known neighbors (i.e., the training set) around them. These results reveal that the distribution of the asymmetric features fits k-NN better, indicating that k-NN is the more suitable classifier for training and testing the asymmetric features in this work.
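The neighborhood-vote mechanism that distinguishes k-NN from the hyperplane-based SVM and LDA can be illustrated with a minimal sketch. This is a generic majority-vote k-NN on hypothetical two-class feature vectors, not the classifier configuration used in the paper.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples
    (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(range(len(train_X)), key=lambda i: sqdist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical asymmetric-feature vectors from two arousal classes
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["low", "low", "low", "high", "high", "high"]
label = knn_predict(train_X, train_y, (0.5, 0.5))  # votes come from the "low" cluster
```

Because the prediction depends only on the local cluster around the test point, k-NN copes well with feature distributions that no single hyperplane separates cleanly.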

Fig. 7

Average classification accuracies (set-A_L30 s, DEAP) using the asymmetric features extracted from different similarity measure methods with three classifiers (k-NN, SVM, and LDA)

In addition, when using the same classifier, the performances of the different similarity measure methods are close, with only slight variations. This indicates that there are no substantial differences among the similarity measure methods for the asymmetric features. The main reason may be that the sequences consist of only five brain rhythms and have a length of either 150 (i.e., 30 s) or 300 (i.e., 60 s), so each can be viewed as a short string. Although some of the investigated methods are distance-based and others are shape-based, they do not produce markedly different similarity levels between such short strings. Among them, DTW provides approximately 1–2% higher accuracy than the others. Therefore, DTW is recommended as the similarity measure method for extracting the asymmetric features in this work.
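A DTW distance between two short rhythm-label strings can be computed with the standard dynamic-programming recurrence. This is a textbook sketch assuming unit cost for mismatched rhythm labels; the cost function and any windowing constraints used in the paper are not specified here.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two rhythm label sequences.
    Matching labels cost 0, mismatches cost 1; warping allows one label
    in a to align with a stretch of labels in b, and vice versa."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = 0.0 if a[i - 1] == b[j - 1] else 1.0
            D[i][j] = c + min(D[i - 1][j],      # stretch a
                              D[i][j - 1],      # stretch b
                              D[i - 1][j - 1])  # step both
    return D[n][m]

# Toy sequences over the five-rhythm alphabet; a local timing shift
# (the repeated "alpha" vs repeated "beta") is absorbed by warping
left = ["alpha", "alpha", "beta", "theta"]
right = ["alpha", "beta", "beta", "theta"]
d = dtw_distance(left, right)
```

In this toy case the warping path aligns the repeated labels so the distance is zero, which illustrates why DTW can be slightly more forgiving of timing shifts than purely position-wise distance measures.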

Furthermore, the length of the brain rhythm sequence matches the length of the EEG segment, so different lengths are evaluated to investigate the time effect in emotion recognition. Close results are obtained when employing 30-s and 60-s data with the respective classifiers, disclosing that a 30-s period is sufficient to achieve a performance similar to that of 60 s. As a result, the time required for emotion recognition can be reduced from 60 to 30 s, which also removes redundant data on the time scale. More importantly, the L30 s data exhibit slightly better results than the F30 s data, possibly because the later periods contain more emotion-related information than the earlier ones. Similar findings have been reported previously. Kumar et al. [42] compared classification accuracies on DEAP using F30 s and L30 s data and found that the L30 s period is more associated with emotion. In another work, Jatupaiboon et al. [43] assessed the accuracies of arousal and valence using F30 s, L30 s, and A60 s data and reported that the L30 s data yield the best average accuracy. These earlier works support the reasonableness of the results from the proposed methodology.
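The three evaluation windows can be obtained by simple slicing of the full-trial rhythm sequence. This sketch assumes the sequence lengths stated earlier (150 labels per 30 s, 300 per 60 s, i.e., 5 labels per second); the function name is illustrative.

```python
def time_windows(seq):
    """Split a full 60-s rhythm sequence (300 labels at 5 labels/s)
    into the three evaluation windows: first 30 s, last 30 s, all 60 s."""
    half = len(seq) // 2
    return {"F30s": seq[:half], "L30s": seq[half:], "A60s": seq}

# Placeholder sequence standing in for 300 rhythm labels of one trial
seq = list(range(300))
windows = time_windows(seq)  # each 30-s window holds 150 labels
```

Selecting `windows["L30s"]` alone halves the data per trial while, per the discussion above, retaining the most emotion-related portion of the recording.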

The above analysis indicates that DTW is appropriate for the similarity measure, k-NN is suitable as the classifier, and the L30 s period is proper for emotion recognition. Based on these settings, the classification accuracies using the asymmetric features extracted from the various symmetrical channels are evaluated. Figure 8 illustrates the results of subject S3 from DEAP, in which panels a and b depict the accuracies on set-A and set-V, respectively; the deeper the red, the higher the classification accuracy. In Fig. 8, the accuracies of the asymmetric features vary with the emotional factors, even for the same subject. For example, the asymmetric feature of FC1–FC2 yields a remarkable accuracy (95%) on set-A but is not the best (75%) on set-V, while CP1–CP2 is more useful (80%) on set-V. These findings imply that the similarity levels of rhythm sequences between FC1 and FC2 and between CP1 and CP2 are sensitive to variations in arousal and valence, respectively. Consequently, emotion recognition for subject S3 can be achieved directly with the corresponding asymmetric features.
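Picking the representative channel pair per subject and per emotional factor reduces to an argmax over the per-pair accuracies behind a heatmap like Fig. 8. A minimal sketch, using the S3 figures quoted above plus one hypothetical extra pair for illustration:

```python
def best_symmetrical_pair(acc_by_pair):
    """Return the (channel pair, accuracy) with the highest classification
    accuracy for one test set of one subject."""
    return max(acc_by_pair.items(), key=lambda kv: kv[1])

# Accuracies (%) for subject S3; FC1-FC2 and CP1-CP2 values are from the
# text, F3-F4 is a hypothetical filler entry
set_a = {"FC1-FC2": 95, "CP1-CP2": 70, "F3-F4": 80}
set_v = {"FC1-FC2": 75, "CP1-CP2": 80, "F3-F4": 60}
pair_a = best_symmetrical_pair(set_a)  # FC1-FC2 wins on arousal
pair_v = best_symmetrical_pair(set_v)  # CP1-CP2 wins on valence
```

Running this selection separately per test set reproduces the observation that the optimal pair differs between arousal and valence even for the same subject.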

Fig. 8

Classification accuracies using the asymmetric features extracted from various symmetrical channels (subject S3, DEAP). The deeper red indicates a higher classification accuracy: a set-A, b set-V

Further investigations were conducted to determine the performance of the asymmetric features across different subjects on the same test set. Figure 9 shows the accuracies of the asymmetric features for set-A from four subjects (S3, S5, S21, and S25) on DEAP. Although the asymmetric features are extracted and classified in the same way, their performances vary by subject. For instance, the asymmetric feature of FC5–FC6 is effective only for subject S21 (Fig. 9c) and inactive for the others. Such differences imply that emotion recognition exhibits subject-dependent properties, consistent with earlier works [44, 45].

Fig. 9

Classification accuracies of four subjects on set-A of DEAP: a S3, b S5, c S21, and d S25

Taking the highest accuracy, the optimal asymmetric feature is identified for each subject of DEAP. It is of interest to examine the locations of these optimal features. To this end, the statistical percentages over five scalp regions are presented in Table 5, in which the first row denotes the region and the remaining rows display the percentages for the different test sets. The results reveal that the optimal features mainly come from symmetrical channels located in the frontal or parietal regions. Regarding brain function, the frontal region regulates cognitive awareness of stimuli, and the parietal region processes perceptual information from audio and vision; EEG recordings from these regions therefore reflect the reactions to the stimuli. This may be why frontal asymmetry has been commonly used to assess emotions [25, 26]. In addition, Table 5 shows that other regions are also involved in emotion recognition, suggesting that an appropriate solution should consider the representative symmetrical channels per subject rather than a fixed feature for all cases. In this regard, the proposed methodology is valid for obtaining the optimal feature for each test set.
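The regional percentages in Table 5 amount to a tally of each subject's optimal pair against a channel-to-region map. A minimal sketch with hypothetical subjects and a partial map (the full 10-20 assignment is not reproduced here):

```python
from collections import Counter

def region_percentages(optimal_pairs, region_of):
    """Percentage of subjects whose optimal symmetrical pair lies in each
    scalp region, for one test set."""
    counts = Counter(region_of[p] for p in optimal_pairs)
    total = len(optimal_pairs)
    return {r: 100 * c / total for r, c in counts.items()}

# Illustrative channel-pair-to-region map and optimal pairs of four
# hypothetical subjects
region_of = {"FC1-FC2": "frontal", "CP1-CP2": "parietal", "T7-T8": "temporal"}
pcts = region_percentages(["FC1-FC2", "FC1-FC2", "CP1-CP2", "T7-T8"], region_of)
```

Applying this per test set (set-A, set-V, and so on) yields one row of a table in the style of Table 5.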

Table 5 Statistical percentages of optimal asymmetric features based on five scalp regions from 32 subjects of DEAP

Results From the MER Experiment

Using DTW and k-NN, the results from the MER experiment are obtained. Here, the analysis and discussion also consider the optimal asymmetric features found. For this aim, the statistical percentages based on five regions are summarized in Table 6, in which the first row shows the region, and the remaining rows list the percentages of the respective test sets (arousal, valence, liking, familiarity, and understanding).

Table 6 Statistical percentages of optimal asymmetric features based on five scalp regions from 36 subjects of MER

In Table 6, the MER results on set-A and set-V are similar to the DEAP results in Table 5, as the optimal asymmetric features also mainly come from the frontal or parietal regions. Such consistency shows that the proposed methodology can select the representative symmetrical channels under different experimental conditions. Moreover, for the three test sets not investigated in DEAP (set-L, set-F, and set-U), the optimal features are primarily located in the temporal, parietal, and frontal regions, respectively. The temporal region typically processes sound information such as music, so it is reasonable that data from this region can assess the liking feeling when listening to music. As discussed, the parietal region processes perceptual information involving audio; hence, its data can disclose the effect of familiarity evoked by the music. The frontal region controls conscious thought about external stimuli, so its data can help answer whether the lyrics or musical rhythm are understood by the subjects.

In addition, the statistical percentages in Table 6 exhibit individual characteristics. To discuss these further, Fig. 10 illustrates the optimal asymmetric features used for recognizing the five emotional factors for subjects S9 and S14 of MER, where different colors correspond to the different emotional factors. As observed, the locations of the optimal features are adjacent, suggesting that the emotional reaction is a complex process handled by a group of neighboring channels. Moreover, different factors are typically recognized by particular symmetrical channels, and these properties also vary with the subjects. This may imply that there is no general model of emotion elicitation across the different cases. Previously, Lim [46] claimed that emotion is related to the cultures, backgrounds, and experiences of the subjects, so emotion recognition is likely to be subjective, as reflected in the results found here.

Fig. 10

The optimal asymmetric features used for recognizing five emotional factors of two subjects of MER: a S9, b S14

Performance Comparison

A performance comparison with existing works is summarized in Table 7, in which the first column lists the work and the remaining columns show the number of channels applied for emotion recognition, the methodology, and the classification accuracies in the various cases. The best result for each case is underlined.

Table 7 Performance comparison with the existing emotion recognition works

In Table 7, all of these works consider symmetrical spatial features to investigate the DEAP dataset. For example, Wang et al. [19] used the NMI matrix derived from the spectrograms of all pairs of symmetrical channels. Mohammadi et al. [20] applied entropies and PSDs from five pairs of symmetrical channels. Kumar et al. [42] utilized bispectrum analysis of the symmetrical channels FP1–FP2. Islam et al. [47] designed Pearson's correlation coefficient images from all pairs of symmetrical channels. Xing et al. [48] developed a linear mixing model based on the frequency subband power features from all pairs of symmetrical channels. Ahmed et al. [49] proposed a two-dimensional vector consisting of the asymmetry in different brain regions, termed AsMap. Cui et al. [50] exploited the regional asymmetric features located on the left and right hemispheres of the brain. From the comparisons, even though the proposed BRS method does not achieve the best accuracies, it attains impressive results with an asymmetric feature extracted from only one pair of symmetrical channels. In addition, deep learning methods such as neural networks achieve superior accuracy, but their main limitation is the need for a large training dataset, which is why all channel data are usually applied; when the dataset is smaller, it is difficult to train a neural network with outstanding performance. In this regard, the proposed methodology is more suitable for processing smaller datasets because the number of applied channels is comparatively low. This property reflects the trade-off between classification accuracy and the number of channels; accordingly, different approaches suit different emotion recognition conditions.

Moreover, this work obtains superior results in the MER experiment, whereas most existing works did not include self-designed experiments. This comparison demonstrates that the proposed methodology performs stably on both the public dataset and the experimental data, indicating that it is reliable in different scenarios. The simulation conditions in this work are a central processing unit (CPU) Intel Core i5-10505 @ 3.20 GHz, 8 GB of random access memory (RAM), and a 1-TB hard disk drive at 7500 revolutions per minute. With this setup, sequencing takes approximately 18 s for a 30-s EEG segment and approximately 52 s for a 60-s segment. After that, for each subject, extracting the asymmetric features from the sequences of the different channels with the seven similarity measure methods takes approximately 49 s. Finally, classification by k-NN, SVM, and LDA, including the training and testing periods, takes approximately 31 s. The settings of DTW, k-NN, and L30 s therefore simplify the whole classification process. Note that the proposed methodology has no strict memory requirement, although a larger memory naturally speeds up the simulation runs. In short, BRS offers advantages in simplifying portable emotion-aware devices such as low-cost EEG headsets, providing a way to recognize human emotions during self-isolation with fewer electrodes or sensors.
