A multimodal dataset for investigating working memory in presence of music: a pilot study

Recent advances in physiological signal measurement techniques have paved the way for non-invasive closed-loop brain-machine interface (BMI) design. Such signals can be used to infer an individual's underlying cognitive brain states and to regulate them via non-invasive interventions such as music (Fekri Azgomi et al., 2023). Regulating human brain states via musical stimuli would have profound applications in closed-loop BMI (CLBMI) (Ehrlich et al., 2019), neural rehabilitation (Ottonello et al., 2019; Salas et al., 2019), and the treatment of cognitive impairment (Ray and Mittelman, 2017). Such applications are motivated, for instance, by the well-known Yerkes-Dodson law, a.k.a. the inverted-U law, which states that optimal cognitive performance is achieved when cognitive arousal is kept at a moderate level (Yerkes, 1907; Yerkes and Dodson, 1908). This motivated us to design a pilot study involving a human-subject working memory experiment in the presence of arousing music stimuli. We record multimodal physiological and behavioral signals to investigate the feasibility of regulating one's cognitive arousal and performance via background music with calming and exciting content.

Several studies have explored how music can affect cognitive functioning (Parshi et al., 2019; Khazaei et al., 2021). Multiple studies have used music to influence driving performance (Ünal et al., 2012, 2013). Huang and Shih (2011) and Kuschpel et al. (2015) have shown the positive effect of music on concentration and the effectiveness of using music to reduce cognitive stress in closed-loop systems. In our working memory experiment, two types of music are employed, namely, calming and exciting music. Notably, the calming and exciting music were selected by the participants and are intended to replicate low and high arousal conditions, respectively. Here, we mainly focus on cognitive performance and arousal in the presence of calming and exciting music.

Cognitive performance is a hidden brain state that reflects the overall performance of one's cognitive functions (Khazaei et al., 2024). Human cognitive functions can be divided into two groups, namely, basic and higher-level cognitive functions. Working memory is a basic cognitive function that provides temporary storage and allows the manipulation of information (Baddeley, 1992). Here, the n-back task serves as the cognitive task of interest; it engages working memory by inducing different cognitive loads (von Janczewski et al., 2021; Fekri Azgomi et al., 2023). Decoding the underlying cognitive performance continuously is one of the challenges in this paradigm, which can be addressed by employing informative data and suitable decoding approaches. One of the most accessible and informative data sources in this context is the behavioral data recorded during the cognitive task. We treat the sequence of responses and the reaction times of participants as the available behavioral observations and quantify cognitive performance using Bayesian filters within an expectation-maximization (EM) framework (Amin et al., 2021; Khazaei et al., 2021).

Another hidden brain state that may affect cognitive performance is the underlying arousal state. In particular, the amygdala plays a crucial role in connecting arousal to memory formation (McGaugh, 2004). Typically, the arousal state is linked to the degree of physiological alertness (Cudo et al., 2018), and variation in arousal is mainly due to exogenous and endogenous stimulation, which can be accompanied by neural, hormonal, and other biochemical changes (Hobson and Lindsley, 1988). Specifically, previous research on inferring autonomic nervous system (ANS) activity identifies electrodermal activity (EDA) as an informative measure of cognitive arousal (Wickramasuriya et al., 2018; Wickramasuriya and Faghih, 2020a). In particular, the skin conductance (SC) signal (a measure of EDA) reflects the sweat secretion process, which is closely linked to the underlying ANS activity (Amin and Faghih, 2022). Variation in skin conductance can therefore be used as a quantitative index of arousal during a cognitive task (Khazaei et al., 2021).

When designing a CLBMI architecture for human cognitive functions, it is crucial to investigate how the environment and interactions affect the cognitive capacity of the human operator (Cain, 2007). To gain better insight into BMI design and to analyze how different brain regions react to stimuli, neuroimaging methods can play a crucial role. These methods include, but are not limited to, magnetoencephalography (MEG), electroencephalography (EEG), functional magnetic resonance imaging (fMRI), and functional near-infrared spectroscopy (fNIRS) (Berka et al., 2007; Wendel et al., 2009; Baldwin and Penaranda, 2012; Fyshe et al., 2012). fNIRS is a relatively new neuroimaging technique that can capture significant spatiotemporal changes in the prefrontal cortex (PFC) during memory tasks (Hoshi et al., 2003). In addition, fNIRS has higher spatial resolution than EEG and is practical for unobtrusive applications. In the designed experiment, we use fNIRS to collect data from the PFC and occipital (OC) areas of the participants.

Multiple studies have investigated brain activity during n-back tasks via fNIRS (Ayaz et al., 2007; Roy et al., 2013; Fishburn et al., 2014; Herff et al., 2014). In Ayaz et al. (2007) and Herff et al. (2014), the association between cognitive load and hemodynamic responses is considered, and the hemodynamic responses are classified accordingly. Another study considers n-back tasks and shows the sensitivity of fNIRS to cognitive load and to the transition from a resting state to a task (Fishburn et al., 2014). While the link between n-back cognitive load and the corresponding hemodynamic responses has been studied extensively, the association between a continuous cognitive performance signal and fNIRS data in the presence of music has been investigated relatively sparsely. Previous studies such as Meidenbauer et al. (2021) and Struckmann et al. (2021) considered discrete measures of cognitive performance, such as the number of correct/incorrect responses and reaction time, and investigated their association with hemodynamic responses. While these measurements can be informative indices of performance, they provide only discrete summaries and might not fully capture the underlying dynamics of the brain's cognitive state (Basu et al., 2023). In this research, we study the association between oxygenated hemoglobin (HbO) concentration and the continuous performance state decoded from behavioral data. In particular, we employ a Bayesian performance decoder that accounts for the underlying dynamics of this brain state and tracks its continuous trajectory using behavioral data. We also evaluate the energy of the total hemoglobin (HbT) signal within each music session.

To summarize, the primary purpose of this pilot study is to provide a multimodal database and to examine the viability of using music as an intervention. The database includes a diverse array of physiological measurements to shed light on human-subject experiments involving music intervention. The available signals include SC, electrocardiogram (ECG), skin surface temperature, respiration, photoplethysmography (PPG), fNIRS, electromyogram (EMG), de-identified facial expression scores, and the sequence of correct/incorrect responses and reaction times recorded during the n-back task. In this research, we focus on cognitive performance and arousal, and we show how HbO concentration correlates with cognitive performance. We also evaluate the energy of the HbT signal for each music session. We show how the arousal and performance indices vary with task difficulty and music type. Finally, we discuss our findings, summarize our approach, and outline future directions of this research.

We designed the experiment around a working memory task called the n-back task (Herff et al., 2014; Shin et al., 2018). The participant was shown a series of letters as stimuli and had to identify whether the most recently displayed letter was the same as the letter displayed n trials earlier (Herff et al., 2014; Shin et al., 2018; Khazaei et al., 2021). In this type of experiment, the cognitive workload increases with n, since the participant has to keep more of the recent stimuli in memory for higher values of n. The experiment was conducted in two sessions: session one was accompanied by calming background music, and session two by exciting background music. The participants were asked to provide their own music with “calming” and “exciting” content to be played during the experiment. We employed this strategy, first, to enhance ecological validity, as people often listen to their own music (Ünal et al., 2012); second, to ensure that any effect observed in brain activity is not attributable to disliking the music (Ünal et al., 2012); and finally, to enable a person-specific closed-loop system with a personalized intervention. We delivered the background music via an external speaker. The music started at the beginning of each session and played continuously until the end of the session (i.e., 964 s). We performed the experiment in a closed room to minimize the impact of external variables. Each session included 8 blocks of each of the two n-back task types (2 task types × 8 = 16 blocks per session), giving 32 blocks over the whole experiment. We limited the total duration of the experiment to slightly more than half an hour, which keeps the relative drift in the collected SC signals due to sweat accumulation or electrode displacement small.
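To make the matching rule concrete, the Python sketch below labels the targets in a letter sequence and scores a participant's button presses as the binary correct/incorrect observations used later by the performance decoder. The function names `nback_targets` and `score_responses` are hypothetical illustrations, not part of the released dataset or the E-Prime task.

```python
from typing import List, Sequence


def nback_targets(letters: Sequence[str], n: int) -> List[bool]:
    """True at trial k when the letter matches the one shown n trials earlier."""
    # The first n trials have no n-back partner, so they can never be targets.
    return [k >= n and letters[k] == letters[k - n] for k in range(len(letters))]


def score_responses(letters: Sequence[str], n: int,
                    pressed_target: Sequence[bool]) -> List[int]:
    """Binary correctness per trial: 1 if the pressed button matches the true label."""
    truth = nback_targets(letters, n)
    return [int(t == p) for t, p in zip(truth, pressed_target)]
```

For example, `nback_targets("ABAB", 2)` returns `[False, False, True, True]`, and pressing "non-target" on the last trial of that sequence yields a 0 for that trial in `score_responses`.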

Each block consisted of a 5 s instruction followed by 22 trial windows; in each trial, a letter stimulus was displayed for 0.5 s followed by a cross for 1.5 s, giving a 2 s trial window in which the participant could respond via the Chronos keypad. Thus, each task block lasted 49 s (5 + 22 × 2 = 49). In each task block, 30% of the stimuli were targets. The order of the task blocks for each n-back task was randomized. At the end of each block, a 10 s “RELAX” segment was presented, during which a resting cross was displayed on the screen. After 8 task blocks (the halfway mark of each session), a 20 s “RELAX” segment was presented, with a resting cross displayed on a 65-inch smart TV connected via HDMI to a laptop. Each session lasted 964 s (~16 min). After each session, there was a 2 min relaxation break during which a resting cross was displayed on the screen. Figure 1 describes the timing of one session with randomized trials. The experiment took a total of 2,168 s (964 s calming session + 120 s intersession break + 964 s exciting session + 120 s post-session rest = 2,168 s), i.e., approximately 36 min. Participants were comfortably seated with the non-invasive sensors attached, and a display screen was placed ~1–2 m in front of them. The only required movement was pressing one of two buttons on the Chronos keypad: the target or the non-target button. Participants were required to press one of the buttons for each displayed stimulus.
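The timing arithmetic above can be checked with a few lines of Python. This sketch assumes that the 10 s “RELAX” segment follows every one of the 16 blocks and that the 20 s halfway segment is additional, which is the reading that reproduces the stated 964 s. It also includes a hypothetical block generator; rounding 30% of 22 trials to 7 targets and drawing letters uniformly from A to Z are our assumptions, not details taken from the E-Prime scripts.

```python
import random
import string

# Timing constants taken from the protocol description (seconds)
INSTRUCTION_S = 5.0
N_TRIALS = 22
STIMULUS_S, CROSS_S = 0.5, 1.5
BLOCKS_PER_SESSION = 16          # 8 blocks x 2 n-back levels
RELAX_AFTER_BLOCK_S = 10.0
HALFWAY_RELAX_S = 20.0
BREAK_AFTER_SESSION_S = 120.0

block_s = INSTRUCTION_S + N_TRIALS * (STIMULUS_S + CROSS_S)                          # 49 s
session_s = BLOCKS_PER_SESSION * (block_s + RELAX_AFTER_BLOCK_S) + HALFWAY_RELAX_S  # 964 s
total_s = 2 * (session_s + BREAK_AFTER_SESSION_S)                                    # 2,168 s


def make_block(n, n_trials=N_TRIALS, target_frac=0.30, seed=None):
    """Hypothetical generator for one n-back block with a fixed share of targets."""
    rng = random.Random(seed)
    n_targets = round(target_frac * n_trials)              # 6.6 rounded to 7 (assumption)
    target_pos = set(rng.sample(range(n, n_trials), n_targets))
    letters = []
    for k in range(n_trials):
        if k in target_pos:
            letters.append(letters[k - n])                  # repeat the n-back letter
        else:
            pool = string.ascii_uppercase
            if k >= n:
                pool = pool.replace(letters[k - n], "")     # avoid accidental targets
            letters.append(rng.choice(pool))
    return letters, sorted(target_pos)
```

Excluding the n-back letter from the pool on non-target trials guarantees that the realized target rate equals the planned one.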

The experimental procedures in this study were approved by the institutional review board at the University of Houston, TX, USA (STUDY00002013). Only participants who were 18 years or older and able to provide consent were permitted to participate in this experiment. Anyone with known cardiac ailments or psychological disorders was excluded. Adults unable to consent, anyone below age 18, pregnant women, prisoners, students whose grades may be influenced, and economically and/or educationally disadvantaged persons were excluded from the study. A total of 11 healthy participants (five male and six female) aged 22–25 participated in this study. Six participants were excluded due to measurement errors, data corruption, or incomplete modalities. Hence, the analyzed sample comprises five participants (two male and three female); de-identified facial expression scores for four participants are included in the main database. All identifiable aspects of the data, including any data that could be used to identify the original participants, were removed to ensure privacy. The experiment focused on collecting multimodal data to investigate the feasibility of using music as an intervention and was conducted with a multitude of sensors, which are described in the following subsections.

The near-infrared region (620–1,000 nm) of the electromagnetic spectrum is scattered by biological tissue but absorbed by hemoglobin (Villringer et al., 1993); by measuring the amount of absorbed near-infrared light and applying the modified Beer-Lambert law (Sassaroli and Fantini, 2004), fNIRS measures changes in oxygenated hemoglobin (HbO), deoxygenated hemoglobin (HbR), and total hemoglobin (HbT). fNIRS offers excellent spatial resolution but relatively poor temporal resolution (Fazli et al., 2012), and it can be used to obtain functional connectivity maps of the brain (Santosa et al., 2018). The fNIRS optodes can be placed according to the international 10-5 system (Oostenveld and Praamstra, 2001), and readings can be taken from the whole scalp. The fNIRS channels placed during this experiment collected hemodynamic data from the PFC and OC areas of the brain. In particular, the fNIRS device used is the NIRSport 2, configured as shown in Figure 4. The sources (S) and detectors (D) were placed at the positions depicted in the figure (on a head cap worn by the participants). There were 16 sources and 14 detectors located over the PFC and OC areas, and signals were recorded from 44 channels. The sampling frequency is 7.81 Hz.
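A minimal sketch of the modified Beer-Lambert conversion for one two-wavelength channel is shown below. It measures optical density changes relative to a baseline intensity and solves the resulting 2 × 2 linear system for ΔHbO and ΔHbR, with ΔHbT = ΔHbO + ΔHbR. The extinction coefficients, source-detector distance, and differential pathlength factor (DPF) are illustrative placeholders, not the values used by the preprocessing software or the NIRSport 2 configuration, and this sketch is not the study's preprocessing pipeline.

```python
import numpy as np

# Illustrative placeholder extinction coefficients (rows: wavelength 1, 2;
# columns: HbO, HbR). These are NOT tabulated values; substitute published
# coefficients for the device's actual wavelengths.
EPS_PLACEHOLDER = np.array([[1.5, 3.8],
                            [2.5, 1.8]])


def mbll(intensity, baseline, eps, distance_cm, dpf):
    """Modified Beer-Lambert law for a two-wavelength fNIRS channel.

    intensity   : array (2, T), detected light at each wavelength over time
    baseline    : array (2,), reference intensity I0 per wavelength
    eps         : array (2, 2), extinction coefficients [[HbO, HbR], ...]
    distance_cm : source-detector separation
    dpf         : array (2,), differential pathlength factor per wavelength
    Returns (dHbO, dHbR, dHbT), each of length T.
    """
    intensity = np.asarray(intensity, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Change in optical density at each wavelength: dOD = -log10(I / I0)
    d_od = -np.log10(intensity / baseline[:, None])
    # Effective optical path length per wavelength: distance * DPF
    path = distance_cm * np.asarray(dpf, dtype=float)[:, None]
    # Solve eps @ [dHbO, dHbR] = dOD / path for every time sample at once
    d_hbo, d_hbr = np.linalg.solve(np.asarray(eps, dtype=float), d_od / path)
    return d_hbo, d_hbr, d_hbo + d_hbr
```

Because the system is solved per time sample with one call, the cost is linear in the recording length; the concentration units follow whatever units the chosen extinction coefficients use.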

ECG records the electrical activity of the heart. Specifically, it captures the electrical signal associated with the contraction and relaxation of the heart muscle, and it is commonly used to detect cardiac abnormalities such as arrhythmia. In our experiment, ECG sensors were placed on the torso of the participants, as shown in Figure 2. We collected the ECG data with the MP160 BioPac system and the BioNomadix wireless devices. BioPac EL503 general-purpose disposable electrodes were used on the torso. The sampling frequency is 2,000 Hz.

The respiration belt sensor of the MP160 BioPac system was placed around the participant's abdomen, in contact with the torso, as described in the BioPac manual and depicted in Figure 2. The belt captures the expansion and contraction associated with breathing. The sampling frequency is 2,000 Hz.

As shown in Figure 3, skin surface temperature data were collected from the digitus minimus (little finger) of the non-dominant hand using the MP160 BioPac system with the BioNomadix wearable device coupled with the BN-TEMP-A-XDCR BioPac sensor. The Empatica E4 wristband worn by the participant also collected skin temperature data. The sampling frequency is 2,000 Hz for BioPac and 4 Hz for the Empatica E4.

Sensors from both the MP160 BioPac system and the Empatica E4 wristband were used to record EDA. The Empatica E4 wristband was worn on the participant's wrist. The MP160 BioPac system sensors were placed over the digitus quartus manus (ring finger) and digitus medius manus (middle finger) of the participant, as shown in Figure 3. BioPac EL507 disposable electrodes were used as the EDA leads. The sampling frequency is 2,000 Hz for BioPac and 4 Hz for the Empatica E4.

The wearable BN-PULSE-XDCR sensor coupled with a BioNomadix unit was placed on the digitus secundus manus (index finger) of the non-dominant hand (Figure 3) to obtain PPG data with the MP160 BioPac system. The Empatica E4 wristband (worn on the wrist) also collected PPG data. PPG is an optical method for detecting changes in blood volume in tissue and is generally used to monitor cardiac health and heart rate. The sampling frequency is 2,000 Hz for the BioPac system and 4 Hz for the Empatica E4 wristband.

As depicted in Figure 3, sensors from the MP160 BioPac system were placed on the participant's trapezius muscle for EMG recordings, using EL503 general-purpose electrodes. EMG is used to assess the health of muscles and the nerves that control them. In this experiment, the placement of the EMG electrodes provides data about the tensing of a participant's shoulders and back while performing a cognitive stress task. The sampling frequency is 2,000 Hz.

Facial expression data were recorded via a dedicated camera, and facial expression scores were then obtained using the FaceReader software. De-identified facial expression scores for four participants are included in the multimodal dataset.

The experimental design, timing, and triggers for the different equipment were implemented using the Chronos input device and E-Prime software, which offer a script-free way to synchronize data with task events. The participants' behavioral signals, including the sequence of correct and incorrect responses along with reaction times, were collected.
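Because the modalities are recorded at different rates (2,000 Hz for the BioPac channels, 4 Hz as reported for the Empatica E4, and 7.81 Hz for fNIRS), task events logged by E-Prime have to be mapped to sample indices separately for each stream. The sketch below assumes that all streams share a common clock with a known origin `t0_s`; the helper names, dictionary keys, and the 100 s example onset are hypothetical.

```python
# Nominal sampling rates stated in the text (Hz); keys are hypothetical labels
FS_HZ = {
    "biopac": 2000.0,       # ECG, respiration, EDA, PPG, EMG, temperature
    "empatica_e4": 4.0,     # rate reported for the E4 streams in this description
    "nirsport2": 7.81,      # fNIRS
}


def event_to_sample(t_event_s, fs_hz, t0_s=0.0):
    """Map an event time (s, on a clock shared by all streams) to a sample index."""
    return int(round((t_event_s - t0_s) * fs_hz))


def window(onset_s, duration_s, fs_hz, t0_s=0.0):
    """Slice covering [onset, onset + duration) for a stream sampled at fs_hz."""
    start = event_to_sample(onset_s, fs_hz, t0_s)
    stop = event_to_sample(onset_s + duration_s, fs_hz, t0_s)
    return slice(start, stop)


# Example: the 44 s of trials in a block whose 5 s instruction starts at 100 s
trials = {name: window(100.0 + 5.0, 44.0, fs) for name, fs in FS_HZ.items()}
```

Rounding to the nearest sample keeps the error below half a sample period, which is negligible at 2,000 Hz but amounts to up to 0.125 s for the 4 Hz streams.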

To decode the cognitive states of interest, we employ Bayesian filtering within the expectation-maximization framework. We use the SC signal collected via the MP160 BioPac system to decode a cognitive arousal state, treating the occurrences of arousal events and their amplitudes as the available observations (Wickramasuriya et al., 2018, 2019, 2022; Wickramasuriya and Faghih, 2019a,b, 2020a,b). To quantify the hidden performance state, we use the sequence of correct and incorrect responses and the reaction times as binary and continuous observations, respectively (Prerau et al., 2009).

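2.3.1 Arousal state estimation

We model the hidden arousal state $x_j$ as

$$x_j = x_{j-1} + \epsilon_j, \tag{1}$$

where $\epsilon_j \sim \mathcal{N}(0, \sigma_\epsilon^2)$ is the process noise and $j$ is the time index. Following the marked point process filtering approach (Wickramasuriya and Faghih, 2020a), we assume a Bernoulli distribution for the occurrence of neural impulses (arousal events) $n_j$, with probability mass function $a_j^{n_j}(1-a_j)^{1-n_j}$ such that $P(n_j = 1) = a_j$.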

We can relate $x_j$ to $a_j$ by applying a sigmoid transform (Young et al., 2004). Thus,

$$a_j = \frac{1}{1 + e^{-(x_j + \beta)}}, \tag{2}$$

where $\beta$ is a constant that can be derived from $\beta \approx \log\frac{a_0}{1-a_0}$, and $a_0$ is the average probability of observing an impulse during the experiment. As described in Wickramasuriya and Faghih (2020a), the continuous-valued amplitude $r_j$ of each neural impulse can be represented as

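$$r_j = \gamma_0 + \gamma_1 x_j + v_j, \tag{3}$$

where $v_j \sim \mathcal{N}(0, \sigma_v^2)$ represents the sensor noise, and $\gamma_0$ and $\gamma_1$ are unknown parameters of the arousal state model in Equations (1)–(4) that must be estimated. Consequently, the joint density function of the observed neural impulse and its amplitude is

$$p(n_j \cap r_j \mid x_j) = a_j^{n_j}(1-a_j)^{1-n_j} \times \left[\frac{1}{\sqrt{2\pi\sigma_v^2}}\, e^{-\frac{(r_j - \gamma_0 - \gamma_1 x_j)^2}{2\sigma_v^2}}\right]^{n_j}, \tag{4}$$

and the unknown model parameters and the hidden arousal state $x_j$ can be estimated simultaneously using an expectation-maximization (EM) framework (Wickramasuriya and Faghih, 2020a). A description of the applied arousal state decoder is available in the Supplementary material.

2.3.2 Performance state estimation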

Inspired by the state-space model proposed in Prerau et al. (2009), we model the cognitive performance state as

$$z_k = z_{k-1} + w_k, \tag{5}$$

where $z_k$ is the performance state, $w_k \sim \mathcal{N}(0, \sigma_w^2)$ is the process noise, and $k$ is the trial number during the experiment.

Similar to Prerau et al. (2009), we form the observation model from one binary observation (the correct/incorrect response $m_k$ at the $k$th trial) and one continuous observation (the reaction time of the corresponding response). A Bernoulli probability model is assumed for the binary responses, with probability mass function $p_k^{m_k}(1-p_k)^{1-m_k}$. Applying a sigmoid transform, we express $p_k$ in terms of $z_k$ as

$$p_k = \frac{1}{1 + e^{-(z_k + \mu)}}. \tag{6}$$

The constant term $\mu$ can be derived from $\mu \approx \log\frac{p_0}{1-p_0}$, where $p_0$ is the average probability of a correct response.
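Choosing μ as the logit of the average accuracy means that a performance state of zero reproduces the participant's average probability of a correct response; β plays the same role for the average impulse probability a_0 in the arousal model. The response sequence below is hypothetical and only illustrates the computation.

```python
import math


def logit(p):
    """Inverse of the sigmoid: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


# Hypothetical response sequence: 1 = correct, 0 = incorrect
responses = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
p0 = sum(responses) / len(responses)   # average probability of a correct response (0.8)
mu = logit(p0)                         # mu = log(p0 / (1 - p0)) = log 4

# With z_k = 0, the sigmoid model reproduces the average accuracy p0
assert abs(sigmoid(0.0 + mu) - p0) < 1e-12
```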

The reaction time $t_k$ can be related to the performance state as

$$I_k = \log t_k = \alpha_0 + \alpha_1 z_k + \delta_k, \tag{7}$$

where the log reaction time at each trial ($I_k$) is assumed to follow a linear model with Gaussian noise $\delta_k \sim \mathcal{N}(0, \sigma_\delta^2)$. The vector of unknown parameters of the performance state model in Equations (5)–(7), $\theta_P$, and the performance state $z_k$ can be estimated using the EM approach (Prerau et al., 2009; Khazaei et al., 2021). A description of the applied performance state decoder is available in the Supplementary material.
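Both decoders share the same observation structure: a latent state, a Bernoulli observation passed through a sigmoid with a constant offset, and a continuous observation that is linear in the state with Gaussian noise. The sketch below implements only an approximate Gaussian forward filter for that structure, assuming a random-walk state equation and fixed, known parameters; the full EM decoders in the cited works additionally run a backward smoother and re-estimate the parameters, which is omitted here. For the arousal model, `binary`, `cont`, `offset`, `c0`, `c1`, `var_obs`, and `var_proc` correspond to $n_j$, $r_j$, $\beta$, $\gamma_0$, $\gamma_1$, $\sigma_v^2$, and $\sigma_\epsilon^2$, with `has_cont` true only where $n_j = 1$; for the performance model they correspond to $m_k$, $\log t_k$, $\mu$, $\alpha_0$, $\alpha_1$, $\sigma_\delta^2$, and $\sigma_w^2$, with `has_cont` true on every trial with a recorded response. The function name is hypothetical.

```python
import numpy as np


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def forward_filter(binary, cont, has_cont, offset, c0, c1, var_obs, var_proc,
                   x0=0.0, v0=1.0, newton_iters=25):
    """Approximate Gaussian forward filter (filtering step only, no EM).

    State (assumed random walk): x_k = x_{k-1} + noise with variance var_proc
    Binary obs: P(binary_k = 1) = sigmoid(x_k + offset)
    Continuous obs (only where has_cont[k]): cont_k = c0 + c1 * x_k + noise(var_obs)
    Returns filtered posterior means and variances.
    """
    T = len(binary)
    means, variances = np.zeros(T), np.zeros(T)
    x_prev, v_prev = x0, v0
    for k in range(T):
        # Prediction step of the random walk
        x_pred, v_pred = x_prev, v_prev + var_proc
        use_cont = bool(has_cont[k])
        x = x_pred
        # Newton iterations to find the mode of the log-concave posterior
        for _ in range(newton_iters):
            p = sigmoid(x + offset)
            grad = -(x - x_pred) / v_pred + (binary[k] - p)
            hess = -1.0 / v_pred - p * (1.0 - p)
            if use_cont:
                grad += c1 * (cont[k] - c0 - c1 * x) / var_obs
                hess -= c1 ** 2 / var_obs
            x -= grad / hess
        # Posterior variance from the curvature at the mode
        p = sigmoid(x + offset)
        info = 1.0 / v_pred + p * (1.0 - p) + (c1 ** 2 / var_obs if use_cont else 0.0)
        means[k], variances[k] = x, 1.0 / info
        x_prev, v_prev = x, 1.0 / info
    return means, variances
```

Using a Gaussian approximation at the posterior mode keeps each update closed-form apart from the one-dimensional Newton search, which converges quickly because the log posterior is concave.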

2.3.3 fNIRS processing

As in Yaghmour et al. (2021) and Parshi et al. (2019), we preprocess the collected hemodynamic data using the nirsLAB software (Xu et al., 2014). The preprocessing steps include bandpass filtering and converting the light intensity data to HbO, HbR, and HbT concentrations (Yaghmour et al., 2021).

We first analyze the HbO data. To study the PFC and OC areas, combinations of HbO channels are considered so that 16 brain regions are covered. These regions are distributed over the right and left sides of the PFC and OC areas (Donadel et al., 2021). As depicted in Figure 4B, each studied region is surrounded by four channels, whose HbO concentrations are averaged to obtain the regional HbO. We evaluate the epoch of the HbO signal over the 22 trials of each n-back task block with respect to task difficulty (e.g., 1-back) and music session. An epoch of a signal is a signal segment within a specific time window. This evaluation is inspired by event-related potential (ERP) studies, which analyze brain activity in response to stimuli (Luck, 2014). Here, we consider the signal segment over the 22 trials (≈44 s) and perform a similar ERP-like analysis to derive an epoch of the performance state with respect to task type (i.e., task difficulty) and music session. We then compute the Pearson correlation coefficient between the HbO and continuous performance epochs.
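A minimal sketch of the region averaging, ERP-style epoching, and correlation steps follows. The text does not state how the sampled HbO epoch (about 344 samples over 44 s at 7.81 Hz) is aligned with the 22-value performance epoch; this sketch assumes HbO is averaged within each 2 s trial bin before computing Pearson's r. Channel indices, onsets, and function names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def region_average(hbo, channel_idx):
    """Average HbO over the (0-based) channels surrounding one region; hbo is (channels, samples)."""
    return np.asarray(hbo)[list(channel_idx)].mean(axis=0)


def epoch_average(signal, onsets, length):
    """ERP-style epoch: average equal-length segments starting at each onset (sample index)."""
    signal = np.asarray(signal)
    return np.stack([signal[o:o + length] for o in onsets]).mean(axis=0)


def per_trial(epoch, n_trials=22):
    """Collapse a sampled epoch to one value per trial by averaging within each trial bin."""
    return np.array([chunk.mean() for chunk in np.array_split(np.asarray(epoch), n_trials)])


def hbo_performance_r(hbo_epoch, perf_epoch):
    """Pearson r between the binned HbO epoch and a per-trial performance epoch."""
    r, _ = pearsonr(per_trial(hbo_epoch, len(perf_epoch)), perf_epoch)
    return r
```

In use, `onsets` would hold the sample indices of the first trial of every block of one task type within one music session, so that each epoch averages over the eight matching blocks.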

Figure 4. The functional near-infrared spectroscopy (fNIRS) sensor configuration. (A) Optode layout of the fNIRS sources (red), detectors (blue), and channels (green) used during the experiment. The nasion (Nz, the intersection point of the frontal and nasal bones), inion (Iz, the external occipital protuberance at the back of the skull), and left and right pre-auricular points (LPA, RPA, the points anterior to the ears in front of the upper end of the tragus) are labeled. (B) The channel numbers and studied regions of the PFC and OC areas in the right and left hemispheres: LF1 to LF4, RF1 to RF4, LB1 to LB4, and RB1 to RB4.

Additionally, inspired by Wickramasuriya et al. (2023), we investigate HbT signal energy with respect to music type. In particular, we consider the signals collected from the PFC channels (i.e., channels 23–44) and smooth them using a 10 s sample-by-sample running average filter.
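The smoothing and energy computation can be sketched as follows. A 10 s window at 7.81 Hz spans about 78 samples. Defining the energy as the squared amplitude of the smoothed signal and the envelope as the magnitude of its analytic signal (`scipy.signal.hilbert`), both averaged over channels 23–44, are assumptions made for illustration; the exact definitions in Wickramasuriya et al. (2023) may differ.

```python
import numpy as np
from scipy.signal import hilbert

FS_NIRS_HZ = 7.81
PFC_CHANNELS = np.arange(22, 44)   # channels 23-44 (1-based) as 0-based row indices


def running_average(x, fs_hz=FS_NIRS_HZ, window_s=10.0):
    """Sliding-window mean over window_s seconds (78 samples at 7.81 Hz), approximately centered."""
    w = int(round(window_s * fs_hz))
    return np.convolve(x, np.ones(w) / w, mode="same")


def pfc_energy_and_envelope(hbt):
    """hbt: (channels, samples) HbT after preprocessing (assumed bandpass filtered).

    Returns the channel-averaged instantaneous energy (squared amplitude) and the
    channel-averaged Hilbert envelope of the smoothed PFC signals (assumed definitions).
    """
    smoothed = np.vstack([running_average(ch) for ch in np.asarray(hbt)[PFC_CHANNELS]])
    energy = smoothed ** 2
    envelope = np.abs(hilbert(smoothed, axis=-1))
    return energy.mean(axis=0), envelope.mean(axis=0)
```

Note that `mode="same"` zero-pads the edges, so the first and last few seconds of each session are attenuated; trimming half a window at each end avoids interpreting that artifact.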

2.3.4 Statistical analysis

Differences in human brain structure can lead to variation in behavior, cognitive abilities, and mental and physical health (Gu and Kanai, 2014). To obtain general, person-specific indices of cognitive states relative to an individual's baseline, we formulate metrics called the high arousal index (HAI) and the high performance index (HPI) (Wickramasuriya and Faghih, 2020a). These indices are calculated as $P(\text{state}_j > \text{threshold})$, where the threshold is set to the median of the state values. Here, we decode the hidden arousal and performance states solely from the SC and behavioral data, respectively (Fekri Azgomi et al., 2023). The HAI and HPI are evaluated with respect to task difficulty and music session. In particular, we perform a two-sided sign test to compare the 1-back and 3-back task data (i.e., 1-back vs. 3-back) within each music session (N = 176); similarly, a two-sided sign test is used to compare the HAI and HPI of the calming and exciting sessions (calming vs. exciting) at each task difficulty level. We use a significance level of 0.01 (99% confidence), and the p-values are reported in Tables 1, 2.
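The index computation and the paired comparison can be sketched as below, assuming each decoded state comes with a Gaussian posterior mean and variance and that the median is taken over the whole decoded series. The sign test counts positive paired differences and evaluates them with `scipy.stats.binomtest`. Since Tables 1 and 2 refer to a signrank test, the Wilcoxon signed-rank variant is available as `scipy.stats.wilcoxon` should that test be intended. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import binomtest, norm


def high_state_index(post_mean, post_var, threshold=None):
    """P(state_j > threshold) under a Gaussian posterior N(post_mean_j, post_var_j).

    By default the threshold is the median of the decoded state means (assumption).
    """
    post_mean = np.asarray(post_mean, dtype=float)
    sd = np.sqrt(np.asarray(post_var, dtype=float))
    if threshold is None:
        threshold = np.median(post_mean)
    return norm.sf(threshold, loc=post_mean, scale=sd)


def paired_sign_test(a, b):
    """Two-sided sign test on paired samples; zero differences are discarded."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    d = d[d != 0]
    if d.size == 0:
        return 1.0
    n_pos = int(np.sum(d > 0))
    return binomtest(n_pos, n=d.size, p=0.5, alternative="two-sided").pvalue
```

For a comparison such as 1-back vs. 3-back within one session, `a` and `b` would each hold the N = 176 per-trial index values, and the resulting p-value would be compared against the 0.01 significance level.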

Table 1. Results of the signrank test with respect to music sessions and n-back difficulty levels for the decoded high arousal index (HAI).

Table 2. Results of the signrank test with respect to music sessions and n-back difficulty levels for the decoded high performance index (HPI).

3 Results

The HAI boxplots in Figure 5 show higher variation in the arousal metrics between the calming and exciting sessions, whereas no such variation is observed with respect to task difficulty. The median HAI values in the 1-back and 3-back tasks (1-back vs. 3-back) do not diverge considerably. However, for the music sessions (calming vs. exciting), the median values of the calming sessions fall outside the boxes of the exciting sessions. According to the boxplots (Figure 5) and the reported p-values (Table 1), there is no significant difference between HAI in the 1-back and 3-back tasks of the calming session. For HAI in the exciting session, participants 3 and 4 are the only ones who show a significant difference between the 1-back and 3-back task data (Table 1). With respect to music, all participants show a significant difference in HAI when shifting from the calming to the exciting session (Table 1).

Figure 5. Distribution of high arousal index (HAI) within trials with respect to task difficulties and music sessions. Each sub-plot shows the box plot of the average HAI data within the trials with respect to 1-back task in the calming session (green box), 3-back task in the calming session (blue box), 1-back task in the exciting session (gray box), and 3-back task in the exciting session (red box).

We evaluate the HPI in a similar manner and, as expected, find that the median HPI within the 1-back task trials is considerably higher than within the 3-back task trials (Figure 6). In addition, for all participants, the median HPI within the 3-back task trials is higher during the exciting session than during the calming session (Figure 6). The p-values reported in Table 2 show significant differences in HPI with respect to both task difficulty and music session in most cases.

Figure 6. Distribution of high performance index (HPI) within trials with respect to task difficulties and music sessions. Each sub-plot shows the box plot of the HPI data within the trials with respect to 1-back task in the calming session (green box), 3-back task in the calming session (blue box), 1-back task in the exciting session (gray box), and 3-back task in the exciting session (red box).

Figure 7 presents the epochs of HbO concentration and performance across the task blocks for participant 2. As noted earlier, 16 brain regions on the left and right sides of the PFC and OC areas are considered. The correlation between the HbO and performance epochs is illustrated for each n-back task type and music session. In particular, Figure 7A shows the correlation in the 1-back task blocks of the calming session; Figure 7B, the 3-back task blocks of the calming session; Figure 7C, the 1-back task blocks of the exciting session; and Figure 7D, the 3-back task blocks of the exciting session. The Pearson correlation coefficient (r) between performance and HbO concentration for each studied brain region is reported in a box next to each subplot. For participant 2, the highest positive correlation between HbO and performance corresponds to the RF1 region (right side of the PFC) during the 3-back task blocks of the exciting session. Similar results for the other participants are provided in the Supplementary material. For all participants, the highest positive correlation is observed within the 3-back task blocks. In particular, the highest correlation for participants 1–5 corresponds to LB1, RF1, LF4, LF3, and LF2, respectively. Thus, four of the five participants showed the highest performance-HbO correlation in HbO signals acquired from the left hemisphere, and, similarly, four of the five showed it in the PFC areas.

Figure 7. Correlation analysis of the HbO and performance-state epochs across the task blocks for one participant. The sub-figures present the epochs of HbO and performance data recorded during: (A) the 1-back task blocks with calming music; (B) the 3-back task blocks with calming music; (C) the 1-back task blocks with exciting music; (D) the 3-back task blocks with exciting music. The sub-plots in each sub-figure, from top to bottom, show the HbO data collected from the left side of the PFC, the right side of the PFC, the left side of the OC area, and the right side of the OC area. The boxes to the right of the sub-plots give the Pearson correlation coefficients (r) for the studied brain regions.

In Figure 8, we present the mean energy (black) and mean envelope (blue) of the smoothed HbT signal collected from the PFC channels (i.e., channels 23–44). Figure 8A presents the results for the calming session (green), and Figure 8B for the exciting session (red). The 1-back task blocks are indicated with lighter background colors, while the 3-back blocks are shown with more intense background colors. Except for participant 3, the peak of the HbT energy is located within the exciting session.

Figure 8. The mean energy and mean envelope of the smoothed HbT signal collected from PFC channels. The sub-figures present: (A) the mean energy (black) and mean envelope (blue) of the smoothed HbT signal collected from PFC channels during the calming session for participants 1–5; (B) the mean energy (black) and mean envelope (blue) of the smoothed HbT signal collected from PFC channels during the exciting session for participants 1–5. The background colors in each sub-panel indicate: the 1-back task during the calming session (light green); the 3-back task during the calming session (dark green); the 1-back task during the exciting session (light red); the 3-back task during the exciting session (dark red).

4 Discussion

We have performed a working memory experiment in the presence of musical stimuli to collect multimodal physiological signals along with brain hemodynamic response signals of participants and to evaluate the possibility of using music as an intervention. Participants were asked to perform a working memory task with different difficulty levels and music types (i.e., calming and exciting). The difficulty levels were included to induce different cognitive loads during the experiment. We used fNIRS to record the brain hemodynamic response. By incorporating multiple physiological signals in the data collection, we have derived a rich neurophysiological dataset that could offer a more comprehensive picture of the body's and brain's reactions to music and cognitive load. In this dataset, we implemented the musical stimuli by conducting the cognitive task in two sessions: one with calming music and the other with exciting music. Calming music was intended to mimic the low arousal condition, while exciting music was intended to simulate the high arousal condition. To move toward a personalized CLBMI, we used music selected by the participants themselves for the calming and exciting sessions. While this personalized music intervention fits a person-specific closed-loop architecture, it may introduce effects of the subject's familiarity with the music. One possible approach to preserve the person-specific nature of the intervention while reducing the impact of familiarity is to employ newly generated music based on the subject's preferences in future studies (Fekri Azgomi et al., 2023).

In this study, we collected data using both research-grade devices (e.g., Biopac) and wearable devices (e.g., Empatica E4). Although wearable devices can be used in everyday settings and appear better aligned with future closed-loop architectures, the signal processing algorithms and estimation framework we employed are more compatible with research-grade devices, which also perform better in laboratory settings. Hence, we based our in-depth analysis on the data collected with the research-grade devices. Although the presented framework includes multimodal measurements that can be used concurrently for brain state estimation and classification, here we analyze only the SC, oxygenated hemoglobin (HbO), total hemoglobin (HbT), and behavioral signals. Nevertheless, we provide the complete dataset and experimental settings as a framework for any researcher interested in working within this paradigm. In particular, the collected data offer a unique opportunity for future investigations and may yield new insights and guide future data collection in this context.

Recording multimodal physiological signals during a cognitive task can provide an opportunity to explore cutting-edge strategies that may extend human cognitive capacities (Mangaroska et al., 2021). However, multimodal physiological data collection comes with certain challenges (Cukurova et al., 2020). Specifically, a suboptimal sensor connection or motion artifacts in any of the modalities can corrupt the data (Mangaroska et al., 2021; Fekri Azgomi et al., 2023). Consequently, the participant exclusion rate in this type of research can be high, as observed in this experiment. Given the rapid advances in biomedical sensor development, one potential way to address this concern is to use a more advanced setup that employs fewer sensors with more robust connections to collect the signals of interest. Employing recently developed sensors may pave the way for more practical and efficient data collection in the future.
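As a minimal illustration of how motion-related corruption might be screened before deciding whether to exclude a recording, the sketch below flags abrupt sample-to-sample jumps using a robust threshold based on the median absolute deviation (MAD). This is a generic heuristic, not the exclusion criterion used in this study; the threshold `k` and the function names are hypothetical.

```python
import numpy as np


def flag_motion_artifacts(x, k=5.0):
    """Flag samples whose first difference deviates from the median difference by more than k robust SDs."""
    x = np.asarray(x, dtype=float)
    d = np.diff(x)
    dev = np.abs(d - np.median(d))
    sigma = 1.4826 * np.median(dev)  # MAD scaled to approximate the SD for Gaussian data
    flags = np.zeros(x.size, dtype=bool)
    if sigma == 0:
        return flags  # constant increments: nothing to flag with this rule
    flags[1:] = dev > k * sigma  # mark the sample at the end of each abrupt jump
    return flags


def artifact_fraction(x, k=5.0):
    """Fraction of flagged samples; a hypothetical quantity one could threshold for exclusion."""
    return float(flag_motion_artifacts(x, k).mean())


# Synthetic example: a smooth trace with a single spike at index 500.
trace = np.sin(np.linspace(0, 10, 1000))
trace[500] += 5.0
flags = flag_motion_artifacts(trace)
```

A spike produces two large differences (into and out of the spike), so both index 500 and index 501 are flagged; real fNIRS or SC pipelines typically use more specialized artifact-correction methods.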

Based on our analysis of the SC signal, the HAI level varies significantly across the music sessions, whereas the task-wise comparison does not reveal a significant variation. One interpretation is that the difference in cognitive load induced by the 1-back and 3-back tasks is not large enough to be reflected in the HAI signal. Another, though less probable, interpretation is that cognitive load affects arousal less than music type does, and that cognitive load does not appear to have a mediating effect on arousal. Establishing a general relationship that holds beyond the scope of this dataset requires a comprehensive causality analysis, which is not the main focus of this study.
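The statistical test behind the session-wise and task-wise comparisons is not restated in this section. Purely as a hedged illustration, a two-sided permutation test on the difference in mean HAI between two conditions could be sketched as follows. The inputs (e.g., HAI samples from the calming versus exciting session, or from 1-back versus 3-back blocks) and all names are hypothetical, and this is not necessarily the test used by the authors.

```python
import numpy as np


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference of means between samples a and b."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # randomly reassign condition labels
        diff = perm[:a.size].mean() - perm[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical HAI samples for two conditions (placeholder values, not from the dataset).
hai_calm = np.arange(10, dtype=float)
hai_exciting = np.arange(10, dtype=float) + 100.0
diff, p = permutation_test(hai_calm, hai_exciting, n_perm=2000)
```

HAI samples within a session are autocorrelated in time, so a block-level or participant-level resampling scheme would be more defensible than shuffling individual samples.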

It should be noted that the HAI of participants 4 and 5 decreased during the exciting session, which is the opposite of what the exciting musical content was expected to elicit. This notable counterexample indicates that, for participants 4 and 5, the skin conductance response did not agree with the emotional content of the music. On the other hand, the HbT energy signal reaches its peak during the exciting session for both participants 4 and 5. Although SC is a long-standing index of arousal (Greco et al., 2016), these observations suggest that decoding arousal from a single SC measurement may lack the robustness needed to explain the observed behavior. To address this concern in future investigations and experimental designs, graphene e-tattoo (GET) sensors can be applied (Jang et al., 2022), and SC data can be collected and analyzed in a multichannel manner (Alam et al., 2023).

According to the presented HPI, the performance indices are elevated in the second session, in which the exciting music was presented, for all participants except participant 2. While one may hypothesize that this improvement results from the exciting music bringing arousal into a moderate range, other factors such as learning effects and the participant's familiarity with the environment may also contribute, so no definite conclusion about the impact of music should be drawn. Nonetheless, these findings suggest the potential for integrating music into a closed-loop system. Given the small sample size, the absence of any mental state scores, and the possible confounding factors, further studies with more participants, mental state annotations, a control group, and randomized task difficulty and music session order are required for a decisive conclusion.
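Because the exciting-music session was the second session, session order is confounded with learning and familiarity effects. One hedged way to implement the suggested randomization in a follow-up study is to alternate the music-session order across participants (an AB/BA design) and shuffle the 1-back and 3-back blocks within each session. The sketch below assumes a hypothetical number of blocks per difficulty level and hypothetical participant IDs; it is a design illustration, not the protocol of this study.

```python
import random


def counterbalanced_schedule(participant_ids, blocks_per_level=4, seed=0):
    """Alternate music-session order across participants and shuffle n-back blocks within each session."""
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    schedule = {}
    for i, pid in enumerate(participant_ids):
        # Even-indexed participants start with calming music, odd-indexed with exciting music.
        order = ["calming", "exciting"] if i % 2 == 0 else ["exciting", "calming"]
        sessions = []
        for music in order:
            blocks = ["1-back"] * blocks_per_level + ["3-back"] * blocks_per_level
            rng.shuffle(blocks)  # randomize difficulty order within the session
            sessions.append((music, blocks))
        schedule[pid] = sessions
    return schedule


# Hypothetical participant IDs.
plan = counterbalanced_schedule(["P01", "P02", "P03", "P04"])
for pid, sessions in plan.items():
    print(pid, [music for music, _ in sessions])
```

Alternating by enrollment index guarantees balanced session orders when the number of participants is even; with an odd number, one order is used once more than the other.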

In the context of BMI, users may experience various cognitive loads and emotional states while interacting with a technical system (Herff et al., 2014). Both cognitive load and emotional state can trigger a particular brain response, followed by variation in cognitive performance (Fishburn et al., 2014; Bigliassi et al., 2015). Hence, understanding the relation between brain responses under different cognitive loads and environmental stimuli is crucial for potentially optimizing performance. The performed experiment provides an opportunity to study the hemodynamic response across different cognitive loads and music sessions. For participants 1–5, the strongest positive correlation between HbO and performance is observed during the 3-back task blocks (Supplementary material). One may interpret this as an indication that HbO data could serve as an informative biomarker of performance under high cognitive load. Also, the high HbT energy observed over the PFC during the exciting sessions may be interpreted as higher brain activity in the exciting session than in the calming session, which is consistent with the findings of Zheng et al. (2020). One possible explanation is that participants had to concentrate more at higher arousal levels (Wickramasuriya et al., 2023). Overall, these findings may open a new avenue for decoder design research, leading to innovative fNIRS feature extraction for decoding hidden arousal and performance states. Because of the small sample size, further studies with more participants, additional interventions, and a wider range of cognitive loads would be needed to draw a final conclusion.
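The block-wise HbO-performance correlation referred to above can be illustrated with the following sketch, which computes a Pearson correlation separately for each block type. It assumes one HbO feature value (e.g., a block-averaged amplitude) and one performance value per trial or window, each tagged with its task label. The feature choice, the labels, the `min_samples` cutoff, and the function name are hypothetical and do not describe the authors' exact procedure.

```python
import numpy as np


def blockwise_correlation(hbo, performance, labels, min_samples=3):
    """Pearson r between an HbO feature and performance, computed separately for each block label."""
    hbo = np.asarray(hbo, dtype=float)
    performance = np.asarray(performance, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for label in np.unique(labels):
        mask = labels == label
        if mask.sum() < min_samples:
            continue  # too few points for a meaningful correlation
        # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is Pearson r.
        # Constant inputs yield nan, since the correlation is undefined in that case.
        result[str(label)] = float(np.corrcoef(hbo[mask], performance[mask])[0, 1])
    return result


# Synthetic placeholder values, not taken from the dataset.
labels = ["1-back"] * 5 + ["3-back"] * 5
hbo = [0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3, 0.4, 0.5]
perf = [0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9]
r = blockwise_correlation(hbo, perf, labels)
```

With only a handful of blocks per participant, such coefficients are highly variable, which is in line with the caution about sample size expressed above.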

It is important to note that the experiment serves as a pilot study to investigate the viability of using music as a brain state regulator. In general, our findings show variation in the collected signals, as well as in the decoded brain states, between the two music sessions. Given each person's unique physiology and the personalized intervention applied, we are interested in a personalized perspective rather than a general view. In particular, the brain structure of individuals and the network-level interaction between their cognitive brain states can be studied on an individual basis, regardless of the sample size (Hurlburt et al., 2015). The preliminary findings show that music may have implications for BMI systems as a form of background stimulation, although more in-depth research is required to fully understand the role of music in arousal and performance. Specifically, in the CLBMI realm, precise and robust measurements and feedback mechanisms are crucial. The presented pilot study should be extended to ensure the robustness of the CLBMI pathways. In the future, magnetoencephalography (MEG) could be employed in parallel with fNIRS to improve the spatial and temporal resolution of the brain recordings and to implement neurofeedback in our setup (Yucha and Montgomery, 2008). We intend this study to be a stepping stone for formulating such studies and developing more comprehensive experiments.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by Institutional Review Board at the University of Houston, TX, USA. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

SK: Data curation, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing, Formal analysis, Validation. SP: Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing. SA: Data curation, Writing – original draft, Writing – review & editing. MA: Data curation, Investigation, Methodology, Writing – review & editing. RF: Conceptualization, Funding acquisition, Investigation, Methodology, Resources, Supervision, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported in part by the U.S. National Science Foundation under grants 1942585/2226123 - CAREER: MINDWATCH: Multimodal Intelligent Noninvasive brain state Decoder for Wearable AdapTive Closed-loop arcHitectures, 1755780 - CRII: CPS: Wearable-Machine Interface Architecture, and in part by the New York University (NYU) start-up funds to RF.

Acknowledgments

We would like to extend our gratitude to Dilranjan S. Wickramasuriya for his valuable contribution to data collection.

Conflict of interest

RF and MA are co-inventors of a patent application filed by the University of Houston based on this research.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2024.1406814/full#supplementary-material

References

Alam, S., Amin, M. R., and Faghih, R. T. (2023). Sparse multichannel decomposition of electrodermal activity with physiological priors. IEEE Open J. Eng. Med. Biol. 4, 234–250. doi: 10.1109/OJEMB.2023.33328


FRONTIERS IN NEUROSCIENCE
