Relationship between electrocardiogram‐based features and personality traits: Machine learning approach

1 INTRODUCTION

Electrocardiography (ECG) is a non-invasive clinical technique for monitoring the electrical activity of the heart in cardiovascular diagnostics. Recently, a rich collection of non-traditional applications of ECG-based parameters has emerged, despite an incomplete understanding of their relevance (Chen, 2018).

In this study, we explored the relationship between ECG-based parameters and personality traits, that is, stable patterns of emotion, motivation, cognition, and behavior (DeYoung, 2015). The most influential contemporary models of personality postulate the existence of five (McCrae et al., 2005), six (HEXACO; Ashton et al., 2004), or seven broad traits (recently proposed by some authors, e.g., Ashton & Lee, 2020; Knezevic et al., 2017), which subsume many narrower traits at lower levels of the hierarchy. These traits are found to be universal in humans (McCrae et al., 2005) and in non-human species (Gosling & John, 1999), and to be longitudinally stable, with about 40% of their variability heritable (Vukasović & Bratko, 2015). Available evidence indicates that personality traits have profound relationships with peripheral physiology. A modular influence of brain structures implicated in personality traits, such as orbitofrontal and insular cortex, amygdala, hippocampal formation, and hypothalamus (Deckersbach et al., 2006; Depue & Collins, 1999; Koelsch et al., 2007; Panksepp, 1998), seems to be responsible for these relationships. In addition, data show connections between personality traits and peripheral organs and tissues through the autonomic, endocrine, and immune systems (Cloninger, 2000; Depue & Collins, 1999; Irwin, 2008). Therefore, given the well-established influence of the autonomic nervous system on ECG, searching for a connection between the ECG signal and personality traits seems promising.

Available evidence showed that heart rate decreases and heart rate variability (HRV) increases with Extraversion (Brouwer et al., 2013), Neuroticism correlates with QT interval (Minoretti et al., 2006), and Agreeableness correlates with P, QRS, and T amplitude (Koelsch et al., 2012). Typically, the relationship between personality traits and physiological measures is investigated descriptively, that is, using correlations (Koelsch et al., 2007) or by trying to predict cardiac output with scores on personality questionnaires.

We used a supervised machine learning (ML) approach to examine this relationship. A supervised ML algorithm automatically assigns trait classes to an input set of ECG-based features after going through a training and a testing phase. The training phase is used to construct an optimal model that learns from the available ECG features and the corresponding traits, while the testing phase is used to evaluate ML performance. Here, we adopted the random forest (RF) ML algorithm for trait classification and feature selection, as it achieved high prediction accuracy in similar ECG-based investigations (Dissanayake et al., 2019; Melillo et al., 2015) and is suitable for processing a large number of variables with complex interactions (Breiman, 2001; Strobl et al., 2009).

Random forest ML was applied to ECG-based features with proven clinical efficacy in diagnostics, that is, clinically relevant features (Electrophysiology, 1996; Wagner et al., 2008), and to other parameters. The latter are attractive in practice because they are calculated from the local ECG extremes, which makes them more robust to noise than standard clinically relevant parameters (Arteaga-Falconi et al., 2016; Cabra et al., 2018), and they have proven effective in previous studies (Cabra et al., 2018; Israel et al., 2005; Sansone et al., 2013; Shen et al., 2010).

1.1 Aim of the study

We test a novel approach for extracting ECG-based features related to personality traits, in which the RF ML algorithm is applied to 62 ECG-based parameters, and we investigate perceptible changes within the intervals of these parameters in healthy individuals to detect possible relationships between ECG and individual differences in personality traits. An exploratory analysis of ECG-based feature selection is presented.

2 METHODS AND MATERIALS

Electrocardiogram data analyzed in this study were recorded for another project that aimed to investigate emotions and affects by means of physiological measurements (Bjegojević et al., 2020). We used a 120-s baseline ECG segment recorded in a sitting position before emotion induction, to avoid any influence of the induced emotions.

2.1 Study sample

The sample consisted of 71 university students, average age 20.38 years (SD = 2.96), 78.8% female. The exclusion criterion was a history of cardiovascular disorders. The study was approved by the Institutional Review Board of the Department of Psychology, University of Belgrade (No. 2018-19). Respondents signed informed consent in accordance with the Declaration of Helsinki.

2.2 Assessment of personality traits

The HEXACO Personality Inventory-Revised (HEXACO PI-R; Lee & Ashton, 2018) contains 100 items with a 5-point Likert-type scale ranging from 1 (strongly disagree) to 5 (strongly agree). It assesses six personality domains: Honesty/humility, Emotionality, eXtraversion, Agreeableness, Conscientiousness, and Openness. We used the Serbian form of the HEXACO PI-R (Međedović et al., 2019). Domain scores were calculated as the average scores on all items mapping specific domains (ranging from 1 to 5). The Disintegration trait was measured via the DELTA questionnaire, which contains 110 items with the same 5-point Likert-type scale. The score is calculated as the average of scale items (also ranging from 1 to 5).

2.3 Recording procedure

Upon arrival, all respondents were introduced to the study and fitted with the BIOPAC sensors (Biopac Systems Inc.) (Bjegojević et al., 2020). Subjects were seated and instructed to relax with eyes open and to avoid movements as much as possible to reduce artifacts. ECG signals were visually inspected for quality on site. All subjects were blinded to the ECG signal and related parameters. Personality measures were collected separately, before the physiological measurements.

Electrocardiogram signals were recorded from standard bipolar Lead I using the BIOPAC MP150 unit with AcqKnowledge software and ECG 100C module with surface H135SG Ag/AgCl electrodes (Kendall/Covidien). Before electrode placement, the skin was cleaned with Nuprep gel (Weaver & Co.) to reduce skin–electrode impedance. The sampling frequency was set at 2000 Hz.

2.4 ECG preprocessing and feature extraction

The complete procedure of ECG preprocessing and feature extraction is described in Boljanić et al. (2021). Computed ECG peak locations and corresponding absolute peak amplitudes were employed for extracting three groups of clinically relevant and clinically non-relevant features based on the HRV, temporal parameters, and relative amplitude.

We used three domains to calculate HRV-based features: time, frequency, and geometry. An overview of the HRV-based features is given in Table 1, together with the references relevant to their application and calculation. All HRV-based features were classified as clinically relevant, except for the HRV index, which has been defined and used for 24-h ambulatory ECG monitoring and not for short-term recordings of 2-min duration as applied here (Cripps et al., 1991; Kouidi et al., 2002). Therefore, we applied RF ML to all features both with and without the HRV index.

TABLE 1. Heart rate variability (HRV)-based features for three feature domains (time, frequency, and geometry) with corresponding units and related references

| Feature | Feature domain | Unit | References | Description |
|---|---|---|---|---|
| HR mean | Time | bpm | Abadi et al. (2015), Kim and Andre (2008), Tulppo et al. (1996) | Average heart rate |
| RR mean | Time | s | Abadi et al. (2015), Dissanayake et al. (2019), Electrophysiology (1996), Kim and Andre (2008) | Average of all RR intervals |
| rmssd | Time | s | Abadi et al. (2015), Abbasi (2004), Dissanayake et al. (2019), Electrophysiology (1996), Kim and Andre (2008) | Root mean square of successive differences between adjacent RR intervals |
| sdnn | Time | s | | Standard deviation of all RR intervals |
| m_nn | Time | s | Abadi et al. (2015), Dissanayake et al. (2019), Kim and Andre (2008), Koelsch et al. (2012) | Maximal RR interval |
| nn50 | Time | count | Abadi et al. (2015), Abbasi (2004), Dissanayake et al. (2019), Electrophysiology (1996), Kim and Andre (2008) | Number of pairs of adjacent RR intervals differing by more than 50 ms in the entire recording |
| pnn50 | Time | % | | nn50 count divided by the total number of all RR intervals |
| sdsd | Time | s | Abadi et al. (2015), Abbasi (2004), Electrophysiology (1996), Kim and Andre (2008), Tulppo et al. (1996) | Standard deviation of differences between adjacent RR intervals |
| HRV index | Time | n.u. | Abbasi (2004), Cripps et al. (1991), Electrophysiology (1996), Kouidi et al. (2002) | HRV triangular index: integral of the density distribution (the number of all RR intervals) divided by the maximum of the density distribution at a discrete scale of 1/fs bins, where fs is the sampling frequency |
| LF | Frequency | s² | Abadi et al. (2015), Abbasi (2004), Dissanayake et al. (2019), Electrophysiology (1996), Kim and Andre (2008), Koelsch et al. (2012), Tulppo et al. (1996) | Spectral power of low frequency (0.04–0.15 Hz) |
| HF | Frequency | s² | | Spectral power of high frequency (0.15–0.40 Hz) |
| LFHF | Frequency | n.u. | Abbasi (2004), Dissanayake et al. (2019), Electrophysiology (1996), Kim and Andre (2008), Koelsch et al. (2012), Tulppo et al. (1996) | LF to HF ratio |
| LFnu | Frequency | n.u. | Abbasi (2004), Dissanayake et al. (2019), Electrophysiology (1996) | LF in normalized units in relation to the total power without very low frequencies |
| HFnu | Frequency | n.u. | | HF in normalized units |
| Total power | Frequency | s² | | Total PSD power |
| SD1 | Geometry | s | Dissanayake et al. (2019), Kim and Andre (2008), Koelsch et al. (2012) | Length of the transverse line of the Poincaré plot, perpendicular to the line of identity. The Poincaré plot is a scatter plot of the current RR interval against the prior RR interval. |
| SD2 | Geometry | s | | Length of the longitudinal line of the Poincaré plot, along the line of identity |

Abbreviations: bpm, beats per minute; n.u., no unit.
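To make the time-domain and geometric definitions in Table 1 concrete, the following Python sketch computes several of them from a list of RR intervals in seconds. It is only an illustration under stated assumptions: the function name and the RR values are hypothetical, sample standard deviations are used, SD1 and SD2 are taken as the spread of the Poincaré points across and along the line of identity, and the study's own pipeline (Boljanić et al., 2021) may differ in detail.

```python
import math
import statistics


def hrv_basic_features(rr):
    """Compute a few HRV features from RR intervals given in seconds.

    Hypothetical helper for illustration; not the study's implementation.
    """
    if len(rr) < 3:
        raise ValueError("need at least three RR intervals")
    # Differences between adjacent RR intervals
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    # Pairs of adjacent RR intervals differing by more than 50 ms
    nn50 = sum(1 for d in diffs if abs(d) > 0.050)
    # Poincare plot: current RR (y) against prior RR (x), rotated by 45 degrees
    across = [(b - a) / math.sqrt(2) for a, b in zip(rr, rr[1:])]
    along = [(b + a) / math.sqrt(2) for a, b in zip(rr, rr[1:])]
    return {
        "rr_mean": statistics.fmean(rr),
        # Mean of instantaneous heart rates, in beats per minute
        "hr_mean": statistics.fmean(60.0 / x for x in rr),
        "sdnn": statistics.stdev(rr),
        "m_nn": max(rr),
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "sdsd": statistics.stdev(diffs),
        "nn50": nn50,
        # Table 1 divides nn50 by the number of RR intervals
        "pnn50": 100.0 * nn50 / len(rr),
        "sd1": statistics.stdev(across),
        "sd2": statistics.stdev(along),
    }


# Made-up RR series (seconds), purely for demonstration
rr_example = [0.80, 0.86, 0.78, 0.90, 0.84]
features = hrv_basic_features(rr_example)
```

Frequency-domain features (LF, HF, total power) require a spectral estimate of the resampled RR series and are left out of this sketch.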

The overview of extracted temporal features is displayed in Table 2.

TABLE 2. Temporal features: clinically relevant and clinically not relevant parameters with normal values and ranges where applicable

| Distance | Description | Features | References | Normal range (s) |
|---|---|---|---|---|
| PR | Measured from the fiducial point P to the R peak | PR_min, PR_max, PR_mean, PR_median, PR_sd | Cabra et al. (2018), Dissanayake et al. (2019) | na |
| ST | Measured from the fiducial point S to the fiducial point T | ST_min, ST_max, ST_mean, ST_median, ST_sd | | na |
| QRS | Measured from the fiducial point Q to the fiducial point S | QRS_min, QRS_max, QRS_mean, QRS_median, QRS_sd | | na |
| PR interval^a | Measured from the beginning of the P wave to the beginning of the QRS complex | PRinterval_mean, PRinterval_sd | Wagner et al. (2008) | 0.12–0.20 |
| PR segment^a | Measured from the end of the P wave to the beginning of the QRS complex | PRsegment_mean, PRsegment_sd | | 0.05–0.12 |
| ST interval^a | Measured from the end of the QRS complex to the end of the T wave | STinterval_mean, STinterval_sd | | 0.42 |
| ST segment^a | Measured from the end of the QRS complex to the beginning of the T wave | STsegment_mean, STsegment_sd | | 0.005–0.150 |
| QRS complex^a | Measured from the beginning of the QRS complex to the end of the QRS complex | QRScomplex_mean, QRScomplex_sd | | 0.08–0.12 |
| P wave^a | Measured from the beginning of the P wave to the end of the P wave | Pwave_mean, Pwave_sd | | ≤0.12 |
| T wave^a | Measured from the beginning of the T wave to the end of the T wave | Twave_mean, Twave_sd | | 0.10–0.25 |
| QTc interval^a | Measured from the beginning of the QRS complex to the end of the T wave and compensated according to Bazett's formula | QTnorm_mean, QTnorm_sd | | Men: <0.45; Women: <0.46; 0.35–0.43 (QT) |

Abbreviations: na, not available; QTc, corrected QT interval. Suffixes _min, _max, _mean, _median, and _sd stand for minimal value, maximal value, mean, median, and standard deviation, respectively.
^a Clinically relevant parameters.
Bazett's formula: $\mathrm{QTc} = \mathrm{QT} / \sqrt{\mathrm{RR}}$, with QT and RR in seconds.
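As a brief worked example of the heart-rate correction named in Table 2, the sketch below applies Bazett's formula, QTc = QT / sqrt(RR), to a single beat. The function name and the numeric values are hypothetical and serve only to show the arithmetic.

```python
import math


def qtc_bazett(qt_s, rr_s):
    """Return the Bazett-corrected QT interval (seconds).

    qt_s: measured QT interval in seconds
    rr_s: preceding RR interval in seconds
    """
    if rr_s <= 0:
        raise ValueError("RR interval must be positive")
    return qt_s / math.sqrt(rr_s)


# Illustrative values only: QT = 0.36 s at RR = 0.81 s (about 74 bpm)
qtc_example = qtc_bazett(0.36, 0.81)
```

At an RR interval of exactly 1 s (60 bpm) the correction leaves QT unchanged; shorter RR intervals increase the corrected value.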

The overview of extracted amplitude-based features is displayed in Table 3. The Ek parameter has been suggested as a cardiac signature of emotionality and personality in previous studies (Koelsch et al., 2007, 2012). It represents a weighted linear relation of ECG amplitudes that is unrelated to the person’s BMI and correlates directly with Emotionality. Thus, higher Ek indices correspond to higher Emotionality measured by the Revised Toronto Alexithymia Scale (Taylor et al., 1992) and vice versa. Originally, Ek indices were determined from the 12-lead resting ECG (Koelsch et al., 2007, 2012). After carefully studying the proposed Ek and its practical significance (BMI and electrode positioning compensations), we concluded that Ek can be calculated for one-channel ECG.

TABLE 3. Amplitude-based ECG parameters

| Distance | Feature (n.u.) | References | Description |
|---|---|---|---|
| PRa | PRa_mean, PRa_sd | Cabra et al. (2018), MNUA | Relative amplitude differences between P and R |
| RQa | RQa_mean, RQa_sd | Arteaga-Falconi et al. (2016), Cabra et al. (2018) | Relative amplitude differences between R and Q |
| RSa | RSa_mean, RSa_sd | | Relative amplitude differences between R and S |
| RTa | RTa_mean, RTa_sd | Cabra et al. (2018), MNUA | Relative amplitude differences between R and T |
| STa | STa_mean, STa_sd | | Relative amplitude differences between S and T |
| QSa | QSa_mean, QSa_sd | | Relative amplitude differences between Q and S |
| Ek | Ek_mean, Ek_sd | Koelsch et al. (2007, 2012) | Calculating formula is available in Boljanić et al. (2021) |

Abbreviations: MNUA, mentioned in literature, not used for analysis; n.u., no unit.

The ECG signal with marked time distances and amplitude differences is shown in Figure 1.

FIGURE 1. Normal heartbeat ECG signal marked with temporal and amplitude-based features: clinically not relevant (left-hand panel) and clinically relevant (right-hand panel) parameters

2.5 Analytic strategy

We applied RF ML separately for each personality trait. As psychological test scores ranged from 1 to 5, we grouped them into five distinct classes to perform classification and test our hypothesis: class 1 for 1.00–1.50, class 2 for 1.51–2.50, class 3 for 2.51–3.50, class 4 for 3.51–4.50, and class 5 for 4.51–5.00. The distribution of classes is presented in Figure 2.

FIGURE 2. Distribution of five categories of personality traits for 70 subjects presented with box plots
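The mapping from average questionnaire scores to the five classes can be sketched in a few lines of Python. The function name is hypothetical, and treating each upper bound as inclusive (so that a score between 1.50 and 1.51 would fall into class 1) is an assumption, since the ranges in the text are given to two decimals only.

```python
def score_to_class(score):
    """Map an average trait score in [1, 5] to one of five classes.

    Class boundaries follow the ranges stated in the text:
    1.00-1.50, 1.51-2.50, 2.51-3.50, 3.51-4.50, 4.51-5.00.
    """
    if not 1.0 <= score <= 5.0:
        raise ValueError("score must lie between 1 and 5")
    # Upper bounds are treated as inclusive (assumption)
    for upper, label in ((1.50, 1), (2.50, 2), (3.50, 3), (4.50, 4)):
        if score <= upper:
            return label
    return 5


# Example: the sample means for Honesty/Humility and Disintegration in Table 4
classes = [score_to_class(s) for s in (3.57, 2.07)]
```

With these boundaries, most of the 1 to 5 range collapses into the three middle classes, which is worth keeping in mind when reading the class distributions.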

Random forest is an ensemble ML algorithm consisting of basic models called decision trees, whose individual predictions are combined. Each tree returns a predicted class for the same classification problem, and the class that most trees vote for is returned as the prediction of the ensemble and as the final outcome of the algorithm. RF also enables the calculation of feature importance by counting the number of times each variable is selected by the individual trees in the ensemble. Unlike other nonlinear classifiers, RF ML is robust to over-fitting (performing very well on the data used for training and poorly on more general data) and yields good classification results even without extensive tuning of the algorithm parameters (Breiman, 2001; IJzerman et al., 2016; Shen et al., 2007; Zhou et al., 2019). RF ML was also used to estimate variable importance.
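The voting and counting steps described above can be illustrated with a minimal Python sketch. It does not train any trees; it only shows how already-made tree predictions are combined and how variable selections are tallied. Both function names, the tie-breaking rule, and the example inputs are hypothetical, and importance measures reported by RF software packages may be defined differently.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by most trees.

    Ties are broken in favor of the smallest class label, which is an
    arbitrary choice made for this illustration.
    """
    counts = Counter(tree_predictions)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def selection_counts(features_used_per_tree):
    """Count how often each feature was selected across all trees."""
    counts = Counter()
    for used in features_used_per_tree:
        counts.update(used)
    return counts


# Five hypothetical trees voting on one subject's trait class
ensemble_class = majority_vote([3, 4, 4, 3, 4])

# Hypothetical lists of split variables for three trees
counts = selection_counts([["sdnn", "Ek_mean"], ["sdnn", "LFHF"], ["Ek_mean", "sdnn"]])
top_features = counts.most_common(2)
```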

Parameters were split into three groups, and RF was applied to all parameters with the HRV index (62 overall), to all parameters without the HRV index (61), and to clinically relevant parameters only (34). As clinically relevant parameters, we considered the HRV-based features except for the triangular index (16), temporal features (8 × 2), and Ek (2). Each dataset was divided into a training and a testing set (75% and 25% of data, respectively; Attia et al., 2019). We used the R function createDataPartition, which splits the data randomly while taking the balance of the class distribution into account. We further applied 10-fold cross-validation on the training set using the trainControl function, which provided an overall accuracy estimate (Ross et al., 2009).
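The idea of a class-balanced 75%/25% split followed by 10-fold cross-validation on the training part can be sketched as follows. This is a plain-Python illustration of the general technique, not a reimplementation of createDataPartition or trainControl; the function names, the rounding rule, the seed, and the class counts are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split sample indices into train/test while keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Keep at least one training sample per class (illustrative rule)
        n_train = max(1, round(train_fraction * len(idxs)))
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)


def k_folds(indices, k=10, seed=0):
    """Partition indices into k folds of nearly equal size."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# Hypothetical class labels for 72 subjects
labels = [3] * 40 + [4] * 24 + [2] * 8
train_idx, test_idx = stratified_split(labels)
folds = k_folds(train_idx, k=10)
```

In each cross-validation round, one fold serves as the validation set and the remaining nine are used for fitting; the held-out 25% is touched only for the final accuracy estimate.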

For the RF ML application, we tuned the number of decision trees used in the forest (ntree) and the number of random variables used in each decision tree (mtry) with the Caret tuning procedure, to minimize the effect of these parameters on the final accuracy (Brownlee, 2016). We report mean classification accuracies and confidence intervals.

For personality traits with accuracies ≥75%, the 10 most important features were plotted for the three sets of parameters. We used the varImp function from the Caret package to rank features by importance. Furthermore, to assess the degree of association between the test scores (both original and mapped into categories) and the top 10 features, as in Melillo et al. (2015), we used the Spearman correlation coefficient and identified the statistically significant correlations as suggested before (Koelsch et al., 2007; Minoretti et al., 2006). Significance thresholds were set at p = .05, .01, and .001.
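For readers unfamiliar with it, the Spearman coefficient is the Pearson correlation computed on ranks, with tied values receiving their average rank. The Python sketch below shows this construction; the function names and the example data are hypothetical, and the significance test used to obtain p values is not included.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up trait scores and feature values for five subjects
rho = spearman_rho([3.2, 3.9, 2.8, 4.1, 3.5], [0.41, 0.47, 0.40, 0.52, 0.45])
```

Because only ranks enter the calculation, the coefficient is insensitive to monotonic rescaling of either variable, which suits ordinal class labels as well as the original scores.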

3 RESULTS

Descriptive statistics for all personality measures are shown in Table 4.

TABLE 4. Descriptive statistics (N = 71)

| Personality traits | M | SD | Range | Skew | Kurt |
|---|---|---|---|---|---|
| Honesty/Humility | 3.57 | 0.66 | 1.69–4.88 | −0.54 | 2.98 |
| Emotionality | 3.47 | 0.67 | 1.44–4.88 | −0.22 | 3.24 |
| Extraversion | 3.34 | 0.71 | 1.56–4.50 | −0.37 | 2.79 |
| Agreeableness | 3.13 | 0.71 | 1.63–4.75 | −0.22 | 2.72 |
| Conscientiousness | 3.67 | 0.67 | 1.88–4.94 | −0.42 | 2.67 |
| Openness | 3.85 | 0.59 | 1.81–4.88 | −0.97 | 4.14 |
| Disintegration | 2.07 | 0.50 | 1.10–3.81 | 0.97 | 4.87 |

Abbreviations: Kurt, kurtosis; M, mean; SD, standard deviation; Skew, skewness.

Table 5 presents the mean classification accuracies, with 95% confidence intervals, obtained when the RF ML algorithm was run with 10-fold cross-validation for all seven personality traits, using all features and using only clinically relevant features. Classification accuracies for the special case (without the HRV index) are also presented (Table 5).
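The text does not state how the 95% confidence intervals were computed. One common choice for a classification accuracy is the exact (Clopper-Pearson) binomial interval, and the Python sketch below shows how such an interval can be obtained by bisection. The choice of method, the function names, and the example counts (12 correct out of 16 test cases) are assumptions made purely for illustration.

```python
from math import comb


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def _solve(f, target, increasing, iters=60):
    """Find p in [0, 1] with f(p) = target for a monotone f, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, level=0.95):
    """Clopper-Pearson interval for k successes in n trials."""
    alpha = 1 - level
    # Lower bound solves P(X >= k | p) = alpha/2; this tail grows with p
    lower = 0.0 if k == 0 else _solve(
        lambda p: binom_tail_ge(k, n, p), alpha / 2, increasing=True)
    # Upper bound solves P(X <= k | p) = alpha/2; this tail shrinks as p grows
    upper = 1.0 if k == n else _solve(
        lambda p: 1 - binom_tail_ge(k + 1, n, p), alpha / 2, increasing=False)
    return lower, upper


# Hypothetical example: 12 correct classifications out of 16 test cases
ci_low, ci_high = exact_binomial_ci(12, 16)
```

For these hypothetical counts the interval spans roughly 48% to 93%, which illustrates how wide an accuracy interval becomes when the test set contains only a few subjects.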

TABLE 5. Mean classification accuracies for personality traits using all features and only clinically relevant features

| Trait | All features (62): mean accuracy [%] | All features (62): 95% confidence interval | All features without HRV index (61): mean accuracy [%] | All features without HRV index (61): 95% confidence interval | Clinically relevant features (34): mean accuracy [%] | Clinically relevant features (34): 95% confidence interval |
|---|---|---|---|---|---|---|
| Honesty/Humility | 75.0 | 47.6–92.7 | 75.0 | 47.6–92.7 | 75.0 | 47.6–92.7 |
| Emotionality | 31.3 | 11.0–58.7 | 37.5 | 15.2–64.6 | 43.8 | 19.8–70.1 |
| Extraversion | 35.3 | 14.2–61.7 | 52.9 | 27.8–77.0 | | |
