Impact of ECG data format on the performance of machine learning models for the prediction of myocardial infarction

Background: We aim to determine which electrocardiogram (ECG) data format is optimal for machine learning (ML) modelling in the context of myocardial infarction prediction. As an auxiliary objective, we also evaluate the viability of using digitised ECG signals for ML modelling.

Methods: Two ECG arrangements were used, displaying 10 s and 2.5 s of data for each lead. For each arrangement, conservative and speculative data cohorts were generated from the PTB-XL dataset, containing 8,358 and 11,621 ECGs, respectively. All ECGs were represented in three data formats: Signal ECGs, Image ECGs, and Extracted Signal ECGs. ML models were trained on each of the three data formats in both cohorts.

Results: For ECGs that contained 10 s of data, Signal and Extracted Signal ECGs were optimal and statistically similar, with AUCs [95% CI] of 0.971 [0.961, 0.981] and 0.974 [0.965, 0.984], respectively, for the conservative cohort, and 0.931 [0.918, 0.945] and 0.919 [0.903, 0.934], respectively, for the speculative cohort. For ECGs that contained 2.5 s of data, the Image ECG format was optimal, with AUCs of 0.960 [0.948, 0.973] and 0.903 [0.886, 0.920] for the conservative and speculative cohorts, respectively.

Conclusion: When available, Signal ECG data should be preferred for ML modelling. When it is not, the optimal format depends on the data arrangement within the ECG: if the Image ECG contains 10 s of data for each lead, the Extracted Signal ECG is optimal; if it contains only 2.5 s, the Image ECG itself is optimal for ML performance.
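The results above are reported as AUCs with 95% confidence intervals. The abstract does not state how those intervals were computed, so the following is only an illustrative sketch of one common approach, a percentile bootstrap over the test set. The function names (`average_ranks`, `auc`, `bootstrap_auc_ci`) and the parameters `n_boot`, `alpha`, and `seed` are hypothetical and are not taken from the paper.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)            # highest rank in each tie group
    first_rank = last_rank - counts + 1      # lowest rank in each tie group
    return ((first_rank + last_rank) / 2.0)[inverse]


def auc(y_true, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    y_true = np.asarray(y_true, dtype=bool)
    ranks = average_ranks(np.asarray(scores, dtype=float))
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) confidence interval.

    Assumption: cases are resampled with replacement; resamples that contain
    only one class are skipped because AUC is undefined for them.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n = y_true.size
    resampled = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if y_true[idx].all() or not y_true[idx].any():
            continue
        resampled.append(auc(y_true[idx], scores[idx]))
    lower, upper = np.quantile(resampled, [alpha / 2, 1 - alpha / 2])
    return auc(y_true, scores), float(lower), float(upper)
```

Given an array of true labels and an array of model scores for a held-out set, `bootstrap_auc_ci(labels, scores)` returns the point estimate followed by the lower and upper bounds. Overlapping intervals between two formats are consistent with, but do not by themselves establish, the "statistically similar" wording used in the results.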
