Issues in the automated classification of multilead ECGs using heterogeneous labels and populations

Objective. The standard twelve-lead electrocardiogram (ECG) is a widely used tool for monitoring cardiac function and diagnosing cardiac disorders. The development of smaller, lower-cost, and easier-to-use ECG devices may improve access to cardiac care in lower-resource environments, but the diagnostic potential of these devices is unclear. This work explores these issues through a public competition: the 2021 PhysioNet Challenge. In addition, we explore the potential for boosting performance through a meta-learning approach. Approach. We sourced 131,149 twelve-lead ECG recordings from ten international sources. We posted 88,253 annotated recordings as public training data and withheld the remaining recordings as hidden validation and test data. We challenged teams to submit containerized, open-source algorithms for diagnosing cardiac abnormalities from various ECG lead combinations, including the code for training their algorithms. We designed an evaluation metric that captures the risks of different misdiagnoses for 30 conditions and used it to score the algorithms. After the Challenge, we implemented a semi-consensus voting model over all working algorithms. Main results. A total of 68 teams submitted 1,056 algorithms during the Challenge, providing a variety of automated approaches from both academia and industry. The performance differences across the lead combinations were smaller than the performance differences across the test databases, showing that generalizability posed a larger challenge to the algorithms than the choice of ECG leads. The voting model improved performance by 3.5%. Significance. The use of different ECG lead combinations allowed us to assess the diagnostic potential of reduced-lead ECG recordings, and the use of data from different sources allowed us to assess the generalizability of the algorithms across institutions and populations. The submission of working, open-source code for both training and testing, together with a novel evaluation metric, improved the reproducibility, generalizability, and applicability of the research conducted during the Challenge.
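The abstract does not detail how the post-Challenge voting model combines the working algorithms. A minimal sketch of one plausible reading, assuming each algorithm outputs a binary label vector over the 30 scored classes and that a class is reported when a chosen fraction of algorithms agree (a simple per-class majority when the threshold is 0.5), is shown below; the function name and threshold are illustrative, not the authors' implementation.

```python
import numpy as np

def consensus_vote(predictions, threshold=0.5):
    """Combine binary multilabel predictions from several algorithms.

    predictions: array of shape (n_algorithms, n_classes) with 0/1 entries,
        one row per working algorithm for a single ECG recording.
    threshold: fraction of algorithms that must agree for a class to be
        predicted positive (0.5 gives a simple majority vote).

    Returns a 0/1 vector of shape (n_classes,).
    """
    predictions = np.asarray(predictions)
    agreement = predictions.mean(axis=0)          # per-class fraction of positive votes
    return (agreement >= threshold).astype(int)   # consensus label per class

# Example: three hypothetical algorithms voting on 5 of the 30 scored classes.
votes = [
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
]
print(consensus_vote(votes))  # -> [1 1 0 1 0]
```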
