Artificial intelligence enabled ECG screening for left ventricular systolic dysfunction: a systematic review

We identified 7 different AIeECG algorithms (Table 1) in this first comprehensive review of studies using AIeECG algorithms to screen for LVSD. Despite investigating various study populations and using different LVEF thresholds to define LVSD, the majority of studies obtained a high diagnostic accuracy on external validation. Overall, AIeECG seems to be a robust and potentially universal tool to screen for LVSD, whose performance could improve when combined with clinical characteristics such as gender and comorbidities as well as NT-proBNP.

AIeECG screening for LVSD

Overall, AIeECG had a high diagnostic value when screening for LVSD, with a median AUC of 0.90 (IQR 0.85 to 0.95). The median sensitivity of 83.3% implies a low number of false-negative screens, while the high specificity of 87% implies a low number of false-positive screens.
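How many positive screens are true cases depends on the prevalence of LVSD in the screened population as well as on sensitivity and specificity. The following minimal sketch applies Bayes' rule to the sensitivity and specificity quoted above. The function name and the three prevalence values are illustrative assumptions and are not taken from any of the included studies.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary screening test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY = 0.833  # median sensitivity quoted in the text
SPECIFICITY = 0.87   # specificity quoted in the text

# Hypothetical prevalences, chosen only for illustration.
for prevalence in (0.02, 0.10, 0.30):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions, the positive predictive value is roughly 12% at a prevalence of 2% and roughly 73% at a prevalence of 30%, which illustrates why results from high-prevalence populations do not transfer directly to screening of the general population.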

The studies examined different populations with different prevalences and outcome definitions (Table 1), making direct comparison difficult. Most consistently, there appears to be higher AUC, sensitivity, and specificity in the studies examining hospital populations. A higher AUC may be explained by populations with a few severe cases of LVSD and a large number of patients admitted with non-cardiac conditions.

Risk of bias and certainty of evidence

Retrospective designs were dominant as these are obviously easier to complete. Furthermore, most algorithms were tested in selected populations which is a drawback since results cannot directly translate to future value in clinical practice with consecutive patients. Accordingly, the prevalence of LVSD in the tested populations was higher than in the general population. Selection bias occurs when including patients who already had an ECG and echocardiogram performed, as it selects higher-risk patients who already had an indication for an echocardiogram. This could result in a better performance than may be observed in a prospective study.

Truly external validation was only applied to one algorithm by Attia et al. [12] as this was the only algorithm to be tested in other studies with separate populations [13,14,15,16,17, 25, 26]. More than half of the algorithms were not validated in a truly external dataset as the same dataset was used for both development and validation (the same dataset was split into a training, internal validation, and external validation group) [20,21,22, 24]. The remaining studies used different datasets for training/validation and external validation, respectively, but the majority of algorithms were not truly externally validated [18, 19, 23].

Despite all the differences between the populations used for the development of algorithms, a consistent high AUC was shown by several groups, not just by one dedicated group in a selected population. Therefore, AIeECG has the potential to become a widespread and useful technique in clinical practice in the future.

Clinical characteristics

Six of the included studies [12,13,14, 17, 18, 25] performed sub-analyses according to comorbidity, age, and sex (Table 2). AI diagnostic performance seemed lower in populations with comorbidity (Table 2), but data lacked power for a formal statistical analysis. Populations without comorbidities were associated with a higher AUC in two out of three studies [12, 17]. Overall, gender did not affect the algorithms, although a single study found that the algorithm performed significantly better in men [17]. The algorithms seemed to perform better in the younger populations, maybe due to less comorbidity, but this was only reported as significant in two of the five studies [17, 18]. It is a major limitation that the impact of ethnicity on the performance of the algorithms was not investigated, because several studies have shown that different ethnicities display different ECG characteristics. Therefore, ethnicity-specific ECG reference ranges/cut-offs are pertinent to investigate [28,29,30].

It is a strength regarding screening that the performance of AIeECG was not strongly associated with gender and age, as the overall performance was strong in these subgroups. However, only a few studies investigated this, so it needs to be examined further in future studies.

Potential for improvement

In clinical practice, more useful information beyond the ECG is available for the clinician and for an AI algorithm. Therefore, the diagnostic value of AIeECG combined with demographics and clinical information should be further tested.

Very few studies compared their AIeECG algorithm with the performance of BNP or NT-proBNP measurements as a screening method for HF [12, 13, 16, 22, 26]. Two studies found that their AIeECG algorithm outperformed natriuretic peptide measurement when identifying reduced LVEF [12, 22]. The addition of NT-proBNP to the AIeECG marginally improved detection of LVSD [13, 26] and resulted in a higher specificity and fewer false-positive screen cases, without increasing the number of false-negative screenings [16].

We cannot yet conclude that AIeECG outperforms BNP and NT-proBNP measurement as a screening method, but it seems that AIeECG may be more stable across age and gender than BNP [12, 18, 31], and a combination might therefore be the optimal way of screening.

Detection vs. prediction

Besides detecting LVSD with a high diagnostic accuracy, some of the algorithms also predict future LVSD or HF events.

In three studies [12, 23, 25], patients with a false positive AIeECG at the time of screening had a significantly increased risk of subsequently developing LVSD compared to patients with a true negative AIeECG screening. Another study found that patients with a false positive AIeECG were more susceptible to major adverse cardiovascular events (HR 1.5) compared to patients with a true negative AIeECG [24].

In one of the studies, the algorithm was able to predict newly emerging HF pathology, as well as aggravating cardiac pathology by detecting subtle changes in a newly recorded ECG compared to a previously recorded ECG [20].

These findings support the notion that AI algorithms detect subtle or subclinical ECG changes that are associated with risk of future LVSD. This notion was corroborated in a study screening for HF with preserved ejection fraction (HFpEF) where a significant proportion of false positives remarkably developed HFpEF during follow-up [32]. The AIeECG has the ability to predict a wide range of pathologies, even simultaneously, such as HFpEF, right ventricular dysfunction and more [20, 23, 24, 32, 33]. Hence, the gain of AIeECG is most likely greater than we demonstrated in this review.

Clinical implications of AIeECG

With the emerging AIeECG technology, the idea of screening for LVSD is worth revisiting, especially in combination with BNP/NT-proBNP and basic clinical information. The health expenditures for patients diagnosed with HF are expected to rise in the coming years [34]. Lifesaving treatment that prevents the development of HF, hospitalization, and death is available if early identification is possible [12].

Clinical implication in regard to early diagnosis of low EF has been investigated in the recently published EAGLE trial [1] which substantiates that the concept of screening in the primary sector is viable. The study found that the use of an AIeECG algorithm increased the diagnosis of low EF in the overall cohort (1.6% in the control arm versus 2.1% in the intervention arm), suggesting a modest but significant gain from using AIeECG. Importantly, the use of AIeECG was not associated with an overall increased use of echocardiography, but instead an increased use of echocardiography on more relevant patients.
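To put the two diagnosis rates quoted above into perspective, the following sketch converts them into an absolute gain, a relative gain, and the approximate number of patients who would need to be managed with AIeECG support for one additional diagnosis of low EF. The function name is hypothetical, and the derived figures are simple arithmetic on the rounded percentages given in the text, not results reported by the trial.

```python
def screening_gain(control_rate, intervention_rate):
    """Summarize the difference between two diagnosis rates (as fractions)."""
    absolute_gain = intervention_rate - control_rate
    return {
        "absolute_gain": absolute_gain,
        "relative_gain": absolute_gain / control_rate,
        # Patients managed with AIeECG per additional diagnosis.
        "patients_per_extra_diagnosis": 1 / absolute_gain,
    }


# Rates quoted in the text: 1.6% (control arm) versus 2.1% (intervention arm).
gain = screening_gain(control_rate=0.016, intervention_rate=0.021)
for name, value in gain.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the absolute gain is 0.5 percentage points, the relative gain is about 31%, and about 200 patients are managed per additional diagnosis. Because the published percentages are rounded, these values are approximate.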

Notably, AIeECG had the highest value for primary care physicians and the lowest value when applied in hospital wards. These findings suggest that AIeECG may be most useful for clinicians who are not routinely interpreting ECGs, and less useful in settings where echocardiograms are performed routinely. However, a clear distinction does not exist, as Katsushika et al. [21] showed that AIeECG helped even cardiologists with > 7 years’ experience.

ECG screening for LVSD is potentially feasible, but cost-effectiveness and clinical implications are yet to be fully investigated [35]. So far, only two studies have investigated the cost-effectiveness of screening for LVSD [25, 27]. Under most clinical scenarios, screening was cost-effective with a cost of < $50,000 per QALY [27]. It was estimated that the number needed to screen to identify one case of LVSD corresponds to 90.7 AIeECGs and 8.8 echocardiograms when screening the total population, but that this could be reduced to 67.4 AIeECGs and 5.6 echocardiograms when screening a “high risk” population [25]. Cost-effectiveness increased with higher disease prevalence and better sensitivity of the AIeECG method. Thus, cost-effectiveness can possibly be improved if screening is applied to subjects with preexisting cardiovascular risk factors, abnormal natriuretic peptide, diabetes mellitus, hypertension, and ischemic heart disease [27].
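The numbers needed to screen quoted above can be re-expressed as yields per test, which makes the two scenarios easier to compare. The sketch below does this under the assumption that echocardiograms were performed only after a positive AIeECG, so that the ratio of the two numbers approximates the fraction of screened subjects referred for echocardiography. The function and key names are hypothetical.

```python
def screening_yield(ecgs_per_case, echos_per_case):
    """Convert numbers needed to screen into per-test yields."""
    return {
        # Assumes every echocardiogram follows a positive AIeECG.
        "echo_referral_fraction": echos_per_case / ecgs_per_case,
        "cases_per_echo": 1 / echos_per_case,
        "cases_per_ecg": 1 / ecgs_per_case,
    }


# Numbers needed to screen quoted in the text [25].
scenarios = {
    "total population": screening_yield(90.7, 8.8),
    "high risk": screening_yield(67.4, 5.6),
}

for label, result in scenarios.items():
    summary = ", ".join(f"{key} {value:.1%}" for key, value in result.items())
    print(f"{label}: {summary}")
```

Under this assumption, roughly 11% of echocardiograms identify a case in the total population compared with roughly 18% in the “high risk” population, consistent with the observation that cost-effectiveness increases with disease prevalence.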

Future use of AIeECG

AIeECG algorithms will most likely be implemented in ECG machines in a few years. A new study has even shown that AIeECG can be built into a stethoscope to detect low EF during cardiac auscultation [36]; the possibilities are plentiful. One attribute of AIeECG is the high specificity, which may guide physicians to refer patients with abnormal AIeECG findings and increase the likelihood of identifying the patients at the highest risk.

In the future, multiple groups will be able to produce effective algorithms, but standardization is required to compare the effectiveness of algorithms. One solution could be external validation of algorithms in a multinational common dataset of paired ECGs and outcome measures.

We anticipate that a breakthrough for these algorithms will occur when they are combined with other risk markers, possibly natriuretic peptides, and tested prospectively including management actions based on the screening results. Success may be defined when a 15–20% reduction of HF hospitalization or all-cause mortality is demonstrated in comparison with standard of care. But less ambitious goals such as demonstrating a reduction in the cost per identified patient with LVSD are also valid as it leads to more rational use of resources.

Strengths and limitations

The strengths of this literature study are the simple, well-defined research questions and the clinically focused literature search, consisting of both MeSH terms and free-text words. The study followed PRISMA guidelines for reporting results, but some limitations of this work must be acknowledged when considering the findings. We searched the PubMed and Cochrane databases and only included peer-reviewed studies of high quality to focus on clinical aspects rather than technical differences between the algorithms. We focused solely on LVSD and a 12-lead ECG, which could have led to exclusion of otherwise relevant studies that examined AIeECG based on one- or three-lead electrocardiograms. Furthermore, although it is a limitation that we did not use a formal “risk-of-bias assessment tool,” we aimed to minimize the risk of bias by using strict study selection criteria.

Due to the small number of studies and the lack of statistical power, we summarized data instead of performing formal statistical tests. Many more studies and algorithms will without doubt emerge over the next years, allowing for more precise estimates of accuracy. We suspect such results will most likely lie within the range reported here. Still, the main objective of our review was to examine whether AIeECG works generically and to point at strengths and possibilities for improvement.
