Automatic detection of cardiac conditions from photos of electrocardiograms captured by smartphones

Introduction

In recent years, rapid advances in artificial neural network technologies have transformed many industries, including cardiology.1–3 Researchers using machine learning approaches have developed increasingly accurate ECG diagnostic algorithms for detecting conduction system disorders,4–6 myocardial infarction,7 8 valvular disease,9–11 cardiomyopathies,12 13 heart failure with reduced ejection fraction,14 electrolyte disturbances15 and accessory pathways,16 and for predicting mortality risk.17 However, most of these diagnostic algorithms cannot be used in real-world clinical practice because most older generation ECG machines do not permit installation of new diagnostic systems.

One possible way to enable real-world use of these diagnostic algorithms is to develop a smartphone application that automatically extracts ECG waveforms from photos and converts them to voltage-time series for downstream analysis by the variety of classification systems built by researchers.

Previous attempts to extract waveforms from scanned ECGs using computer vision-based approaches often failed to achieve fully automatic waveform extraction and required extensive manual intervention by users.18–22 Recent advances in deep learning-based object detection and image segmentation techniques have made it possible to achieve fully automatic waveform extraction from photos of ECGs taken with a smartphone.

The objective of this study was to develop and validate DigitHeart, a smartphone application that uses deep learning-based object detection and image segmentation techniques to automatically extract ECG waveforms from ECG images. The extracted voltage-time series data were further analysed by machine learning-based diagnostic algorithms. The accuracies of the waveform extraction and of the diagnostic algorithm were evaluated.

Methods

Raw dataset

Patients with atrial fibrillation (AF) were drawn from the AF registry of Queen Mary Hospital, University of Hong Kong and included in this study. Patients were identified using the Clinical Data Analysis and Reporting System (CDARS) of the Hospital Authority, Hong Kong, and International Classification of Diseases, Ninth Revision code 427.3. Scanned grayscale ECGs for the period 2000–2020 and colour-photographed ECGs were acquired. Photographed ECGs were captured with the 12-megapixel camera of an iPhone X (Apple, USA). All ECGs were anonymised by masking the patient's identity before use. Clinical data of patients were acquired using CDARS.

Smartphone application and deep learning models for ECG waveform extraction

The DigitHeart system is a cloud-based smartphone application for automatically extracting waveforms from images of ECGs and analysing them using machine learning-based ECG diagnostic algorithms. To use the system, a clinician photographs ECGs with the smartphone application, which uploads them to the server for further processing. The backend server processes the images in three steps: first, an object detection model locates and labels the waveforms on the ECG. Second, an image segmentation model removes gridlines and other background noise. Finally, a novel scale calibration technique is applied to the voltage marker to derive the voltage-time series data. The digitised ECG is then passed to a diagnostic algorithm for analysis, and the diagnosis is returned to the application on the smartphone (figure 1, online supplemental table S1). Key hardware of the backend server includes an Intel Core i7-8086K processor (Intel, USA), an ASUS GeForce RTX 2080 Ti O11G graphics card (Asus, Taiwan) and 16 GB of random-access memory.

Figure 1

DigitHeart is an artificial intelligence-powered system trained for fully automatic ECG waveform extraction from scanned and photographed ECGs, enabling the generation of diagnoses. A clinician first uploads the ECG photo via a smartphone application to the cloud server. In the server, an object detection model locates and labels the waveforms on the ECG. Second, an image segmentation model removes gridlines and other background noise. Finally, a novel scale calibration technique is applied to the voltage marker to derive the voltage-time series data. The digitised ECG waveforms are then sent for further analysis by the diagnostic algorithm, and the final diagnosis is returned to the clinician.

Training dataset

Machine learning models were trained with a paired training dataset generated using computer vision techniques, described in detail in the following sections. A web-based graphical user interface was used to manually review and edit the extracted waveforms to enhance the quality of the training data. Django 2.2.1 was used as the server framework to store all original and annotated datasets. Only scanned ECGs were used for training.

Waveform localisation and labelling

To generate a training dataset for locating and labelling the leads, technicians manually labelled all waveforms on 6059 scanned ECGs using our web-based data annotation platform (figure 2A). The dataset was augmented by random rotations, brightness and contrast adjustment and noise addition to compensate for the limited sample size and to reduce overfitting in training (figure 2B). An object detection neural network was trained on the augmented dataset to locate and label ECG lead segments. To minimise the effect of suboptimal lighting due to shading and glare, photographed ECGs underwent local background subtraction with median filtering using a large-sized filter, followed by normalisation with OpenCV 4.1.0.25, before being processed by the object detection model (figure 2C). TensorFlow GPU V.1.15 and the TensorFlow Object Detection API were used for this purpose, employing Faster R-CNN with Inception V2 as the backbone. The model was trained with a momentum optimiser and a learning rate that declined progressively from 2×10⁻⁴ to 1×10⁻⁷.
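As an illustration, a minimal OpenCV sketch of this preprocessing step follows. The filter size and exact operations are assumptions for illustration; the study's actual parameters are not published.

```python
import cv2
import numpy as np

def flatten_illumination(gray: np.ndarray, ksize: int = 51) -> np.ndarray:
    """Estimate slowly varying shading/glare with a large median filter,
    subtract it out, then stretch the result back to the full 8-bit range."""
    background = cv2.medianBlur(gray, ksize)     # ksize must be odd and large
                                                 # enough to blur out the trace
    foreground = cv2.subtract(background, gray)  # dark ink becomes bright signal
    return cv2.normalize(foreground, None, 0, 255, cv2.NORM_MINMAX)
```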

Figure 2

Waveform localisation, cropping and labelling. (A) To generate a training dataset, technicians manually labelled all waveforms on 6059 scanned ECGs using our web-based data annotation platform. (B) The manually generated dataset was augmented by random rotations, brightness and contrast adjustment and noise addition to increase the effective number of training samples, enhance neural network training and reduce overfitting. (C) Using the augmented dataset, an object detection neural network was trained to automatically locate and label ECG lead segments using TensorFlow GPU V.1.15 and the TensorFlow Object Detection API. The model used Faster R-CNN with Inception V2 as the backbone. (D) Photographed ECGs had suboptimal lighting compared with scanned ECGs due to shading and glare that affected waveform extraction. Local background subtraction with median filtering using a large-sized filter was applied, followed by normalisation with OpenCV 4.1.0.25, to remove shading and glare.

Gridlines and background noise removal

To generate a training dataset for the segmentation model used in waveform extraction, we developed a 'sequential addition approach' to isolate the waveform signal from gridlines and noise. This technique is the opposite of the more straightforward 'one-off subtraction approach', which removes background noise in a single pass by setting a specific colour intensity threshold or using other binarisation tools. In our iterative process, a strict initial cut-off for binary thresholding in OpenCV 4.1.0.25 is first applied to the raw image, yielding an output with the fewest pixels from both the waveform and the background noise. Horizontal dilatation is then performed to preferentially connect pixels belonging to waveforms rather than background noise. Only connected components of sufficient horizontal length, which are likely waveforms, are added to a final output mask; the remainder are discarded. The process is then repeated with a progressively more lenient cut-off, which outputs more pixels from both waveforms and noise, and the same horizontal dilatation and width-based component filtering identifies new regions of the waveform to add to the final output mask. This approach preserves as much of the waveform signal as possible while minimising the inclusion of background noise and gridlines. Given the significant variations in scanning quality, brightness and contrast among ECGs from different hospitals and units, a single final cut-off value applicable to all ECGs proved ineffective. Consequently, multiple final cut-off values were applied to each ECG to generate multiple output variants, which were assessed manually to select the single most optimal image that minimised gridline presence while preserving the integrity of the ECG signals (figure 3A). Additional data augmentation, such as random rotations, brightness and contrast adjustment and noise addition, was applied to increase the effective training size. A UNet image segmentation model with ResNet-101 as the feature extractor was trained on this dataset,23 using the Adam optimiser with an initial learning rate of 1×10⁻⁵ (figure 3B).
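A minimal sketch of the iterative loop is shown below. The cut-off values, dilation width and minimum component width are illustrative assumptions rather than the study's actual parameters. Note that with OpenCV's inverse binary thresholding, the strictest pass (fewest pixels, darkest ink only) corresponds to the lowest cut-off, so the loop relaxes the threshold upwards.

```python
import cv2
import numpy as np

def sequential_addition(gray: np.ndarray,
                        cutoffs=(60, 90, 120, 150),   # strictest to most lenient
                        min_width: int = 40) -> np.ndarray:
    """Iteratively relax the binarisation cut-off; at each pass keep only
    horizontally elongated connected components (likely waveform) and add
    their true ink pixels to the output mask."""
    mask = np.zeros_like(gray)
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 1))  # horizontal dilation
    for t in cutoffs:
        _, binary = cv2.threshold(gray, t, 255, cv2.THRESH_BINARY_INV)
        dilated = cv2.dilate(binary, h_kernel)
        n, labels, stats, _ = cv2.connectedComponentsWithStats(dilated)
        for i in range(1, n):                          # label 0 is the background
            if stats[i, cv2.CC_STAT_WIDTH] >= min_width:
                mask[(labels == i) & (binary > 0)] = 255  # add only true ink pixels
    return mask
```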

Figure 3

Gridlines and background noise removal. (A) Since ECG tracings were darker than the background gridlines, the 'sequential addition approach' with OpenCV 4.1.0.25 was used to filter out the gridlines, as described in the main text. Multiple cut-off values for binary thresholding were applied to each ECG to generate multiple variants. Technicians manually evaluated the differently thresholded ECG images and selected the version with the least remaining gridlines and the best-preserved ECG signals. (B) An image segmentation model with a UNet architecture and ResNet-101 as the backbone was trained on the prepared dataset to automatically remove gridlines and background noise. (C) Voltage markers for scale calibration. The height of the voltage marker was measured using template matching with OpenCV 4.1.0.25. Briefly, six kernels of 3×3 pixels were used to represent six features of the voltage marker: the four corners and the two blind-ended sides. Each kernel was convolved over the entire image, and the location with the minimal sum of absolute differences was taken as the most likely location of each feature. The height of the voltage marker in pixel-units was measured accordingly. Voltage markers have a fixed height of 10 mm on standard ECGs; by measuring their height in the images, pixel-units could be converted to voltage-units and time-units using conversion ratios of 10 mm/mV and 25 mm/s.

Voltage markers for scale calibration

In the previous steps, ECG waveforms were extracted in pixel-units. These signals had to be converted to voltage-units and time-units before they could be analysed by ECG diagnostic algorithms. A novel approach was developed to achieve scale calibration using the voltage markers printed on ECGs, which have a fixed height of 10 mm on standard ECGs. By measuring the height of the voltage markers in the images, it was possible to convert pixel-units to voltage-units and time-units using conversion ratios of 10 mm/mV and 25 mm/s. To create a training set for the object detection model to locate the voltage markers, technicians manually located and labelled the markers on the ECGs (figure 2A), and the waveform localisation and labelling model was additionally trained on these data to localise the voltage markers. Once a voltage marker was detected, its height was automatically measured using a template matching technique in OpenCV 4.1.0.25. Briefly, six kernels of 3×3 pixels were used to represent six features of the voltage marker: the four corners and the two blind-ended sides. Each kernel was convolved over the entire image, and the location with the minimal sum of absolute differences was taken as the most likely location of each feature. The height of the voltage marker in pixel-units was measured accordingly (figure 3C).
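The calibration arithmetic can be sketched as follows. OpenCV's TM_SQDIFF (sum of squared differences) is used here as a stand-in for the sum of absolute differences described above, and the template values are purely illustrative.

```python
import cv2
import numpy as np

def locate_feature(image: np.ndarray, kernel: np.ndarray) -> tuple:
    """Return the (x, y) position where the 3x3 template matches best
    (minimum difference score over the whole image)."""
    scores = cv2.matchTemplate(image.astype(np.float32),
                               kernel.astype(np.float32), cv2.TM_SQDIFF)
    return cv2.minMaxLoc(scores)[2]          # minLoc of the score map

def pixel_scales(marker_height_px: float) -> tuple:
    """Convert pixel-units to mV and seconds using the 10 mm marker,
    10 mm/mV gain and 25 mm/s paper speed."""
    px_per_mm = marker_height_px / 10.0      # marker is 10 mm tall by standard
    mv_per_px = 1.0 / (10.0 * px_per_mm)     # vertical scale, mV per pixel
    s_per_px = 1.0 / (25.0 * px_per_mm)      # horizontal scale, s per pixel
    return mv_per_px, s_per_px
```

For example, a marker measured as 100 pixels tall implies 10 pixels/mm, so one pixel corresponds to 0.01 mV vertically and 4 ms horizontally.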

Testing of performance of ECG digitisation

943 scanned ECGs, randomly selected from the portion of the dataset not used in training, and all 444 photographed ECGs were used to assess the performance of automatic waveform localisation and labelling (figure 4). None of the ECGs selected for testing was included in training or internal validation. Each step of the ECG digitisation process, namely waveform localisation and labelling, gridline and background noise removal and voltage marker scale calibration, was evaluated separately.

Figure 4

To generate a training dataset, waveforms from 6059 scanned ECGs were extracted using computer vision techniques. A web-based graphical user interface was used to manually review and edit the extracted waveforms to enhance accuracy. 8831 randomly selected scanned ECGs and 216 colour-photographed ECGs were reviewed by cardiologists, who assigned one of two labels to the observed cardiac rhythm: sinus rhythm (SR) or atrial fibrillation (AF).

Direct diagnosis derivation from photos of ECGs

A proof-of-concept smartphone application was developed to demonstrate that ECG waveforms extracted using DigitHeart could be used to directly derive a clinical diagnosis (online supplemental figure S1A and online supplemental video 1). As a photographed ECG has distorted geometry and is not perfectly rectangular, a geometric transformation was performed using the Canny edge detector, SmartCropper V.1.2.5 and Pillow V.8.2.0 to convert the photos to a rectangular shape with the aspect ratio of A4 paper (210:297) before further processing (online supplemental figure S1B). Information related to the patient's identity was masked before the image was sent to the server for analysis (online supplemental figure S1C,D,E). The digitised ECG waveforms were fed into an AF classifier based on a combined convolutional and recurrent neural network developed by our group.7 Details of the neural network design are summarised in online supplemental table S2. The accuracy of the diagnostic algorithm in diagnosing AF was evaluated using a dataset with cardiac rhythm labelled by cardiologists as the gold standard.
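The geometric correction can be illustrated with a plain OpenCV perspective warp. This sketch stands in for the SmartCropper/Pillow pipeline and assumes the four page corners have already been recovered, for example from the Canny edge map.

```python
import cv2
import numpy as np

A4_RATIO = 297 / 210                         # A4 height:width

def rectify_page(photo: np.ndarray, corners: np.ndarray,
                 out_width: int = 840) -> np.ndarray:
    """Warp the detected page corners (ordered TL, TR, BR, BL) onto a
    rectangle with A4 proportions."""
    out_height = int(out_width * A4_RATIO)
    target = np.array([[0, 0], [out_width - 1, 0],
                       [out_width - 1, out_height - 1], [0, out_height - 1]],
                      dtype=np.float32)
    m = cv2.getPerspectiveTransform(corners.astype(np.float32), target)
    return cv2.warpPerspective(photo, m, (out_width, out_height))
```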

Human resources utilisation

The human resources required to annotate data for creating the training set were quantified in man-hours. The average number of man-hours required to extract voltage-time signals from a scanned or photographed ECG was calculated by dividing the total number of man-hours involved by the number of ECGs manually annotated.

Statistical analysis

Normality of variables was tested using the Shapiro-Wilk test. Normally distributed and discrete variables are presented as mean±SD and percentages, respectively. Sensitivity, specificity, positive predictive value, negative predictive value and F1 score were calculated to assess the performance of the AF classifier. The F1 score is a metric commonly used for evaluating machine learning models; it is the harmonic mean of precision and recall, ranging from 0 to 1, where 1 indicates perfect classification and 0 the worst possible performance. A softmax activation function was used in the output layer of our neural network for classifying AF versus sinus rhythm. The softmax function transforms a vector of K real numbers, which can be positive, negative or zero, into a vector of K real numbers that sum to 1, which can be interpreted as the probabilities of the outcome classes. With only two possible outcomes, the one with probability >0.5 is taken as the predicted outcome. A receiver operating characteristic curve was not generated, as a varying probability threshold was not available. Fourfold cross-validation was performed to evaluate the accuracy of the AF classifier; this method was chosen because it allows full utilisation of the relatively small set of ECGs with cardiac rhythm manually labelled by independent cardiologists. For each fold, generated by random sampling, the dataset was divided into four equal parts, three of which were used for training the AF classifier and the remaining part for testing its performance. This process was repeated four times, with each part serving as the test dataset once. Similar cross-validation was not used for assessing intermediate tasks, such as waveform localisation and labelling, because only 15% of all available scanned ECGs were involved in model training and a large number of remaining images were available to serve as an internal validation set by random sampling.
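For reference, the softmax transformation over K logits and the F1 score described above can be written in standard notation as:

```latex
\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{k=1}^{K} e^{z_k}}, \quad i = 1, \dots, K,
\qquad
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
    = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}
```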

Results

Data characteristics

10 945 patients with AF from Queen Mary Hospital, Hong Kong, of whom 5469 (50.0%) were male and 5476 (50.0%) were female, were included in this study. At the time of data acquisition, their mean age was 80.4±11.1 years. 40 516 grayscale scanned ECGs from 17 public hospitals and 444 colour-photographed ECGs from 2 public hospitals were included (figure 4).

Performance of ECG digitisation

The majority of the 943 scanned and 444 photographed ECGs were standard 12-lead ECGs, while some had fewer leads. A total of 13 258 scanned leads and 5743 photographed leads were evaluated. Of these, 12 828 of 13 258 (96.8%) scanned and 5399 of 5743 (94.0%) photographed waveforms were correctly localised and labelled (figure 5). For waveform extraction with gridline and background noise removal, success was achieved in 11 604 of 12 735 (91.1%) scanned and 5062 of 5752 (88.0%) photographed lead segments (figure 5). Regarding voltage marker scale calibration, 801 of 905 (88.5%) scanned ECGs had the voltage marker height correctly measured (figure 5).

Figure 5

943 randomly selected scanned ECGs and all 444 photographed ECGs were used to assess the performance of fully automatic waveform extraction. Among them, 12 828 of 13 258 (96.8%) scanned and 5399 of 5743 (94.0%) photographed waveforms were correctly located and labelled, 11 604 of 12 735 (91.1%) and 5062 of 5752 (88.0%) waveforms achieved successful gridline and background noise removal, and 801 of 905 (88.5%) scanned ECGs had the voltage marker height correctly measured.

Human resources utilisation

In the initial phase, during which ECG waveforms were manually extracted by technicians aided by computer vision techniques, 6059 scanned ECGs were processed over 705.5 man-hours (0.12 man-hours per ECG). The fully autonomous DigitHeart system improved the efficiency of ECG digitisation: waveforms from 40 516 scanned and 444 photographed ECGs were extracted automatically, saving about 4768 man-hours (figure 6).

Figure 6

In the initial phase, ECG waveforms were manually extracted by technicians at 0.12 man-hours per ECG. The fully autonomous DigitHeart system improved the efficiency of ECG digitisation: waveforms from 40 516 scanned and 444 photographed ECGs were extracted with full automation, saving about 4768 man-hours. Extrapolating from these data, using DigitHeart to automatically extract waveforms from 1 million ECGs would theoretically save researchers 120 000 man-hours.

Direct diagnosis derivation from photos of ECGs

8831 randomly selected scanned ECGs and 216 colour-photographed ECGs were reviewed by cardiologists, who assigned one of two labels to the observed cardiac rhythm: sinus rhythm or AF. A diagnosis of AF was established if there were no discernible repeating P waves and the interbeat intervals were irregular. 5186 (58.7%) and 3645 (41.3%) scanned ECGs were labelled as AF and sinus rhythm, respectively. 80 (37%) and 136 (63%) photographed ECGs were labelled as AF and sinus rhythm, respectively (online supplemental table S3).

An AF classifier combining convolutional and recurrent neural networks, based on a previous method from our group,7 was used to analyse the selected ECGs (online supplemental table S2). The classifier correctly identified 4807 ECGs as AF and 3561 as sinus rhythm, achieving 91.3% sensitivity, 94.2% specificity, 95.6% positive predictive value, 88.6% negative predictive value and an F1 score of 93.4% using images as input (online supplemental table S4).

Discussion

While human interpretation of ECGs remains the gold standard for diagnosis, increasingly accurate diagnostic algorithms for 12-lead ECGs have been developed with artificial intelligence. These algorithms can identify a wide range of cardiac conditions and predict clinical outcomes.4 5 7–10 12–15 However, deploying these systems on a global scale is challenging because most ECG machines currently in use do not permit installation of new diagnostic algorithms, and replacing all ECG machines with newer generation machines equipped with machine learning-based diagnostic algorithms is impractical from a resource perspective. A potentially more cost-effective route to wide adoption of these newer diagnostic algorithms is to use smartphones as ECG interpreters, which requires a system that automatically extracts ECG waveforms from photos, converts them to voltage-time format and analyses them with machine learning-based diagnostic algorithms. The present study demonstrated the feasibility of automatically extracting ECG waveforms from photos with high accuracy, using custom-designed machine learning models to perform waveform identification, gridline removal and scale calibration. The advantage of this system lies in its simplicity for the clinician: all that is required is a smartphone with a camera, our proprietary application and an internet connection. On the backend, it requires a server with a graphics processing unit powerful enough to run the machine learning models.

Object detection and image segmentation techniques for waveform extraction

In the past, specialised software was developed using traditional computer vision techniques to extract ECG waveforms from scanned ECGs. Methods such as Otsu's algorithm and other thresholding methods achieved grid and background noise removal with modest efficacy,18–22 24 while the Hough transform was used to localise lead segments.24 Nonetheless, these methods required a high degree of manual manipulation, and fully automatic waveform extraction was not possible because ECGs generated by different institutions differed in printing format and scanning quality. The key novelty of this work is to demonstrate the feasibility of using object detection and image segmentation techniques based on artificial neural networks to achieve fully automatic waveform extraction from photos of ECGs taken with a smartphone. To the best of our knowledge, this is the first report of artificial neural networks handling the intermediate steps of ECG waveform extraction: waveform localisation and labelling; gridline and background noise removal; and scale calibration using the voltage markers on ECGs. As the overall capabilities of machine learning systems improve, the performance of ECG waveform extraction from photos is also expected to improve significantly over time.

Signal-based versus image-based approaches

DigitHeart takes an image as input and outputs a waveform signal in voltage-time format, which can then be fed into an ECG diagnostic algorithm. An alternative approach adopted by other researchers is to train diagnostic models directly on paired ECG images and diagnoses, without extracting waveforms in voltage-time format.25 DigitHeart did not adopt this image-based approach and instead focused on the signal-based approach, because the latter enables users to apply a far wider variety of ECG classification algorithms. Significantly more ECG classification algorithms have been trained on voltage-time signals than on images, as signal-based data are much more widely available to researchers for training machine learning algorithms. For instance, image-based AF classifiers are currently limited to convolutional neural networks, whereas newer architectures, such as transformer neural networks, are already widely used in voltage-time signal-based models and may achieve even higher accuracy than our reported combined convolutional and recurrent neural network design.26 27

Cost saving implication

From a clinical standpoint, DigitHeart and similar systems may allow institutions to transition to machine learning-based ECG interpretation without purchasing additional hardware, rather than relying entirely on clinicians for assessment. The cost of interpreting each 12-lead ECG by a general practitioner in the UK has been estimated at approximately £2.28.28 With over 300 million ECGs performed annually worldwide, it has been proposed that the adoption of automatic ECG interpretation could reduce healthcare expenses by £684 million.29 From an academic perspective, DigitHeart allows researchers to convert large troves of historical printed ECGs to digital form for downstream uses such as training machine learning-based diagnostic algorithms. As demonstrated in our study, a fully automatic system for extracting voltage-time data drastically reduces the man-hours required, particularly when the number of ECGs is large (figure 6). Notably, the speed of automatic ECG processing depends on several factors, including the hardware configuration, such as the extent of parallel processing available and the achievable inference speed, as well as the complexity of the computer vision and neural network models.

Limitations

First, although our system was designed to suit a wide range of ECG formats, the training and validation sets were limited to the ECG formats used in Hong Kong public hospitals; training the model on more ECG formats would further enhance the generalisability of our system. Second, our AF classifier served only as a proof of concept to illustrate the possibility of deriving the cardiac rhythm from a smartphone-acquired ECG photograph; other disease classifiers need to be developed or incorporated into the system to realise its full potential. Third, a proprietary smartphone application is required to use the algorithm.

Keys to widespread adoption

To enable widespread adoption of machine learning-enabled ECG diagnostics, several key steps need to be taken. Prospective studies are needed to validate the accuracy of DigitHeart and similar platforms for obtaining a wider range of diagnoses, such as myocardial infarction and conduction disorders. In addition, head-to-head comparisons between diagnoses obtained from waveform signals extracted directly from ECG machines and those extracted from photos using a computer vision approach should be performed to determine the relative performance of the latter. On the user interface front, significant investment in application design and optimisation is critical for ensuring functionality, while also addressing the privacy issues associated with handling sensitive medical data under stringent data protection regulations. Furthermore, building regulatory and public trust presents a formidable challenge, as acceptance of machine learning-generated interpretations as conclusive diagnoses is not yet widespread in many countries. Finally, to facilitate worldwide adoption, it is crucial to expand the app's linguistic capabilities beyond English.
