Deep-learning-based prediction of glaucoma conversion in normotensive glaucoma suspects

Introduction

Glaucoma suspect (GS) describes a person whose clinical findings, alone or combined with related risk factors, indicate an increased likelihood of developing glaucoma.1 Among GS individuals, eyes with possible or suspected early glaucomatous optic nerve head (ONH) features are particularly challenging for clinicians. Whereas longitudinal data on ocular hypertensive GS cases are abundant,2–6 there is little information on suspicious-looking ONHs with normal-range intraocular pressure (IOP). Because microvascular abnormalities and vascular imbalance have been suggested as risk factors for glaucoma, especially at lower baseline IOP,7 various clinical factors need to be taken into account to determine the best management approach for such patients.

Recent advances in artificial intelligence (AI), especially deep learning (DL), have inspired researchers to develop algorithms for diagnosing glaucoma and detecting its progression.8 9 For predicting functional glaucomatous progression, a number of supervised and unsupervised models, including Random Forest, Bayesian techniques and recurrent neural networks, have been tested with promising results.10–13 However, no previous study has developed an algorithm to identify GS patients at higher risk of progressing to perimetric glaucoma. Identifying patients with a greater likelihood of developing a visual field (VF) defect could enable better risk stratification and better-targeted IOP-lowering therapy, helping to preserve visual function and, thus, quality of life.14 15

In actual clinical practice, diagnosis and subsequent treatment decisions are based on various test results, risk factors and concomitant diseases. However, relatively few studies have used clinical data to train AI models. In the present study, baseline features extracted from both fundus images and comprehensive clinical datasets of a longitudinal cohort followed for more than 7 years were used as input variables, and the abilities of AI algorithms to predict normal-tension glaucoma (NTG) conversion in normotensive GS patients were compared. Additionally, for normotensive GS patients who developed NTG, time to conversion and associated factors were determined.

Methods

Study subjects and data collection

Clinical and in-office testing data were obtained from the Clinical Data Warehouse (CDW) of Seoul National University Hospital Patients Research Environment and represent the routine clinical care of the patients involved.

The inclusion criteria were as follows: (1) diagnosis of GS, (2) follow-up every 6–12 months for a minimum of 7 years, (3) all IOP measurements below 21 mm Hg during the entire follow-up and (4) no IOP-lowering treatment unless a VF defect had been identified. Diagnosis of GS required at least one of the following features16: vertical cup-to-disc ratio (vCDR) ≥0.6; difference in vCDR between the two eyes ≥0.2; or glaucomatous optic nerve damage (eg, notching or diffuse and/or localised thinning) without a retinal nerve fibre layer (RNFL) defect visible on red-free RNFL images. Measurements of vertical disc diameter for the vCDR excluded areas of peripapillary atrophy and the Elschnig scleral ring. The vertical cup diameter was measured as the vertical distance between the maximal centrifugal extension points at 11–1 o’clock and 5–7 o’clock.17 Open angles were confirmed by gonioscopy, and normal VF results on standard automated perimetry (SAP, Humphrey 30–2 SITA-standard; Carl Zeiss Meditec, Dublin, California, USA) were required on two consecutive reliable examinations. Patients were excluded if, at any time during follow-up, they had any of the following: spherical equivalent (SE) beyond ±6 dioptres; signs of conditions that could cause temporary or intermittent IOP elevation, such as uveitis or pigment dispersion; or any other disease that could affect VF results.
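Because the GS disc definition and the follow-up criteria above are rule based, they can be expressed as a small screening function. The sketch below is illustrative only: it assumes a hypothetical per-eye record whose field names (`vcdr`, `fellow_vcdr` and so on) are not taken from the study's CDW, and it covers only the criteria that reduce to numbers or flags, not gonioscopy or the clinician's judgement of disc damage.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EyeRecord:
    """Hypothetical per-eye summary; field names are illustrative only."""
    vcdr: float                     # vertical cup-to-disc ratio
    fellow_vcdr: Optional[float]    # vCDR of the fellow eye, if known
    glaucomatous_onh_change: bool   # e.g. notching or thinning judged on ODP
    rnfl_defect_on_red_free: bool   # RNFL defect visible on red-free photograph


def meets_gs_disc_criteria(eye: EyeRecord) -> bool:
    """True if at least one of the three GS disc criteria is met."""
    large_cup = eye.vcdr >= 0.6
    asymmetric = False
    if eye.fellow_vcdr is not None:
        diff = abs(eye.vcdr - eye.fellow_vcdr)
        # isclose guards against float error, e.g. abs(0.7 - 0.5) < 0.2 in binary
        asymmetric = diff >= 0.2 or math.isclose(diff, 0.2)
    onh_change_without_rnfl_defect = (
        eye.glaucomatous_onh_change and not eye.rnfl_defect_on_red_free
    )
    return large_cup or asymmetric or onh_change_without_rnfl_defect


def meets_followup_criteria(
    followup_years: float,
    iop_readings_mmhg: Sequence[float],
    spherical_equivalent_d: float,
    treated_before_vf_defect: bool,
) -> bool:
    """Numeric inclusion/exclusion rules applied over the whole follow-up."""
    return (
        followup_years >= 7
        and len(iop_readings_mmhg) > 0
        and all(iop < 21 for iop in iop_readings_mmhg)  # every IOP below 21 mm Hg
        and abs(spherical_equivalent_d) <= 6            # exclude SE beyond ±6 D
        and not treated_before_vf_defect                # untreated unless VF defect
    )


if __name__ == "__main__":
    eye = EyeRecord(vcdr=0.5, fellow_vcdr=0.3, glaucomatous_onh_change=False,
                    rnfl_defect_on_red_free=False)
    print(meets_gs_disc_criteria(eye))                               # True
    print(meets_followup_criteria(8.5, [14, 16, 19], -2.25, False))  # True
```

The tolerance on the inter-eye difference matters because vCDR values are usually recorded to one or two decimals, and a naive `>=` comparison can reject an exact 0.2 difference due to floating-point rounding.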

The following data were collected at the initial visit for analysis: baseline IOP by Goldmann applanation tonometry (Haag-Streit, Köniz, Switzerland); refraction (KR-890; Topcon, Tokyo, Japan); RNFL thickness by Cirrus high-definition spectral domain optical coherence tomography (SD-OCT) (Carl Zeiss Meditec); central corneal thickness (CCT, Orbscan 73 II, Bausch and Lomb Surgical, Rochester, New York, USA); axial length (AXIS-II ultrasonic biometer; Quantel Medical SA, Bozeman, Montana, USA); ocular and medical disease history; family history of glaucoma; systolic and diastolic blood pressure; and height and weight for calculation of body mass index (BMI). Patients with missing data for any of these variables were excluded from further analysis.
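The complete-case rule and the BMI derivation can be sketched as follows. This is a minimal illustration, assuming hypothetical column names, height in centimetres and weight in kilograms; the study's actual extraction code is not described. Under these assumptions, the 14 stored variables plus the derived BMI give the 15 clinical features used for model training.

```python
import math
from typing import Any, Dict, List

# Hypothetical column names for the baseline variables listed in the text.
BASELINE_COLUMNS = [
    "age", "sex", "laterality", "iop", "spherical_equivalent", "cct",
    "axial_length", "avg_cprnfl", "diabetes", "family_history_glaucoma",
    "systolic_bp", "diastolic_bp", "height_cm", "weight_kg",
]


def bmi(height_cm: float, weight_kg: float) -> float:
    """Body mass index: weight in kg divided by the square of height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop records missing any baseline column, then derive BMI for the rest."""
    kept = []
    for rec in records:
        if any(is_missing(rec.get(col)) for col in BASELINE_COLUMNS):
            continue  # excluded from further analysis
        row = dict(rec)
        row["bmi"] = bmi(rec["height_cm"], rec["weight_kg"])
        kept.append(row)
    return kept


if __name__ == "__main__":
    print(round(bmi(160.0, 64.0), 2))  # 25.0
```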

Retinal-imaging data preparation

Digital colour stereo optic disc photography (ODP, CF-60UVi/D60; Canon, Tokyo, Japan) and red-free RNFL photography (TRC-50IX; Topcon) were obtained after pupil dilation. The images were saved as 448×448-pixel files in Digital Imaging and Communications in Medicine (DICOM) format and stored in the picture archiving and communication system of Seoul National University Hospital.

Determination of conversion to perimetric NTG

The SAP data from all visits for all patients were assessed independently by two glaucoma specialists (AH/YKK) in a masked fashion (ie, without knowledge of any clinical information). A glaucomatous VF defect was defined as (1) glaucoma hemifield test results outside normal limits, (2) three or more contiguous abnormal points with p<0.05 on the pattern deviation plot, at least one of which had p<0.01, or (3) a pattern SD with p<0.05. A confirmed VF defect required abnormal results showing damage at the same test locations on two consecutive reliable tests (fixation loss rate ≤20%, false-positive and false-negative error rates ≤25%). A third examiner (KHP) confirmed, by masked clinical-chart review, that each VF defect was attributable to glaucomatous damage rather than to artefact.
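The VF defect definition and the two-test confirmation rule can be written as a checker, sketched below under several assumptions that the text does not specify: test locations are indexed on an integer (row, column) grid, "contiguous" means 4-neighbour adjacency, reliability indices are stored as proportions, and "same test locations" is approximated as at least one pattern deviation location with p<0.05 shared by both tests. The class and function names are hypothetical, and the third examiner's artefact review is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]  # (row, column) of a test location; grid indexing assumed


def contiguous_clusters(points: Set[Point]) -> List[Set[Point]]:
    """Group points into 4-neighbour connected clusters (assumed meaning of contiguous)."""
    remaining = set(points)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, stack = {seed}, [seed]
        while stack:
            r, c = stack.pop()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    cluster.add(nb)
                    stack.append(nb)
        clusters.append(cluster)
    return clusters


@dataclass
class VFTest:
    ght_outside_normal_limits: bool
    psd_p: float                  # p value of the pattern standard deviation
    pd_probs: Dict[Point, float]  # pattern deviation p value per location
    fixation_loss: float          # proportions, e.g. 0.15 for 15%
    false_positive: float
    false_negative: float

    def reliable(self) -> bool:
        return (self.fixation_loss <= 0.20 and self.false_positive <= 0.25
                and self.false_negative <= 0.25)

    def abnormal_points(self) -> Set[Point]:
        return {pt for pt, p in self.pd_probs.items() if p < 0.05}

    def has_pd_cluster(self) -> bool:
        """At least 3 contiguous p<0.05 points, one or more of them with p<0.01."""
        return any(
            len(cl) >= 3 and any(self.pd_probs[pt] < 0.01 for pt in cl)
            for cl in contiguous_clusters(self.abnormal_points())
        )

    def glaucomatous_defect(self) -> bool:
        return (self.ght_outside_normal_limits or self.has_pd_cluster()
                or self.psd_p < 0.05)


def confirmed_defect(first: VFTest, second: VFTest) -> bool:
    """Two consecutive reliable, abnormal tests sharing an abnormal location."""
    if not (first.reliable() and second.reliable()):
        return False
    if not (first.glaucomatous_defect() and second.glaucomatous_defect()):
        return False
    return bool(first.abnormal_points() & second.abnormal_points())
```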

Design of overall system

The overall prediction system consisted of the three steps shown in figure 1. In the first step, features of two fundus images (ODP and red-free RNFL photography) were extracted by a convolutional autoencoder (CAE). In the second step, the extracted image features, together with clinical features, were fed into machine-learning classifiers to predict whether a patient would convert to NTG. The following 15 clinical parameters obtained at the initial visit were entered as clinical features for model training: age, sex, laterality, IOP, SE, CCT, axial length, average circumpapillary RNFL (cpRNFL) thickness, presence of diabetes mellitus, family history of glaucoma, systolic blood pressure, diastolic blood pressure, height, weight and BMI. In the third and final step, time to NTG conversion was predicted using a regressor. The detailed CAE architecture is described in the online-only text and in online supplemental eFigures 1–3.

Figure 1

Overall design of the deep learning network. Features of colour stereo optic disc photography (ODP) and red-free retinal nerve fibre layer photography were extracted by the convolutional autoencoder (left), and the extracted features were fed into machine-learning classifiers to identify eyes converting to normal-tension glaucoma (NTG) (middle) and to predict time to NTG conversion (right). AL, axial length; BMI, body mass index; BP, blood pressure; CCT, central corneal thickness; DM, diabetes mellitus; IOP, intraocular pressure; SE, spherical equivalent.
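The three-step flow in figure 1 can be read as a cascade: image encoding, conversion classification and, for eyes predicted to convert, time-to-conversion regression. The sketch below is a structural outline only. The encoders, classifier and regressor are passed in as plain callables (hypothetical stand-ins for the trained CAE encoder and the tree-based models), and gating the regressor on the classifier output is our reading of the Results, where eyes misclassified as converters entered the time-to-conversion analysis.

```python
from typing import Callable, List, Optional, Sequence

Encoder = Callable[[object], List[float]]   # image -> latent features
Classifier = Callable[[List[float]], int]   # features -> 1 (converts) or 0
Regressor = Callable[[List[float]], float]  # features -> years to conversion


def predict_eye(
    odp_image: object,
    red_free_image: object,
    clinical: Sequence[float],
    encode_odp: Encoder,
    encode_red_free: Encoder,
    classify: Classifier,
    regress: Regressor,
) -> Optional[float]:
    """Encode images, classify conversion, then regress time for predicted converters."""
    # Step 1: latent features from each fundus image (the CAE in the study).
    features = encode_odp(odp_image) + encode_red_free(red_free_image)
    # Step 2: image features concatenated with clinical features.
    features = features + list(clinical)
    if classify(features) != 1:
        return None  # predicted non-converter: no time-to-conversion estimate
    # Step 3: predicted years to NTG conversion.
    return regress(features)


if __name__ == "__main__":
    def toy_encoder(image: object) -> List[float]:
        return [0.0] * 20  # placeholder for a trained encoder's 20 latent values

    def toy_classifier(x: List[float]) -> int:
        return 1 if len(x) == 55 else 0

    def toy_regressor(x: List[float]) -> float:
        return 4.0

    print(predict_eye(None, None, [0.0] * 15, toy_encoder, toy_encoder,
                      toy_classifier, toy_regressor))  # 4.0
```

Passing the models as callables keeps the cascade independent of any particular library; fitted scikit-learn or XGBoost estimators would be wrapped so that `classify` and `regress` call their `predict` methods on a single row.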

Prediction of conversion to NTG

A total of 40 features were extracted from the CAE latent vectors of the two input images, and 15 features from the clinical information, giving 55 features in total for predicting conversion to NTG. Prediction was performed using XGBoost,18 Random Forest19 and Gradient Boosting20 classifiers with five feature combinations: both fundus images and clinical features (55 features), ODP and red-free RNFL photography (40 features), ODP and clinical features (35 features), red-free RNFL photography and clinical features (35 features) and clinical features only (15 features). Hyperparameters were optimised by fivefold cross-validation with random and grid search on the training data only, not on the test set (see online-only text in online supplemental file 1). CIs were calculated by bootstrapping.
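The feature counts above imply 20 latent features per image (35 − 15 = 20 for each single-image combination). The five combinations can therefore be assembled as column lists, as in the sketch below; the column names are hypothetical placeholders, not the study's variable names.

```python
from typing import Dict, List, Mapping

# 20 latent features per image, inferred from the 40- and 35-feature counts.
ODP_FEATURES = [f"odp_latent_{i}" for i in range(20)]
RNFL_FEATURES = [f"rnfl_latent_{i}" for i in range(20)]
CLINICAL_FEATURES = [
    "age", "sex", "laterality", "iop", "spherical_equivalent", "cct",
    "axial_length", "avg_cprnfl", "diabetes", "family_history_glaucoma",
    "systolic_bp", "diastolic_bp", "height", "weight", "bmi",
]

FEATURE_SETS: Dict[str, List[str]] = {
    "ODP + red-free RNFL + clinical": ODP_FEATURES + RNFL_FEATURES + CLINICAL_FEATURES,
    "ODP + red-free RNFL": ODP_FEATURES + RNFL_FEATURES,
    "ODP + clinical": ODP_FEATURES + CLINICAL_FEATURES,
    "red-free RNFL + clinical": RNFL_FEATURES + CLINICAL_FEATURES,
    "clinical only": CLINICAL_FEATURES,
}


def feature_vector(row: Mapping[str, float], feature_set: str) -> List[float]:
    """Pick the columns of one feature combination, in a fixed order."""
    return [row[name] for name in FEATURE_SETS[feature_set]]


if __name__ == "__main__":
    for name, cols in FEATURE_SETS.items():
        print(f"{name}: {len(cols)} features")  # 55, 40, 35, 35, 15
```

Each combination would then be passed to a classifier tuned on the training split only, for example with scikit-learn's `GridSearchCV(estimator, param_grid, cv=5)` or `RandomizedSearchCV`; the search grids are given in the supplement and are not reproduced here.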

Prediction of time to NTG conversion

To predict time to NTG conversion, regression was performed using the regressor counterparts of the three algorithms with the same five feature combinations described above. As for the prediction of conversion, hyperparameters were optimised by fivefold cross-validation with random and grid search on the training data only, not on the test set, and CIs were calculated by bootstrapping. The optimisation criterion for the hyperparameters was the root mean squared logarithmic error (RMSLE).

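$$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log(p_i + 1) - \log(a_i + 1)\right)^2}$$

where $n$ is the number of eyes, $p_i$ is the predicted value and $a_i$ the ground-truth value for eye $i$.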
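For concreteness, the RMSLE used for tuning and the MSE reported in the Results can be computed as below. This is a plain-Python sketch assuming the natural logarithm (via `math.log1p`, which computes log(x + 1)) and non-negative times in years; the example values are illustrative, not study data. scikit-learn's `mean_squared_log_error` returns the squared version of the same quantity.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], actual: Sequence[float]) -> None:
    if len(pred) != len(actual) or len(pred) == 0:
        raise ValueError("pred and actual must be non-empty and of equal length")


def rmsle(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared logarithmic error; log1p(x) computes log(x + 1)."""
    _check(pred, actual)
    total = sum((math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(pred, actual))
    return math.sqrt(total / len(pred))


def mse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error, in squared units of the target (years squared here)."""
    _check(pred, actual)
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred)


if __name__ == "__main__":
    predicted_years = [3.0, 5.5, 2.0]  # illustrative values, not study data
    observed_years = [4.0, 5.0, 2.0]
    print(round(mse(predicted_years, observed_years), 4))  # 0.4167
    print(round(rmsle(predicted_years, observed_years), 4))
```

Because the log compresses large values, RMSLE penalises a 1-year error on a short time to conversion more heavily than the same error on a long one, which may suit a target that spans several years.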

Outcome metrics

Sensitivity and specificity were calculated on a randomly selected held-out test set. The area under the receiver operating characteristic curve (AUC) on the same test set was used to compare model performance, and DeLong’s test was used to compare AUCs between algorithms. To identify key risk factors for conversion to NTG, top-ranked features were selected based on the feature importance scores assigned to the variables in each model. All data processing and analyses were implemented in Python V.3.9 and Scikit-learn V.0.24.0.21
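The held-out metrics can be sketched in plain Python. This is an illustrative sketch, not the study's code: the 0.5 decision threshold, the percentile bootstrap with resampling of eyes and the function names are assumptions, and DeLong's test is not reproduced. The Mann-Whitney formulation below (ties counted as one half) gives the same value as `sklearn.metrics.roc_auc_score`.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a converter outscores a non-converter."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one eye of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity at an assumed probability threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_auc_ci(
    labels: Sequence[int], scores: Sequence[float],
    n_boot: int = 2000, alpha: float = 0.05, seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample eyes with replacement, recompute AUC."""
    auc(labels, scores)  # fails early if a class is absent, so the loop can end
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(auc(y, s))                      # 0.75
    print(sensitivity_specificity(y, s))  # (0.5, 1.0)
```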

Results

Demographic and clinical characteristics of the study population

Datasets of 12 458 patients diagnosed with GS were reviewed for the present study. After the inclusion and exclusion criteria were applied, 105 eyes that converted to NTG during follow-up were identified. Because imbalanced classes bias algorithms towards the majority class, eyes that did not convert to NTG were under-sampled.22 That is, patients who met the inclusion criteria and did not convert to NTG over more than 7 years were consecutively included until the two classes were balanced. Although the first 105 non-conversion patients were selected according to sampling time, the probability of sampling bias was considered low, because patients with a GS diagnosis in the CDW were selected randomly. The final dataset comprised 210 eyes of 210 patients, representing 1334 person-years. Of these 210 eyes, 70 were set aside as the test set (online supplemental eFigure 4).
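Consecutive under-sampling followed by a held-out split can be sketched as below. The function names are hypothetical, and the proportional random draw per class is an assumed way of keeping the conversion ratio comparable between training and test sets, as stated in the limitations; the study's own split is summarised only in online supplemental eFigure 4.

```python
import random
from typing import Dict, List, Sequence, Tuple


def undersample_consecutive(
    converter_ids: Sequence[int], non_converter_ids_in_order: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Keep all converters plus the first len(converter_ids) non-converters."""
    if len(non_converter_ids_in_order) < len(converter_ids):
        raise ValueError("not enough non-converters to balance the classes")
    kept_non = list(non_converter_ids_in_order[: len(converter_ids)])
    ids = list(converter_ids) + kept_non
    labels = [1] * len(converter_ids) + [0] * len(kept_non)
    return ids, labels


def stratified_holdout(
    ids: Sequence[int], labels: Sequence[int], n_test: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Hold out n_test eyes, drawing from each class in proportion to its size.

    Per-class rounding can make the total differ slightly from n_test when the
    classes are unbalanced; with balanced classes it is exact.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for i, y in zip(ids, labels):
        by_class.setdefault(y, []).append(i)
    test: List[int] = []
    for y in sorted(by_class):
        members = by_class[y]
        k = round(n_test * len(members) / len(ids))
        test.extend(rng.sample(members, k))
    held_out = set(test)
    train = [i for i in ids if i not in held_out]
    return train, test


if __name__ == "__main__":
    ids, labels = undersample_consecutive(list(range(105)), list(range(1000, 1400)))
    train, test = stratified_holdout(ids, labels, n_test=70, seed=42)
    print(len(ids), len(train), len(test))  # 210 140 70
```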

The mean age of the included patients was 55.8±9.5 years (range 33–76), and the mean baseline IOP was 14.8±2.9 mm Hg (range 7–20). Further characteristics of the study population are given in online supplemental eTable 1. Because one of the main purposes of this study was to identify clinical factors (including demographics) influencing disease progression, rather than to examine the impact of a specific factor, further analysis was performed without demographic matching.

Prediction of conversion to NTG

Table 1 shows, for the test dataset, the performance of each network with its best-performing feature combination. XGBoost trained with both fundus images and clinical features performed best: its AUC and accuracy were 0.994 (95% CI 0.984 to 1.000) and 97.14% (95% CI 88.11% to 98.57%), respectively. Random Forest also performed best with both fundus images and clinical data: its AUC and accuracy were 0.987 (95% CI 0.978 to 1.000) and 95.80% (95% CI 84.61% to 97.71%), respectively. Gradient Boosting performed best when trained on clinical data only: its AUC and accuracy were 0.988 (95% CI 0.969 to 0.997) and 91.43% (95% CI 82.03% to 96.14%), respectively. The AUCs of the three algorithms did not differ significantly (all p>0.05). The performance of each model with each feature combination is shown in online supplemental eTable 2. Results of a sensitivity analysis in which the exclusion criteria were applied only to the initial data are shown in online supplemental eTable 3.

Table 1

Performance of each model for prediction of conversion to NTG

Prediction of time to NTG conversion

Table 2 shows, for the test dataset, the performance of each network with its best-performing feature combination. XGBoost trained with both fundus images and clinical data had the lowest mean squared error (MSE), 2.24. Figure 2 compares the time to NTG conversion predicted by the XGBoost model with the ground truth. Two eyes that did not develop NTG during the entire follow-up period but were misclassified as converting were included in the time-to-conversion analysis; for these eyes (patient numbers 33 and 34), the predicted times were above 5 years. Online supplemental eTable 4 shows the MSE of time-to-conversion predictions made at baseline for each feature combination and each algorithm.

Table 2

Performance of each model for prediction of time to NTG conversion

Figure 2

Regression results for prediction of time to NTG conversion. The red line represents the conversion times predicted by the best model (XGBoost trained with ODP, red-free RNFL photography and clinical features); the black line represents the ground-truth values from the longitudinal follow-up data. NTG, normal-tension glaucoma; ODP, optic disc photography; RNFL, retinal nerve fibre layer.

Feature importance

Feature importance for prediction of time to NTG conversion was extracted from each network using the ‘feature_importances_’ attribute of each Scikit-learn-compatible classifier and regressor (table 3). Among the 15 clinical features, baseline IOP, diastolic blood pressure and average cpRNFL thickness were the top three features in both the XGBoost and Gradient Boosting models. In the Random Forest model, the top three features were diastolic blood pressure, average cpRNFL thickness and CCT. The methodology and functions used to generate feature importance are described in the online-only text in online supplemental file 1.
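Ranking the clinical features separately from the image latent features, as in table 3, amounts to filtering and sorting the importance vector. The sketch below assumes hypothetical feature names and uses placeholder scores that are not the values in table 3; in practice `importances` would be the `feature_importances_` array of a fitted model, aligned with the training column order.

```python
from typing import Iterable, List, Sequence


def top_clinical_features(
    feature_names: Sequence[str],
    importances: Iterable[float],
    clinical_names: Iterable[str],
    k: int = 3,
) -> List[str]:
    """Rank only the clinical features by importance (ties broken alphabetically)."""
    clinical = set(clinical_names)
    scores = [float(v) for v in importances]
    if len(scores) != len(feature_names):
        raise ValueError("one importance score is needed per feature")
    pairs = [(name, score) for name, score in zip(feature_names, scores)
             if name in clinical]
    pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in pairs[:k]]


if __name__ == "__main__":
    # Placeholder scores for illustration only; they are not study results.
    names = ["odp_latent_0", "sex", "cct", "age", "height"]
    scores = [0.30, 0.05, 0.25, 0.15, 0.25]
    print(top_clinical_features(names, scores, ["sex", "cct", "age", "height"]))
    # ['cct', 'height', 'age']
```

Impurity-based importances from tree ensembles can favour continuous or high-cardinality variables, so the ranking reflects what each model used rather than a causal effect.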

Table 3

Feature importance in deep-learning models for prediction of time to NTG conversion

Discussion

In the present study, the performance of DL classifiers in predicting conversion to NTG in normotensive GS patients was evaluated, and feature importance was calculated to identify factors potentially affecting disease progression. All three DL models showed acceptable accuracy and AUC values in predicting conversion to NTG from baseline clinical measurements and fundus images.

An architecture capable of incorporating multiple data sources is key to combining the information used in actual clinical practice, because clinicians do not judge disease progression from a single examination result. However, relatively few AI studies on detecting glaucoma deterioration have incorporated clinical data into model training. Dixit et al, using a convolutional long short-term memory neural network, compared performance on two data sources: VFs alone and VFs supplemented by basic clinical data (ie, IOP, CCT and CDR). Not surprisingly, adding clinical data to VF results improved the model’s ability to identify glaucoma progression.23 Lee et al, analysing NTG progression in young myopic patients, found that an extra-trees model trained with both demographic and clinical features outperformed models trained on test results alone (eg, baseline OCT and VF parameters).11 In the present study, to predict NTG conversion in GS patients, we constructed DL models that combined structural inputs with wide-ranging clinical data. We believe this approach better reflects clinicians’ real-world decision-making.

Unlike previous studies that have emphasised diagnosis, the models proposed here predict disease progression in GS patients, and they performed consistently in predicting both conversion to glaucoma and time to conversion. Specifically, we focused on GS patients with normal-range IOP to identify the risk of NTG conversion. The approaches and features described here may prove useful as clinical tools, given the importance of identifying progression early in GS. Predicting the disease course of individual patients would help clinicians offer tailored management, for example regarding follow-up duration, whether to start IOP-lowering treatment and target IOP levels.

Regarding feature importance, IOP and CCT were identified as important clinical features in our DL models. Earlier longitudinal and population-based studies of risk factors for open-angle glaucoma (OAG) in normal individuals have consistently reported higher IOP and thinner CCT to be significant factors.24–27 Notably, the present study also identified diastolic blood pressure as an important feature in all three DL models. Blood pressure has been suggested as an important and potentially modifiable risk factor for OAG,28 29 and in NTG especially, vascular factors have been posited to play a significant role in disease development.30 Low diastolic perfusion pressure coupled with low diastolic blood pressure could reduce ONH blood flow below a critical level, resulting in ischaemia and predisposing to glaucomatous damage.31–33
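For reference, diastolic perfusion pressure (DPP) is commonly defined as the difference between diastolic blood pressure (DBP) and IOP. The study does not state whether or how DPP was computed, so this definition is an assumption offered only to make the argument concrete:

$$\mathrm{DPP} = \mathrm{DBP} - \mathrm{IOP}$$

For example, a DBP of 70 mm Hg with an IOP of 15 mm Hg gives a DPP of 55 mm Hg. Under this definition, a lower DBP lowers DPP even when IOP stays within the normal range, which is the situation relevant to normotensive GS eyes.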

Although baseline age was not identified as a primary factor in the feature importance analysis, the NTG conversion group was significantly younger than the non-conversion group. The cause of this finding is unknown; one possibility is that patients who were younger at baseline tended to be followed up for longer, so more conversions were detected. Further research is needed on how the risk of glaucoma conversion varies with age in GS patients with normal IOP.

The present study has limitations that must be considered when interpreting its findings. First, although DL generally requires a large training dataset, our results were based on relatively little data. We included only normotensive GS patients who had not received any glaucoma treatment during a follow-up of at least 7 years, to demonstrate more clearly the DL models’ performance in predicting NTG conversion and identifying potential risk factors. Also, to reduce bias and to train and test the DL models efficiently, the proportion of patients with NTG conversion was kept comparable between the training and test sets. The current results therefore demonstrate only that the model works well for a limited range of patients, and further studies with larger datasets are needed to validate the generalisability of our algorithm in real-world settings. Second, this study applied strict selection and exclusion criteria throughout the entire follow-up period. Although our sensitivity analysis, which applied the exclusion criteria to the initial data only, yielded consistent results, the potential for bias in our findings must be acknowledged. Third, glaucoma progression was not assessed on the basis of structural changes. Identifying progression from GS to glaucoma from structural changes can be subjective, even among glaucoma specialists. We therefore sought to evaluate progression in a more specific and rigorous manner by assessing VF results without preconceptions arising from other clinical factors. A clear challenge, then, is to define progression based not only on VF results but also on structural changes, and to determine whether worsening in both should be integral to that definition. Fourth, our results might not be generalisable to other populations, because all data were collected at a single site (an academic medical centre highly specialised in glaucoma care) that probably differs significantly from other practice settings.

In conclusion, our results suggest that DL models trained on both ocular images and clinical data have the potential to predict disease progression in GS patients. We believe that, with additional training and testing on larger datasets, our DL models can be improved further, and that such models would better equip clinicians to predict the disease course of individual GS patients.
