Predictors of limited early response to anti-vascular endothelial growth factor therapy in neovascular age-related macular degeneration with machine learning feature importance
Scott W Perkins1, Anna K Wu2, Rishi P Singh3
1 Cleveland Clinic Lerner College of Medicine, Cleveland, Ohio, USA
2 Case Western Reserve University School of Medicine; Center for Ophthalmic Bioinformatics, Cole Eye Institute, Cleveland Clinic, Cleveland, Ohio, USA
3 Center for Ophthalmic Bioinformatics, Cole Eye Institute, Cleveland Clinic, Cleveland, Ohio, USA
Correspondence Address:
Dr. Rishi P Singh
9500 Euclid Avenue, Desk I32, Cole Eye Institute, Cleveland Clinic, Cleveland, Ohio 44195
USA
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/sjopt.sjopt_73_22
PURPOSE: Patients with neovascular age-related macular degeneration (nAMD) have varying responses to anti-vascular endothelial growth factor injections. Limited early response (LER) after three monthly loading doses is associated with poor long-term vision outcomes. This study predicts LER in nAMD and uses feature importance analysis to explain how baseline variables influence predicted LER risk.
METHODS: Baseline age, best visual acuity (BVA), central subfield thickness (CST), and baseline and 3-month intraretinal fluid (IRF) and subretinal fluid (SRF) for 286 eyes were collected in a retrospective clinical chart review. At month 3, LER was defined as the presence of fluid, while early response (ER) was the absence thereof. Decision tree classification and feature importance methods determined the influence of baseline age, BVA, CST, IRF, and SRF on predicted LER risk.
RESULTS: One hundred and sixty-seven eyes were LERs and 119 were ERs. The algorithm achieved area under the curve = 0.66 in predicting LER. Baseline SRF was most important for predicting LER while age, BVA, CST, and IRF were somewhat less important. Nonlinear trends were observed between baseline variables and predicted LER risk. Zones of increased predicted LER risk were identified, including age <74 years, CST <290 or >350 μm, IRF >750 nL, and SRF >150 nL.
CONCLUSION: These findings explain baseline variable importance for predicting LER and show SRF to be the most important. The nonlinear impact of baseline variables on predicted risk is shown, increasing understanding of LER and aiding clinicians in assessing personalized LER risk.
Keywords: Anti-vascular endothelial growth factor, machine learning, neovascular age-related macular degeneration
Age-related macular degeneration (AMD) is a common, progressive retinal disease that causes irreversible vision loss.[1] An end-stage form of AMD is neovascular age-related macular degeneration (nAMD), characterized by retinal hypoxia, choroidal neovascularization, and vascular permeability driven by vascular endothelial growth factor (VEGF). These factors lead to the accumulation of intraretinal fluid (IRF) and subretinal fluid (SRF), causing subsequent visual impairment. While nAMD accounts for approximately 10% of AMD cases, it causes approximately 90% of cases of legal blindness associated with this condition.[2]
Anti-VEGF injections are first-line therapy for nAMD, typically initiated with a loading dose of three monthly injections.[3],[4],[5],[6] However, response to anti-VEGF varies. Limited early response (LER), defined as residual fluid after the loading period, has been associated with poorer long-term visual gains than those seen in patients without residual fluid.[7],[8] It is difficult for clinicians to predict which patients will respond well to treatment, so increasing understanding of how to predict LER would be clinically useful.
Machine learning (ML) methods have become increasingly prevalent in medicine and ophthalmology due to their ability to use clinical data to classify disease states and predict prognosis, but many methods are not easily understandable due to their complexity.[9],[10],[11] A recent study by Ajana et al. predicted progression to advanced AMD using baseline genetic, clinical, lifestyle data, and ML techniques.[12] This study aims to predict LER in nAMD with ML and uses feature importance analysis to explain how the algorithm functions. This promises to increase understanding of how baseline variables influence predicted LER risk and aid clinicians in personalized assessment of nAMD prognosis.
Methods
Design and participants
Approval was received from the Institutional Review Board for a retrospective non-randomized cohort study. The study was performed in accordance with good clinical practice (International Conference on Harmonization of Technical Requirements of Pharmaceuticals for Human Use [ICH] E6) and the Health Insurance Portability and Accountability Act. As an anonymized retrospective study, informed consent was not required.
Individuals older than 18 years with a documented diagnosis of nAMD between January 1, 2012, and March 1, 2018, were queried from the electronic medical record. Inclusion criteria were: Patients who had their first anti-VEGF injection at the institution without prior nAMD treatment, follow-up for at least 1 year after the first injection, and available OCT data every 3 months from baseline to 1 year. Exclusion criteria were: Concurrent maculopathies, unreadable OCT scans at any time point, or absence of fluid at baseline. In cases of bilateral nAMD, the first treated eye was included.
Collected variables
Baseline age, central subfield thickness (CST), and best visual acuity (BVA) were collected from patient charts. BVA was converted from Snellen units to Early Treatment Diabetic Retinopathy Study (ETDRS) letters using the formula ETDRS = 85 + 50 × log10(Snellen fraction). OCTs taken at baseline and 3 months by Cirrus High-Definition Spectral Domain OCT (V.9.5.1, Carl Zeiss Meditec, Dublin, CA) were analyzed by Notal OCT Analyzer (Notal Vision Ltd., Tel Aviv, Israel), a validated ML algorithm that automatically quantifies IRF and SRF in nAMD.[13],[14] LER was defined as the presence of IRF and/or SRF at month 3, whereas early response (ER) was defined as the absence of fluid. This definition is consistent with previous studies that correlated early residual fluid with poor long-term BVA outcomes.[7],[8]
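The Snellen-to-ETDRS conversion quoted above can be sketched in a few lines of Python. The function name `snellen_to_etdrs` is hypothetical, and the sketch assumes that "Snellen" in the formula means the Snellen fraction expressed as a decimal (for example, 20/40 = 0.5).

```python
import math


def snellen_to_etdrs(numerator: float, denominator: float) -> float:
    """Approximate ETDRS letter score from a Snellen fraction.

    Applies the formula quoted in the text:
    ETDRS = 85 + 50 * log10(numerator / denominator).
    """
    if numerator <= 0 or denominator <= 0:
        raise ValueError("Snellen numerator and denominator must be positive")
    return 85 + 50 * math.log10(numerator / denominator)


# Illustrative inputs only; these are not values from the study cohort.
for num, den in [(20, 20), (20, 40), (20, 200)]:
    print(f"{num}/{den} -> {snellen_to_etdrs(num, den):.1f} ETDRS letters")
```

Under this formula, 20/20 maps to 85 letters and each tenfold worsening of the Snellen fraction subtracts 50 letters.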
Development of machine learning algorithm
To predict LER status from baseline age, BVA, CST, IRF, and SRF, baseline variables were min-max scaled (0-1), and LER status at month 3 was one-hot encoded. An extreme gradient boosted decision trees (XGBoost) model was developed and evaluated using Pandas 1.1.3, Numpy 1.19.2, Sklearn 1.0.1, Matplotlib 3.3.2, XGBoost 1.3.3, Seaborn 0.11.0, Shap 0.40.0, and Python 3.8.5. Class weights were balanced to correct for slight imbalance in the data. Predictive performance was measured with 10-fold cross-validation.
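As a rough illustration of two preprocessing steps named above, the sketch below shows min-max scaling and one common way to balance class weights. The text does not state which weighting formula was used; the version here assumes the "balanced" heuristic n_samples / (n_classes × class_count). The helper names and the example ages are hypothetical, while the class counts (167 LERs, 119 ERs) come from the text.

```python
from typing import Dict, List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: this 'balanced' heuristic is one common choice; the
    weighting actually used in the study is not specified.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


ages = [62.0, 74.0, 86.0, 98.0]  # hypothetical baseline ages in years
scaled = min_max_scale(ages)
weights = balanced_class_weights({"LER": 167, "ER": 119})

print(scaled)
print({label: round(w, 3) for label, w in weights.items()})
```

With these weights, the smaller ER class receives the larger weight, so both classes contribute equally to the total weighted sample count.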
Feature importance analysis
Kernel density estimation calculated density plots of baseline variables for the entire cohort as well as subgroups of LERs and ERs. Feature importance values from the XGBoost model output were assessed, including mean decrease in Gini impurity (MDI), gain, coverage, and weight feature importance. Shapley additive explanation (SHAP) values were calculated for baseline variables of each sample.
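Kernel density estimation can be illustrated with a minimal sketch. It assumes a Gaussian kernel and a hand-picked bandwidth, neither of which is specified in the text, and the sample values are hypothetical rather than taken from the cohort.

```python
import math
from typing import Callable, List


def gaussian_kde(samples: List[float], bandwidth: float) -> Callable[[float], float]:
    """Return a density function built by centring a Gaussian on each sample."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        # Sum the kernel contributions of every sample at point x.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical baseline SRF volumes in nL (not study data).
srf = [0.0, 10.0, 25.0, 150.0, 400.0]
density = gaussian_kde(srf, bandwidth=50.0)

print(density(10.0) > density(300.0))  # density is higher where samples cluster
```

Comparing the density curves of two subgroups, as in the LER and ER plots, amounts to building one such function per subgroup and evaluating both over the same grid.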
Results
Machine learning predicts month 3 limited early response from baseline
One hundred and sixty-seven eyes were true LERs and 119 were ERs, showing only a mild class imbalance correctable by class weight balancing in the XGBoost model. The XGBoost model predicted LER status from baseline variables with area under the receiver operating characteristic curve (AUC) =0.66, accuracy = 0.62, precision = 0.68, recall = 0.65, and F1 = 0.66 [Figure 1]a and [Table 1]. AUC varied from 0.49 to 0.86 for the 10 cross-validation splits [Figure 1]a.
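The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does only that arithmetic; the function name is hypothetical.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the text.
f1 = f1_from_precision_recall(0.68, 0.65)
print(round(f1, 2))  # 0.66
```

The result agrees with the reported F1 of 0.66 to two decimal places.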
Figure 1: Predictive accuracy of XGBoost conferred by divergence in LER versus ER variable distributions. (a) 10-fold cross-validated ROC curve; (b-f) Baseline kernel estimated density functions of LER and ER subgroups. LER: Limited early response, ER: Early response, ROC: Receiver operating characteristic
Table 1: Performance metrics of XGBoost prediction of limited early response
Regions of similarity and divergence between baseline density plots of limited early responses and early responses
The estimated true distributions of baseline variables given infinite random sampling are visualized by kernel density estimation [Figure 1]b, [Figure 1]c, [Figure 1]d, [Figure 1]e, [Figure 1]f and [Figure 2]. The age distribution of LERs was slightly left-shifted compared to ERs, with LERs having greater density below 85 years and ERs having greater density above 85 years [Figure 1]b. The LER and ER distributions of BVA had similar left-skewed shapes, while ERs had greater density from 55 to 75 ETDRS letters and LERs had greater density from 15 to 55 ETDRS letters [Figure 1]c. The CST distributions of LERs and ERs had similar shapes with slight right skew, while ERs had slightly greater density from 250 to 375 μm and LERs had slightly greater density for the ranges 375–475 and above 520 μm [Figure 1]d. The LER and ER distributions of IRF both had greatest density close to zero nL, although the ER distribution had greater density close to zero than LERs, and LERs had greater density above 1450 nL [Figure 1]e. Similar to IRF, SRF distributions for LERs and ERs had greatest density near 0 nL, ERs had greater density near zero, and LERs had greater density for the ranges 250–1500 nL and greater than 2000 nL [Figure 1]f.
Figure 2: Baseline variable distributions and kernel estimation of probability density functions of entire cohort (a), LER subgroup (b), and ER subgroup (c). LER: Limited early response, ER: Early response
Feature importance of baseline variables for limited early response prediction
Feature importance metrics of the XGBoost models show the roles played by baseline variables in the model's performance [Figure 3] and [Table 2]. MDI measures how much a variable increases the model's predictive ability overall. SRF had a MDI of 0.25, the largest of any baseline variable [Figure 3]a and [Table 2]. While SRF had the greatest MDI, IRF had the least MDI [Figure 3]a and [Table 2].
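To make the idea behind MDI concrete, the sketch below computes Gini impurity and the impurity decrease produced by a single split. The root counts (167 LERs, 119 ERs) come from the text, but the split itself is hypothetical, and how the MDI values in the study were derived from the XGBoost model is not described.

```python
from typing import Sequence


def gini(counts: Sequence[int]) -> float:
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def impurity_decrease(
    parent: Sequence[int], left: Sequence[int], right: Sequence[int]
) -> float:
    """Parent impurity minus the sample-weighted impurity of the two children."""
    n = sum(parent)
    return (
        gini(parent)
        - (sum(left) / n) * gini(left)
        - (sum(right) / n) * gini(right)
    )


root = (167, 119)   # (LER, ER) counts reported in the text
left = (60, 80)     # hypothetical child node, e.g. low baseline SRF
right = (107, 39)   # hypothetical child node, e.g. high baseline SRF

print(round(gini(root), 3))
print(round(impurity_decrease(root, left, right), 3))
```

Averaging such decreases over every split that uses a given variable, across all trees, is the general idea behind a mean-decrease-in-impurity score.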
Figure 3: Feature importance of baseline variables in XGBoost prediction of LER by mean decrease in Gini impurity (a), information gain (b), coverage (c), and weight (d). LER: Limited early response
Information gain describes how much an individual decision improves the model's predictive power, and gain feature importance is the average gain of all decisions which use a given variable.[15] Similarly, coverage is defined as the total number of samples affected by a decision in the model. Coverage feature importance is the average coverage of all decisions involving a given variable.
Besides having the greatest MDI, SRF also had the greatest gain feature importance and coverage feature importance [Figure 3]b, [Figure 3]c and [Table 2]. Both MDI and gain feature importance decreased consistently in the following order: SRF, Age, BVA, CST, and IRF [Figure 3]d and [Table 2]. SRF, CST, and IRF had similar coverage feature importance values (40.67, 40.00, and 39.76, respectively) and BVA had the least coverage importance [Figure 3]c and [Table 2].
Weight feature importance, defined as the number of decisions in the XGBoost model which involve a given variable, can indicate relationship complexity as complex relationships can require more decisions to model [Figure 3]d. Similar to coverage importance, BVA has the lowest weight feature importance – 42 units lower than the next feature – and a relatively simple relationship with predicted LER risk where BVA <20 ETDRS letters decreased LER risk and BVA >20 had only a marginal impact on LER risk [Figure 3]d and [Figure 4]c. The high weight importance of age likely reflects fluctuations in SHAP values from 74 to 86 years which may be artifacts of this dataset [Figure 3]d and [Figure 4]b.
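The gain, coverage, and weight importances defined above can be sketched as simple aggregations over the decisions (splits) in a tree ensemble. The split records below are hypothetical and follow the definitions given in the text: weight is the number of splits using a variable, while gain and coverage are averaged over those splits.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One record per split: (feature name, gain of the split, samples covered).
Split = Tuple[str, float, float]


def summarize_splits(splits: List[Split]) -> Dict[str, Dict[str, float]]:
    """Aggregate per-split records into weight, mean gain, and mean coverage."""
    grouped: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for feature, gain, cover in splits:
        grouped[feature].append((gain, cover))

    summary: Dict[str, Dict[str, float]] = {}
    for feature, rows in grouped.items():
        n = len(rows)
        summary[feature] = {
            "weight": float(n),                       # number of splits
            "gain": sum(g for g, _ in rows) / n,      # average gain per split
            "cover": sum(c for _, c in rows) / n,     # average samples per split
        }
    return summary


# Hypothetical splits; these are not values from the study model.
splits: List[Split] = [
    ("SRF", 4.0, 50.0),
    ("SRF", 2.0, 30.0),
    ("Age", 1.5, 40.0),
    ("Age", 0.5, 20.0),
    ("Age", 1.0, 30.0),
    ("BVA", 1.0, 25.0),
]
summary = summarize_splits(splits)
for feature, stats in summary.items():
    print(feature, stats)
```

In this toy example, Age has the highest weight but SRF has the highest average gain, which shows how the metrics can rank variables differently.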
Figure 4: Shapley additive explanations of feature impact on predicted LER risk. LER: Limited early response
Nonlinear impact of baseline variables on predicted limited early response risk
SHAP values showed that baseline variables influenced predicted LER risk with nonlinear relationships. A positive SHAP value indicates an increase in predicted LER risk for a given baseline feature value, while a negative SHAP value indicates decreased risk [Figure 4]. SRF had the greatest positive impact on risk, while age had the greatest negative impact on risk, as evidenced by the highest and lowest SHAP values, respectively [Figure 4]a.
CST, IRF, and BVA influenced risk both positively and negatively, and to a lesser extent than SRF and age [Figure 4]a. Overall, SHAP values suggest that age consistently increases predicted LER risk below 74 years, and predicted LER risk as a result of age decreases as age increases from 74 to 90 years [Figure 4]b. Above 90 years, risk is consistently decreased by age [Figure 4]b. Baseline BVA <20 ETDRS letters decreased predicted LER risk, and there was only a marginal impact on risk above 20 letters [Figure 4]c. Baseline CST tended to increase risk below 290 or above 350 μm and decreased risk for the range 290–350 μm [Figure 4]d. Baseline IRF had no clear trend for 0-100 nL, decreased risk for 100–600 nL, and increased risk above 750 nL [Figure 4]e. Having SRF close to zero decreased predicted LER risk, but risk increased for SRF >150 nL [Figure 4]f.
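The sign convention for SHAP values described above follows from the underlying Shapley value definition, which the brute-force sketch below applies to a toy risk function. The toy model, its coefficients, and the function names are hypothetical. Dedicated SHAP software estimates these quantities for fitted models far more efficiently, so this only illustrates the definition.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition of other features."""
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight for coalitions of size k that exclude feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(present: FrozenSet[str]) -> float:
    """Hypothetical predicted LER risk given which features are 'known'."""
    score = 0.5                      # baseline risk with no information
    if "SRF" in present:
        score += 0.2                 # high SRF raises risk
    if "Age" in present:
        score -= 0.1                 # older age lowers risk
    if "SRF" in present and "CST" in present:
        score += 0.04                # small SRF-CST interaction
    return score


phi = shapley_values(["SRF", "Age", "CST"], toy_risk)
print({name: round(v, 3) for name, v in phi.items()})
```

The values sum to the difference between the full prediction and the baseline, with positive values pushing predicted risk up and negative values pushing it down. The interaction term is shared equally between the two variables involved.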
Discussion
Many previous ML models predicting ophthalmic outcomes have been difficult to interpret.[16],[17] This study is novel in using feature importance and SHAPs to understand predictors of ophthalmic disease. In addition, previous studies of clinical and anatomic predictors of nAMD prognosis have not investigated the possibility of nonlinear relationships between variables.[18],[19],[20] Therefore, the relationships between baseline variables and predicted LER risk reported in this study are novel.
The model's AUC of 0.66 showed that the study variables contribute to LER prediction, but that other variables are involved as well [Figure 1]a. Previous ML studies predicting outcomes in ophthalmic diseases have achieved AUC ranging from 0.68 to 0.79,[10],[11],[21] while higher performance has been achieved with large datasets and/or deep learning approaches.[7],[12],[16] The present study is remarkable in achieving moderate performance with only a limited number of baseline features.
The high gain importance of SRF shows that decisions involving SRF on average contribute more to the model's predictive power than decisions involving other variables [Figure 3]b. Since SRF also has the highest coverage importance, decisions involving SRF affect the classification of more samples on average than decisions involving other variables [Figure 3]c. As a whole, feature importance values indicate that SRF contributes most to the prediction of LER risk, while age, BVA, CST, and IRF contribute to a slightly lesser degree [Figure 3]. The relative importance of baseline variables in LER has not been previously assessed in the literature. In addition, previous studies have solely used Gini impurity as a feature importance metric, while this study uses multiple measures to gain a nuanced understanding of feature importance in LER prediction.[9],[11]
The lack of decreased predicted LER risk at high BVA is similar to a large nAMD study in which patients with high baseline BVA were at greater risk for vision loss, although this study did not consider a threshold above which risk of a robust response was less [Figure 4]c.[9],[22] Furthermore, another study found no significant difference between mean baseline BVA of LERs and ERs, but did not investigate whether a nonlinear relationship between BVA and LER risk was present in the data.[7] Similarly, a complementary study and forthcoming manuscript using the same patient cohort and data as this study did not find a significant difference between change in BVA from baseline to 3 months for LER compared to ER patients but also did not investigate the possibility of a nonlinear relationship or interactions with other variables influencing LER risk.[23] Therefore, these results increase the understanding of potential relationships between BVA and predicted LER risk.
The observed trend of decreased predicted LER risk with increased age contrasts with the previously discovered positive correlation between age and poor BVA outcomes, although research has shown that BVA and fluid outcomes are not always linked, so it is possible that different pathological characteristics in older patients could result in decreased LER risk but not increased BVA outcomes long-term compared to younger patients [Figure 4]b.[9],[20],[24],[25] Forms of AMD that present earlier in life could target the deep capillary plexus, thought to be most involved in venous drainage, leading to increased LER risk without the contribution of other mechanisms to vision loss such as geographic atrophy that could be more prevalent in older patients.[26] Further research is needed to clarify these complex relationships.
The nonlinear impact of CST on predicted LER risk was similar to a relationship observed by Jaffe et al. A thinner or thicker retina was correlated with poor VA outcomes, while retinas with intermediate thickness tended to have better VA outcomes [Figure 4]d.[27] In this study, increased risk at very high CST could be explained by increased retinal fluid, making residual fluid at 3 months more likely, while lower CST could indicate the thinning of functional retinal tissue due to geographic atrophy, suggesting a more pathologic state and increased LER risk. While the entirety of the lower CST range identified (<290 μm) does not constitute abnormally thin CST, the overall baseline CST values in this cohort are inflated [Figure 2]a, likely due to the presence of baseline retinal fluid.[28] Therefore, thinning of functional tissue due to geographic atrophy could still be present despite apparently normal CST.
The nonlinear impact of IRF on predicted LER risk may arise from a phenomenon discussed by Jaffe et al., whereby IRF could arise by multiple mechanisms, including VEGF-mediated neovascularization and non-VEGF-mediated apoptosis and necrotic cell death [Figure 4]e.[27] Very large amounts of IRF may arise from multiple simultaneous mechanisms, leading to LER when only the VEGF-mediated IRF source, rather than all sources, is targeted by anti-VEGF. In contrast, intermediate IRF levels may arise from VEGF-mediated mechanisms alone, increasing the likelihood of a robust anti-VEGF response. The lack of a consistent risk trend at low IRF levels may result from a mixed population of patients in this range, some of whom present with a mild form of nAMD which responds well to anti-VEGF, and others who have a poorer prognosis but present with little IRF because the disease is at an earlier stage of development at baseline.
In contrast to IRF, predicted LER risk increased consistently but nonlinearly as SRF increased [Figure 4]f. Therefore, SRF may arise predominantly from VEGF-mediated mechanisms and confer LER risk simply by overwhelming the capacity of the retinal venous system to drain fluid throughout the loading dose. Previous research found that baseline retinal fluid can be associated with poor long-term outcomes, but the impact of the quantity of baseline fluid on predicted LER probability has not been previously investigated with ML, rendering the nonlinear relationship in this study novel.[7]
Strengths of this study include being the first to assess baseline variable importance and nonlinear relationships for predicting LER risk. Limitations include the relatively small cohort size and the fact that interactions between variables remain poorly understood. In future work, predictive accuracy may be improved by the inclusion of features representing fluid location, which have previously contributed to ML assessment of ophthalmic disease.[10] In addition, this real-world study used BVA instead of best-corrected visual acuity (BCVA), which is the metric most often used in clinical trials to assess visual outcomes. However, this is an inherent limitation to real-world practice, as BCVA is not routinely assessed at all clinical visits. In addition, a larger patient cohort may enable the use of deep learning methodology to improve accuracy.[16]
Conclusion
By considering variable importance and nonlinear relationships, this study promotes nuanced consideration of LER in nAMD, which could lead to individualized predictions of LER risk for personalized medicine. If LER risk, which is correlated with poor long-term outcomes, can be predicted, then clinicians may recommend proactive dosing regimens and closer follow-up for high-risk patients. Further investigation of the clinical features, predictive factors, and implications of LER may enable the establishment of clinical recommendations for patients with LER.
Financial support and sponsorship
Nil.
Conflicts of interest
RPS reports personal fees from Genentech/Roche, personal fees from Alcon/Novartis, grants from Apellis and Graybug, personal fees from Zeiss, personal fees from Bausch + Lomb, personal fees from Regeneron Pharmaceuticals, Inc and personal fees from Gyroscope and Asceplix. SWP and AKW have no pertinent disclosures.