Application of AI in biological age prediction

Chronological age has been recognized as the single largest risk factor for human diseases, including cancer, cardiovascular disease and neurodegeneration. A variety of molecular and cellular hallmarks of the aging process have been proposed, such as epigenetic alterations, telomere attrition, chronic inflammation, loss of proteostasis, mitochondrial dysfunction and cellular senescence [1] (Figure 1). Biological age estimation is a challenging task involving the systematic measurement of key aging biomarkers, mortality modeling, and the evaluation of health status and disease risk. Accurately predicting biological age is essential for the early detection of age-related diseases, the monitoring of clinical interventions and the development of precise and personalized medicines, and it also benefits the understanding of the aging process itself. The most commonly used approach is to employ machine learning algorithms to develop an aging clock based on these varying biomarkers. Furthermore, when the predicted age is compared with the actual chronological age, the age-independent part of the difference, called Δage or AgeDiff, can be used to estimate the rate of biological aging [2] (Figure 2a). DNA methylation changes, a hallmark of aging, have been harnessed to develop epigenetic aging clocks, which are among the first and most widely used aging clocks [3]. Since then, numerous aging clocks have been built to predict biological aging using various molecular and cellular biomarkers, particularly omics-based approaches such as transcriptomic clocks [4], proteomic clocks [5] and metabolomic clocks [6]. Recently, advances in single-cell isolation and barcoding technologies have enabled the measurement of biological age at single-cell resolution, allowing cell-type-specific profiling of epigenomic, transcriptomic and proteomic changes during aging [7,8].
Moreover, aging clocks that make use of non-invasive techniques, such as brain structural MRI [9,10] or human facial imaging [11,12], have achieved high accuracy and gained wide acceptance. These approaches enable the analysis of large-scale human cohorts and the correlation of biological age with environmental factors and disease risk and prevalence.
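The age-independent part of the difference between predicted and chronological age can be illustrated with a short sketch. One common way to obtain it, assumed here rather than taken from the cited work, is to regress predicted age on chronological age and keep the residual; the function name `age_diff` and the synthetic data are hypothetical.

```python
import numpy as np

def age_diff(chronological_age, predicted_age):
    """Residual of predicted age after regressing out chronological age."""
    chrono = np.asarray(chronological_age, dtype=float)
    pred = np.asarray(predicted_age, dtype=float)
    # Ordinary least squares fit: predicted = slope * chrono + intercept.
    slope, intercept = np.polyfit(chrono, pred, deg=1)
    # The residual is uncorrelated with chronological age by construction.
    return pred - (slope * chrono + intercept)

rng = np.random.default_rng(0)
chrono = rng.uniform(20, 80, size=200)
# Synthetic clock output (assumption): compressed toward the mean, plus noise.
pred = 0.8 * chrono + 10 + rng.normal(0, 4, size=200)
delta = age_diff(chrono, pred)
print("correlation of residual with age:", np.corrcoef(chrono, delta)[0, 1])
```

A positive residual marks an individual who appears older than peers of the same chronological age, and a negative residual marks one who appears younger.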

To model biological aging measured by different biomarker metrics, a variety of artificial intelligence (AI) methods have been employed, including traditional machine learning and more recently developed deep learning approaches. Traditionally, linear regression models have been widely used to build aging clocks. Because of the high dimensionality of omics sequencing data, standard linear models perform poorly and fail to model the complex distributions of sequencing data. To address these limitations, penalized linear regression models, including ridge, lasso and elastic net, are commonly used to build aging clocks from high-throughput omics data such as DNA methylation, transcriptomic and proteomic data. The advantages of linear regression models are that they are lightweight, simple and interpretable.
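As a minimal sketch of why a penalty helps when features outnumber samples, the following fits a ridge-penalized linear clock in closed form with NumPy. The data are synthetic, and the names `fit_ridge` and `lam` are illustrative assumptions. Lasso and elastic net add an L1 term and need an iterative solver, so they are not shown.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc; lam > 0 keeps the system
    # well-posed even when there are more features than samples.
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - X_mean @ w
    return w, b

rng = np.random.default_rng(0)
n_samples, n_features = 60, 500            # more features than samples
X = rng.normal(size=(n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:10] = 3 * rng.normal(size=10)      # only a few informative features
age = 50 + X @ true_w + rng.normal(0, 1, size=n_samples)

X_train, X_test = X[:40], X[40:]
age_train, age_test = age[:40], age[40:]
w, b = fit_ridge(X_train, age_train, lam=10.0)
pred = X_test @ w + b
mae = np.mean(np.abs(pred - age_test))
print("held-out MAE:", mae)
```

Increasing `lam` shrinks the coefficients toward zero, trading variance for bias. In practice the penalty strength is usually chosen by cross-validation.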

With the advances of AI, deep neural networks (DNNs) are proving to be a powerful tool for building aging clocks, particularly for high-dimensional omics data and large-scale medical imaging datasets. DNN architectures are composed of multiple hidden layers between the input and output layers (Figure 2b). Each layer contains a given number of nodes (neurons) that perform non-linear transformations on the inputs from the prior layer [13]. Typical DNNs include convolutional neural networks (CNNs), multi-layer perceptrons (MLPs) and recurrent neural networks (RNNs). CNNs are widely used for computer vision (CV) and medical imaging analysis. CNN architectures consist of multiple convolutional layers and pooling layers followed by a downstream classifier. The convolutional layer uses a kernel (filter) to extract structures and features from the input images (Figure 2c). With multiple convolution operations, CNNs are able to capture high-level representations of the input images. Recently, Transformer-based architectures have been revolutionizing natural language processing (NLP) and computer vision [14]. A Transformer is a deep learning architecture that adopts self-attention, an attention mechanism that calculates attention weights between the elements of an input sequence, allowing the model to capture long-range dependencies and relationships in sequential data (Figure 2d). Popular Transformer-based models include AlphaFold2, BERT and GPT. Owing to their stronger modeling capabilities and greater scalability to large models and large datasets, Transformers are increasingly employed in medical image classification, omics data analysis and the building of large models.
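The self-attention computation described above can be sketched in a few lines of NumPy. This is a single-head, scaled dot-product version with randomly initialized projection matrices; the dimensions and variable names are illustrative assumptions, not taken from any model cited here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a sequence X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Pairwise similarity between every query and every key.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Each row is a distribution over the positions being attended to.
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4
X = rng.normal(size=(seq_len, d_model))    # e.g. 5 tokens with 8 features each
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
print(weights.round(2))
```

Because every position attends to every other position in one step, distant elements of the sequence can influence each other directly. The weight matrix itself can later be inspected as an attention map.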

In this review, we explore the application of artificial intelligence to building aging clocks. Specifically, we introduce how machine learning models can support the construction of aging clocks from bulk omics data or single-cell sequencing data, as well as from non-invasive aging biomarkers, with a particular focus on newly developed deep learning models. We also discuss the potential challenges and prospects for deep learning models in estimating biological aging.

DNA methylation clocks are among the earliest and most widely used aging clocks, known for their high accuracy. As early as 2011, Bocklandt et al. predicted human age (MAE = 5.2) from array-based DNA methylation data with a lasso linear regression model [3]. After that, several methylation aging clocks were constructed based on linear models such as elastic net [15, 16, 17, 18, 19, 20, 21]. Currently, there are four widely used DNA methylation aging clocks: the Horvath clock [16], Hannum clock [15], Levine clock [22] and GrimAge clock [23]. Recently, taking advantage of the rapid development of machine learning and artificial intelligence, Levy et al. built MethylNet, a deep learning-based aging clock, with which the MAE of age prediction reached 3.0. Moreover, MethylNet is not limited to age prediction, but can also be used for classification, regression and multi-output tasks [24]. Galkin et al. introduced another DNN-based DNA methylation aging clock, named DeepMAge, further enhancing the accuracy of DNA methylation-based aging clocks (MedAE = 2.8 in an independent validation) [25]. Camillo et al. built AltumAge based on a deep learning model, which achieved an MAE of 2.15 for predicting age using all 20,318 methylation sites, significantly better than traditional models such as elastic net [26]. These results highlight the potential of advanced techniques to improve the precision and effectiveness of DNA methylation-based aging clocks.
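The accuracies above are reported either as mean absolute error (MAE) or as median absolute error (MedAE), and the two are not directly comparable. A small sketch with made-up ages shows how a single outlier separates them:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: sensitive to large individual errors."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def medae(y_true, y_pred):
    """Median absolute error: robust to a few large errors."""
    return float(np.median(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Hypothetical ages; absolute errors are 2, 1, 3, 1 and 20 years.
true_age = np.array([30, 45, 60, 72, 81])
pred_age = np.array([32, 44, 57, 73, 61])

print("MAE  :", mae(true_age, pred_age))    # 5.4
print("MedAE:", medae(true_age, pred_age))  # 2.0
```

For this reason, a MedAE reported for one clock and an MAE reported for another should be compared with caution, especially when the evaluation cohorts differ.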

Aging clocks based on other data types have also been quickly established. In 2015, Peters et al. constructed the first transcriptome aging clock (MAE = 7.77) using multiple linear regression on whole-blood gene expression data from 14,983 individuals [4]. Subsequently, Fleischer et al., Mamoshina et al. and Xia et al., in 2018, 2018 and 2020 respectively, built linear transcriptome aging clocks based on human dermal fibroblast, muscle and PBMC gene expression data (MedAE = 4.0, MAE = 6.24 and MAE = 5.68) [12,27,28]. Holzscheck et al. developed an artificial neural network built from well-described biological pathways that predicts age from gene expression data in skin tissue (MAE = 5.51) while revealing the aging state of the pathways that contribute to the prediction [29]. Urban et al. were the first to construct a Transformer-based model [30], named Precious1GPT, for age prediction using DNA methylation and transcriptome data. However, compared with previous linear models, no significant improvement in accuracy was observed (MAE = 4.22 for methylation, MAE = 6.29 for gene expression and MAE = 5.62 for the two combined).

Loss of protein homeostasis has long been recognized as one of the hallmarks of aging [1], and protein levels can also be used to predict age. In 2014, Krištić et al. established the first aging clock based on the measurement of IgG glycosylation in human plasma (MAE = 9.7) [31]. In 2021, Sayed et al. developed an inflammatory aging clock (iAge) from the blood immunome of 1001 individuals using a deep neural network method called a guided auto-encoder [32]. In 2018, Tanaka et al. created an aging clock using elastic net regression models for the plasma proteome as a whole, rather than focusing on a specific protein class (r = 0.94) [5]. In 2019, Lehallier et al. achieved more accurate age predictions using a lasso model and a larger number of plasma proteins (n = 373) (PCC = 0.97). Furthermore, they discovered that the number of age-associated proteins does not change steadily with age but fluctuates, exhibiting peaks at the ages of 34, 60 and 78 [33]. The following year, they refined their model to higher accuracy by further combining different proteins (MAE = 2.44) [34].

In addition, there are aging clocks based on other types of data. In 2020, Galkin et al. developed a human gut microbiome aging clock based on taxonomic profiling and deep learning to predict host age [35]. They found that a deep neural network architecture (MAE = 5.91) performed better than linear models such as elastic net. Chen et al. developed a microbiome aging clock using multi-view learning, an ensemble of multiple heterogeneous algorithms, which combined species and pathway profiles and showed acceptable accuracy (MAE = 8.33) [36].

The advancement of single-cell high-throughput omics technologies has enabled the scientific community to decode the hallmarks of aging at single-cell resolution. In this part, we summarize the application of AI methods to bulk or single-cell omics data to estimate the biological age of individual cells.

In 2017, our group developed the “iCpSc” package, which predicts differentiation time for individual mouse embryonic stem cells (mESCs) using matched cell population RNA-seq (cpRNA-seq) as a reference [37]. By leveraging partial least squares regression, we demonstrated that the first two components of the model suffice to construct a linear model with satisfactory performance. Additionally, Singh et al. introduced GERAS (Genetic Reference for Age of Single-cell), a machine learning age classifier capable of predicting the age stages of individual cells based on their transcriptomes [38]. GERAS demonstrated over 90% accuracy in classifying the ages of zebrafish and human pancreatic cells. Bulteau et al. presented another computational method named real-age prediction from transcriptome staging on reference (RAPToR) [39]. RAPToR uses existing time-series bulk transcriptome data as a reference to estimate the real age of a sample through maximum likelihood estimation based on Independent Component Analysis (ICA) or Principal Component Analysis (PCA). The study showcased RAPToR's capability to predict the age not only of individual animals, but also of dissected tissues and single-cell data. In an alternative approach, Buckley et al. developed cell-type-specific aging clocks to predict both chronological age and biological age (neural stem cell proliferation capacity) in the six most abundant cell types within the subventricular zone neurogenic region [7]. Leveraging scRNA-seq data from 28 mice spanning 26 different age stages, they observed varying age sensitivity among the six cell types. Moreover, the results showed that lasso models trained on bootstrap-sampled meta cells outperformed all other models, including single-cell models. Notably, these clocks were successfully employed to evaluate the effects of anti-aging interventions such as exercise and heterochronic parabiosis. Similarly, Lu et al. developed a human CD8+ T cell-specific aging clock using a fitted mixed-effect elastic net model [40]. By using scRNA-seq data from both cross-sectional and longitudinal samples, the model accurately predicted the age of individual cells based on their transcriptomic features. Moreover, the researchers uncovered a close association between cell age, cell differentiation and mutation burden. Instead of building an aging clock based on gene expression profiles, our group constructed a single-cell aging clock based on cell type composition using a partial least squares regression (PLSR) linear model [41]. The clock, developed from scRNA-seq data of PBMC samples, revealed that supercentenarians (SCs) show delayed age-related changes in cell composition and gene expression compared with the values expected by the clock model.
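The idea of bootstrap-sampled meta cells can be sketched as follows: cells of one type from one sample are drawn with replacement and their counts pooled, which reduces the sparsity of single-cell profiles before a clock is trained. The pool size of 15, the use of summation and the function name `bootstrap_meta_cells` are assumptions for illustration and are not taken from the cited study.

```python
import numpy as np

def bootstrap_meta_cells(counts, n_meta, cells_per_meta, rng):
    """Pool randomly drawn cells (with replacement) into meta cells.

    counts: (n_cells, n_genes) count matrix for one cell type in one sample.
    Returns an (n_meta, n_genes) matrix of summed counts.
    """
    n_cells = counts.shape[0]
    # Each row of idx lists the cells pooled into one meta cell.
    idx = rng.integers(0, n_cells, size=(n_meta, cells_per_meta))
    # counts[idx] has shape (n_meta, cells_per_meta, n_genes).
    return counts[idx].sum(axis=1)

rng = np.random.default_rng(0)
# Synthetic sparse scRNA-seq-like counts: 200 cells by 1000 genes.
counts = rng.poisson(0.3, size=(200, 1000))
meta = bootstrap_meta_cells(counts, n_meta=50, cells_per_meta=15, rng=rng)

sparsity_single = np.mean(counts == 0)
sparsity_meta = np.mean(meta == 0)
print("zero fraction, single cells:", sparsity_single)
print("zero fraction, meta cells  :", sparsity_meta)
```

The resulting meta cells, labeled with the donor's age, can then serve as training rows for a penalized linear model such as lasso.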

Epigenetic clocks offer another avenue for measuring single-cell aging. In 2021, Trapp et al. pioneered the development of the first epigenetic single-cell aging clock, named scAge [8]. This novel approach infers the biological age of individual cells from bulk data using maximum likelihood estimation. Apart from DNA methylation modifications, ribosomal DNA methylation can also be an indicator of aging. A multi-tissue ribosomal DNAm (rDNAm) clock was constructed using bulk DNAm data [42]. Moreover, the authors applied this clock to predict single-cell age from single-cell methylation sequencing data by converting single cell sequencing data into pseudo bulk sequencing data. In a preprint from May 2022, a mitotic age predictor called EpiTrace was developed utilizing single-cell chromatin accessibility sequencing (scATAC) data [43]. By quantifying the opened fraction of clock-like differential methylation loci (ClockDML) in single cells, EpiTrace provides accurate age prediction.
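Inferring single-cell age from bulk data by maximum likelihood can be illustrated with a simplified sketch. It assumes that bulk data yield a linear model of methylation level versus age for each CpG, that a single cell's CpG state is binary, and that CpGs are independent. All parameter values are synthetic, and this is not the published scAge implementation.

```python
import numpy as np

def ml_age(cell_meth, slopes, intercepts, age_grid, eps=1e-3):
    """Pick the age on a grid that maximizes the cell's log-likelihood.

    cell_meth: binary vector (1 = methylated) over the covered CpGs.
    slopes, intercepts: per-CpG linear models fitted on bulk data.
    """
    # Expected methylation probability per CpG at each candidate age.
    p = np.clip(np.outer(age_grid, slopes) + intercepts, eps, 1 - eps)
    # Bernoulli log-likelihood summed over CpGs, one value per age.
    loglik = (cell_meth * np.log(p) + (1 - cell_meth) * np.log(1 - p)).sum(axis=1)
    return age_grid[np.argmax(loglik)], loglik

rng = np.random.default_rng(0)
n_cpgs = 20000
slopes = rng.uniform(-0.005, 0.005, size=n_cpgs)    # synthetic bulk models
intercepts = rng.uniform(0.3, 0.7, size=n_cpgs)

true_age = 24.0
p_true = np.clip(slopes * true_age + intercepts, 1e-3, 1 - 1e-3)
cell = rng.binomial(1, p_true)                       # simulated single cell

age_grid = np.arange(0, 41, dtype=float)
est, loglik = ml_age(cell, slopes, intercepts, age_grid)
print("estimated age:", est)
```

Because each CpG contributes only a binary observation, the estimate relies on aggregating many weakly informative sites. Its precision therefore depends strongly on how many age-associated CpGs are covered in the cell.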

In general, there are two main strategies for estimating the age of a single cell: inferring age either from omics data of bulk tissue or purified cell populations, or directly from single-cell omics data. Datasets derived from bulk tissue offer better consistency than those from single cells. However, bulk tissue and even purified cell populations amalgamate molecular changes across diverse tissue compositions and cell states, masking the authentic gene expression distribution of a specific cell type. The proliferation of single-cell omics data in recent years has made it possible to construct aging clocks directly from single-cell omics data. This facilitates investigation of the distinct contributions of cell types to the aging and rejuvenation processes. Nonetheless, it places higher demands on data processing and model development because of the elevated noise and heterogeneity intrinsic to single-cell omics data.

Non-invasive aging clocks are a type of aging clock that uses data collected from non-invasive examinations to evaluate the biological status of aging. These examinations can include facial images, biomedical images (e.g., MRI) and questionnaires. Using deep learning models and artificial intelligence, these clocks provide accurate estimations of chronological age with minimal error rates. In view of the convenience and cost-effectiveness of the collection methods, non-invasive examinations are more applicable for routine medical check-ups and mass screening than invasive examinations.

Taking facial images first, our group published the first 3D facial image-based age predictor in 2015, which had an MAE of 6 years, and later extended facial age prediction to a new AI-based model by collecting 3D facial images from a cohort of ∼5000 Han Chinese individuals [44]. The new facial aging clocks, named FaceCnnAge and FaceCnnPerceivedAge, estimate chronological age and perceived age (the age that a person is visually estimated to be on the basis of the face), respectively, with an MAE of 2.8 years. They revealed high heterogeneity of aging rate in middle age, providing evidence for the best time for anti-aging interventions. In addition, causal inference tests between lifestyle and aging rate uncovered potential lifestyle factors, such as smoking or yogurt consumption, that exacerbate or mitigate the aging rate [12].

Besides facial images, PhotoAgeClock was developed to predict chronological age from a single eye corner image with a deep neural network; using a total of 8414 eye corner images, the model reaches an MAE of 2.3 years [45]. However, this aging clock lacks biological relevance. Furthermore, fundus images, which are photographs of the rear of the eye that include the main structures of the retina, have been used to construct retinal aging clocks with deep learning models. Zhu et al. built a CNN-based retinal aging clock with an MAE of 3.55 years and showed that each one-year increase in the retinal age gap (predicted retinal age minus chronological age) was associated with a 2% increase in the risk of all-cause mortality [46]. A more accurate retinal aging clock was obtained by Ahadi et al. and was reinforced by GWAS hits identifying candidate genes associated with several age-related functions. The top GWAS locus was further validated via knockdown of the fly homolog, Alk, which slowed age-related decline in vision in flies, thus strengthening the biological foundation of retinal aging clocks [47].

Other non-invasive examinations include biomedical imaging such as MRI. In 2017, Cole et al. generated a 'brain-predicted age' from brain MRI data using 3D-CNN networks with an MAE of 4.16 years, which is better than that of the conventional machine learning method Gaussian Process Regression (GPR) [9]. Recently, Yin et al. used an interpretable 3D-CNN network to estimate MRI-derived brain age and reached an MAE of around 2.3 years [10]. Their models provided detailed anatomic maps of brain aging patterns that reveal neurocognitive trajectories in adults with mild cognitive impairment and Alzheimer's disease. They demonstrated significant associations between biological age and early signs of Alzheimer's disease, providing insight into the early identification of individuals at high risk of Alzheimer's disease.

In conclusion, non-invasive aging clocks represent a transformative paradigm in aging research, enabling precise and accessible assessments of the biological status of aging. The implementation of deep learning models and artificial intelligence has opened unprecedented opportunities for advancing preventive and personalized anti-aging strategies, as well as facilitating early detection of age-related health risks on a large population scale. Continued research and validation efforts will further enhance the scientific significance and applicability of non-invasive aging clocks in diverse clinical and public health settings.

Biological aging is a systemic and complex process, with diverse molecular, cellular and organ-level changes. Thus, large numbers of machine learning models have been built to estimate biological age by leveraging different biomarkers of aging. Traditional machine learning methods are still widely used to build epigenetic, transcriptomic and other omics-based aging clocks because they are less data intensive, more interpretable and computationally simpler. On the other hand, DNN-based models are quickly coming to dominate other types of aging clocks, particularly those based on imaging and multi-modal data. In addition, AI models, especially deep learning-based models, are growing extremely large and complex in order to obtain higher accuracy, robustness and generalization in handling big data.

Here, we list the state-of-the-art AI models for biological aging prediction, with an emphasis on DNN-based methods (Tables 1 and 2).

However, there are still challenges for the wide use of deep learning in biological aging estimation.

Considering the multifaceted nature of aging, a single modality of data may be insufficient to capture the complexity of biological aging. Using multi-modal data, including omics data, imaging data, electronic records and clinical tests, to generate a composite aging clock may better represent biological aging, improve model interpretability and more accurately reflect the pace of biological aging [50,51]. Recently, advances in single-cell multiomics sequencing technologies have enabled the simultaneous collection of multi-modal datasets, providing opportunities to systematically explore cellular diversity and heterogeneity in aging tissues.

One of the major challenges for deep learning-based aging clocks is model interpretability. The “black box” nature of deep learning has generated widespread concern when it comes to clinical applications. To address the issue of model interpretability, a variety of algorithms have been developed to explain a model's predictions and identify the important factors contributing to its decisions, such as CAM (class activation mapping), SHAP (SHapley Additive exPlanations) and attention maps for Transformer-based models. For example, SHAP, a model interpretation method derived from game theory, explains individual predictions by computing the contribution of each feature to the prediction and has been widely used to explain the output of aging clocks based on DNN models [24,26]. Lucas Camillo and colleagues extracted the most important CpG sites based on the SHAP values from the neural network-based epigenetic clock AltumAge, and found that the top-ranking CpG sites are highly associated with gene regulatory regions in the genome [26]; Grad-CAM has been employed to localize the anatomy contributing to CXR-Age [49]. Furthermore, an increasing number of deep learning-based models have incorporated interpretable features into their design [30,52]. For example, by transforming the high-dimensional sparse expression space into low-dimensional pathway-level embeddings, TOSICA, a Transformer-based model, achieves interpretability by calculating the attention weights between the classifier token (CLS) and pathway tokens [52].
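The game-theoretic attribution underlying SHAP can be made concrete with an exact Shapley value computation on a toy model. The sketch below enumerates all feature subsets, which is feasible only for a handful of features. The SHAP library itself relies on approximations for real networks, and `toy_clock` and the baseline values are hypothetical.

```python
import itertools
import math
import numpy as np

def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline input."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                idx = np.array(subset, dtype=int)
                z = baseline.copy()
                z[idx] = x[idx]            # features in the coalition are "present"
                without_i = predict(z)
                z[i] = x[i]                # add feature i to the coalition
                phi[i] += weight * (predict(z) - without_i)
    return phi

def toy_clock(f):
    # Hypothetical clock: two additive features and one interaction term.
    return 40 + 30 * f[0] - 20 * f[1] + 10 * f[2] * f[3]

x = np.array([0.9, 0.2, 0.8, 0.5])          # e.g. methylation levels of 4 CpGs
baseline = np.array([0.5, 0.5, 0.5, 0.5])   # reference input (assumption)
phi = shapley_values(toy_clock, x, baseline)
print("attributions:", phi.round(3))
print("sum of attributions:", phi.sum())
print("prediction minus baseline:", toy_clock(x) - toy_clock(baseline))
```

The attributions sum to the difference between the prediction and the baseline prediction. This property allows features such as CpG sites to be ranked by their contribution to an individual's predicted age.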

The ultimate goal of building aging clocks is to provide clinical guidance for anti-aging interventions and to predict disease risk. Traditional linear models have made it possible to predict biological aging using a limited number of molecular markers, such as DNA methylation sites, proteins and RNA molecules. However, the difficulty of dissecting causative factors has limited their application to clinical interventions. For deep learning-based models, the effector genes and factors are difficult to retrieve from the model because of its over-parameterized “black box” nature. To tackle these problems, Cumplido-Mayoral et al. developed a brain-age clock from structural neuroimaging data and found that brain-age delta is associated with specific neurodegeneration and cerebrovascular disease biomarkers [53]; Xia et al. built a causal-inference network to infer the impact of molecular mediators in the blood and of lifestyle on the human facial-aging rate, guiding lifestyle choices and anti-aging interventions [12].

The growing volume of large datasets and multi-modal, high-dimensional data collected from aging biomarkers necessitates the development of more powerful AI models to integrate, process and analyze them. Deep learning-based aging clocks, particularly CNN and Transformer-based architectures, have proven to be powerful tools for estimating biological aging from bulk and single-cell omics data and non-invasive imaging data. We expect that these aging clocks will be able to assist in evaluating human health status, predicting disease risk and guiding personalized interventions in human aging in the near future.


CURRENT OPINION IN STRUCTURAL BIOLOGY
