Early Alzheimer's disease (AD) includes the following three successive stages: significant memory concern (SMC), early mild cognitive impairment (EMCI), and late mild cognitive impairment (LMCI). AD is a common long-term neurological disorder in the elderly, generally associated with a gradual decline in understanding, judgment, memory, and executive ability until these abilities are completely lost. AD is known as one of the leading causes of death among elderly people worldwide (Zhang et al., 2022), and it places a heavy psychological and economic burden on patients' families. According to the literature (Derby, 2020), more than 50 million people worldwide currently suffer from AD and other dementias, and population aging is further increasing the number of patients. However, there is no consensus on the pathological mechanism (Yuzwa et al., 2008; Diplas et al., 2018), and many pharmaceutical companies have tried and failed to develop effective drugs to cure AD. Therefore, early detection and timely intervention are the only feasible way to slow down or prevent disease deterioration (Jack et al., 2013). With the development of neuroimaging, non-invasive study of AD has become the mainstream of current research because it has no side effects on patients (Wang et al., 2018c; Grassi et al., 2019; Yu et al., 2021; Alvi et al., 2022; Lei et al., 2022; You et al., 2022). Developing effective methods to detect brain disease and assist clinical treatment from medical imaging data is therefore a very promising direction for the scientific community (Wang et al., 2022).
The brain functional network (BFN) derived from functional magnetic resonance imaging (fMRI) describes the functional interactions among spatially distributed brain regions. Brain science indicates that abnormal functional connectivity appears at the early stage of AD (Berron et al., 2020). The BFN can give a general understanding of neurological symptoms and help unravel the pathogenesis of cognitive diseases. As mentioned in Yu et al. (2020) and Zuo et al. (2021a), the whole brain is divided into several regions of interest (ROIs) according to an anatomical template. The BFN is modeled as a graph, where each node represents an ROI and each edge represents the functional connection strength between a pair of ROIs. The conventional method is to use a software toolbox to construct functional connectivity (FC) and then extract effective features for disease diagnosis. For example, Kabbara et al. (2018) investigated the abnormal hub patterns associated with patients' cognitive performance by applying graph-theory analysis to the constructed functional connectivity. This work preserved the topological structure and achieved better evaluation performance than feature extraction algorithms in Euclidean space (Wang et al., 2017; Zuo et al., 2021b; Yu et al., 2022). Considering the complexity of brain neural activity and the noise remaining in data preprocessed from raw fMRI, it is important to investigate more advanced methods for modeling effective BFNs in early AD analysis.
Brain functional network construction from time series can be divided into two classes: static methods and dynamic methods. The former utilize the whole-brain fMRI time series to build links between ROIs for AD analysis. The most direct way of constructing a brain functional network is to compute Pearson's correlation (PC) between every pair of brain regions (Wang et al., 2007). To reduce the possible influence of adjacent ROIs, Fransson and Marrelec (2008) employed partial correlations and achieved good performance in characterizing the changes of the default mode network associated with the disease. However, estimating the inverse matrix required for partial correlation usually admits multiple solutions, so researchers have imposed constraints on the partial correlation estimation to obtain a stable solution. For example, Qiao et al. (2016) encoded modularity as a prior through matrix regularization to optimize a sparse brain network and discovered potential biomarkers for personalized diagnosis. The latter class exploits the temporal changes of brain functional connectivity to capture subtle transient neural abnormalities and has recently become a hot spot in neurological disease analysis. The direct approach is to generate a sequence of functional networks and design a fused learning algorithm to jointly estimate the temporal networks for early MCI detection (Wee et al., 2016). Furthermore, Gong et al. (2022) treated the functional time series and functional connectivity as the node features and edges, respectively, and developed a graph convolutional network (GCN) based model that generates multiple brain networks over six sliding time steps to characterize brain temporal communities. To address the noise caused by the limited number of volumes in a sliding window, Zhou et al. (2018) proposed a matrix-regularized learning framework to learn sparse and modular high-order connectivity features for MCI classification.
Although many studies have been conducted on BFN construction, they mainly rely on specific preprocessing steps in software toolboxes to obtain the temporal features of each ROI. This has two drawbacks: first, the many parameter settings may introduce errors that differ from person to person; second, the series of processing steps is time-consuming, which keeps these methods far from clinical application.
Recently, data-driven models have proven capable of mining effective common characteristics from noisy data. They have been widely applied in various fields of medical image analysis, such as disease severity assessment (Wang et al., 2020c), lesion area segmentation (Hong et al., 2022b), health assessment (Wang et al., 2018b), disease detection (Wang et al., 2018a; Yang et al., 2022), and image reconstruction (Hu et al., 2020b). To improve disease analysis performance, many advanced machine learning algorithms have been designed to extract discriminative and robust features (Zeng et al., 2017; Lei et al., 2018; Hong et al., 2019; Wang et al., 2020b). Compared with traditional Convolutional Neural Networks (CNN) (Wang S.-Q. et al., 2015), the 3D Convolutional Neural Network (C3D) is good at capturing local spatial features in a three-dimensional volume and has been successfully applied to cross-modal image synthesis (Hu et al., 2020a) and disease recognition (Wang et al., 2020a). Moreover, the transformer network (Jiang et al., 2021) can model the global relationship between distant sub-patch regions. ROI-based features can therefore be learned from 4D fMRI data by applying C3D and a transformer in sequence. In addition, generative adversarial networks (GANs) are regarded as a special case of variational inference (Mo and Wang, 2009; Wang, 2009) and demonstrate impressive performance in matching generated data distributions. Clear evidence is their success in cross-modal medical image generation (Hu et al., 2019, 2021) and domain adaptation segmentation (Hong et al., 2022a). A GAN can thus be used as a regularizer to constrain representation learning for stable and generalizable disease analysis.
Inspired by the above observations, in this paper, a novel Adversarial Temporal-Spatial Aligned Transformer (ATAT) model is proposed to automatically learn brain functional networks from 4D fMRI for detecting early AD. The constructed brain functional networks are also analyzed to identify important ROIs and abnormal connections. The main contributions of this work are as follows: (1) The region-sequence aligned generator (RAG) is developed to first learn rough ROI-based features by incorporating the brain anatomical information, then finely adjust the boundary features of adjacent ROIs to generate ROI time series and connectivity features. It greatly enhances the ROI time series learning and fully explores the spatial-temporal characteristics and connectivity information among the whole brain. (2) The multi-channel temporal discriminator is designed to constrain the learned ROI time series with the empirical samples. It regularizes the generator optimization and makes the connectivity feature more robust. (3) Experimental classification results prove the effectiveness of our model, and the discovered important ROIs and abnormal connections may be potential biomarkers for early AD diagnosis or treatment.
The rest of this article is organized as follows. Section 2 describes the proposed ATAT model for brain functional network construction. The experimental settings and prediction results with competing methods are presented in Section 3. In Section 4, the reliability and limitations of this work are discussed. Finally, Section 5 summarizes the main remarks of this paper.
2. Materials and methods
The proposed method consists of three main parts: (1) data preprocessing, (2) the architecture of the proposed model, and (3) the objective functions for optimization.
2.1. Data description and preprocessing
The experimental data come from the public Alzheimer's Disease Neuroimaging Initiative (ADNI-3). A total of 330 subjects with functional magnetic resonance imaging (fMRI) scans were downloaded from the website, including 86 Normal Control (NC) subjects and subjects at the three successive stages of early AD (i.e., 82 SMC, 86 EMCI, 76 LMCI). The fMRI data were acquired on 3.0 Tesla scanners. The detailed scanning parameters for fMRI are as follows: the imaging resolution ranges from 2.5 to 3.75 mm along the X and Y directions; the slice thickness is between 2.5 and 3.4 mm; the time of repetition (TR) ranges from 0.607 to 3.0 s; and the time of echo (TE) is in the range of 30 to 32 ms. The recording length is about 10 min. The mean ages of the NC, SMC, EMCI, and LMCI groups are 74.4, 76.1, 75.7, and 75.8 years, respectively. The gender ratio is roughly balanced in each category.
The fMRI data are preprocessed with the software toolbox GRETNA (Wang J. et al., 2015), which involves about six procedures for constructing ROI-based time series. Each 4D fMRI scan is processed by discarding initial volumes to reach magnetization equilibrium, removing head-motion artifacts, spatial normalization, smoothing, and band-pass filtering (0.01 Hz ≤ f ≤ 0.08 Hz). Finally, the automated anatomical labeling (AAL) atlas (Tzourio-Mazoyer et al., 2002) is used to parcellate the preprocessed image into 90 non-overlapping spatial ROIs, and the resulting functional features with the size 90 × 187 are taken as the truth samples. Meanwhile, the empirical functional connectivity is estimated by calculating the Pearson correlation coefficients between paired ROI time series, which yields a 90 × 90 correlation matrix for each subject.
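As a minimal sketch of the last step, the empirical functional connectivity can be computed from a 90 × 187 ROI time-series matrix with NumPy. The function name `empirical_connectivity` and the random input are illustrative assumptions; the input here is synthetic and does not come from ADNI or GRETNA.

```python
import numpy as np


def empirical_connectivity(roi_ts: np.ndarray) -> np.ndarray:
    """Pearson correlation between every pair of ROI time series.

    roi_ts has shape (n_rois, n_timepoints), e.g. (90, 187).
    Returns an (n_rois, n_rois) symmetric matrix with ones on the diagonal.
    """
    # np.corrcoef treats each row as one variable (here: one ROI).
    return np.corrcoef(roi_ts)


# Synthetic stand-in for one subject's preprocessed ROI time series.
rng = np.random.default_rng(0)
roi_ts = rng.standard_normal((90, 187))
fc = empirical_connectivity(roi_ts)
print(fc.shape)
```

Each row of `roi_ts` is one ROI, so the result has one row and one column per ROI, matching the 90 × 90 matrix described above.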
2.2. Architecture
The architecture of the proposed ATAT is shown in Figure 1. It contains three parts: the region-sequence aligned generator (RAG), the multi-channel temporal discriminator (MTD), and the global-local connectivity classifier (GCC). The RAG includes a region-guided feature learning network (RFLNet) and a spatial-temporal aligned transformer (SAT), and it transforms the 4D fMRI into ROI time series and a brain functional network. The raw fMRI data is first sent to the RFLNet for rough ROI-based feature extraction; the SAT is then utilized to finely adjust the features of adjacent ROIs and align the global temporal correlation between any pair of ROIs. Meanwhile, the obtained ROI time series is transformed into a brain functional network through the connectivity learning (CL) network. After that, the generated ROI time series is constrained to follow the real sample distribution by the MTD. Finally, both the ROI time series and the brain functional network are sent to the GCC for disease prediction. Five objective functions are used in the model's optimization: the generator loss, discriminator loss, reconstruction loss, classifier loss, and regularization loss.
Figure 1. The framework of the proposed model. It consists of three parts: generator, discriminator, and classifier. The input is a four-dimensional fMRI, and the output is a brain functional network.
2.2.1. Region-sequence aligned generator
2.2.1.1. Region-guided feature learning network
As illustrated in Figure 2, this network learns a rough mapping from the raw 4D fMRI to ROI-based time series by introducing the position and volume of the brain anatomical regions. The size of the input data X is 64 × 64 × 48 × 187. It first passes through four blocks, each with three successive layers: a 3 × 3 × 3 convolutional layer with stride 1, a 2 × 2 × 2 average pooling layer with stride 2, and a combined batch normalization (BN) + ReLU activation layer. The channel numbers of the four convolutional layers are 8, 16, 32, and 64. Then one 1 × 1 × 1 convolutional layer with stride 1 is used to increase the number of channels to match the N ROIs, followed by a sigmoid activation function.
Figure 2. The detailed structure of the RFLNet. The input is an example of a three-dimensional fMRI volume with the size 64 × 64 × 48, and the ROI information of the anatomical atlas with the size N × 4. The RFLNet outputs the initial feature for each ROI.
Next, we normalize the central location (x, y, z) and volume (v) of the N anatomical ROIs to constrain the brain region information to the range 0 to 1. Finally, the (x, y, z, v) of the N ROIs are treated as ROI embeddings and concatenated with the flattened output of the sigmoid layer; the result is sent to a one-layer linear projection (LP) layer to generate rough ROI features. The rough ROI feature can be expressed as:
$$F_1 = \mathrm{RFLNet}(X, x, y, z, v) \qquad (1)$$
Here, $X$ is the four-dimensional fMRI volume data; $x, y, z, v \in \mathbb{R}^{N \times 1}$; $F_1 \in \mathbb{R}^{N \times q}$; and RFLNet is a combination of several convolution and pooling operations.
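The spatial size reaching the linear projection can be checked with a little arithmetic. The sketch below assumes that the 3 × 3 × 3 convolutions preserve spatial size (the padding is not stated in the text), so only the four stride-2 poolings shrink each 64 × 64 × 48 volume. It also assumes that each of the N channels is flattened separately before its four embedding values are appended. The helper name `pooled_shape` is hypothetical.

```python
def pooled_shape(shape, n_blocks=4, pool=2):
    """Spatial size after n_blocks stride-2 poolings (size-preserving convs assumed)."""
    for _ in range(n_blocks):
        shape = tuple(s // pool for s in shape)
    return shape


spatial = pooled_shape((64, 64, 48))           # per-volume spatial grid
flat_per_roi = spatial[0] * spatial[1] * spatial[2]
lp_input = flat_per_roi + 4                    # flattened feature plus (x, y, z, v)
print(spatial, flat_per_roi, lp_input)
```

Under these assumptions each ROI channel carries a 4 × 4 × 3 grid per volume, so the linear projection would see 48 feature values plus the four normalized embedding values.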
2.2.1.2. Spatial-temporal aligned transformer
To learn finer ROI temporal features, the spatial-temporal aligned transformer module is designed to recalibrate boundary ROI features and time sequence variations. It consists of two parts: the spatial multi-head central attention (SMCA) and the temporal aligned feed-forward (TAFF). Every ROI is regarded as a token. The rough ROI feature is first sent to three parallel LP layers to obtain the query (Q), key (K), and value (V). Note that the calculation of K and V takes the ROI embeddings into account. The formulas can be defined as:
$$Q = \mathrm{LP}(F_1), \quad K = \mathrm{LP}(F_1 \,\|\, x \,\|\, y \,\|\, z \,\|\, v), \quad V = \mathrm{LP}(F_1 \,\|\, x \,\|\, y \,\|\, z \,\|\, v) \qquad (2)$$
where $\|$ denotes the concatenation operation. Then $Q, K, V \in \mathbb{R}^{N \times q}$ are separated into $h$ heads. Each head (i.e., $Q_i, K_i, V_i$) has the dimension $q/h$. Taking one head as an example, the central attention (CA) can be expressed as:
$$CA_i = \mathrm{Softmax}\left(\frac{Q_i K_i^{T}}{\sqrt{q/h}}\right) V_i \qquad (3)$$
Here, $i$ is the index over the $h$ heads. The output of the spatial multi-head central-attention module is the concatenation of all heads followed by an LP layer (including residual mapping and layer normalization). It can be defined as:
$$\mathrm{SMCA} = \mathrm{LP}(CA_1 \,\|\, CA_2 \,\|\, \ldots \,\|\, CA_h) + F_1 \qquad (4)$$
The SMCA output has the size $N \times q$.
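The attention computation in Equations (2) to (4) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the weight matrices are random stand-ins for the learned LP layers, the score scaling uses the standard scaled dot-product factor sqrt(q/h), and applying layer normalization after the residual sum is an assumption about how the "residual mapping and layer normalization" are arranged.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(a, eps=1e-6):
    mu = a.mean(axis=-1, keepdims=True)
    sd = a.std(axis=-1, keepdims=True)
    return (a - mu) / (sd + eps)


def smca(F1, emb, Wq, Wk, Wv, Wo, h):
    """Multi-head attention where K and V also see the ROI embeddings."""
    N, q = F1.shape
    d = q // h                                   # per-head dimension q/h
    F1_emb = np.concatenate([F1, emb], axis=1)   # F1 || x || y || z || v
    Q, K, V = F1 @ Wq, F1_emb @ Wk, F1_emb @ Wv  # each (N, q)

    def split(M):                                # (N, q) -> (h, N, d)
        return M.reshape(N, h, d).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    attn = softmax(Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d))  # (h, N, N)
    heads = attn @ Vh                                        # (h, N, d)
    concat = heads.transpose(1, 0, 2).reshape(N, q)          # CA_1 || ... || CA_h
    return layer_norm(concat @ Wo + F1)                      # LP + residual (norm assumed)


rng = np.random.default_rng(0)
N, q, h = 90, 187, 11                            # values from the experimental setup
F1 = rng.standard_normal((N, q))
emb = rng.random((N, 4))                         # normalized (x, y, z, v)
Wq = rng.standard_normal((q, q)) * 0.05
Wk = rng.standard_normal((q + 4, q)) * 0.05
Wv = rng.standard_normal((q + 4, q)) * 0.05
Wo = rng.standard_normal((q, q)) * 0.05
out = smca(F1, emb, Wq, Wk, Wv, Wo, h)
print(out.shape)
```

With q = 187 and h = 11, each head has dimension 17, so the head split is exact.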
Next, the TAFF module adjusts the temporal characteristics through the down mapping (DM) and up mapping (UM) layers and reduces the potential effect of noise. The DM layer reduces the dimension of SMCA from q to q/2, and the UM layer recovers the feature's dimension. Finally, the output of the TAFF module can be defined as:
$$F_g = \mathrm{UM}(\mathrm{DM}(\mathrm{SMCA})) + \mathrm{SMCA} \qquad (5)$$
where $F_g$ is the generated ROI time series with the size $N \times q$.
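Equation (5) is a bottleneck feed-forward block with a residual connection. A minimal NumPy sketch is given below. The ReLU between the two mappings and the integer bottleneck width q // 2 are assumptions, since the text specifies neither an activation nor how the odd q = 187 is halved; the weights are random stand-ins.

```python
import numpy as np


def taff(smca_out, W_dm, W_um):
    """Down-map to the bottleneck, up-map back, then add the residual."""
    hidden = np.maximum(smca_out @ W_dm, 0.0)   # DM layer (ReLU assumed)
    return hidden @ W_um + smca_out             # UM layer + residual, Eq. (5)


rng = np.random.default_rng(0)
N, q = 90, 187
smca_out = rng.standard_normal((N, q))
W_dm = rng.standard_normal((q, q // 2)) * 0.05  # q -> q/2 (rounded down here)
W_um = rng.standard_normal((q // 2, q)) * 0.05  # q/2 -> q
Fg = taff(smca_out, W_dm, W_um)
print(Fg.shape)
```

Because of the residual term, the block reduces to the identity when the mapping weights are zero, so it can only adjust the SMCA features rather than replace them.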
To learn an effective brain functional network Ag, we first compute the Euclidean distance between any pair of ROI features and then apply a mapping matrix to it for similarity adjustment. Finally, a Gaussian kernel is introduced to learn non-linear projection for precise connectivity estimation. The formula can be defined as:
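$$A_g(i,j) = \exp\left(-\frac{(F_g^{i} - F_g^{j})^{2}\, W}{2\sigma^{2}}\right) \qquad (6)$$
Here, $A_g(i, j)$ represents the functional connectivity between the pair of ROIs $i$ and $j$. $F_g^{i} \in \mathbb{R}^{1 \times q}$ is the $i$th ROI time series. $W \in \mathbb{R}^{q \times q}$ is a transformation matrix applied to the time series. $\sigma$ is the bandwidth of the Gaussian kernel, which controls the sparsity and has the default value 2.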
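One way to realize the Gaussian-kernel connectivity is sketched below. It adopts a specific reading of the formula, namely the squared Euclidean norm of the W-projected difference between two ROI time series, which is an assumption made so that each entry is a scalar. W is a fixed stand-in here, whereas in the model it is learned.

```python
import numpy as np


def gaussian_connectivity(Fg, W, sigma=2.0):
    """A[i, j] = exp(-||(Fg[i] - Fg[j]) @ W||^2 / (2 * sigma^2))."""
    P = Fg @ W                                    # project every ROI time series
    sq = np.sum(P ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (P @ P.T)
    d2 = np.maximum(d2, 0.0)                      # clamp tiny negative round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))


rng = np.random.default_rng(0)
Fg = rng.standard_normal((90, 187))
W = 0.1 * np.eye(187)                             # stand-in for the learned matrix
Ag = gaussian_connectivity(Fg, W, sigma=2.0)
print(Ag.shape, float(Ag.min()), float(Ag.max()))
```

Under this reading every entry lies in (0, 1], the diagonal is 1, and a smaller σ pushes the off-diagonal entries toward 0, which matches the remark that σ controls sparsity.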
2.2.2. Multi-channel temporal discriminator
As shown in Figure 3, the multi-channel temporal discriminator (MTD) is used to constrain the distribution of the generated functional time series (Fg) to be consistent with that of the empirical functional time series (Fe). Fe is computed with the software toolbox and is treated as the true sample. The MTD consists of N parallel networks, each containing three linear projections with q/2, q, and q/2 neurons. The i-th network accepts the i-th ROI time series and outputs one discriminant value. The final discriminant result is the average of all the discriminant values.
Figure 3. The structure of the multi-channel temporal discriminator. It accepts the empirical time series or the generated time series, and outputs the average discriminant result for distribution constraints of all ROI time series.
2.2.3. Global-local connectivity classifier
The structure of the global-local connectivity classifier (GCC) is illustrated in Figure 4. It accepts both the functional time series (i.e., Fe or Fg) and the functional network (i.e., Ae or Ag), and outputs the disease label. The GCC is based on the graph convolutional network and has five layers in total: three graph convolutional layers, one graph pooling layer, and one three-layer perceptron (MLP). The three graph convolutional layers (i.e., Gconv1, Gconv2, Gconv3) are utilized to diffuse global features and reduce the ROI feature dimension. The graph pooling layer (Gpool) averages features along the ROI feature dimension to obtain one value for each ROI. The MLP layer then learns a mapping from these values to the disease label.
Figure 4. The structure of the global-local connectivity classifier.
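The forward pass of such a classifier can be sketched with NumPy. This is a hedged illustration: the symmetric normalization follows the common Kipf and Welling formulation, and the hidden sizes (64, 16, 4 for the graph convolutions; 32, 16, 2 for the perceptron), the ReLU activations, and the random weights are all hypothetical choices that the text does not specify.

```python
import numpy as np


def normalize_adj(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2; assumes non-negative A."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()


def gcc_forward(F, A, gconv_weights, mlp_weights):
    A_norm = normalize_adj(A)
    H = F
    for W in gconv_weights:                 # Gconv1..3: diffuse and reduce features
        H = np.maximum(A_norm @ H @ W, 0.0)
    z = H.mean(axis=1)                      # Gpool: one value per ROI, shape (N,)
    for W in mlp_weights[:-1]:              # hidden perceptron layers
        z = np.maximum(z @ W, 0.0)
    return softmax(z @ mlp_weights[-1])     # class probabilities


rng = np.random.default_rng(0)
N, q = 90, 187
F = rng.standard_normal((N, q))
A = np.abs(rng.standard_normal((N, N)))
A = (A + A.T) / 2.0                         # symmetric, non-negative stand-in network
gconv_weights = [rng.standard_normal(s) * 0.1 for s in [(q, 64), (64, 16), (16, 4)]]
mlp_weights = [rng.standard_normal(s) * 0.1 for s in [(N, 32), (32, 16), (16, 2)]]
probs = gcc_forward(F, A, gconv_weights, mlp_weights)
print(probs)
```

The pooling step is what ties the perceptron input size to the number of ROIs: after averaging over the feature dimension, each subject is represented by one value per ROI.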
2.3. Objective functions
In this section, the five loss functions defined below are utilized to optimize the model for disease prediction and analysis. The reconstruction loss Lrec constrains the generator to retain the empirical features Fe; the generator loss Lg and the discriminator loss Ld are combined to optimize the generator and the discriminator; and the classification loss Lcls and the regularization loss Lreg are utilized to update the parameters of the CL network and the GCC network. For convenience of explanation, we use the following notation: G denotes all the operations in the region-sequence aligned generator, D denotes the multi-channel temporal discriminator, and C denotes the global-local connectivity classifier. The raw fMRI data X follows the distribution PfMRI; PFe and PAe represent the distributions of the empirical functional time series Fe and the empirical BFN Ae, respectively. Y is the truth label. These loss functions are defined as follows:
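$$L_{rec} = \mathbb{E}_{X \sim P_{fMRI},\, F_e \sim P_{F_e}}\left[\, \| G(X) - F_e \| \,\right] \qquad (7)$$
$$L_{g} = \mathbb{E}_{X \sim P_{fMRI}}\left[ (1 - D(G(X)))^{2} \right] \qquad (8)$$
$$L_{d} = \mathbb{E}_{X \sim P_{fMRI}}\left[ (D(G(X)))^{2} \right] + \mathbb{E}_{F_e \sim P_{F_e}}\left[ (1 - D(F_e))^{2} \right] \qquad (9)$$
$$L_{cls} = \mathbb{E}_{X \sim P_{fMRI}}\left[ -Y \cdot \log(C(G(X))) \right] + \mathbb{E}_{A_e \sim P_{A_e},\, F_e \sim P_{F_e}}\left[ -Y \cdot \log(C(A_e, F_e)) \right] \qquad (10)$$
$$L_{reg} = \mathbb{E}(\| W \|) \qquad (11)$$
The hybrid cost of the proposed model is: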
$$L_{all} = L_{rec} + L_{g} + L_{d} + L_{cls} + \lambda L_{reg} \qquad (12)$$
3. Experiments and results
3.1. Experimental setup
Six binary classification tasks are used to evaluate the proposed model: NC vs. SMC, NC vs. EMCI, NC vs. LMCI, SMC vs. EMCI, SMC vs. LMCI, and EMCI vs. LMCI. The evaluation metrics are Accuracy (ACC), Sensitivity (SEN), Specificity (SPE), and F1-score. For each binary classification task, we repeated five-fold cross-validation 10 times and report the mean value of each metric as the final result. To demonstrate our model's ability in BFN construction, we introduce two classifiers [i.e., SVM (Suthaharan, 2016) and GCN (Kipf and Welling, 2016)] to compare the BFNs constructed by ATAT with the empirical BFNs.
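The four evaluation metrics named above follow from the binary confusion matrix. The sketch below is an illustrative helper (the name `binary_metrics` and the toy labels are assumptions, with 1 denoting the positive class) and is not tied to the paper's evaluation code.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """ACC, SEN, SPE, and F1 from binary labels (1 = positive, 0 = negative)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    def ratio(num, den):
        return num / den if den else 0.0       # avoid division by zero

    sen = ratio(tp, tp + fn)                   # sensitivity (recall)
    spe = ratio(tn, tn + fp)                   # specificity
    prec = ratio(tp, tp + fp)                  # precision
    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "SEN": sen,
        "SPE": spe,
        "F1": ratio(2 * prec * sen, prec + sen),
    }


# Toy example: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
print(metrics)
```

In a repeated cross-validation setting, these values would be computed per fold and then averaged.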
Our proposed model was implemented with the TensorFlow framework on Ubuntu 18.04 with an NVIDIA GeForce RTX 3080 Ti GPU. The parameters in the experiments are set as follows: N = 90, q = 187, h = 11, m = 3, λ = 10^{-5}. During training, we first update the weights of the generator and the discriminator, then fix part of the generator and optimize the CL and GCC networks. The learning rate of the generator and the classifier was set to 0.0001, and the learning rate of the discriminator was set to 0.0004. The Adam optimizer was adopted for training the proposed model with a batch size of 2.
3.2. Prediction results
This section demonstrates the performance of the BFNs constructed by the proposed model. In Figure 5, the upper row shows the empirical BFNs of the four stages derived from GRETNA, while the lower row displays the corresponding BFNs generated by the proposed model. Compared with the empirical BFNs, ours preserve the main connectivity patterns while the dense connections become sparse. Figure 6 compares the classification results for three classification scenarios. With the GCN classifier, the BFNs constructed by our model achieve the best prediction results, with a mean ACC of 87.50%, a mean SEN of 84.26%, a mean SPE of 90.58%, and a mean F1 of 86.81% in the NC vs. SMC task; the corresponding mean values for SMC vs. EMCI are 90.47, 91.86, 89.02, and 90.80%; and in the EMCI vs. LMCI task, they are 85.61, 84.86, 86.27, and 84.70%. The standard error also shows the superior stability of the proposed model.
Figure 5. Display of brain functional network examples at different disease stages. The BFNs in the upper row are generated by the GRETNA toolbox, and the BFNs in the lower row are generated by the proposed model.
Figure 6. Prediction results of three classification scenarios using (A) the SVM classifier and (B) the GCN classifier.
To investigate the potential AD-related ROIs, we shield one brain region at a time and take the resulting classification ACC as the effect of this ROI on AD progression. After sorting the ACCs in ascending order, the ROIs corresponding to the first 10 values (i.e., the lowest ACCs) are regarded as the most important ROIs in the classification evaluation. As shown in Figure 7, the spatial distribution of the 10 important AD-related ROIs is displayed in lateral, medial, and dorsal views using the BrainNet Viewer (Xia et al., 2013). Specifically, the top 10 related ROIs are IFGoperc.L, MTG.R, PCL.L, PUT.R, CUN.L, SMA.R, LING.L, DCG.R, PCUN.R, DCG.L in the NC vs. SMC classification scenario; the ten ROIs PCL.R, CAL.L, CUN.R, HIP.R, CAL.R, TPOsup.L, SFGdor.L, ACG.R, CAU.R, PCL.L are important for NC vs. EMCI; the top 10 ROIs of NC vs. LMCI are SOG.L, ORBsup.L, REC.L, PUT.L, PCG.L, ITG.L, PCUN.R, MTG.R, PUT.L, ORBsupmed.L; for SMC vs. EMCI and SMC vs. LMCI classification, the important ROIs are OLF.L, CUN.R, PCUN.L, CAL.R, CAU.R, LING.L, ACG.R, CAL.L, PCL.L, DCG.R, and PCUN.R, PUT.L, PUT.R, PCL.L, SMA.R, ORBsup.L, LING.L, ANG.L, HIP.R, ACG.R, respectively; for EMCI vs. LMCI, the important ROIs are PCUN.L, ORBsupmed.R, THA.L, ORBsupmed.L, ORBsup.R, CAU.R, CAL.L, PUT.L, REC.L, ORBsup.L.
Figure 7. Spatial visualization of top 10 brain regions in the six classification scenarios. (A) NC vs. SMC. (B) NC vs. EMCI. (C) NC vs. LMCI. (D) SMC vs. EMCI. (E) SMC vs. LMCI. (F) EMCI vs. LMCI.
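The ROI-shielding procedure can be sketched as an occlusion loop. In this illustration, shielding is implemented by zeroing the ROI's time series and its row and column in the network, which is an assumption about what "shield" means in practice. The `predict` callable is a hypothetical stand-in for a trained classifier, and the data are synthetic.

```python
import numpy as np


def roi_importance(F_all, A_all, labels, predict, top_k=10):
    """Rank ROIs by the accuracy obtained when each one is shielded.

    F_all: (n_subjects, n_rois, n_timepoints); A_all: (n_subjects, n_rois, n_rois).
    Returns the top_k ROI indices with the lowest accuracy, and all accuracies.
    """
    n_rois = F_all.shape[1]
    acc = np.empty(n_rois)
    for r in range(n_rois):
        F, A = F_all.copy(), A_all.copy()
        F[:, r, :] = 0.0            # shield the ROI's time series
        A[:, r, :] = 0.0            # and its connections (row and column)
        A[:, :, r] = 0.0
        acc[r] = np.mean(predict(F, A) == labels)
    order = np.argsort(acc, kind="stable")      # ascending: largest drop first
    return order[:top_k], acc


# Synthetic example: only ROI 3 carries the class signal.
rng = np.random.default_rng(0)
F_all = 0.1 * rng.standard_normal((20, 6, 15))
F_all[:10, 3, :] += 1.0
F_all[10:, 3, :] -= 1.0
A_all = np.ones((20, 6, 6))
labels = np.array([True] * 10 + [False] * 10)


def toy_predict(F, A):
    return F[:, 3, :].mean(axis=1) > 0.0        # hypothetical classifier


top, acc = roi_importance(F_all, A_all, labels, toy_predict, top_k=3)
print(top, acc)
```

In the toy example, shielding ROI 3 removes the only informative signal, so it is ranked first; the other ROIs leave the accuracy unchanged.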
3.3. Brain network analysis
Besides predicting the different early AD stages, the other major purpose is to analyze the learned BFNs. After applying the ATAT model to each subject, we obtain the mean BFN for each group (i.e., NC, SMC, EMCI, and LMCI). To investigate the altered connectivity of the BFNs between different groups, we compute the difference for the six paired scenarios, as shown in Figure 8. In each subplot, both reduced and increased connectivity can be observed between the paired groups. To analyze the significant connections, we set the 90% quantile of the altered connectivity strength as the threshold. The pictures in the lower row of each subplot are the corresponding connectivity matrices after thresholding. Figure 9 shows these significant connections in a circular graph. The numbers of reduced connections are 219, 263, 235, 251, 222, and 163 for NC vs. SMC, NC vs. EMCI, NC vs. LMCI, SMC vs. EMCI, SMC vs. LMCI, and EMCI vs. LMCI, respectively; the corresponding numbers of increased connections are 183, 139, 166, 150, 179, and 239. To show the main connectivity patterns in the different classification scenarios, we select the top 2% largest altered connections (both reduced and increased). As shown in Figure 10, different connectivity patterns can be seen in different classification scenarios. Figure 11 depicts the top 5 reduced and top 5 increased connections in the axial and coronal views. The connectivity-related ROIs are listed in Tables 1, 2.
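The group comparison and quantile thresholding can be sketched as follows. The sketch assumes that the difference is taken as the later stage minus the earlier stage (so negative values are reduced connections), that the 90% quantile is computed on the absolute difference, and that only the unique edges above the diagonal are counted. The group-mean matrices here are synthetic stand-ins.

```python
import numpy as np


def significant_changes(mean_a, mean_b, quantile=0.9):
    """Count reduced and increased edges whose |change| exceeds the quantile."""
    diff = mean_b - mean_a                       # change from group a to group b
    iu = np.triu_indices_from(diff, k=1)         # unique edges of a symmetric matrix
    vals = diff[iu]
    thr = np.quantile(np.abs(vals), quantile)
    keep = np.abs(vals) > thr
    reduced = int(np.sum(keep & (vals < 0)))
    increased = int(np.sum(keep & (vals > 0)))
    return thr, reduced, increased


rng = np.random.default_rng(0)
a = rng.random((90, 90))
b = rng.random((90, 90))
mean_nc = (a + a.T) / 2.0                        # synthetic group-mean BFNs
mean_smc = (b + b.T) / 2.0
thr, reduced, increased = significant_changes(mean_nc, mean_smc)
print(thr, reduced, increased)
```

With 90 ROIs there are 4005 unique edges, so a 90% quantile threshold keeps roughly 400 of them, which is the same order as the connection counts reported above.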
Figure 8. (A–F) The results of the altered functional connectivity estimated from the averaged BFNs between different groups (i.e., NC vs. SMC, NC vs. EMCI, NC vs. LMCI, SMC vs. EMCI, SMC vs. LMCI, EMCI vs. LMCI). In each subfigure, the upper row means the reduced and increased connections, the lower row shows the altered connections selected from the upper row with a threshold of 90% quantile value.
Figure 9. Circular graph of altered functional connectivity in MCI patients among the 90 Automated Anatomical Labeling (AAL) atlas regions. (A) From NC to SMC. (B) From NC to EMCI. (C) From NC to LMCI. (D) From SMC to EMCI. (E) From SMC to LMCI. (F) From EMCI to LMCI.
Figure 10. Top 2% altered functional connections in strength evaluation in the six classification scenarios. Each subfigure shares the same color bar, which means the absolute connection strength. (A) NC vs. SMC. (B) NC vs. EMCI. (C) NC vs. LMCI. (D) SMC vs. EMCI. (E) SMC vs. LMCI. (F) EMCI vs. LMCI.
Figure 11. The most significant 5 reduced connections and 5 increased connections mapped on the AAL 90 template using the BrainNet Viewer software package. Blue color means the ROIs, red color means reduced connections, and green color means increased connections. (A) From NC to SMC. (B) From NC to EMCI. (C) From NC to LMCI. (D) From SMC to EMCI. (E) From SMC to LMCI. (F) From EMCI to LMCI.
Table 1. The top 10 significantly altered connections estimated from the generated BFNs in NC vs. SMC, NC vs. EMCI, NC vs. LMCI using the AAL90 template (− means reduced connections, + means increased connections).
Table 2. The top 10 significantly altered connections estimated from the generated BFNs in SMC vs. EMCI, SMC vs. LMCI, EMCI vs. LMCI using the AAL90 template (− means reduced connections, + means increased connections).
4. Discussion
4.1. Effect of the generator
The main goal of the proposed model is to generate BFNs from 4D fMRI data. The modules in the generator play an important role in disease prediction and brain network analysis. To investigate the influence of the generator structure on the classification performance (i.e., NC vs. LMCI), we replace the RFLNet and the SAT modules with the traditional C3D (Hong et al., 2020) and transformer (Jiang et al., 2021), respectively. In both cases, the anatomical ROI information is not included in the module. Figure 12 shows that using either the C3D or the traditional transformer degrades the prediction performance, and that the traditional transformer hurts classification more than the C3D network does. This may indicate that the proposed RFLNet learns rough ROI-based features that have little effect on the results, whereas the SAT network finely adjusts the adjacent ROI-based temporal features, which may greatly influence the classification performance. Furthermore, the reconstruction error of the ROI time series is measured by the mean absolute error (MAE) metric. As shown in Figure 13, the divergence of the MAE for each group (i.e., NC, SMC, EMCI, and LMCI) demonstrates the reliable results of the designed generator.
Figure 12. Influence of different generator structures on the classification performance.