Artificial Intelligence Approaches for Early Detection and Diagnosis of Alzheimer's Disease: A Review

Abstract: Alzheimer's Disease (AD) is an irreversible neurodegenerative disease that is common in the elderly. Applying artificial intelligence technology to the early diagnosis of AD can not only improve prediction accuracy compared with traditional methods, but also avoid the complicated manual feature extraction those methods require and speed up diagnosis. This paper reviews applications of artificial intelligence algorithms in AD diagnosis, including machine learning, convolutional neural networks, graph convolutional neural networks, recurrent neural networks and other mainstream deep learning technologies. The advantages and disadvantages of each approach are discussed, and finally we discuss limitations and future prospects.


Introduction
Alzheimer's disease (AD) is an irreversible degenerative disease of the nervous system. It is also known as "senile dementia" because it occurs most often in people over 60 years of age; its typical clinical manifestations are cognitive decline, memory loss and degradation of language function [1]. As populations age, the incidence of AD keeps rising. According to statistics, a new AD case arises somewhere in the world every 3 seconds, and dementia-related costs continue to increase [2]. It is estimated that by 2050, one out of every 85 people in the world will be affected by the disease, and the number of AD patients will exceed 114 million [3].
Mild Cognitive Impairment (MCI) is a transitional state between cognitively normal (CN) and AD. In this stage, patients' memory and thinking ability are slightly impaired, but daily activities are not affected [4]. MCI patients can be divided into subtypes according to different criteria. Based on disease progression, MCI is divided into Early MCI (EMCI) and Late MCI (LMCI); based on the mechanism of conversion, it is divided into Stable MCI (sMCI) and Progressive MCI (pMCI). Studies have shown that 32% of MCI patients develop AD within five years, while the annual conversion rate to AD among cognitively normal elderly is only about 1% [5]. However, if MCI patients are identified in time and treated effectively, they may never progress to AD. Therefore, early detection and treatment of MCI can effectively prevent the onset of AD, which has important clinical and social significance.
With the rapid development of neuroimaging technology, brain imaging modalities such as Structural Magnetic Resonance Imaging (sMRI), Diffusion Tensor Imaging (DTI), Functional Magnetic Resonance Imaging (fMRI) and Positron Emission Tomography (PET) have been widely used in the clinical diagnosis of AD. This wealth of neuroimaging techniques brings convenience to clinical diagnosis but also raises new problems. First, as neuroimaging technology is continuously updated and improved, the number of brain image types acquired per patient has increased significantly, which aids diagnosis but also increases the burden on doctors. Second, doctors locate diseased areas mainly by visually inspecting neuroimages on the basis of clinical experience, so diagnostic results depend largely on that experience, and much of the disease-related information contained in neuroimages is difficult to detect with the naked eye alone.
In recent years, deep learning, an important branch of Artificial Intelligence (AI), has achieved great success in many fields, and more and more deep-learning-based diagnosis algorithms for Alzheimer's disease have been proposed.
The rest of this paper is organized as follows. Section 2 reviews artificial intelligence technologies for AD diagnosis and discusses their advantages and disadvantages. Section 3 provides an in-depth discussion of existing technologies and prospects for future work.

Literature Reviews
With the improvement of computer hardware, intelligent recognition algorithms based on machine learning have been increasingly applied in Alzheimer's disease research. At present, there are three mainstream research directions: (1) classifying the disease course (CN, MCI or AD) from MRI images; (2) preprocessing and segmenting brain MRI images; and (3) studying the morphological mechanism of the brain's transformation from MCI to AD and predicting patients' conversion from MRI images. Among these, disease-course classification based on MRI images is the direction with the most studies and the greatest application significance. Since it is difficult for doctors to make a clinical diagnosis directly from a patient's brain MRI, machine learning and data science techniques are expected to assist doctors in identifying Alzheimer's disease from MRI images and to improve diagnostic accuracy.

Recognition of Alzheimer's disease based on machine learning methods
The recognition of Alzheimer's disease based on machine learning generally follows these steps: first, the acquired neuroimaging data of the subjects are preprocessed (image registration, brain tissue segmentation, image denoising, etc.); then the feature dimension is reduced by a dimensionality-reduction method; finally, a classifier produces the classification result, as shown in Figure 1.
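These three steps can be sketched with scikit-learn; the data below are random stand-ins for preprocessed neuroimaging features, and the component counts and kernel are illustrative assumptions, not values from any cited study.

```python
# Sketch of the classic pipeline: preprocessing -> dimensionality reduction -> classification.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))          # 200 subjects, 500 voxel-level features
y = rng.integers(0, 2, size=200)         # 0 = CN, 1 = AD (random toy labels)

pipe = Pipeline([
    ("scale", StandardScaler()),         # intensity normalization
    ("reduce", PCA(n_components=20)),    # dimensionality reduction
    ("clf", SVC(kernel="linear")),       # classifier
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pipe.fit(X_tr, y_tr)
acc = pipe.score(X_te, y_te)
print(f"accuracy: {acc:.2f}")
```

With random labels the accuracy hovers near chance; the point is the structure of the pipeline, not the score.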

Feature Extraction
The process of preprocessing the original image data to eliminate the interference of redundant factors and highlight key information is called image feature extraction. Commonly used feature extraction methods include: (1) Voxel-Based Morphometry (VBM) [6]; (2) Region-of-Interest (ROI) methods; (3) patch-based methods [7]; (4) slice-based methods; and (5) global-image-based methods.
VBM methods are often used in MRI image analysis. The principle is to compare changes in voxel intensity across three-dimensional images, which reflect morphological changes in the corresponding brain tissue. Zhang et al. [8] combined VBM-extracted features with a Support Vector Machine (SVM) classifier for the clinical diagnosis of AD. Because it uses voxel intensity directly as the classification feature, VBM is the simplest and most intuitive approach, but its drawbacks are that it ignores regional information and yields very high-dimensional feature vectors. Good et al. [9] proposed an improved VBM method that avoids interactions between brain tissues during spatial normalization.
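The core of the VBM idea, using raw voxel intensities directly as features, can be sketched in a few lines (toy array shapes, purely illustrative):

```python
# Each subject's 3D volume is flattened so that every voxel intensity
# becomes one classification feature -- hence the high dimensionality.
import numpy as np

def vbm_features(volumes):
    """volumes: (n_subjects, x, y, z) -> (n_subjects, n_voxels)."""
    n = volumes.shape[0]
    return volumes.reshape(n, -1)

vols = np.zeros((3, 4, 4, 4))   # 3 toy subjects, 4x4x4 voxels each
feats = vbm_features(vols)
print(feats.shape)              # (3, 64)
```

Even at this toy scale the feature count grows as the cube of the image side, which is why dimensionality reduction follows in practice.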
Studies have found that certain brain structures (such as the hippocampus, temporal lobe and amygdala) show a degree of atrophy in patients with Alzheimer's disease compared with normal controls. ROI-based methods classify by extracting image features from specific brain regions, which eliminates interference from irrelevant regions and improves classification. Ortiz [10] used voxel preselection to select 98 ROIs from MRI and PET data and designed a deep model for each ROI. Wee et al. [11] built a brain network from correlations between the average cortical thickness of ROIs and used a two-sample t-test to select connections with significant between-group differences, improving AD classification accuracy. Cui et al. [12] proposed a hippocampal analysis method for AD diagnosis that combines a 3D densely connected convolutional network with traditional handcrafted features, merging multi-level, multi-type features to improve disease classification accuracy. Li et al. [13] combined a convolutional neural network (CNN) and a recurrent neural network (RNN), cascading the RNN onto the density and shape features of the bilateral hippocampus extracted by the CNN, so as to learn high-level relevant features for AD classification.
ROI methods extract rather coarse features and may miss subtle brain tissue changes. Liu et al. [14] therefore proposed a feature extraction method based on structural patches. Patch-based methods capture disease-related patterns by extracting features from small three-dimensional image blocks, which both registers slight tissue changes and reduces feature dimensionality, helping to avoid overfitting. Liu [15] proposed a hierarchical ensemble classification method: different low-level classifiers are first constructed from the imaging of local brain patches and the spatial correlation features between patches, and then the outputs of the low-level classifiers and the voxel statistics of each local patch are integrated into one feature vector for a high-level classifier. Li et al. [16] evenly divided MRI images into local regions of the same size, extracted several 3D blocks from each region, and then used the K-means clustering algorithm to divide the blocks in each region into different groups for final classification. Liu [17] integrated image patches and subjects' prior information (age, gender, education level) into the learning model and proposed an AD classification-and-regression framework based on deep multi-task, multi-channel learning. Ahmed et al. [18] proposed an AD diagnosis algorithm based on an ensemble of patch classifiers: bilateral hippocampal regions are first extracted as patches, separate patch-classifier models are built for the left, right and bilateral hippocampi, and weighted majority voting completes the diagnosis.
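A minimal sketch of the patch-extraction-plus-clustering idea (loosely in the spirit of [16]; the block size and cluster count here are assumptions for illustration, not the paper's settings):

```python
# Tile a 3D volume into small non-overlapping blocks, then group the
# blocks with K-means, one coarse step of a patch-based pipeline.
import numpy as np
from sklearn.cluster import KMeans

def extract_blocks(volume, size=8):
    """Split a 3D volume into size^3 blocks; one flattened row per block."""
    x, y, z = (d // size for d in volume.shape)
    blocks = []
    for i in range(x):
        for j in range(y):
            for k in range(z):
                b = volume[i*size:(i+1)*size,
                           j*size:(j+1)*size,
                           k*size:(k+1)*size]
                blocks.append(b.ravel())
    return np.array(blocks)

rng = np.random.default_rng(0)
vol = rng.normal(size=(32, 32, 32))
blocks = extract_blocks(vol)                     # 64 blocks of 512 voxels
groups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(blocks)
print(blocks.shape, groups.shape)
```

In a real pipeline each group of patches would then feed its own classifier before the results are fused.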
Slice-based methods extract two-dimensional slices from one plane of the MRI image as input, reducing the number of model parameters. Islam et al. [19], guided by neuroimaging experts, selected the 50 largest intracranial slices from each subject's MRI and classified them using a CNN. Luo et al. [20] extracted seven groups of slices (five per group) from the axial plane of MRI images, and each group was classified by its own classifier. Wu et al. [21] combined three axial-plane slices into one RGB color image, obtaining 16 RGB images from each MRI scan. Gao et al. [22] selected the 50 largest sagittal slices from MRI images and used a CNN to extract longitudinal features for classification. Jian et al. [23] computed the image entropy of axial slices from their histograms, selected the 32 most informative slices, and classified them with a CNN.
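The histogram-entropy slice selection described for [23] might be sketched as follows (the bin count and slice count are illustrative assumptions):

```python
# Score each axial slice by its histogram entropy and keep the most
# informative slices as CNN input.
import numpy as np

def slice_entropy(sl, bins=32):
    hist, _ = np.histogram(sl, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                     # drop empty bins before the log
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
volume = rng.normal(size=(64, 128, 128))          # axial slices on axis 0
entropies = np.array([slice_entropy(s) for s in volume])
top = np.argsort(entropies)[-32:]                 # 32 most informative slices
print(top.shape)                                  # (32,)
```

Higher-entropy slices have more varied intensity distributions and thus, heuristically, more information for the classifier.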
Global-image-based methods use the whole MRI volume as model input, fully exploiting the spatial information of the image without manual feature extraction. Korolev et al. [24], Backstrom et al. [25] and Basaia et al. [26] fed preprocessed whole-brain MRI images into 3D convolutional neural networks for AD classification. Wang et al. [27] proposed a probability-based fusion method that combines three-dimensional densely connected networks with different structures to diagnose AD.

Feature selection and dimension reduction
Data dimensionality reduction maps data points from a higher-dimensional space into a lower-dimensional one. The aim is to remove redundant information from the original data while retaining as much useful information as possible, thereby improving recognition accuracy. Mainstream dimensionality-reduction methods include Principal Component Analysis (PCA) [28], Partial Least Squares (PLS), Singular Value Decomposition (SVD), Low Variance Filter (LVF), etc. PCA is the most commonly used; in essence it applies an orthogonal transformation to the existing coordinate axes according to certain rules. Khedher et al. [29] performed feature extraction and dimensionality reduction with PCA and PLS, and completed the classification of Alzheimer's patients versus normal subjects.
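A minimal PCA example showing the mapping from a higher- to a lower-dimensional space (toy random features; the component count is an arbitrary choice):

```python
# Project 100-dimensional features onto the first 10 principal components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))           # 50 subjects, 100 features each
pca = PCA(n_components=10)
X_low = pca.fit_transform(X)             # reduced representation
print(X_low.shape)                       # (50, 10)
print(pca.explained_variance_ratio_.sum())
```

The explained-variance ratio reports how much of the original variance the retained components preserve, the practical criterion for choosing the target dimension.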

Classification
Common machine learning classifiers include Support Vector Machine (SVM), K-Nearest Neighbors (K-NN), Logistic Regression Classification (LRC), Random Forest (RF) and so on.
Mesrob et al. [30] proposed an SVM-based method to classify AD patients and normal subjects. The subjects were divided into two experimental groups, from which features of ROI brain regions (hippocampus, amygdala, etc.) and brain tissues (gray matter, white matter, etc.) were extracted respectively; comparing the two groups' classification results identified the brain regions most correlated with AD. Dimitriadis et al. [31] proposed an RF-based classification method for Alzheimer's disease: features such as cortical thickness, cortical surface area, cortical curvature and hippocampal volume were extracted, and the results of the individual classifiers were weighted and fused into the final decision using a proximity strategy, giving the multi-class model a recognition accuracy of 76%. Telagarapu et al. [32] analyzed T1-weighted MRI images of AD patients using hippocampal texture features and a K-NN classifier; with the gray-level co-occurrence matrix (GLCM) method, they achieved a classification accuracy of 74.73%.
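For orientation, the classifiers named above can be compared side by side on synthetic data with scikit-learn; the accuracies obtained here say nothing about real neuroimaging performance:

```python
# Cross-validated comparison of SVM, K-NN and Random Forest on toy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

results = {}
for name, clf in [("SVM", SVC()),
                  ("K-NN", KNeighborsClassifier()),
                  ("RF", RandomForestClassifier(random_state=0))]:
    results[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {results[name]:.2f}")
```

In the AD literature the choice among these classifiers usually matters less than the quality of the extracted features.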
To sum up, machine learning research in Alzheimer's disease diagnosis has produced fruitful results, but also has clear shortcomings.
1. Voxel-based methods obtain all three-dimensional information in a single scan, but treat all brain regions uniformly and are not adapted to specific anatomical structures. In addition, because each voxel is handled independently, they ignore local information and often suffer from high feature dimensionality and heavy computation.
2. Slice-based methods use two-dimensional slices rather than the whole three-dimensional image as input, which greatly reduces the number of parameters and simplifies the network. However, slicing requires prior knowledge, and because slices are taken from a plane in only one direction, the spatial information of the image cannot be fully utilized.
3. Patch-based methods compensate for the three-dimensional information lost in slice-based methods, but most use ensembles of classifiers in which feature extraction and classification for each patch are independent. Tissue changes across the brain are correlated, and patch-based approaches do not integrate the features of the individual patches well.
4. ROI-based methods reduce dimensionality by selecting specific brain areas for feature extraction. However, an abnormal region may occupy only a small fraction of a predefined ROI, or may extend into other brain regions, resulting in a loss of discriminative information. ROI-based methods also require expert prior knowledge to delineate the ROIs.
5. Global-image-based methods take the whole three-dimensional image as input and can use all of its information, but for samples with small lesions they carry a large amount of redundant information. In addition, like patch-based methods, they have high feature dimensionality and a heavy computational load.

Recognition of Alzheimer's disease based on deep learning methods
Deep learning has developed rapidly in recent years. Thanks to improvements in computer hardware and the growing amount of training data, the recognition performance of deep models now exceeds that of traditional machine learning algorithms.

AD classification based on CNN
A convolutional neural network is a feedforward neural network that extracts image features with convolutional kernels. It consists of an input layer, convolutional layers, pooling layers, fully connected layers and an output layer; the basic structure is shown in Figure 2. Wang et al. [33] used 2D MRI slices as training samples to build an 8-layer CNN model for AD prediction. Because medical image samples are few, training from randomly initialized parameters generally works poorly; Liu et al. [34] therefore first pre-trained a two-dimensional convolutional neural network (2D-CNN) on the large-scale ImageNet data set and then trained the pre-trained model on MRI images. Sarraf et al. [35] applied the LeNet-5 architecture to classify fMRI data from AD and healthy control subjects, reaching a classification accuracy of 96.86%. Dai et al. [36] improved the existing LeNet-5 model, designed a 10-layer CNN, and trained and tested it on MRI, PET and multi-modal fusion images respectively; combining the network output with clinical MMSE scores via a Bayesian method yielded an average accuracy of 88.244%. Shakarami et al. [37] added a fully connected layer to the end of AlexNet to shorten the feature vectors and replaced the original classification output layer with a support vector machine (SVM) classifier, achieving an average classification accuracy of 96.39%. Kazemi et al. [38] adopted the AlexNet model with stochastic-gradient-descent weight updates and were the first to use deep learning to classify the different stages of AD, namely normal healthy control (NC), significant memory concern (SMC), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI) and AD, with an average accuracy of 97.63%. Hon et al. [39] selected information-rich MRI slices by image entropy, adopted a VGG network pre-trained on a large-scale data set as the classification model, and realized binary AD-versus-NC classification by modifying the number of output nodes of VGG's fully connected layer. Ding et al. [40] selected 16 PET slices at equal intervals to form a 4×4 grid image, trained the Inception V3 model on these grid images using the open ADNI data set, and tested on an independent test set of 40 subjects; the results showed that AD was predicted an average of 75.8 months earlier than the final clinical diagnosis. Yee et al. [41] used an algorithm built on the residual structure of ResNet to classify NC and AD with an accuracy of 93.5%; the accuracy of predicting conversion of sMCI to AD within 3 years was 74.0%, the drop being mainly due to misclassification between NC and sMCI. Fulton et al. [42] classified AD and MCI with an improved 50-layer residual network, ResNet50, achieving a classification accuracy of 98.99%. Wang et al. [43] applied the DenseNet model to 3D MRI, ran a series of hyperparameter-optimization experiments to select several optimized 3D-DenseNet classifiers, and then ensembled the classifiers' results, demonstrating the superiority of the ensemble approach. In [44], a convolutional-recurrent hybrid network combining 3D DenseNets with a bidirectional gated recurrent unit (BGRU) was proposed; hippocampal features were extracted from sMRI images to classify the course of AD, and on the ADNI data set the areas under the ROC curve for AD vs. NC, MCI vs. NC and pMCI vs. sMCI reached 91.0%, 75.8% and 74.6%, respectively.
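To make the layer types concrete, here is a hypothetical toy 2D-CNN in PyTorch with the convolutional, pooling and fully connected layers described above; it is not any of the cited architectures:

```python
# Minimal 2D-CNN: conv -> pool -> conv -> pool -> fully connected output.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolutional layer
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer (64 -> 32)
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer (32 -> 16)
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 2),                  # fully connected: AD vs. CN
)

x = torch.randn(4, 1, 64, 64)    # batch of 4 single-channel 64x64 slices
logits = model(x)
print(logits.shape)              # torch.Size([4, 2])
```

The published models above differ in depth, kernel sizes and pre-training, but share this overall layer pattern.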
Since brain images are three-dimensional, a 2D-CNN model cannot fully extract the features of every dimension. To address this, Liu et al. [45] proposed a classification model combining a CNN and a recurrent neural network (RNN): intra-slice features are extracted with a 2D-CNN, inter-slice features with a bidirectional gated recurrent unit (BGRU), and the final classification is made by a multi-layer perceptron. Li et al. [46] proposed building a three-dimensional convolutional neural network (3D-CNN) directly so as to fully extract information in all dimensions; features were extracted from MRI gray-matter (GM) density maps and PET images and fed to sparse regression classifiers for AD prediction. Hosseini-Asl et al. [47] constructed a 3D autoencoder to capture anatomical shape variations in MRI and then converted the pre-trained autoencoder into a 3D-CNN model for the final classification. Huang et al. [48], following the design of VGG, built a single-modality 3D-VGG variant, ran experiments on MRI and PET images separately, and realized multi-modal prediction through feature fusion; their results show that the best performance is obtained with full-image input, indicating that segmenting key structures is not a prerequisite for CNN-based classification. Liu et al. [49] proposed an MRI-based 3D-CNN model for three-way classification and integrated patients' age information into the model, further improving prediction accuracy.
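A minimal 3D convolutional block in PyTorch illustrates how volumetric input preserves all three spatial dimensions (toy shapes, not a published model):

```python
# One 3D conv + pool block applied to a batch of MRI-like volumes.
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1),   # 3D kernel spans x, y, z
    nn.ReLU(),
    nn.MaxPool3d(2),                             # halves each spatial axis
)

vol = torch.randn(2, 1, 32, 32, 32)   # batch of 2 single-channel volumes
out = block(vol)
print(out.shape)                      # torch.Size([2, 8, 16, 16, 16])
```

Unlike a 2D kernel sliding over one slice, the 3D kernel mixes information across adjacent slices, which is exactly the inter-slice structure the 2D models above must recover with an RNN.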

AD classification based on GCN
Similar to two-dimensional convolution, GCNs perform a sliding computation over the input data using convolution kernels. Such models treat each node of the graph structure as a point in Euclidean space and update a node's representation by convolving over its neighborhood. As shown in Figure 3, the kernel of traditional two-dimensional convolution is a matrix that slides over the input data at a fixed stride and computes output values by convolution. In spatial-domain GCNs, by contrast, the shape and size of the convolution window are dynamic: the kernel is based on the adjacency matrix of the graph and adapts to the geometric relations between nodes.
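One GCN propagation step over an adjacency matrix can be sketched in NumPy; this is the common normalized form H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W), shown for a toy 4-node graph:

```python
# A single graph-convolution step: neighbors' features are averaged
# through the normalized adjacency matrix, then linearly transformed.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency matrix (a 4-node chain)
A_hat = A + np.eye(4)                       # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))                 # node features (e.g. one per ROI)
W = rng.normal(size=(3, 2))                 # learnable weight matrix
H_next = np.maximum(A_norm @ H @ W, 0)      # propagate + ReLU
print(H_next.shape)                         # (4, 2)
```

Because `A_norm` encodes the graph, the effective "window" of each node is its neighborhood, which is what makes the kernel shape dynamic.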
Zhao et al. [50] first extracted functional connectivity matrices from the brain functional connections in rs-fMRI images; each matrix was combined with the subject's gender, scanner information and label to generate a subject vector. Each subject vector was then taken as a vertex, with the similarity between subjects as the edges, and AD prediction was completed through node classification. Guo [51] takes each region of interest (ROI) as a vertex, computes the similarity of the data within ROI regions to convert each MRI image into a graph, and predicts AD through graph classification. Parisot et al. [52] used MTFS-gLASSO to fuse brain structural and functional network features into a 90-dimensional vector; taking this vector as the vertex, the subjects' phenotypic information (gender, age, imaging equipment) was integrated into the edges, and graph classification achieved an AD classification accuracy of 84%. GCN-based AD prediction is consistent in spirit with the ROI-based approach, but compared with earlier work that modeled ROI features with CNNs, GCNs can effectively model global features, avoiding the CNN's inability to model globally, reducing learning time and improving classification accuracy. However, predicting AD with a GCN first requires converting Euclidean-space data into graph data, which requires third-party medical image processing software to generate the data and integrate it with the data-processing framework.
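The population-graph construction just described (subjects as nodes, similarity as edges) might be sketched as follows; the cosine-similarity measure and the threshold are illustrative assumptions, not the cited papers' exact choices:

```python
# Build a population graph: one node per subject, edges where the
# subjects' feature vectors are sufficiently similar.
import numpy as np

rng = np.random.default_rng(0)
subjects = rng.normal(size=(10, 90))             # 10 subjects, 90-dim features

unit = subjects / np.linalg.norm(subjects, axis=1, keepdims=True)
sim = unit @ unit.T                              # cosine similarity matrix
adj = (sim > 0.1).astype(float)                  # threshold into edges
np.fill_diagonal(adj, 0)                         # no self-edges
print(adj.shape)                                 # (10, 10)
```

The resulting adjacency matrix is what a GCN layer would then propagate subject features over for node classification.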

AD classification based on RNN
RNNs include recurrent connections in their network structure, allowing them to retain information from previous time steps when processing sequence data. At each time step, the RNN receives an input and computes a new hidden state from the current input and the hidden state of the previous time step; the new hidden state then produces the output of the current step and is passed on to the next step. In this way, an RNN can capture dependencies and dynamics in sequences. Its structure is shown in Figure 4. Li et al. [53] used an LSTM autoencoder to generate image representations of MRI and combined this feature with handcrafted features such as hippocampal volume measurements and demographic information to build a prognostic model that predicts the early stages between MCI and AD through longitudinal analysis. Srivastava et al. [54] used LSTM autoencoders to generate MRI representations of patient information and applied K-means clustering with t-distributed Stochastic Neighbor Embedding (t-SNE) to classify AD. Lee et al. [55] proposed a multimodal recurrent neural network (MRNN) to analyze the conversion from MCI to AD: demographic information, neuroimaging data, cognitive performance and cerebrospinal fluid (CSF) time-series measurements are each fed to gated recurrent units (GRUs), and the features of all the data are finally concatenated for the prediction. These works effectively predict the MCI-to-AD transformation through longitudinal analysis with RNNs, but they depend heavily on time series of MRI measurements, and collecting complete, high-quality longitudinal MRI data for every subject is extremely challenging. Missing and incomplete data is therefore an important limitation of RNN-based AD prediction.
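The hidden-state recurrence described above can be written out for a vanilla RNN cell (toy dimensions; the LSTM and GRU cells used in the cited works add gating on top of this basic idea):

```python
# h_t = tanh(W_x x_t + W_h h_{t-1} + b): the state carries past information
# forward through the sequence.
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(5, 3)) * 0.1   # input-to-hidden weights
W_h = rng.normal(size=(5, 5)) * 0.1   # hidden-to-hidden weights
b = np.zeros(5)

h = np.zeros(5)                       # initial hidden state
seq = rng.normal(size=(7, 3))         # e.g. 7 longitudinal measurements
for x_t in seq:
    h = np.tanh(W_x @ x_t + W_h @ h + b)
print(h.shape)                        # (5,)
```

After the loop, `h` summarizes the whole sequence, which is why missing visits in a longitudinal study directly degrade the representation.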
To sum up, compared with traditional machine learning algorithms, deep learning is more widely used in computer-aided AD diagnosis research. Deep learning technologies such as CNNs, GCNs and RNNs achieve better performance through high-level automatic feature extraction from data sets: they can combine handcrafted features with the feature maps of the input data, or directly learn patterns from the input for classification or regression tasks. However, applying deep learning to the classification or regression of AD patients still has limitations. (1) Interpretability: traditional machine learning requires experts to participate in preprocessing in order to extract and select features from images, whereas deep learning needs no manual intervention and can outperform preprocessing-dependent traditional methods; this, however, brings opacity and a demand for large amounts of training data. (2) Generalization: because imaging equipment and scanning parameters differ between hospitals, a CNN's classification performance is affected; a model may train well on one data set yet degrade sharply when applied to another. Current research alleviates this difficulty on one hand by broadening the sources of data sets, and on the other hand by technical means such as transfer learning (pre-training on a large data set and then fine-tuning on a small one) or multi-task learning (MTL). The problem has not yet been effectively solved and is still being explored.

Summary and Prospect
Traditional machine learning methods have long been used to predict AD from manually extracted features. In contrast, deep learning technologies such as CNNs, GCNs and RNNs automatically extract useful features from complex image data in an end-to-end way, avoiding complicated manual feature extraction and saving considerable manpower and material resources. Deep learning nevertheless still has the following limitations:
1. Data dependence: deep learning models require large amounts of annotated data for training, so data collection and annotation can be costly, and applications may be limited in areas where data is scarce.
2. Computing resource requirements: training and inference for deep learning models usually demand substantial computing resources, leading to high computational cost and energy consumption.
3. Model interpretability: deep learning models are often considered "black boxes" whose inner workings are difficult to explain, which limits their use in areas that require interpretability, such as healthcare and finance.
4. Generalization ability: although deep learning models perform well on training data, their generalization to new, unseen data may be limited, leading to overfitting and difficulties in transferring tasks across domains.