1. Introduction
Alzheimer’s disease is a pressing global issue, with an alarming 131.5 million individuals worldwide projected to be affected by the year 2050 [1]. This condition instigates small strokes in the brain, leading to gradual cell deterioration and nerve complications [2]. While its primary risk factors include age, lifestyle choices, and variations in health-related parameters such as blood pressure and diabetes, detecting the disease early poses a significant challenge [3], and accurate diagnosis remains elusive [4]. The brain, responsible for processing information, retaining memories, solving intricate problems, and facilitating communication with other organs, is profoundly impacted by Alzheimer’s. In the realm of treatment, acetylcholinesterase inhibitors such as donepezil are prescribed for mild to severe dementia, while memantine is recommended for moderate to severe cases. The diagnostic journey encompasses detection, evaluation, diagnosis, and treatment, with the initial assessments being crucial for patient care [5]. A robust health detection model holds the key to identifying internal health issues at an early stage, enabling timely intervention. Research indicates that the entorhinal cortex is among the first areas affected in the early stages of Alzheimer’s progression. Data from the World Health Organization [6] for 2023–2024 show country-specific male and female mortality rates per 100,000 related to Alzheimer’s disease, offering insights into the differential impact across populations.
Figure 1 visually presents a comparative analysis of male and female mortality rates in the form of a bar graph, shedding light on the gender-specific implications of this pervasive disease.
In countries with large populations, such as the United States of America [2], 4.5 million individuals are currently affected by Alzheimer’s disease, a number estimated to increase to 14 million by 2050. One study [1] on dementia suggested an asymptomatic phase lasting 6–10 years after onset, indicating the presence of preclinical Alzheimer’s disease before functional impairment is evident. A follow-up clinical study [7], spanning four years, showed that 29.1% of patients in a preclinical stage progressed to MCI. A notable case study involved a 63-year-old Caucasian patient diagnosed with Alzheimer’s disease, characterized by cognitive decline, reduced arithmetic capabilities, and clinical evidence of decreased t-tau and p-tau tangles. Diagnostic imaging techniques such as MRI and PET scans are crucial for detecting structural and molecular changes in the brain [8]. MRI assessments can identify alterations in grey and white matter composition using the RBF-SVM method [9], calculating the volumes of brain regions with the ADNI dataset and employing multi-atlas propagation to refine segmentation. Machine learning methods, including radial-basis-function classifiers, deep neural networks, and decision tree classifiers, have been used to predict disease progression, including splits into groups such as Alzheimer’s disease vs. healthy controls and early mild cognitive impairment vs. late mild cognitive impairment [10], achieving an accuracy of 89% on each dataset, evaluated using cross-validation with n-fold and k-fold stratification. In [11], a neural network model based on a deep learning framework predicted modeling choices using three classifiers: linear single-year models, MCI-to-AD prediction models, and non-linear single-year models covering progression from CN to MCI.
In [12], the authors compared a decision tree classifier (repeated data splitting based on cut-off values), random forest, a support vector machine (a hyperplane separating two categories of variables), and gradient boosting, with XGBoost used to maximize speed and efficiency. Voting classifiers, which select the outcome receiving the majority of votes, combine different datasets and algorithms to predict outcomes accurately and efficiently. The OASIS dataset was used for brain disorder diagnosis, involving 150 right-handed patients aged 60–96 years with attributes of gender, age, and clinical dementia rating, to classify each person as demented or non-demented. Leave-one-out cross-validation computes accuracy over ‘n’ iterations, each training on (n − 1) samples. This research [13] provided insights into medial temporal lobe atrophy, a neurodegenerative sign of Alzheimer’s, using coronal MRI slices covering the temporal lobe with a CNN approach to classify AD patients and controls using 2D images as input data. The authors in [14] differentiated between types of dementia, such as vascular, Lewy body, frontotemporal, and mixed dementia, and concluded that random forest with grid-search cross-validation performed better than the other algorithms. Moreover, in [15], feature selection methods such as BestFirst and CfsSubsetEval, along with algorithms such as Naïve Bayes, logistic regression, SMO/SVM, and random forest, were crucial for predicting cognitive impairment. Advanced model evaluation measures were utilized to assess the performance of the classification models, ensuring accurate predictions regarding cognitive health and disease progression. The authors of [16] examined stage-wise data classification and collection, data processing and feature extraction, and segmentation performed through SPM12 software, as well as data acquisition achieved through an MR scanner.
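The leave-one-out cross-validation scheme mentioned earlier in this section (train on n − 1 records, test on the single held-out record, repeated n times) can be sketched as below; the one-nearest-neighbour classifier and the toy dementia labels are illustrative placeholders, not the surveyed studies’ actual models:

```python
# Leave-one-out cross-validation: each of the n iterations trains on
# n-1 samples and tests on the single held-out sample.
def loo_accuracy(X, y, classify):
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = classify(train_X, train_y, X[i])
        correct += (pred == y[i])
    return correct / len(X)

def nearest_neighbour(train_X, train_y, x):
    # 1-NN on a single numeric feature (toy stand-in for a real classifier)
    j = min(range(len(train_X)), key=lambda k: abs(train_X[k] - x))
    return train_y[j]

X = [0.1, 0.2, 0.9, 1.0]  # e.g. a single clinical score per patient
y = ["nondemented", "nondemented", "demented", "demented"]
acc = loo_accuracy(X, y, nearest_neighbour)  # 1.0 on this toy data
```

Each fold leaves out exactly one patient, so the scheme uses all available data for training while still giving an unbiased estimate on small cohorts such as OASIS.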
In recent years, academicians and researchers have shown interest in computer-assisted learning methodologies to analyze and predict disease using medical data. The surveyed algorithms apply established analysis techniques such as deep learning, XGBoost, support vector machines, decision trees, logistic regression, KNN, random forest, transfer learning, voting classifiers, Laser-Induced Breakdown Spectroscopy, and multi-layer perceptrons. Amyloid and tau-tangle image processing techniques [1] are not readily available to process and classify diseases. The algorithms’ performance [2], including precision, recall, and F1 score, was evaluated on males and females using KNN, logistic regression, SVM, and AdaBoost but was not efficient. The study in [10] lacked well-defined methods for interpreting the deep-learning models used for clinical decision-making. Accuracy decreased with increasing time horizons for CN subjects [11], and molecular biomarkers were ineffective for CN compared to MCI. The OASIS dataset and the specified ML algorithms [12] are limited to algorithms such as random forest and decision tree, and their applicability to later stages of the disease was not explored. The study in [17] limited itself to textural radiomics features extracted from grey matter probability volumes in subcortical regions, restricting the brain features considered. The authors of [18] explored the performance of the deep neural network model on other neurodegenerative diseases or conditions beyond AD. The performance of the CNN models [13] under cross-dataset validation was slightly lower than within-dataset validation. The RNN model in [19] requires significant computational resources due to the complexity of deep neural networks, a limitation for real-time applications. The interpretability of results from deep neural networks can be challenging compared to traditional statistical models [20]. Further, the quality of the EEG data [21], if noisy or corrupted, affects the performance of the model, and a limited dataset may give incorrect results. An MLP was used in [22] for the classification of patients, but further validation with larger and more diverse cohorts is necessary to confirm the robustness and reliability of the classification method.
The limitations of machine learning algorithms are data dependency, algorithmic complexity, and overfitting [23]. To overcome them, the Visual Geometry Group-16 (VGG16), Visual Geometry Group-19 (VGG19), and AlexNet transfer learning approaches are used, as they are simple, unified, pre-trained models with greater depth. The proposed transfer learning model, with the VGG16, VGG19, and AlexNet architectures, is built by choosing inputs and evaluating them on the healthy aging dataset from the Behavioral Risk Factor Surveillance System [18] to classify and predict Alzheimer’s disease and to enhance diagnostic accuracy by providing a more comprehensive view of an individual’s health. These algorithms are, by and large, used for image data; to apply them to numerical data, several steps are involved: data reshaping, adaptation of the input layer, data processing by the convolutional layers, and training. For VGG16, the reshaped data are fed into the convolutional layers, and the model is trained from scratch or fine-tuned for the numerical data. The output of the convolutional layers is fed into the fully connected layers. VGG19 has 19 layers and is therefore deeper, making it easier to capture complex features and hierarchical representations. VGG16 and VGG19 use 3 × 3 convolutional and max pooling layers, ensuring that they capture broader and better features. Reshaped data are fed into the VGG19 architecture, whose input layer needs to be modified to accept the specific reshaped dimensions. The convolutional layers process the data, applying filters to the reshaped feature arrangement. Extensive hyperparameter tuning and regularization can be used to reduce overfitting.
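The reshaping step described above, which turns a flat numerical record into the grid a VGG-style input layer expects, can be sketched as follows; the 4 × 4 grid size and zero-padding are illustrative assumptions, not the paper’s exact configuration:

```python
import numpy as np

def reshape_for_cnn(features, height, width):
    """Pad a flat numerical feature vector and reshape it into the
    (height, width, channels) grid a VGG-style convolutional input
    layer expects. Grid size and padding value are illustrative
    choices, not the paper's exact configuration."""
    features = np.asarray(features, dtype=np.float32)
    needed = height * width
    if features.size < needed:
        # zero-pad so the vector fills the grid exactly
        features = np.pad(features, (0, needed - features.size))
    return features[:needed].reshape(height, width, 1)

# e.g. 10 survey attributes padded into a 4x4 single-channel "image"
sample = reshape_for_cnn(list(range(10)), 4, 4)
```

After this step the only architectural change needed is setting the network’s input shape to the reshaped dimensions; the convolutional layers then process the grid as they would an image.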
The detailed analysis of the existing relevant work, along with the models and their demerits, is specified in
Table 1. The exhaustive literature survey focused on neural networks and machine learning algorithms, as they provide better accuracy than traditional approaches. The data were split with 80% used for training.
The main contributions of the proposed work are summarized below:
The chosen healthy aging dataset was pre-processed; the transfer learning approach was used; and the VGG16, VGG19, and AlexNet algorithms were applied to classify Alzheimer’s disease (AD), employing 10-fold cross-validation.
A reliable and efficient model was trained, tested, evaluated, and compared with the existing neural network and machine learning models.
Data were validated, and performance evaluation was described in a confusion matrix. The metrics evaluated were accuracy, error rate, precision, recall, F1 score, sensitivity, specificity, Kappa statistics, ROC, and RMSE. These were contrasted with existing algorithms like K-Nearest Neighbors, logistic regression, decision tree classifiers, random forest classifier, XGBoost, support vector machine, AdaBoost, voting classifiers, Laser-Induced Breakdown Spectroscopy, Bidirectional LSTM, Naïve Bayes with feature selection, Gaussian NB, CAE, and CHAID.
The rest of the paper is organized as follows:
Section 2 gives insights into the methodology proposed, and
Section 3 discusses the results and lists the metric comparisons in tabular form, covering the neural network approaches used to predict and classify Alzheimer’s disease.
Section 4 concludes the paper, followed by the references.
3. Results and Discussion
Data were visualized using a correlation matrix, rendered as a heatmap of the correlations between variables. The proposed VGG16, VGG19, and AlexNet models were tested on the Kaggle dataset Alzheimer’s Disease (AD) and Healthy Aging Data in the US [28]. After pre-processing, the dataset had 31,676 instances, and the model was built using Python 3.7.0. For experimentation, the system specifications included 8 GB of RAM, a 64-bit operating system, an Intel Core i5 processor, and Windows 11.
Pre-clinical Stage 1 is the starting stage, and Stage 6 is severe AD dementia. The need is to encourage people to reduce their sedentary behavior by being physically active, taking part in social activities, maintaining a healthy and balanced lifestyle and diet, and keeping their minds and bodies active. Diet and exercise for early symptomatic AD aim to maintain cognitive function through the intake of green leafy vegetables, proteins, and nuts, along with physical exercise. Cholinesterase inhibitors are used: donepezil is prescribed for Alzheimer’s treatment, and the disease can be detected through scanning techniques such as CT (computed tomography) and PET (positron emission tomography) [29]. Predicted labels help classify and evaluate the model’s correctness and accuracy. Therefore, the model’s performance was assessed through accuracy, precision, F1 score, and recall [30]. To analyze the key indicator metrics of the models, we used a confusion matrix, which gives the correct and incorrect predictions of the model. The true positives and true negatives are correctly predicted values, while the false positives and false negatives are wrongly predicted values. The training and testing confusion matrices generated over the AD and Healthy Aging Data in the US are visualized in
Figure 9 for VGG16 and
Figure 10 for VGG19, and
Figure 11 displays the AlexNet confusion matrix. On training, the model accuracy was 99.9%, and on testing, the accuracy was 99.89%.
Accuracy was measured as the ratio of true positives (TP) plus true negatives (TN) to the total instances (TP + TN + FP + FN). Accuracy gives the proportion of correct outcomes among the total number of outcomes. However, accuracy alone cannot evaluate the model; therefore, we used other parameters, such as precision, F1 score, sensitivity, specificity, and recall. Results are tabulated in
Table 5 and graphically shown in
Figure 12. Equation (3) gives the accuracy parameter, i.e., how often the model makes correct predictions across the classes in the dataset. The error rate gives the model’s degree of prediction error with respect to the true values, and Equation (4) gives the error rate formula.
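Assuming the standard definitions behind Equations (3) and (4), accuracy and error rate can be computed directly from the confusion-matrix counts; the counts below are illustrative, not the paper’s results:

```python
def accuracy(tp, tn, fp, fn):
    # Equation (3): accuracy = (TP + TN) / (TP + TN + FP + FN)
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    # Equation (4): error rate = (FP + FN) / (TP + TN + FP + FN) = 1 - accuracy
    return (fp + fn) / (tp + tn + fp + fn)

# illustrative confusion-matrix counts
acc = accuracy(tp=50, tn=45, fp=3, fn=2)    # 0.95
err = error_rate(tp=50, tn=45, fp=3, fn=2)  # 0.05
```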
The proposed models, VGG16, VGG19, and AlexNet, outperformed the existing neural network approaches, with an accuracy of 98.20% for VGG19 and 100% for VGG16 and AlexNet, and a very minimal error rate.
Results are tabulated in
Table 6, and a graphical sketch of accuracy and error rate is shown in
Figure 13 for the neural network approaches. VGG16 and AlexNet achieved 99.9%, and VGG19 achieved 98.8%, outperforming the existing neural network approaches, as graphed in
Figure 14.
Figure 15 shows the training and testing accuracy of the proposed VGG16 model. Each epoch is a complete pass through the entire training set, after which the model parameters are updated; the model was trained for multiple epochs, and the values are tabulated in
Table 7.
The performance measures of VGG19 and Alex Net are tabulated in
Table 8 and
Table 9, with graphical approaches in
Figure 15,
Figure 16,
Figure 17 and
Figure 18. After each epoch, the model’s performance was evaluated on a separate dataset, called the validation dataset, which was used to detect underfitting or overfitting. The number of additional epochs can be decided based on performance and convergence criteria.
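The per-epoch validation check described above can be sketched as a simple early-stopping loop; the callables and the toy accuracy sequence are illustrative stand-ins, not the actual training code:

```python
def train_with_validation(train_epoch, validate, max_epochs, patience=3):
    """Run up to max_epochs; after each epoch evaluate on a held-out
    validation set and stop early once validation accuracy has not
    improved for `patience` epochs (a simple overfitting guard).
    `train_epoch` and `validate` are placeholder callables."""
    best, since_best, history = -1.0, 0, []
    for epoch in range(max_epochs):
        train_epoch(epoch)
        val_acc = validate(epoch)
        history.append(val_acc)
        if val_acc > best:
            best, since_best = val_acc, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # converged: no improvement for `patience` epochs
    return best, history

# toy run: validation accuracy rises, then plateaus
scores = [0.90, 0.95, 0.97, 0.97, 0.96, 0.96, 0.95]
best, hist = train_with_validation(lambda e: None, lambda e: scores[e],
                                   max_epochs=7, patience=3)
```

On the toy sequence the loop stops after the sixth epoch, since accuracy never exceeds the 0.97 reached at epoch three.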
The precision values of different algorithms are observed in
Table 10. A high precision value means that the model’s positive predictions are largely correct, while a low value suggests frequent false-positive predictions. Precision is calculated as true positives (TP) divided by the total of true and false positives; a value close to 1 indicates a good classifier.
Figure 19 presents the precision parameter of the proposed models in a graphical sketch: 99.9% for VGG16, 96.6% for VGG19, and 100% for AlexNet, in comparison with the existing neural network and ML approaches. Equation (5) specifies the precision parameter, indicating the model’s ability to avoid false-positive predictions.
Recall values are given in
Table 11. A recall (true-positive rate) close to 1 characterizes an effective binary classification model.
Figure 20 shows the graphical aspect of recall; a low recall value suggests that the model misses a significant number of positive instances. Recall must be high in scenarios such as banking and medical diagnosis. The proposed VGG16, VGG19, and AlexNet models achieved 99.9%, outperforming the existing neural network and ML approaches for recall. Equation (6) gives the recall metric, indicating the proportion of positive instances correctly classified by the model.
The F1 score is a performance metric for classification models that combines precision and recall, as shown in
Table 12; it is calculated as the harmonic mean of precision and recall, taking into account both false positives and false negatives. A high F1 score indicates good precision and recall, while a low F1 score indicates a lack of one or both. A line graph is shown in
Figure 21. The F1 score values of the proposed models were 99.9% for VGG16, 98.2% for VGG19, and 98.8% for AlexNet, outperforming the existing neural network and ML approaches. Equation (7) gives the F1 score, which combines precision and recall into a single metric, providing a balance between them.
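Assuming the standard formulas behind Equations (5)–(7), precision, recall, and the F1 score can be computed from the confusion-matrix counts as follows (the counts are illustrative):

```python
def precision(tp, fp):
    # Equation (5): precision = TP / (TP + FP)
    return tp / (tp + fp)

def recall(tp, fn):
    # Equation (6): recall (sensitivity) = TP / (TP + FN)
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    # Equation (7): F1 is the harmonic mean of precision and recall
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```

Because the harmonic mean is dominated by the smaller of the two values, a high F1 score can only be reached when precision and recall are both high.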
In
Table 13, the comparison of the sensitivity parameters and their percentages is indicated. Sensitivity shows how well the model detects positive instances, as indicated in
Figure 22. The proposed models achieved sensitivities of 96.8% for VGG16, 97.9% for VGG19, and 98.9% for AlexNet, outperforming the existing neural network and ML approaches for the sensitivity metric. Equation (8) defines sensitivity, the model’s ability to identify positive instances within a dataset.
Table 14 indicates the specificity score, calculated as the ratio of true negatives to the total of true negatives and false positives, and its corresponding graphical analysis is shown in
Figure 23. Specificity is the true-negative rate, i.e., the proportion of negative instances correctly identified by the model. The proposed models achieved 96.5% for VGG16, 97.7% for VGG19, and 98.8% for AlexNet, outperforming the existing neural network and ML approaches for specificity scores. Equation (9) complements sensitivity and quantifies the identification of negative instances within a dataset.
To assess performance further, Cohen’s kappa was utilized. Cohen’s kappa is used for analyzing reliability, demonstrating the degree to which the data representation is appropriately illustrated and evaluated. The statistic is computed from Equation (10), where Pr(a) is the observed agreement and Pr(e) is the chance agreement; the resulting value typically lies between 0 and 1.
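Under the usual two-class formulation of Equation (10), Pr(a) and Pr(e) can be derived from the confusion-matrix counts; the sketch below uses illustrative counts, not the paper’s data:

```python
def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa for a 2x2 confusion matrix: Pr(a) is the observed
    agreement and Pr(e) the agreement expected by chance from the
    marginal class frequencies (Equation (10))."""
    n = tp + tn + fp + fn
    pr_a = (tp + tn) / n
    # chance agreement from the marginal (row/column) frequencies
    pr_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (pr_a - pr_e) / (1 - pr_e)

# illustrative balanced counts: 80% raw agreement, 50% expected by chance
kappa = cohen_kappa(tp=40, tn=40, fp=10, fn=10)  # ~0.6, "moderate"
```

Unlike raw accuracy, kappa discounts the agreement that would occur by chance, which is why it is preferred for reliability analysis.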
Agreements were classified based on values, such as 0.10–0.20 as minor, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1.00 as nearly perfect agreements [
22]. The comparison of Kappa statistics values is classified in
Table 15, and the corresponding graphical value is seen in
Figure 24.
The proposed models achieved 0.96 for VGG16, 0.95 for VGG19, and 0.96 for AlexNet, outperforming the existing neural network and ML approaches for Cohen’s kappa statistics. The kappa statistics of the various algorithms were compared with Naïve Bayes, logistic regression, SVM, C4.5, and CHAID from the surveys [15,31] and are tabulated.
The receiver operating characteristic (ROC) curve was used as a performance measure for the classification models, according to the tabulated values in
Table 16 and the graphical analysis in
Figure 25. There is a trade-off between the true-positive rate (sensitivity) and the false-positive rate (1 − specificity) at numerous threshold settings. The larger the area under the ROC curve (AUC), the better the model’s performance. The ROC curve is plotted from the true- and false-positive rates at diverse threshold levels. The ROC for Alzheimer’s classification was compared across different algorithms, shown in tabular form, including Naïve Bayes, logistic regression, SVM, random forest, and MLP [11]. Equations (11) and (12) give the proportions of actual positive and actual negative cases correctly identified by the classifier.
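The true- and false-positive rates underlying the ROC curve (Equations (11) and (12)) can be sketched as follows; the scores, labels, and thresholds are illustrative, not the paper’s outputs:

```python
def roc_points(scores, labels, thresholds):
    """Compute (FPR, TPR) pairs at each decision threshold.
    labels: 1 = positive class, 0 = negative class."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # (1 - specificity, sensitivity)
    return points

# illustrative model scores for four subjects (two positive, two negative)
pts = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0], [0.95, 0.5, 0.05])
```

Sweeping the threshold from high to low traces the curve from (0, 0) to (1, 1); a perfect classifier passes through (0, 1), maximizing the area under the curve.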
RMSE, the root mean square error, is an accuracy metric for a given regression model. The RMSE for Alzheimer’s disease classification was compared across different algorithms shown in tabular form [1], including Naïve Bayes, logistic regression, SVM, random forest, and MLP. RMSE is the square root of the average of the squared differences between the predicted and actual values; as shown in
Table 17 and its graphical display in
Figure 26, the proposed approach, with a value of only 1.30%, outperformed the existing approaches. The lower the RMSE, the better the regression model in terms of accuracy.
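The RMSE definition above translates directly into code; the value pairs below are illustrative:

```python
import math

def rmse(actual, predicted):
    # square root of the average squared difference between
    # actual and predicted values
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

# a single off-by-one prediction among three values gives sqrt(1/3)
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```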
4. Conclusions
Alzheimer’s is a progressive neurodegenerative disease that primarily affects cognitive, memory, and behavioral functions and accounts for 60–80% of dementia cases worldwide; it is therefore important to reduce the risk by diagnosing the disease early. Better quality of life and the prevention or delay of onset help preserve cognitive health, enhancing the well-being of society. This study of Alzheimer’s disease diagnosis focuses on the proposed transfer learning approaches, namely the VGG16, VGG19, and AlexNet models. The models were trained, with the data split into training and testing datasets drawn from the Alzheimer’s Disease (AD) and Healthy Aging Data in the US.
The three proposed transfer learning methods were VGG16, VGG19, and AlexNet. VGG16 and VGG19 are well known for their depth and small convolutional filters, which allow them to learn intricate patterns in images. They are highly accurate but need a lot of memory, whereas AlexNet is faster and less resource-intensive, trading off some accuracy.
In the evaluation and comparison, the three proposed methods, VGG16, VGG19, and AlexNet, outperformed the existing approaches in the experimented metrics: the accuracy was 100% for VGG16, 100% for VGG19, and 98.20% for AlexNet; the precision was 99.9% for VGG16, 96.6% for VGG19, and 100% for AlexNet; the recall was 99.9% for VGG16, VGG19, and AlexNet; the F1 score was 99.9% for VGG16, 98.2% for VGG19, and 99.9% for AlexNet; the sensitivity was 96.8% for VGG16, 97.9% for VGG19, and 98.9% for AlexNet; the specificity was 96.5% for VGG16, 97.7% for VGG19, and 98.8% for AlexNet; the Kappa statistics were 0.96 for VGG16, 0.95 for VGG19, and 0.96 for AlexNet; and the RMSE was nil for VGG16 and AlexNet and negligible for VGG19.
VGG16 was computationally intensive and had a slow inference time due to its depth; VGG19 had higher memory usage and an even slower inference time due to its increased depth. AlexNet had a simpler, faster-to-train architecture but compromised on accuracy.
The proposed study has a beneficial outcome, and continued research efforts are required to identify and assess their impact on onset using various approaches. Efforts to address Alzheimer’s disease, which is a growing concern, have significance in research and development to understand the cause, its treatment, and finding its potential cure. Public health initiatives include raising awareness, improving early diagnosis, and providing support for patients and caregivers. The introduction of new screening and diagnostic tools could ultimately help lower the burden on specialists and ensure patients are diagnosed in a timely manner.
The proposed model has a few limitations: the VGG16, VGG19, and AlexNet algorithms are complicated for numerical data, as their large number of parameters may increase the computational cost and lengthen training times. Numerical data lack the spatial relationships these architectures are designed to exploit, so the models may become less effective. The convolutional filters are also prone to overfitting if the dataset is not large. Finally, the algorithms may lack the feature-importance measures often required for numerical data, which can be interpreted as a limitation.