Brain Age Prediction: A Comparison between Machine Learning Models Using Brain Morphometric Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Datasets
2.2. Image Processing and Feature Extraction
2.3. Machine Learning Algorithms
2.3.1. Parametric Algorithms
Linear Models
- Least Absolute Shrinkage and Selection Operator (Lasso) Regression [2,33]: this is a linear algorithm that minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. This algorithm tends to produce some coefficients that are exactly zero.
- Ridge Regression [2,15,32]: this is a model tuning approach used to analyze data that suffer from multi-collinearity. The method uses L2-norm regularization. When multi-collinearity is present, the least-squares estimates remain unbiased, but their variances are large. Ridge regression shrinks the coefficients, which reduces model complexity and mitigates multi-collinearity.
- Elastic Net Regression [2,34]: this is a regularized linear regression model that combines both the L1 and L2 penalty functions. This algorithm performs variable selection and regularization simultaneously. This method is most appropriate where the number of features is greater than the number of samples. This allows the number of selected features to be larger than the sample size while achieving a sparse model.
- Least Angle Regression (LAR) [35]: this algorithm is similar to forward stepwise regression. It finds the variable that is most highly correlated with the target. When multiple variables are equally correlated with the target, it proceeds in a direction that is equiangular (at the same angle) to those variables. It can compute the entire regularization path at approximately the same computational cost as a single least-squares fit.
- Lasso Least Angle Regression (Lasso LAR) [35]: this algorithm computes the Lasso regularization path using the Least Angle Regression algorithm, which yields piecewise linear solution paths as a function of the norm of the coefficients.
- Orthogonal Matching Pursuit (OMP) [36]: this algorithm starts the search by finding the column with maximum correlation with the measurements at the first step and then, at each iteration, searches for the column with maximum correlation with the current residual. The residuals after each step are orthogonal to all of the selected columns. The algorithm iterates until a stopping criterion is met or the number of iterations exceeds a limit.
- Bayesian Ridge Regression [37]: this algorithm provides a natural mechanism for coping with insufficient or poorly distributed data by formulating linear regression using probability distributions rather than point estimates. It makes use of conjugate priors for the precision of the Gaussian and is therefore restricted to gamma priors, which require four hyperparameters chosen arbitrarily to be small values. It also requires initial values for the regularization parameters alpha and lambda, which are then updated from the data.
- Automatic Relevance Determination (ARD) [32]: this algorithm is very similar to Bayesian Ridge Regression, but ARD produces sparser coefficients. Also known as sparse Bayesian learning or the Relevance Vector Machine, it ranks input variables in order of their importance for predicting the output. It uses a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features.
- Passive Aggressive Regression (PAR) [38]: this algorithm is generally used for large-scale learning and is one of the few online-learning algorithms. In online learning, the input data arrive in sequential order and the model is updated step by step, in contrast to batch learning, where the entire training dataset is used at once. Online learning is suitable when there is a large amount of data and training on the entire dataset at once is computationally infeasible because of its sheer size. If the prediction is correct, the model is kept and no changes are made (passive). If the prediction is incorrect, changes are made to the model (aggressive).
- Random Sample Consensus (RANSAC) [39]: this is an iterative method used to estimate the parameters of a model from a set of data containing outliers. The algorithm assumes that the data consist of both inliers and outliers. Inliers can be explained by a model with a particular set of parameter values, while outliers do not fit that model under any circumstance. The algorithm then optimally estimates the parameters of the chosen model from the identified inliers.
- Huber Regression [40]: this is a regression method that is robust to outliers. It uses the Huber loss function rather than the least squares error. This function is identical to the least squares penalty for small residuals, but for large residuals its penalty is lower and increases linearly rather than quadratically. It is, thus, more forgiving of outliers. A minimal code sketch of these linear models follows this list.
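To make these descriptions concrete, the sketch below fits the parametric linear models listed above with scikit-learn and scores them by 10-fold cross-validated mean absolute error. This is a minimal illustration, not the pipeline used in this study: X and y are synthetic stand-ins for the morphometric features and chronological ages, and the hyperparameters are library defaults rather than tuned values.

```python
# Minimal sketch: the parametric linear models above, fit with scikit-learn.
# X and y are synthetic stand-ins for morphometric features and chronological age;
# hyperparameters are illustrative defaults, not the values used in the study.
from sklearn.datasets import make_regression
from sklearn.linear_model import (
    Lasso, Ridge, ElasticNet, Lars, LassoLars, OrthogonalMatchingPursuit,
    BayesianRidge, ARDRegression, PassiveAggressiveRegressor,
    RANSACRegressor, HuberRegressor,
)
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=68, noise=10.0, random_state=0)

models = {
    "Lasso": Lasso(alpha=1.0),
    "Ridge": Ridge(alpha=1.0),
    "Elastic Net": ElasticNet(alpha=1.0, l1_ratio=0.5),
    "LAR": Lars(),
    "Lasso LAR": LassoLars(alpha=1.0),
    "OMP": OrthogonalMatchingPursuit(),
    "Bayesian Ridge": BayesianRidge(),
    "ARD": ARDRegression(),
    "PAR": PassiveAggressiveRegressor(max_iter=1000, random_state=0),
    "RANSAC": RANSACRegressor(random_state=0),
    "Huber": HuberRegressor(),
}

for name, model in models.items():
    # 10-fold cross-validated mean absolute error (scikit-learn returns it negated)
    mae = -cross_val_score(model, X, y, cv=10,
                           scoring="neg_mean_absolute_error").mean()
    print(f"{name}: MAE = {mae:.2f}")
```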
Nonlinear Model
- Multi-layer Perceptron (MLP) Regression [41]: this is an artificial neural network that has three or more layers of perceptrons: a single input layer, one or more hidden layers, and a single output layer. It thus has multiple layers of neurons, each with an activation function and a threshold value. Backpropagation is the technique by which the multi-layer perceptron receives feedback on the error in its results and adjusts its weights accordingly to make more accurate predictions in the future.
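A minimal sketch of such a network, assuming scikit-learn's MLPRegressor; the layer sizes, solver, and standardization step are illustrative choices, not the configuration used in this study.

```python
# Minimal sketch: a multi-layer perceptron regressor trained with backpropagation.
# Layer sizes and solver are illustrative assumptions, not the study's settings.
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=68, noise=10.0, random_state=0)

mlp = make_pipeline(
    StandardScaler(),                            # MLPs are sensitive to feature scale
    MLPRegressor(hidden_layer_sizes=(100, 50),   # two hidden layers
                 activation="relu",
                 solver="adam",
                 max_iter=2000,
                 random_state=0),
)
mlp.fit(X, y)
print(mlp.predict(X[:5]))                        # predicted "ages" for five samples
```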
2.3.2. Nonparametric Algorithms
Linear Models
- Relevance Vector Regression (RVR) [2,14,16,22,42]: this is a Bayesian framework for learning sparse regression models. RVR has an identical functional form to SVR, but the Bayesian formulation of the RVR avoids the set of free parameters of the SVR. The sparsity of the RVR is induced by the hyperpriors on model parameters in a Bayesian framework, with the maximum a posteriori (MAP) principle. The behavior of the RVR is controlled by the type of kernel, which has to be selected, while all other parameters are automatically estimated by the learning procedure itself.
- Theil–Sen Regression [43]: this algorithm is a nonparametric method that determines the slope of the regression line as the median of the slopes of all lines that can be drawn through pairs of data points. As an alternative to least squares for simple linear regression, it uses a generalization of the median in multiple dimensions and is thus robust to multivariate outliers (see the sketch following this list).
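The sketch below illustrates the Theil–Sen idea in the single-predictor case, where the slope is the median of the slopes over all pairs of points, and compares it with scikit-learn's multivariate TheilSenRegressor. The data are synthetic and chosen only to show the robustness to outliers; RVR is not part of scikit-learn, and a third-party package (e.g., sklearn-rvm) would be needed for it.

```python
# Minimal sketch of the Theil–Sen idea (single predictor): slope = median of
# pairwise slopes. Synthetic data with a few gross outliers for illustration.
import numpy as np
from itertools import combinations
from sklearn.linear_model import TheilSenRegressor

rng = np.random.default_rng(0)
x = rng.uniform(20, 80, size=100)            # e.g., a single morphometric feature
y = 0.5 * x + rng.normal(0, 2, size=100)     # noisy linear relation with age
y[:5] += 40                                  # a few gross outliers

# Median of all pairwise slopes (robust to the outliers above)
slopes = [(y[j] - y[i]) / (x[j] - x[i]) for i, j in combinations(range(len(x)), 2)]
print("median pairwise slope:", np.median(slopes))

# Comparable estimate from scikit-learn's multivariate generalization
ts = TheilSenRegressor(random_state=0).fit(x.reshape(-1, 1), y)
print("TheilSenRegressor slope:", ts.coef_[0])
```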
Nonlinear Models
- Support Vector Regression (SVR) [14,15,16,23,44]: this is characterized by the use of kernels, sparsity, control of the margin of tolerance (epsilon, ε), and the number of support vectors. SVR supports both linear and nonlinear regression. A kernel helps find a hyperplane in a higher-dimensional space without increasing the computational cost. The algorithm constructs a hyperplane or a set of hyperplanes in a high- or even infinite-dimensional space. Two boundaries are drawn around the hyperplane at a distance of ε, creating a margin of tolerance around the data points; this defines a symmetrical ε-insensitive region (ε-tube). Any kernel can be chosen, such as a sigmoid kernel, polynomial kernel, or radial basis function kernel. A linear kernel was chosen for SVR in this study.
- Gaussian Processes Regression (GPR) [8,16,45]: this is a nonparametric, kernel-based probabilistic approach. A GPR model can make predictions that incorporate prior knowledge (kernels) and provides uncertainty measures over those predictions. Gaussian processes perform regression by defining a distribution over an infinite number of functions.
- Kernel Ridge Regression (KRR) [32]: this algorithm combines Ridge Regression with the kernel trick. It uses squared error loss, whereas Support Vector Regression uses ε-insensitive loss, both combined with L2 regularization. A polynomial kernel was chosen for KRR.
- K-Nearest Neighbors (kNN) Regression [15,46]: this algorithm uses feature similarity to predict the values of any new data points, which means that the new point is assigned a value based on how closely it resembles the points in the training set. This method uses Euclidean distance to find the nearest neighbors to an object. The closest “k” data points are selected based on the distance. The average value of these data points is the final prediction for the new point.
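A minimal sketch of these kernel-based and neighbor-based models, using the kernel choices stated above (a linear kernel for SVR and a polynomial kernel for KRR); the remaining hyperparameters, the GPR kernel, and the synthetic data are illustrative assumptions rather than the settings used in this study.

```python
# Minimal sketch: nonlinear, nonparametric regressors from scikit-learn.
# Kernel choices for SVR and KRR follow the text; other settings are illustrative.
from sklearn.datasets import make_regression
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.kernel_ridge import KernelRidge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=68, noise=10.0, random_state=0)

models = {
    "SVR (linear kernel)": SVR(kernel="linear", epsilon=0.1),
    "GPR": GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True),
    "KRR (polynomial kernel)": KernelRidge(kernel="polynomial", degree=3, alpha=1.0),
    "kNN (k=5, Euclidean)": KNeighborsRegressor(n_neighbors=5, metric="euclidean"),
}

for name, model in models.items():
    mae = -cross_val_score(model, X, y, cv=10,
                           scoring="neg_mean_absolute_error").mean()
    print(f"{name}: MAE = {mae:.2f}")
```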
Ensemble Models
- Decision Tree (DT) Regression [41]: this is a decision-making algorithm that uses a flowchart-like tree structure. The algorithm learns a tree-structured model from the features of the training data in order to predict a meaningful continuous output for future data. Starting from a root node, it builds a decision tree with decision nodes and leaf nodes, employing a top-down, greedy search through the space of possible branches with no backtracking. The tree is built top-down from the root node by partitioning the data into subsets that contain instances with similar values.
- Random Forest (RF) Regression [15,47]: this is a supervised learning algorithm that uses an ensemble learning method for regression. It operates by constructing multiple decision trees during training and combining their outputs rather than relying on an individual decision tree. Each tree is constructed by bootstrapping, which samples rows and features from the dataset. The final output is the mean of the outputs of all trees (aggregation).
- Extra Trees (ET) Regression [48]: this is an ensemble machine learning algorithm that combines the predictions from many decision trees. It is similar to decision trees and random forests, but it builds the trees with additional randomization. The method aggregates the results of multiple decorrelated decision trees collected in a forest to produce its output. Random forests use bootstrapping, which subsamples the input data with replacement, whereas extra trees use the entire original dataset. In terms of the selection of cut-points to split nodes, random forests choose the optimum split, while extra trees choose it randomly.
- Adaptive Boosting (AdaBoost) Regression [49,50]: this algorithm involves using very short (one-level) decision trees as weak learners that are added sequentially to the ensemble. This is a boosting ensemble algorithm where models are added sequentially and later models in the sequence correct the predictions made by earlier models in the sequence.
- Gradient Boosting Machine (GBM) [51]: this is an ensemble algorithm that fits boosted decision trees by minimizing an error gradient. Models can be fit using any differentiable loss function together with a gradient descent optimization algorithm. The general concept of gradient boosting and adaptive boosting is essentially the same: both build an ensemble by sequentially adding trees based on the mistakes of the current model. The main difference is that, in gradient boosting, each new weak learner is fit directly to the model's current errors (residuals) rather than to a re-weighted version of the initial training set.
- Extreme Gradient Boosting (XGBoost) [52]: this is an optimized distributed gradient boosting algorithm designed to be highly efficient, flexible, and portable. Both XGBoost and the gradient boosting algorithm are ensemble tree methods that apply the principle of boosting weak learners using the gradient descent architecture. However, XGBoost improves upon the base gradient boosting framework through systems optimization and algorithmic enhancements.
- Light Gradient Boosting Machine (LightGBM) [53]: this extends the gradient boosting algorithm by adding a type of automatic feature selection and by focusing on boosting examples with large gradients. It is based on decision trees and increases the efficiency of the model while reducing memory usage through gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB), which address the limitations of the histogram-based algorithm.
- Category Boosting (CatBoost) Regression [54]: this algorithm is another member of the gradient boosting technique on decision trees. CatBoost provides an inventive method for processing categorical features, based on target encoding. This method, named ordered target statistics, tries to solve a common issue that arises when using such a target encoding, which is target leakage. It uses oblivious decision trees, where the same splitting criterion is used across an entire level of the tree. Such trees are balanced, less prone to overfitting, and allow speeding up prediction significantly at testing time.
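The ensemble models above can be sketched as follows; the first five regressors come from scikit-learn, while XGBoost, LightGBM, and CatBoost require their own packages (xgboost, lightgbm, catboost). Hyperparameters are defaults or illustrative values, and the synthetic data again stand in for the morphometric features used in this study.

```python
# Minimal sketch: tree-based ensemble regressors compared by cross-validated MAE.
# xgboost, lightgbm, and catboost are separate packages; settings are illustrative.
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (RandomForestRegressor, ExtraTreesRegressor,
                              AdaBoostRegressor, GradientBoostingRegressor)
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor

X, y = make_regression(n_samples=300, n_features=68, noise=10.0, random_state=0)

models = {
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Extra Trees": ExtraTreesRegressor(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=100, random_state=0),
    "GBM": GradientBoostingRegressor(n_estimators=100, random_state=0),
    "XGBoost": XGBRegressor(n_estimators=100, random_state=0),
    "LightGBM": LGBMRegressor(n_estimators=100, random_state=0),
    "CatBoost": CatBoostRegressor(n_estimators=100, random_state=0, verbose=0),
}

for name, model in models.items():
    mae = -cross_val_score(model, X, y, cv=10,
                           scoring="neg_mean_absolute_error").mean()
    print(f"{name}: MAE = {mae:.2f}")
```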
2.4. Brain Age Prediction Framework
2.5. Age-Bias Correction
2.6. Comparative Evaluation of the Algorithms
2.7. Feature Importance
3. Results
3.1. Algorithm Performance for Brain Age Prediction
3.2. Comparative Performance of the Algorithms for Brain Age Prediction
3.3. Computational Speed of the Algorithms
3.4. Comparison of the BrainPAD of the Algorithms
3.5. Regional Contributions to Brain Age Prediction
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Cole, J.H.; Franke, K. Predicting Age Using Neuroimaging: Innovative Brain Ageing Biomarkers. Trends Neurosci. 2017, 40, 681–690. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Lee, W.H.; Antoniades, M.; Schnack, H.G.; Kahn, R.S.; Frangou, S. Brain age prediction in schizophrenia: Does the choice of machine learning algorithm matter? Psychiatry Res. Neuroimaging 2021, 310, 111270. [Google Scholar] [CrossRef] [PubMed]
- Wrigglesworth, J.; Yaacob, N.; Ward, P.; Woods, R.L.; McNeil, J.; Storey, E.; Egan, G.; Murray, A.; Shah, R.C.; Jamadar, S.D.; et al. Brain-Predicted age difference is associated with cognitive processing in later-Life. Neurobiol. Aging 2022, 109, 195–203. [Google Scholar] [CrossRef] [PubMed]
- Anaturk, M.; Kaufmann, T.; Cole, J.H.; Suri, S.; Griffanti, L.; Zsoldos, E.; Filippini, N.; Singh-Manoux, A.; Kivimaki, M.; Westlye, L.T.; et al. Prediction of brain age and cognitive age: Quantifying brain and cognitive maintenance in aging. Hum. Brain Mapp. 2021, 42, 1626–1640. [Google Scholar] [CrossRef]
- Baecker, L.; Garcia-Dias, R.; Vieira, S.; Scarpazza, C.; Mechelli, A. Machine learning for brain age prediction: Introduction to methods and clinical applications. EBioMedicine 2021, 72, 103600. [Google Scholar] [CrossRef]
- de Lange, A.G.; Anaturk, M.; Rokicki, J.; Han, L.K.M.; Franke, K.; Alnaes, D.; Ebmeier, K.P.; Draganski, B.; Kaufmann, T.; Westlye, L.T.; et al. Mind the gap: Performance metric evaluation in brain-age prediction. Hum. Brain Mapp. 2022, 43, 3113–3129. [Google Scholar] [CrossRef]
- Gonneaud, J.; Baria, A.T.; Pichet Binette, A.; Gordon, B.A.; Chhatwal, J.P.; Cruchaga, C.; Jucker, M.; Levin, J.; Salloway, S.; Farlow, M.; et al. Accelerated functional brain aging in pre-clinical familial Alzheimer’s disease. Nat. Commun. 2021, 12, 5346. [Google Scholar] [CrossRef]
- Cole, J.H.; Ritchie, S.J.; Bastin, M.E.; Valdes Hernandez, M.C.; Munoz Maniega, S.; Royle, N.; Corley, J.; Pattie, A.; Harris, S.E.; Zhang, Q.; et al. Brain age predicts mortality. Mol. Psychiatry 2018, 23, 1385–1392. [Google Scholar] [CrossRef] [Green Version]
- Smith, S.M.; Elliott, L.T.; Alfaro-Almagro, F.; McCarthy, P.; Nichols, T.E.; Douaud, G.; Miller, K.L. Brain aging comprises many modes of structural and functional change with distinct genetic and biophysical associations. Elife 2020, 9, e52677. [Google Scholar] [CrossRef] [Green Version]
- Hogestol, E.A.; Kaufmann, T.; Nygaard, G.O.; Beyer, M.K.; Sowa, P.; Nordvik, J.E.; Kolskar, K.; Richard, G.; Andreassen, O.A.; Harbo, H.F.; et al. Cross-Sectional and Longitudinal MRI Brain Scans Reveal Accelerated Brain Aging in Multiple Sclerosis. Front. Neurol. 2019, 10, 450. [Google Scholar] [CrossRef]
- Cole, J.H. Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiol. Aging 2020, 92, 34–42. [Google Scholar] [CrossRef] [PubMed]
- Kaufmann, T.; van der Meer, D.; Doan, N.T.; Schwarz, E.; Lund, M.J.; Agartz, I.; Alnaes, D.; Barch, D.M.; Baur-Streubel, R.; Bertolino, A.; et al. Common brain disorders are associated with heritable patterns of apparent aging of the brain. Nat. Neurosci. 2019, 22, 1617–1623. [Google Scholar] [CrossRef] [PubMed]
- Cole, J.H.; Franke, K.; Cherbuin, N. Quantification of the Biological Age of the Brain Using Neuroimaging. In Biomarkers of Human Aging; Moskalev, A., Ed.; Springer International Publishing: Cham, Switzerland, 2019; pp. 293–328. [Google Scholar]
- Franke, K.; Ziegler, G.; Klöppel, S.; Gaser, C.; Initiative, A.s.D.N. Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: Exploring the influence of various parameters. Neuroimage 2010, 50, 883–892. [Google Scholar] [CrossRef] [PubMed]
- Valizadeh, S.; Hänggi, J.; Mérillat, S.; Jäncke, L. Age prediction on the basis of brain anatomical measures. Hum. Brain Mapp. 2017, 38, 997–1008. [Google Scholar] [CrossRef]
- Baecker, L.; Dafflon, J.; Da Costa, P.F.; Garcia-Dias, R.; Vieira, S.; Scarpazza, C.; Calhoun, V.D.; Sato, J.R.; Mechelli, A.; Pinaya, W.H. Brain age prediction: A comparison between machine learning models using region-and voxel-based morphometric data. Hum. Brain Mapp. 2021, 42, 2332–2346. [Google Scholar] [CrossRef]
- van Rooij, D.; Anagnostou, E.; Arango, C.; Auzias, G.; Behrmann, M.; Busatto, G.F.; Calderoni, S.; Daly, E.; Deruelle, C.; Di Martino, A.; et al. Cortical and Subcortical Brain Morphometry Differences Between Patients With Autism Spectrum Disorder and Healthy Individuals Across the Lifespan: Results From the ENIGMA ASD Working Group. Am. J. Psychiatry 2018, 175, 359–369. [Google Scholar] [CrossRef]
- Corps, J.; Rekik, I. Morphological Brain Age Prediction using Multi-View Brain Networks Derived from Cortical Morphology in Healthy and Disordered Participants. Sci Rep.-UK 2019, 9, 9676. [Google Scholar] [CrossRef] [Green Version]
- Boedhoe, P.S.W.; van Rooij, D.; Hoogman, M.; Twisk, J.W.R.; Schmaal, L.; Abe, Y.; Alonso, P.; Ameis, S.H.; Anikin, A.; Anticevic, A.; et al. Subcortical Brain Volume, Regional Cortical Thickness, and Cortical Surface Area Across Disorders: Findings From the ENIGMA ADHD, ASD, and OCD Working Groups. Am. J. Psychiatry 2020, 177, 834–843. [Google Scholar] [CrossRef]
- Han, L.K.M.; Dinga, R.; Hahn, T.; Ching, C.R.K.; Eyler, L.T.; Aftanas, L.; Aghajani, M.; Aleman, A.; Baune, B.T.; Berger, K.; et al. Brain aging in major depressive disorder: Results from the ENIGMA major depressive disorder working group. Mol. Psychiatry 2021, 26, 5124–5139. [Google Scholar] [CrossRef]
- Seidlitz, J.; Vasa, F.; Shinn, M.; Romero-Garcia, R.; Whitaker, K.J.; Vertes, P.E.; Wagstyl, K.; Kirkpatrick Reardon, P.; Clasen, L.; Liu, S.; et al. Morphometric Similarity Networks Detect Microscale Cortical Organization and Predict Inter-Individual Cognitive Variation. Neuron 2018, 97, 231–247. [Google Scholar] [CrossRef]
- Gaser, C.; Franke, K.; Kloppel, S.; Koutsouleris, N.; Sauer, H.; Alzheimer’s Disease Neuroimaging, I. BrainAGE in Mild Cognitive Impaired Patients: Predicting the Conversion to Alzheimer’s Disease. PLoS ONE 2013, 8, e67346. [Google Scholar] [CrossRef] [Green Version]
- Liem, F.; Varoquaux, G.; Kynast, J.; Beyer, F.; Kharabian Masouleh, S.; Huntenburg, J.M.; Lampe, L.; Rahim, M.; Abraham, A.; Craddock, R.C.; et al. Predicting brain-age from multimodal imaging data captures cognitive impairment. Neuroimage 2017, 148, 179–188. [Google Scholar] [CrossRef] [PubMed]
- Van Essen, D.C.; Smith, S.M.; Barch, D.M.; Behrens, T.E.; Yacoub, E.; Ugurbil, K.; Consortium, W.-M.H. The WU-Minn human connectome project: An overview. Neuroimage 2013, 80, 62–79. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Shafto, M.A.; Tyler, L.K.; Dixon, M.; Taylor, J.R.; Rowe, J.B.; Cusack, R.; Calder, A.J.; Marslen-Wilson, W.D.; Duncan, J.; Dalgleish, T. The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study protocol: A cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC Neurol. 2014, 14, 1–25. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Glasser, M.F.; Sotiropoulos, S.N.; Wilson, J.A.; Coalson, T.S.; Fischl, B.; Andersson, J.L.; Xu, J.; Jbabdi, S.; Webster, M.; Polimeni, J.R.; et al. The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage 2013, 80, 105–124. [Google Scholar] [CrossRef] [Green Version]
- Dale, A.M.; Fischl, B.; Sereno, M.I. Cortical surface-based analysis. I. Segmentation and surface reconstruction. Neuroimage 1999, 9, 179–194. [Google Scholar] [CrossRef]
- Fischl, B.; Salat, D.H.; Busa, E.; Albert, M.; Dieterich, M.; Haselgrove, C.; van der Kouwe, A.; Killiany, R.; Kennedy, D.; Klaveness, S.; et al. Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain. Neuron 2002, 33, 341–355. [Google Scholar] [CrossRef] [Green Version]
- Desikan, R.S.; Segonne, F.; Fischl, B.; Quinn, B.T.; Dickerson, B.C.; Blacker, D.; Buckner, R.L.; Dale, A.M.; Maguire, R.P.; Hyman, B.T.; et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 2006, 31, 968–980. [Google Scholar] [CrossRef]
- Constantinides, C.; Han, L.K.; Alloza, C.; Antonucci, L.; Arango, C.; Ayesa-Arriola, R.; Banaj, N.; Bertolino, A.; Borgwardt, S.; Bruggemann, J. Brain ageing in schizophrenia: Evidence from 26 international cohorts via the ENIGMA Schizophrenia consortium. medRxiv 2022. [Google Scholar]
- Ali, M. PyCaret: An Open Source, Low-Code Machine Learning Library in Python. 2020. Available online: https://www.pycaret.org (accessed on 1 September 2021).
- Murphy, K.P. Machine Learning: A Probabilistic Perspective; The MIT Press: Cambridge, MA, USA, 2012. [Google Scholar]
- Tibshirani, R. Regression shrinkage and selection via the Lasso. J. Roy. Stat. Soc. B Met. 1996, 58, 267–288. [Google Scholar] [CrossRef]
- Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2005, 67, 301–320. [Google Scholar] [CrossRef]
- Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499. [Google Scholar] [CrossRef] [Green Version]
- Rubinstein, R.; Zibulevsky, M.; Elad, M. Efficient Implementation of the K-SVD Algorithm Using Batch Orthogonal Matching Pursuit; Computer Science Department, Technion: Haifa, Israel, 2008. [Google Scholar]
- Mackay, D.J.C. Bayesian Interpolation. Neural Comput 1992, 4, 415–447. [Google Scholar] [CrossRef]
- Crammer, K.; Dekel, O.; Keshet, J.; Shalev-Shwartz, S.; Singer, Y. Online passive aggressive algorithms. J. Mach. Learn. Res. 2006, 7, 551–585. [Google Scholar]
- Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
- Owen, A.B. A robust hybrid of lasso and ridge regression. Contemp. Math. 2007, 443, 59–72. [Google Scholar]
- Hastie, T.; Tibshirani, R.; Friedman, J.H.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference and Prediction; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2. [Google Scholar]
- Tipping, M.E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 2001, 1, 211–244. [Google Scholar]
- Dang, X.; Peng, H.; Wang, X.; Zhang, H. Theil-Sen Estimators in a Multiple Linear Regression Model. Olemiss Edu. 2008. Available online: http://home.olemiss.edu/~xdang/papers/MTSE.pdf (accessed on 1 September 2021).
- Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Adv. Neural Inf. Processing Syst. 1997, 9, 155–161. [Google Scholar]
- Rasmussen, C.; Williams, C. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006. [Google Scholar]
- Altman, N.S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 1992, 46, 175–185. [Google Scholar]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
- Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar] [CrossRef] [Green Version]
- Drucker, H. Improving regressors using boosting techniques. In Proceedings of the ICML, Nashville, TN, USA, 8–12 July 1997; pp. 107–115. [Google Scholar]
- Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef] [Green Version]
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. Lightgbm: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems 30; MIT Press: Cambridge, MA, USA, 2017. [Google Scholar]
- Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Advances in Neural Information Processing Systems 31; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Liang, H.; Zhang, F.; Niu, X. Investigating Systematic Bias in Brain Age Estimation with Application to Post-Traumatic Stress Disorders; Wiley Online Library: Hoboken, NJ, USA, 2019. [Google Scholar]
- Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30; MIT Press: Cambridge, MA, USA, 2017. [Google Scholar]
- Ball, G.; Kelly, C.E.; Beare, R.; Seal, M.L. Individual variation underlying brain age estimates in typical development. Neuroimage 2021, 235, 118036. [Google Scholar] [CrossRef]
- Bashyam, V.M.; Erus, G.; Doshi, J.; Habes, M.; Nasralah, I.; Truelove-Hill, M.; Srinivasan, D.; Mamourian, L.; Pomponio, R.; Fan, Y.; et al. MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain 2020, 143, 2312–2324. [Google Scholar] [CrossRef]
- Schaefer, A.; Kong, R.; Gordon, E.M.; Laumann, T.O.; Zuo, X.N.; Holmes, A.J.; Eickhoff, S.B.; Yeo, B.T.T. Local-Global Parcellation of the Human Cerebral Cortex from Intrinsic Functional Connectivity MRI. Cereb. Cortex 2018, 28, 3095–3114. [Google Scholar] [CrossRef]
- Peng, H.; Gong, W.; Beckmann, C.F.; Vedaldi, A.; Smith, S.M. Accurate brain age prediction with lightweight deep neural networks. Med. Image Anal. 2021, 68, 101871. [Google Scholar] [CrossRef]
- He, S.; Pereira, D.; David Perez, J.; Gollub, R.L.; Murphy, S.N.; Prabhu, S.; Pienaar, R.; Robertson, R.L.; Ellen Grant, P.; Ou, Y. Multi-channel attention-fusion neural network for brain age estimation: Accuracy, generality, and interpretation with 16,705 healthy MRIs across lifespan. Med. Image Anal. 2021, 72, 102091. [Google Scholar] [CrossRef]
Algorithm | Model Performance: r | Model Performance: MAE | Model Performance: Weighted MAE | Prediction Performance: r | Prediction Performance: MAE | Prediction Performance: Weighted MAE
---|---|---|---|---|---|---
Lasso | 0.4921 | 2.6444 | 0.1763 | 0.4258 | 2.7565 | 0.1838 |
Lasso LAR | 0.4921 | 2.6444 | 0.1763 | 0.4258 | 2.7565 | 0.1838 |
SVR | 0.4515 | 2.6981 | 0.1799 | 0.4268 | 2.7756 | 0.1850 |
LAR | 0.4723 | 2.6933 | 0.1796 | 0.4124 | 2.7896 | 0.1860 |
Elastic Net | 0.4714 | 2.6737 | 0.1782 | 0.4199 | 2.7919 | 0.1861 |
Bayesian Ridge | 0.4712 | 2.6745 | 0.1783 | 0.4182 | 2.7927 | 0.1862 |
Ridge | 0.4698 | 2.6797 | 0.1786 | 0.4255 | 2.7941 | 0.1863 |
ARD | 0.4973 | 2.6373 | 0.1758 | 0.3991 | 2.8251 | 0.1883 |
Random Forest | 0.4245 | 2.7785 | 0.1852 | 0.4131 | 2.8304 | 0.1887 |
PAR | 0.4563 | 2.7231 | 0.1815 | 0.4010 | 2.8322 | 0.1888 |
CatBoost | 0.4282 | 2.7631 | 0.1842 | 0.4069 | 2.8328 | 0.1889 |
RVR | 0.4498 | 2.7148 | 0.1810 | 0.4021 | 2.8371 | 0.1891 |
LightGBM | 0.4273 | 2.7457 | 0.1830 | 0.4016 | 2.8418 | 0.1895 |
GBM | 0.4458 | 2.7149 | 0.1810 | 0.4000 | 2.8437 | 0.1896 |
kNN | 0.3768 | 2.8367 | 0.1891 | 0.3801 | 2.8591 | 0.1906 |
AdaBoost | 0.3982 | 2.8003 | 0.1867 | 0.4188 | 2.8595 | 0.1906 |
Extra Trees | 0.4224 | 2.7738 | 0.1849 | 0.4197 | 2.8674 | 0.1912 |
XGBoost | 0.4201 | 2.7726 | 0.1848 | 0.3859 | 2.8771 | 0.1918 |
Kernel Ridge | 0.4417 | 2.7495 | 0.1833 | 0.3878 | 2.8775 | 0.1918 |
GPR | 0.4735 | 2.7199 | 0.1813 | 0.3689 | 2.9420 | 0.1961 |
MLP | 0.4744 | 2.7216 | 0.1814 | 0.3675 | 2.9450 | 0.1963 |
OMP | 0.4790 | 2.6927 | 0.1795 | 0.3590 | 2.9457 | 0.1964 |
LR | 0.4736 | 2.7244 | 0.1816 | 0.3679 | 2.9474 | 0.1965 |
Huber | 0.4705 | 2.7366 | 0.1824 | 0.3674 | 2.9484 | 0.1966 |
Theil–Sen | 0.4663 | 2.7544 | 0.1836 | 0.3398 | 2.9724 | 0.1982 |
RANSAC | 0.4553 | 2.8094 | 0.1873 | 0.3627 | 3.0015 | 0.2001 |
Decision Tree | 0.1694 | 3.0653 | 0.2044 | 0.1122 | 3.1206 | 0.2080 |
Algorithm | Model Performance: r | Model Performance: MAE | Model Performance: Weighted MAE | Prediction Performance: r | Prediction Performance: MAE | Prediction Performance: Weighted MAE
---|---|---|---|---|---|---
Lasso LAR | 0.8952 | 6.6767 | 0.0954 | 0.8589 | 7.0830 | 0.1012 |
ARD | 0.8992 | 6.5372 | 0.0934 | 0.8585 | 7.1040 | 0.1015 |
Lasso | 0.8943 | 6.6898 | 0.0956 | 0.8567 | 7.1757 | 0.1025 |
Elastic Net | 0.8960 | 6.6632 | 0.0952 | 0.8548 | 7.1816 | 0.1026 |
Huber | 0.8938 | 6.7060 | 0.0958 | 0.8455 | 7.4663 | 0.1067 |
Bayesian Ridge | 0.8927 | 6.7691 | 0.0967 | 0.8445 | 7.4698 | 0.1067 |
RVR | 0.8824 | 6.9355 | 0.0991 | 0.8378 | 7.5311 | 0.1076 |
PAR | 0.8877 | 6.9834 | 0.0998 | 0.8395 | 7.5762 | 0.1082 |
Ridge | 0.8906 | 6.8230 | 0.0975 | 0.8432 | 7.5865 | 0.1084 |
OMP | 0.8827 | 7.0357 | 0.1005 | 0.8437 | 7.6179 | 0.1088 |
GPR | 0.8839 | 7.0175 | 0.1003 | 0.8377 | 7.7190 | 0.1103 |
LR | 0.8826 | 7.0582 | 0.1008 | 0.8366 | 7.7432 | 0.1106 |
MLP | 0.8831 | 7.0570 | 0.1008 | 0.8364 | 7.7450 | 0.1106 |
SVR | 0.8887 | 6.8523 | 0.0979 | 0.8309 | 7.7551 | 0.1108 |
RANSAC | 0.8789 | 7.2202 | 0.1031 | 0.8282 | 7.8652 | 0.1124 |
Theil–Sen | 0.8791 | 7.1771 | 0.1025 | 0.8366 | 7.8698 | 0.1124 |
GBM | 0.8681 | 7.3435 | 0.1049 | 0.8368 | 7.9222 | 0.1132 |
CatBoost | 0.8667 | 7.3767 | 0.1054 | 0.8230 | 8.1285 | 0.1161 |
XGBoost | 0.8552 | 7.5686 | 0.1081 | 0.8167 | 8.3920 | 0.1199 |
LightGBM | 0.8646 | 7.1822 | 0.1026 | 0.8040 | 8.4686 | 0.1210 |
Kernel Ridge | 0.876 | 7.2091 | 0.1030 | 0.7022 | 8.6938 | 0.1242 |
Extra Trees | 0.8565 | 7.7800 | 0.1111 | 0.8050 | 8.8377 | 0.1263 |
Random Forest | 0.8410 | 8.0043 | 0.1143 | 0.7955 | 8.9883 | 0.1284 |
AdaBoost | 0.8405 | 8.0458 | 0.1149 | 0.7725 | 9.4055 | 0.1344 |
LAR | 0.8378 | 8.3740 | 0.1196 | 0.7577 | 9.5307 | 0.1362 |
kNN | 0.8234 | 8.7403 | 0.1249 | 0.7709 | 9.6734 | 0.1382 |
Decision Tree | 0.7259 | 9.7473 | 0.1392 | 0.6430 | 10.5017 | 0.1500 |
Algorithm | Model Performance: r | Model Performance: MAE | Model Performance: Weighted MAE | Prediction Performance: r | Prediction Performance: MAE | Prediction Performance: Weighted MAE
---|---|---|---|---|---|---
ARD | 0.8268 | 7.4790 | 0.1133 | 0.7998 | 8.0453 | 0.1219 |
Lasso LAR | 0.8290 | 7.4126 | 0.1123 | 0.7981 | 8.0473 | 0.1219 |
Lasso | 0.8290 | 7.4129 | 0.1123 | 0.7981 | 8.0477 | 0.1219 |
MLP | 0.7939 | 8.1039 | 0.1228 | 0.7779 | 8.0675 | 0.1222 |
PAR | 0.8171 | 7.8135 | 0.1184 | 0.7902 | 8.2368 | 0.1248 |
XGBoost | 0.8160 | 7.7096 | 0.1168 | 0.7918 | 8.2664 | 0.1252 |
Bayesian Ridge | 0.8308 | 7.4376 | 0.1127 | 0.7945 | 8.2785 | 0.1254 |
GBM | 0.8161 | 7.5873 | 0.1150 | 0.7818 | 8.3159 | 0.1260 |
Elastic Net | 0.8343 | 7.3865 | 0.1119 | 0.7947 | 8.3217 | 0.1261 |
SVR | 0.8303 | 7.5350 | 0.1142 | 0.7904 | 8.3845 | 0.1270 |
Ridge | 0.8329 | 7.4285 | 0.1126 | 0.7934 | 8.3912 | 0.1271 |
GPR | 0.7866 | 8.4452 | 0.1280 | 0.7719 | 8.3925 | 0.1272 |
LAR | 0.8132 | 7.7176 | 0.1169 | 0.7837 | 8.4347 | 0.1278 |
LR | 0.7832 | 8.5274 | 0.1292 | 0.7692 | 8.4450 | 0.1280 |
Huber | 0.7966 | 8.1947 | 0.1242 | 0.7704 | 8.5157 | 0.1290 |
CatBoost | 0.8299 | 7.6574 | 0.1160 | 0.7918 | 8.6085 | 0.1304 |
Theil–Sen | 0.7862 | 8.4097 | 0.1274 | 0.7534 | 8.6277 | 0.1307 |
RVR | 0.8322 | 7.4849 | 0.1134 | 0.7766 | 8.6291 | 0.1307 |
OMP | 0.8029 | 7.9480 | 0.1204 | 0.7603 | 8.8267 | 0.1337 |
LightGBM | 0.8196 | 7.6084 | 0.1153 | 0.7475 | 8.8588 | 0.1342 |
Extra Trees | 0.8257 | 7.7683 | 0.1177 | 0.7876 | 8.9449 | 0.1355 |
Random Forest | 0.8118 | 7.9223 | 0.1200 | 0.7679 | 8.9912 | 0.1362 |
Kernel Ridge | 0.8316 | 7.5138 | 0.1138 | 0.7230 | 9.0415 | 0.1370 |
RANSAC | 0.7772 | 8.655 | 0.1311 | 0.7384 | 9.1059 | 0.1380 |
AdaBoost | 0.8211 | 7.7603 | 0.1176 | 0.7402 | 9.2366 | 0.1399 |
kNN | 0.7769 | 8.3113 | 0.1259 | 0.7027 | 9.2521 | 0.1402 |
Decision Tree | 0.7066 | 9.3118 | 0.1411 | 0.6315 | 9.8640 | 0.1495 |
Algorithm | Training Time (s): HCP (n = 223) | Training Time (s): Cam-CAN (n = 101) | Training Time (s): IXI (n = 114) | Average (SD) Training Time (s)
---|---|---|---|---
Automatic Relevance Determination | 2.22 | 1.78 | 2.28 | 2.09 (0.27) |
Bayesian Ridge Regression | 0.79 | 0.77 | 0.75 | 0.77 (0.02) |
Elastic Net Regression | 1.09 | 0.34 | 0.15 | 0.53 (0.50) |
Huber Regression | 0.50 | 0.25 | 0.33 | 0.36 (0.13) |
Least Angle Regression | 0.18 | 0.07 | 0.13 | 0.13 (0.04) |
Lasso Regression | 0.55 | 0.34 | 0.22 | 0.37 (0.17) |
Lasso Least Angle Regression | 0.17 | 0.24 | 0.16 | 0.19 (0.04) |
Linear Regression | 0.61 | 0.58 | 0.55 | 0.58 (0.03) |
Orthogonal Matching Pursuit | 0.08 | 0.08 | 0.06 | 0.07 (0.01) |
Passive Aggressive Regression | 0.19 | 0.18 | 0.15 | 0.17 (0.02) |
Random Sample Consensus | 1.11 | 1.10 | 1.07 | 1.09 (0.02) |
Ridge Regression | 0.07 | 0.06 | 0.06 | 0.06 (0.01) |
Relevance Vector Regression | 4.87 | 3.56 | 2.25 | 3.56 (1.31) |
Support Vector Regression | 0.71 | 0.29 | 0.21 | 0.40 (0.27) |
Theil-Sen Regression | 58.41 | 58.84 | 57.67 | 58.31 (0.59) |
Adaptive Boosting Regression | 29.57 | 7.95 | 14.28 | 17.27 (11.12) |
Category Boosting Regression | 45.67 | 44.58 | 47.67 | 45.97 (1.28) |
Decision Tree Regression | 0.07 | 0.16 | 1.31 | 0.51 (0.69) |
Extra Trees Regression | 7.19 | 9.07 | 9.16 | 8.47 (1.11) |
Gradient Boosting Machine | 4.42 | 2.58 | 8.50 | 5.17 (3.03) |
Light Gradient Boosting Machine | 0.73 | 0.62 | 0.68 | 0.68 (0.06) |
Random Forest Regression | 5.49 | 7.43 | 9.18 | 7.37 (1.85) |
Extreme Gradient Boosting | 3.59 | 1.09 | 4.70 | 3.12 (1.51) |
Gaussian Process Regression | 0.82 | 0.31 | 0.26 | 0.46 (0.31) |
K-Nearest Neighbors Regression | 0.49 | 0.31 | 0.32 | 0.37 (0.10) |
Kernel Ridge Regression | 0.23 | 0.11 | 0.10 | 0.15 (0.07) |
Multi-layer Perceptron Regression | 3.91 | 4.94 | 6.08 | 4.98 (1.09) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Han, J.; Kim, S.Y.; Lee, J.; Lee, W.H. Brain Age Prediction: A Comparison between Machine Learning Models Using Brain Morphometric Data. Sensors 2022, 22, 8077. https://doi.org/10.3390/s22208077