Review on Machine Learning Techniques for Developing Pavement Performance Prediction Models
Abstract
1. Introduction
2. Pavement Performance Prediction Models (PPPMs)
- Type of formulation (deterministic models, probabilistic models);
- Conceptual format (mechanistic, empirical, empirical–mechanistic);
- Application level (network level, project level); and
- Type of variables (dependent and independent).
- Static models (or absolute models); and
- Dynamic models (or relative models).
In static (or absolute) models, the pavement condition is expressed as a function of the explanatory variables at the same age, Ct = f(Xt), where:
- Ct = pavement condition at age t; and
- Xt = explanatory variables (e.g., structural characteristics, climatic conditions, traffic) at age t.
In dynamic (or relative) models, the pavement condition also depends on past condition observations, Ct = f(Ct−1, …, Ct−n, Xt), where:
- Ct = pavement condition at age t;
- Xt = value of the explanatory variables at age t; and
- n = number of past observations considered.
3. Machine Learning Modeling Techniques for Developing PPPMs
- Supervised learning—can be used for project-level or network-level pavement management;
- Unsupervised learning—can be used for exploratory and clustering analysis; and
- Reinforcement learning—can be used to help decision-makers for both project- and network-level pavement management.
3.1. Supervised Learning
- Linear models;
- Nonlinear models;
- Decision trees (boosted and bagged);
- Neural networks; and
- Adaptive neuro-fuzzy learning.
- Logistic regression;
- Support vector machine;
- Decision trees (boosted and bagged);
- K-nearest neighbor;
- Neural networks;
- Naïve Bayes; and
- Discriminant analysis.
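To make the taxonomy concrete, the following sketch implements one of the listed classifiers, k-nearest neighbor, on synthetic pavement data (the features, class centers, and labels are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical illustration: classify pavement sections as "needs M&R" (1) or
# "acceptable" (0) from two descriptive features (e.g., IRI and rut depth),
# using a minimal k-nearest-neighbor classifier.
rng = np.random.default_rng(0)

# Synthetic training data: class 0 centered at (1.5, 4), class 1 at (3.5, 10).
X0 = rng.normal([1.5, 4.0], 0.5, size=(20, 2))
X1 = rng.normal([3.5, 10.0], 0.5, size=(20, 2))
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 20 + [1] * 20)

def knn_predict(x, X, y, k=5):
    """Predict the majority class among the k nearest training points."""
    d = np.linalg.norm(X - x, axis=1)   # Euclidean distances to all training points
    nearest = y[np.argsort(d)[:k]]      # labels of the k closest points
    return np.bincount(nearest).argmax()

print(knn_predict(np.array([3.4, 9.8]), X_train, y_train))  # a point near class 1
```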
3.2. Unsupervised Learning
- Gaussian mixture models;
- K-means and k-medoids;
- Hierarchical clustering;
- Hidden Markov models; and
- Fuzzy c-means.
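As an illustration of unsupervised learning, the sketch below implements plain k-means (Lloyd's algorithm) on synthetic two-feature pavement data; the feature values and cluster structure are invented for illustration:

```python
import numpy as np

# Illustrative sketch (not from the paper): grouping pavement sections into
# condition clusters with k-means, one of the unsupervised techniques above.
rng = np.random.default_rng(1)

# Synthetic data: two groups of sections described by (IRI, rut depth).
data = np.vstack([
    rng.normal([1.0, 3.0], 0.3, size=(25, 2)),   # "good" sections
    rng.normal([4.0, 12.0], 0.3, size=(25, 2)),  # "poor" sections
])

def kmeans(X, k, n_iter=20):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = np.argmin(np.linalg.norm(X[:, None] - centers, axis=2), axis=1)
        # Move each center to the mean of its assigned points (keep it if empty).
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers, labels

centers, labels = kmeans(data, k=2)
```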
3.3. Reinforcement Learning
- Model-based reinforcement learning; and
- Model-free reinforcement learning.
- Information-based learning;
- Similarity-based learning;
- Probability-based learning; and
- Error-based learning.
4. Data Pre-Analysis, Visualization, and Preparation
4.1. Data Pre-Analysis
- Numerical data—represents data/information that is measurable, which can be divided into two subcategories:
  - Discrete—integer-based data (e.g., M&R actions, number of pavement sections); and
  - Continuous—decimal-based data (e.g., pavement structural capacity, traffic, pavement condition);
- Categorical data—qualitative data that are used to classify data by categories (e.g., crack initiation = true or false); and
- Ordinal data—represent discrete and ordered data/information (e.g., rank position = 1st, 2nd, 3rd; rutting level = low, medium, high).
- To fully understand the characteristics of each variable in data (types of values the variable can take, the ranges into which the values fall, and how the values are distributed across that range); and
- To discover any data quality issues (which may arise due to invalid data or perfectly valid data that may cause difficulty to some machine learning techniques).
- Missing values—if features have missing values, it is necessary to understand why they are missing. For example, road agencies usually do not perform pavement inspections every year, but rather every two, three, or four years;
- Irregular cardinality problems—continuous features will usually have a cardinality value close to the number of instances in the data set. If the cardinality of a continuous feature is significantly less than the number of instances in the data set, it should be investigated; and
- Outliers—values that lie far away from the central tendency and can represent valid or invalid data. Valid outliers are correct values that are simply very different from the rest of the values for a feature and should not be removed from the analysis. In contrast, invalid outliers are often the result of noise in the data (sample errors) and must be removed.
- Standard measures of central tendency (mean, mode, and median);
- Standard measures of variation (standard deviation and percentiles); and
- Standard data visualization plots (bar plots, histograms, and box plots).
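These measures can be computed directly; the sketch below applies them to a small hypothetical IRI sample and flags a candidate outlier with the usual 1.5 × IQR fence (the values are invented for illustration):

```python
import statistics
import numpy as np

# A minimal pre-analysis sketch for one feature (hypothetical IRI values, m/km):
# the standard measures of central tendency and variation mentioned above.
iri = [1.2, 1.4, 1.4, 1.9, 2.3, 2.3, 2.3, 3.1, 3.4, 8.9]  # 8.9 looks suspicious

mean = statistics.mean(iri)        # 2.82
median = statistics.median(iri)    # 2.3
mode = statistics.mode(iri)        # 2.3 (most frequent value)
std = statistics.stdev(iri)        # sample standard deviation
p25, p75 = np.percentile(iri, [25, 75])

# A simple outlier flag: values beyond 1.5 interquartile ranges of the quartiles.
iqr = p75 - p25
outliers = [x for x in iri if x < p25 - 1.5 * iqr or x > p75 + 1.5 * iqr]
print(outliers)  # → [8.9]  (flagged for investigation, not automatic removal)
```

Whether a flagged value is a valid or invalid outlier still requires the domain judgment described above.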
4.2. Data Visualization
4.3. Data Preparation
- Normalization (range normalization, standard scores)—aims to prepare descriptive features to fall in particular ranges;
- Binning (equal width, equal frequency)—involves converting continuous features into categorical features; and
- Sampling (top, random, stratified)—consists of taking a representative data sample from the original (larger) data set.
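A minimal sketch of the three preparation steps, applied to a hypothetical rut-depth feature (the variable names and proportions are assumptions, not from the paper):

```python
import numpy as np

# Sketches of normalization, binning, and sampling on a synthetic feature.
rng = np.random.default_rng(2)
rut = rng.uniform(2.0, 18.0, size=100)   # hypothetical rut depth in mm

# Range normalization: rescale into [0, 1].
rut_norm = (rut - rut.min()) / (rut.max() - rut.min())

# Equal-width binning: convert the continuous feature into 3 categories.
edges = np.linspace(rut.min(), rut.max(), 4)   # 3 equal-width bins
labels = np.digitize(rut, edges[1:-1])         # 0 = low, 1 = medium, 2 = high

# Stratified sampling: draw ~30% while preserving the bin proportions.
sample_idx = np.concatenate([
    rng.choice(np.where(labels == b)[0],
               int(round(0.3 * (labels == b).sum())), replace=False)
    for b in range(3)
])
```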
5. Information-Based Models
6. Similarity-Based Models
7. Error-Based Models
7.1. Linear Regression Models
- y(x, w) = value of the predicted target/output variable;
- xi = values of the explanatory/input variables;
- w0 = intercept, which represents the value of the target variable when xi is 0;
- wi = regression coefficients (represent the extent to which the input variables are associated with the target variable); and
- ε = disturbance term (represents the random error associated with the regression).
- Starting with a set of random weight values;
- Iteratively making small adjustments to these weights based on the output of the error function. Suppose, for example, that the errors show that the model's predictions are higher than the observed values; in that case, the weight of an explanatory variable that positively impacts the target variable should be decreased; and
- According to the gradient of the error surface, the algorithm moves downwards on the error surface at each step (using differentiation and partial derivatives) until it converges.
- They are convex (the error surfaces are shaped like a bowl); and
- They have a global minimum (meaning a unique set of optimal weights with the lowest sum of squared errors on an error surface).
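The gradient-descent procedure above can be sketched for a one-feature linear model (synthetic data; the true weights are chosen arbitrarily for illustration):

```python
import numpy as np

# Batch gradient descent on the squared-error surface of a one-feature
# linear model y = w0 + w1 * x (hypothetical data, not from the paper).
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 0.1, size=50)   # true weights: w0 = 2.0, w1 = 0.5

w0, w1 = 0.0, 0.0   # start from (arbitrary) initial weight values
lr = 0.01           # learning rate: size of each small adjustment
for _ in range(5000):
    err = (w0 + w1 * x) - y   # prediction errors
    # Gradient of the mean squared error with respect to each weight:
    w0 -= lr * err.mean()
    w1 -= lr * (err * x).mean()

print(w0, w1)   # should approach the true weights (2.0, 0.5)
```

Because the error surface is convex with a single global minimum, the starting point only affects how many steps convergence takes, not where it ends.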
- It is suitable for modeling a wide variety of relationships between variables;
- In many practical applications, the assumptions of linear regression are often suitably satisfied;
- Its outputs are relatively easy to interpret and communicate; and
- The estimation of regression models is relatively easy. The routines for its computation are available in a vast number of software packages.
- The continuous behavior of the target variable;
- The linearity relationship between the target and explanatory variables;
- The behavior of the disturbance terms (no autocorrelation, no correlation with the regressors, and a normal distribution).
7.2. Logistic Regression Models
- Yi = value of the predicted target/output variable;
- Xk = set of the explanatory/input variables;
- β0 = model constant; and
- βK = set of unknown parameters.
- Ordinal (if the order of the target variable is important)—the final model predicts the same regression coefficients but different intercepts for each class; and
- Nominal (if the order of the target variable is not essential)—the final model predicts different regression coefficients and intercepts for each class.
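A minimal binary logistic regression fitted by gradient descent is sketched below; the crack-initiation scenario, data, and coefficients are hypothetical:

```python
import numpy as np

# Binary logistic regression sketch: probability that cracking has
# initiated, given pavement age (synthetic data).
rng = np.random.default_rng(4)
age = rng.uniform(0, 20, size=200)
# Synthetic ground truth: cracking becomes likely with age.
p_true = 1 / (1 + np.exp(-(0.4 * age - 4.0)))
cracked = (rng.uniform(size=200) < p_true).astype(float)

b0, b1 = 0.0, 0.0
lr = 0.05
for _ in range(20000):
    p = 1 / (1 + np.exp(-(b0 + b1 * age)))   # predicted probability
    # Gradient of the (mean) negative log-likelihood:
    b0 -= lr * (p - cracked).mean()
    b1 -= lr * ((p - cracked) * age).mean()

# Predicted probability of crack initiation at 15 years:
p15 = 1 / (1 + np.exp(-(b0 + b1 * 15)))
```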
7.3. Nonlinear Regression Models
- The set of basis functions must be specified manually; and
- The number of weights in a model using basis functions is usually far greater than the number of descriptive features. Therefore, finding the optimal set of weights involves searching through a much broader set of possibilities (i.e., a much larger weight space).
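The basis-function idea can be sketched with a polynomial expansion: the model remains linear in the weights even though its relationship with the raw input is nonlinear (synthetic data with invented coefficients):

```python
import numpy as np

# Nonlinear regression via basis functions: the inputs are expanded with
# manually chosen basis functions (here, powers of pavement age), and the
# weights are then fitted by ordinary least squares. Hypothetical data.
rng = np.random.default_rng(5)
age = np.linspace(0, 15, 40)
pci = 100 - 0.2 * age**2 + rng.normal(0, 1.0, size=40)   # synthetic condition index

# Design matrix with basis functions [1, x, x^2]:
Phi = np.vander(age, 3, increasing=True)
w, *_ = np.linalg.lstsq(Phi, pci, rcond=None)
print(np.round(w, 1))   # weights for 1, age, age^2 — close to (100, 0, -0.2)
```

Note that the expansion tripled the number of weights for a single descriptive feature, which is exactly the growth in the weight space mentioned above.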
7.4. Time-Series Models
- Linear/non-linear; and
- Univariate/multivariate.
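As a univariate linear example, the sketch below fits a first-order autoregressive model Ct = a + b·Ct−1 to a synthetic condition series (the coefficients are invented for illustration):

```python
import numpy as np

# AR(1) time-series sketch on a synthetic pavement-condition series.
rng = np.random.default_rng(6)
c = [90.0]
for _ in range(60):
    c.append(2.0 + 0.95 * c[-1] + rng.normal(0, 0.5))   # true a = 2.0, b = 0.95
c = np.array(c)

# Regress C_t on C_{t-1} (ordinary least squares):
X = np.column_stack([np.ones(len(c) - 1), c[:-1]])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, c[1:], rcond=None)
```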
7.5. Panel/Longitudinal Data Models
- i refers to the cross-sectional units (e.g., pavement sections);
- t refers to the time periods;
- α is a scalar (the common intercept);
- β is a K × 1 vector of coefficients;
- Xit is the itth observation on the K explanatory variables; and
- uit is the error component.
- Controlling for individual heterogeneity: Panel data suggest that entities are heterogeneous, whereas studies of time-series and cross-section do not control this heterogeneity, which may lead to biased results;
- More informative data: Panel data give more variability, less collinearity among the variables, more degrees of freedom, and more efficiency;
- The ability to study the dynamics of adjustment: Panel data are better for this. Cross-sectional distributions that look relatively stable hide a multitude of changes; and
- Identify and measure the effect that is simply not detectable in pure cross-sectional or pure time-series data, allowing more complex behavioral models to be constructed and tested than with pure cross-sectional or time-series data.
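A minimal fixed-effects sketch: the "within" transformation removes each section-specific intercept (the individual heterogeneity mentioned above) before estimating the common slope. The data and slope are synthetic:

```python
import numpy as np

# Fixed-effects panel sketch: each pavement section i has its own intercept;
# demeaning per section removes it, leaving the common slope identifiable.
rng = np.random.default_rng(7)
n_sections, n_years = 30, 8
alpha_i = rng.normal(0, 5, size=n_sections)              # section-specific effects
traffic = rng.uniform(1, 10, size=(n_sections, n_years)) # hypothetical regressor
beta = -1.5                                              # true common slope
cond = (alpha_i[:, None] + beta * traffic
        + rng.normal(0, 0.3, size=(n_sections, n_years)))

# Within transformation: subtract each section's time mean.
x_w = traffic - traffic.mean(axis=1, keepdims=True)
y_w = cond - cond.mean(axis=1, keepdims=True)
beta_hat = (x_w * y_w).sum() / (x_w**2).sum()   # pooled OLS on demeaned data
```

Ignoring the heterogeneity (pooling the raw data) would bias the slope whenever the section effects correlate with the regressor, which is the first advantage listed above.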
7.6. Support Vector Machines
- The margin should be as wide as possible; and
- The support vectors (data points from each class that lie closest to the classification boundary) are the most useful data points because they are most likely to be incorrectly classified.
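The maximum-margin idea can be sketched with a linear SVM trained by subgradient descent on the regularized hinge loss (synthetic, linearly separable data; the regularization and learning-rate values are arbitrary):

```python
import numpy as np

# Linear SVM sketch: minimize lam * ||w||^2 + mean hinge loss.
# Points with margin < 1 are the ones that drive the updates.
rng = np.random.default_rng(8)
X = np.vstack([rng.normal([-2, -2], 0.5, size=(30, 2)),
               rng.normal([2, 2], 0.5, size=(30, 2))])
y = np.array([-1] * 30 + [1] * 30)   # class labels in {-1, +1}

w = np.zeros(2)
b = 0.0
lam, lr = 0.01, 0.1
for _ in range(2000):
    margins = y * (X @ w + b)
    viol = margins < 1   # points inside the margin (support-vector side)
    # Subgradient step on the regularized hinge loss:
    w -= lr * (2 * lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / len(X))
    b -= lr * (-(y[viol]).sum() / len(X))

pred = np.sign(X @ w + b)
```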
7.7. Artificial Neural Networks
8. Probability-Based Models
8.1. Naïve Bayes Model
8.2. Bayesian Networks
8.3. Markov Models
8.3.1. The Homogeneous Markov Process
- S is the number of road segments;
- xs,j,k,t is the proportion of pavement of segment s in state j at the beginning of period t to which action k is applied; and
- Pi,j,k is the transition probability from state i to state j when action k is applied to the pavement.
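Once the transition probabilities are known, propagating the network's condition distribution through time is a matrix-vector product. The transition matrix below is hypothetical (three states under a "do nothing" action), not taken from the paper:

```python
import numpy as np

# Homogeneous Markov chain: states 0 (good), 1 (fair), 2 (poor).
P = np.array([
    [0.80, 0.15, 0.05],   # good -> good/fair/poor
    [0.00, 0.70, 0.30],   # fair -> fair/poor (no self-repair)
    [0.00, 0.00, 1.00],   # poor is absorbing without maintenance
])

x = np.array([1.0, 0.0, 0.0])   # the whole network starts in good condition
for t in range(5):
    x = x @ P                   # condition distribution after each year

print(np.round(x, 3))   # → [0.328 0.239 0.433]
```

Because the chain is homogeneous, the same matrix P applies every year; the nonhomogeneous and semi-Markov variants in the next subsections relax exactly that assumption.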
8.3.2. The Nonhomogeneous Markov Process
8.3.3. The Semi-Markov Process
9. Conclusions, Discussion, and Guidelines to Support the Development of PPPMs
- Hold-out test set—divides data into a training set and a testing set;
- Hold-out sampling—divides data into a training set, a validation set, and a test set;
- k-Fold cross validation—data are divided into k equal-size folds; each fold is used once as the test set, with the remaining k − 1 folds forming the training set, and the process is repeated for all k folds;
- Leave-one-out cross validation—k-fold cross-validation in which the number of folds is the same as the number of training instances;
- Bootstrapping—preferred over cross-validation for small data sets;
- Out-of-time sampling—a hold-out sampling that is targeted rather than random.
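A plain k-fold cross-validation loop for a one-feature linear model is sketched below (synthetic data; each fold serves once as the test set):

```python
import numpy as np

# k-fold cross-validation sketch for a simple linear model (hypothetical data).
rng = np.random.default_rng(9)
x = rng.uniform(0, 10, size=60)
y = 3.0 + 1.2 * x + rng.normal(0, 0.5, size=60)

k = 5
idx = rng.permutation(60)
folds = np.array_split(idx, k)   # k equal-size folds

errors = []
for i in range(k):
    test = folds[i]
    train = np.concatenate([folds[j] for j in range(k) if j != i])
    # Fit on the training folds, evaluate on the held-out fold:
    w1, w0 = np.polyfit(x[train], y[train], 1)
    errors.append(np.mean((w0 + w1 * x[test] - y[test]) ** 2))

cv_mse = float(np.mean(errors))   # averaged out-of-sample error estimate
```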
- True positive (TP);
- True negative (TN);
- False positive (FP);
- False negative (FN).
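From these four counts, the standard evaluation metrics follow directly (the counts below are invented for illustration):

```python
# Deriving classification metrics from the four outcome counts
# (hypothetical counts for a "crack initiation" classifier).
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)    # overall fraction correct
precision = tp / (tp + fp)                    # of predicted positives, how many real
recall = tp / (tp + fn)                       # of real positives, how many found
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 2), round(recall, 2))   # → 0.85 0.8
```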
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
PPPMs | Pavement performance prediction models
PMSs | Pavement management systems
ML | Machine learning
SL | Supervised learning
UL | Unsupervised learning
RL | Reinforcement learning
Pavement performance prediction models vs. management level:

Type of Formulation | Model | National Network | Municipal Network | Project
---|---|---|---|---
Deterministic | Absolute | +++ | |
| Relative—structural | ++ | ++ | +
| Relative—functional | ++ | ++ |
Probabilistic | Bayesian methodology | ++ | ++ |
| Homogeneous Markov | +++ | ++ |
| Nonhomogeneous Markov | + | + |
| Semi-Markov process | + | + |
| Hidden Markov | + | + | +
Hybrid | Fuzzy logic | + | + | +
| Artificial neural networks | + | + | +
| Neuro-fuzzy | + | + | +
Regression Algorithm | How it Works | Best Used |
---|---|---|
Linear Regression | It is a statistical modeling technique used to describe a continuous response variable as a linear function of one or more predictor variables. Because linear regression models are simple to interpret and easy to train, they are often the first model to be fitted to a new data set. | When an algorithm that is easy to interpret and fast to fit is needed. As a baseline for evaluating other, more complex, regression models. |
Nonlinear Regression | It is a statistical modeling technique that helps describe nonlinear relationships in experimental data. Nonlinear regression models are generally assumed to be parametric, where the model is described as a nonlinear equation. “Nonlinear” refers to a fitness function that is a nonlinear function of the parameters. | When data has strong nonlinear trends and cannot be easily transformed into a linear space. For fitting custom models to data. |
Gaussian Process Regression Model | GPR models are nonparametric models that are used for predicting the value of a continuous response variable. They are widely used in the field of spatial analysis for interpolation in the presence of uncertainty. GPR is also referred to as Kriging. | For interpolating spatial data. As a surrogate model to facilitate optimization of complex designs such as automotive engines. |
SVM Regression | Similar to SVM classification algorithms but are modified to be able to predict a continuous response. Instead of finding a hyperplane that separates data, SVM regression algorithms find a model that deviates from the measured data by a predefined value no greater than a small amount, with parameter values that are as small as possible (to minimize sensitivity to error). | For high-dimensional data (where there will be a large number of predictor variables). |
Generalized Linear Models | It is a special case of nonlinear models that uses linear methods. It involves fitting a linear combination of the inputs to a nonlinear function (the link function) of the outputs. | When the response variables have nonnormal distributions, such as a response variable that is always expected to be positive. |
Regression Trees | Similar to decision trees for classification, but they are modified to be able to predict continuous responses. | When predictors are categorical (discrete) or behave nonlinearly. |
Classification Algorithm | How it Works | Best Used |
---|---|---|
Logistic Regression | It fits a model that can predict the probability of a binary response belonging to one class or the other. Because of its simplicity, logistic regression is commonly used as a starting point for binary classification problems. | When data can be separated by a single, linear boundary. As a baseline for evaluating more complex classification methods. |
K-Nearest Neighbor (kNN) | Categorizes objects based on the classes of their nearest neighbors in the data set. kNN predictions assume that objects near each other are similar. Distance metrics, such as Euclidean, city block, cosine, and Chebyshev, are used to find the nearest neighbor. | When a simple algorithm to establish benchmark learning rules is required. When memory usage and prediction speed of the trained model are lesser concerns.
Support Vector Machine (SVM) | Classifies data by finding the linear decision boundary (hyperplane) that separates all data points of one class from those of the other class. The best hyperplane is the one with the largest margin between the two classes when the data is linearly separable. If the data is not linearly separable, a loss function is used to penalize points on the hyperplane’s wrong side. SVMs sometimes use a kernel transform to transform nonlinearly separable data into higher dimensions where a linear decision boundary can be found. | For data with exactly two classes (can also be used for multiclass classification with a technique called error correcting output codes). For high dimensional, nonlinearly separable data. When a classifier that is simple, easy to interpret, and accurate is required. |
Neural Networks | Inspired by the human brain, a neural network consists of highly connected networks of neurons that relate the inputs to the desired outputs. The network is trained by iteratively modifying the connections’ strengths to map the given inputs to the correct response. | For modeling highly nonlinear systems. When data is available incrementally, and the goal is to update the model regularly. When model interpretability is not a key concern. |
Naïve Bayes | Assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. It classifies new data based on the highest probability of its belonging to a particular class. | For a small data set containing many parameters. When a classifier that is easy to interpret is needed. When the model encounters scenarios that were not in the training data, as is the case with many financial and medical applications. |
Discriminant Analysis | Classifies data by finding linear combinations of features. Discriminant analysis assumes that different classes generate data based on Gaussian distributions. Training a discriminant analysis model involves finding the parameters for a Gaussian distribution for each class. The distribution parameters are used to calculate boundaries, which can be linear or quadratic functions. These boundaries are used to determine the class of new data. | When a simple model that is easy to interpret is needed. When memory usage during training is a concern. When a model that is fast to predict is required. |
Decision Tree | Predicts responses to data by following the tree’s decisions from the root (beginning) down to a leaf node. A tree consists of branching conditions where the value of a predictor is compared to a trained weight. The number of branches and the values of weights are determined in the training process. Additional modification, or pruning, may be used to simplify the model. | When an algorithm that is easy to interpret and fast to fit is a requirement. To minimize memory usage. When high predictive accuracy is not a requirement. |
Ensemble Methods (Bagged and Boosted Decision Trees) | Several “weaker” decision trees are combined into a “stronger” ensemble. A bagged decision tree consists of trees trained independently on data that is bootstrapped from the input data. Boosting involves creating a strong learner by iteratively adding “weak” learners and adjusting each weak learner’s weight to focus on misclassified examples. | When predictors are categorical (discrete) or behave nonlinearly. When the time needed to train a model is less of a concern. |
Clustering Algorithm | How it Works | Best Used
---|---|---|
K-Means | Partitions data into k number of mutually exclusive clusters. How well a point fits into a cluster is determined by the distance from that point to the cluster’s center. RESULT = cluster centers. | When the number of clusters is known. For fast clustering of large data sets. |
K-Medoids | Similar to k-means, but with the requirement that the cluster centers coincide with points in the data. RESULT = cluster centers that coincide with data points. | When the number of clusters is known. For fast clustering of categorical data. To scale to large data sets. |
Hierarchical Clustering | Produces nested clusters by analyzing similarities between pairs of points and grouping objects into a binary, hierarchical tree. RESULT = dendrogram showing the hierarchical relationship between clusters. | When the number of clusters is not known in advance. When a visualization (the dendrogram) is desired to guide the selection of the number of clusters.
Self-Organizing Map | Neural network-based clustering that transforms a data set into a topology-preserving 2D map. RESULT = lower-dimensional (typically 2D) representation. | To visualize high-dimensional data in 2D or 3D. To deduce the dimensionality of data by preserving its topology (shape). |
Clustering Algorithm | How it Works | Best Used
---|---|---|
Fuzzy c-Means | Partition-based clustering when data points may belong to more than one cluster. RESULT = cluster centers (similar to k-means) but with fuzziness so that points may belong to more than one cluster. | When the number of clusters is known. For pattern recognition. When clusters overlap. |
Gaussian Mixture Model | Partition-based clustering where data points come from different multivariate normal distributions with specific probabilities. RESULT = a model of Gaussian distributions that give probabilities of a point being in a cluster. | When a data point might belong to more than one cluster. When clusters have different sizes and correlation structures within them. |
Type of Algorithm | Prediction Speed | Training Speed | Memory Usage | Required Tuning | General Assessment |
---|---|---|---|---|---|
Logistic Regression + Linear SVM | Fast | Fast | Small | Minimal | Suitable for small problems with linear decision boundaries |
Decision Trees | Fast | Fast | Small | Some | Good generalist but prone to overfitting |
Nonlinear SVM and Logistic Regression | Slow | Slow | Medium | Some | Suitable for many binary problems and handles high-dimensional data well |
Nearest Neighbor | Moderate | Minimal | Medium | Minimal | Lower accuracy, but easy to use and interpret |
Naïve Bayes | Fast | Fast | Medium | Some | Widely used for text, including spam filtering |
Ensembles | Moderate | Slow | Varies | Some | High accuracy and good performance for small to medium-sized data sets |
Neural Network | Moderate | Slow | Medium to Large | Lots | Popular for classification, compression, recognition, and forecasting |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Justo-Silva, R.; Ferreira, A.; Flintsch, G. Review on Machine Learning Techniques for Developing Pavement Performance Prediction Models. Sustainability 2021, 13, 5248. https://doi.org/10.3390/su13095248