Optimizing Faulting Prediction for Rigid Pavements Using a Hybrid SHAP-TPE-CatBoost Model
Abstract
1. Introduction
2. Materials and Methods
2.1. Boruta Method
- (1) Creation of shadow features: randomly shuffle the real features R to create shadow features S, then combine the real and shadow features into a new training feature matrix N = [R, S].
- (2) Train a tree-based model (such as a random forest (RF) or LightGBM) on the new feature matrix to obtain the variable importance measure (VIM) of every real and shadow feature. The Z-score of each feature is calculated as Z-score = mean(VIM)/SD(VIM), where the mean and standard deviation are taken over the trees of the ensemble.
- (3) Compare the Z-score of each real feature with the maximum Z-score among the shadow features (S_max). Real features with a Z-score greater than S_max are labeled “important”, and those with a Z-score less than or equal to S_max are labeled “unimportant”.
- (4) Discard the “unimportant” features and all shadow features.
- (5) Repeat steps (1)–(4) until every feature has been labeled “important” or “unimportant”.
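The five steps can be sketched as a Boruta-style selection loop in pure Python. This is a minimal illustration under two stated assumptions: the absolute Pearson correlation between a feature and the target, recomputed on bootstrap resamples, stands in for the tree-based VIM (the study itself uses tree models such as RF or LightGBM), and step (5) is read as "re-test the surviving features against fresh shadows until a round discards nothing". The original Boruta algorithm decides through a statistical test over repeated runs rather than a single comparison, so this sketch is not a drop-in replacement for it; all function names are hypothetical.

```python
import random
import statistics

def abs_corr(x, y):
    """|Pearson r|: a simple stand-in (assumption) for a tree model's VIM."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def z_scores(columns, y, n_boot, rng):
    """Z = mean(VIM) / SD(VIM), with the VIM repeated over bootstrap resamples."""
    n = len(y)
    vims = [[] for _ in columns]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        for j, col in enumerate(columns):
            vims[j].append(abs_corr([col[i] for i in idx], yb))
    return [statistics.fmean(v) / (statistics.stdev(v) or 1e-12) for v in vims]

def boruta_select(features, y, n_boot=30, max_rounds=20, seed=0):
    """features: name -> list of values. Returns name -> "important"/"unimportant"."""
    rng = random.Random(seed)
    remaining = dict(features)
    labels = {}
    for _ in range(max_rounds):
        names = list(remaining)
        if not names:
            break
        real = [remaining[n] for n in names]
        # Step (1): shadow features are shuffled copies; N = [R, S]
        shadows = [rng.sample(col, len(col)) for col in real]
        # Step (2): Z-scores for every real and shadow column
        z = z_scores(real + shadows, y, n_boot, rng)
        s_max = max(z[len(names):])
        # Steps (3)-(4): real features that do not beat S_max are labeled and dropped
        dropped = [n for n, zr in zip(names, z) if zr <= s_max]
        for n in dropped:
            labels[n] = "unimportant"
            del remaining[n]
        # Step (5): stop once a round discards nothing
        if not dropped:
            break
    for n in remaining:
        labels[n] = "important"
    return labels

# Usage on synthetic data: y depends on x1 only, x2 is pure noise
data_rng = random.Random(1)
x1 = [data_rng.uniform(0, 10) for _ in range(200)]
x2 = [data_rng.uniform(0, 10) for _ in range(200)]
y = [3 * a + data_rng.gauss(0, 0.5) for a in x1]
labels = boruta_select({"x1": x1, "x2": x2}, y)
```

Replacing `abs_corr` with per-tree importances from a fitted ensemble (for example the `feature_importances_` of each tree in scikit-learn's `RandomForestRegressor.estimators_`) would bring the sketch closer to the tree-based procedure described above.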
2.2. Tree-Structured Parzen Estimator Method for Hyperparameter Optimization
2.3. CatBoost
2.4. SHAP Method for Results Interpretation
2.5. Model Evaluation Criteria
3. Data Preparation
3.1. Data Collection
- (1) Faulting values in the database are recorded as positive or negative depending on the condition of the pavement section. To avoid the influence of negative values, this study used only the positive faulting measurements, which are the most common.
- (2) The LTPP database includes two types of jointed concrete pavement: Jointed Plain Concrete Pavement (JPCP) and Jointed Reinforced Concrete Pavement (JRCP). This study focuses on JPCP, the more common of the two.
- (3) Maintenance and rehabilitation (M&R) activities are carried out when pavement deterioration degrades ride quality, and appropriate maintenance usually restores the pavement condition. As a result, faulting measured after a repair is often much lower than faulting measured before it. To avoid distorting the prediction results, this study therefore excludes faulting data measured after repair.
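The three screening rules can be expressed as a simple record filter. This is a minimal sketch under loud assumptions: the `FaultingRecord` fields and the `first_mr_date` lookup are hypothetical and do not correspond to actual LTPP table or column names; rule (1) is implemented as "drop negative readings" (zero readings are kept); and any survey on or after a section's first M&R date is treated as post-repair.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class FaultingRecord:
    # Hypothetical fields, not LTPP column names
    section_id: str
    pavement_type: str   # e.g. "JPCP" or "JRCP"
    survey_date: date
    faulting_mm: float

def clean_records(records, first_mr_date):
    """first_mr_date: section_id -> date of the first M&R (absent if never repaired)."""
    kept = []
    for r in records:
        if r.faulting_mm < 0:             # rule (1): drop negative faulting values
            continue
        if r.pavement_type != "JPCP":     # rule (2): JPCP sections only
            continue
        mr = first_mr_date.get(r.section_id)
        if mr is not None and r.survey_date >= mr:  # rule (3): drop post-repair data
            continue
        kept.append(r)
    return kept

records = [
    FaultingRecord("A", "JPCP", date(2000, 1, 1), 1.2),
    FaultingRecord("A", "JPCP", date(2005, 1, 1), 0.3),   # measured after M&R
    FaultingRecord("B", "JRCP", date(2001, 1, 1), 2.0),   # not JPCP
    FaultingRecord("C", "JPCP", date(2002, 1, 1), -0.5),  # negative value
    FaultingRecord("C", "JPCP", date(2003, 1, 1), 0.0),
]
kept = clean_records(records, {"A": date(2004, 6, 1)})
```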
3.2. Boruta-Based Feature Selection
4. Model Construction
4.1. TPE-CatBoost Model Performance Evaluation
4.2. Models Performance Comparison
4.3. SHAP-Based Feature Interpretation
5. Conclusions
- (1) The TPE-CatBoost model built with the six Boruta-selected variables achieved better predictive performance on the faulting test set than the TPE-CatBoost model built with all 17 variables: R² increased by 0.007, MAE decreased by 0.31, and RMSE decreased by 0.006. This improvement can be attributed to Boruta's ability to identify relevant variables and eliminate unnecessary ones, yielding a more accurate and efficient model.
- (2) Compared with TPE-RF, TPE-AdaBoost, TPE-GDBT, and TPE-LightGBM, TPE-CatBoost achieved the highest R² and the lowest MAE and RMSE, indicating greater potential for predicting faulting.
- (3) By integrating SHAP, the SHAP-TPE-CatBoost model reveals how individual features contribute to faulting prediction, thereby enhancing the interpretability of the prediction results. According to the SHAP results, AGE, LTE, DWL, and EM are the most influential features affecting the predicted faulting.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Variable ID | Variable Type | Unit | Description |
---|---|---|---|
FLT | Output | mm | Average calculated edge faulting. |
DWL | Structure | mm | Outer diameter of dowel bars. |
PR | Structure | / | Poisson’s ratio of the concrete slab. |
EM | Structure | kPa | Modulus of elasticity of the concrete slab. |
CS | Structure | kPa | Compressive strength of the concrete slab. |
TS | Structure | kPa | Splitting tensile strength of the concrete slab. |
CTE | Structure | mm/mm/°C | Coefficient of thermal expansion of the concrete slab. |
P10 | Structure | % | Percentage of subgrade material passing sieve No. 10. |
P40 | Structure | % | Percentage of subgrade material passing sieve No. 40. |
P200 | Structure | % | Percentage of subgrade material passing sieve No. 200. |
PI | Structure | % | Plasticity index. |
BTH | Structure | mm | Base thickness. |
LTE | Structure | % | Load transfer efficiency. |
PRC | Climate | mm | Annual precipitation. |
FIN | Climate | / | Freezing index. |
ESAL | Traffic | / | Estimated equivalent single axle loads (ESALs). |
CESAL | Traffic | / | Estimated cumulative ESALs. |
AGE | Age | year | Number of years since the pavement was built. |
Variable Type | Variable ID | Unit |
---|---|---|
Input | DWL | mm |
Input | EM | kPa |
Input | TS | kPa |
Input | BTH | mm |
Input | LTE | % |
Input | AGE | year |
Output | FLT | mm |
Variables | Minimum Value | Maximum Value | Mean | Standard Deviation |
---|---|---|---|---|
FLT | 0 | 9.100 | 1.534 | 1.631 |
DWL | 0 | 31.8 | 12.365 | 14.575 |
EM | 1.310 × 10⁷ | 4.827 × 10⁷ | 2.939 × 10⁷ | 5.609 × 10⁶ |
TS | 2916.585 | 6260.660 | 4204.700 | 688.453 |
BTH | 22.900 | 589.300 | 146.754 | 84.890 |
LTE | 19.462 | 93.550 | 70.238 | 15.295 |
AGE | 2 | 32 | 15.981 | 6.153 |
Parameter | Description | Search Range | Optimum |
---|---|---|---|
iterations | Number of boosting iterations (trees) during training. | [40, 200] | 150 |
learning_rate | Step size applied at each boosting iteration (> 0). | [0.01, 0.5] | 0.38 |
depth | Depth of each tree (≥ 1). | [2, 10] | 3 |
l2_leaf_reg | Coefficient of the L2 regularization term. | [0.01, 1] | 0.04 |
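To illustrate how TPE explores this search space, the following is a simplified, dependency-free TPE sketch. After a few random start-up trials, it splits the trial history into the best γ fraction ("good", modeled by a density l(x)) and the remainder ("bad", g(x)), models each hyperparameter independently with a Gaussian Parzen mixture, draws candidates from l(x), and evaluates the candidate that maximizes l(x)/g(x). The fixed bandwidth (10% of each range), γ = 0.25, and 24 candidates per step are assumptions, and `toy_loss` is a hypothetical stand-in for the cross-validated CatBoost error that a real run would minimize. In practice, a library implementation such as Optuna's `TPESampler` would normally be used instead.

```python
import math
import random

# Search space from the CatBoost table: name -> (low, high, is_integer)
SPACE = {
    "iterations": (40, 200, True),
    "learning_rate": (0.01, 0.5, False),
    "depth": (2, 10, True),
    "l2_leaf_reg": (0.01, 1.0, False),
}

def clip_cast(x, lo, hi, is_int):
    """Clip to [lo, hi] and round integer-valued hyperparameters."""
    x = min(max(x, lo), hi)
    return int(round(x)) if is_int else x

def parzen_logpdf(x, centers, sigma):
    """Log density of an equal-weight Gaussian mixture centred on observed values."""
    dens = sum(math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers)
    dens /= len(centers) * sigma * math.sqrt(2 * math.pi)
    return math.log(dens + 1e-300)

def tpe_minimize(objective, space, n_trials=60, n_startup=10, gamma=0.25,
                 n_candidates=24, seed=0):
    assert n_startup >= 2, "need at least one 'good' and one 'bad' trial"
    rng = random.Random(seed)
    history = []  # (params, loss) pairs
    for t in range(n_trials):
        if t < n_startup:
            # Random start-up trials seed the two densities
            params = {k: clip_cast(rng.uniform(lo, hi), lo, hi, is_int)
                      for k, (lo, hi, is_int) in space.items()}
        else:
            ranked = sorted(history, key=lambda h: h[1])
            n_good = min(len(ranked) - 1, max(1, int(gamma * len(ranked))))
            good = [p for p, _ in ranked[:n_good]]   # defines l(x)
            bad = [p for p, _ in ranked[n_good:]]    # defines g(x)
            best_score, params = -math.inf, None
            for _ in range(n_candidates):
                cand, score = {}, 0.0
                for k, (lo, hi, is_int) in space.items():
                    sigma = 0.1 * (hi - lo)
                    # Sample from l(x): jitter a randomly chosen good value
                    x = clip_cast(rng.choice(good)[k] + rng.gauss(0, sigma), lo, hi, is_int)
                    cand[k] = x
                    # Accumulate log l(x) - log g(x), dimensions treated independently
                    score += parzen_logpdf(x, [p[k] for p in good], sigma)
                    score -= parzen_logpdf(x, [p[k] for p in bad], sigma)
                if score > best_score:
                    best_score, params = score, cand
        history.append((params, objective(params)))
    best = min(history, key=lambda h: h[1])
    return best, history

def toy_loss(p):
    """Hypothetical stand-in for a cross-validated CatBoost RMSE."""
    return ((p["learning_rate"] - 0.1) ** 2 + 0.01 * (p["depth"] - 6) ** 2
            + (p["l2_leaf_reg"] - 0.5) ** 2 + 1e-5 * (p["iterations"] - 120) ** 2)

(best_params, best_loss), history = tpe_minimize(toy_loss, SPACE)
```

Treating each hyperparameter independently mirrors the "tree-structured" factorization of TPE, which keeps each density one-dimensional and cheap to evaluate.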
Model | Parameter | Optimum |
---|---|---|
TPE-RF | n_estimators | 163 |
TPE-RF | criterion | “squared_error” |
TPE-RF | max_features | 2 |
TPE-RF | max_depth | 10 |
TPE-AdaBoost | n_estimators | 196 |
TPE-AdaBoost | learning_rate | 1.4 |
TPE-AdaBoost | loss | “square” |
TPE-GDBT | n_estimators | 180 |
TPE-GDBT | learning_rate | 0.35 |
TPE-GDBT | max_features | 2 |
TPE-GDBT | min_impurity_decrease | 0 |
TPE-LightGBM | n_estimators | 180 |
TPE-LightGBM | learning_rate | 0.39 |
TPE-LightGBM | max_depth | 16 |
TPE-LightGBM | colsample_bytree | 0.5 |
TPE-LightGBM | min_child_weight | 2.5 |
TPE-LightGBM | num_leaves | 100 |
Model | R² | MAE | RMSE |
---|---|---|---|
TPE-RF | 0.755 | 0.500 | 0.928 |
TPE-AdaBoost | 0.721 | 0.717 | 0.990 |
TPE-GDBT | 0.867 | 0.387 | 0.684 |
TPE-LightGBM | 0.865 | 0.483 | 0.688 |
TPE-CatBoost | 0.906 | 0.346 | 0.573 |
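The comparison above rests on R², MAE, and RMSE. For reference, their standard definitions can be written as a short dependency-free sketch; the function names are ours and are not taken from the study.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Usage on a small illustrative example (not data from the study)
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
scores = (r2_score(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
```

Because RMSE weights large errors more heavily, MAE can never exceed RMSE for the same set of predictions, which provides a quick consistency check on reported results.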
Ranking | Train Set Feature | Mean Absolute SHAP Value | Test Set Feature | Mean Absolute SHAP Value |
---|---|---|---|---|
1 | AGE | 0.574 | AGE | 0.558 |
2 | LTE | 0.353 | DWL | 0.364 |
3 | DWL | 0.346 | EM | 0.353 |
4 | EM | 0.332 | LTE | 0.320 |
5 | TS | 0.248 | TS | 0.286 |
6 | BTH | 0.216 | BTH | 0.188 |
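The ranking above orders features by the mean absolute SHAP value over the samples of each set. The following sketch shows how such a ranking is obtained from exact interventional Shapley values computed by brute-force subset enumeration against a background sample. It is illustrative only: for tree ensembles such as CatBoost, SHAP values are typically computed with the shap library's `TreeExplainer`, which this sketch does not reproduce, and the enumeration cost grows as 2^M in the number of features M (64 subsets per feature for the six selected variables). The toy linear model in the usage lines is hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact interventional Shapley values of model f at sample x."""
    m = len(x)

    def value(subset):
        # v(S): mean output with features in S fixed to x, others taken from background rows
        total = 0.0
        for z in background:
            hybrid = [x[j] if j in subset else z[j] for j in range(m)]
            total += f(hybrid)
        return total / len(background)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for s in combinations(others, size):
                s = set(s)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def mean_abs_shap_ranking(f, X, background, names):
    """Rank features by mean |SHAP value| over the samples in X."""
    per_sample = [shapley_values(f, x, background) for x in X]
    means = [sum(abs(row[j]) for row in per_sample) / len(X) for j in range(len(names))]
    return sorted(zip(names, means), key=lambda t: t[1], reverse=True)

# Usage with a hypothetical linear model and a two-row background sample
model = lambda v: 2 * v[0] + 0.5 * v[1] + 0 * v[2]
bg = [[0, 0, 0], [2, 2, 2]]
phi = shapley_values(model, [3, 1, 5], bg)
rank = mean_abs_shap_ranking(model, [[3, 1, 5], [0, 3, 0]], bg, ["a", "b", "c"])
```

For a linear model, the interventional Shapley value of feature i reduces to w_i (x_i − mean of the background values of feature i), and the values always sum to f(x) minus the mean background prediction, which makes the sketch easy to check by hand.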
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xiao, W.; Wang, C.; Liu, J.; Gao, M.; Wu, J. Optimizing Faulting Prediction for Rigid Pavements Using a Hybrid SHAP-TPE-CatBoost Model. Appl. Sci. 2023, 13, 12862. https://doi.org/10.3390/app132312862