Advanced Machine Learning Techniques for Predicting Concrete Compressive Strength
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection and Description
2.2. Data Preprocessing
2.2.1. Data Cleaning
2.2.2. Exploratory Data Analysis
2.2.3. Correlation Analysis and Preparation of Predictor Variables
2.2.4. Feature Engineering and Multicollinearity Analysis
2.2.5. Data Scaling
2.2.6. Discretization of the Target Variable for Classification
2.3. Model Development and Evaluation
2.3.1. Regression and Classification Models
- Linear regression: This serves as a baseline model to establish a benchmark and assess the extent of linear relationships between the features and the target variable (compressive strength).
- Decision tree regression: This model is employed to capture non-linear relationships by partitioning the data based on feature thresholds, effectively creating a tree-like structure of decisions to arrive at a prediction.
- Random forest (RF) regression: This ensemble method averages the predictions of multiple decision trees, each trained on a bootstrap sample of the data, to improve predictive accuracy and mitigate overfitting, leveraging the wisdom of the crowd for a more robust prediction.
- Gradient boosting regression: This technique builds models sequentially, with each subsequent model correcting errors made by previous ones. This iterative approach enhances performance, particularly on complex datasets with intricate patterns.
- AdaBoost regression: Similar to gradient boosting, AdaBoost focuses on instances where prior models struggled and adjusts weights accordingly to improve prediction accuracy on challenging data points.
- KNN regression: This model predicts target values based on the average of the nearest neighbors in the feature space and leverages the similarity between data points for prediction.
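As an illustrative sketch (not the authors' exact pipeline), the regression models listed above can be fitted and compared with scikit-learn; the synthetic data below stands in for the concrete mixture dataset:

```python
# Hedged sketch: fitting the six baseline regressors on synthetic data.
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 8))                 # stand-in for the 8 mix features
y = X @ rng.uniform(size=8) + 0.1 * rng.normal(size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "KNN": KNeighborsRegressor(),
}
# Held-out R² per model; on real data these would be cross-validated.
scores = {name: r2_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
```

Because the synthetic target here is nearly linear, linear regression scores well; on the real mixture data the ensemble methods dominate, as reported in the Results.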
2.3.2. Model Evaluation Metrics
2.3.3. Minimum Dataset Size Analysis
2.4. Feature Importance Analysis
2.4.1. Mean Decrease in Impurity
2.4.2. SHAP Values
2.4.3. Ablation Study
2.4.4. Partial Dependence Plot
2.5. Model Implementation and Validation
3. Results
3.1. Regression Analysis
3.2. Classification Analysis
3.3. Feature Importance Ranking and Feature Ablation
3.4. Understanding Feature Contributions with SHAP Analysis
4. Discussion
4.1. Model Performance Insights
4.2. Feature Importance and Practical Implications
4.3. Challenges, Limitations, and Future Research
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Variable | Type | Unit | Role |
---|---|---|---|
cement | quantitative | kg/m³ | Input |
blast furnace slag | quantitative | kg/m³ | Input |
fly ash | quantitative | kg/m³ | Input |
water | quantitative | kg/m³ | Input |
superplasticizer | quantitative | kg/m³ | Input |
coarse aggregate | quantitative | kg/m³ | Input |
fine aggregate | quantitative | kg/m³ | Input |
age | quantitative | days | Input |
compressive strength | quantitative | MPa | Output |
Strength Classification | Threshold (MPa) | Count |
---|---|---|
very high strength | ≥60 | 62 |
high strength | [41, 59.99] | 215 |
normal strength | [30, 40.99] | 250 |
weak | [20, 29.99] | 190 |
very weak | <20 | 194 |
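The discretization in the table above can be reproduced with `pandas.cut`; the class boundaries below are assumed from the stated thresholds (left-closed bins, so 20–29.99 MPa maps to "weak", and ≥60 MPa to "very high strength"):

```python
# Hedged sketch: binning compressive strength (MPa) into the five classes.
import numpy as np
import pandas as pd

bins = [-np.inf, 20, 30, 41, 60, np.inf]
labels = ["very weak", "weak", "normal strength",
          "high strength", "very high strength"]

strength = pd.Series([12.5, 25.0, 35.7, 48.2, 79.9])   # example values
classes = pd.cut(strength, bins=bins, labels=labels, right=False)
```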
Regression Models | Classification Models |
---|---|
linear regression | RF classifier |
k-nearest neighbors (KNN) regression | logistic regression |
decision tree regression | SVM |
RF regression | k-nearest neighbors (KNN) classifier |
gradient boosting regression | bagging classifier |
AdaBoost regression | |
neural network | |
Set | Feature | Number | Min | Max | Range | Mean | Variance | Std Dev |
---|---|---|---|---|---|---|---|---|
Training | Blast Furnace Slag | 728 | 0 | 342.1 | 342.1 | 71.75 | 7453.08 | 86.33 |
Training | Fly Ash | 728 | 0 | 200.1 | 200.1 | 59.92 | 4102.09 | 64.05 |
Training | Superplasticizer | 728 | 0 | 22 | 22 | 6.06 | 27.27 | 5.22 |
Training | Age | 728 | 1 | 120 | 119 | 31.86 | 792.56 | 28.15 |
Training | Water_Cement_Ratio | 728 | 0.3 | 1.88 | 1.58 | 0.77 | 0.1 | 0.31 |
Training | Coarse_Fine_Ratio | 728 | 0.92 | 1.87 | 0.95 | 1.28 | 0.03 | 0.18 |
Testing | Blast Furnace Slag | 183 | 0 | 305.3 | 305.3 | 70.21 | 7331.27 | 85.62 |
Testing | Fly Ash | 183 | 0 | 195 | 195 | 59.97 | 4437.47 | 66.61 |
Testing | Superplasticizer | 183 | 0 | 22.1 | 22.1 | 5.88 | 28.08 | 5.3 |
Testing | Age | 183 | 3 | 120 | 117 | 33.15 | 871.41 | 29.52 |
Testing | Water_Cement_Ratio | 183 | 0.28 | 1.66 | 1.38 | 0.76 | 0.09 | 0.31 |
Testing | Coarse_Fine_Ratio | 183 | 0.94 | 1.84 | 0.89 | 1.26 | 0.03 | 0.17 |
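The two engineered ratios and the per-feature statistics in the table above can be derived with pandas; the mixture values below are hypothetical, and only the column names are taken from the dataset:

```python
# Hedged sketch: engineered ratio features and summary statistics.
import pandas as pd

df = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6],
    "Water": [162.0, 228.0, 192.0],
    "Coarse Aggregate": [1040.0, 932.0, 978.4],
    "Fine Aggregate": [676.0, 594.0, 825.5],
})
# Ratio features replacing the raw columns (reduces multicollinearity).
df["Water_Cement_Ratio"] = df["Water"] / df["Cement"]
df["Coarse_Fine_Ratio"] = df["Coarse Aggregate"] / df["Fine Aggregate"]

# One row per feature: count, min, max, mean, variance, std dev, range.
stats = df.agg(["count", "min", "max", "mean", "var", "std"]).T
stats["range"] = stats["max"] - stats["min"]
```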
Regression Model | Hyperparameters Considered |
---|---|
Linear Regression | None (used ordinary least squares) |
K-Nearest Neighbors | n_neighbors, metric, weights |
Decision Tree Regressor | max_depth, min_samples_split, min_samples_leaf |
Random Forest Regressor | n_estimators, max_depth, min_samples_split, min_samples_leaf, max_features |
Gradient Boosting Regressor | n_estimators, learning_rate, max_depth, subsample, min_samples_split |
AdaBoost Regressor | n_estimators, learning_rate, base_estimator (DT max_depth) |
Neural Network (MLP) | number of layers, units per layer, activation, dropout rate, batch size, epochs, optimizer, learning_rate, L2 regularization |

Classification Model | Hyperparameters Considered |
---|---|
Logistic Regression | penalty (l1, l2), C (regularization strength), solver (saga) |
Support Vector Machine | C (regularization), gamma (kernel coefficient), kernel (linear, rbf, poly, sigmoid), degree (if kernel = poly) |
k-Nearest Neighbors | n_neighbors, weights (uniform, distance), p (distance metric: 1 = Manhattan, 2 = Euclidean) |
Random Forest Classifier | n_estimators, max_depth, min_samples_split, max_features |
Bagging Classifier (with DT) | n_estimators, max_samples, max_features, bootstrap, bootstrap_features, estimator__max_depth, estimator__criterion (for DecisionTreeClassifier) |
Model | MSE | R² |
---|---|---|
gradient boosting regressor | 15.79 | 0.94 |
RF regressor | 21.61 | 0.91 |
neural network model | 24.20 | 0.90 |
AdaBoost | 24.27 | 0.90 |
k-nearest neighbors | 39.88 | 0.84 |
decision tree regressor | 42.67 | 0.83 |
linear regression | 71.25 | 0.69 |
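The two metrics in the table above (MSE and R²) can be computed with scikit-learn; the measured and predicted strengths below are made-up values for illustration only:

```python
# Hedged sketch: the regression evaluation metrics used in this study.
from sklearn.metrics import mean_squared_error, r2_score

y_true = [35.0, 42.5, 61.0, 18.2, 27.9]   # hypothetical measured strengths (MPa)
y_pred = [33.1, 44.0, 58.7, 20.0, 26.5]   # hypothetical model predictions

mse = mean_squared_error(y_true, y_pred)  # mean of squared errors
r2 = r2_score(y_true, y_pred)             # 1 - SS_res / SS_tot
```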
Model | Balanced Accuracy | Weighted Accuracy | Weighted Avg Precision | Weighted Avg Recall | Weighted Avg F1-Score |
---|---|---|---|---|---|
RF classifier | 0.74 | 0.73 | 0.76 | 0.75 | 0.75 |
logistic regression | 0.63 | 0.62 | 0.63 | 0.64 | 0.63 |
SVM classifier | 0.76 | 0.78 | 0.80 | 0.80 | 0.80 |
KNN | 0.62 | 0.53 | 0.69 | 0.69 | 0.68 |
bagging with decision trees | 0.77 | 0.78 | 0.77 | 0.76 | 0.76 |
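The classification metrics reported above come from scikit-learn; the labels below are a small hypothetical example (balanced accuracy is the unweighted mean of per-class recall, while the weighted variants average per-class scores by class support):

```python
# Hedged sketch: the classification evaluation metrics used in this study.
from sklearn.metrics import (balanced_accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = ["weak", "weak", "normal", "high", "high", "high"]
y_pred = ["weak", "normal", "normal", "high", "high", "weak"]

balanced_acc = balanced_accuracy_score(y_true, y_pred)
precision_w = precision_score(y_true, y_pred, average="weighted")
recall_w = recall_score(y_true, y_pred, average="weighted")
f1_w = f1_score(y_true, y_pred, average="weighted")
```

Note that weighted recall equals plain accuracy, whereas balanced accuracy penalizes poor recall on minority classes, which matters for the uneven class counts in Table above.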
Model | Hyperparameters Considered | Initial/Default Values | Hyperparameter Tuning Method | Best/Tuned Values |
---|---|---|---|---|
GBR | n_estimators, learning_rate, max_depth, subsample, min_samples_split | n_estimators = 100, learning_rate = 0.1, max_depth = 3, subsample = 1.0, min_samples_split = 2 | Bayesian Optimization | n_estimators = 500, learning_rate = 0.2057, max_depth = 10, subsample = 0.5, min_samples_split = 0.242 |
SVM | C (regularization), gamma (kernel coefficient), kernel (linear, rbf, poly, sigmoid), degree (if kernel = poly) | C = 1.0, kernel = 'rbf', gamma = 'scale', degree = 3 | Bayesian Optimization (BayesSearchCV) | C ≈ 5.68 × 10⁵, gamma ≈ 0.1434, kernel = 'rbf', degree = 5 |
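A gradient boosting regressor configured with the tuned values from the table above can be instantiated as follows; the Bayesian search itself (e.g., scikit-optimize's BayesSearchCV) is not reproduced here, and the fitted data is synthetic:

```python
# Hedged sketch: GBR with the reported tuned hyperparameters.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

gbr = GradientBoostingRegressor(
    n_estimators=500,
    learning_rate=0.2057,
    max_depth=10,
    subsample=0.5,           # stochastic gradient boosting: 50% row sampling
    min_samples_split=0.242, # a float is interpreted as a fraction of samples
    random_state=0,
)
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)
gbr.fit(X, y)
score = gbr.score(X, y)      # in-sample R²; real evaluation used a held-out set
```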
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Nikoopayan Tak, M.S.; Feng, Y.; Mahgoub, M. Advanced Machine Learning Techniques for Predicting Concrete Compressive Strength. Infrastructures 2025, 10, 26. https://doi.org/10.3390/infrastructures10020026