Data Imputation of Soil Pressure on Shield Tunnel Lining Based on Random Forest Model
Abstract
1. Introduction
2. Engineering Background
- Evaluation of Deformation and Initial Internal Forces for Shield Tunnel Structures: During the construction phase, a shield tunnel structure deforms upward, generating initial internal forces in the segments and bolts. A precise understanding of how these initial internal forces change during construction is the basis for assessing the mechanical performance of the structure during the operation phase. Currently, shield tunnel segment design methods mainly include the traditional method, the modified traditional method, and the multi-hinge ring method. These methods focus primarily on the stress of segments during the regular service period and do not consider how different factors may affect segment stress during the construction phase. Extensive experience in shield tunnel engineering has shown that it takes some time for the segments to be assembled into rings and for the grout to solidify, and that joint resistance and external ground pressure affect the load transfer mechanism. Thus, it is essential to systematically study the mechanical behavior of segments, considering the soil–segment–jack interactions and their key influencing factors. Furthermore, analyzing the ground disturbance caused by shield excavation and the additional settlement of the tunnel allows the initial stress state of the tunnel structure to be evaluated accurately.
- Long-term Deformation and Mechanical Performance Evolution of Shield Tunnel Structures During the Operational Phase: The water level of the Fuchun River fluctuates seasonally, changing the external confining pressure on the tunnel and affecting the deformation and internal forces of the tunnel structure. This can result in water leakage through the segments, compromising the overall stability of the tunnel structure and damaging electrical equipment inside the tunnel. Localized water leakage can also lead to weathering and detachment of the concrete lining, and corrosive substances in the water can accelerate the deterioration of reinforced concrete structures, reducing their load-bearing capacity. Furthermore, during the operational phase, complex traffic loads significantly influence the mechanical performance of the segments. Interactions between vehicles, trains, and the road surface or tracks cause vibrations in the shield tunnel structure, which are transmitted to the surrounding soil. This process can cause plastic deformation and result in the buildup and release of pore pressure within the soil. Consequently, both the tunnel structure and the surrounding soil undergo uneven settlement and deformation. In addition to the natural degradation of the structure over time, cumulative damage from long-term traffic loads can eventually lead to cracking, leakage, and misalignment of the tunnel structure. Finally, shield tunnel structures constructed in combination with public transport systems feature complex internal structures such as roadway slabs and sidewalls. Traditional research methods that simplify these structures into single circular rings cannot accurately analyze the dynamic response characteristics and deformation patterns of large-section tunnel structures with complex internal spatial divisions.
Therefore, there is an urgent need to study the long-term deformation mechanisms and mechanical performance evolution of segments under the combined effects of water level fluctuations and traffic loads.
3. Monitoring System
4. Random Forest (RF) Introduction
4.1. Development History
4.2. Principle of Random Forest
4.3. Advantages of Random Forest
- Because it is an ensemble method, Random Forest generally achieves higher accuracy than most individual (single-model) algorithms;
- It generalizes well to test data. Its two sources of randomness (bootstrap sampling of the training samples and random selection of candidate features at each split) make it less prone to overfitting and give it a degree of noise resistance, which is an advantage over many other algorithms;
- Because it combines many trees, Random Forest can accommodate nonlinear data and is itself a nonlinear classification (or regression) model;
- It can handle high-dimensional data without prior feature selection and is robust across datasets. It also accepts both discrete and continuous inputs without requiring normalization of the data;
- Its Out-of-Bag (OOB) error estimate provides an approximately unbiased assessment of the true error during model building, without setting aside any training data. During training, Random Forest can also capture interactions among features and estimate each feature’s importance, thereby providing a valuable reference;
- Because each tree within Random Forest is generated independently, training is easy to parallelize and remains fast on large-scale datasets.
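Several of the properties listed above (the OOB estimate, feature importance, and parallel tree construction) can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestRegressor` and purely synthetic data; the function name `fit_forest_with_oob`, the target function, and the sample sizes are hypothetical and are not taken from the monitoring dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_forest_with_oob(n_samples=500, n_features=6, seed=0):
    """Fit a forest on synthetic nonlinear data; return it with its held-out R^2."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, n_features))
    # Hypothetical target: nonlinear in feature 0, an interaction between
    # features 1 and 2, and no dependence on the remaining features.
    y = np.sin(3.0 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.05 * rng.normal(size=n_samples)

    # Hold out the last 100 rows to compare against the OOB estimate.
    X_train, y_train = X[:400], y[:400]
    X_test, y_test = X[400:], y[400:]

    forest = RandomForestRegressor(
        n_estimators=200,
        oob_score=True,   # score each sample using only trees that did not train on it
        n_jobs=-1,        # trees are independent, so they can be built in parallel
        random_state=seed,
    )
    forest.fit(X_train, y_train)  # raw inputs, no normalization step
    return forest, forest.score(X_test, y_test)


forest, test_r2 = fit_forest_with_oob()
print("OOB R^2: ", round(forest.oob_score_, 3))
print("Test R^2:", round(test_r2, 3))
print("Feature importances:", np.round(forest.feature_importances_, 3))
```

In this sketch the OOB score is obtained from the training rows alone, so the separate test split is used only as a cross-check on that estimate.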
5. Experiment
5.1. Introduction to Measured/Tested Data
5.2. Random Forest-Based Imputation Method
5.3. Comparison of Imputation Methods and Evaluation Metrics
5.4. Parameter Selection and Optimization
5.5. Imputation Results and Comprehensive Evaluation
6. Conclusions
- The field monitoring results show that the soil pressure fluctuates only minimally during the first 25 min and then declines gradually, indicating a nonlinear variation;
- The Random Forest model achieves its minimum mean squared error (MSE) with the following parameters: 200 decision trees, a maximum depth of 4 for each tree, and a maximum of four features considered at each node split;
- As the missing proportion increases, the imputation errors of the median and mean imputation methods also increase, while the error of the Random Forest-based model remains around 0.00025. The Random Forest method therefore outperforms median and mean imputation. At a missing proportion of 60%, the difference in errors reaches approximately 70%;
- Plotting the imputed results against the original data shows that the Random Forest-based imputation method can effectively handle multidimensional data obtained from sensor monitoring and can provide reasonable predictions to fill in the missing parts of the dataset.
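The comparison summarized in these conclusions can be sketched as follows. This is an illustration only: it uses synthetic channels rather than the measured soil pressure data, it assumes scikit-learn, and the helper names `make_synthetic_channels` and `compare_imputations` are hypothetical. Only the three hyperparameters (200 trees, maximum depth 4, at most four features per split) come from the text; training the forest on rows where the target is observed and predicting the missing rows is one common way to perform Random Forest imputation and may differ from the exact procedure used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error


def make_synthetic_channels(n=600, seed=0):
    """Build hypothetical sensor channels: five feature columns and one target."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n)
    # Assumed target shape: roughly flat at first, then a gradual nonlinear decline.
    target = 0.30 - 0.10 * np.clip(t - 0.25, 0.0, None) ** 1.5
    target = target + 0.002 * rng.normal(size=n)
    # Assumed neighbouring channels that are noisy, scaled copies of the target.
    neighbours = [target * s + 0.005 * rng.normal(size=n) for s in (0.8, 1.1, 0.9, 1.2)]
    features = np.column_stack([t] + neighbours)
    return features, target


def compare_imputations(missing_ratio, seed=0):
    """Return the MSE on the masked entries for mean, median, and RF imputation."""
    X, y = make_synthetic_channels(seed=seed)
    rng = np.random.default_rng(seed + 1)
    missing = rng.random(y.size) < missing_ratio  # values removed completely at random
    observed = ~missing

    # Hyperparameters reported in the conclusions above.
    forest = RandomForestRegressor(
        n_estimators=200, max_depth=4, max_features=4, random_state=seed
    )
    forest.fit(X[observed], y[observed])

    n_missing = int(missing.sum())
    fills = {
        "mean": np.full(n_missing, y[observed].mean()),
        "median": np.full(n_missing, np.median(y[observed])),
        "random_forest": forest.predict(X[missing]),
    }
    return {name: mean_squared_error(y[missing], fill) for name, fill in fills.items()}


for ratio in (0.2, 0.4, 0.6):
    print(ratio, compare_imputations(ratio))
```

The error values printed by this sketch depend entirely on the synthetic data and should not be compared with the figures reported for the field measurements.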
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, M.; Ye, X.-W.; Ying, X.-H.; Jia, J.-D.; Ding, Y.; Zhang, D.; Sun, F. Data Imputation of Soil Pressure on Shield Tunnel Lining Based on Random Forest Model. Sensors 2024, 24, 1560. https://doi.org/10.3390/s24051560