Rapid Visual Screening Feature Importance for Seismic Vulnerability Ranking via Machine Learning and SHAP Values
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Description
- Black: Structures that suffered total or partial collapse during the earthquake, potentially leading to loss of human life.
- Red: Structures with significant damage to their structural members.
- Yellow: Structures with moderate damage to the structural members, potentially including extended damage to nonstructural elements.
- Green: Structures that suffered very little or no damage.
- Free ground level (Pilotis), soft storeys and/or short columns: In general, this attribute pertains to structures wherein a storey has significantly less structural rigidity than the rest. For example, this can manifest on the ground floor (pilotis) when it has greater height than the typical structure storey, or when the wall fillings do not cover the whole height of a storey, effectively reducing the active height of the adjacent columns.
- Wall fillings regularity: This indicates whether the infill walls are of sufficient thickness and with few openings. The presence of such wall fillings is beneficial to the structure’s overall seismic response, as during an earthquake they act as diagonal struts that support the surrounding frames.
- Absence of design seismic codes: In Greece, this pertains to pre-1960 structures which were not designed following a dedicated seismic code.
- Poor condition: Very high or non-uniform ground sinking, concrete with aggregate segregation or erosion, or corrosion in the reinforcement bars are examples of maintenance-related factors that can reduce the seismic capacity of a building.
- Previous damage: This pertains to structures that had suffered previous earthquake damage that was not adequately repaired. Although this is a distinct feature from “poor condition”, it causes a similar reduction in the nominal seismic capacity of the building.
- Significant height: This describes structures with five or more storeys.
- Irregularity in height: This describes structures with a discontinuity in the vertical path of the loads.
- Irregularity in plan: This pertains to structures with floor plans that significantly deviate from a rectangular shape, e.g., floor plans with highly acute angles in their outer walls or with E, Z, or H-shapes. Irregularity in height, plan, or both can cause excess seismic overload on the building.
- Torsion: This affects structures with high horizontal eccentricity, which are subjected to torsion during the earthquake.
- Pounding: If adjacent buildings do not have a sufficient gap between them, and especially if they have different heights, then the floor slabs of one building can ram into the columns of the other.
- Heavy nonstructural elements: These elements can potentially create eccentricities if they are displaced during an earthquake, leading to additional torsion. This is because even though these are nonstructural elements, they can often contribute to the total mass and horizontal stiffness of the structure.
- Foundation Soil: The Greek Code for Seismic Resistant Structures–EAK 2000 [3] classifies soils into categories A, B, C, D, and X. Class A refers to rock or semi-rock formations extending over a wide area and to a large depth. Class B refers to strongly weathered rocks or soils mechanically equivalent to granular materials. Classes C and D refer to granular materials and soft clay, respectively, while class X refers to loose fine-grained silt [3]. In [20], as well as in the present study, soils in EAK category A are classified as S1, while those in category B are classified as S2; soils in EAK categories C, D, and X were not encountered.
- The design Seismic Code: This feature describes the seismic code(s) that the structures adhered to at the time of their design. Specifically, structures that were built before 1984 are classified as RC1, buildings constructed between 1985 and 1994 are labeled RC2, and buildings constructed after 1995 are labeled RC3, as the Greek state introduced updated seismic codes at these milestones.
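The attribute definitions above can be expressed as simple encoding rules. The following is a minimal sketch rather than the authors' preprocessing code: all function names are hypothetical, and the handling of the boundary years 1984 and 1995, which the description leaves open, is an assumption.

```python
# Hypothetical helpers that encode the attribute definitions listed above.

# EAK soil categories C, D, and X were not encountered in the dataset.
SOIL_CLASS = {"A": "S1", "B": "S2"}


def soil_label(eak_category: str) -> str:
    """Map an EAK soil category to the S1/S2 label used in the study."""
    return SOIL_CLASS[eak_category]


def design_code_label(year_built: int) -> str:
    """Assign RC1/RC2/RC3 from the construction year.

    Assumption: 1984 is grouped with RC1 and 1995 with RC3, because the
    text only says "before 1984", "between 1985 and 1994", "after 1995".
    """
    if year_built <= 1984:
        return "RC1"
    if year_built <= 1994:
        return "RC2"
    return "RC3"


def has_significant_height(storeys: int) -> bool:
    """Structures with five or more storeys."""
    return storeys >= 5


def lacks_seismic_code(year_built: int) -> bool:
    """Pre-1960 structures were not designed to a dedicated seismic code."""
    return year_built < 1960
```

For example, `design_code_label(1990)` returns `"RC2"` and `lacks_seismic_code(1955)` returns `True`.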
2.2. Data Preprocessing
2.3. Machine Learning Algorithm
Algorithm 1. Gradient Boosting Learning Process [35] (an initialization step followed by a for loop over the boosting iterations).
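The initialize-then-iterate structure of gradient boosting can be illustrated with a small, self-contained sketch. It is not the implementation used in the study: it assumes a squared loss (for which the negative gradient is simply the residual), decision stumps on binary features instead of deeper trees, and hypothetical function names and toy data.

```python
# Minimal gradient boosting with squared loss and decision stumps.
# Illustrative only; names, defaults, and data are assumptions.


def fit_stump(X, residuals):
    """Return (feature, value_if_0, value_if_1) with the lowest squared error, or None."""
    best = None
    for j in range(len(X[0])):
        left = [r for x, r in zip(X, residuals) if x[j] == 0]
        right = [r for x, r in zip(X, residuals) if x[j] == 1]
        if not left or not right:
            continue  # constant feature, no split possible
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((r - mean_l) ** 2 for r in left) + sum((r - mean_r) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, j, mean_l, mean_r)
    return None if best is None else best[1:]


def gradient_boost(X, y, n_estimators=50, learning_rate=0.1):
    """Fit F(x) = F0 + learning_rate * (sum of stump outputs)."""
    f0 = sum(y) / len(y)  # initialization: the constant minimizing the squared loss
    preds = [f0] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F is the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, v0, v1 = stump
        stumps.append(stump)
        # Each new stump contributes only a fraction (learning_rate) of its output.
        preds = [p + learning_rate * (v1 if x[j] else v0) for p, x in zip(preds, X)]
    return f0, stumps, learning_rate


def predict(model, x):
    """Evaluate the additive model on one binary feature vector."""
    f0, stumps, learning_rate = model
    return f0 + learning_rate * sum(v1 if x[j] else v0 for j, v0, v1 in stumps)


# Toy data: the target depends only on the first feature.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1.0, -1.0, 1.0, 1.0]
model = gradient_boost(X, y)
```

With a learning rate of 0.1, each iteration removes one tenth of the remaining residual, so the prediction approaches the target only gradually. This is the trade-off between a small learning rate and the number of trees needed.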
2.4. Hyperparameter Tuning
2.5. SHAP
3. Results
3.1. Binary Classifiers and Hyperparameter Tuning
- max_depth: This is the maximum allowed depth of each individual Decision Tree; too large or too small values can lead to overfitting or underfitting, respectively [48].
- n_estimators: This is the number of individual Decision Trees used in Gradient Boosting.
- min_samples_leaf: This is the minimum number of samples that must remain in an end node (leaf) of each individual tree.
- learning_rate: This controls the contribution of each individual tree, as shown in Algorithm 1. If the value is too large, the algorithm might overfit; however, a lower learning rate has the trade-off that more trees are required to reach the desired accuracy.
3.2. Feature Importance
- Distinction between Red (ULS) and Black (Collapse): As can be seen from the left part of Figure 4a, the most crucial factor overall for the Collapse Limit State is the presence of soft storeys and/or short columns, with a weight of approximately . The presence of regular infill panel walls, however, has an effect that is almost equal in magnitude but positive, which is why the corresponding bar in the figure is hatched. This is an important feature that helped prevent structures that crossed the ULS from crossing the CLS as well. Finally, the absence of design seismic codes, the number of storeys in the structure, and the presence of an irregular plan all play important roles for this damage threshold. The right part of this figure displays an important distinction, as the absence of design seismic codes is now the dominant feature, even if only slightly. This can be explained as follows. The absence of design seismic codes is indeed a crucial factor, as is well known in the literature, and the model assigns high SHAP values to it. However, not many structures were affected by this feature. Of the 452 structures in our dataset, only 26 lacked a design seismic code. Of these, 20 (approximately 77%) crossed the ULS, and 19 of those (95%) crossed the CLS as well. Thus, by taking the squares of the SHAP values, as per the right figure of Figure 4a, we assign more weight to these extreme SHAP values even though they pertain to only a limited number of cases. It is important to note that there is no noteworthy difference in the other factors, such as soft storeys/short columns, regularity of the infill panel walls, or structure height, between the left and right subfigures of Figure 4a, as the corresponding SHAP values are more balanced.
- Distinction between Yellow (SLS) and Red (ULS): As can be seen from Figure 4b, the most important features by far are the presence of soft storeys and/or short columns as well as the presence of regular infill wall panels. Soft storeys/short columns had a detrimental effect, accounting for approximately of the total. On the other hand, regular infill wall panels had a beneficial effect of approximately equal magnitude. This is in agreement with the established engineering literature, as brick walls help to reduce storey drift and consequently decrease the overall degree of damage. The absence of design seismic codes did not play an important role in this case, as most structures that displayed this feature crossed the CLS as well, as mentioned above. Pounding, on the other hand, had a contribution of approximately . The height of the structure and potential preexisting poor condition accounted for 7–8% each. Out of the thirteen total features, these five combined to account for approximately of the total in the model’s predictions. Finally, we note that in this case the SHAP values are balanced, as the left and right subfigures show minimal differences.
- Distinction between Green (minimal to no damage) and Yellow (SLS): Finally, the results for the distinction between structures that crossed the SLS (Yellow) and those that suffered minimal to no damage are shown in Figure 4c. It can be seen that the most important factors here are the existence and type of design seismic codes, each of which accounts for approximately of the total. This is in agreement with the post-1985 Greek seismic codes, which enforce lower damage degrees for the same design earthquake. Regular infill panel walls, soft storeys and/or short columns, and the presence of adjacent structures that could lead to pounding were relevant here, although the magnitude of their effect was only approximately .
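The effect described for the Red/Black pair, where squaring the SHAP values raises the rank of a feature that is rare but strongly attributed, can be reproduced with a small numerical sketch. The SHAP values below are invented for illustration and are not taken from the study. The function name is hypothetical, and it is an assumption that the left panels of Figure 4 aggregate absolute SHAP values.

```python
def importance_shares(shap_by_feature, power):
    """Normalized mean of |SHAP|**power per feature.

    power=1 aggregates absolute values; power=2 aggregates squared values.
    """
    means = {
        name: sum(abs(v) ** power for v in values) / len(values)
        for name, values in shap_by_feature.items()
    }
    total = sum(means.values())
    return {name: m / total for name, m in means.items()}


# Hypothetical SHAP values for 452 structures (not from the study):
# one feature is present in only 26 structures but has large attributions,
# the other is present everywhere with small attributions.
n = 452
shap_values = {
    "rare_feature": [2.0] * 26 + [0.0] * (n - 26),
    "common_feature": [0.2] * n,
}

abs_shares = importance_shares(shap_values, power=1)
sq_shares = importance_shares(shap_values, power=2)
```

Under these invented numbers the rare feature holds the smaller share when absolute values are averaged and the larger share when squares are averaged, which mirrors the reordering reported for the absence of design seismic codes.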
4. Summary and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
RVSP | Rapid Visual Screening Procedure
ML | Machine Learning
SLS | Serviceability Limit State
ULS | Ultimate Limit State
CLS | Collapse Limit State
SHAP | SHapley Additive exPlanations
References
1. Palermo, V.; Tsionis, G.; Sousa, M.L. Building Stock Inventory to Assess Seismic Vulnerability Across Europe; Publications Office of the European Union: Luxembourg, 2018.
2. Federal Emergency Management Agency (US). Rapid Visual Screening of Buildings for Potential Seismic Hazards: A Handbook; Government Printing Office: Washington, DC, USA, 2017.
3. Greek Code for Seismic Resistant Structures–EAK. 2000. Available online: https://iisee.kenken.go.jp/worldlist/23_Greece/23_Greece_Code.pdf (accessed on 3 January 2024).
4. Lizundia, B.; Durphy, S.; Griffin, M.; Holmes, W.; Hortacsu, A.; Kehoe, B.; Porter, K.; Welliver, B. Update of FEMA P-154: Rapid visual screening for potential seismic hazards. In Improving the Seismic Performance of Existing Buildings and Other Structures; American Society of Civil Engineers: Reston, VA, USA, 2015; pp. 775–786.
5. Vulpe, A.; Carausu, A.; Vulpe, G.E. Earthquake induced damage quantification and damage state evaluation by fragility and vulnerability models. In Proceedings of the SMiRT 16, Washington, DC, USA, 12–17 August 2001.
6. NEHRP Handbook for the Seismic Evaluation of Existing Buildings. Available online: https://www.preventionweb.net/files/7543_SHARPISDRFLOOR120081209171548.pdf (accessed on 3 January 2024).
7. Rossetto, T.; Elnashai, A. Derivation of vulnerability functions for European-type RC structures based on observational data. Eng. Struct. 2003, 25, 1241–1263.
8. Eleftheriadou, A.; Karabinis, A. Damage probability matrices derived from earthquake statistical data. In Proceedings of the 14th World Conference on Earthquake Engineering, Beijing, China, 12–17 October 2008; pp. 07–0201.
9. Chieffo, N.; Formisano, A.; Lourenço, P.B. Seismic vulnerability procedures for historical masonry structural aggregates: Analysis of the historical centre of Castelpoto (South Italy). Structures 2023, 48, 852–866.
10. Chieffo, N.; Fasan, M.; Romanelli, F.; Formisano, A.; Mochi, G. Physics-based ground motion simulations for the prediction of the seismic vulnerability of masonry building compounds in Mirandola (Italy). Buildings 2021, 11, 667.
11. Scala, S.A.; Gaudio, C.D.; Verderame, G.M. Influence of construction age on seismic vulnerability of masonry buildings damaged after 2009 L’Aquila earthquake. Soil Dyn. Earthq. Eng. 2022, 157, 107199.
12. Scala, S.A.; Gaudio, C.D.; Verderame, G.M. Towards a multi-parametric fragility model for Italian masonry buildings based on the informative level. Structures 2024, 59, 105613.
13. Harirchian, E.; Kumari, V.; Jadhav, K.; Das, R.R.; Rasulzade, S.; Lahmer, T. A Machine Learning Framework for Assessing Seismic Hazard Safety of Reinforced Concrete Buildings. Appl. Sci. 2020, 10, 7153.
14. Sajan, K.; Bhusal, A.; Gautam, D.; Rupakhety, R. Earthquake damage and rehabilitation intervention prediction using machine learning. Eng. Fail. Anal. 2023, 144, 106949.
15. Luo, H.; Paal, S.G. A locally weighted machine learning model for generalized prediction of drift capacity in seismic vulnerability assessments. Comput.-Aided Civ. Infrastruct. Eng. 2019, 34, 935–950.
16. Kazemi, F.; Asgarkhani, N.; Jankowski, R. Machine learning-based seismic response and performance assessment of reinforced concrete buildings. Arch. Civ. Mech. Eng. 2023, 23, 94.
17. Burkart, N.; Huber, M.F. A survey on the explainability of supervised machine learning. J. Artif. Intell. Res. 2021, 70, 245–317.
18. Mangalathu, S.; Karthikeyan, K.; Feng, D.-C.; Jeon, J.-S. Machine-learning interpretability techniques for seismic performance assessment of infrastructure systems. Eng. Struct. 2022, 250, 112883.
19. Futagami, K.; Fukazawa, Y.; Kapoor, N.; Kito, T. Pairwise acquisition prediction with SHAP value interpretation. J. Financ. Data Sci. 2021, 7, 22–44.
20. Karabinis, A. Calibration of Rapid Visual Screening in Reinforced Concrete Structures Based on Data after a Near Field Earthquake (7.9.1999 Athens-Greece). 2004. Available online: https://oasp.gr/sites/default/files/program_documents/261%20-%20Teliki%20ekthesi.pdf (accessed on 20 March 2024).
21. Ruggieri, S.; Cardellicchio, A.; Leggieri, V.; Uva, G. Machine-learning based vulnerability analysis of existing buildings. Autom. Constr. 2021, 132, 103936.
22. Karampinis, I.; Iliadis, L. A Machine Learning Approach for Seismic Vulnerability Ranking. In Proceedings of the International Conference on Engineering Applications of Neural Networks, León, Spain, 14–17 June 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 3–16.
23. Elrahman, S.M.A.; Abraham, A. A review of class imbalance problem. J. Netw. Innov. Comput. 2013, 1, 332–340.
24. Longadge, R.; Dongre, S. Class imbalance problem in data mining review. arXiv 2013, arXiv:1305.1707.
25. Maheshwari, S.; Jain, R.; Jadon, R. A review on class imbalance problem: Analysis and potential solutions. Int. J. Comput. Sci. Issues (IJCSI) 2017, 14, 43–51.
26. Satyasree, K.; Murthy, J. An exhaustive literature review on class imbalance problem. Int. J. Emerg. Trends Technol. Comput. Sci. 2013, 2, 109–118.
27. Bansal, A.; Jain, A. Analysis of focussed under-sampling techniques with machine learning classifiers. In Proceedings of the 2021 IEEE/ACIS 19th International Conference on Software Engineering Research, Management and Applications (SERA), Las Vegas, NV, USA, 22–25 May 2022; IEEE: Piscataway, NJ, USA, 2021; pp. 91–96.
28. Mohammed, R.; Rawashdeh, J.; Abdullah, M. Machine learning with oversampling and undersampling techniques: Overview study and experimental results. In Proceedings of the 2020 11th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 7–9 April 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 243–248.
29. Newaz, A.; Hassan, S.; Haq, F.S. An empirical analysis of the efficacy of different sampling techniques for imbalanced classification. arXiv 2022, arXiv:2208.11852.
30. Hasanin, T.; Khoshgoftaar, T. The effects of random undersampling with simulated class imbalance for big data. In Proceedings of the 2018 IEEE International Conference on Information Reuse and Integration (IRI), Salt Lake City, UT, USA, 6–9 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 70–79.
31. Liu, B.; Tsoumakas, G. Dealing with class imbalance in classifier chains via random undersampling. Knowl.-Based Syst. 2020, 192, 105292.
32. Liu, Y.; Li, X.; Kong, A.W.K.; Goh, C.K. Learning from small data: A pairwise approach for ordinal regression. In Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 6–9 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6.
33. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobotics 2013, 7, 21.
34. Kingsford, C.; Salzberg, S.L. What are decision trees? Nat. Biotechnol. 2008, 26, 1011–1013.
35. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
36. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
37. Zhang, Y.; Haghani, A. A gradient boosting method to improve travel time prediction. Transp. Res. Part C Emerg. Technol. 2015, 58, 308–324.
38. Hawkins, D.M. The problem of overfitting. J. Chem. Inf. Comput. Sci. 2004, 44, 1–12.
39. Bengio, Y. Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 437–478.
40. Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316.
41. Yu, T.; Zhu, H. Hyper-parameter optimization: A review of algorithms and applications. arXiv 2020, arXiv:2003.05689.
42. Snoek, J.; Larochelle, H.; Adams, R.P. Practical Bayesian optimization of machine learning algorithms. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25.
43. Head, T.; MechCoder; Louppe, G.; Shcherbatyi, I.; fcharras; Vinícius, Z.; cmmalone; Schröder, C.; nel215; Campos, N.; et al. scikit-optimize/scikit-optimize: v0.5.2. 2018. Available online: https://zenodo.org/records/1207017 (accessed on 20 March 2024).
44. Shapley, L.S. Notes on the n-Person Game: The Value of an n-Person Game; RAND Corporation: Santa Monica, CA, USA, 1951; Volume 7.
45. Lundberg, S.M.; Erion, G.G.; Lee, S.-I. Consistent individualized feature attribution for tree ensembles. arXiv 2018, arXiv:1802.03888.
46. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
47. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67.
48. Zharmagambetov, A.; Hada, S.S.; Carreira-Perpiñán, M.Á.; Gabidolla, M. An experimental comparison of old and new decision tree algorithms. arXiv 2019, arXiv:1911.03054.
49. Browne, M. Cross-validation methods. J. Math. Psychol. 2000, 44, 108–132.
50. Ferrer, L. Analysis and comparison of classification metrics. arXiv 2022, arXiv:2209.05355.
Pair | Damage Threshold | Number of Structures | Samples in the Transformed Dataset |
---|---|---|---|
(Green, Yellow) | Serviceability Limit State | (92, 69) | 6348 |
(Yellow, Red) | Ultimate Limit State | (69, 102) | 7038 |
(Red, Black) | Collapse Limit State | (102, 90) | 9180 |
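The sample counts in the last column equal the product of the two class sizes (92 × 69 = 6348, 69 × 102 = 7038, 102 × 90 = 9180), which is what a pairwise transformation produces when every structure of one class is paired with every structure of the adjacent class. The sketch below illustrates one such construction. The use of feature differences, the alternating subtraction order that yields both −1 and +1 labels, and all names are assumptions, not a description of the authors' code.

```python
from itertools import product


def pairwise_transform(lower, higher):
    """Build a binary dataset from two adjacent damage classes.

    Assumption: each (lower, higher) pair gives one sample, a feature
    difference, and the subtraction order alternates so both labels occur.
    """
    samples = []
    for k, (a, b) in enumerate(product(lower, higher)):
        if k % 2 == 0:
            # More damaged minus less damaged structure.
            diff, label = [bi - ai for ai, bi in zip(a, b)], +1
        else:
            # Reversed order, hence the opposite label.
            diff, label = [ai - bi for ai, bi in zip(a, b)], -1
        samples.append((diff, label))
    return samples


# Hypothetical binary feature vectors: (soft storey, regular infill walls).
green = [[0, 1], [0, 1], [1, 1]]
yellow = [[1, 0], [1, 1]]
pairs = pairwise_transform(green, yellow)  # 3 * 2 = 6 samples
```

Applied to the class sizes in the table, the same construction yields 6348, 7038, and 9180 samples for the three pairs.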
Hyperparameter | Tuning Range | Optimal Value (Green, Yellow) | Optimal Value (Yellow, Red) | Optimal Value (Red, Black) |
---|---|---|---|---|
max_depth | [3, 11] | 3 | 5 | 3 |
n_estimators | [50, 300] | 297 | 50 | 293 |
min_samples_leaf | [1, 10] | 9 | 8 | 10 |
learning_rate | [0.05, 0.25] | 0.086887 | 0.120314 | 0.182278 |
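The tuning ranges and optima in the table can be written down as a plain configuration. This makes it easy to check that every reported optimum lies inside its range and to see which ones sit on a range boundary. The sketch below only transcribes the table. The dictionary layout and helper names are hypothetical, and although the keys coincide with constructor arguments of scikit-learn's `GradientBoostingClassifier`, whether the authors used that class is an assumption.

```python
# Tuning ranges (inclusive bounds) transcribed from the table above.
SEARCH_SPACE = {
    "max_depth": (3, 11),
    "n_estimators": (50, 300),
    "min_samples_leaf": (1, 10),
    "learning_rate": (0.05, 0.25),
}

# Reported optimal values per pair, transcribed from the same table.
OPTIMA = {
    "(Green, Yellow)": {"max_depth": 3, "n_estimators": 297,
                        "min_samples_leaf": 9, "learning_rate": 0.086887},
    "(Yellow, Red)": {"max_depth": 5, "n_estimators": 50,
                      "min_samples_leaf": 8, "learning_rate": 0.120314},
    "(Red, Black)": {"max_depth": 3, "n_estimators": 293,
                     "min_samples_leaf": 10, "learning_rate": 0.182278},
}


def within_space(params):
    """True if every value lies inside its tuning range."""
    return all(SEARCH_SPACE[n][0] <= v <= SEARCH_SPACE[n][1] for n, v in params.items())


def boundary_hits(params):
    """Names of hyperparameters whose value equals a bound of its range."""
    return sorted(n for n, v in params.items() if v in SEARCH_SPACE[n])
```

All three configurations fall inside the search space. Several optima coincide with a bound (for example, `n_estimators = 50` for the (Yellow, Red) pair), which may be worth keeping in mind when reading the tuned values.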
Metric | (Green, Yellow): −1 | (Green, Yellow): +1 | (Yellow, Red): −1 | (Yellow, Red): +1 | (Red, Black): −1 | (Red, Black): +1 |
---|---|---|---|---|---|---|
Precision | 0.69585 | 0.76301 | 0.90943 | 0.86933 | 0.93992 | 0.90379 |
Recall | 0.73221 | 0.72928 | 0.84995 | 0.92183 | 0.88501 | 0.95024 |
F1-score | 0.71357 | 0.74576 | 0.87869 | 0.89481 | 0.91164 | 0.92644 |
Accuracy (per pair) | 0.73062 | | 0.88732 | | 0.91972 | |
AUC (per pair) | 0.81451 | | 0.95232 | | 0.98128 | |
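As a consistency check on the table, the F1-score is the harmonic mean of precision and recall, so each reported F1 value can be recomputed from the two rows above it. The short sketch below does this for all six columns. The variable names are hypothetical. Small deviations are expected from rounding, and would also arise if the scores were averaged over cross-validation folds, which is an assumption here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per pair and class label, from the table.
REPORTED = {
    ("Green, Yellow", -1): (0.69585, 0.73221, 0.71357),
    ("Green, Yellow", +1): (0.76301, 0.72928, 0.74576),
    ("Yellow, Red", -1): (0.90943, 0.84995, 0.87869),
    ("Yellow, Red", +1): (0.86933, 0.92183, 0.89481),
    ("Red, Black", -1): (0.93992, 0.88501, 0.91164),
    ("Red, Black", +1): (0.90379, 0.95024, 0.92644),
}

# Largest absolute difference between recomputed and reported F1.
max_gap = max(abs(f1_score(p, r) - f1) for p, r, f1 in REPORTED.values())
```

The recomputed values agree with the reported F1-scores to within rounding, so the three per-class rows are mutually consistent.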
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Karampinis, I.; Iliadis, L.; Karabinis, A. Rapid Visual Screening Feature Importance for Seismic Vulnerability Ranking via Machine Learning and SHAP Values. Appl. Sci. 2024, 14, 2609. https://doi.org/10.3390/app14062609