A Hybrid Deep Learning Model for Enhanced Structural Damage Detection: Integrating ResNet50, GoogLeNet, and Attention Mechanisms
Abstract
1. Introduction
2. System Model
2.1. Convolutional Neural Networks (CNNs)
2.2. ResNet50
2.3. GoogLeNet
3. Proposed Model
3.1. Convolutional Blocks
- Conv Block 1: The initial convolutional block uses 32 filters of size 3 × 3 to process the input image. This is followed by a ReLU activation function, which introduces non-linearity and enables the model to learn more complex patterns. This block primarily detects fundamental features such as edges and simple textures, as illustrated in Figure 6b, which serve as a foundation for the deeper layers of the network.
- Conv Block 2: The output of the first block is passed to the second convolutional block, which applies 64 filters of size 3 × 3, again followed by a ReLU activation function. This block builds upon the basic features captured in the first block, allowing the model to detect more complex structures and patterns within the images, such as corners and intricate textures. The refined edge detection result at this stage is shown in Figure 6c.
- Conv Block 3: The third convolutional block further increases the complexity of feature extraction by applying 128 filters of size 3 × 3, followed by ReLU activation. This block is crucial for capturing high-level features directly related to structural damage, such as cracks, fractures, and deformations in building structures.
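For clarity, a minimal Keras sketch of these three blocks is given below. The 224 × 224 × 3 input size and the max-pooling between blocks are illustrative assumptions; the text above specifies only the filter counts, kernel sizes, and ReLU activations.

```python
from tensorflow.keras import layers

# Input size is a hypothetical choice; the section states only the filters.
inputs = layers.Input(shape=(224, 224, 3))

# Conv Block 1: 32 filters of size 3x3 + ReLU -- edges and simple textures
x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)  # assumed downsampling between blocks

# Conv Block 2: 64 filters of size 3x3 + ReLU -- corners, finer textures
x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)

# Conv Block 3: 128 filters of size 3x3 + ReLU -- damage-related features
x = layers.Conv2D(128, (3, 3), padding="same", activation="relu")(x)
```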
3.2. Residual Block
3.3. Inception Module
3.4. Attention Mechanism (CBAM)
- Channel Attention: This component enhances the model's ability to focus on the most informative feature channels within the feature map. The channel attention mechanism operates by applying both global average pooling and global max pooling across the spatial dimensions of the input feature map $F$. This generates two separate context descriptors, which are then processed through a shared multi-layer perceptron (MLP) to capture channel-wise dependencies and create the final channel attention map. The steps of this process are detailed below. First, we apply global average pooling and global max pooling operations to $F$, generating two channel-wise statistics, $F^{c}_{\mathrm{avg}}$ and $F^{c}_{\mathrm{max}}$:
  $$F^{c}_{\mathrm{avg}} = \mathrm{AvgPool}(F), \qquad F^{c}_{\mathrm{max}} = \mathrm{MaxPool}(F)$$
  Next, these descriptors are passed through a shared MLP consisting of two fully connected layers, which generates intermediate feature representations for both the average-pooled and max-pooled inputs. The two MLP outputs are then summed element-wise and passed through a sigmoid function $\sigma$ to produce the final channel attention map:
  $$M_{c}(F) = \sigma\big(\mathrm{MLP}(F^{c}_{\mathrm{avg}}) + \mathrm{MLP}(F^{c}_{\mathrm{max}})\big)$$
  To further expand on the shared MLP, we can represent it as a sequence of two fully connected layers. If the intermediate representation has $d$ dimensions, we can define the MLP as
  $$\mathrm{MLP}(x) = W_{1}\,\delta(W_{0}x + b_{0}) + b_{1}$$
  where $W_{0}$ and $W_{1}$ are the weight matrices of the first and second fully connected layers, respectively, $b_{0}$ and $b_{1}$ are the corresponding bias terms, and $\delta$ is the ReLU activation function, which introduces non-linearity after the first fully connected layer. Finally, the channel attention map is used to reweight the original feature map by element-wise multiplication:
  $$F' = M_{c}(F) \otimes F$$
- Spatial Attention: Following channel attention, spatial attention is applied to focus on the most critical spatial locations within the feature map. This component applies average and max pooling across the channel dimension and then a convolutional layer to generate the spatial attention map, as in Figure 7c. In this way, the mechanism emphasizes the most significant spatial regions of the feature map.
- Sequential Dual Attention Application: The dual attention mechanism operates sequentially within both architectures. Each feature map is first processed by the channel attention module to prioritize significant channels and then by the spatial attention module to refine the focus on relevant regions within the image. This two-step attention approach enables the model to capture intricate damage patterns by focusing both on the most meaningful channels and specific spatial locations within each feature map.
- Technical Feasibility: The CBAM is a lightweight module and, when applied after each main block in ResNet50 and GoogLeNet, adds minimal computational overhead. This design choice allows the model to leverage dual attention without a substantial increase in processing time or resource requirements, maintaining the efficiency of ResNet50 and GoogLeNet. The final architecture thus benefits from enriched feature representation, with an enhanced capacity for damage localization and detection.
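As a concrete illustration of this sequential channel-then-spatial design, the following Keras sketch implements a CBAM-style block. It is a minimal sketch, not the authors' exact implementation: the reduction ratio of 16 and the 7 × 7 spatial kernel follow the defaults of the original CBAM paper, since this section does not state the values used.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ChannelAttention(layers.Layer):
    """Channel attention map M_c(F): which channels to emphasize."""
    def __init__(self, channels, reduction=16):  # reduction=16 assumed (CBAM default)
        super().__init__()
        # Shared two-layer MLP applied to both pooled descriptors
        self.mlp = tf.keras.Sequential([
            layers.Dense(channels // reduction, activation="relu"),
            layers.Dense(channels),
        ])

    def call(self, x):
        avg = self.mlp(tf.reduce_mean(x, axis=[1, 2]))  # global average pooling
        mx = self.mlp(tf.reduce_max(x, axis=[1, 2]))    # global max pooling
        scale = tf.sigmoid(avg + mx)[:, None, None, :]  # M_c(F), shape (B,1,1,C)
        return x * scale                                # F' = M_c(F) (x) F

class SpatialAttention(layers.Layer):
    """Spatial attention map M_s(F): which locations to emphasize."""
    def __init__(self, kernel_size=7):  # 7x7 kernel assumed (CBAM default)
        super().__init__()
        self.conv = layers.Conv2D(1, kernel_size, padding="same",
                                  activation="sigmoid")

    def call(self, x):
        avg = tf.reduce_mean(x, axis=-1, keepdims=True)  # channel-wise average
        mx = tf.reduce_max(x, axis=-1, keepdims=True)    # channel-wise max
        return x * self.conv(tf.concat([avg, mx], -1))   # F'' = M_s(F') (x) F'

def cbam_block(x, channels):
    """Sequential dual attention: channel first, then spatial."""
    x = ChannelAttention(channels)(x)
    return SpatialAttention()(x)
```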
3.5. Fully Connected Layers and Output
- Fully Connected Layer 1: This layer consists of 256 units utilizing the ReLU activation function, with a dropout rate of 50% incorporated to mitigate overfitting during training. It plays a crucial role in combining the features extracted by the previous layers and preparing them for the final classification.
- Fully Connected Layer 2: This layer contains 128 units with ReLU activation. It further reduces the dimensionality of the feature maps, ensuring that only the most relevant features are passed on to the output layer.
- Output Layer: The final layer of the network is a softmax output layer with two units, corresponding to a binary classification task of determining whether a structure is “damaged” or “undamaged”. The softmax function ensures that the outputs are interpretable as probabilities, summing to 1 across the two classes.
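A minimal sketch of this classifier head is shown below, assuming the incoming feature map is flattened before the dense layers (the exact pooling or flattening step is not stated above, and the 7 × 7 × 512 input shape is hypothetical).

```python
from tensorflow.keras import layers, models

def classification_head(features):
    """Map extracted features to the two damage classes."""
    x = layers.Flatten()(features)                   # assumed flattening step
    x = layers.Dense(256, activation="relu")(x)      # Fully Connected Layer 1
    x = layers.Dropout(0.5)(x)                       # 50% dropout vs. overfitting
    x = layers.Dense(128, activation="relu")(x)      # Fully Connected Layer 2
    return layers.Dense(2, activation="softmax")(x)  # "damaged" vs. "undamaged"

# Example with a feature map of hypothetical shape 7x7x512
features = layers.Input(shape=(7, 7, 512))
model = models.Model(features, classification_head(features))
```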
4. Workflow of the System Model
4.1. Data Collection and Filtering
- Bing Images: Bing’s search engine was utilized to retrieve images using targeted keywords like “earthquake structural damage”, “structural cracks”, and “building deformation”. The search was broadened with additional terms such as “flood-damaged buildings”, “tornado-damaged structures”, and “infrastructure failure”.
- Kaggle Dataset: A specialized dataset from Kaggle was employed (https://www.kaggle.com/datasets/arnavr10880/concrete-crack-images-for-classification/data (accessed on 15 January 2024)), focusing on images of concrete slabs, both cracked and uncracked, relevant for early damage detection.
- Public Repositories: Additional relevant images were sourced from public datasets, including the “Structural-Damage Image Captioning Dataset” repository (https://jstagedata.jst.go.jp/articles/dataset/Structural-Damage_Image_Captioning_Dataset/24736914 (accessed on 15 January 2024)), known for its well-curated collection categorized by damage type.
4.1.1. Outlier Detection and Removal
- Visual Inspection: A manual review was conducted to eliminate irrelevant, low-quality, or mislabeled images. Special attention was given to ensuring that undamaged structure images did not contain features resembling cracks.
- Statistical Analysis: Statistical analysis of pixel intensity distributions was performed. Images with significant deviations were flagged as potential outliers, and further feature-based analysis was used to detect anomalies, which were then removed to maintain data consistency.
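As an illustration of this pixel-intensity screening step, the sketch below flags images whose mean intensity deviates strongly from the rest of the dataset. The 3-sigma cutoff is an illustrative choice, not a value stated in the paper.

```python
import numpy as np

def flag_intensity_outliers(images, z_thresh=3.0):
    """Return a boolean mask marking images with atypical mean pixel intensity.

    `images` is a sequence of arrays of shape (H, W) or (H, W, C);
    z_thresh=3.0 is an assumed cutoff for this sketch.
    """
    means = np.array([img.mean() for img in images])
    z = (means - means.mean()) / means.std()  # standardize the per-image means
    return np.abs(z) > z_thresh               # True marks a potential outlier
```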
4.1.2. Data Classification and Standardization
4.2. Data Augmentation
- Rotation: Images were rotated randomly within a fixed angular range to help the model become invariant to the orientation of structural damage; a sample rotated image is shown in Figure 10b.
- Horizontal and Vertical Flipping: Both horizontal and vertical flips were applied to ensure that the model learned to recognize damage patterns irrespective of their orientation; a sample flipped image is shown in Figure 10c.
- Zooming: Zooming operations were applied randomly within a range of 0.8 to 1.2 times the original size, allowing the model to handle scale variation in damage features; a sample zoomed image is shown in Figure 10d.
- Translation: Images were translated horizontally and vertically by up to 10% to enable the model to detect damage appearing in different image locations; a sample translated image is shown in Figure 10e.
- Brightness Adjustment: In image-based structural damage detection, variations in illumination can significantly affect model accuracy, as differences in lighting alter the contrast and visibility of critical features like cracks and deformations [32]. Rather than relying on preprocessing techniques such as histogram equalization or normalization to adjust illumination, our approach introduces controlled brightness variation directly through data augmentation. Specifically, image brightness was adjusted within a range of 0.8 to 1.2, exposing the model to a spectrum of lighting conditions during training. This strategy helps the model generalize to real-world scenarios where lighting varies, enhancing robustness in detecting structural damage across diverse environments without additional preprocessing steps. A sample brightness-adjusted image is shown in Figure 10f.
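The augmentations above can be expressed compactly with Keras's ImageDataGenerator, as sketched below. The zoom, translation, flip, and brightness settings follow the values stated in this section; the 15-degree rotation bound is a placeholder, since the exact rotation range is not given here.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,            # hypothetical bound; exact range not stated
    horizontal_flip=True,         # horizontal flipping
    vertical_flip=True,           # vertical flipping
    zoom_range=(0.8, 1.2),        # 0.8x to 1.2x scale variation
    width_shift_range=0.1,        # up to 10% horizontal translation
    height_shift_range=0.1,      # up to 10% vertical translation
    brightness_range=(0.8, 1.2),  # controlled illumination variation
)
```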
4.3. Data Splitting
4.4. Training and Validation
5. Results and Analysis
5.1. Performance of Pre-Trained Models
5.1.1. ResNet50 Performance
5.1.2. GoogLeNet Performance
5.1.3. Performance of Proposed Model
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Smith, J.; Doe, J. Challenges in Post-Disaster Structural Damage Assessment. Int. J. Disaster Manag. 2022, 15, 230–245. [Google Scholar]
- Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3523–3542. [Google Scholar] [CrossRef] [PubMed]
- Jiang, Z.; Li, Y.; Zhang, Y. Deep Learning Techniques for Automated Structural Damage Detection in Buildings. J. Comput. Civ. Eng. 2020, 34, 04020026. [Google Scholar]
- Wu, C.; Wong, K.; Lam, H. Automated Detection of Structural Damage in Buildings Using Convolutional Neural Networks. Eng. Struct. 2020, 209, 110020. [Google Scholar]
- Lee, H.; Kim, S. Crack Detection in Building Facades Using Convolutional Neural Networks. J. Struct. Eng. 2019, 145, 04019019. [Google Scholar]
- Yang, L.; Zhang, X.; Sun, J. Deep Learning-Based Crack Detection in UAV Images for Infrastructure Inspection. Autom. Constr. 2020, 109, 102992. [Google Scholar]
- Sun, X.; Hou, C.; Zhu, L. Deep Learning-Based Automated Inspection of Concrete Structures Using UAV. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 6411–6416. [Google Scholar]
- Zhang, Q.; Yang, J. Recent Advances in Image-Based Structural Damage Detection: A Review. J. Build. Eng. 2021, 35, 102112. [Google Scholar]
- Yuqing, Z.; Khalid, M. Structural Damage Recognition Using VGG16 Model. In Proceedings of the International Conference on Image Processing, Anchorage, AK, USA, 19–22 September 2021; pp. 234–238. [Google Scholar]
- Cao, W.; Li, Y.; He, Z. Crack Detection in Gusset Plate Welded Joints of Steel Bridges Using Deep Learning. J. Bridge Eng. 2021, 26, 04021001. [Google Scholar]
- Xiuying, W.; Li, X. Concrete Crack Detection Using ResNet101-Based Image Segmentation. Autom. Constr. 2020, 113, 103136. [Google Scholar]
- Zheng, J.; Wang, H.; Li, J. Rail Surface Crack Detection Using YOLOv3 and RetinaNet. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3421–3430. [Google Scholar]
- Guo, R.; Zhou, J. Pavement Crack Detection Using YOLOv5. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4301–4310. [Google Scholar]
- Kong, X.; Li, J. Vision-Based Metal Fatigue Crack Detection Using Video Feature Tracking. IEEE Trans. Ind. Electron. 2020, 67, 4885–4893. [Google Scholar]
- Wilson, T.; Diogo, R. Deep Learning for Structural Damage Detection: A Case Study Using VGG16. Eng. Struct. 2021, 226, 111322. [Google Scholar]
- Wan, S.; Guan, S.; Tang, Y. Advancing Bridge Structural Health Monitoring: Insights into Knowledge-Driven and Data-Driven Approaches. J. Data Sci. Intell. Syst. 2023, 2, 129–140. [Google Scholar] [CrossRef]
- Khan, I.U.; Jeong, S.; Sim, S.H. Investigation of Issues in Data Anomaly Detection Using Deep-Learning- and Rule-Based Classifications for Long-Term Vibration Measurements. Appl. Sci. 2024, 14, 5476. [Google Scholar] [CrossRef]
- Mohamed, A.; El-Saadawi, M.; Sayed, T. Disaster Resilience: Deep Learning Applications in Post-Earthquake Structural Assessment. Nat. Hazards Rev. 2021, 22, 04020045. [Google Scholar]
- Rathinam, S.; Madhavan, P.; Sivaramakrishnan, C. Deep Learning for Post-Disaster Structural Damage Detection: A Review. J. Build. Pathol. Rehabil. 2020, 5, 1–16. [Google Scholar]
- Shabbir, A.; Ali, N.; Jameel, A.; Zafar, B.; Rasheed, A.; Sajid, M.; Ahmed, A.; Dar, S. Satellite and Scene Image Classification Based on Transfer Learning and Fine Tuning of ResNet50. Math. Probl. Eng. 2021, 2021. [Google Scholar] [CrossRef]
- Feng, C.; Zhang, H.; Wang, S.; Li, Y.; Wang, H.; Yan, F. Structural damage detection using deep convolutional neural network and transfer learning. KSCE J. Civ. Eng. 2019, 23, 4493–4502. [Google Scholar] [CrossRef]
- Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML), Omnipress 2010, Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Zagoruyko, S.; Komodakis, N. Wide Residual Networks. In Proceedings of the British Machine Vision Conference (BMVC), York, UK, 19–22 September 2016. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Civera, M.; Fragonara, L.Z.; Surace, C. Video Processing Techniques for the Contactless Investigation of Large Oscillations. J. Phys. Conf. Ser. 2019, 1249, 012004. [Google Scholar] [CrossRef]
- Prechelt, L. Early stopping-but when? Neural Netw. Tricks Trade 1998, 1524, 55–69. [Google Scholar]
- Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2011, 2, 37–63. [Google Scholar]