A C-BiLSTM Approach to Classify Construction Accident Reports
Abstract
1. Introduction
2. Related Works
2.1. Text Mining and Machine Learning
2.2. Existing Studies on Accident Narrative Classification
2.3. LSTM, BiLSTM, and C-BiLSTM
2.4. Performance Metrics
3. C-BiLSTM-Based Classification Framework
3.1. Data Pre-Processing
3.2. Word Embedding
3.3. One-Dimensional Convolutional Layer
3.4. BiLSTM
4. Results and Discussions
4.1. Data Description
4.2. Baseline Models
4.3. Experiment Results
4.4. Discussions and Future Work
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
Field | Content |
---|---|
Title | Employee is Found Dead after Exposure to Chlorine |
Summary | On 27 June 2008 employee #1 and a coworker were performing mold inspections in an army barrack. A contractor was spraying a 6 to 1 mixture of bleach and water. Employee #1 complained of chest pains and the odor of chlorine later that evening. He was found dead in his hotel room the following day. |
Label | exposure to chemical substances |
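The following is a minimal Python sketch showing how a report like the one above might be turned into a token list and an integer label before word embedding. It maps labels to ids using the eleven categories in the category table that follows. The specific steps are illustrative assumptions, not the authors' documented pre-processing:

- joining the title and summary;
- lower-casing;
- keeping only alphabetic tokens;
- removing stop words.

The abbreviated `STOP_WORDS` set and the helper names `preprocess` and `encode_report` are hypothetical. A real pipeline would more likely use a full stop-word list such as NLTK's.

```python
import re

# The eleven categories from the category table, in table order (0-based ids).
LABELS = [
    "caught in/between objects", "collapse of object", "electrocution",
    "exposure to chemical substances", "exposure to extreme temperatures",
    "falls", "fires and explosion", "struck by falling object",
    "struck by moving objects", "traffic", "other",
]
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}

# Hypothetical, deliberately short stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "after", "is", "the", "of", "in", "on", "to",
              "was", "were", "he", "his", "him"}


def preprocess(title, summary):
    """Join title and summary, lower-case, keep alphabetic tokens, drop stop words."""
    text = f"{title} {summary}".lower()
    tokens = re.findall(r"[a-z]+", text)  # digits and symbols such as "#1" are discarded
    return [t for t in tokens if t not in STOP_WORDS]


def encode_report(report):
    """Convert a dict with 'title', 'summary', 'label' into (tokens, label_id)."""
    tokens = preprocess(report["title"], report["summary"])
    return tokens, LABEL_TO_ID[report["label"].lower()]


sample = {
    "title": "Employee is Found Dead after Exposure to Chlorine",
    "summary": "A contractor was spraying a 6 to 1 mixture of bleach and water.",
    "label": "exposure to chemical substances",
}
tokens, label_id = encode_report(sample)
print(tokens)    # ['employee', 'found', 'dead', 'exposure', 'chlorine', 'contractor', 'spraying', 'mixture', 'bleach', 'water']
print(label_id)  # 3
```

The sample summary is shortened from the report above to keep the output readable.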
No. | Label | Description | Count | % |
---|---|---|---|---|
1 | Caught In/Between Objects | Fractures, cuts, lacerations, or amputations caused when a worker is caught in between objects, generally referring to hand tools. | 95 | 5% |
2 | Collapse of Object | Cases resulting from structural failure. | 258 | 14% |
3 | Electrocution | Direct electric shock or any burns caused by electrical faults. | 270 | 14% |
4 | Exposure to Chemical Substances | Worker comes into contact with toxic/corrosive chemical substances. | 109 | 6% |
5 | Exposure to Extreme Temperatures | Extreme temperatures caused by frost, hot liquid, or gases (this category includes hypothermia). | 92 | 5% |
6 | Falls | Slip or trip cases and cases where a victim falls from a high elevation (not due to structural failure). | 293 | 16% |
7 | Fires and Explosion | Injuries caused by direct fires and explosion (not including electrical burns). | 173 | 9% |
8 | Struck by Falling Object | Victim is hit by a falling object (which is not a result of structural failure). | 124 | 7% |
9 | Struck by Moving Objects | Victim is hit by a moving object (which is not in free fall). | 164 | 9% |
10 | Traffic | Injury occurs while a worker is driving a vehicle or when a moving vehicle strikes a worker. | 169 | 9% |
11 | Other | Cases that do not fall into any of the above categories. Infrequent categories (e.g., drowning, suffocation) are merged here because they occur too rarely to stand alone. | 116 | 6% |
Total | | | 1863 | 100% |
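The percentages in the table follow directly from the counts. The counts also show roughly a 3:1 imbalance between the largest class (Falls, 293) and the smallest (Exposure to Extreme Temperatures, 92).

The sketch below recomputes the rounded percentages. It also derives inverse-frequency ("balanced") class weights, total / (n_classes × count), which are one common way to counter such imbalance in a training loss. Whether the authors weighted classes at all is not stated here. The weighting scheme and the function names `percentages` and `balanced_weights` are assumptions for illustration.

```python
# Category counts copied from the category table.
COUNTS = {
    "Caught In/Between Objects": 95,
    "Collapse of Object": 258,
    "Electrocution": 270,
    "Exposure to Chemical Substances": 109,
    "Exposure to Extreme Temperatures": 92,
    "Falls": 293,
    "Fires and Explosion": 173,
    "Struck by Falling Object": 124,
    "Struck by Moving Objects": 164,
    "Traffic": 169,
    "Other": 116,
}


def percentages(counts):
    """Share of each class in the dataset, rounded to whole percent as in the table."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


def balanced_weights(counts):
    """Inverse-frequency weights total / (n_classes * count); an assumed scheme, not from the paper."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


pct = percentages(COUNTS)
weights = balanced_weights(COUNTS)
print(sum(COUNTS.values()))          # 1863
print(pct["Falls"], pct["Traffic"])  # 16 9
print(round(weights["Falls"], 2), round(weights["Exposure to Extreme Temperatures"], 2))  # 0.58 1.84
```

With these weights, every class contributes the same total weight (total / n_classes). As a result, rare classes are no longer swamped by Falls or Electrocution during training.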
Label (F1 score) | C-BiLSTM (BERT) | BERT | C-BiLSTM (Word2vec) | BiLSTM | LSTM | CNN | SVM | NB | LR |
---|---|---|---|---|---|---|---|---|---|
Caught in/between objects | 0.83 | 0.82 | 0.74 | 0.71 | 0.71 | 0.68 | 0.73 | 0.24 | 0.71 |
Collapse of object | 0.66 | 0.63 | 0.57 | 0.56 | 0.54 | 0.52 | 0.51 | 0.38 | 0.45 |
Electrocution | 0.96 | 0.96 | 0.96 | 0.94 | 0.93 | 0.87 | 0.90 | 0.88 | 0.92 |
Exposure to chemical substances | 0.84 | 0.81 | 0.79 | 0.80 | 0.78 | 0.75 | 0.71 | 0.71 | 0.77 |
Exposure to extreme temperatures | 0.83 | 0.84 | 0.82 | 0.82 | 0.77 | 0.74 | 0.72 | 0.55 | 0.77 |
Falls | 0.83 | 0.80 | 0.81 | 0.78 | 0.77 | 0.72 | 0.75 | 0.72 | 0.73 |
Struck by moving objects | 0.65 | 0.66 | 0.64 | 0.64 | 0.59 | 0.49 | 0.57 | 0.43 | 0.53 |
Struck by falling objects | 0.71 | 0.72 | 0.72 | 0.71 | 0.67 | 0.66 | 0.63 | 0.07 | 0.48 |
Traffic | 0.83 | 0.81 | 0.77 | 0.75 | 0.77 | 0.75 | 0.81 | 0.69 | 0.74 |
Fires and explosions | 0.95 | 0.93 | 0.92 | 0.89 | 0.90 | 0.88 | 0.80 | 0.85 | 0.84 |
Others | 0.78 | 0.78 | 0.72 | 0.66 | 0.67 | 0.64 | 0.55 | 0.20 | 0.63 |
Weighted average | 0.81 | 0.80 | 0.78 | 0.76 | 0.75 | 0.71 | 0.71 | 0.58 | 0.69 |
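The weighted-average row presumably weights each class's F1 score by its support, i.e. its number of test examples, as scikit-learn's classification report does. Test-set supports are not reported here. The sketch below therefore assumes they are proportional to the full-dataset counts in the category table. Under that assumption, the per-class scores of C-BiLSTM (BERT) give about 0.81, matching the table.

The sketch also includes the standard F1 definition, the harmonic mean of precision and recall. The names `f1_score`, `weighted_f1`, and `SUPPORT` are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(per_class_f1, support):
    """Support-weighted mean of per-class F1 scores."""
    total = sum(support[label] for label in per_class_f1)
    return sum(per_class_f1[label] * support[label] for label in per_class_f1) / total


# Per-class F1 of C-BiLSTM (BERT), copied from the results table.
F1_CBILSTM_BERT = {
    "Caught in/between objects": 0.83, "Collapse of object": 0.66,
    "Electrocution": 0.96, "Exposure to chemical substances": 0.84,
    "Exposure to extreme temperatures": 0.83, "Falls": 0.83,
    "Struck by moving objects": 0.65, "Struck by falling objects": 0.71,
    "Traffic": 0.83, "Fires and explosions": 0.95, "Others": 0.78,
}

# Assumed supports: full-dataset counts from the category table (test-set counts are not given).
SUPPORT = {
    "Caught in/between objects": 95, "Collapse of object": 258,
    "Electrocution": 270, "Exposure to chemical substances": 109,
    "Exposure to extreme temperatures": 92, "Falls": 293,
    "Struck by moving objects": 164, "Struck by falling objects": 124,
    "Traffic": 169, "Fires and explosions": 173, "Others": 116,
}

print(round(weighted_f1(F1_CBILSTM_BERT, SUPPORT), 2))  # 0.81
```

Because supports are approximated, other columns may differ from the table in the second decimal place. For example, the NB column comes out near 0.57 rather than 0.58.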
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zhang, J.; Zi, L.; Hou, Y.; Deng, D.; Jiang, W.; Wang, M. A C-BiLSTM Approach to Classify Construction Accident Reports. Appl. Sci. 2020, 10, 5754. https://doi.org/10.3390/app10175754