Effectiveness of Natural Language Processing Based Machine Learning in Analyzing Incident Narratives at a Mine
Abstract
1. Introduction
2. Importance of This Paper
3. Research Methodology
3.1. MSHA Accident Database
3.2. Random Forest Classifier
- Changing case to lower case.
- Removal of specific words: Acronyms common in MSHA databases were removed, along with a custom list of “stop words”. Stop words are tokens such as stray characters, punctuation marks, and common words that may not add value. Stop word lists are available from several toolkits; this paper used a modified version of the list available from NLTK [29].
- Lemmatizing: This was done using the lemmatizer in the spaCy [30] toolkit. Lemmatizing reduces related word forms to their foundational word (the lemma), so that related words are not considered separately. For example, consider the two sentences, “He was pushing a cart when he got hurt” and “He got hurt as he pushed a cart”. The lemmatizer would provide “push” as the lemma for both “pushing” and “pushed”, and “push” would replace “pushed” and “pushing” in the narrative.
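The three pre-processing steps above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the pipeline used in the paper: the stop-word set, the acronym set, and the lemma table below are tiny hypothetical stand-ins for the modified NLTK list, the MSHA acronym list, and the spaCy lemmatizer. The tokenizer is also a simplification that keeps alphabetic tokens only.

```python
import re

# Hypothetical stand-ins for illustration only. The paper used a modified
# NLTK stop-word list, a list of MSHA acronyms, and the spaCy lemmatizer.
STOP_WORDS = {"he", "was", "a", "when", "got", "as"}
ACRONYMS = {"ee"}  # assumed example of a database acronym
LEMMAS = {"pushing": "push", "pushed": "push"}


def preprocess(narrative):
    """Lower-case, drop stop words and acronyms, then map words to lemmas."""
    text = narrative.lower()                      # step 1: change case
    tokens = re.findall(r"[a-z]+", text)          # simplified tokenizer
    kept = [t for t in tokens
            if t not in STOP_WORDS and t not in ACRONYMS]  # step 2: removal
    return [LEMMAS.get(t, t) for t in kept]       # step 3: lemmatize


first = preprocess("He was pushing a cart when he got hurt")
second = preprocess("He got hurt as he pushed a cart")
print(first)   # ['push', 'cart', 'hurt']
print(second)  # ['hurt', 'push', 'cart']
```

After pre-processing, both example sentences contain the same three words, which is the point of lemmatizing: “pushing” and “pushed” are no longer counted as separate words.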
- Over-exertion in lifting objects (OEL).
- Over-exertion in pulling or pushing objects (OEP).
- Fall to the walkway or working surface (FWW).
- Caught in, under or between a moving and a stationary object (CIMS), and
- Struck by flying object (SFO).
4. Results
4.1. Performance within MSHA Data
- Total samples (n_samples): 40,649
- Total samples in target category (n_target): 8979
- Total samples in other categories (n_other): n_samples − n_target = 31,670
- Samples from target category predicted accurately (n_target_accurate): 7248
- Samples from other category predicted wrongly as target (false_predicts): 1331
- Samples from other category predicted correctly as other (other_accurate): 31,670 − 1331 = 30,339
- Percentage of targets accurately predicted: 100 × n_target_accurate/n_target = 100 × 7248/8979 = 81%
- False positive rate: false_predicts/n_other = 1331/31,670 = 4%
- Total correct predictions (total_correct): n_target_accurate + other_accurate = 7248 + 30,339 = 37,587
- Overall success rate (%) = 100 × total_correct/n_samples = 100 × 37,587/40,649 = 92%
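The arithmetic in the list above can be reproduced with a short script. The variable names follow the list; the four input counts are the ones quoted there, and everything else is derived from them.

```python
# Counts quoted in the list above.
n_samples = 40_649          # total samples
n_target = 8_979            # samples in the target category
n_target_accurate = 7_248   # target samples predicted as target
false_predicts = 1_331      # other samples wrongly predicted as target

# Derived counts.
n_other = n_samples - n_target
other_accurate = n_other - false_predicts
total_correct = n_target_accurate + other_accurate

# Metrics, expressed as percentages.
target_accuracy = 100 * n_target_accurate / n_target
false_positive_rate = 100 * false_predicts / n_other
overall_success = 100 * total_correct / n_samples

print(f"targets accurately predicted: {target_accuracy:.0f}%")
print(f"false positive rate: {false_positive_rate:.0f}%")
print(f"overall success rate: {overall_success:.0f}%")
```

Because the “other” class is much larger than the target class, the overall success rate is driven mostly by correctly rejected narratives, which is why it can sit well above the percentage of targets accurately predicted.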
4.2. Performance on Non-MSHA Data
- Consider the non-trivial words in the problem narrative: “instal new motor/pump assy.use portable cherry picker cherry picker tip assembly catch leg ankle piping motor assembly”. This list of non-trivial words was obtained after pre-processing. Note that “instal” is not a typo but a product of the lemmatizer.
- Consider the word frequencies of the training set when the accident category was “Caught in”. There were 4894 unique words in the 4563 narratives from that category. The top 5 words were finger (0.036), hand (0.021), right (0.015), pinch (0.0148), and catch (0.0143), with the number in parentheses indicating the proportion of times the word occurred within that category of narratives.
- Similarly, consider the list of words in the “Struck by” category. There were 7758 unique words in the 10,216 narratives. The top 5 words were strike (0.019), left (0.014), right (0.014), cut (0.013), and fall (0.012).
- Now obtain the similarity score between the narrative and a category by weighting each word of the narrative by its proportion of occurrence within the category. This makes sense because the frequency of occurrence of a word in a category is an indicator of its importance to the category. For example, if “leg” gets “Caught in” less frequently than “Struck by”, it will occur in lower proportion in “Caught in” than in “Struck by”. The words in the “Struck by” list occurred 16 times in the narrative for a total similarity score of 0.0168. There are 13 unique words in the 16 occurrences. The top 3 contributors were “leg”, “/” and “install” with scores of 0.004, 0.0027, and 0.0023, respectively, for each occurrence in the narrative.
- Similarly, obtain the total similarity score for all the other categories. For “Caught in”, the score is 0.0338. The top 3 contributors in the narrative were “catch” (0.014), “tip” (0.0045), and “install”. It is insightful to note how much more “catch” contributed as a top word than “leg” did as a top word. Clearly, “catch” is a bigger determiner of “Caught in” than “leg” is of “Struck by”.
- The narrative is assigned to the category with the highest similarity score. In this case, the narrative is deemed to be of the category “Caught in”.
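The scoring procedure described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the proportion of a word is taken here as its count divided by the total number of word occurrences in the category, the function names are hypothetical, and the training narratives are invented toy data rather than MSHA records.

```python
from collections import Counter


def word_proportions(narratives):
    """Proportion of each word among all word occurrences in a category."""
    counts = Counter(word for text in narratives for word in text.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def similarity(narrative, proportions):
    """Sum the category proportion of every word occurrence in the narrative."""
    return sum(proportions.get(word, 0.0) for word in narrative.split())


def classify(narrative, training):
    """Return the highest-scoring category and the score of every category."""
    scores = {category: similarity(narrative, word_proportions(texts))
              for category, texts in training.items()}
    return max(scores, key=scores.get), scores


# Invented, already pre-processed toy narratives (not MSHA data).
training = {
    "Caught in": ["catch finger pinch hand", "catch leg piping"],
    "Struck by": ["strike leg rock fall", "strike hand cut"],
}
narrative = "cherry picker tip assembly catch leg ankle"

best, scores = classify(narrative, training)
print(best, scores)
```

In this toy example “leg” appears in both categories with the same proportion, so the decision is carried by “catch”, which occurs only in the “Caught in” narratives. Words of the narrative that never occur in a category contribute nothing to that category's score.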
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- ILO. Safety and Health at the Heart of the Future of Work: Building on 100 Years of Experience. International Labour Organization, April 2019 Issue. Available online: https://www.ilo.org/wcmsp5/groups/public/---dgreports/---dcomm/documents/publication/wcms_686645.pdf (accessed on 10 April 2021).
- Hämäläinen, P.; Takala, J.; Boon, K.T. Global estimates of occupational accidents and work-related illnesses. In Proceedings of the XXI World Congress on Safety and Health at Work, Workplace Safety and Health Institute, Marina Bay Sands, Singapore, 3–6 September 2017.
- Takala, J.; Hämäläinen, P.; Saarela, K.; Yun, L.; Manickam, K.; Jin, T.; Heng, P.; Tjong, C.; Kheng, L.; Lim, S.; et al. Global estimates of the burden of injury and illness at work in 2012. J. Occup. Environ. Hyg. 2014, 11, 326–337.
- Jiskani, I.M.; Cai, Q.; Zhou, W. Distinctive model of mine safety for sustainable mining in Pakistan. Min. Metall. Explor. 2020, 37, 1023–1037.
- Talebi, E.; Rogers, W.P.; Morgan, T.; Drews, F.A. Modeling Mine Workforce Fatigue: Finding Leading Indicators of Fatigue in Operational Data Sets. Minerals 2021, 11, 621.
- Basu, A.J.; Kumar, U. Innovation and technology driven sustainability performance management framework (ITSPM) for the mining and minerals sector. Int. J. Surf. Min. Reclam. Environ. 2004, 18, 135–149.
- Aznar-Sánchez, J.A.; Velasco-Muñoz, J.F.; Belmonte-Ureña, L.J.; Manzano-Agugliaro, F. Innovation and technology for sustainable mining activity: A worldwide research assessment. J. Clean. Prod. 2019, 221, 38–54.
- NIOSH. NIOSH Mine and Mine Worker Charts. Available online: https://wwwn.cdc.gov/NIOSH-Mining/MMWC (accessed on 15 June 2021).
- ICMM. 2012. Available online: http://www.icmm.com/en-gb/guidance/health-safety/indicators-ohs (accessed on 5 April 2021).
- Garcia, D. COATIS, an NLP system to locate expressions of actions connected by causality links. In Knowledge Acquisition, Modeling and Management. EKAW 1997; Plaza, E., Benjamins, R., Eds.; Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence); Springer: Berlin/Heidelberg, Germany, 1997; Volume 1319, pp. 347–352.
- Kaplan, R.; Berry-Rogghe, G. Knowledge-based acquisition of causal relationships in text. Knowl. Acquis. 1991, 3, 317–337.
- Posse, C.; Matzke, B.; Anderson, C.; Brothers, A.; Matzke, M.; Ferryman, T. Extracting information from narratives: An application to aviation safety reports. In Proceedings of the IEEE Aerospace Conference Proceedings, Big Sky, MT, USA, 5–12 March 2005.
- Maille, N.P.; Ferryman, T.A.; Rosenthal, L.J.; Shafto, M.G.; Statler, I.C. What Happened, and Why: Towards an Understanding of Human Error Based on Automated Analyses of Incident Reports—Volume I. NASA ONERA, 2015. Available online: https://ntrs.nasa.gov/api/citations/20060023334/downloads/20060023334.pdf?attachment=true (accessed on 15 April 2021).
- Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2020. Available online: https://web.stanford.edu/~jurafsky/slp3/ed3book_dec302020.pdf (accessed on 18 January 2021).
- Baker, H.; Hallowell, M.R.; Tixier, A.J.-P. AI-based prediction of independent construction safety outcomes from universal attributes. Autom. Constr. 2020, 118, 103146.
- Tixier, A.J.-P.; Hallowell, M.R.; Rajagopalan, B.; Bowman, D. Automated content analysis for construction safety: A natural language processing system to extract precursors and outcomes from unstructured injury reports. Autom. Constr. 2016, 62, 45–56.
- Rose, R.; Puranik, T.G.; Mavris, D.N. Natural language processing based method for clustering and analysis of aviation safety narratives. Aerospace 2020, 7, 143.
- Baillargeon, J.T.; Lamontagne, L.; Marceau, E. Mining actuarial risk predictors in accident descriptions using recurrent neural networks. Risks 2021, 9, 7.
- Gernand, J.M. Machine learning classification models for more effective mine safety inspections. In Proceedings of the 2014 International Mechanical Engineering Congress and Exposition IMECE2014, Montreal, QC, Canada, 14–20 November 2014.
- Yedla, A.; Kakhki, F.D.; Jannesar, A. Predictive modeling for occupational safety outcomes and days away from work analysis in mining operations. Int. J. Environ. Res. Public Health 2020, 17, 1–17.
- Raj, V.K.; Tarshizi, E.K. Advanced Application of Text Analytics in MSHA Metal and Nonmetal Fatality Reports; SME Annual Meeting & Expo: Phoenix, AZ, USA, 2020.
- MSHA (Mine Safety and Health Administration). Mine Data Retrieval System. Available online: https://www.msha.gov/mine-data-retrieval-system (accessed on 31 January 2021).
- Mitchell, T.M. Machine Learning. In Machine Learning; McGraw-Hill: New York City, NY, USA, 1997; Volume 45.
- Ganguli, R.; Dagdelen, K.; Grygiel, E. Systems Engineering. Mining Engineering Handbook; Darling, P., Ed.; Society for Mining, Metallurgy and Exploration, Inc.: Englewood, CO, USA, 2011.
- Röger, C.; Ismayilova, I. Predicting ambient traffic of a vehicle from road abrasion measurements using random forest. In Proceedings of the Conference 13th International Workshop on Computational Transportation Science (IWCTS’20), Seattle, WA, USA, 3 November 2020; pp. 1–7.
- Weedon, M.; Tsaptsinos, D.; Denholm-Price, J. Random forest explorations for URL classification. In Proceedings of the 2017 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA), London, UK, 19–20 June 2017; Institute of Electrical and Electronics Engineers, Inc.: New York City, NY, USA, 20 June 2017. ISBN 9781509050604.
- Scikit-Learn. sklearn.ensemble.RandomForestClassifier. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html (accessed on 15 January 2021).
- Humphries, G.R.W.; Magness, D.R.; Huettmann, F. Machine Learning for Ecology and Sustainable Natural Resources Management; Springer: Berlin/Heidelberg, Germany, 2018.
- NLTK. Natural Language Tool Kit. Available online: https://www.nltk.org/ (accessed on 15 January 2021).
- Explosion Spacy. Industrial-Strength Natural Language Processing. Available online: https://spacy.io/ (accessed on 15 January 2021).
| Type Group: Caught in | Type Group: Fall | Type Group: Over-Exertion | Type Group: Struck |
|---|---|---|---|
| Caught in, under, or between a moving and a stationary object | Fall down raise, shaft or manway | Over-exertion in lifting objects | Struck by concussion |
| Caught in, under, or between collapsing material or buildings | Fall down stairs | Over-exertion in pulling or pushing objects | Struck by falling object |
| Caught in, under, or between NEC | Fall from headframe, derrick, or tower | Over-exertion in wielding or throwing objects | Struck by flying object |
| Caught in, under, or between running or meshing objects | Fall from ladders | Over-exertion NEC | Struck by powered moving object |
| Caught in, under, or between two or more moving objects | Fall from machine | | Struck by rolling or sliding object |
| | Fall from piled material | | Struck by... NEC |
| | Fall from scaffolds, walkways, platforms | | |
| | Fall on same level, NEC | | |
| | Fall onto or against objects | | |
| | Fall to lower level, NEC | | |
| | Fall to the walkway or working surface | | |
Subset | Type Group: OE | Type Group: Caught in | Type Group: Struck by | Type Group: Fall | OEP | OEL | FWW | CIMS | SFO |
---|---|---|---|---|---|---|---|---|---|
Training | 8909 | 4563 | 10,216 | 4802 | 1290 | 2838 | 2130 | 3337 | 1586 |
Testing | 8979 | 4524 | 10,226 | 4926 | 1275 | 2961 | 2130 | 3310 | 1590 |
Metrics | Type Group: OE | Type Group: Caught in | Type Group: Struck by | Type Group: Fall | OEP | OEL | FWW | CIMS | SFO |
---|---|---|---|---|---|---|---|---|---|
Records from Category | 8979 | 4524 | 10,226 | 4926 | 1275 | 2961 | 2130 | 3310 | 1590 |
Overall Success | 92% | 96% | 90% | 95% | 98% | 96% | 96% | 95% | 97% |
% from Category Accurately Predicted | 81% | 71% | 75% | 71% | 37% | 59% | 34% | 55% | 25% |
False Positive | 4% | 1% | 5% | 2% | <1% | <1% | <1% | 2% | <1% |
Type Group: OE | OE Lifting | OE Pulling |
---|---|---|
feel pain back | feel pain back | feel pain back |
pain low back | pain low back | feel pain shoulder |
feel pain low | feel pain low | feel pain right |
feel pain right | feel low back | feel pain low |
feel pain shoulder | feel pain shoulder | feel pain left |
feel pain left | feel pain right | feel pain groin |
feel pain knee | feel pain left | feel pain abdomen |
Fall | FWW |
---|---|
lose balance fall | lose balance fall |
slip fall ground | slip fall right |
cause lose balance | slip fall left |
foot slip fall | slip fall ground |
slip fall backward | cause lose balance |
step lose balance | place restrict duty |
lose balance cause | slip fall ice |
Caught in … | CIMS | Struck by … | SFO |
---|---|---|---|
right index finger | right index finger | piece rock fell | wear safety glass |
left index finger | left index finger | rock fall strike | safety glass eye |
right middle finger | left ring finger | cause laceration require | eye safety glass |
left ring finger | right middle finger | left index finger | behind safety glass |
right ring finger | right ring finger | strike left hand | go safety glass |
left middle finger | pinch index finger | right index finger | safety glass face |
pinch index finger | left middle finger | wear safety glasses | safety glass left |
Metrics | OE | OEP | OEL | Fall | FWW | Caught in | CIMS | Struck by | SFO | Overall |
---|---|---|---|---|---|---|---|---|---|---|
Number | 26 | 1 | 4 | 14 | 3 | 9 | 7 | 27 | 2 | 93 |
Validation | 85% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 96% |
Accident Type | Narrative |
---|---|
OEP | Employee pulled a heavy bag with helper and felt sharp pain in mid back area |
OEL | … Employee strained lumbar back while carrying a portable generator... |
FWW | The operator …. began the pre-shift walk around, but did not notice the slick ground conditions. The operator was not wearing any type of traction device, and slipped and landed on their side/back. |
SFO | .. While doing so a small piece of shrapnel from shank guard struck mechanic in the left inner thigh and was lodged into skin… |
CIMS | While moving a turbo charger rotor, employee pinched finger between the rotor shaft and the crate … |
Overlapping Types | Count |
---|---|
Fall, FWW | 3 |
Caught in, Struck by | 1 |
OEL, OE | 3 |
OEP, OE | 1 |
Struck by, SFO | 2 |
Caught in, CIMS | 7 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ganguli, R.; Miller, P.; Pothina, R. Effectiveness of Natural Language Processing Based Machine Learning in Analyzing Incident Narratives at a Mine. Minerals 2021, 11, 776. https://doi.org/10.3390/min11070776