Article

B-TBM: A Novel Deep Learning Model with Enhanced Loss Function for HAZOP Risk Classification Using Natural Language Statistical Laws

1 Chemical Industry System Simulation Engineering Technology Center, School of Information and Technology, Beijing University of Chemical Technology, Beijing 100029, China
2 China Special Equipment Inspection & Research Institute, Beijing 100029, China
* Authors to whom correspondence should be addressed.
Processes 2024, 12(11), 2373; https://doi.org/10.3390/pr12112373
Submission received: 21 September 2024 / Revised: 14 October 2024 / Accepted: 22 October 2024 / Published: 29 October 2024
(This article belongs to the Special Issue Condition Monitoring and the Safety of Industrial Processes)

Abstract:
HAZOP is a paradigm of industrial safety, and the introduction of deep learning-based HAZOP text categorization marks the arrival of an intelligent era of safety analysis. However, existing risk analysis methods have limitations in processing complex texts and extracting deep risk features. To solve this problem, this paper proposes a novel HAZOP risk event classification model based on BERT, BiLSTM, and TextCNN. The complexity of HAZOP text is revealed by introducing statistical laws of natural language, such as Zipf’s law and Heaps’ law, and the outputs of different levels of BERT are combined linearly and passed to BiLSTM and TextCNN to capture long-term dependencies and local contextual information for more accurate classification. Meanwhile, an improved loss function is proposed to address the deficiencies of the traditional cross-entropy loss function in handling the predicted probabilities of incorrect labels and to improve the generalization ability of the model. Experiments demonstrate that the accuracy of the model is improved by 3% to 4% compared with the traditional BERT model in the severity and possibility classification of HAZOP reports. This study not only improves the accuracy and efficiency of HAZOP risk analysis, but also provides new ideas and methods for the application of natural language processing in industrial safety.

1. Introduction

Industrial processes often carry significant safety risks, and the potential consequences of safety incidents can range from fatal hazards to substantial economic losses. Therefore, conducting a thorough safety analysis of the entire process is crucial to mitigate risks before accidents occur. HAZOP (Hazard and Operability Analysis) is widely recognized for shouldering this responsibility, as highlighted in studies by Suzuki et al. [1], Zhu et al. [2], Ahn and Chang [3], and Meng et al. [4]. The HAZOP is designed to analyze, identify, and predict risks by using guide words associated with potential deviations in the process. This involves expert discussions to explore nodes where deviations might occur, the causes of these deviations, protective measures to prevent incidents, and the severity and possibility ratings of potential accidents. The risks are meticulously documented in HAZOP reports, as exemplified in the attached Appendix A. Dunjó et al. [5] note that the compilation of HAZOP reports is a valuable repository of knowledge and expertise from a variety of risk experts. Unlocking the potential within these reports through HAZOP report mining holds profound significance in today’s context. This practice not only embodies the intelligent analysis of risks but also accelerates the efficiency of safety diagnostics. Traditional HAZOP processes often demand weeks or even months of expert brainstorming sessions, imposing substantial costs in terms of both manpower and time. The advent of HAZOP report mining addresses this challenge, providing a pathway to harness the wealth of expert insights efficiently. The mining of HAZOP reports transforms the landscape of risk analysis by leveraging the collective wisdom encapsulated in these documents. This approach transcends the limitations of traditional brainstorming methods, enabling a more rapid and sophisticated comprehension of potential hazards. In the contemporary industrial milieu, where time and resources are paramount, HAZOP report mining emerges as a strategic initiative to propel safety diagnostics into a realm of intelligence and expediency. By doing so, it not only optimizes the utilization of expert knowledge but also aligns with the imperative of sustaining a thriving and secure industrial ecosystem. Many research institutions have tried to use computer algorithms to supplement HAZOP analysis, such as in Yousofnejad et al. [6], Cheraghi et al. [7], and Wu et al. [8].
In the realm of HAZOP report mining, previous research has predominantly centered around the application of deep learning methodologies. The inherent generality and the ability to bypass intricate feature engineering provided by deep learning frameworks have presented a myriad of options for system development. Notably, pre-trained language models like BERT have emerged as a beacon of efficiency with an acceptable cost, offering a versatile toolset for various applications. For instance, Zhang et al. [9] delved into knowledge boundary prediction based on BERT, while Xu et al. [10] focused on the automatic extraction of causal relationships among risk factors using BiLSTM. Ricketts et al. [11] explored hazard analysis in the aviation sector through entity recognition, and Jia et al. [12] investigated safety risks stemming from medication errors in the healthcare industry. Furthermore, Wang et al. [13] ventured into HAZOP entity generation under the GPT-2 model. Peng et al. [14] used the DCNN (Dual Convolutional Neural Network) model to extract richer semantic information from HAZOP to solve the semantic nesting problem. Zhao et al. [15] proposed a deep learning approach to solve the problem of unclear physical boundaries. Zhang et al. [16] demonstrated the efficiency of the CatBoost model in fault diagnosis of oil-immersed power transformers. However, despite the strides made in various aspects of HAZOP report mining, the research landscape concerning risk classification remains somewhat underexplored. A critical issue emerges from the observation that prior studies have not extensively investigated the characteristics of risk representations, often making the implicit assumption of their inherent complexity. This raises a fundamental concern, leading to a second problem. Relying only on the semantic information of BERT to accomplish tasks such as severity classification restricts the model to a certain range and hinders further development and improvement of the model. Addressing these issues is paramount for advancing the sophistication and effectiveness of risk classification models in HAZOP report mining. As shown in Table 1, common text classification models fall into three groups. Deep learning models, including BERT, TextCNN, and BiLSTM, can automatically learn complex features and are especially suitable for unstructured data such as text. Traditional machine learning algorithms, including SVM, KNN, and MLP, perform well on small-scale tasks with high-dimensional feature spaces, but they are less efficient on large-scale data and depend more heavily on manual feature engineering. Finally, the gradient boosting tree algorithms LightGBM, XGBoost, and CatBoost are efficient in dealing with large-scale structured data and have good generalization ability, but do not perform as well as deep learning models on unstructured data.
The main objective of this research is to develop a high-precision deep learning model specialized in HAZOP risk event classification for the automatic classification of risk events in HAZOP reports, in order to reduce the workload of experts in the risk assessment process and to improve the reuse efficiency of textual information. The model aims to make risk assessment in industrial environments more automated and efficient by categorizing the multiple categories of information contained in complex industrial reports and reducing the need for manual intervention. By integrating advanced natural language processing techniques to automatically identify and classify the severity and possibility of HAZOP risk events, and by reducing the need for human intervention during model application, the B-TBM model not only improves classification accuracy but also adapts to multi-category classification in unbalanced data scenarios, thereby realizing a higher level of automated risk assessment and reducing the burden on experts in practical applications. This will provide a set of more intelligent and efficient tools for HAZOP risk management in industrial safety, which will help to improve the speed of risk identification and classification accuracy. This will ultimately reduce the cost and resource requirements of risk management, and will lay the foundation for the future promotion of natural language processing technology in the field of industrial safety. The first identified problem is solved by exploring risks in a nuanced way by considering them as a distinct subset of natural language. Our methodology involves investigating these risks through commonly observed statistical regularities, thereby quantitatively delineating the characteristics of HAZOP texts. By delving into the frequency and diversity of words and the long-range correlations inherent in natural language, we fortify the theoretical underpinnings of related research. Notably, the interconnectedness between words and the logical relationships between sentences, proven to be suitable frameworks, are instrumental in this pursuit. Furthermore, a pivotal aspect of our study lies in the recognition that risks manifest intricately at both the word and sentence levels. This realization serves as a potent avenue for mitigating the second issue identified. To achieve this, we propose leveraging diverse information encapsulated in different depths of BERT’s architecture. Jawahar et al. [26] point out that the lower layers focus on capturing lexical information, the middle layers delve into syntactic nuances, and the deeper layers center around semantic understanding. This multifaceted approach stands in contrast to conventional studies that rely solely on the output from the final layer.
Building upon these foundations, we propose a novel risk classification model, B-TBM, tailored for HAZOP reports. B-TBM orchestrates a linear combination of outputs from different layers of BERT, synergizing with BiLSTM and TextCNN to adeptly capture both the long-range dependencies and local contextual information embedded within the text. Notably, in the development of B-TBM, we identified a limitation in conventional cross-entropy loss functions that focus solely on correct labels, potentially leading to information loss during backpropagation. To address this, we introduce a novel loss function LFCF (Logarithmic Focal Cross-Entropy Function), which is designed to enhance the learning process. Extensive experimentation, specifically centered around possibility and severity classification, validates the efficacy of both B-TBM and LFCF. The outcomes of our research not only furnish more intelligent results for the routine safety assessments in industrial processes but also mitigate overreliance on expert experience and potential biases introduced by human factors during analysis. This approach stands as a significant leap towards enhancing the efficiency of the next generation of HAZOP studies. Our contributions are outlined as follows:
1. We assess the complexity of risks represented by HAZOP language, providing a case study for other industrial practices;
2. We introduce a novel risk classification model that ingeniously leverages BERT and incorporates a newly proposed loss function;
3. Extensive experimentation substantiates the effectiveness of the model, serving as a valuable tool for risk analysis by expert groups, engineers, and other enterprises.
The remainder of this paper is organized as follows. Section 2 introduces the related work on the risk classification and statistical laws of natural language. Section 3 outlines the method. The experiments are presented in Section 4. Section 5 is the outcomes and key findings, and Section 6 discusses the applications of our classification system. Section 7 concludes the paper.

2. Related Work

2.1. Risk Classification

The classification of risks has long been a meaningful yet challenging endeavor. Compounded by the fact that risks often manifest in textual form, employing text classification methods has proven to be a highly effective strategy for extracting valuable insights regarding risk categories. This approach has not only demonstrated its effectiveness but has also found successful applications in diverse domains such as natural disaster assessment, fire incident analysis, and emergency incident severity categorization, among others, yielding remarkably positive outcomes.
For instance, Samela et al. [27] employed algorithms based on DEMs (Digital Elevation Models) and geographical positioning to estimate the severity and risk levels of floods, enabling real-time monitoring of high-risk sections prone to flooding. A model integrating multiple approaches has emerged, as seen in Akay’s work [28], which predicts the impact of flood disasters on specific map segments. In the domain of warehouse safety risk management, Li et al. [29] proposed the use of the electrostatic discharge method to optimize SVM model parameters, thereby enhancing the dynamic risk assessment capabilities for fires involving Class A hazardous chemicals. Meanwhile, in water engineering construction safety, Tian et al. [30] utilized neural networks to address severity classification in textual descriptions of safety hazards. The application of LDA (Latent Dirichlet Allocation) models, as showcased by Wang et al. [31], categorizes causes and consequences in HAZOP reports, extracting associated rules and general trends. Feng et al. [32] proposed a text classification model based on natural language processing techniques to effectively mine and classify consequence severity from HAZOP reports by applying BERT Pre-training model and BiLSTM+Attention mechanism to improve the safety analysis of chemical production processes. Zhang et al. [33] proposed a new hazard classification model combining grey model and deep learning to improve the accuracy and effectiveness of hazard classification. Wang et al. [34] effectively classified the severity, possibility, and risk of hazardous events in industrial safety by combining BERT vectorization, HmF-DFA multiple fractal analysis, and HGNN (Hierarchical Gating Neural Network). Ekramipooya et al. [35] presented an automated approach based on BERT and ML (Machine Learning) to effectively predict information in HAZOP reports through clustering and classification techniques. Soheil Rezashoar et al. [36] investigated a hybrid machine learning-based algorithm (LightGBM-Optuna) to solve the road traffic accident classification problem. Jingsong Xie et al. [37] proposed a novel bearing fault classification method based on XGBoost, which fuses features extracted by deep learning and empirical features, and improves the accuracy and robustness of the classification of bearing faults by means of an improved neural network structure and the XGBoost algorithm. Walczak et al. [38] showed that the KNN algorithm with dynamically expanding deactivated word lists and weighted keywords significantly improves the accuracy and efficiency of text categorization of industrial events, which is important for automating the processing of and responding to large amounts of text data in industrial processes. Orrù et al. [39] proposed a machine learning-based fault classification prediction method for centrifugal pumps in the oil and gas industry, which achieves high-accuracy fault classification detection through SVM and MLP algorithms. In addition, Wang and Gu [40] proposed that the correlation between classification parameters, bootstrap words, causes, and consequences is combined with the LDA model, which provides a technical basis and fundamental guarantee for risk identification, accident prevention, and rescue actions in petrochemical plants.
In the HAZOP risk assessment, severity and possibility ratings are categorized on a scale of 1 to 5, which measure the potential impact and probability of occurrence of a risk event, respectively. Specifically, the severity scale increases progressively from the lowest impact (level 1) to the most severe impact (level 5), while the possibility scale increases progressively from a very rare occurrence (level 1) to an almost certain occurrence (level 5). This grading provides the model with a clear delineation of the risk hierarchy, which helps the B-TBM model to better locate and predict the levels of different risk events during the risk classification process. However, since events with high severity and high possibility classes are relatively rare in the dataset, the B-TBM model may be limited by the sample size when dealing with these extreme classes. The problem of data imbalance may lead to bias in the model’s prediction of high-risk events, and the model is prone to confusion in identifying these classes. In previous research endeavors, the prominence of BERT, TextCNN, and BiLSTM has been particularly noteworthy. BERT, standing out as a highly influential pre-trained model in recent years, has wielded substantial impact owing to its ability to significantly enhance model capabilities with minimal computational resources through fine-tuning. This has rendered BERT a pivotal asset for diverse tasks. TextCNN, with its adeptness in applying various forms of convolution to language sequences, excels in capturing localized semantic features. Meanwhile, BiLSTM leverages its advantage in handling long dependencies within sequences. These models have maintained their vigor and relevance due to their commendable performances and the acceptability of their computational costs. BERT, especially, has revolutionized the landscape by allowing for fine-tuning on specific tasks, augmenting model capabilities without excessive computational demands. The versatility of TextCNN in capturing local semantics and the proficiency of BiLSTM in managing long dependencies continue to position them as valuable tools in the natural language processing domain. Owing to their enduring efficacy, our present model draws upon the strengths of BERT, TextCNN, and BiLSTM, which have withstood the test of time and remain integral components in contemporary research methodologies.

2.2. Statistical Laws of Natural Language

The statistical laws of language are commonly used to measure the complexity of domain corpora. Zipf’s law describes the discrete distribution of some measured values x as a function of their rank r. If all the measured values are sorted into T ranks in decreasing order (x_1 > x_2 > … > x_r > … > x_T), Zipf’s law is given by Equation (1).
x_r = \frac{x_1}{r^{\alpha}}    (1)
where α is Zipf’s exponent, which needs to be experimentally determined. Heaps’ law stands as another ubiquitous statistical regularity observed in complex systems of empirical nature. This law delineates the sub-linear growth in the number of different components concerning the size of the system. Such topological characteristics provide an explanation for the emergence of heavy-tailed distributions in random systems. In the realm of linguistics, Heaps’ law posits that the vocabulary size continues to expand with the size of the corpus. In other words, this law governs how the adopted index size accommodates itself in relation to the corpus size. The more general formulation is represented by Equation (2).
N_v \propto N_t^{\beta}    (2)
where N_t is the total number of words, N_v is the number of distinct words, and β is Heaps’ exponent. Of noteworthy significance is the seamless extension of preferential attachment-based stochastic growth models through the introduction of the arrival rate of new components, a choice that conveniently captures the empirical Heaps’ law. This characterization of the diversity of the textual vocabulary provides a means to gauge the intricacies inherent in text.
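To make the two laws concrete, the following minimal Python sketch shows how the empirical rank-frequency curve of Equation (1) and the vocabulary-growth curve of Equation (2) can be extracted from a tokenized corpus; the variable hazop_tokens is a hypothetical, pre-tokenized word list and is not part of the original study.

```python
# Minimal sketch: empirical series behind Zipf's law (Eq. 1) and Heaps' law (Eq. 2).
from collections import Counter

def zipf_series(tokens):
    """Rank-ordered word frequencies x_1 > x_2 > ... > x_T (Zipf's law)."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)

def heaps_series(tokens):
    """Vocabulary size N_v observed after reading the first N_t tokens (Heaps' law)."""
    seen, growth = set(), []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth

# Usage with a hypothetical pre-tokenized HAZOP corpus:
# rank_freqs = zipf_series(hazop_tokens)
# vocab_growth = heaps_series(hazop_tokens)
```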
Another prevalent measure of text complexity is the exploration of long-range correlation, often approached from the vantage point of multifractality. Multifractal analysis, in its broad sense, unveils the inherent self-similarity between local and global perspectives, with the latter illuminated and deepened through an understanding of the former. Under the influence of syntax, the interplay between words engenders a specific meaning, referred to as long-range correlation, conveying the will, emotions, and thoughts of individuals and societies. The significance of certain textual excerpts often lies in their ability to self-replicate and articulate the meaning of the entire text, presenting a more accurate and concise representation. From a topological perspective, this reveals the intricate interplay within language, providing insights into long-term trends and the complexities of memory. The Hurst exponent stands as a metric for long-range correlation, serving as a quantifiable measure to unravel the complexity within textual data, which is usually estimated via multifractal detrended fluctuation analysis. Given a text vector R = {r_1, r_2, …, r_l} of length l, its profile series is given by Equation (3).
D_k = \sum_{i=1}^{k}\left(r_i - \frac{1}{l}\sum_{i=1}^{l} r_i\right), \quad k = 1, 2, \ldots, l    (3)
Further, D is divided bidirectionally into windows of length s via a sliding-window operation (see Equation (4)), where [·] stands for the rounding function. This yields 2W_s non-overlapping windows; φ_t^n marks the n-th window, where 0 < t < s.
W_s = \left[\frac{l}{s}\right]    (4)
Note that each window has its own local trend ψ t n , and the fitting of each ψ t n is equipped with the least square method, thus, the detrended series is Equation (5).
\delta_t^n = \varphi_t^n - \psi_t^n    (5)
The local variance σ²(s, n), indexed by the window size s and the window number n, is given by Equation (6). Over the 2W_s windows, the q-order fluctuation function F_q(s) is calculated as in Equation (7).
\sigma^2(s, n) = \frac{1}{s}\sum_{t=1}^{s}\left(\delta_t^n\right)^2    (6)
F_q(s) = \begin{cases} \left\{\dfrac{1}{2W_s}\sum_{n=1}^{2W_s}\left[\sigma^2(s, n)\right]^{\frac{q}{2}}\right\}^{\frac{1}{q}}, & q \neq 0 \\ \exp\left\{\dfrac{1}{4W_s}\sum_{n=1}^{2W_s}\ln\left[\sigma^2(s, n)\right]\right\}, & q = 0 \end{cases}    (7)
By varying the window size s and observing F_q(s), it is apparent that F_q(s) grows with s as a power law of the form in Equation (8), which depends on the fractal order q.
F_q(s) \sim s^{H(q)}    (8)
The generalized Hurst exponent H(q) (when q = 2, H(q) reduces to the standard Hurst exponent) can be obtained by fitting F_q(s) with the Boltzmann method.
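As a concrete illustration of Equations (3)–(8), the following NumPy sketch implements the standard MFDFA pipeline. It is a minimal approximation of the procedure described above: the local trend is fitted with a first-order least-squares polynomial, and the exponent is obtained from a plain log-log fit rather than the Boltzmann fitting mentioned in the text, so the window scales and the fitting choice are illustrative assumptions.

```python
# Minimal MFDFA sketch for the generalized Hurst exponent H(q).
import numpy as np

def fluctuation_function(r, s, q):
    """F_q(s) for one window size s and one order q (Eqs. 3-7)."""
    r = np.asarray(r, dtype=float)
    profile = np.cumsum(r - r.mean())                       # profile series, Eq. (3)
    w = len(profile) // s                                   # W_s, Eq. (4)
    # 2*W_s non-overlapping windows: one forward sweep and one backward sweep
    segments = [profile[i * s:(i + 1) * s] for i in range(w)]
    segments += [profile[len(profile) - (i + 1) * s:len(profile) - i * s] for i in range(w)]
    t = np.arange(s)
    variances = []
    for seg in segments:
        coeffs = np.polyfit(t, seg, 1)                      # local trend psi, least squares
        detrended = seg - np.polyval(coeffs, t)             # detrended series, Eq. (5)
        variances.append(np.mean(detrended ** 2))           # local variance, Eq. (6)
    variances = np.asarray(variances)
    if q == 0:
        return np.exp(0.5 * np.mean(np.log(variances)))     # Eq. (7), q = 0 branch
    return np.mean(variances ** (q / 2.0)) ** (1.0 / q)     # Eq. (7), q != 0 branch

def generalized_hurst(r, q, scales=(16, 32, 64, 128)):
    """H(q) from the power law F_q(s) ~ s^H(q) (Eq. 8), via a log-log fit."""
    f = [fluctuation_function(r, s, q) for s in scales]
    slope, _ = np.polyfit(np.log(scales), np.log(f), 1)
    return slope
```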

3. Method

3.1. Complexity Measurement

The HAZOP contains densely specialized terminology that not only reflects the unique properties of industrial equipment and materials, but also encapsulates the knowledge embedded in a team of experts. In order to specifically and intuitively measure the inherent complexity of the HAZOP corpus relative to a general-purpose corpus, the NLSL (Natural Language Statistical Laws) method reveals domain-specific linguistic complexity through statistical analysis of HAZOP texts. Specifically, the method quantifies the distribution of terminology and the pattern of vocabulary growth using Zipf’s law and Heaps’ law, providing new perspectives for understanding the characteristics of the HAZOP corpus. This analysis not only validates the high-density distribution characteristics of the jargon in HAZOP texts, but also demonstrates the peculiarities of the corpus in terms of word frequency distribution and neologism occurrence. These data help to gain insight into the differences in linguistic structure and complexity between HAZOP texts and general corpora, thus providing a theoretical basis for customized text classification methods. As a general-language reference, we randomly selected a dataset from the vast and representative Wikipedia encyclopedia, sampled to match the size of the HAZOP corpus.
Regression, a ubiquitous quantitative analysis method, serves as a powerful tool for modeling the complex relationships between variables by determining the optimal fit with observed data. It is commonly employed to understand how variations in independent variables influence dependent variables, facilitating prediction or forecasting. The primary objective is to estimate the parameters of a mathematical equation (regression model) that best represents the relationship between variables. The least squares method is often employed to identify the best fit, where the sum of the squared vertical deviations between each data point and the corresponding fitted point is minimized. This strategy minimizes the overall differences between observed data and predicted values, offering a robust means to unveil and quantify the interplay between variables. The goodness of fit is encapsulated by the determination coefficient, denoted as R 2 , which provides informative insights and is constrained within the interval of 0 to 1. This coefficient signifies the fidelity to actual data points, with values exceeding 0.8 commonly considered indicative of a reasonably acceptable fit. R 2 serves as a valuable metric, offering a nuanced evaluation of the effectiveness of the regression model in capturing and representing the underlying relationships within the data. Assuming p and q are observed and fitted values, respectively, and N is the size of the dataset, then parameter-estimated v ˜ and R 2 can be calculated by Equation (9):
\tilde{v} = \arg\min_{v}\sum_{i=1}^{N}(p_i - q_i)^2, \qquad R^2 = 1 - \frac{\sum_{i=1}^{N}(p_i - q_i)^2}{\sum_{i=1}^{N}\left(p_i - \mathrm{avg}(p_i)\right)^2}    (9)
For the Zipf exponent, denoted as α , it is determined based on the principle of least effort and the overarching principle of uniform effort, governing its correlation with the economic contributions of language. Following the tenets of minimal cognitive load, language users tend to economize their linguistic contributions as cognitive loads external to language tasks increase. Specifically, when greater cognitive effort is required, language users lean towards employing fewer words with higher frequencies, resulting in higher α values. Conversely, for tasks necessitating lesser cognitive effort, a less economical language featuring relatively lower α values is employed. Hence, smaller α values are indicative of higher complexity. This principle captures the intricate balance between cognitive effort and linguistic economy, revealing the adaptive nature of language use. Similarly, for the Heaps exponent, denoted as β , the same governing principle is observed from the perspective of lexical richness. A higher β signifies greater complexity, as it reflects an increased richness in vocabulary. The larger the β , the more expansive and diverse the vocabulary employed in language, indicating heightened linguistic intricacy. As for the generalized Hurst index, higher values signify stronger long-range correlations within the text. This index serves as a robust indicator of the persistence and self-similarity inherent in linguistic sequences, reflecting a more intricate and structured interplay between language components. The higher the generalized Hurst index, the more pronounced and persistent the long-range dependencies within the text, showcasing heightened complexity and organization in the language structure.
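A minimal sketch of the fitting step in Equation (9) is given below: a least-squares fit in log-log space (matching the power-law forms of Zipf’s and Heaps’ laws) that returns the estimated exponent and the determination coefficient R². The variable names, and the reuse of the zipf_series helper from the earlier sketch, are illustrative assumptions.

```python
# Minimal sketch: power-law fit by least squares in log-log space (Eq. 9).
import numpy as np

def power_law_fit(x, y):
    """Fit y = c * x^v on log-log data; return the exponent v and R^2."""
    log_x, log_y = np.log(x), np.log(y)
    v, log_c = np.polyfit(log_x, log_y, 1)
    fitted = log_c + v * log_x
    ss_res = np.sum((log_y - fitted) ** 2)
    ss_tot = np.sum((log_y - np.mean(log_y)) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return v, r_squared

# Example: Zipf exponent alpha and fit quality from a rank-frequency curve
# (hypothetical reuse of zipf_series from the earlier sketch):
# freqs = np.array(zipf_series(hazop_tokens), dtype=float)
# ranks = np.arange(1, len(freqs) + 1)
# slope, r2 = power_law_fit(ranks, freqs)   # Zipf's law gives a negative slope, alpha = -slope
# error_percent = (1.0 - r2) * 100.0        # error percentage later used in Eq. (16)
```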

3.2. B-TBM

This paper introduces the B-TBM model, a comprehensive framework that incorporates the model structures of BERT, TextCNN, and BiLSTM, trained using the proposed LFCF loss function. Designed to enhance HAZOP safety analysis, the B-TBM model is specifically tailored for predicting severity and possibility levels of risks. By facilitating risk predictions, it assists experts in conducting HAZOP safety analyses and augments the daily safety monitoring capabilities of industrial facilities. The overall architecture of B-TBM is illustrated in Figure 1, depicting its integrated structure and highlighting its potential as a robust tool for advancing safety assessments in complex industrial processes.

3.2.1. BERT Layer: Capturing Multi-Level Semantic and Contextual Information

BERT is one of the core components of the B-TBM model. The bidirectional attention mechanism of BERT enables it to capture complex dependencies in text by considering both the preceding and following contexts. However, traditional approaches use only the output of the last layer of BERT and cannot fully utilize the multi-level semantic information captured at different depths. BERT can capture complex dependencies from the context and extract high-level semantic features, but relying on it alone may neglect local contextual information, which is why other models are introduced to compensate for the missing local features. Formally, given a HAZOP risk W formulated as a word sequence {w_1, w_2, …, w_n} of length n, B-TBM first invokes BERT to process W, generating embeddings V = {v_1, v_2, …, v_n}. Recognizing the intricate nature of HAZOP risks, rather than conventionally adopting the last layer of BERT as the output, we consider the outputs of different layers (Bert_i) and combine them linearly to fully leverage the semantic information captured at various levels. Acknowledging that the semantic details of the final layer remain crucial, we allocate weights σ_i to the other layer outputs, determined through empirical experimentation, resulting in the comprehensive embedding V.
V = \mathrm{Concat}\left[\sigma_i \, \mathrm{Bert}_i\right]; \quad \sigma_i \in (0, 1], \; \sigma_{12} = 1    (10)
where Bert_i denotes the output vector of layer i and V is the feature vector after the linear combination.
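A minimal PyTorch sketch of the layer combination in Equation (10) is shown below. The checkpoint name, the placeholder weights, and the maximum sequence length are assumptions for illustration; only the final-layer weight σ_12 = 1 follows the text.

```python
# Minimal sketch: scale each BERT encoder-layer output by sigma_i and concatenate (Eq. 10).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese", output_hidden_states=True)

def combined_bert_embedding(text, sigmas):
    """sigmas: 12 weights in (0, 1], one per encoder layer, with sigmas[-1] = 1."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = bert(**inputs)
    # hidden_states is a tuple of 13 tensors: the embedding layer plus 12 encoder layers
    layer_outputs = outputs.hidden_states[1:]
    scaled = [sigma * layer for sigma, layer in zip(sigmas, layer_outputs)]
    return torch.cat(scaled, dim=-1)          # V, shape (1, seq_len, 12 * hidden_size)

# v = combined_bert_embedding(some_hazop_sentence, sigmas=[0.1] * 11 + [1.0])  # weights are placeholders
```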

3.2.2. BiLSTM Layer: Capturing Long-Term Dependency Information

In the B-TBM model, we introduce BiLSTM on top of the multilayer output of BERT to enhance its context capturing capability. BiLSTM combines bidirectional contextual information to effectively capture forward and backward dependencies of the text, which makes it perform well in processing complex text, especially in tasks that require global understanding. Compared to other LSTM variants, BiLSTM has the following specific advantages: first, GRU, despite its higher computational efficiency, is not as powerful as BiLSTM in capturing bidirectional information. In HAZOP risky text categorization, accurate contextual understanding is crucial, and thus BiLSTM is better suited for such tasks that require bidirectional understanding [41]. Second, unidirectional LSTM performs well in dealing with long-range dependencies, but it can only acquire contextual information in a single direction. For risky texts that require complete understanding of the front and back contexts, BiLSTM captures global information more accurately. Although Hierarchical LSTM has advantages in long text processing, its higher computational complexity limits its effectiveness on shorter texts such as HAZOP text. In contrast, BiLSTM is able to accurately capture bidirectional context without introducing additional complexity. This advantage is key to improving the accuracy of HAZOP risk classification for this study.
In the B-TBM model, the multilayer outputs of BERT are used as inputs to BiLSTM, which processes these vector sequences to generate contextual features with global dependencies. This process enables the model to understand the long-term dependency structure in HAZOP text more accurately. However, while BiLSTM captures long-term dependencies well and performs strongly on complex sentences and global dependencies, it is not as effective as CNNs (Convolutional Neural Networks) at extracting local contextual features.
The combined vector V generated by BERT is used as the input to BiLSTM. BiLSTM is a bidirectional LSTM network that can process both left-to-right and right-to-left sequential information, and is capable of capturing long-term dependencies in text. As shown in Equation (11), BiLSTM will output a time-series processed sequence of feature representations that are able to reflect the global contextual dependencies in the text.
H_{Bi} = \mathrm{BiLSTM}(V)    (11)
Here, H_{Bi} is the sequence of contextual features generated by BiLSTM.
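The BiLSTM step in Equation (11) can be sketched as follows; the input dimension assumes that all twelve encoder layers of a bert-base model are concatenated, and the hidden size and layer count mirror the hyperparameters quoted in Section 4.1.

```python
# Minimal sketch of the BiLSTM over the combined BERT embedding V (Eq. 11).
import torch.nn as nn

bilstm = nn.LSTM(input_size=12 * 768, hidden_size=384, num_layers=2,
                 batch_first=True, bidirectional=True)

# h_bi, _ = bilstm(v)   # h_bi: (batch, seq_len, 2 * 384), the contextual features H_Bi
```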

3.2.3. TextCNN Layer: Capturing Local Contextual Information

In parallel with BiLSTM, the combined output V of BERT is also fed into TextCNN. The TextCNN module extracts local contextual features, especially phrase-level n-gram features, from the multilayer output of BERT via a Convolutional Neural Network. TextCNN’s convolutional kernel sizes are set to 4, 5, and 6 to capture n-grams of different lengths; two filters are used for each kernel size, giving six filters in total for extracting local patterns and phrase-level features of the text, as shown in Equation (12). This convolutional operation can effectively identify the key terms or phrases in a sentence and capture the local contextual information in the text.
H_{CNN} = \mathrm{TextCNN}(V)    (12)
Here, H_{CNN} is the local contextual feature extracted by TextCNN.
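A minimal sketch of the TextCNN step in Equation (12) is given below, using the kernel sizes named in this subsection (4, 5, and 6) with two filters each; note that Section 4.1 quotes different kernel sizes and channel counts, so the filter configuration here is illustrative.

```python
# Minimal TextCNN sketch: 1-D convolutions, ReLU, and max-pooling over time (Eq. 12).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, embed_dim=12 * 768, kernel_sizes=(4, 5, 6), filters_per_size=2):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, filters_per_size, kernel_size=k) for k in kernel_sizes]
        )

    def forward(self, v):                       # v: (batch, seq_len, embed_dim)
        x = v.transpose(1, 2)                   # Conv1d expects (batch, channels, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)         # H_CNN: (batch, total number of filters)
```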

3.2.4. Feature Splicing and Fully Connected Layer

After obtaining the outputs of BiLSTM and TextCNN, the two features need to be fused next. In order to synthesize the global dependency and local context information, this paper adopts feature splicing (concatenation) to merge the outputs of the two into one feature representation as shown in Equation (13):
H_f = \mathrm{Concat}(H_{Bi}, H_{CNN})    (13)
where H_f is the joint feature representation of BiLSTM and TextCNN after splicing. The purpose of this step is to integrate the long-term dependency information captured by BiLSTM with the local context information extracted by TextCNN, allowing the model to focus on the global context as well as capturing local details.
After feature fusion, the resulting representation H_f is fed into the FCL (Fully Connected Layer), which maps the fused features and generates the final category probability distribution through the Softmax function, as shown in Equation (14).
P = \mathrm{Softmax}(W H_f + b)    (14)
where W is the weight matrix, b is the bias term, and P is the final category probability distribution. The Softmax function maps the fused features into the probability space of the categories, ensuring that each output value lies between 0 and 1 and that the probabilities of all categories sum to one.
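The fusion head in Equations (13) and (14) can be sketched as follows. The paper does not state how the BiLSTM sequence output is reduced to a single vector before splicing, so taking its final time step here is an assumption.

```python
# Minimal sketch: concatenate BiLSTM and TextCNN features, project, and apply Softmax (Eqs. 13-14).
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, bilstm_dim=2 * 384, cnn_dim=6, num_classes=5):
        super().__init__()
        self.fc = nn.Linear(bilstm_dim + cnn_dim, num_classes)

    def forward(self, h_bi, h_cnn):             # h_bi: (batch, seq_len, 2*384); h_cnn: (batch, cnn_dim)
        h_f = torch.cat([h_bi[:, -1, :], h_cnn], dim=1)   # feature splicing, Eq. (13)
        return torch.softmax(self.fc(h_f), dim=1)         # class distribution P, Eq. (14)
```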

3.2.5. LFCF Loss Function

Subsequently, V undergoes separate transformations through BiLSTM and TextCNN, yielding contextual feature sequences and local feature sequences, respectively. These two sequences are then concatenated and fed into a fully connected neural network with the LFCF loss function, as described in Equation (15), facilitating backpropagation for training.
L_{FCF} = -\left(y_c \ln p_c + \sum_{i=1, i \neq c}^{M}(1 - y_i)\ln(1 - p_i)\right)    (15)
Here, M represents the number of categories, y_c and p_c denote the label and predicted probability of the correct category, and y_i and p_i denote the labels and predicted probabilities of the incorrect categories.
Specifically, LFCF is designed with a specific consideration for the limitations of traditional Softmax cross-entropy loss functions. Traditional approaches tend to focus solely on the probability associated with the correct category, often neglecting the contributions of other labels and leading to potential errors. To illustrate, consider a scenario with n distinct labels, where n_i represents the correct label for the current classification and the remaining (n − 1) labels are incorrect. In the conventional setting of cross-entropy loss functions, there is a failure to account for the states of the incorrect labels. This often results in the emergence of two prominent prediction values, n_j and n_i, both exceeding 0.4. In cases where n_j slightly surpasses n_i, with a difference of, for instance, 0.02, such occurrences are common and can mislead the model. This situation demands a direct intervention: reducing the predicted value of n_j to approximately the average value of the other incorrect labels, such as 0.1. This adjustment accentuates the high prediction value of n_i, thereby providing clarity to the model.
Therefore, LFCF represents an enhancement to the conventional cross-entropy loss function by taking into account the states of incorrect labels. It aligns their predicted values towards relatively lower values, often approaching the average state, thereby reducing their interference in accurately training the model. In an ideal scenario, the predicted possibility values for the correct label approach 1, while those for multiple incorrect labels tend towards 0. Additionally, LFCF exhibits superior performance when the predicted possibility for the correct label is relatively low.
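A minimal PyTorch sketch of the LFCF loss in Equation (15) is given below, assuming one-hot labels (so y_c = 1 and y_i = 0 for i ≠ c) and probabilities that have already passed through Softmax; the epsilon guard is an implementation detail added for numerical stability.

```python
# Minimal sketch of the LFCF loss (Eq. 15) for one-hot labels.
import torch

def lfcf_loss(probs, targets, eps=1e-8):
    """probs: (batch, M) softmax outputs; targets: (batch,) integer class indices."""
    batch_idx = torch.arange(probs.size(0))
    p_correct = probs[batch_idx, targets]                       # p_c for each sample
    correct_term = torch.log(p_correct + eps)                   # y_c * ln p_c
    # (1 - y_i) * ln(1 - p_i), summed over the incorrect classes only
    mask = torch.ones_like(probs)
    mask[batch_idx, targets] = 0.0
    incorrect_term = (mask * torch.log(1.0 - probs + eps)).sum(dim=1)
    return -(correct_term + incorrect_term).mean()
```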

4. Experiment

4.1. Data Description

The HAZOP reports used in this paper are from the PDSP (Petrochemical Desulfurization and Sulfurization Process) and the PHHP (Petrochemical Hydrogenation and Catalytic Cracking Process) in Liaoyang, China. After de-duplication and correction, the PDSP and PHHP datasets contain 1492 and 945 unique risk instances, respectively. Each dataset was used independently for training to validate the adaptability of the model across different datasets. Notably, the labels in the HAZOP dataset exhibit imbalance. In the dataset preprocessing stage, this study designated 70% of the texts for each label as training data, and the remaining 30% were allocated to the validation set.
The hidden dimension of the BiLSTM layer is 384 with two layers. TextCNN is used to extract local features with convolutional kernel sizes of [3, 4, 5] and 77 convolutional output channels; n-gram features are extracted through the convolutional and pooling layers, with ReLU as the activation function after the convolutional layer. The optimizer is AdamW with a learning rate of 1 × 10⁻⁵. Samples from the datasets are shown in Table 2 and Table 3.

4.2. Experimental Setup

This study, in its pursuit of evaluating the performance of the B-TBM model in risk analysis, juxtaposes the B-TBM model against the following comparative models:
BERT Alone (Baseline): Utilizing BERT in isolation, serving as a baseline.
BERT+TextCNN1 Model: Leveraging only the final layer of BERT without employing BERT’s linear multiple outputs.
BERT+TextCNN2 Model: Implementing BERT with linear multiple outputs.
BERT+BiLSTM Model: This experiment inputs all information from the final layer of the BERT model into a Bidirectional Long Short-Term Memory (BiLSTM) model, followed by a fully connected layer with Softmax activation.
BERT+TextCNN and BiLSTM1 (Without LFCF): The model proposed in the current study, but trained without the LFCF component.
B-TBM Model (Complete Model): The fully developed model proposed in this study.
Each experimental configuration underwent 90 epochs of training on the training set, with subsequent validation conducted on the validation set.
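A minimal sketch of the training setup described in Sections 4.1 and 4.2 (AdamW with a learning rate of 1 × 10⁻⁵, 90 epochs) is shown below; model, train_loader, and lfcf_loss are assumed to be the assembled B-TBM network, a data loader over the training split, and the loss sketched earlier.

```python
# Minimal training-loop sketch under the stated optimizer and epoch settings.
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=1e-5)
for epoch in range(90):
    for batch_inputs, batch_labels in train_loader:
        optimizer.zero_grad()
        probs = model(batch_inputs)              # softmax class distribution P
        loss = lfcf_loss(probs, batch_labels)    # LFCF loss, Eq. (15)
        loss.backward()
        optimizer.step()
```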

4.3. Metric

In order to evaluate the fit of HAZOP data under different statistical laws of natural language, we performed error analysis on Zipf’s law, Heaps’ law, and the complexity measures. The study of the error percentage of validation tests is significant for assessing the model’s performance on specific types of data, especially when dealing with specialized and complex text. This helps to guide the further development and adaptation of the model to optimize its efficacy in specific applications, such as enhancing its accuracy and reliability on HAZOP reports. The error percentage, calculated as in Equation (16), is derived from the coefficient of determination R², a statistically important metric for evaluating the predictive effectiveness of a regression model, with a value between 0 and 1. Higher error percentages under Zipf’s law indicate that the HAZOP data differ significantly from the general-purpose corpus in terms of word frequency distribution, suggesting that HAZOP reports exhibit sparser and less uniform lexical usage, reflecting their high degree of specialization and complexity. The coefficient of determination R² and the error percentage under this law provide a quantitative perspective to assess how well the model handles this complex text structure, and lower R² values and higher error percentages may indicate the need for advanced text mining techniques or more sophisticated models that are better suited to handle sparse data. The lower error percentage under Heaps’ law indicates that the model is able to simulate the vocabulary growth trend of HAZOP data better, which reflects that despite the specialized nature of the text, its linguistic diversity and richness are within the predictive power of the model. The high R² values and low error percentage show the effectiveness of the model in capturing vocabulary growth, implying that the model is suitable for dealing with datasets of growing vocabulary sizes, which is especially important in specialized texts where new vocabulary occurs frequently.
E = (1 - R^2) \times 100\%    (16)
To assess the performance of our model, we employ the widely utilized metrics of accuracy, recall, and F1 score, as defined in Equation (17). Accuracy serves as the primary indicator for evaluating the overall correctness of the model predictions. Specifically, TPs (True Positives) denote instances where both the actual and predicted results are true, TNs (True Negatives) correspond to cases where both actual and predicted results are false, FPs (False Positives) represent situations where the actual result is true while the predicted result is false, and FNs (False Negatives) encapsulate instances where the actual result is false while the predicted result is true. For multi-class problems, non-correct labels are uniformly considered as false in this context. While accuracy serves as a direct metric for evaluating the overall performance of our text classification model, its adequacy diminishes in the face of imbalanced datasets. Consequently, relying solely on accuracy can offer a somewhat one-sided assessment. To provide a more comprehensive evaluation, we concurrently utilize recall and F1 score. Recall assesses the possibility of correctly predicting each distinct label, offering a more holistic measure of our model’s predictive accuracy across diverse labels. F1 score, being the harmonic mean of precision and recall, is particularly advantageous in affording equal weight to both precision and recall, ensuring a balanced consideration of the two metrics. This approach proves especially valuable in scenarios where imbalances exist among classes, providing a more nuanced and robust evaluation of the model’s performance. Also, we incorporate ROC (Receiver Operating Characteristic) curves and AUC (Area Under the Curve) values into our evaluation metrics. The AUC value represents the area under the ROC curve, serving as a comprehensive measure of the model’s discriminatory ability across labels. A higher AUC value indicates a stronger classification capability, with the upper limit set at 1. The ROC curve and AUC values provide a nuanced perspective on the model’s overall performance, offering insights into its ability to distinguish between different labels and showcasing the discriminatory power across a spectrum of classification thresholds.
\mathrm{accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \quad \mathrm{recall} = \frac{TP}{TP + FN}, \quad \mathrm{precision} = \frac{TP}{TP + FP}, \quad F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}    (17)
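The metrics in Equation (17), together with the ROC-AUC described in this subsection, can be computed with scikit-learn as in the following sketch; macro averaging for the multi-class setting is an assumption, since the paper does not state the averaging scheme.

```python
# Minimal sketch of the evaluation metrics (Eq. 17) plus multi-class ROC-AUC.
from sklearn.metrics import accuracy_score, recall_score, f1_score, roc_auc_score

def evaluate(y_true, y_pred, y_prob):
    """y_true, y_pred: class indices; y_prob: per-class probabilities, shape (n_samples, n_classes)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "auc": roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"),
    }
```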
An additional noteworthy metric employed in this study is the standard deviation of non-correct label prediction probabilities, utilized to assess the performance of the proposed enhanced cross-entropy loss function. During the training process, real-time predictions are obtained, and the standard deviation of the remaining data (after removing the data corresponding to the correct labels) is computed. A smaller standard deviation indicates less variation among the predictions for non-correct data, suggesting a more uniform distribution of probabilities. In other words, a smaller standard deviation implies a reduced impact of extreme predictions for non-correct labels on the predictions for correct labels. The formula for this computation is depicted in Equation (18).
S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}    (18)
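A minimal NumPy sketch of Equation (18), the sample standard deviation of the predicted probabilities over the non-correct labels of a single prediction, is given below.

```python
# Minimal sketch of the non-correct-label standard deviation (Eq. 18).
import numpy as np

def noncorrect_label_std(probs, correct_idx):
    """probs: 1-D array of class probabilities; correct_idx: index of the true label."""
    rest = np.delete(np.asarray(probs, dtype=float), correct_idx)
    return rest.std(ddof=1)    # ddof=1 gives the (n - 1) denominator in Eq. (18)
```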

5. Result

5.1. HAZOP Complexity

Figure 2 visually depicts the complexity of HAZOP language through a word cloud visualization of HAZOP reports. Notably, the term ‘explosion’ prominently stands out, underscoring the significance placed on the consequence of explosions in HAZOP analyses. Furthermore, due to the nature of PDSP and PHHP, terms such as ‘raw oil buffer tank’ and ‘surge tower’ are relatively prominent, emphasizing their salience in the hazard analysis. Additionally, the term ‘health’ reflects the attention given by HAZOP to the potential risks to human well-being resulting from consequences. Overall, the word cloud is dominated by specialized terminologies in the fields of safety, risk, and danger, markedly diverging from everyday communication.
Illustrated in Figure 3 is a meticulous quantification of the comparative linguistic intricacies between HAZOP language and the more conventional lexicon, employing the lens of Zipf’s law. Examining word frequencies, it becomes apparent that HAZOP language impeccably conforms to the principles encapsulated within Zipf’s law. However, a notable revelation emerges — the Zipf exponent for HAZOP ( α = 0.5762) starkly contrasts with the exponent observed in everyday language ( α = 1.0859). This discrepancy underscores a profound manifestation of the minimum effort principle within HAZOP, portraying a linguistic landscape where a heightened degree of sophistication necessitates a richer and more expansive vocabulary to articulate the nuanced intricacies inherent in HAZOP reports. Under Zipf’s law, the error percentage is 15.56%, which shows that the HAZOP data are less adaptable in the word frequency-word order relationship. This higher error value suggests that HAZOP data are significantly different from generalized data in terms of sparsity and complexity, and it is difficult to use Zipf’s law to accurately characterize the word frequency distribution of HAZOP data. The high error value also implies that the model may need stronger feature processing capability when dealing with high complexity and sparse risky text. In essence, the lower Zipf exponent unveils a linguistic depth within HAZOP, compelling the utilization of a diverse lexicon that extends beyond the customary linguistic boundaries observed in general language corpora.
From the perspective of lexical richness, HAZOP language exhibits adherence to Heaps’ law, manifesting a power-law growth in the richness of its vocabulary as the corpus size increases. Notwithstanding this conformity, the Heaps’ exponent for HAZOP (β = 11.1987) markedly surpasses that of general language (β = 4.9086), portraying HAZOP language as exceptionally enriched. This observation can be comprehended as follows: when comparing sentences of equal length in both general language and HAZOP language, the latter tends to encompass a more extensive array of distinct vocabulary to convey its inherent meaning. Under Heaps’ law, the error percentage is 1.64%, indicating that the behavior of HAZOP data in terms of vocabulary growth is close to that of generalized data. The low error percentage reflects that Heaps’ law is well suited for describing the vocabulary size expansion characteristics of HAZOP data, indicating that the model can still effectively capture the vocabulary growth patterns in scenarios with high text complexity. This phenomenon is intricately linked to the inherent complexity that HAZOP language seeks to communicate, and the elevated Heaps’ exponent accentuates the heightened richness embedded within HAZOP language.
When examining long-range correlation, the generalized Hurst exponent of HAZOP language consistently surpasses that of general language across various orders q. This intriguing observation suggests that HAZOP language exhibits a heightened degree of logical coherence, placing a premium on contextual logic and causality in its expressions. The elevated generalized Hurst exponent implies a more intricate and deliberate weaving of logical connections within HAZOP language. This, in turn, underscores the inherent complexity ingrained in the linguistic fabric of HAZOP, showcasing its propensity for a more thoughtful consideration of logical and causal relationships within the contextual framework.
In summary, our investigation, both qualitative and quantitative, underscores the inherent complexity of HAZOP. The intricacies embedded in its language, woven into the fabric of words and sentences, necessitate nuanced measures for risk mitigation. This calls for a more thoughtful approach beyond the simplistic application of deep learning models, particularly those trained under the umbrella of general language learning, such as BERT. Recognizing the intricate nature of HAZOP’s characteristics, our study emphasizes the imperative of tailored and context-aware strategies to address risks within this complex linguistic domain.

5.2. Performance of B-TBM

5.2.1. Accuracy

Table 4 and Figure 4 demonstrate the performance comparison of different models in the HAZOP text categorization task, specifically covering the classification accuracy on the PDSP and PHHP datasets. Through the comparison and ablation experiments in this study, the B-TBM model shows significant advantages in handling the text categorization task on the PDSP and PHHP datasets. Compared to traditional machine learning models (e.g., SVM, KNN, MLP) as well as boosting algorithm models (e.g., LightGBM, XGBoost, CatBoost), the B-TBM model achieves the best performance on both severity labeling and possibility labeling classification tasks. On the PDSP dataset, the B-TBM has 85.51% severity labeling accuracy and 84.22% possibility labeling accuracy; on the PHHP dataset, the severity labeling accuracy is as high as 89.92%, and the possibility labeling accuracy is 69.88%. The performance of the MLP model is more mixed. On the PHHP dataset, the MLP achieved a severity labeling accuracy of 84.86%, the highest among the compared traditional models, but it performed mediocrely on possibility labeling classification at 58.10%. This result indicates that MLP has an advantage in handling severity labels, but it does not perform as well as the deep learning models in handling the complex task of possibility label classification. Compared to these models, the B-TBM model performs more comprehensively on both datasets. This performance surpasses not only the traditional models but also the MLP and the boosting algorithm models.
The significant performance advantage of the B-TBM model can be attributed to its multiple key designs. First, compared to the baseline model that uses only the last layer of BERT, the B-TBM model uses all the layers of BERT, which enables it to capture more levels of semantic information. BERT’s multilayered feature representation model is able to extract deeper linguistic structures in complex textual tasks, which dramatically improves classification performance. For example, the BERT+TextCNN1 model has a severity labeling accuracy of 82.17% on the PDSP dataset, while the BERT+TextCNN2 model improves the accuracy to 83.35% by using all the layers of information from BERT. Secondly, B-TBM further enhances the ability to capture text features by combining deep learning modules such as TextCNN and BiLSTM. TextCNN helps to extract local features while BiLSTM is able to capture long-range dependencies between sequences. This combination significantly improves the performance of the model on both datasets, especially in the possibility labeling classification task where the accuracy performance is better than the BERT model alone. Most critically, the B-TBM model introduces a new loss function design. This loss function not only optimizes the model’s performance on the severity labeling classification task, but is also specifically designed for the possibility labeling classification task. This improvement substantially improves the model’s ability to handle unbalanced data and solves many of the deficiencies of the traditional loss function in the possibility labeling classification task, thus making the B-TBM model far superior to other models in terms of the accuracy of possibility label classification. For example, the B-TBM achieves a possibility labeling accuracy of 69.88% on the PHHP dataset, which is significantly better than the 65.91% of the baseline BERT model.
In summary, the advantages of the B-TBM model lie in the comprehensive use of all the hierarchical information of BERT, the combination of deep learning modules such as TextCNN and BiLSTM, and the further optimization of the model’s performance by a new loss function. The model not only performs well on the severity labeling classification task, but also shows significant advantages in the more complex possibility labeling classification task, proving its strong adaptability and generalization ability in text classification tasks. The results show that the B-TBM model not only improves the overall classification accuracy, but also has significant advantages in terms of the model’s classification stability and accuracy, making it suitable for handling complex HAZOP text classification tasks.

5.2.2. Confusion Matrix

Figure 5 and Figure 6 show the different model confusion matrices for severity and possibility predictions for PDSP, respectively. Detailed examination of these matrices provides clear insight into the predictions of models trained on the same data. The main diagonal of the matrices represents the count of correctly labeled predictions. It is worth noting that most of the deep learning models show proficiency in identifying correct labels, as evidenced by the salient values on the main diagonal. This consistency underscores the robustness of these models in accurately identifying and predicting correct labels in a given dataset.
Taking the severity predictions in Figure 5 as an example, the B-TBM model demonstrates a significant advantage over the other models. The B-TBM model (Figure 5a) performs well in classifying multiple categories, especially category 1 and category 2: it has 160 correctly classified samples in category 1 and 138 in category 2, with a low misclassification rate of only 8 samples in each category assigned to other categories. In contrast, the BERT+TextCNN and BiLSTM1 model (Figure 5b) performs slightly less well; although it also performs well on category 1 and category 2 (161 and 142 correctly classified), it has more misclassifications on category 3, in particular 32 samples from category 2 misclassified as category 3. More noticeable is the BERT model (Figure 5f), which has the most serious misclassification rate, with 44 samples of category 2 misclassified as category 3, showing the obvious limitation of BERT in fine-grained classification.
By comparison, the B-TBM model shows greater stability and accuracy when dealing with the complex HAZOP text classification task, especially between neighboring categories, and the misclassification rate is significantly lower than that of other models. This is due to the fact that B-TBM combines the semantic feature extraction of BERT, the global context modeling of BiLSTM, and the local feature capture of TextCNN, which enables a more comprehensive understanding of the complex patterns and subtle differences in text. Overall, the comprehensive classification ability of the B-TBM model is much higher than that of the other benchmark models, allowing it to show higher accuracy and robustness when dealing with multi-category classification tasks.

5.2.3. ROC Curve

Figure 7 illustrates the ROC curves for severity and possibility prediction of PDSP, comparing the B-TBM model with other benchmark models. The corresponding AUC values for all models are detailed in Table 5. Notably, the B-TBM model exhibits the highest AUC value among all compared models. This emphasizes the superior performance of the proposed model, which has been carefully designed based on the unique features of HAZOP text.
The analysis of the ROC curves and the comparison of the AUC (Area Under the Curve) values clearly reflect the differences in model performance on the severity labeling and possibility labeling classification tasks. In Figure 7a,b, the AUC values of the traditional machine learning models, SVM, KNN, MLP, LightGBM, XGBoost, and CatBoost, differ between the two tasks. For example, SVM reaches an AUC of 0.93 in the severity labeling task but performs poorly in the possibility labeling task, with an AUC of 0.88. KNN is even more limited, with AUC values of only 0.86 (severity) and 0.79 (possibility). In contrast, the boosting models LightGBM, XGBoost, and CatBoost perform more strongly on both tasks, with AUC values generally ranging from 0.86 to 0.93, demonstrating their strength in handling complex tasks.
Compared with the traditional models, however, the BERT-based deep learning models show stronger advantages in both the severity and possibility labeling tasks, as seen in the ablation experiments; Figure 7c,d show the performance of BERT and its variants. The B-TBM model performs well on both tasks, with an AUC of 0.95 in the severity labeling task and 0.96 in the possibility labeling task, significantly better than the other models. This suggests that B-TBM, by integrating multi-level features and complementary deep learning modules, captures semantic information more effectively, especially on the possibility labeling task, where the data are more complex. Compared with the BERT baseline, the improvement is substantial: BERT alone reaches an AUC of only 0.90 for the severity task and 0.91 for the possibility task. In the ablation experiments, the variant combining TextCNN and BiLSTM (BERT+TextCNN and BiLSTM1) performs best after B-TBM, reaching an AUC of 0.95 on both tasks, which indicates the effectiveness of the TextCNN and BiLSTM modules for feature extraction. Other variants such as BERT+BiLSTM and BERT+TextCNN (1 and 2) reach AUC values of 0.93 to 0.94 in the severity task and 0.94 to 0.95 in the possibility task, better than the baseline but still slightly inferior to B-TBM. This reflects that using the full BERT hierarchy together with the combined structure to capture deep semantic features and long-distance dependencies significantly improves model performance.
A comparison of the specific AUC values makes it clear that the B-TBM model, with its architecture integrating BERT's multi-level features with the TextCNN and BiLSTM modules and its optimized loss function design, greatly improves performance on the severity and possibility labeling classification tasks. In the possibility labeling task in particular, the AUC of B-TBM reaches 0.96, significantly better than both the traditional models and the BERT variants, indicating that B-TBM has stronger performance and generalization ability when dealing with complex classification tasks.
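The one-vs-rest ROC curves and AUC values compared here can be reproduced from the per-class prediction probabilities; the following sketch shows the typical scikit-learn computation, with placeholder arrays standing in for the real model outputs.

```python
# Sketch: one-vs-rest ROC curves and macro AUC for a multi-class classifier,
# mirroring the comparison in Figure 7 and Table 5 (arrays are placeholders).
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc, roc_auc_score

classes = [1, 2, 3, 4, 5]
y_true = np.array([1, 2, 3, 2, 4, 5, 1, 3])                    # gold labels (example values)
y_score = np.random.default_rng(0).dirichlet(np.ones(5), size=len(y_true))  # softmax outputs

y_bin = label_binarize(y_true, classes=classes)                  # (n_samples, n_classes)
for i, c in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])          # per-class ROC curve
    print(f"class {c}: AUC = {auc(fpr, tpr):.3f}")

# Macro-averaged AUC, the kind of summary value reported in Table 5
print("macro AUC:", roc_auc_score(y_bin, y_score, average="macro"))
```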

5.2.4. Loss Functions

The loss values of the model in the severity and possibility labeling classification tasks over the training cycles are shown in Figure 8, where Figure 8a–d depict the LFCF and focal loss curves for the severity and possibility labels on the PDSP and PHHP datasets during training; the blue curves denote the LFCF loss and the red curves the focal loss. Analysis of the four curves shows that the LFCF loss function has a clear advantage in handling unbalanced data in the HAZOP risk classification task. Compared with the focal loss, LFCF assigns higher weights to the minority categories, making the model focus more on the classification accuracy of low-frequency samples; this benefit is evident in both the severity and possibility labeling tasks. The traditional cross-entropy loss often struggles to classify low-frequency categories accurately when the classes are imbalanced, and although the focal loss down-weights easy samples so that the model can concentrate on hard ones, its parameter tuning is relatively complex and tends to lead to unstable training. In contrast, LFCF maintains smoother curves throughout training: the loss drops rapidly in the early stage and fluctuates little afterwards, which reflects its higher robustness and helps the model avoid overfitting to extreme samples. In terms of fitting ability, the LFCF loss curve declines smoothly throughout the training cycle and shows consistently low fluctuations in the later stages, indicating that LFCF adapts well to the deeper features of the data and reduces extreme mispredictions. Taken together with the curves in the figure, the strong performance of the LFCF loss on complex, unbalanced datasets further validates its applicability and superiority in the HAZOP risk classification task.
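Since the exact LFCF formulation is defined earlier in the paper and not repeated here, the snippet below is only an assumption-laden sketch of the weighting idea described above: a focal-style loss whose per-class weights are derived from inverse class frequency so that low-frequency categories contribute more to the gradient.

```python
# Sketch only: a class-frequency-weighted focal-style loss that up-weights rare
# labels, illustrating the weighting idea discussed above. This is NOT the exact
# LFCF definition from the paper; treat it as an illustrative example.
import torch
import torch.nn.functional as F

def weighted_focal_loss(logits, targets, class_counts, gamma=2.0):
    """logits: (batch, n_classes); targets: (batch,); class_counts: (n_classes,)."""
    # Inverse-frequency weights so that low-frequency categories contribute more
    weights = class_counts.sum() / (len(class_counts) * class_counts.float())
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    pt = probs.gather(1, targets.unsqueeze(1)).squeeze(1)        # probability of the true class
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    w = weights.to(logits.device)[targets]
    return (-w * (1.0 - pt) ** gamma * log_pt).mean()

# Example: 5 severity grades with heavily unbalanced counts
logits = torch.randn(4, 5)
targets = torch.tensor([0, 2, 4, 1])
counts = torch.tensor([500, 300, 120, 40, 10])
print(weighted_focal_loss(logits, targets, counts))
```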
The variation in the model's standard deviation over the training cycle in the severity and possibility labeling classification tasks is shown in Figure 9, where Figure 9a,b show the standard deviation curves of incorrect labels for the severity and possibility labels on the PDSP dataset, and Figure 9c,d show the corresponding curves on the PHHP dataset. Analyzing these plots, we observe that the standard deviation under the LFCF loss function decreases faster at the beginning of training, indicating that this loss function accelerates model convergence. This holds in both the severity and possibility classification tasks; on the PHHP dataset in particular, the standard deviation under the new loss function is lower overall than under the original loss function, suggesting that it controls the volatility of the training process more effectively. In addition, the LFCF loss maintains a lower standard deviation throughout the training cycle, especially in the later stages of training, showing better stability. This low volatility and consistently stable output is particularly important for building reliable predictive models, as it implies that the model maintains consistent performance when dealing with different types of input data. Overall, the LFCF loss function demonstrates superior robustness and fit by maintaining a low and stable standard deviation during training.
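As a rough illustration, if the curves in Figure 9 track the spread of the probability mass that the model places on incorrect labels at each epoch (an assumption on our part, since the precise definition is given with the method), they could be computed from validation softmax outputs as follows.

```python
# Sketch under an assumption: per-epoch standard deviation of the probability
# mass assigned to incorrect labels, computed from validation softmax outputs.
import numpy as np

def incorrect_label_std(probs, y_true):
    """probs: (n_samples, n_classes) softmax outputs; y_true: (n_samples,) int labels."""
    wrong_mass = 1.0 - probs[np.arange(len(y_true)), y_true]   # mass on all wrong classes
    return wrong_mass.std()

# per_epoch_probs: list of (n_samples, n_classes) arrays collected after each epoch
# std_curve = [incorrect_label_std(p, y_val) for p in per_epoch_probs]
```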

5.2.5. Recall and F1

As can be seen in Figure 10, both HAZOP datasets suffer from significant data imbalance on the severity and possibility labels, and the skew in the distribution of the possibility labels is especially pronounced. The large differences in sample sizes across label levels can degrade the model's predictive performance on certain labels. Nonetheless, Figures 11 and 12 show that the proposed model performs well in terms of recall and F1 score, maintaining a high level of classification for most labels even under data imbalance. On the PDSP-related HAZOP reports, the model's recall and F1 score are generally above 80%, reflecting its stability and accuracy on these categories. For the PHHP-related HAZOP reports, however, especially the classification of possibility labels, the model performs less well: recall and F1 scores for the low-frequency categories are noticeably low, in some cases below 60%. This suggests that data imbalance negatively impacts performance on these categories and that the model is less consistent when processing possibility labels in the PHHP reports.
Although these low-frequency categories still score better than they do under traditional loss functions such as cross-entropy loss and focal loss, which indicates a degree of robustness on the unbalanced dataset, the model's classification accuracy and consistency there remain limited. Overall, the model adapts well to diverse HAZOP scenarios, and its prediction of severity labels in particular is stable and strong. However, there is still room to improve its handling of low-frequency possibility labels, so as to cope with the semantic differences and data imbalance encountered in complex industrial safety scenarios. Compared with traditional methods, the model captures the key features of low-frequency categories more effectively, significantly improving the accuracy and consistency of classification. Although the prediction of possibility labels in the PHHP reporting scenario is still insufficient, the model retains good generalization ability and reliability across a wide range of industrial safety applications.
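The per-label recall and F1 values plotted in Figures 11 and 12 correspond to the standard per-class metrics; a minimal scikit-learn sketch with placeholder arrays is shown below.

```python
# Sketch: per-class recall and F1 of the kind shown in Figures 11 and 12,
# using scikit-learn (label arrays are placeholders, not the HAZOP data).
import numpy as np
from sklearn.metrics import classification_report

y_true = np.array([1, 2, 2, 3, 5, 4, 2, 3, 1, 5])   # gold possibility labels (example)
y_pred = np.array([1, 2, 3, 3, 5, 4, 2, 2, 1, 4])   # model predictions (example)
print(classification_report(y_true, y_pred, digits=3, zero_division=0))
# The per-class "recall" and "f1-score" columns make imbalance effects visible:
# low-frequency grades are exactly where the scores drop for the PHHP reports.
```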

6. Discussion

A comparative analysis of multiple models demonstrates the significant advantages of the B-TBM model in the HAZOP text classification task. The confusion matrices show that B-TBM misclassifies significantly fewer severity and possibility labels than other models such as BERT, BERT+TextCNN, and BERT+BiLSTM. The reason is that B-TBM combines the semantic feature extraction of BERT, the global context modeling of BiLSTM, and the local feature capture of TextCNN, which allows it to identify and distinguish categories in complex scenes, whereas the other models often struggle with the nuances of multi-category classification. The BERT model in particular, despite its semantic modeling capability, lacks this additional contextual processing and therefore classifies less accurately; BERT+TextCNN and BERT+BiLSTM improve performance to a certain extent but still have limitations when facing data imbalance. The ROC curves further demonstrate the superiority of B-TBM, whose severity and possibility AUC values of 0.95 and 0.96 are significantly higher than the 0.90 and 0.91 of the BERT model, showing stronger classification ability and accuracy. The loss curves show that introducing the LFCF loss function significantly accelerates the convergence of the model and keeps the loss low throughout training, reducing extreme false predictions. Compared with the traditional cross-entropy loss, LFCF effectively handles the data imbalance problem and avoids overfitting to a few categories. In terms of recall and F1 score, B-TBM performs well on most categories, especially in the desulfurization- and sulfur-related (PDSP) HAZOP reports where the data are more balanced. In the hydrocracking- and FCC-related (PHHP) reports, the model's performance drops slightly, particularly for the possibility label predictions, suggesting that data imbalance remains a key factor limiting performance. Nonetheless, B-TBM demonstrates excellent adaptability and robustness on unbalanced datasets and complex multi-category classification tasks, providing a valuable direction for future research on further mitigating the data imbalance problem.

6.1. Practical Applications

The practical applications of the B-TBM classification system are explored here with a view to enhancing the reliability and safety assessment of HAZOP. The classification system has already been applied in several key scenarios, including but not limited to the following:
1.
It can assist expert teams performing safety analysis of new processes, helping them identify risks during decision making. Complex interconnections and cause-and-effect relationships make HAZOP challenging, especially for new processes that demand significant labor and time. Our classification system serves as a valuable aid in mitigating this challenge. During the calibration of analysis results, experts can cross-validate risks and flag any inconsistencies with the inferences provided by the classification system. This approach not only reduces labor costs but also reduces the potential for human error.
2.
It can assist engineers during operations already in production, where additional risks may arise that have not yet been analyzed by an expert team. Supported by our classification system, engineers can carry out qualitative analyses in advance and quickly adopt appropriate solutions and follow-up management measures.
3.
It can provide guidance for related businesses, in particular for initiating HAZOP on processes in affiliated enterprises; small-scale businesses and independent processes often conduct safety analyses that lack comprehensiveness. Our classification system incorporates HAZOP knowledge from large-scale enterprises, enabling it to provide effective guidance that enhances the reliability of these enterprises' processes.

6.2. Outlook

Currently, the B-TBM model performs well on the severity and possibility classification tasks in HAZOP reports. However, its ability to generalize to other industrial domains is limited, mainly because transfer learning strategies have not yet been integrated. Transfer learning can help the model reuse existing knowledge without retraining and adapt to risk classification tasks in new domains. With the introduction of transfer learning, the B-TBM model is expected to extend its applicability to industries such as chemicals, energy, and pharmaceuticals, improving its adaptability in diverse industrial contexts.
Although the B-TBM model achieves high accuracy and robustness on the HAZOP dataset, the model still struggles to fully capture subtle contextual differences in some multi-category classification tasks. Specifically, subtle semantic differences between different categories may affect the classification accuracy in some specific contexts. The current model mainly relies on BiLSTM and BERT embedding techniques, which may not be able to recognize all the subtle differences in some complex scenarios. In the future, the semantic perception ability of the model can be enhanced by introducing the attention mechanism or optimizing the embedding layer, so as to improve the performance of multi-category classification in complex scenarios.
In addition, the B-TBM model relies on expert validation during the current classification process to ensure the consistency and accuracy of the results. While this manual intervention improves the reliability of the model, it also limits its automation potential and increases labor costs. In fact, the model has demonstrated high consistency and reliability in specific contexts, showing that it can maintain good classification results with reduced expert involvement. In order to further reduce the dependence on experts, the model parameters can be fine-tuned in the future to enhance its autonomous classification capability. For example, optimizing hyperparameters and improving data preprocessing methods can enhance the model's adaptability when dealing with inconsistent classifications, thus reducing the need for external validation. Combined with active learning methods, the model is expected to become better at recognizing classification biases and making self-adjustments. The reliance on expert involvement can be further reduced through real-time monitoring and feedback loops that allow the model to progressively improve its accuracy over the course of its operation.
Despite the significant progress of the B-TBM model in HAZOP risk assessment, there is still room for further improvement in responding to the needs of multi-domain, multi-category, and multi-context applications. Future research will focus on exploring how to enhance the model’s automation capability and cross-domain adaptability through strategies such as transfer learning and active learning to ensure its effectiveness and scalability in more industrial scenarios.

7. Conclusions

HAZOP reports are written documents focused on analyzing the safety of manufacturing processes, and they rely heavily on the expertise and experience of experts. However, the large amount of information in these reports is usually in the form of unstructured text, leading to under-utilization of their potential value and, in particular, to significant limitations in automated risk analysis. To address this issue, this paper first examines the complexity of HAZOP texts from the perspective of statistical laws of natural language (e.g., Zipf's law and long-range correlation), revealing underlying patterns and linguistic features in the texts. On this basis, this paper proposes a novel risk classification model, B-TBM, and designs an improved cross-entropy loss function, which significantly improves the classification accuracy of severity and possibility labels. Validated by a series of extensive experiments, our model exhibits superior classification ability and stability, especially when dealing with complex and unbalanced datasets. In addition, the proposed model can not only be used effectively for risk classification of HAZOP reports, but can also provide technical support for automated classification of daily safety text records in factories and play a supporting role in employee training. Although the B-TBM model can currently only solve the severity and possibility classification problems in similar HAZOP reports, we plan to introduce transfer learning into the model in the future to provide it with cross-domain general analysis capability, so that it can be applied to HAZOP reports from different industrial fields, further extending its scope of application and usefulness. This development will significantly improve the efficiency of safety report analysis and promote the intelligent process of industrial safety management.

Author Contributions

B.X. proposed the overall framework of the paper, wrote the manuscript, and implemented and experimented with the model code. D.G. participated in the experiments and revised the paper. D.L. participated in the revision and review of the paper. B.Z. participated in the experiments. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ningbo Key Technology Breakthrough Plan Project of 'Science and Technology Innovation Yongjiang 2035' (No. 2024Z256) and the Key R&D and Transformation Plan Project of Qinghai Province (No. 2023-QY-215).

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author(s).

Acknowledgments

The authors would like to thank the anonymous reviewers for their careful reading and helpful remarks, which contributed greatly to improving the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

| No. | Description | Severity | Possibility |
|---|---|---|---|
| Risk#1 | The liquid level in the liquid separation tank D-5612118 of the gas compressor is too high, causing the compressor to carry liquid and equipment damage. In severe cases, it can cause process hazards and excessive liquid entrainment in the upstream gas phase. | 2 | 2 |
| Risk#2 | The liquid level of the underground solvent tank V-9306 in the public engineering section may be low, and the false indication of the liquid level in the underground solvent tank may cause pump evacuation in severe cases. | 3 | 4 |

References

  1. Suzuki, T.; Izato, Y.; Miyake, A. Identification of accident scenarios caused by internal factors using HAZOP to assess an organic hydride hydrogen refueling station involving methylcyclohexane. J. Loss Prev. Process Ind. 2021, 71, 104479. [Google Scholar] [CrossRef]
  2. Zhu, L.; Ma, H.; Huang, Y.; Liu, X.; Xu, X.; Shi, Z. Analyzing construction workers’ unsafe behaviors in hoisting operations of prefabricated buildings using HAZOP. Int. J. Environ. Res. Public Health 2022, 19, 15275. [Google Scholar] [CrossRef] [PubMed]
  3. Ahn, J.; Chang, D. Fuzzy-based HAZOP study for process industry. J. Hazard. Mater. 2016, 317, 303–311. [Google Scholar] [CrossRef] [PubMed]
  4. Meng, Y.; Song, X.; Zhao, D.; Liu, Q. Alarm management optimization in chemical installations based on adapted HAZOP reports. J. Loss Prev. Process Ind. 2021, 72, 104578. [Google Scholar] [CrossRef]
  5. Dunjó, J.; Fthenakis, V.; Vílchez, J.A.; Arnaldos, J. Hazard and operability (HAZOP) analysis. A literature review. J. Hazard. Mater. 2010, 173, 19–32. [Google Scholar] [CrossRef]
  6. Yousofnejad, Y.; Afsari, F.; Es’haghi, M. Dynamic risk assessment of hospital oxygen supply system by HAZOP and intuitionistic fuzzy. PLoS ONE 2023, 18, e0280918. [Google Scholar] [CrossRef]
  7. Cheraghi, M.; Eslami Baladeh, A.; Khakzad, N. Optimal selection of safety recommendations: A hybrid fuzzy multi-criteria decision-making approach to HAZOP. J. Loss Prev. Process Ind. 2022, 74, 104654. [Google Scholar] [CrossRef]
  8. Wu, J.; Song, M.; Zhang, X.; Lind, M. Safeguards identification in computer aided HAZOP study by means of multilevel flow modelling. Proc. Inst. Mech. Eng. Part O J. Risk Reliab. 2023, 237, 922–946. [Google Scholar] [CrossRef]
  9. Zhang, H.; Zhang, B.; Gao, D. A new approach of integrating industry prior knowledge for HAZOP interaction. J. Loss Prev. Process Ind. 2023, 82, 105005. [Google Scholar] [CrossRef]
  10. Xu, K.; Hu, J.; Zhang, L.; Chen, Y.; Xiao, R.; Shi, J. A risk factor tracing method for LNG receiving terminals based on GAT and a bidirectional LSTM network. Process Saf. Environ. Prot. 2022, 170, 694–708. [Google Scholar] [CrossRef]
  11. Ricketts, J.; Pelham, J.; Barry, D.; Guo, W. An NLP framework for extracting causes, consequences, and hazards from occurrence reports to validate a HAZOP study. In Proceedings of the 2022 IEEE/AIAA 41st Digital Avionics Systems Conference (DASC), Portsmouth, VA, USA, 18–22 September 2022; pp. 1–8. [Google Scholar]
  12. Jia, Y.; Lawton, T.; McDermid, J.; Rojas, E.; Habli, I. A framework for assurance of medication safety using machine learning. arXiv 2021, arXiv:2101.05620. [Google Scholar]
  13. Wang, Z.; Ren, M.; Gao, D.; Li, Z. A Zipf’s law-based text generation approach for addressing imbalance in entity extraction. J. Inf. 2023, 17, 101453. [Google Scholar] [CrossRef]
  14. Peng, L.; Gao, D.; Bai, Y. A study on standardization of security evaluation information for chemical processes based on deep learning. Processes 2021, 9, 832. [Google Scholar] [CrossRef]
  15. Zhao, Y.; Zhang, B.; Gao, D. Construction of petrochemical knowledge graph based on deep learning. J. Loss Prev. Process Ind. 2022, 76, 104736. [Google Scholar] [CrossRef]
  16. Zhang, M.; Chen, W.; Zhang, Y.; Liu, F.; Yu, D.; Zhang, C.; Gao, L. Fault diagnosis of oil-immersed power transformer based on difference-mutation brain storm optimized CatBoost model. IEEE Access 2021, 9, 168767–168782. [Google Scholar] [CrossRef]
  17. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Human Language Technologies, Volume 1 (Long and Short Papers), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019. [Google Scholar]
  18. Kim, Y. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Doha, Qatar, 2014; pp. 1746–1751. [Google Scholar]
  19. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610. [Google Scholar] [CrossRef] [PubMed]
  20. Joachims, T. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the 10th European Conference on Machine Learning (ECML ’98), Chemnitz, Germany, 21–23 April 1998; Springer: Berlin/Heidelberg, Germany, 1998; pp. 137–142. [Google Scholar]
  21. Fix, E.; Hodges, J.L. Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties. Int. Stat. Rev. 1989, 57, 238–247. [Google Scholar] [CrossRef]
  22. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  23. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3146–3154. [Google Scholar]
  24. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  25. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 6638–6648. [Google Scholar]
  26. Jawahar, G.; Sagot, B.; Seddah, D. What does BERT learn about the structure of language? In Proceedings of the ACL 2019—57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019. [Google Scholar]
  27. Samela, C.; Carisi, F.; Domeneghetti, A.; Petruccelli, N.; Castellarin, A.; Iacobini, F.; Brath, A. A methodological framework for flood hazard assessment for land transport infrastructures. Int. J. Disaster Risk Reduct. 2023, 85, 103491. [Google Scholar] [CrossRef]
  28. Akay, H. Flood hazards susceptibility mapping using statistical, fuzzy logic, and MCDM methods. Soft Comput. 2021, 25, 9325–9346. [Google Scholar] [CrossRef]
  29. Li, Y.; Wang, H.; Bai, K.; Chen, S. Dynamic intelligent risk assessment of hazardous chemical warehouse fire based on electrostatic discharge method and improved support vector machine. Process Saf. Environ. Prot. 2021, 145, 425–434. [Google Scholar] [CrossRef]
  30. Tian, D.; Li, M.; Han, S.; Shen, Y. A novel and intelligent safety-hazard classification method with syntactic and semantic features for large-scale construction projects. J. Constr. Eng. Manag. 2022, 148, 04022109. [Google Scholar] [CrossRef]
  31. Wang, F.; Gu, W.; Bai, Y.; Bian, J. A method for assisting the accident consequence prediction and cause investigation in petrochemical industries based on natural language processing technology. J. Loss Prev. Process Ind. 2023, 83, 105028. [Google Scholar] [CrossRef]
  32. Feng, X.; Dai, Y.; Ji, X.; Zhou, L.; Dang, Y. Application of natural language processing in HAZOP reports. Process Saf. Environ. Prot. 2021, 155, 41–48. [Google Scholar] [CrossRef]
  33. Zhang, F.; Wang, B.; Gao, D.; Yan, C.; Wang, Z. When grey model meets deep learning: A new hazard classification model. Inf. Sci. 2024, 670, 120653. [Google Scholar] [CrossRef]
  34. Wang, Z.; Wang, B.; Ren, M.; Gao, D. A new hazard event classification model via deep learning and multifractal. Comput. Ind. 2023, 147, 103875. [Google Scholar] [CrossRef]
  35. Ekramipooya, A.; Boroushaki, M.; Rashtchian, D. Predicting possible recommendations related to causes and consequences in the HAZOP study worksheet using natural language processing and machine learning: BERT, clustering, and classification. J. Loss Prev. Process Ind. 2024, 89, 105310. [Google Scholar] [CrossRef]
  36. Rezashoar, S.; Kashi, E.; Saeidi, S. A hybrid algorithm based on machine learning (LightGBM-Optuna) for road accident severity classification (case study: United States from 2016 to 2020). Innov. Infrastruct. Solut. 2024, 9, 319. [Google Scholar] [CrossRef]
  37. Xie, J.; Li, Z.; Zhou, Z.; Liu, S. A novel bearing fault classification method based on XGBoost: The fusion of deep learning-based features and empirical features. IEEE Trans. Instrum. Meas. 2020, 70, 1–9. [Google Scholar] [CrossRef]
  38. Walczak, M.; Poniszewska-Marańda, A.; Stepień, K. Classification of events in selected industrial processes using weighted key words and K-nearest neighbors algorithm. Appl. Sci. 2023, 13, 10334. [Google Scholar] [CrossRef]
  39. Orrù, P.F.; Zoccheddu, A.; Sassu, L.; Mattia, C.; Cozza, R.; Arena, S. Machine learning approach using MLP and SVM algorithms for the fault prediction of a centrifugal pump in the oil and gas industry. Sustainability 2020, 12, 4776. [Google Scholar] [CrossRef]
  40. Wang, F.; Gu, W. Intelligent HAZOP analysis method based on data mining. J. Loss Prev. Process Ind. 2022, 80, 104911. [Google Scholar] [CrossRef]
  41. Jang, B.; Kim, M.; Harerimana, G.; Kang, S.-u.; Kim, J.W. Bi-LSTM model to increase accuracy in text classification: Combining Word2vec CNN and attention mechanism. Appl. Sci. 2020, 10, 5841. [Google Scholar] [CrossRef]
Figure 1. Framework of B-TBM, which stands for BERT-based Text Classification Model. This model integrates BERT, TextCNN, BiLSTM, and the LFCF loss function to enhance the classification of HAZOP risk events.
Figure 2. Visualization of Word Clouds in PDSP and PHHP Dataset.
Figure 3. Quantitative Comparison of Language Complexity. The dotted lines refer to the true values.
Figure 4. Accuracy Trend Line Variation for Several Comparison Models On Various HAZOP Reports.
Figure 5. Confusion Matrix for Different Model Severity Classification Results for PDSP Dataset. (a) B-TBM model; (b) BERT+TextCNN and BiLSTM1 model; (c) BERT+BiLSTM model; (d) BERT+TextCNN1 model; (e) BERT+TextCNN2 model; (f) BERT model.
Figure 6. Confusion Matrix for Different Model Possibility Classification Results for PDSP Dataset. (a) B-TBM model; (b) BERT+TextCNN and BiLSTM1 model; (c) BERT+BiLSTM model; (d) BERT+TextCNN1 model; (e) BERT+TextCNN2 model; (f) BERT model.
Figure 7. ROC Curves for Severity and Possibility Prediction for PDSP Dataset. (a) Severity of traditional machine learning and gradient algorithm models to predict ROC curves under the PDSP dataset. (b) Possibility of traditional machine learning and gradient algorithm models to predict ROC curves under the PDSP dataset. (c) Severity prediction ROC curves based on Bert model ablation experiments under the PDSP dataset. (d) Possibility prediction ROC curves based on Bert model ablation experiments under the PDSP dataset.
Figure 8. Severity and Possibility Loss Curves for Focus Loss Function and LFCF Loss Function for PDSP and PHHP Datasets. (a) Severity loss function curves on the PDSP dataset. (b) Possibility loss function curves on the PDSP dataset. (c) Severity loss function curves on the PHHP dataset. (d) Possibility loss function curves on the PHHP dataset.
Figure 9. Comparison of Standard Deviation between Traditional Cross-Entropy Loss Function and LFCF Loss Function on PDSP and PHHP Datasets. (a) Standard Deviation Curves for Severity under the PDSP Dataset. (b) Standard Deviation Curves for Possibility under the PDSP Dataset. (c) Standard Deviation Curves for Severity under the PHHP Dataset. (d) Standard Deviation Curves for Possibility under the PHHP Dataset.
Figure 10. Classification of Different Labels on PDSP and PHHP HAZOP Datasets. (a) Distribution of PDSP. (b) Distribution of PHHP.
Figure 11. B-TBM Model Recall Rates and F1 Scores for Different Labels in PDSP Predictions.
Figure 12. B-TBM Model Recall Rates and F1 Scores for Different Labels in PHHP Predictions.
Table 1. Commonly Used Text Classification Models.

| Model Name | Structural Features | Advantages | Disadvantages | Applicable Scenarios |
|---|---|---|---|---|
| BERT [17] (Bidirectional Encoder Representations from Transformers) | Bidirectional Transformer, taking context into account | Excellent ability to capture contextual semantics, widely used in a variety of NLP tasks | Almost all text classification tasks, especially suitable for long and complex text | Various text classification tasks, especially complex texts |
| TextCNN [18] (Text Convolutional Neural Networks) | Uses convolutional layers to capture n-gram features | Captures local features effectively and is computationally efficient | Short text classification, e.g., news classification, comment classification | Short text classification tasks |
| BiLSTM [19] (Bidirectional Long Short-Term Memory) | Processes sequence data through long and short-term memory networks | Suitable for processing sequence-dependent text | Requires processing of contextual tasks, such as long text classification | Long text classification tasks |
| SVM [20] (Support Vector Machine) | High-dimensional linear classification | Suitable for small-scale datasets, high efficiency and high accuracy | Requires manual feature extraction and cannot handle contextual information | Small-scale text classification tasks |
| KNN [21] (K-Nearest Neighbors) | Instance-based classification | Easy to understand and implement | Computationally inefficient, has difficulty handling large-scale datasets | Small-scale text classification tasks |
| MLP [22] (Multilayer Perceptron) | Multilayer perceptron (neural network) for various tasks | Suitable for nonlinear classification problems | Requires a lot of data and time for training | Problems with large amounts of data that cannot be solved by linear models |
| LightGBM [23] (Light Gradient Boosting Machine) | A framework based on gradient boosting decision trees for large-scale datasets | Fast training, low memory usage, support for category imbalance | Sensitive to hyperparameter tuning | Scenarios requiring fast training and high performance |
| XGBoost [24] (eXtreme Gradient Boosting) | Optimization models based on gradient boosted trees | Robust performance with automatic missing value handling | Relatively long training time and high model complexity | Large-scale datasets and scenarios with a high number of features |
| CatBoost [25] (Category Boosting) | Decision tree modeling based on gradient boosting | Good processing of class features and fast training speeds | Higher memory requirements and more parameters | Scenarios with many category features and large amounts of data |
Table 2. Petrochemical Desulfurization and Sulfurization Process (PDSP) Partial Data.

| Description | Severity | Possibility |
|---|---|---|
| High temperatures in parts of the diesel output unit and excessive flow of refined diesel fuel, which in severe cases affects the operation of the diesel tank area. | 1 | 2 |
| Reaction effluent through the cold high-pressure separator, hydraulic turbine to the bottom of the low-pressure separator outlet pipeline part, cold high-pressure separator V-8107 boundary level is too high, a large number of raw oil with water, system pressure fluctuations, catalyst strength decreased, affecting product quality. | 2 | 4 |
| Water injection tank V-8110 liquid level is too high, serious water injection tank V-8110 overpressure, explosion. | 3 | 2 |
| Reaction feed through the reaction effluent/reaction feed heat exchanger, reaction feed heating furnace to the reactor inlet pipeline part, the fuel gas flow rate is too high, the fuel gas pipeline network pressure is too high, the temperature of the furnace tube rises, the temperature of the furnace outlet increases, and in severe cases, the reactor fly temperature. | 4 | 3 |
| Fractionation tower bottom oil through the bottom of the fractional distillation tower reboiler to return to the tower pipeline part, the bottom of the fractional distillation tower reboiler outlet temperature is high, the bottom of the fractional distillation tower oil flow rate is too low, and in severe cases, the stove pipe burnt. | 5 | 3 |
Table 3. Petrochemical Hydrogenation and Catalytic Cracking Process (PHHP) Selected Data.

| Description | Severity | Possibility |
|---|---|---|
| Hydrogen mixing oil is heated by the reaction effluent/mixed feed heat exchanger and reaction feed heater, and then enters the pipeline part of the hydrofinishing reactor. Abnormal inspection and maintenance, burnt and deflected flow of the furnace tube, and in serious cases, burnt the furnace tube, and the plant was shut down. | 3 | 1 |
| Reaction to the water injection part of the water injection tank V-8110 liquid level is too high, serious water injection tank V-8110 overpressure, explosion. | 3 | 2 |
| Circulating hydrogen through the circulating hydrogen inlet separator tank, circulating hydrogen desulfurization tower and circulating hydrogen compressor inlet separator tank to the circulating hydrogen compressor outlet pipeline part of the underground solvent tank V-8305 liquid level is too high, the accompanying pipeline leakage, and in severe cases, the tank is full to the flare system. | 1 | 3 |
| Hydrogen through the new hydrogen dechlorination tank, new hydrogen compressor to the cycle of hydrogen compressor outlet pipeline part, repair new hydrogen compressor C-8102A∼C, the gas valve valve plate failure, cut the machine to repair. | 1 | 4 |
| Low-minute oil passes through the reaction effluent/low-minute oil heat exchanger to the main stripper tower and the top return portion of the tower, where the main stripper tower T-8201 pressure is too low and the low-minute oil component becomes heavy and fills the tower. | 2 | 5 |
Table 4. PDSP and PHHP Classification Results in Different Models.

| Index | Method | PDSP Severity Label Accuracy (%) | PDSP Possibility Label Accuracy (%) | PHHP Severity Label Accuracy (%) | PHHP Possibility Label Accuracy (%) |
|---|---|---|---|---|---|
| 1 | SVM | 74.78 | 74.11 | 78.87 | 57.39 |
| 2 | KNN | 70.31 | 68.97 | 73.94 | 61.27 |
| 3 | MLP | 72.10 | 68.71 | 84.86 | 58.10 |
| 4 | LightGBM | 77.09 | 77.23 | 76.76 | 60.21 |
| 5 | XGBoost | 74.55 | 77.68 | 75.10 | 59.86 |
| 6 | CatBoost | 74.55 | 77.01 | 77.46 | 62.98 |
| 7 | BERT | 80.41 | 81.45 | 86.89 | 65.91 |
| 8 | BERT+TextCNN1 | 82.17 | 81.98 | 87.69 | 66.33 |
| 9 | BERT+TextCNN2 | 83.35 | 82.18 | 88.49 | 67.14 |
| 10 | BERT+BiLSTM | 82.21 | 82.28 | 88.10 | 67.69 |
| 11 | BERT+TextCNN and BiLSTM1 | 84.01 | 83.09 | 89.18 | 68.68 |
| 12 | B-TBM | 85.51 | 84.22 | 89.92 | 69.88 |
Table 5. AUC of Different Models for PDSP Dataset.

| Index | Method | AUC Values for Severity Labels | AUC Values for Possibility Labels |
|---|---|---|---|
| 1 | SVM | 0.93 | 0.88 |
| 2 | KNN | 0.86 | 0.79 |
| 3 | MLP | 0.90 | 0.79 |
| 4 | LightGBM | 0.93 | 0.88 |
| 5 | XGBoost | 0.91 | 0.86 |
| 6 | CatBoost | 0.93 | 0.88 |
| 7 | BERT | 0.90 | 0.91 |
| 8 | BERT+TextCNN1 | 0.94 | 0.94 |
| 9 | BERT+TextCNN2 | 0.93 | 0.94 |
| 10 | BERT+BiLSTM | 0.94 | 0.95 |
| 11 | BERT+TextCNN and BiLSTM1 | 0.95 | 0.95 |
| 12 | B-TBM | 0.95 | 0.96 |
