1. Introduction
Hazardous materials (hazmats) not only play crucial roles in critical areas of industrial sectors, such as heavy metal industries, agriculture, pharmaceutical industries, and oil and fuel industries, they also constitute a significant part of our domestic lives [
1,
2,
3,
4]. An unprecedented boom in hazmat usage has resulted in a dramatic growth in accidents caused by road hazmat transportation. The degree of danger of transport together with the risk of life is increasing. In light of possible catastrophic losses of lives and economics, as well as damages to the environment, there is an urgent theoretical and practical need to reduce hazmat transportation accidents (HTAs) associated with consequence and severity, which has drawn increasing concerns from the public [
5,
6,
7]. In contrast to general traffic accidents, HTAs can lead to more severe results due to the inherent physical and chemical instabilities of the substances during the road transportation process. Thus, the catastrophic accidents involving road hazmat transportation have often been termed as low probability with high consequence. Previous studies have examined the social risk generated from hazmat transportation. However, more research is needed regarding how the decision-makers should react to the implications of the severity of the accident consequences, and which factors/aspects should be paid more attention since they could intuitively contribute to the HTAs. These problems can be solved by proposing a comprehensive analysis framework with integrations of multiple effective methods to understand the accident characteristics, causes, and severities for implementing adaptive management policies. Meanwhile, concerning the inherent uncertainty existing in the database, which may lead to fails in decision making, more advanced severity analysis and evaluation methods are required to be developed to address the complexity that has been ignored in previous studies. Moreover, the influencing factors that affect accident consequences are miscellaneous, and they interact in complex ways, making it difficult to disentangle the relationships between accident consequence and the variables by conventional methods, especially in the transportation of hazmats. The application of machine learning techniques is desired and challenging in the field of HTA, which can be expected to achieve a relatively satisfying performance in cause–consequence analysis.
Moreover, China is a major country producing and using chemicals, whose outputs of several species of hazmats rank first all over the world. The hazmat industry has become one of the pillar industries of China’s national economy, whose market scale is estimated to reach 10.128 × 109 Chinese Yuan in the next five years; meanwhile, the gross benefit of hazmats keeps growing at an annual average rate of around 5%. Accordingly, increasing numbers of HTAs in recent years has put the issue of hazmat transportation management high on China’s public and policy agendas. It calls for public, comprehensive, and accurate data on which an effective HTA management system can be constructed. In terms of HTAs in China, no specialized accident database is available, much less investigation that clarifies China’s current situation.
The objective of this work is to develop an integrated research framework for HTA associated with risk analysis. It firstly presents the trends and the characteristics of fatal HTAs to identify the related affecting factors between the years 2004 and 2018. Based on the analysis, the study models the accident causes and the corresponding consequences using the decision tree approach. Then, an enhanced F-N curve-based analysis method is proposed under uncertainty and is applied to several hazmat transportation-related data sets. The F-N curve is one of the important tools for severity analysis and risk judgment by describing the relationship between cumulative probability F and number of deaths N. These results are compared for comprehensively evaluating the severity and understanding of the associated implications of HTAs.
In response to the current research challenges, the main contributions of the study include the following three aspects. First, the study proposes a comprehensive research framework for characteristics, causes, and severities analysis of HTAs, so that the adaptive management policies can be implemented. Second, the study develops an enhanced F-N curve establishment method while addressing the inherent uncertainty in severity analysis. With the introduction of the as low as reasonably practicable (ALARP) principle, the possible severity of hazmat accidents by road transportation can be evaluated under certain confidence levels, which has been ignored in previous research. Third, the study extends the machine learning techniques in the application of hazmat transportation management, with the integration of multiple data analysis methods into the proposed research framework. In particular, the decision tree method is used to quantitatively describe the complex relationship between the accident consequences and key influencing factors.
The rest of the study is organized as follows.
Section 2 presents the state of the art that is divided into three subsections.
Section 3 describes the source of HTA data.
Section 4 summarizes the research methods.
Section 5 analyzes the temporal and spatial characteristics of HTAs. Then, we further investigate the relation among the influencing factors and HTA consequences using the decision tree approach in
Section 6.
Section 7 presents the proposed enhanced F-N curve method under uncertainty and the applications to several practical data sets for severity analysis.
Section 8 presents the discussions. Finally, we conclude this article in
Section 9.
3. Data Collection
To illustrate and assess the usefulness and limitations of the existing datasets for HTA analysis, we collected and analyzed data and information on the accidents by road transportation with Levels II to V over the past 15 years (2004–2018). Generally, accidents can be divided into five levels, which are Levels I to V based on the casualties or/and economic losses according to the Byelaw Governing Reporting, Investigation, and Handling of Production Safety Accidents of China. The classified five levels as follows: (1) Level I means no deaths and serious injuries; (2) Level II refers to an accident that leads to less than 3 deaths, or less than 10 serious injuries, or direct economic losses of less than 10 million yuan; (3) Level III means an accident that causes the death of more than 3 people, but less than 10 people, or the serious injury of more than 10 people but less than 50 people, or the direct economic losses of more than 10 million yuan but less than 50 million yuan; (4) Level IV means an accident that causes more than 10 but less than 30 deaths, or more than 50 but less than 100 serious injuries, or direct economic losses of more than 50 million but less than 100 million yuan; (5) Level V means an accident that causes more than 30 deaths, or more than 100 serious injuries, or a direct economic loss of more than 100 million yuan. Among them, Level I is the lowest level without casualties, and Level V is the severest one with serious casualties or/and economic losses. However, as of now, there is no public and free database of HTAs that can offer complete information covering the required spatial, temporal, and detailed contexts of accident descriptions, making it difficult to obtain relevant statistical data. In the study, we review and extract the useful information regarding HTAs by road mainly from the following resources, specifically several official websites (e.g., the Ministry of Transport, the Ministry of Emergency Management, and the Ministry of Environmental Protection of China), useful data and information collected from professorial websites (e.g., the National Registration Center for Chemicals, the China Chemical Safety Association of China and State Administration of Work Safety, and safety management), reports from journal publications (e.g., Chemical Safety and Environment), and other online news. Moreover, since some accident records are brief and incomplete, we also additionally searched the internet for complementary information item by item.
Accordingly, HTAs were identified from more than 13,000 road chemical accidents from the above of mentioned sources. Only accidents of Levels II to V, for which the number of deaths is known, were included in the analysis. Finally, a total of 371 HTAs were singled out for review, all of which occurred during transportation, excluding the accidents that occurred during loading, unloading, and maintenance. Information on each HTA record is given concerning the hazmat classes and species of hazmat, United Nations (UN) number, the locations, time, date, types, impacts (leaking, non-leaking, fire, explosive, poisonous, corrosive, and others), and consequences (e.g., levels, casualties, and the direct economic losses to property), as well as specific detailed descriptions of the accident. In detail, the accident occurrence probabilities for Levels II to V are shown in
Figure 1. It is shown that the accident level is negatively related to its proportion, the higher the accident level, the lower the occurrence. The proportion of accidents at Level II was the highest, accounting for 75.47% of HTAs. Particularly accidents belonging to Levels IV and V account for less than 5% in total.
4. Methods
The study introduces multiple analytical methods to support the development of a comprehensive research framework for the accident characteristics, causes, and severities analysis, so that adaptive management policies can be implemented. The method used in the study includes the statistical analysis, the
k-means clustering, the decision tree model, and the F-N curve method associated with the ALARP principle. The proposed analytical framework with the integration of research methods is summarized in
Figure 2.
In detail, in the characteristics of the HTAs section, the data analysis method based on the statistics and probability theory was applied to explore the distribution of hazmat classes, causes, and frequencies of HTAs, which has been widely used in the existing literature [
8,
9]. The
k-means clustering method, serving as a prototype of the cluster that can partition the input data set into
k partitions (clusters), was introduced for spatial characteristics analysis. It addresses the multiple socio-economic factors that influence the number of accidents for each province/city, and thus, can help the decision-makers understand the importance of managing serious accidents based on the spatial characteristics of HTAs. Then, the decision tree model, as a suitable machine learning method, was used for illustrating the complex relationship existing in the cause–consequence analysis compared with conventional methods (i.e., statistic regression models and Bayesian network). Although it has been reported in the road safety field, the applications in examining the key influencing factors in relation to the consequences for hazmat road transportation accidents is still somewhat limited [
28,
29]. The study is an extension of the existing accident analysis. In the severity analysis section, an enhanced F-N curve was adapted under uncertainty by presenting the relationship between the accumulated probability and the number of deaths. As for the other severity measurement indicators, such as casualties, property damage, or environmental impact, it may be difficult to employ cost–benefit analysis because it is not possible either to identify in advance or to value all the major consequences of accidents. Moreover, the existing applications of F-N curves rarely consider the inherent uncertainty in the database, which may influence the F-N curve formulation associated with a social risk evaluation and lead to failures during decision making. The ALARP principle allowed us to evaluate the relevant tolerability risks under different scenarios and make the result of severity evaluation practical significance [
36,
38].
6. Decision Tree-Based HTAs Cause–Consequence Analysis
In order to understand the correlation among the hazmat properties, accident characteristics, and the consequences of HTAs, the decision tree method was used to identify which characteristics define the specified groups, and to outline which specific variables have the most substantial impact on group differentiation. A decision tree is a tree-like collection of nodes intended to create a decision on values affiliation to a class or an estimate of a numerical target value. Each node represents a splitting rule for one specific attribute. For classification, this rule separates values belonging to different classes. Generally, the chi-squared automatic interaction detection (CHAID) analysis is one way to determine how variables combine to determine group membership [
23]. It is based on the target variable (independent variable; the independent variables of CHAID can only be classified) self-hierarchical tree structure. The root node is the dependent variable. The predictors continuously generate parent and child nodes based on the degree of chi-square significance. The higher the chi-square significance, the more variable the predictor becomes the root node. The CHAID operator provides a pruned decision tree that uses a chi-squared-based criterion instead of information gain or ratio criteria gain, which is useful when the various existing predictive variables are classification ones. In the study, the consequence level of accidents was the dependent variable, and hazmat class, the accident type, the time and the cause of accidents, and road level were independent variables. The variables, with the exception the time, were all specified as classification variables. As some accidents may contain more than one class of hazmat, the study separated different classes into multiple accident cases. For example, if an accident contained two classes of hazmats and resulted in an accident consequence of Level II, it was separated into two accidents of Level II with different inputs of hazmat classes. Thus, the accident inputs at the root of the decision tree were more significant than the actual number of HTAs. The detailed classification of independent variables prepared for modeling the decision tree is presented in
Table 2. The last column of the table provides the input number of accident cases; in particular, the HTAs containing two or more hazmat classes have been separated into multiple cases.
To interpret the complex relationship among multiple variables, the decision tree model was formulated consisting of four subgraphs, as shown in
Figure 11. Considering the accident separation, a total of 569 accident cases could be input at node 1. Among them,
Figure 11a represents the root and the first-layer branches of the decision tree. It is shown that the root node is split into three branches based on the causes of accidents in the study. Then, the first-layer branches perform as new root nodes which leads to more branches for further elaboration of the relations between the causes and the consequences of the accidents.
Figure 11b–d are respectively the 1st, 2nd, and 3rd branches generated from the root of the decision tree.
Taking node 2 generated from the root shown in
Figure 11a as an example, node 2 represents the accidents caused by equipment malfunction, with a total of 44. The decision tree then classified the specific types of accidents 1, 7, 8, 9, 10, and 11 to node 7 in
Figure 11c, which represents the number of accidents caused by factor 3, that is, equipment malfunction As shown in
Figure 11c, based on the data of HTAs that belong to the specific types and the corresponding proportion of different consequence levels, there was a 5.6% chance that can lead to the specific types of accidents at node 7, that is, collision with fixed objects, multivehicle collide, rollover, run-off-road, tank and safety accessory failure and vehicle-related defect. If it happens, there was an 87.5%, 9.4%, and 3.1% chance for Level II, III, and IV accidents occurring, respectively.
Figure 11c also displays that the specific types of accidents 4, 6, and 12, that is, two-vehicle head-on, two-vehicle rear-end and other specific types, were classified to node 8, representing a chance of 2.1% that can result in the specific type of accident at the node. If it happens, the probabilities were 16.7%, 16.7%, and 66.6% for Levels II, III, and IV of accidents, respectively.
Moreover, a confusion matrix (
Table 3) was used to visualize the performance of cause–consequence analysis by using the decision tree, which displays whether the system is confusing two classes (i.e., commonly mislabeling one as another). For example, there were 399 correct predictions in the first line; meanwhile, nine cases were incorrectly predicted to other levels. Thus, the correct prediction percentage was 97.8%.
Table 3 shows that the overall accuracy evaluation of the results was 75.2%, indicating that the decision tree model resulted in a relatively satisfying interpretation of the relationship among the hazmat properties, accident characteristics, and the consequences of HTAs.
7. HTA Severity Analysis Using an Enhanced F-N Curve Under Uncertainty
For identify the severity of HTAs, an F-N curve was used to describe the relationship between cumulative probability and the number of deaths, which can be provided as follows [
32]:
where
is the probability of more than
x fatalities;
is the probability density function of the number of fatalities; and
is the probability distribution function of the number of fatalities, signifying the probability of less than or equal to
x fatalities.
In real-world practice, the probability of accidents is usually replaced by the frequency, and the accidents are grouped according to the number of deaths. Thus, the F-N curve can be presented as:
where
N is the number of fatalities,
n is the total number of groups, and
Ni is the number of accidents in group
i, and
F is the cumulative frequency of
N or more fatalities.
Let
C be the constant that determines the limit line, and
a be the steepness of the limit line. Thus, the F-N criterion curve can be presented as follows:
On a log-log basis, the equation is derived as follows:
Generally, the F-N curve is displayed as a straight line with a precise slope of −
a. The slope values are different in various accident conditions, and the number of deaths ranges in the broader interval; using a precise slope may underestimate or overestimate the social risks of accidents. To overcome the problems, the F-N curve under uncertainty can be proposed with the slope of −
a obeying a certain type of distribution. It will help the authorities to evaluate the risks under different accident levels comprehensively. Therefore, to better present this distribution in quantification and visualization, a method for specifying the F-N curve under uncertainty was developed for the best-fitted curve selection using the dataset of HTAs, which was mainly under the considerations of the normal, improved normal, and power-law distributions.
Table 4 presents the data for F-N curves fitting under uncertainty. More details are presented as follows.
- (a)
The probability density function for the normal distribution is present as follows:
where µ is mean value and σ is the standard deviation.
On a log–log basis, the cumulative probability
and
of the death toll in HTAs associated with the logarithmic values are calculated to specify the distribution of the slope with the corresponding mean value µ and standard deviation σ.
Figure 12a shows the fitting of the normal distribution according to the database. From the confidence interval and the Equation (4a), the ratio of 10 or more accidental deaths and 100 or more accidental deaths ranged from 14.56–33.25.
- (b)
The improved normal distribution. Concerning the traditional deficiencies, an improved normal distribution is proposed as follows:
where
B and
D are adjusting parameters that can be obtained through the fitting process.
Similarly, the mean value µ and the standard deviation σ, as well as the distribution function of slope −
a can be obtained through data fitting, as shown in
Figure 12b.
- (c)
The power-law distribution. Based on Equation (3b) and
Table 4, we obtained the inexact F-N curve associated with the parameter intervals of −
a and C under the 95% confidence level.
The detailed fitting parameters of F-N curves are listed under different distributions in
Table 5. According to
Figure 12a,b, the distribution of slope −
a can be obtained. In detail, the distribution of −
a under the normal distribution is presented as follows:
and under the improved normal distribution, it is presented as follows:
Moreover, the F-N curve under the power-law distribution is presented with 95% confidence levels, as follows:
It is shown that the R2 (i.e., coefficient of determination) obtained under the distributions of normal, improved normal, and power-law were 0.6422, 0.89952, and 0.9936, respectively. Moreover, we also compared the root-mean square error (RMSE) values that were used to measure the deviation between the projected values from the fitting curves and the observation values. The RSME values were 1.0921 under the normal distribution, 0.02306 under the improved normal distribution, and 0.0197 under power-law distribution, respectively. It is indicated that the fitting curve based on the power-law function was the best in performance from the perspectives of both R2 and RMSE.
The social risk of HTAs was further assessed using the proposed enhanced F-N curve under uncertainty, reflecting both the direct influence on casualties and the long-term harm to society. The social risk is usually evaluated based on the socially acceptable risk standard, such as the ALARP principle. For example, the ALARP principle divides the area that is formulated by the coordinate axes of casualty number and the associated occurrence frequency into the unacceptable risk, tolerable risk if ALARP, and broadly acceptable risk zones by the tolerable and acceptable lines [
43]. Therefore, through assigning values to
a and
C in Equation (3a), the social risk of HTAs could be assessed with the utilization of the enhanced F-N curve. The tolerable and acceptable lines based on the F-N curve were obtained through deciding the slope of −
a in Equation (3b), with
a = 1 or 2 from existing studies [
37]. Considering the influences of the HTAs, let
a = 1 to reflect the neutral risk situation. The values of
C in the equation represent the Y-intercept of the tolerable and acceptable lines, respectively. As for the tolerable line,
b = 0.1 is taken in the study. Accordingly, the acceptable line was set along with the tolerable one according to the ALARP principle, and the Y-intercept was 1% of that from the tolerable line that was
b = 0.001. The detailed unacceptable, tolerable if ALARP, and broadly acceptable risk zones associated with the tolerable and acceptable lines are displayed in
Figure 13.
The F-N curve under uncertainty could be developed under the power-law distribution for evaluating the severity of accidents based on the social risk criterion using the method proposed in the previous section. Besides the database used in the study, the other seven groups of data were digitized from references [
8,
9]. Among them, groups of (b) to (f) were from survey data of 1932 accidents (including 242 fatal accidents) that occurred during the land transport of hazardous material from 1931–2004 over 95 countries. Thus, a total of eight groups of data were used to help understand the applications of the proposed F-N curve method for severity analysis. The eight groups of data were as follows: (a) 371 HTAs from 2004–2018 in China; (b) 1217 HTAs on road in 95 countries; (c) 715 HTAs on the railway in 95 countries; (d) HTAs in the United States, Canada, Australia, Japan, New Zealand, and Norway; (e) HTAs in the European Union; (f) HTAs in the rest of the world with the exception of the United States, Canada, Australia, Japan, New Zealand, Norway, and the European Union; (g) survey data of 322 hazmat accidents from road transport from 2000–2008 in China; and (h) 217 fatal HTAs from 2013–2017 in China. The enhanced F-N curves under uncertainty associated with the lower and upper 95% confidence intervals, as well as the tolerable and acceptable lines, were obtained from the above data sets shown in
Figure 14.
On a log–log basis of the power-law distribution, none of the F-N curves obtained fell into the acceptable risk zone that ranged below the acceptable line, as can be seen from
Figure 14. It is indicated that the severity of the HTAs by road was relatively high for all the investigated cases. When a curve exists in the acceptable risk zone, it demonstrates that the corresponding risk may have an acceptable impact on the whole society, and there is no need to take extra measures to mitigate the risk generated by the HTAs. Furthermore, most of the curves appeared above the tolerable line, located in the unacceptable risk zone. It is indicated that the risks of HTAs had significant impacts on society, leading to immediate actions to control the related risks being needed. The risks of HTAs that occurred in China do not seem more severe than the other countries/regions based on the investigated data analysis with the layout positions. Specifically, the risk tended to be lower by the curve slope compared with
Figure 14g,h.
In
Figure 14, the black line represents the ordinary F-N curves, while the proposed method allows the local authorities to consider the risk uncertainty presented by the upper and lower 95% confidence intervals. The area between the tolerable and tolerance lines is the zone wherein the impact of risk is not unusually large or tolerable according to the aforementioned ALARP principle. However, it cannot be ignored, and is generally maintained at a level that is reasonably feasible with less social losses. The risk assessment of accidents can lead to different results considering the existing uncertainty. For example, as for
Figure 14a, the risk of HTAs would be acceptable when the number of deaths exceeds 100 based on the ordinary F-N curve (i.e., the black line). The acceptable risk could be over 800 fatalities from a pessimistic attitude. On the contrary, according to an optimistic attitude, the risk of accidents with more than 40 deaths would be already small. Similarly, the risk could be acceptable when the fatalities are more than 40 for
Figure 14g. As shown in
Figure 14h, the optimistic risk assessment could be acceptable when fatalities exceed three. While making decisions with a pessimistic attitude, the risk would be relatively low when the number could exceed 18, based on the investigated data from 2013–2017 in China.
8. Discussion
The study highlights the need for the evaluation of accidents and continuous improvements in risk management for hazmat transportation. Thus, a comprehensive analysis framework for characteristic analysis, cause identification, and severity assessment was proposed and applied to a database of 15 years of fatal accidents of Levels II to V, which accounted for 14% of the total accidents caused by hazmat road transportation. Although the research mainly focused on the study case of China, the results suggest that the proposed methodology framework consisting of characteristics, causes, and severity analysis methods for the HTAs can offer the opportunity to be applied to many other safety management practices. For example, as for the HTA database of a specific country (or a region), the characteristics of HTAs can be analyzed first to obtain the necessary information, such as frequency, temporal and spatial trends, as well as a consequence. Then, the decision tree-based cause–consequence analysis method can be used to illustrate the complex relationship between causes and consequences. Through the severity analysis, decision-makers can be aware of HTA risk levels of the country (or region) by combining the ALARP principle with an enhanced F-N curve, while taking into account uncertainty. According to the severity assessment, if the social risk of HTAs is not acceptable, stricter policies for hazmat transportation management are desired to be implemented based on the characteristics and cause–consequence analysis. It can make policies more rational, as emphasis is placed on the management of critical aspects/factors that have more significant impacts on the consequences of HTAs. Otherwise, the current policies may not necessarily be adjusted.
Concerning the inherent uncertainty in severity analysis, the study proposed a method for establishing an enhanced F-N curve. Thus, the possible severity of hazmat accidents by road transportation could be evaluated under certain confidence levels, which has been ignored in previous research. When the F-N curve locates in the unacceptable risk zone, it means immediate and strict preventative measures are required, and the measures can be inspired and incorporated by the study findings from the characteristics and causes analysis of HTAs. In contrast to the conventional deterministic F-N curves, the study provides the empirically supported theoretical basis for severity analysis under various scenarios.
From the perspective of data analysis, as the application of machine learning techniques is desired and challenging in the field of HTA, multiple machine learning methods are introduced into the proposed characteristics–cause–severity analysis method framework for hazmat transportation. The
k-means clustering approach was used for examining the spatial characteristics. The decision tree method was employed for revealing the complex relationships among the accident consequence and causes in terms of the key influencing factors of HTAs. For the method selection of the cause–consequence analysis, several methods proposed by literature were compared. In the study, the multinomial logit model was also tested for cause–consequence analysis purposes. Unfortunately, only the prediction accuracy of Level II was above 50%, and those of the other three levels were less than 5%. The significant result of the likelihood ratio was above 0.1, which indicates that the model cannot pass the test. The poor performance of the multinomial logit method could be attributed to the strong assumptions (i.e., the linear form of the utility function and the distribution assumption of errors), which make it unacceptable while applying to the HTA database. One of the aims of the study was to propose an accurate and non-biased method capable of HTA prediction. The Bayesian network was not suitable for the aim and the investigated data set due to a great deal of conditional probability estimation work and potential expert scoring. For the method assessment, the accuracy associated with the confusion matrix was introduced to measure the percentage of cases correctly classified by the classifier. In the investigation from existing studies, the prediction accuracy was acceptable when it exceeded 50% [
28,
29,
44]. Thus, the decision tree model proposed in the study achieved a relatively satisfying performance in the cause–consequence analysis.
There are still some limitations of the study in the following aspects. First, HTA-related studies require high-quality information, but the specialized databases are not yet available for the public; thus, more complete HTA data can provide a comprehensive understanding of the status of hazmat transportation management. Second, although the result of the decision tree-based cause–consequence analysis was acceptable, improvements on data information and modeling methods may increase the accuracy. Third, multiple sources of databases in various countries regarding HTAs are still required to help enhance the proposed method in real-world applications. Therefore, enriching HTA databases, proposing more advanced analysis techniques, and improving the existing methods are desired for better conducting the characteristics, cause, and severity analysis for hazmat transportation risk management in the future.