Article

Fusing XGBoost and SHAP Models for Maritime Accident Prediction and Causality Interpretability Analysis

1 School of Law, Fuzhou University, Fuzhou 350116, China
2 School of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2022, 10(8), 1154; https://doi.org/10.3390/jmse10081154
Submission received: 5 July 2022 / Revised: 18 August 2022 / Accepted: 19 August 2022 / Published: 20 August 2022
(This article belongs to the Section Ocean Engineering)

Abstract

In order to prevent safety risks, control marine accidents and improve the overall safety of marine navigation, this study established a marine accident prediction model. The influences of management characteristics, environmental characteristics, personnel characteristics, ship characteristics, pilotage characteristics, wharf characteristics and other factors on the safety risk of maritime navigation are discussed. Based on the official data of Zhejiang Maritime Bureau, the extreme gradient boosting (XGBoost) algorithm was used to construct a maritime accident classification prediction model, and the explainable machine learning framework SHAP was used to analyze the causal factors of accident risk and the contribution of each feature to the occurrence of maritime accidents. The results show that the XGBoost algorithm can accurately predict the accident types of maritime accidents with an accuracy, precision and recall rate of 97.14%. The crew factor is an important factor affecting the safety risk of maritime navigation, whereas maintaining the equipment and facilities in good condition and improving the management level of shipping companies have positive effects on improving maritime safety. By explaining the correlation between maritime accident characteristics and maritime accidents, this study can provide scientific guidance for maritime management departments and ship companies regarding the control or management of maritime accident prevention.

1. Introduction

In recent years, along with the development of the global economy, the trend of growth in the size of ships has accelerated. The tonnage of ships and the number of boats have increased significantly, making the density of vessels at sea higher and higher, and the maritime environment at sea increasingly complex, which has led to an increase in the possibility of ship accidents. Ensuring the safety of maritime navigation and preventing significant accidents on ships have always been the focus of attention in maritime operation. The International Maritime Organization (IMO) Global Integrated Shipping Information System (GISIS) statistics show that: from 2010 to 2019, a total of 11,671 maritime accidents happened worldwide, including casualties, ship damage, groundings, collisions and marine environmental pollution, of which 2135 were severe incidents [1], as shown in Figure 1. The International Association of Dry Cargo Shipowners (INTERCARGO) reported that 27 large bulk carriers with over 10,000 DWT were identified as total losses between 2012 and 2021, which caused 92 crew member deaths and a total carrying capacity loss of 2.3 million DWT, with a significant impact on global maritime transportation capacity [2]. Despite the effect of the COVID-19 pandemic and the large decline in maritime vessel traffic, the latest statistics released by the European Maritime Safety Agency (EMSA) showed that there were still 2837 maritime casualties in Europe in 2020, with 675 deaths [3].
The occurrence of maritime accidents can cause casualties, severe damage to the marine environment and considerable losses to the economy [4]. As a means of transportation, elements related to people, ships, the environment and management are intertwined during the operation of a boat, and these related elements may work together to seriously threaten the lives of crew members or cause damage to the ship’s property [5]. As the means of transportation with the largest capacity and lowest average cost, ships effectively communicate with the global trading system. Nevertheless, the maritime environment is intricate and complex, and is constantly affected by the natural environment, such as wind and waves, which are crucial to ship safety [6]. The combination of these complex factors mentioned above is bound to exacerbate the challenges of maritime safety and can also create confusion for safety managers amidst the countless risk factors. Thus, scientific identification and assessment of the risks suffered by ships during navigation, especially the monitoring of key risk factors, is of extraordinary significance for the prevention of maritime accidents, as well as an essential reference for the authorities’ safety supervision of ships at sea.
Large marine vessels, such as oceangoing vessels of 100,000 DWT, play a vital role in the trade economy. However, because maritime risks are uncertain, maritime accidents occur frequently; they cause economic losses and also affect the families of those involved. Therefore, achieving an accurate assessment of ship risks has become an issue actively explored by scholars. In previous studies, experts have mainly used accident probability calculation analysis [7], hierarchical analysis [8], fuzzy analysis [9], integrated safety assessment [10], Markov chain modeling [11] and econometric model analysis [12] to consider the risks of ships and assess their essential risk-causing factors. However, these research approaches may be too subjective, may miss some key data or may analyze only a relatively simple problem, and therefore cannot provide an objective, comprehensive and in-depth account of maritime accidents.
In recent years, with the rapid development of information technology and acceleration of the use of big data, the study of model extension methods combined with Bayesian networks has gradually become the primary method used in maritime accident research. Li et al. [13] constructed an environment–human Bayesian network and a ship–human Bayesian network. The study discussed the consequences of collisions caused by different types of human error and tried to solve the small sample problem in conditional probability table estimation, especially to fully examine the influence of external factors on human errors in the case of insufficient collision records. Wu et al. [14] proposed a novel Bayesian network-based emergency decision model with a three-level decision framework for mitigating the consequences of single-ship collisions in the Yangtze River. Fan et al. [15] used a Bayesian network (BN) model to investigate the effects of PSC inspections on ship accidents and the time interval between reviews of the ship’s inherent properties. The ship safety rating was determined on the basis of the number of defects in the PSC inspection and the risk of ship accidents. Maceiras et al. [16] examined 163 ship accidents in Spain, and the detailed combined ANOVA analysis and Bayesian network results showed good agreement with studies from other regions, but with some specificity in the type of accident for each analysis. Jiang et al. [17] proposed a Bayesian network (BN)-based risk analysis method for maritime transport safety along the Maritime Silk Road (MSR) in the 21st century and used the findings to analyze the probability of each possible type of maritime accident along the MSR, providing useful insights to ship owners for accident prevention. Wu et al. [18] developed a data-driven Bayesian network by analyzing a sample of 132 fire accidents, deriving quantitative and qualitative relationships among the influencing factors using mutual information and expectation maximization algorithms, respectively, and also analyzing the marginal probabilities to discover the impact of the influencing factors on the consequences. However, in these studies, subjective human judgment was the primary consideration during model construction, and effective data may have been screened out. In addition, each risk factor was treated separately when Bayesian networks were used for maritime accident analysis, which may have caused some deviation from real maritime accidents, as no real accident matches the model exactly.
Machine learning has become a hot research topic for solving complex problems in engineering applications and scientific fields, and various algorithms have been widely used in a range of fields related to risk assessment such as COVID-19 infection diagnosis [19], marine biocide [20], systemic risk in the financial sector [21], climate change [22] and the fairness of criminal justice [23] to restore the authenticity of data in model construction. Among the different methods, the extreme gradient boosting algorithm (XGBoost), as an improved integrated learning method, has the advantages of good ability to handle sparse data, the high interpretability of results and support for parallel computing [24,25,26]. These advantages are also very suitable for considering the multi-factor risks of ship safety in maritime accidents. Explainable frameworks are also a hot research topic in several scientific fields, and foreign studies have started to use the explainable machine learning framework SHAP technique to explain how feature variables within complex machine learning models affect predictions, and the technique is worth applying for research into the safety risk dynamics of maritime traffic accidents [27,28,29].
In this study, we expanded the information of management, environment, personnel and ships on an item-by-item basis for the collected maritime accident data, and established an experimental dataset focusing on analyzing the impact of the factors of different accident types on ship safety. The pioneering model was applied with machine learning methods in a large data sample environment, with the aim of analyzing the key risk factors affecting maritime navigation safety more objectively and comprehensively, and contributing to the prevention of major accidents on ships, enhanced personnel safety and security and the reduction of property losses.

2. Database

2.1. Data Sources

In 2021, the cargo throughput of ports in Zhejiang Province of China was 1.93 billion tons, with an annual increase of 4.0%, including 1.49 billion tons in coastal ports, an increase of 5.4%. Ningbo Zhoushan Port had a cargo throughput of 1.22 billion tons and has recorded the greatest volume in the world for 13 consecutive years, a container throughput of 31.08 million TEU, and had the third highest throughput in the world for the fourth year running, after the ports of Shanghai and Singapore, which are in the “30 million club” of ports [30]. The number of ships entering and leaving the port in Zhejiang Province was 2.1407 million in 2017, 2.1766 million in 2018, 2.2586 million in 2019, 2.3772 million in 2020 and 2.6466 million in 2021, showing an increasing trend year by year [31]. However, maritime traffic accidents are causing significant damage to the shipping economy and navigable environment. In 2021, for example, 24 maritime accidents of general grade and above occurred in Zhejiang’s jurisdiction, resulting in 51 deaths and disappearances (26 deaths and disappearances of merchant ships and 25 deaths and disappearances of fishing vessels). Eighteen ships sank, which caused a direct economic loss of about CNY 45.3 million [32].
The main data of the initial dataset were taken from the maritime accident records of the Zhejiang Maritime Safety Administration of the People’s Republic of China in the past 10 years, but the data features in the reports are not obvious and the dimensional scale is not ideal. In order to carry out dimensional expansion, this study extracted 15 new features from the accident reports of the Maritime Bureau using the polynomial method, such as fishing vessel crew grabbing the bow and inadequate assessment and training drills, and extended 13 artificial features to obtain an initial dataset with 47 data features and 105 data samples in total. A map that shows the region of the port where the data were obtained and where the accidents occurred is shown in Figure 2. The classification of the features and the symbols are shown in Table 1. All the data come from the maritime accident reports published on the official website of Zhejiang Maritime Bureau, and the authenticity and reliability of the data can be guaranteed.

2.2. Statistical Analysis

As shown in Figure 3, the proportions of the 105 maritime accidents reported over the past 10 years by the Zhejiang Maritime Safety Administration of the People’s Republic of China were calculated. Among them were 64 collision accidents, accounting for 60.95% of all accidents; 6 grounding accidents, accounting for 5.71%; 9 allision accidents, accounting for 8.57%; 2 fire and explosion accidents, accounting for 1.90%; and 18 capsizing accidents, accounting for 17.14%. The remaining 6 were other accidents, accounting for 5.71%. As can be seen from Figure 3, collision accidents account for the largest share, which may be because there are many islands and reefs in Zhejiang’s marine area, and the ships are greatly affected by fog, typhoons and gales all year round. When sailing in this sea area, ships face very complicated situations. Some vessels violate the Convention on the International Regulations for Preventing Collisions at Sea, 1972 (COLREGs), and some fishing vessels still follow the habits of fishermen and take the right of way for granted. Some ships are in poor technical condition, and some of them have not been maintained and overhauled for a long time. The overall quality of the crew needs to be improved, navigation skills and knowledge of water safety regulations are lacking, and the navigational aids are seldom used to best effect [33].
The characteristics of 105 maritime accidents were calculated, with a total frequency of 749 for all features, among which ‘improper operation’ appeared the most frequently (72 times). This shows that ‘improper operation’ is a frequent error in accidents and attention should be paid to this. This was followed by ‘poor lookout’, which appeared 64 times, and ‘inadequate training and drills’, which appeared 49 times. The frequency of occurrence of the specific features is shown in Figure 4 (showing the top 23 features in terms of frequency).
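The frequency count described above can be sketched as follows. This is only an illustration: the three accident records and the feature names are hypothetical placeholders, not rows from the Zhejiang dataset.

```python
from collections import Counter

def feature_frequencies(accidents):
    """Count in how many accident records each causal feature appears."""
    counts = Counter()
    for factors in accidents:
        # set() guards against a feature being listed twice in one record
        counts.update(set(factors))
    return counts

# Hypothetical records: each accident is the set of features marked present.
accidents = [
    {"improper_operation", "poor_lookout"},
    {"improper_operation", "inadequate_training"},
    {"poor_lookout", "improper_operation"},
]

freq = feature_frequencies(accidents)
top = freq.most_common(1)          # most frequent feature and its count
total_frequency = sum(freq.values())  # analogous to the total of 749 in the text
```

Sorting the counter with `most_common` gives the kind of ranking that a bar chart such as Figure 4 would display.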

3. Methodology

With the database described above, the accident classification prediction model of maritime accidents was constructed based on the XGBoost algorithm, considering various elements such as crew factors, ship factors, company management factors and environmental factors. In addition, the SHapley Additive ExPlanations (SHAP) methodology was applied to the trained model to resolve the combined effect of feature variables on the accident classification prediction values. The technical pathway is shown in Figure 5.

3.1. XGBoost Model

The XGBoost algorithm is an ensemble machine learning algorithm proposed by Chen et al. [34]. It has the advantages of fast parallel computation, controlled complexity, fault tolerance and strong generalization ability. The algorithm iteratively combines multiple low-accuracy decision tree models (weak learners) into a high-accuracy strong learner. A second-order Taylor expansion of the loss function is used, and a regularization term is added to the objective to control the model’s complexity and prevent overfitting.
The objective function of the XGBoost algorithm is:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
$$\Omega(f_t) = \gamma\psi + \frac{1}{2}\lambda\lVert\omega\rVert^{2}$$
where $Obj^{(t)}$ is the objective function at the $t$th iteration, $l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right)$ is the selected training loss function, $y_i$ is the true value of the $i$th sample, $\hat{y}_i^{(t-1)}$ is the predicted value of the $i$th sample after $t-1$ iterations and $f_t(x_i)$ is the output of the decision tree added at the $t$th iteration for the sample $x_i$. $\Omega(f_t)$ is the complexity function of the $t$th iteration, $\gamma$ is the complexity parameter, $\psi$ is the number of leaf nodes, $\lambda$ is the regularization penalty factor and $\omega$ is the vector of leaf node weights.
With second-order Taylor expansion, the objective function $Obj^{(t)}$ is approximated as follows:
$$Obj^{(t)} \approx \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2}h_i f_t^{2}(x_i)\right] + \Omega(f_t) + C$$
where $g_i$ and $h_i$ are the first- and second-order derivatives of the loss with respect to the previous prediction $\hat{y}_i^{(t-1)}$ for the $i$th sample, respectively, and $C$ is a constant term.
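To illustrate how the second-order terms are used, the sketch below computes the first- and second-order loss derivatives for a binary logistic loss and the weight of a single leaf that minimizes the quadratic approximation with the L2 penalty, w* = -G / (H + λ), where G and H are the sums of the derivatives over the samples in the leaf. The logistic loss, the toy labels and all function names are assumptions chosen for illustration; the study itself addresses a six-class problem and does not report this calculation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_hess_logistic(y_true, raw_scores):
    """Derivatives of the log loss w.r.t. the raw score from the previous round."""
    g, h = [], []
    for y, raw in zip(y_true, raw_scores):
        p = sigmoid(raw)
        g.append(p - y)            # first-order derivative
        h.append(p * (1.0 - p))    # second-order derivative
    return g, h

def optimal_leaf_weight(g, h, lam):
    """Minimizer of G*w + 0.5*(H + lam)*w**2 for the samples in one leaf."""
    return -sum(g) / (sum(h) + lam)

def leaf_objective(g, h, lam, gamma):
    """Objective contribution of one leaf at its optimal weight."""
    G, H = sum(g), sum(h)
    return -0.5 * G * G / (H + lam) + gamma

# Toy leaf with four samples, all starting from a raw score of 0 (p = 0.5).
y = [1, 1, 0, 1]
raw = [0.0, 0.0, 0.0, 0.0]
g, h = grad_hess_logistic(y, raw)
w_star = optimal_leaf_weight(g, h, lam=1.0)
obj = leaf_objective(g, h, lam=1.0, gamma=0.0)
```

A larger λ shrinks the leaf weight toward zero, and γ adds a fixed cost per leaf, which is how the regularization term limits tree complexity.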

3.2. Interpretable Machine Learning Framework SHAP

Since machine learning models are inherently black-box models, they do not allow humans to understand the process of machine learning. Explainable machine learning is an algorithm or model that makes the behavior and predictions of a machine learning system understandable to humans. Lundberg developed the unified framework SHAP (short for SHapley Additive exPlanation) for the post-hoc explanation of machine learning methods. The model generates a prediction value for each test sample and provides an explainable prediction. The main idea is to calculate the marginal contribution of the features added to the model, i.e., the SHAP value, which is equivalent to the impact of the features on the sample. In cooperative game theory, the SHAP value is calculated as follows:
$$\Phi_m = \sum_{L \subseteq N\setminus\{m\}} \frac{|L|!\,(M-|L|-1)!}{M!}\left[v(L\cup\{m\}) - v(L)\right]$$
where $\Phi_m$ is the contribution of the $m$th feature, $N$ is the set of all features, $L$ is a subset of $N\setminus\{m\}$ (a feature subset that does not contain $m$), $M$ is the total number of input features, $v(L\cup\{m\})$ is the predicted value of the model when the sample has only the feature values in $L\cup\{m\}$ and $v(L)$ is the predicted value of the model when the sample has only the feature values in $L$. In line with the additive feature attribution approach, the linear function $g$ is defined as:
$$g(x) = \Phi_0 + \sum_{m=1}^{M}\Phi_m x_m$$
where $g(x)$ is the explained model prediction for sample $x$, $\Phi_0$ is the mean of the model prediction and $x_m$ is the $m$th feature of the sample.
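The Shapley formula above can be evaluated by brute force when the number of features is small. The sketch below enumerates every subset for a hypothetical three-feature value function; the feature names and the numbers in `toy_value` are invented for illustration and are unrelated to the trained model. In practice, SHAP implementations for tree models use much faster algorithms than this enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values by enumerating all subsets L of N without m."""
    M = len(features)
    phi = {}
    for m in features:
        others = [f for f in features if f != m]
        total = 0.0
        for size in range(M):  # |L| ranges from 0 to M - 1
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for subset in combinations(others, size):
                L = frozenset(subset)
                total += weight * (value(L | {m}) - value(L))
        phi[m] = total
    return phi

def toy_value(L):
    """Hypothetical model output when only the features in L are known."""
    v = 0.1                                   # baseline prediction
    if "poor_lookout" in L:
        v += 0.4
    if "unsafe_speed" in L:
        v += 0.2
    if "poor_lookout" in L and "unsafe_speed" in L:
        v += 0.2                              # interaction, shared equally
    return v

features = ["poor_lookout", "unsafe_speed", "strong_wind"]
phi = shapley_values(features, toy_value)
baseline = toy_value(frozenset())
full = toy_value(frozenset(features))
```

The contributions sum to the difference between the full prediction and the baseline, which is the additivity property expressed by the linear function g.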

4. Results and Analysis

The process of constructing the accident classification prediction model for maritime accidents included feature variable selection, parameter tuning and model evaluation, as shown in Figure 6.

4.1. Feature Variable Selection and Parameter Tuning

The 47 feature variables shown in Table 1 were used as the data collation criteria for each maritime accident. For each accident, every feature was marked as “1” (present) or “0” (absent), producing 105 samples for training the XGBoost model. The target variable of the XGBoost model comprised six types of maritime accidents, namely collision accidents, grounding accidents, allision accidents, fire and explosion accidents, capsizing accidents and other accidents. The XGBoost algorithm has good tolerance of multicollinearity among the features, and therefore did not require extensive preprocessing. The dataset was randomly divided into a training set and a test set with a ratio of 8:2, with the training set used to fit the prediction model and the test set used to evaluate the model performance.
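A minimal sketch of the encoding and the 8:2 split described above is given below, using only the standard library. The three feature names, the seed and the helper functions are assumptions for illustration; the study does not state which tooling or random seed was used.

```python
import random

def encode_accident(present_factors, feature_names):
    """Mark each feature as 1 if it was reported for the accident, else 0."""
    return [1 if name in present_factors else 0 for name in feature_names]

def train_test_split_indices(n_samples, test_ratio=0.2, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]

# Hypothetical subset of the 47 features in Table 1.
feature_names = ["poor_lookout", "improper_operation", "strong_wind"]
row = encode_accident({"poor_lookout", "strong_wind"}, feature_names)

# 105 samples split 8:2 gives 84 training and 21 test samples.
train_idx, test_idx = train_test_split_indices(105)
```

With only 105 samples and some classes containing as few as two accidents, a purely random split can leave a class absent from the test set; a stratified split is a common alternative, although the text does not say whether one was used.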
The purpose of parameter tuning is to effectively control the model’s complexity and prevent overfitting, thus improving the model’s performance. The optimal prediction model was constructed using grid search and a five-fold cross-validation method. The five-fold cross-validation randomly divided the training data into five parts, and each training round was conducted in the form of four subsamples for training and one subsample for validation. After several tests of parameter tuning to determine the optimal results, the optimal parameters of XGBoost were determined, as shown in Table 2.
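The grid search with five-fold cross-validation can be sketched as follows. The parameter grid values and the `dummy_fit_and_score` function are hypothetical placeholders: a real run would train an XGBoost model on the training indices and score it on the validation indices, and the grid actually searched in this study is not given in the text.

```python
import random
from itertools import product

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def grid_search(param_grid, n_samples, fit_and_score, k=5):
    """Return the parameter combination with the best mean validation score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, tr, va)
                  for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def dummy_fit_and_score(params, train_idx, val_idx):
    """Placeholder score that peaks at max_depth=4, learning_rate=0.1."""
    return (1.0 - 0.05 * abs(params["max_depth"] - 4)
            - abs(params["learning_rate"] - 0.1))

# Hypothetical grid; 84 is the training set size under an 8:2 split of 105.
param_grid = {"max_depth": [2, 4, 6], "learning_rate": [0.05, 0.1, 0.3]}
best_params, best_score = grid_search(param_grid, 84, dummy_fit_and_score)
```

Each parameter combination is scored on five validation folds and the mean is compared, so the test set stays untouched until the final evaluation.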

4.2. Model Evaluation

To evaluate the performance of the prediction model, a confusion matrix was selected to visualize the model prediction results, and the accuracy, precision and recall were calculated as the evaluation indices of the prediction results. For multi-category prediction, the weighted average of each evaluation index in each category was calculated according to the proportion of the dataset occupied by each category. The evaluation metrics are defined as follows:
$$Acc = \frac{TP + TN}{TP + TN + FN + FP}$$
$$Pre = \frac{TP}{TP + FP}$$
$$Rec = \frac{TP}{TP + FN}$$
where Acc is the accuracy rate, Pre is the precision rate, Rec is the recall rate, TN denotes the number of samples that are actually negative and are predicted to be negative, TP denotes the number of samples that are actually positive and are predicted to be positive, FN denotes the number of samples that are actually positive but are predicted to be negative and FP denotes the number of samples that are actually negative but are predicted to be positive.
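The weighted multi-class metrics can be computed from a confusion matrix as sketched below. The 3-class matrix is hypothetical and is not the matrix in Figure 7. Overall accuracy is taken here as the fraction of correct predictions, which is one common multi-class reading of Acc; whether the study averaged one-vs-rest accuracies instead is not stated.

```python
def weighted_metrics(cm):
    """Accuracy and support-weighted precision/recall.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n_classes)) / total
    precision = 0.0
    recall = 0.0
    for k in range(n_classes):
        tp = cm[k][k]
        support = sum(cm[k])                                  # TP + FN
        predicted = sum(cm[i][k] for i in range(n_classes))   # TP + FP
        weight = support / total          # share of the dataset in class k
        if predicted:
            precision += weight * tp / predicted
        if support:
            recall += weight * tp / support
    return accuracy, precision, recall

# Hypothetical confusion matrix for 21 test samples and three classes.
cm = [
    [12, 1, 0],
    [0, 4, 0],
    [1, 0, 3],
]
acc, pre, rec = weighted_metrics(cm)
```

When the class weights are the class shares of the dataset, the weighted recall reduces algebraically to the overall accuracy, so identical accuracy and recall figures are expected under this weighting.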

4.3. Analysis of the Model Results

4.3.1. Analysis of the XGBoost Classification Prediction Results

  • Evaluation of the Prediction Accuracy Results
The risk prediction confusion matrix of the XGBoost model for six types of maritime accidents is shown in Figure 7. In addition, this study applied a random forest algorithm (RF) and a logistic regression algorithm (LR) for comparison with XGBoost; the results are shown in Table 3. The results show that the maritime accident classification prediction model based on the XGBoost algorithm has higher prediction accuracy than RF and LR. A likely explanation is that RF is less suited to training on a small sample, whereas LR is sensitive to multicollinearity among the features.
  • Analysis of Feature Importance
The importance of the features can be evaluated by the weight index in the XGBoost algorithm package, which counts how often a feature is used to split the data across all trees, and a feature importance ranking was obtained, as shown in Figure 8. The more often a feature is used for splitting in the decision trees, the higher its feature importance score is. In Figure 8 (showing the top 23 important features), it can be seen that the five most important features, from most to least important, are ‘poor lookout’, ‘nautical chart not marked’, ‘crew in poor condition’, ‘unreasonable loading’ and ‘poor psychological quality’.

4.3.2. SHAP Interpretability Analysis

The feature importance of the XGBoost classification prediction model describes how heavily each feature is used in constructing the model and is always non-negative. Unlike feature importance in the XGBoost model, the SHAP interpretable framework analyzes the contribution of each feature to the model output, which can be negative or positive. The contribution of each accident feature to each accident category can be ranked quantitatively by the mean absolute SHAP value (|SHAP|), and the direction of each feature’s contribution to the category can also be visualized by plotting the SHAP values.
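The |SHAP| ranking can be sketched as the mean of the absolute SHAP values of each feature over all samples for one accident category. The small matrix below is hypothetical; in practice the values would come from a SHAP explainer applied to the trained model.

```python
def mean_abs_shap(shap_matrix, feature_names):
    """Rank features by mean |SHAP| over samples (rows) for one class."""
    n_samples = len(shap_matrix)
    scores = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical SHAP values: 3 samples x 3 features for a single accident type.
feature_names = ["poor_lookout", "strong_wind", "unreasonable_loading"]
shap_matrix = [
    [1.2, -0.1, 0.0],
    [-0.8, 0.3, 0.1],
    [1.0, -0.2, 0.2],
]
ranking = mean_abs_shap(shap_matrix, feature_names)
```

Taking the absolute value before averaging prevents positive and negative contributions from cancelling, which is why this ranking measures the strength of a feature's influence but not its direction.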
  • |SHAP| Value Analysis of the Features
Figure 9 (top 20 features) shows the mean absolute SHAP value of each feature within the model output, where 0 indicates collision accidents, 1 indicates grounding accidents, 2 indicates allision accidents, 3 indicates fire and explosion accidents, 4 indicates capsizing accidents and 5 indicates other accidents. The different colors indicate the degree of influence of each factor on the different types of maritime accident. For example, in the first row of the figure, ‘poor lookout’ has a |SHAP| value of 1.37 for ‘collision accidents’, the largest |SHAP| value among the accident types, indicating that poor lookout influences the predicted probability of collision accidents more than that of any other accident type. Among these 20 significant features, 9 are related to personnel factors, 3 are related to the ship itself, 3 are related to the management of the ship’s company and 5 are related to environmental features. Of the 20 factors, crew features (poor lookout, misoperation and negligence of good sailing practice), environmental features (weather conditions, navigational environment) and ship features (equipment, maintenance and cargo) are the most important features affecting the probability of maritime accidents.
The occurrence of maritime accidents, apart from force majeure causes, is inseparable from the crew. The importance of the human factor in maritime accidents was recognized more than 30 years ago. In 1993, the U.S. Coast Guard (USCG; Washington, DC, USA) reported that approximately 80% of maritime accidents and near misses were essentially caused by human factors, and it has since been widely recognized that human factors play an important role in a significant number of accidents or disasters by triggering a chain of events [35]. However, it cannot be overlooked that the human factor in maritime accidents is usually combined with other external factors (e.g., marine conditions, weather conditions, channel traffic and vessel conditions) to influence safety procedures during navigation [36]. After all, if a ship is sailing on a calm sea without any other factors intervening, operational errors or poor lookout alone are unlikely to cause a maritime accident directly, which is consistent with the subsequent analysis of the features’ SHAP values in this study. Inadequate training by shipping companies makes crew members more likely to be killed or injured when they fall overboard or fall into cargo storage, as they may not have developed adequate muscle memory [37]. The third item in Figure 8 shows that the more unreasonably the ship is loaded, the more likely a capsizing accident becomes. This may be due to the characteristics of the cargo itself. During a voyage, the ship is exposed to surging currents and eddy currents, and if some goods have been improperly lashed and therefore shift under the influence of strong wind and waves, the ship will list.
During the transportation of some goods, the cargo hold of the ship is not sealed, and the water in the cargo hold is fluid, resulting in the displacement of the free liquid level and the tilt of the hull, ultimately leading to the ship capsizing [38]. Consistent with the understanding of all maritime stakeholders, despite the vastness of the ocean, the speed at which a ship is moving has a significant impact on the occurrence of accidents. When a ship is at sea, the use of safe speed is an important factor to avoid accidents, which all maritime stakeholders should bear in mind and cannot ignore when at sea [39].
  • SHAP Value Analysis of the Features
To further investigate the causal mechanism of maritime accident risks, interpretable machine learning was used to perform a causal analysis of the prediction results. The SHAP algorithm provides local interpretability and has significant advantages for explaining the impact of single or dual features on maritime accidents. In this study, collisions (accounting for 60.95% of all accidents), allision accidents (accounting for 8.57% of all accidents) and capsizing accidents (accounting for 17.14% of all accidents) were selected for analysis.
Figure 10, Figure 11 and Figure 12 show the SHAP summary plots, in which the SHAP value on the horizontal axis measures the degree of contribution and the influence of the features on the model prediction. A positive SHAP value indicates that the feature increases the possibility of maritime accidents and worsens maritime safety; a negative SHAP value indicates that the feature reduces the possibility of maritime accidents and helps maintain maritime safety. The color bar on the right side of each plot indicates the magnitude of the feature value, and each row shows the SHAP values of one feature’s contribution to the model output. Each point represents the value of one feature in one data sample.
(1) Ship Collision Accidents
A comprehensive evaluation of the influence of each factor on ship collision accidents is shown in Figure 10. The larger the feature values of ‘poor lookout’, ‘misjudgment’ and ‘failure to use safe speed’ (red), the more positive the corresponding SHAP values, indicating that these factors increase the possibility of a collision. These three factors are used below to analyze the specific causes of collision accidents.
(i) On land, what the eyes can see is relatively comprehensive; at sea, by contrast, there are many uncertain factors. Before an accident, the officer on duty may fail to make full use of all available means appropriate to the prevailing circumstances and conditions, such as radar and AIS, to maintain an uninterrupted, systematic lookout. Such irregular operation prevents the captain from learning the situation of the surrounding waters in time, which leads to confusion and a high risk of collision in encounters with other vessels.
(ii) When sailing at sea, the ability to improvise is invaluable to an experienced captain. It is human nature to panic when encountering unexpected events, and learning to avoid errors of judgment is an essential part of a captain’s development. Zhejiang’s navigation area, one of the world’s busiest sea areas, not only has a large number of merchant ships but, owing to its strong fishing industry, also has tens of thousands of fishing boats. For economic reasons, many small unlicensed fishing boats choose to take risks during the closed fishing season, which creates a considerable collision risk for ships navigating normally.
(iii) As with traffic accidents on land, proceeding at a safe speed when a ship is sailing at sea is an important factor in avoiding accidents. As mentioned earlier, the density of ships in the Zhejiang navigation area is very high. With such density, even if the crew members do their best, maritime accidents are sometimes inevitable. Keeping the ship’s equipment and facilities in good condition also allows the captain to navigate more capably; otherwise, in an emergency, the captain may be left powerless.
(2) Allision Accidents
Figure 11 shows a comprehensive evaluation of the influence of the factors on ship allision accidents. Among the most important are ‘misjudgment’, ‘insufficient crew’ and ‘failure to use safe speed’. The greater the value of these factors (red), the more positive the corresponding SHAP value, indicating that these factors increase the possibility of an allision accident. In the following, the three factors ‘misjudgment’, ‘insufficient crew’ and ‘failure to use safe speed’ are used to analyze the specific causes of allision accidents.
(i) All the allision accidents in this dataset are the responsibility of only one ship, and can almost be called unilateral accidents. In some of these accidents, the captain did not arrange a safe watch during berthing, and the ship was affected by ocean currents, causing the cable to break and the ship to go out of control. In some cases, the captain did not check the actual height of the cargo and blindly steered the ship under a bridge. Others involved temporary water traffic deregulation, ships being out of the anchor’s range, routes not being fixed as early as possible, a lack of indications on the chart, or the captain not fully considering the water traffic control or the ship traffic flow density and blindly choosing a route through congested areas. With such misjudgments, maritime accidents are to be expected.
(ii) Poor visibility is also an important factor in allision accidents. Ships sailing at night are easily affected by impaired night vision, especially in a complex sailing environment with background light from the shore, which makes it very easy to strike a dock or bridge. If the ship encounters poor visibility during navigation and the captain fails to take measures such as sounding fog signals, proceeding at a safe speed and arranging regular lookouts according to the visibility conditions at the time, the probability of collision and allision accidents at sea will increase significantly.
(iii)
The meteorological and hydrological conditions of the quay also affect allision accidents. In some cases, shore-based support was insufficient, yet some captains were blindly confident that their ships could be berthed, ignoring the lack of berthing space. Some large-tonnage ships attempted to berth at small quays. All these phenomena occurred frequently in Zhejiang’s waters, and all of these risk factors may lead to maritime accidents.
(3)
Capsizing Accidents
A comprehensive evaluation of the effects of the factors on capsizing accidents is shown in Figure 12. Among the factors, ‘unreasonable loading mode’, ‘improper operation’, ‘strong wind’ and ‘high tidal current’ were the most important. The higher the characteristic’s value (red), the higher the corresponding SHAP value, indicating that these factors increase the likelihood of capsizing accidents. In the following, the SHAP values of ‘unreasonable loading mode’, ‘improper operation’, ‘strong wind’ and ‘high tidal current’ are used to analyze the specific causes of capsizing accidents.
(i)
When a capsizing accident occurs, an unreasonable loading method is arguably the primary contributing factor. The more unreasonably a vessel is loaded with cargo, the more likely it is to capsize. This is related not only to the properties of the cargo itself, but also to captains carrying out illegal operations in an excessive pursuit of saving time, effort and cost. For example, after loading had been completed on one ship, the bulky cargo was higher than the circumference of the ship’s cargo hold hatch, and the cargo hold was not covered with a hatch cover; instead, the cargo was covered with canvas, which was then tied down and fixed. No calculation of the lashing force or stability strength was made for the ship’s loading condition. When the ship encountered strong wind or a fast ocean current, the cargo was jolted and displaced, the ship became unbalanced and tilted, and water entered the cargo hold. As the ship’s roll increased, more water entered the cargo hold, and the ship sank rapidly in the strong wind and waves.
(ii)
In addition, in some cases the captain did not fully consider the ship’s maneuvering ability and its ability to resist wind and waves, given the ship’s high speed and the harsh meteorological conditions in the waters where the accident occurred. Because the danger at that time was underestimated, the ship ventured close to the waters around islands and reefs, the bow wave flooded the hull, and the ship then lost stability and capsized [40].
(iii)
In fact, maritime operations are a comprehensive behavioral performance influenced by multiple factors. For example, contrary to the general perception, the SHAP values show that a rapid tide does not necessarily bring an increased risk of maritime accidents. It may be that crews are more alert and pay more attention to the safety of navigation during a rapid tide. Focusing on a single characteristic alone cannot fully explain what determines the orderliness of maritime navigation, so it is necessary to further investigate the combined effects of multiple factors.
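To make the attribution idea behind the SHAP values discussed in this section concrete, the following is a minimal sketch that computes exact Shapley values by enumerating feature coalitions. It is an illustration only and not the authors' code: `toy_risk` is a hypothetical risk score with invented coefficients, the all-zero `baseline` is an assumption, and the `shap` library uses a much faster tree-specific algorithm for XGBoost models rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score; all coefficients are invented for illustration."""
    misjudgment, insufficient_crew, unsafe_speed = z
    return (0.1
            + 0.3 * misjudgment
            + 0.2 * insufficient_crew
            + 0.1 * unsafe_speed
            + 0.2 * misjudgment * unsafe_speed)  # interaction term


x = [1, 0, 1]          # misjudgment and unsafe speed present, crew sufficient
baseline = [0, 0, 0]   # assumed reference case: no risk factor present
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

In this toy case the contributions sum to the difference between the score at `x` and at the baseline, and the interaction term is split equally between the two factors involved, which is why a single feature's importance cannot be read in isolation.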

5. Conclusions

This study used the machine learning method XGBoost to analyze and investigate the factors that led to the occurrence of different types of maritime accidents. XGBoost used its data mining performance and optimization algorithms to help this analysis achieve a higher prediction accuracy. The maritime accidents investigated in this study were classified into six categories: collision accidents, grounding accidents, allision accidents, fire accidents, capsizing accidents and other accidents.
The overall prediction accuracy of the model was evaluated. XGBoost achieved higher accuracy than the traditional RF and LR models (97.14% versus 94.28% and 91.42%; Table 3), which indicates that it is a better-suited algorithm for modeling the probability of different types of maritime accidents occurring.
The second highlight of this study is the use of SHAP to overcome the limited interpretability of XGBoost. SHAP was used to interpret and visualize the XGBoost results. Rather than analyzing only the importance of a single feature, the SHAP interpretation in the results section analyzed how the factors characterize the different types of maritime accidents. Using SHAP to explain the correlation between maritime accident characteristics and maritime accidents can provide scientific guidance for maritime administrative departments and shipping companies regarding maritime accident prevention, control or management.
The most important factors in ship collisions are ‘poor lookout’, ‘misjudgment’ and ‘failure to use safe speed’. It is suggested that the on-duty officer make full use of radar, AIS and other effective means suitable for the environment and conditions at that time to maintain continuous and uninterrupted systematic observation, and listen carefully to the VHF radiotelephone. In dense navigable areas and when entering or leaving port, information should be exchanged over VHF in good time, and the ship’s own dynamics, intentions and suggestions should be clearly stated, so that the other ship can understand its maneuvering intentions and the actions of the two ships can be coordinated. Proceeding at a safe speed in particular gives more room to maneuver and helps to avoid mistakes in judgment. Captains must not disregard the rules and regulations in order to pursue economic benefits or to save time and labor during operations. Captains should, as far as possible, remain in the wheelhouse, communicate with other ships in advance, agree on avoidance intentions and avoid taking uncoordinated avoidance actions in close proximity.
The master should obey the management directions of the VTS and conduct navigation and production operations in accordance with the prescribed requirements. The captain should attend to guiding all crew members during the voyage and heed the safety advice of the pilot. The crew and the ship management company must strengthen the implementation of personnel training measures to improve professional skills. The maritime administrative departments should intensify the publicity of water safety laws and regulations; resolutely stop behaviors such as operating boats without a license, overloaded transportation and risky navigation in bad weather or sea conditions; urge ship management companies to take the main responsibility for safety; strengthen enforcement inspections of coastal waters; and increase the intensity of administrative law enforcement.
Although XGBoost and SHAP have several advantages, this study also has the following limitations:
(i)
There are few maritime accident data samples. The data adopted in this study came only from the maritime data of Zhejiang Province, China, so the relationship between maritime accidents and their characteristics analyzed here applies to the maritime situation of Zhejiang Province. The relationship between maritime accidents and accident risk characteristics can be further analyzed by obtaining more data from ports in other coastal provinces. At the same time, the different possible causes of maritime accidents in different regions could be analyzed, and more targeted preventive measures could be proposed.
(ii)
The summary of risk characteristics of maritime accidents in Table 1 is based on accident reports and work experience and lacks specific methodology and theoretical support.
In light of the above, further research can be conducted:
(i)
The interpretability analysis model of SHAP adopted in this study can visualize and quantify the risk characteristics of accidents, but it belongs to the class of post hoc analysis. We are committed to developing a methodological model that provides real-time risk quantification during maritime operations.
(ii)
According to relevant studies, human factors are highly likely to be a leading cause of maritime accidents. In future research, we will draw on the theory of human factor reliability analysis to analyze human-factor-related maritime accidents.

Author Contributions

Writing—original draft preparation, C.Z.; supervision, X.Z.; writing—review and editing, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Fund of China, grant number “18BFX175”.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. IMO. Statistics. Available online: https://www.imo.org/en/OurWork/IIIS/Pages/Statistics.aspx (accessed on 2 July 2022).
  2. INTERCARGO. Bulk Carrier Casualty Report 2012–2021. Available online: https://www.intercargo.org/bulk-carrier-casualty-report-2012-2021/.pdf (accessed on 2 July 2022).
  3. EMSA. Annual Overview of Marine Casualties and Incidents 2021; EMSA: Griffin, Germany, 2022.
  4. Chauvin, C.; Lardjane, S.; Morel, G.; Clostermann, J.-P.; Langard, B. Human and organisational factors in maritime accidents: Analysis of collisions at sea using the HFACS. Accid. Anal. Prev. 2013, 59, 26–37.
  5. Zhang, J.; Teixeira, Â.P.; Soares, C.G.; Yan, X. Quantitative assessment of collision risk influence factors in the Tianjin port. Saf. Sci. 2018, 110, 363–371.
  6. Heij, C.; Knapp, S. Effects of wind strength and wave height on ship incident risk: Regional trends and seasonality. Transp. Res. Part D Transp. Environ. 2015, 37, 29–39.
  7. Ståhlberg, K.; Goerlandt, F.; Ehlers, S.; Kujala, P. Impact scenario models for probabilistic risk-based design for ship-ship collision. Mar. Struct. 2013, 33, 238–264.
  8. Deng, J.; Liu, S.; Xie, C.; Liu, K. Risk Coupling Characteristics of Maritime Accidents in Chinese Inland and Coastal Waters Based on NK Model. J. Mar. Sci. Eng. 2021, 10, 4.
  9. Erol, S.; Demir, M.; Çetişli, B.; Eyüboğlu, E. Analysis of ship accidents in the Istanbul Strait using neuro-fuzzy and genetically optimised fuzzy classifiers. J. Navig. 2018, 71, 419–436.
  10. Xue, J.; Papadimitriou, E.; Reniers, G.; Wu, C.; Jiang, D.; van Gelder, P. A comprehensive statistical investigation framework for characteristics and causes analysis of ship accidents: A case study in the fluctuating backwater area of Three Gorges Reservoir region. Ocean Eng. 2021, 229, 108981.
  11. Faghih-Roohi, S.; Xie, M.; Ng, K.M. Accident risk assessment in marine transportation via Markov modelling and Markov Chain Monte Carlo simulation. Ocean Eng. 2014, 91, 363–370.
  12. Roberts, S.E.; Pettit, S.J.; Marlow, P.B. Casualties and loss of life in bulk carriers from 1980 to 2010. Mar. Policy 2013, 42, 223–235.
  13. Li, G.; Weng, J.; Hou, Z. Impact analysis of external factors on human errors using the ARBN method based on small-sample ship collision records. Ocean Eng. 2021, 236, 109533.
  14. Wu, B.; Zhao, C.; Yip, T.L.; Jiang, D. A novel emergency decision-making model for collision accidents in the Yangtze River. Ocean Eng. 2021, 223, 108622.
  15. Fan, L.; Zheng, L.; Luo, M. Effectiveness of port state control inspection using Bayesian network modelling. Marit. Policy Manag. 2022, 49, 261–278.
  16. Maceiras, C.; Pérez-Canosa, J.; Vergara, D.; Orosa, J. A Detailed Identification of Classificatory Variables in Ship Accidents: A Spanish Case Study. J. Mar. Sci. Eng. 2021, 9, 192.
  17. Jiang, M.; Lu, J.; Yang, Z.; Li, J. Risk analysis of maritime accidents along the main route of the Maritime Silk Road: A Bayesian network approach. Marit. Policy Manag. 2020, 47, 815–832.
  18. Wu, B.; Tang, Y.; Yan, X.; Soares, C.G. Bayesian network modelling for safety management of electric vehicles transported in RoPax ships. Reliab. Eng. Syst. Saf. 2021, 209, 107466.
  19. Wynants, L.; Van Calster, B.; Collins, G.S.; Riley, R.D.; Heinze, G.; Schuit, E.; Bonten, M.M.J.; Dahly, D.L.; Damen, J.A.; Debray, T.P.A.; et al. Prediction models for diagnosis and prognosis of covid-19: Systematic review and critical appraisal. BMJ 2020, 369, m1328.
  20. Pacoureau, N.; Rigby, C.L.; Kyne, P.M.; Sherley, R.B.; Winker, H.; Carlson, J.K.; Fordham, S.V.; Barreto, R.; Fernando, D.; Francis, M.P.; et al. Half a century of global decline in oceanic sharks and rays. Nature 2021, 589, 567–571.
  21. Kou, G.; Chao, X.; Peng, Y.; Alsaadi, F.E.; Herrera-Viedma, E. Machine learning methods for systemic risk analysis in financial sectors. Technol. Econ. Dev. Econ. 2019, 25, 716–742.
  22. Kumar, N.; Poonia, V.; Gupta, B.; Goyal, M.K. A novel framework for risk assessment and resilience of critical infrastructure towards climate change. Technol. Forecast. Soc. Change 2021, 165, 120532.
  23. Berk, R.; Heidari, H.; Jabbari, S.; Kearns, M.; Roth, A. Fairness in criminal justice risk assessments: The state of the art. Sociol. Methods Res. 2021, 50, 3–44.
  24. Zhang, W.; Wu, C.; Zhong, H.; Li, Y.; Wang, L. Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization. Geosci. Front. 2021, 12, 469–477.
  25. Zhang, D.; Chen, H.-D.; Zulfiqar, H.; Yuan, S.-S.; Huang, Q.-L.; Zhang, Z.-Y.; Deng, K.-J. iBLP: An XGBoost-based predictor for identifying bioluminescent proteins. Comput. Math. Methods Med. 2021, 2021, 6664362.
  26. Zhao, D.; Wang, J.; Zhao, X.; Triantafilis, J. Clay content mapping and uncertainty estimation using weighted model averaging. Catena 2022, 209, 105791.
  27. Yuan, C.; Li, Y.; Huang, H.; Wang, S.; Sun, Z.; Wang, H. Application of explainable machine learning for real-time safety analysis toward a connected vehicle environment. Accid. Anal. Prev. 2022, 171, 106681.
  28. Qi, H.; Yao, Y.; Zhao, X.; Guo, J.; Zhang, Y.; Bi, C. Applying an interpretable machine learning framework to the traffic safety order analysis of expressway exits based on aggregate driving behavior data. Phys. A Stat. Mech. Its Appl. 2022, 597, 127277.
  29. Parsa, A.B.; Movahedi, A.; Taghipour, H.; Derrible, S.; Mohammadian, A.K. Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis. Accid. Anal. Prev. 2020, 136, 105405.
  30. Zhejiang Provincial Bureau of Statistics. Statistical Bulletin of National Economic and Social Development of Zhejiang Province in 2021; Zhejiang Provincial Bureau of Statistics: Hangzhou, China, 2022.
  31. Zhejiang Maritime Safety Administration. Statistics of Ships of Zhejiang Maritime Safety Administration 2016–2021; Zhejiang Maritime Safety Administration: Hangzhou, China, 2022.
  32. Zhejiang Maritime Safety Administration. Analysis Report on Water Safety Situation of Zhejiang Maritime Safety Administration in 2021 and the Fourth Quarter; Zhejiang Maritime Safety Administration: Hangzhou, China, 2022.
  33. Onyshchenko, S.; Shibaev, O.; Melnyk, O. Assessment of potential negative impact of the system of factors on the ship’s operational condition during transportation of oversized and heavy cargoes. Trans. Marit. Sci. 2021, 10, 126–134.
  34. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  35. Fan, S.; Blanco-Davis, E.; Yang, Z.; Zhang, J.; Yan, X. Incorporation of human factors into maritime accident analysis using a data-driven Bayesian network. Reliab. Eng. Syst. Saf. 2020, 203, 107070.
  36. Qiao, W.; Liu, Y.; Ma, X.; Liu, Y. A methodology to evaluate human factors contributed to maritime accident by mapping fuzzy FT into ANN based on HFACS. Ocean Eng. 2020, 197, 106892.
  37. Ahn, S.I.; Kurt, R.E.; Akyuz, E. Application of a SPAR-H based framework to assess human reliability during emergency response drill for man overboard on ships. Ocean Eng. 2022, 251, 111089.
  38. Lv, P.; Zhen, R.; Shao, Z. A Novel Method for Navigational Risk Assessment in Wind Farm Waters Based on the Fuzzy Inference System. Math. Probl. Eng. 2021, 2021, 4588333.
  39. Szlapczynski, R. Evolutionary sets of safe ship trajectories: A new approach to collision avoidance. J. Navig. 2011, 64, 169–181.
  40. Zhejiang Provincial Bureau of Statistics. Water Safety Accident. Available online: https://www.zj.msa.gov.cn/ZJ/zwgk/gkml/xzqz/index.html (accessed on 4 July 2022).
Figure 1. International Maritime Organization (IMO) Global Integrated Shipping Information System (GISIS) Statistics (from 1 January 2010 to 1 June 2019).
Figure 2. Map of maritime accident sites in Zhejiang, People’s Republic of China.
Figure 3. The proportions of different marine accidents.
Figure 4. Frequency of the features of marine accidents.
Figure 5. Technological pathway.
Figure 6. Model construction process.
Figure 7. Confusion matrix of the marine accident classification and prediction model.
Figure 8. Feature importance ranking.
Figure 9. SHAP values of the top 20 significant eigenvalues.
Figure 10. Scatter diagram of the comprehensive influence of the factors on ship collisions.
Figure 11. Scatter diagram of the comprehensive influence of the factors on ship allision accidents.
Figure 12. Scatter diagram of the comprehensive influence of the factors on capsizing accidents.
Table 1. Features representation and classification of marine accidents.
| General Feature Classification | Second-Level Feature Classification | Symbol | Number | Elements |
| --- | --- | --- | --- | --- |
| Management characteristics | Company, ship | P1–P5 | 5 | Inadequate implementation of SMS, inadequate shore-based support, poor duty arrangement, personnel training and equipment maintenance |
| Environmental characteristics | Weather, navigation environment | O1–O10 | 10 | Poor visibility, strong wind, high tidal current, high ship density, narrow waterways, fishing areas, natural environment (e.g., intermediate obstacles and channel curvature), no shipping marks set or damaged, poor channel maintenance and no chart mark |
| Personnel characteristics | Competency, lack of lookout, operational errors, insufficient crew, fishing boats, failure to use safe speed, negligence of good sailing practice, lack of crew responsibility, hit and run | V1–V15 | 15 | Inadequate training and drills, poor psychological quality, poor physical quality, poor lookout, misjudgment, improper operation, insufficient crew, fishing boat crew grabbing the bow, fishing boat crew fatigue, failure to use safe speeds, negligence of good sailing practice, failure to equip or correct charts, crew in poor condition (drunk and/or fatigued), failure to stand by the engine and anchor, hitting and running |
| Ship characteristics | Equipment, maintenance, goods, fishing boats | T1–T9 | 9 | Incomplete equipment, unreasonable design, equipment damage, loss of function, unseaworthiness, unreasonable loading mode, poor ship condition, fishing boat operation mode, fishing boat sailing without a license |
| Pilotage characteristics | Inaccurate pilotage scheme, improper pilot operation, communication and cooperation between pilot and pilot, physical and mental state of pilot | R1–R4 | 4 | Unsuitable pilotage plan, improper pilot operation, communication and cooperation between pilot and pilot, physical and mental state of pilot |
| Wharf characteristics | Improper berthing arrangements and over-standard berthing | S1–S4 | 4 | Lack of berthing space, poor navigation of surrounding ships, interference of frontier meteorological and hydrological conditions, and over-standard berthing |
Table 2. Parameter configuration table.
| Parameters | Set Value | Parameters | Set Value |
| --- | --- | --- | --- |
| max_depth | 5 | n_estimators | 100 |
| min_child_weight | 1 | scale_pos_weight | 1 |
| gamma | 0.1 | alpha | 0.1 |
| subsample | 0.8 | lambda | 1 |
| colsample_bytree | 0.8 | learning_rate | 0.1 |
| booster (general parameters) | gbtree | num_class (task parameters) | 2 |
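For readers who wish to set up a comparable configuration, the settings in Table 2 can be written as a parameter dictionary. This is a sketch under assumptions: it targets the scikit-learn style wrapper `xgboost.XGBClassifier`, where the `alpha` and `lambda` regularisation terms are exposed as `reg_alpha` and `reg_lambda`; the paper does not state which interface was used. `TABLE2_PARAMS` and `build_classifier` are hypothetical names introduced here, and the class-count entry is left out because this wrapper infers the number of classes from the training labels.

```python
# Table 2 settings expressed for the scikit-learn style XGBoost wrapper (assumed).
TABLE2_PARAMS = {
    "booster": "gbtree",       # general parameter
    "max_depth": 5,
    "n_estimators": 100,
    "min_child_weight": 1,
    "scale_pos_weight": 1,
    "gamma": 0.1,              # minimum loss reduction required to split
    "reg_alpha": 0.1,          # "alpha" in Table 2 (L1 regularisation)
    "reg_lambda": 1,           # "lambda" in Table 2 (L2 regularisation)
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "learning_rate": 0.1,
}


def build_classifier(params=TABLE2_PARAMS):
    """Hypothetical helper: create an untrained classifier from the settings."""
    # Imported lazily so the dictionary can be inspected without xgboost installed.
    from xgboost import XGBClassifier
    return XGBClassifier(**params)
```

Calling `build_classifier().fit(X_train, y_train)` would then train the model, where `X_train` and `y_train` stand for a feature matrix and accident-type labels that are not part of this sketch.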
Table 3. Comparison of model prediction accuracy (%).
| Evaluation Indicators | XGBoost | RF | LR |
| --- | --- | --- | --- |
| Acc (accuracy) | 97.14 | 94.28 | 91.42 |
| Pre (precision) | 97.14 | 94.28 | 91.42 |
| Rec (recall) | 97.14 | 94.28 | 91.42 |
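Table 3 reports identical accuracy, precision and recall for each model. One possible explanation, which the paper does not confirm, is micro-averaging: in single-label multiclass classification every misclassified sample counts as exactly one false positive and one false negative, so micro-averaged precision and recall both equal accuracy. The sketch below illustrates this with an invented three-class confusion matrix; the counts are not the study's data, and `micro_metrics` is a hypothetical helper.

```python
def micro_metrics(confusion):
    """Return (accuracy, micro-precision, micro-recall).

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = sum(confusion[i][i] for i in range(k))
    # False positives: off-diagonal entries summed per predicted class (column).
    fp = sum(confusion[i][j] for j in range(k) for i in range(k) if i != j)
    # False negatives: off-diagonal entries summed per true class (row).
    fn = sum(confusion[i][j] for i in range(k) for j in range(k) if j != i)
    accuracy = tp / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Invented example: 35 samples, one of which is misclassified.
example = [
    [12, 0, 0],
    [1, 10, 0],
    [0, 0, 12],
]
acc, pre, rec = micro_metrics(example)
print(f"{acc:.4f} {pre:.4f} {rec:.4f}")
```

Macro-averaged precision and recall would generally differ from accuracy, so the averaging scheme should be stated whenever the three indicators are compared.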
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Zhang, C.; Zou, X.; Lin, C. Fusing XGBoost and SHAP Models for Maritime Accident Prediction and Causality Interpretability Analysis. J. Mar. Sci. Eng. 2022, 10, 1154. https://doi.org/10.3390/jmse10081154

