1. Introduction
The World Health Organization (WHO) declared in May 2023 that the COVID-19 pandemic no longer constituted a “public health emergency of international concern”. International trade is gradually returning to business as usual and the global supply chain is beginning to restart, prompting a gradual return to normalcy in the shipping industry.
As an indispensable part of the international shipping industry, bulk carriers mainly transport grain, coal, ore, and other dry bulk cargoes, and they are mainly divided into four categories according to their tonnage: Capesize, Panamax, Handymax, and Handysize. The dry bulk shipping market is complex and volatile and is susceptible to the influence of the commodity and fuel markets [1], and shipowners’ investment in ships is closely related to market, operational, corporate-strategy, and industry-cluster factors [2]. In recent years, with changes in the external economic and trade environment, it has become increasingly difficult for shipowners to make decisions [3].
Sulfur oxide emissions are second only to nitrogen oxide emissions and are also an environmental issue worthy of consideration [4]. Since 1 January 2020, the International Maritime Organization (IMO) has enforced the “sulfur cap”, which requires a global reduction in the maximum sulfur content of ship fuels to 0.5% for environmental and human-health reasons [5]; consequently, shipowners’ choice of a sulfur oxide emission abatement solution has become a new topic of discussion. In response, the shipping industry continually looks for more economical ways to reduce sulfur emissions, and different ship types and operating models tend to favor different options [6]. For newbuilding ships, the chosen scheme is generally not retrofitted in the short term, so the initial investment and the long-term operating costs must be considered together when choosing an emission abatement solution [7], which is more conducive to maintaining long-term competitiveness.
Currently, the three most popular approaches are using Very Low Sulfur Fuel Oil (VLSFO); continuing to use High Sulfur Fuel Oil (HSFO) while installing a sulfur oxide scrubber to clean the exhaust gas; and using alternative fuels. VLSFO refers to fuel oil with a sulfur content of no more than 0.5%. A scrubber, which removes sulfur oxides from the exhaust gas, requires a high initial investment at the construction stage. The initial investment for VLSFO is relatively low by comparison, making it more cost-effective in the short term, since scrubbers require a significant up-front capital outlay as well as regular maintenance and monitoring [8]. Karatuğ et al. concluded that although scrubbers effectively reduced SOx emissions, other types of emissions (including CO2) increased because of the use of HSFO and higher fuel consumption, which contradicts the IMO decarbonization strategy of reducing the carbon intensity of all ships by at least 40% by 2030 compared with the 2008 baseline [9]. The most typical alternative fuel is Liquefied Natural Gas (LNG), and ammonia is also gradually being developed. Even before the policy was implemented, LNG was seen as a promising alternative fuel that could effectively reduce pollution, but it also raised certain safety concerns [10], chiefly its susceptibility to leaks that can cause serious accidents [11], especially in coastal areas that lack complete bunkering infrastructure [12]. Although government subsidies are widely used [13], LNG prices may change abruptly at any time [14], which makes costs hard to predict and leaves many shipowners in a wait-and-see position or choosing the “LNG Ready” option, i.e., designing the ship for future conversion to LNG. It has also been suggested that improving energy efficiency will help maritime policymakers provide more reasonable regulations on ship energy consumption [15]. A survey of bulk carriers that entered service after 2020 and of new ship orders shows that the use of VLSFO and the installation of scrubbers are currently the two most common emission abatement solutions for bulk carriers. Therefore, this paper focuses on the choice between these two solutions.
The data in this article come from the Clarkson Shipping Intelligence Network, from which we manually collected data on about 680 newbuilding bulk carriers. Although the dataset is small, each sample is highly representative. The purpose of this paper is to identify, through data-driven analysis, the factors that influence the choice of sulfur oxide emission abatement solution for newbuilding bulk carriers; to build classification prediction models, including Logistic Regression (LR), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest (RF), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGBoost); to compare their prediction accuracy; and to select the best model to provide reference and suggestions for shipowners’ decisions on sulfur abatement solutions. At the theoretical level, our analysis suggests that the XGBoost model is better suited to this research problem.
The remainder of the paper is structured as follows.
Section 2 reviews the existing literature, summarizes existing research results, and highlights the contributions of this paper.
Section 3 details the data processing and machine learning methods used in this paper.
Section 4 describes the data used in this paper and the pre-processing procedure and provides a descriptive analysis of the data.
Section 5 presents an empirical analysis in which multiple machine learning models are used for prediction and their predictive performance is compared and improved through hyperparameter tuning. In addition, the importance of each influencing factor is analyzed and predictions are made for newbuilding bulk carriers.
Section 6 summarizes the full paper and provides an outlook for future research.
4. Data Description and Preprocessing
4.1. Data Sources
This paper uses data from the Clarkson Shipping Intelligence Network. The dataset includes 683 bulk carriers that entered service between January 2020 and October 2023, covering three types of bulk carriers: Capesize, Panamax, and Handymax. To avoid duplicate statistics, only one of the sister vessels ordered by the same company was taken as a sample. Handysize bulk carriers are not included in the research sample because they almost exclusively use VLSFO as their emission abatement solution.
Among active bulk carriers and new ship orders, ships using alternative fuels account for a very small proportion: only eight active bulk carriers used alternative fuels between 2020 and 2023, and, according to Clarkson’s new ship order data, only 38 ships on order use alternative fuels, accounting for just 4.26% of all new bulk carrier orders. The emission abatement solutions for bulk carriers are therefore dominated by VLSFO and scrubbers, and this paper focuses on the choice between these two.
A total of nine representative characteristics were collected in this paper, which are the Age of the Ship (Age), Deadweight Tonnage (DWT), Main Engine Power (Power), Annual Average Speed (Speed), Annual Distance Traveled (Distance), Price Difference between Low and High Sulfur Fuel (PD), Baltic Dry Index (BDI), Ship Docking Time in Port (DT), and the Proportion of Sailing Time in the Sulfur Emission Control Area (ECA).
The age of the ship is measured in years; because the collected ships were put into service from 2020 to 2023, its value lies between 1 and 4. The DWT reflects the ship’s carrying capacity in tonnes, and Power is measured in kW. The data suggest that there may be a strong correlation between DWT and Power.
Among the ship features, DWT, Age, and Power can be obtained directly. Speed is the ship’s average speed over a service year, in knots. Distance is the average distance traveled by the ship in a service year, in nautical miles; for 2023, which is only partially covered, the annual distance is derived from the distance traveled and the number of months sailed. The fuel price difference is the price difference between HSFO and Marine Gas Oil (MGO). Because VLSFO has only been available since 2020, MGO is used in place of VLSFO in this paper, and the fuel price is taken for the month in which the shipowner signed the shipbuilding contract. Singapore is the world’s largest bunker trading market, so the Singapore bunker price in that month is chosen as a feature variable. BDI is likewise taken as the value in the month when the shipbuilding contract was signed. In addition, this paper introduces the proportion of sailing time and the port docking time in ECAs, which include four regions (the Baltic Sea, the North Sea, North America, and the U.S. Caribbean) where the sulfur content of ship fuel must not exceed 0.1%. We also take into consideration the Hainan waters and the Yangtze River in China.
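The two derived quantities described above can be sketched as follows. This is a minimal illustration, not the authors’ code: it assumes a constant monthly sailing rate for a partially covered year and defines PD as the MGO price minus the HSFO price; the function names and that sign convention are assumptions.

```python
def annualized_distance(distance_nm: float, months_sailed: float) -> float:
    """Scale a partial-year distance (nautical miles) to a 12-month estimate.

    Assumption: the ship sails at a roughly constant monthly rate.
    """
    if months_sailed <= 0:
        raise ValueError("months_sailed must be positive")
    return distance_nm * 12.0 / months_sailed


def price_difference(mgo_price: float, hsfo_price: float) -> float:
    """Hypothetical PD definition: MGO price minus HSFO price in the contract month.

    A positive value means the low-sulfur fuel is the more expensive one.
    """
    return mgo_price - hsfo_price


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the dataset.
    print(annualized_distance(45000, 10))   # 10 months sailed in a partial year
    print(price_difference(900.0, 600.0))
```

Under these assumptions, a ship that logged 45,000 nm over 10 months would be assigned an annual Distance of 54,000 nm.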
4.2. Data Preprocessing
Data preprocessing is an essential step before data analysis. We mainly handle missing values and outliers and then standardize the features.
Regarding missing data, some ships lack important variables, such as the distance traveled or the date on which the shipbuilding contract was signed. Since only a small number of samples are affected and their removal has no major impact on model construction, ships with missing data are simply excluded.
Outliers occur mainly in the Speed and Distance features: values lying outside the mean plus or minus three standard deviations were treated as outliers, and the corresponding ships were deleted. We also checked whether the ships had been repaired or modified during service and corrected the sailing distances for such samples. Finally, we obtained 683 valid samples, which are detailed in Table 2.
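The three-standard-deviation rule above can be sketched with NumPy as follows. This is an illustrative sketch; it assumes the population standard deviation (ddof=0), which the source does not specify, and a ship is kept only if both its Speed and Distance pass the rule.

```python
import numpy as np


def three_sigma_mask(values: np.ndarray) -> np.ndarray:
    """True where a value lies within mean +/- 3 standard deviations."""
    mu = values.mean()
    sigma = values.std()  # population std (ddof=0); an assumption
    return np.abs(values - mu) <= 3 * sigma


def kept_ship_indices(speed: np.ndarray, distance: np.ndarray) -> np.ndarray:
    """Indices of ships whose Speed and Distance both pass the 3-sigma rule."""
    keep = three_sigma_mask(speed) & three_sigma_mask(distance)
    return np.flatnonzero(keep)


if __name__ == "__main__":
    speed = np.full(20, 12.0)
    speed[0] = 30.0                           # one implausible annual speed
    distance = np.linspace(50000, 70000, 20)  # synthetic annual distances
    print(kept_ship_indices(speed, distance))
```

With small samples a single extreme value inflates the standard deviation it is judged against, so the rule only flags points that are far from the bulk of the data.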
To reduce differences in the magnitude of the features, we standardized the data using the Z-score method, which is based on the mean and standard deviation of the original data:
$$x' = \frac{x - \mu}{\sigma}$$
where $x'$ is the standardized value, $x$ is the original value of the sample, $\mu$ is the mean of the corresponding feature, and $\sigma$ is its standard deviation.
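The Z-score formula applied column by column can be sketched in NumPy. This assumes the population standard deviation (ddof=0), which is also what scikit-learn’s `StandardScaler` uses.

```python
import numpy as np


def zscore(X: np.ndarray) -> np.ndarray:
    """Column-wise Z-score: x' = (x - mu) / sigma for each feature column."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)  # ddof=0
    return (X - mu) / sigma


if __name__ == "__main__":
    X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
    print(zscore(X))
```

In a train/test workflow, $\mu$ and $\sigma$ would usually be estimated on the training set and then reused for the test set, so that no test information leaks into the scaling.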
4.3. Descriptive Analysis
Table 3 reports the mean, median, and standard deviation of each feature for the two emission abatement solutions of newbuilding bulk carriers.
Figure 3 shows a box plot of each feature, which reveals its distribution in more detail. Together, the descriptive statistics and box plots help in understanding the data and yield preliminary conclusions that inform the subsequent empirical analysis.
According to Figure 3a, the proportion of bulk carriers using VLSFO has been increasing in recent years. Figure 3b,c shows that the larger the DWT and Power, the more likely the shipowner is to install a scrubber. Figure 3d,e shows that ships using scrubbers tend to sail faster and travel longer distances, which may be related to their fuel costs. Figure 3f,g indicates that the higher the PD and the BDI, the more likely the shipowner is to use a scrubber, although the effect is not pronounced. Figure 3h shows that the longer a ship sails in ECAs, the more likely the shipowner is to use VLSFO. According to Figure 3i, ships using VLSFO have a shorter DT; however, this result may require further validation. The same conclusions can be drawn from the descriptive statistics in Table 3.
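A per-solution summary like the one in Table 3 can be sketched with NumPy. This is an illustration rather than the authors’ procedure; it assumes the sample standard deviation (ddof=1), and the tiny arrays in the example are made up.

```python
import numpy as np


def group_stats(X, labels, feature_names):
    """Mean, median and sample std (ddof=1) of each feature, per abatement solution."""
    out = {}
    for label in np.unique(labels):
        rows = X[labels == label]
        out[label] = {
            name: (rows[:, j].mean(), np.median(rows[:, j]), rows[:, j].std(ddof=1))
            for j, name in enumerate(feature_names)
        }
    return out


if __name__ == "__main__":
    X = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0], [7.0, 70.0]])
    labels = np.array(["VLSFO", "VLSFO", "Scrubber", "Scrubber"])
    for group, stats in group_stats(X, labels, ["DWT", "Distance"]).items():
        print(group, stats)
```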
Finally, we calculated the correlation coefficients among the features; Figure 4 shows the correlation matrices for the ships using VLSFO and for those using scrubbers. Figure 4b shows a highly positive correlation between DWT and Power among the ships using scrubbers, and ships with a larger tonnage also traveled relatively longer distances, so multicollinearity must be considered in the subsequent modeling. More generally, DWT is correlated with Power, Distance, and PD.
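The group-wise correlation analysis can be sketched as follows: one Pearson correlation matrix per solution group, plus a helper that lists strongly correlated pairs. The 0.8 cut-off and the toy data are hypothetical choices for illustration.

```python
import numpy as np


def group_correlations(X, labels):
    """Pearson correlation matrix of the features within each solution group."""
    return {label: np.corrcoef(X[labels == label], rowvar=False)
            for label in np.unique(labels)}


def strong_pairs(C, names, threshold=0.8):
    """Feature pairs whose |r| reaches the (hypothetical) threshold."""
    return [(names[i], names[j], C[i, j])
            for i in range(len(names))
            for j in range(i + 1, len(names))
            if abs(C[i, j]) >= threshold]


if __name__ == "__main__":
    # Toy data: column 1 ("Power") is exactly twice column 0 ("DWT").
    X = np.array([[1., 2., 5.], [2., 4., 3.], [3., 6., 4.],
                  [4., 8., 1.], [5., 10., 7.], [6., 12., 2.]])
    labels = np.array(["Scrubber"] * 3 + ["VLSFO"] * 3)
    C = group_correlations(X, labels)
    print(strong_pairs(C["Scrubber"], ["DWT", "Power", "PD"]))
```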
5. Results and Analysis
In this section, we analyze the predictions of the models, and Table 4 shows the list of acronyms for the features and evaluation metrics involved.
5.1. Model Predictions
After comparing models built iteratively on different feature subsets, we finally selected seven features as the input variables for the prediction model, namely Age, DWT, Power, Speed, Distance, PD, and BDI. We excluded ECA and DT because including them did not significantly improve the predictions and could even reduce prediction accuracy, consistent with the findings of Zis et al. [28]. In addition, as noted in Section 4, DWT and Power are strongly correlated, so Power was excluded from LR to avoid multicollinearity. For the other machine learning models, multicollinearity mainly affects the interpretation of individual features rather than predictive accuracy, so Power was retained.
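One common way to quantify the multicollinearity that motivated dropping Power from LR is the variance inflation factor (VIF). The source does not say that VIF was used; the sketch below, with synthetic data and the usual rule of thumb that VIF above 10 signals strong collinearity, is purely illustrative.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF of each column: 1 / (1 - R_j^2), where R_j^2 comes from an OLS
    regression (with intercept) of column j on all other columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    dwt = rng.uniform(30000, 200000, n)          # synthetic, not ship data
    power = 0.08 * dwt + rng.normal(0, 200, n)   # nearly proportional to DWT
    speed = rng.normal(12, 1, n)                 # unrelated feature
    print(vif(np.column_stack([dwt, power, speed])))
```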
After standardizing the samples, we compared three ways of handling the class imbalance (using the raw data, dividing the dataset by stratified sampling, and synthesizing samples with Borderline SMOTE), building each model with default hyperparameters. The results in Table 5 show that synthesizing data with Borderline SMOTE is less effective. Therefore, the subsequent hyperparameter tuning is based on the two remaining approaches: using the raw data and dividing the dataset by stratified sampling.
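Dividing the dataset by stratified sampling can be sketched with scikit-learn’s `train_test_split` and its `stratify` argument, which keeps the VLSFO/scrubber ratio the same in the training and test sets. The 30% test share and the random seed are assumptions; the source does not state them.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_split(X, y, test_size=0.3, seed=42):
    """Hold out a test set whose class ratio matches the full dataset.

    test_size and seed are illustrative assumptions.
    """
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)


if __name__ == "__main__":
    y = np.array([0] * 70 + [1] * 30)   # 0 = VLSFO, 1 = scrubber (toy labels)
    X = np.arange(100).reshape(-1, 1)
    X_tr, X_te, y_tr, y_te = stratified_split(X, y)
    print(y_tr.mean(), y_te.mean())
```

If an oversampler such as imbalanced-learn’s `BorderlineSMOTE` were used instead, it would normally be applied to the training portion only, so that synthetic samples never reach the test set.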
The experimental comparison shows that the highest ACC, 81.04%, is obtained with the XGBoost model without data balancing, while the SVM model performs best in terms of the class-wise error rate. We also found that the error rate on the VLSFO samples is lower than that on the scrubber samples. This may be because some ships with scrubbers have features very similar to those of ships using VLSFO; future work should therefore introduce more discriminative features. Overall, ensemble learning algorithms outperform single classifiers because they combine the predictions of multiple base learners, which reduces the risk of overfitting of any individual learner and improves generalization. We therefore focused on further tuning the hyperparameters of the three ensemble models (RF, AdaBoost, and XGBoost).
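The two metrics compared above can be sketched in plain Python. Here the error rate of a class is taken to be the share of that class’s samples that are misclassified; this reading of the source’s per-sample-type error rate is an assumption.

```python
def acc_and_class_error(y_true, y_pred, classes=("VLSFO", "Scrubber")):
    """Overall accuracy and the misclassification rate within each true class."""
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    errors = {}
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        errors[c] = sum(y_pred[i] != c for i in idx) / len(idx)
    return acc, errors


if __name__ == "__main__":
    y_true = ["VLSFO"] * 6 + ["Scrubber"] * 4
    y_pred = ["VLSFO"] * 5 + ["Scrubber"] + ["Scrubber"] * 2 + ["VLSFO"] * 2
    print(acc_and_class_error(y_true, y_pred))
```

In this toy case, one of six VLSFO ships and two of four scrubber ships are misclassified, giving an ACC of 0.7 with class-wise error rates of 1/6 and 0.5.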
5.2. Adjustment of Hyperparameters
Table 6 lists the main hyperparameters of the three models and their tuning ranges. Hyperparameters not mentioned there were left at their default values, because adjusting them did not improve the models. Using a grid search combined with ten-fold cross-validation, we obtained the optimal hyperparameters with the goal of maximizing ACC. Table 7 reports the optimal hyperparameters and the ACC of each model on the test set under the two imbalance treatments.
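A grid search with ten-fold cross-validation scored by accuracy can be sketched with scikit-learn’s `GridSearchCV`. To keep the example dependency-light, `GradientBoostingClassifier` stands in for XGBoost; `xgboost.XGBClassifier` exposes the same three parameter names and also plugs into `GridSearchCV`. The grid values below are illustrative and are not the ranges in Table 6.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Illustrative grid only; not the ranges actually searched in the paper.
PARAM_GRID = {
    "n_estimators": [20, 50],
    "learning_rate": [0.1, 0.4],
    "max_depth": [3, 5],
}


def tune(X_train, y_train, param_grid=PARAM_GRID):
    """Exhaustive grid search, scored by accuracy with stratified ten-fold CV."""
    search = GridSearchCV(
        estimator=GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost
        param_grid=param_grid,
        scoring="accuracy",
        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    )
    search.fit(X_train, y_train)
    return search.best_params_, search.best_score_


if __name__ == "__main__":
    # Synthetic stand-in for the seven standardized ship features.
    X, y = make_classification(n_samples=300, n_features=7, random_state=0)
    params, cv_acc = tune(X, y)
    print(params, round(cv_acc, 4))
```

`best_score_` is the mean cross-validated accuracy of the best combination; accuracy on a separate test set, as reported in Table 7, would be computed afterwards with the refitted `best_estimator_`.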
Comparing the results, we found that the highest ACC, 84.25%, is obtained by the XGBoost model after tuning its three hyperparameters ‘n_estimators’, ‘learning_rate’, and ‘max_depth’, with the dataset divided by stratified sampling.
The ‘n_estimators’ parameter controls the number of trees, while adjustments to ‘learning_rate’ and ‘max_depth’ cause large fluctuations in ACC. The ‘learning_rate’ is the shrinkage factor applied to the contribution of each weak learner: the smaller the value, the more slowly the model converges (so more trees are needed), but the lower the risk of overfitting. The ‘max_depth’ parameter controls the maximum depth of each tree: a larger value lets the model fit the training data more closely, but increases the risk of overfitting and lengthens training time. Figure 5 therefore shows how ACC varies with these two hyperparameters when ‘n_estimators’ is fixed at 800, with ‘learning_rate’ and ‘max_depth’ on the horizontal axes and the ACC after ten-fold cross-validation on the test set on the vertical axis. Most ACC values lie between 0.82 and 0.83, and the highest ACC, 84.25%, is obtained when ‘learning_rate’ and ‘max_depth’ are 0.4 and 5, respectively.
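The role of ‘learning_rate’ can be made concrete with the generic additive update used by gradient-boosting methods such as XGBoost. The notation below is introduced for illustration and does not appear in the source:

```latex
F_m(x) = F_{m-1}(x) + \eta \, f_m(x), \qquad 0 < \eta \le 1
```

Here $F_m$ is the ensemble after $m$ trees, $f_m$ is the $m$-th tree, and $\eta$ is ‘learning_rate’. With $\eta = 0.4$, for example, each new tree contributes only 40% of its fitted correction. More trees are then needed to reach the same training loss, which is why a small learning rate is usually paired with a large ‘n_estimators’.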
The remaining hyperparameters were kept at their default values. ‘Alpha’ is the L1 regularization term on leaf weights, with range [0, +∞) and a default of 0; the larger it is, the less likely the model is to overfit. ‘Lambda’ is the L2 regularization term on leaf weights and controls model complexity; its default is 1, and larger values impose stronger regularization and reduce the risk of overfitting. ‘Gamma’ controls the tree-splitting strategy: a node is split only if the split reduces the loss by more than this threshold. Large values make the model more conservative, so it splits less readily and produces more stable leaf nodes, whereas small values make it more aggressive, so it splits earlier to obtain higher gains. The highest ACC is obtained with ‘Gamma’ set to 0.
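The effect of ‘Lambda’ and ‘Gamma’ on splitting can be seen in the split-gain formula from the XGBoost paper. In the sketch below, the formula is written for one candidate split, where `G_*` and `H_*` are the sums of first- and second-order gradients in each child. The L1 term ‘Alpha’ is ignored for simplicity, and the gradient values in the example are made up.

```python
def split_gain(G_L, H_L, G_R, H_R, reg_lambda=1.0, gamma=0.0):
    """XGBoost-style gain of splitting a node into left/right children.

    Gain = 1/2 [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - G^2/(H+lambda)] - gamma.
    Alpha (L1) is ignored in this sketch.
    """
    def score(G, H):
        return G * G / (H + reg_lambda)

    G, H = G_L + G_R, H_L + H_R
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G, H)) - gamma


if __name__ == "__main__":
    print(split_gain(2, 2, -2, 2, reg_lambda=0.0))             # no regularization
    print(split_gain(2, 2, -2, 2, reg_lambda=1.0))             # default lambda
    print(split_gain(2, 2, -2, 2, reg_lambda=0.0, gamma=2.0))  # gamma cancels the gain
```

A split is worthwhile only when this gain is positive. Raising ‘Lambda’ shrinks the gain, and ‘Gamma’ subtracts a fixed cost, so both prune marginal splits.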
As introduced in Section 2, the XGBoost model was expected to be more appropriate for this study, and the results are consistent with this expectation. The RF model combines its trees by averaging their votes and relies on random feature subsets and the randomness of individual decision trees to increase model diversity. XGBoost, by contrast, builds trees sequentially, each fitted to the gradients of the loss of the current ensemble, and uses gradient information, regularization, and feature subsampling to update the model more precisely at each iteration, which gives it better fitting ability and predictive performance. AdaBoost also builds learners sequentially, but it reweights the samples so that later learners concentrate on previously misclassified samples, whereas XGBoost directly optimizes a regularized loss function at each iteration. XGBoost is additionally designed for parallel computation and efficiency, which speeds up training. These differences may explain why AdaBoost performs worse than XGBoost in some cases.
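The AdaBoost reweighting mentioned above can be illustrated with one step of the classic discrete AdaBoost update for labels in {-1, +1}. This is a textbook formulation, not the implementation used in the study; library versions such as scikit-learn’s SAMME scale the learner weight differently. The update assumes the weighted error lies strictly between 0 and 1.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One discrete AdaBoost update for labels in {-1, +1}.

    Returns the learner weight alpha and the renormalized sample weights.
    Assumes the weighted error eps is strictly between 0 and 1.
    """
    eps = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    alpha = 0.5 * math.log((1 - eps) / eps)
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]


if __name__ == "__main__":
    alpha, w = adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
    print(alpha, w)
```

All weights change at every step: the single misclassified sample rises to half of the total weight, and each correctly classified sample falls to one sixth.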
Figure 6 shows how the loss of the XGBoost model evolves with the number of iterations under the optimal hyperparameters found above. ‘logloss’ refers to the loss computed during training with log loss as the evaluation metric; ‘Train_logloss’ is evaluated on the training set and ‘Test_logloss’ on the test set. Training XGBoost aims to minimize ‘Train_logloss’ and thereby improve the model’s fit to the training set: the smaller ‘Train_logloss’, the better the model’s predictions on the training set match the true labels. However, to avoid overfitting, the model’s performance on the test set must also be monitored. Figure 6 shows that the decrease in log loss slows once the number of iterations exceeds 200. At 800 iterations, ‘Train_logloss’ is about 0.0039 and ‘Test_logloss’ is about 0.0550. Both values are low and the test loss is still decreasing rather than rising, indicating that the model has reached a steady state without obvious overfitting. The grid search likewise yielded the highest test-set ACC at 800 iterations.
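The log loss tracked in Figure 6 is the mean negative log-likelihood of the true labels. A plain-Python sketch follows; the clipping constant is an assumption added to avoid taking log(0).

```python
import math


def binary_logloss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood for labels in {0, 1} and predicted P(y = 1).

    Probabilities are clipped to [eps, 1 - eps] to avoid log(0).
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


if __name__ == "__main__":
    print(binary_logloss([1, 0], [0.9, 0.1]))  # both samples get 0.9 on the true class
```

For intuition, a mean log loss of 0.0550 corresponds to a geometric-mean probability of exp(-0.0550) ≈ 0.946 assigned to the true class.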
Table 8 compares the relevant metrics before and after hyperparameter tuning: ACC, the misclassification rates of both classes, and the AUC value all improve to some extent. Specifically, ACC increased from 79.99% to 84.25%, the misclassification rates decreased, and the AUC value rose from 0.8781 to 0.9019. These results indicate that hyperparameter tuning clearly improves model quality.
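AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The pairwise sketch below computes it that way, with ties counting one half; it is equivalent to the normalized Mann-Whitney U statistic. Its O(n²) cost is fine for test sets of a few hundred ships.

```python
def auc_pairwise(y_true, scores):
    """ROC AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In this example, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75.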
5.3. The Importance of Output Variables
Using the XGBoost model trained on the stratified split, we ranked the features by their share of importance in the choice of abatement scheme. DWT, Distance, and Power are the three most important features, with importance shares of 0.2757, 0.2461, and 0.1277, respectively, whereas Age, BDI, and PD have much smaller shares and thus less influence on the decision. This ranking provides a direct reference for shipowners and can help them choose an emission abatement solution more accurately. The feature importance ranking is shown in Figure 7.
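Importance shares like those in Figure 7 are raw per-feature scores normalized to sum to 1. A sketch follows. The raw scores would come from the fitted model, for example XGBoost’s `Booster.get_score(importance_type="total_gain")`. The scores in the example are hypothetical and are not the study’s values.

```python
def importance_shares(raw_scores):
    """Normalize raw importance scores to shares summing to 1, sorted descending."""
    total = sum(raw_scores.values())
    shares = {name: value / total for name, value in raw_scores.items()}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical raw scores for illustration only.
    print(importance_shares({"A": 3.0, "B": 1.0}))
```

With the reported shares, DWT and Distance together account for 0.2757 + 0.2461 = 0.5218 of the total, just over half. Because DWT and Power are strongly correlated, the model may divide importance between them somewhat arbitrarily, so their individual shares are best read together.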
DWT and Distance are the two most important features of bulk carriers, together accounting for more than half of the total importance. The descriptive statistics show that both features discriminate strongly between the two solutions. Larger bulk carriers are more inclined to use scrubbers, so shipowners can consider deadweight tonnage first when making a decision. Travel distance is another very important feature and differs considerably between the two emission abatement solutions: bulk carriers with scrubbers tend to travel longer distances, which is also related to the vessel’s travel cost. Consistent with the results of Bai et al. [34], the larger the ship tonnage and the longer the distance traveled, the more likely the scrubber solution is to be used. Therefore, these two features should be given priority when considering emission abatement solutions, and shipowners can refer to the importance of each feature to choose a more appropriate solution.
5.4. Verification on Newbuilding Bulk Carriers
To verify the model’s practical applicability to the newest ships, we collected a further 51 ships that were built and commissioned in January 2024 and fed their characteristics into the model for prediction. Ships built at this time have relatively stable characteristics and are representative of the latest trend.
Because of the delay in data collection, an accurate annual distance traveled could not be obtained for these ships, so the distance traveled in the last 30 days was used for the estimation; these data were collected in May 2024.
The 51 sample vessels were fed into the optimal model from the previous section, yielding an ACC of 86.27%, class-wise error rates of 15.79% and 15.38%, and an AUC of 0.9017. The predictions may have been affected by errors in the estimated sailing distances and speeds; overall, however, they indicate that the model predicts well for new bulk carriers. Some of the prediction samples are shown in Table 9, and their input characteristics are given in Appendix A.
6. Conclusions
Treating the choice of emission abatement solution for newbuilding ships as a binary classification problem, this study preprocesses four years of newbuilding bulk carrier data and builds classification models from three sample-balancing approaches and six machine learning models. It evaluates these models on the test set and tunes their hyperparameters with ten-fold cross-validation. The results show that the XGBoost model with the dataset divided by stratified sampling achieves the highest ACC, 84.25%, with an AUC of 0.9019, which is consistent with our theoretical rationale for choosing XGBoost. Predictions for newbuilding bulk carriers in 2024 were also successful. In addition, this study reports the importance of each feature for decision making and finds that the ship’s deadweight tonnage and sailing distance can be treated as priority factors. Combined with the descriptive statistics, the results show that the larger the ship’s tonnage and the longer its annual sailing distance, the more inclined the shipowner is to install a scrubber. These results can guide shipowners when deciding on emission abatement solutions. Shipowners can focus on the ship’s deadweight tonnage and its intended future routes and estimate the approximate sailing distance and speed. They can then combine these estimates with the current fuel price, BDI, and other factors and feed the expected feature values into the XGBoost model to identify a suitable solution. These features, especially those related to the ship itself, can also be varied flexibly to compare the solutions.
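The what-if use described above can be sketched as follows. One raw feature of a planned ship is varied while the others are held fixed; each scenario is standardized with the training statistics, and the predicted probability of choosing a scrubber is read off. The function name, the `class 1 = scrubber` encoding, and the demo classifier are assumptions. The demo uses a scikit-learn logistic regression on synthetic data only so that the example runs; in practice the fitted XGBoost model would take its place.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def what_if(model, base_features, feature_index, values, mu, sigma):
    """P(scrubber) as one raw feature varies, other features fixed.

    Assumes `model` has predict_proba with class 1 = scrubber and that inputs
    must be z-scored with the training mean `mu` and std `sigma`.
    """
    probs = []
    for v in values:
        x = np.array(base_features, dtype=float)
        x[feature_index] = v
        z = (x - mu) / sigma
        probs.append(model.predict_proba(z.reshape(1, -1))[0, 1])
    return probs


def demo():
    """Synthetic demo: label 1 is more likely when feature 0 is large."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] > 0).astype(int)
    model = LogisticRegression().fit(X, y)
    return what_if(model, [0.0, 0.0], 0, [-2.0, 0.0, 2.0], np.zeros(2), np.ones(2))


if __name__ == "__main__":
    print(demo())
```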
This paper has the following shortcomings, which can be addressed in future work: 1. VLSFO and scrubbers are currently the most important emission abatement solutions for bulk carriers. As the proportion of alternative-fuel ships grows, shipowners may face new choices, and once more alternative-fuel samples are available, it will be more reasonable to build a decision model that includes them. Furthermore, the data cover only the first three years after the implementation of the “sulfur cap” policy. 2. Other relevant economic and non-economic factors could be added to study shipowners’ choice of emission abatement solution from a more comprehensive perspective, such as national policies, the impact of other major pollutants, antitrust exemptions, and shipping alliances [40]. It is also worth examining whether the COVID-19 outbreak affected carbon emission reduction in shipping [41] and sulfur emission abatement, and how the factors affecting shipowners’ decisions will change in the post-pandemic era. 3. Applying the approach of this study to other ship types and comparing the models and results across ship types would show which models and prediction methods suit each type. 4. Emerging machine learning models could be explored to improve the accuracy and efficiency of the predictive models, providing stronger support for shipowners in choosing the right emission abatement solution.