1. Introduction
A ship is a watercraft for transporting cargo and passengers across seas, rivers, and other waterways. It is a complex plant comprising cargo loading and unloading systems, a navigation system, and a system that operates engine room machinery, among other things. In addition, ships are usually operated for more than 25 years after construction and are considered mobile factories that require economical maintenance until they are scrapped [1]. Therefore, continuous maintenance systems and processes are needed for stable and efficient ship operations [2,3,4,5,6,7,8].
Because ships sail far from land, the maritime environment, including weather conditions, has a significant impact on them. The number of crew members operating a ship is also limited, which makes an immediate response difficult when the ship or its machinery is damaged, and immediate shore support during emergencies is likewise limited. The failure of a major system, such as the ship’s propulsion engine, requires stopping the ship and repairing or replacing the failed component. This not only delays the voyage; maintenance work at sea, where the ship is in constant motion, can also result in casualties. In addition, companies responsible for ship maintenance have poor access to ships, and the immediate supply of related equipment and materials is difficult and costly [9].
To improve economic feasibility and operational efficiency, time-based maintenance (TBM) and corrective maintenance methods, which are commonly used for ship maintenance, must be enhanced. Therefore, studies are being conducted to introduce predictive maintenance (PdM) by diagnosing machine and system conditions and failures to improve the operational efficiency of ships.
Michala et al. (2015) and Lazakis et al. (2016) applied a machinery risk assessment (MRA) that diagnoses conditions and failures by calculating failure rates, mean time between failures, and probability of failure from current and historical data. Decision support systems using MRA were combined with a ship monitoring system to support maintenance prediction and maintenance decisions for ship machinery and to improve and optimize its energy efficiency [2,10]. Liu et al. (2022) measured marine environment data (wind and waves) and performance data such as engine speed, output, and ship speed. They corrected the collected performance data for the impact of the marine environment and hydrodynamically filtered out the effects of ship acceleration and deceleration, route changes, and wave conditions. From the filtered data, they derived an actual propulsion performance curve of engine power versus RPM and, by comparing it with the design performance curve of the engine, presented a methodology for predicting the ship’s condition according to the fouling of the hull and propeller. The study was intended to support a prediction-based maintenance and repair system that reduces exhaust gas emissions from ships, such as greenhouse gases, and increases operational efficiency [7].
Research to improve maintenance efficiency is thus being conducted; however, because of the operational characteristics of ships and the difficulty of securing valid data classified according to the characteristics of each machine and system, few such studies have been carried out.
If a defect or breakdown occurs in one of the numerous systems installed on a ship, the ship cannot be operated. Research that enables the practical use of PdM for propulsion engines, which carry a high maintenance cost burden, is therefore urgently needed. Machine learning, a branch of artificial intelligence, is being used as a technology to implement PdM through the condition monitoring of devices and systems [11].
To derive reliable results from machine learning, large amounts of normal and abnormal state data are required, as well as data classification based on the control and operational characteristics of the engine and each machine system. However, abnormal operation data for ships are difficult to secure because engine abnormalities rarely occur under TBM-based preventive maintenance, and ship data fluctuate frequently owing to the control characteristics. In addition, because the steady-state range of the engine is wide, securing an effective learning database for PdM remains challenging. Many limitations therefore exist when implementing PdM of the engine using machine learning.
Therefore, this study analyzes the operation and control characteristics of the propulsion engine, acquires engine data from the alarm monitoring system (AMS) of a ship in operation, and develops a data preprocessing algorithm that considers the engine control characteristics. Principal component analysis (PCA) and K-nearest neighbors (KNN), both machine learning algorithms, are used to check whether the preprocessed data are classified according to the engine control characteristics. By building and verifying an algorithm that can detect abnormal data, a valid learning database necessary for PdM is secured. In addition, an algorithm for detecting the abnormal operation state of an engine is constructed and verified. Thus, the basis for machine-learning-based PdM of the ship propulsion engine is established.
2. Materials and Methods
2.1. Research Target Ship and Data
The ship used to collect operational data was a training ship with a total length of 133 m, a total tonnage of 9196 tons, a cruising distance of 14,500 nmi, and a sailing speed of 17.7 knots, operating in both coastal and ocean waters. The ship was equipped with a controllable pitch propeller (CPP) and one thruster each at the fore and aft. It was also equipped with a dynamic positioning system that maintains the ship’s position by controlling surge, sway, and yaw, three of the ship’s six degrees of freedom, using the engine, thrusters, and CPP.
The research target ship was equipped with the latest machinery and systems, which are expected to be applied to general ships such as merchant ships in the future, so the data needed for the future deployment of PdM technology could be acquired in advance. The ship was also selected for operational data collection because its AMS can acquire more data types than those of existing ships.
Table 1 lists the ship specifications.
The AMS is a system that monitors, controls, and manages navigation equipment on a ship, such as radar, GPS, and steering systems; cargo equipment used to monitor, load, and unload cargo; and engine room equipment, such as propulsion and generator engines. The propulsion engine data used in the study included only the numerical data from each system in the AMS that can be used to determine the engine state. The 104 selected data factors were acquired during eight voyages (Voyage No. 21071, 21081, 21091, 21101, 21102, 21112, 21121, and 21122) and used for the study.
2.2. Ship Propulsion Engine Control Characteristics
Medium and large ships, including the research target ship, generally use two-stroke diesel engines for propulsion because of their advantages in output (torque), structure, and weight relative to ship size. The propulsion engine under consideration is a six-cylinder engine with a rated power of 6618 kW, as detailed in Table 2.
In general, unlike automobile engines, ship propulsion engines cannot idle (operate without load) because the engine is either directly connected to the propeller through the shaft without a transmission or connected to the propeller shaft through a reduction gear. However, for a special ship equipped with a CPP, such as the research target ship, idling operation is possible through propeller pitch control.
The ship’s propulsion engine is controlled by the engine telegraph. On a typical ship, the engine is started from the stopped state with compressed air whenever ahead or astern movement is ordered, whereas a CPP-equipped ship keeps the engine running in the idling state. Engine control is classified into a maneuvering mode, in which the engine load can be changed quickly and flexibly according to operating and maritime conditions such as port entry, departure, and anchoring, and a cruising mode for constant-speed operation at the target operating speed.
The maneuvering mode on the engine telegraph comprises stop and four stages (dead slow, slow, half, and full) in each of the ahead (forward) and astern (backward) directions, and the stages are controlled sequentially based on the engine speed (RPM).
Table 3 lists the engine control steps of the ship under study; unlike ordinary ships, it controls the propeller pitch simultaneously with the RPM. The navigation full step on the engine telegraph is configured so that a logic program brings the engine to the target RPM (speed) in cruising mode.
The load of the ship’s propulsion engine varies according to the engine telegraph operation step and the marine environment, and the operating status is changed accordingly.
2.3. Algorithm Development Tools
Python has a simple and consistent syntax that can be learned and applied in a short time. Because the syntax is simple, errors caused by complex code structure are less likely, many tasks can be implemented quickly, and development efficiency is high because Python is easy to link with other programming languages and libraries [12,13,14,15]. Furthermore, as a general-purpose programming language with a wide range of open-source packages, Python allows developers in various fields to share libraries and source code that users can easily access and use [12,13,14,15].
Python supports various machine learning libraries, such as scikit-learn and TensorFlow, and provides an extensive standard library, so the machine learning algorithms required in each field can be built quickly by using shared libraries and source code [12,13,14,15].
As described above, Python offers scalability, interoperability, and access to a wide range of libraries and source code. Therefore, Python 3 was used in this study to develop and verify the machine learning algorithms for ship data analysis and PdM, together with the NumPy, pandas, Matplotlib, and scikit-learn (sklearn) libraries.
2.4. Research Utilization Algorithm
2.4.1. PCA
PCA is a dimensionality reduction algorithm that uses an orthogonal transformation to convert correlated high-dimensional data into low-dimensional data with minimal linear correlation [16,17,18]. PCA finds the weight for each variable of the data such that information loss is minimized, and the conversion formula is shown in Equation (1).
In the equation, x represents the measurement data, w represents the weight, z represents the converted data, n represents the number of measurements, p represents the number of characteristics, and m represents the required number of principal components. The individual data (X) are converted into data of a new dimension (Z) by the weight (W) calculated by PCA. The core of PCA is to derive the weight (W) for which the transformed data (Z) have maximum variance.
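For reference, one common way to write the PCA transformation that is consistent with the symbols defined above is sketched here. It assumes the standard matrix formulation of PCA and may differ in notation from Equation (1); the index names i, j, and k are introduced only for illustration.

```latex
Z = XW, \qquad
X \in \mathbb{R}^{n \times p}, \quad
W \in \mathbb{R}^{p \times m}, \quad
Z \in \mathbb{R}^{n \times m}

z_{ij} = \sum_{k=1}^{p} x_{ik}\, w_{kj},
\qquad i = 1, \dots, n, \quad j = 1, \dots, m
```

In this form, each column of W holds the weights of one principal component, and choosing m < p gives the dimension reduction described above.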
PCA is often used when data reduction is required or outliers must be found, and when multicollinearity, which adversely affects data analysis, arises because some variables correlate highly with others [16,17,18,19,20].
Ship machinery and system-related data are large and comprise many factors, so analyzing each factor individually for abnormalities or according to the control characteristics requires many steps and much time. Therefore, PCA, which offers both dimension reduction and abnormal data analysis, was used for the propulsion engine data analysis.
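The dimension reduction described in this subsection can be illustrated with a minimal NumPy sketch. It is not the implementation used in the study: the function name `pca_transform`, the synthetic three-factor data, and the choice of a singular value decomposition to obtain the weights are all assumptions made for illustration.

```python
import numpy as np

def pca_transform(X, m):
    """Project the n x p data matrix X onto its first m principal components."""
    X_centered = X - X.mean(axis=0)              # remove the mean of each factor
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = Vt[:m].T                                 # p x m weight matrix
    Z = X_centered @ W                           # n x m converted data
    variance_ratio = (s ** 2) / np.sum(s ** 2)   # share of variance per component
    return Z, W, variance_ratio[:m]

# Synthetic stand-in for engine factors (assumption): three strongly
# correlated columns driven by one underlying quantity plus small noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base, 2.0 * base, -base]) + 0.01 * rng.normal(size=(200, 3))

Z, W, ratio = pca_transform(X, m=2)
print(Z.shape, W.shape)
```

Because the three synthetic factors move together, almost all of the variance falls on the first component, which is the situation in which PCA removes redundant, highly correlated factors.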
2.4.2. KNN
KNN is a representative classification algorithm that calculates the distances between a new data point and the existing data and determines the type of the new data point from the characteristics of its K nearest neighbors.
In the KNN algorithm, the distance between data points is generally defined by the Euclidean distance, the straight-line (shortest) distance between two points across all dimensions, as given in Equation (2) [21,22,23,24].
In the equation, d is the distance between the two points p and q.
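For reference, the usual definition of the Euclidean distance between two points p and q is written out here. The dimension count n and the coordinate index i are notational assumptions; in this expression n denotes the number of dimensions of each point.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} \left( q_i - p_i \right)^2}
```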
Among data classification techniques, KNN performs well, is well suited to analysis, and is effective when a large amount of training data is available [21,22,23,24]. Therefore, KNN was used to analyze the propulsion engine data, which contain a large number of different data types, and to detect abnormal data.
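The classification rule described in this subsection can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `knn_predict`, the two-dimensional sample points, and the mode labels are hypothetical.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]          # indices of the k closest points
    votes = Counter(str(y_train[i]) for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional points labeled by engine control mode
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array(["maneuvering"] * 3 + ["cruising"] * 3)

label_a = knn_predict(X_train, y_train, np.array([4.9, 5.0]), k=3)
label_b = knn_predict(X_train, y_train, np.array([0.1, 0.1]), k=3)
print(label_a, label_b)
```

The same neighbor distances can also serve as an abnormality score: a point whose nearest neighbors are all far away does not resemble any existing data.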
2.5. Propulsion Engine Data Preprocessing Algorithm
To secure the reliability of data analysis and machine learning results, abnormal data must be absent and the data must be clearly classified in a way that reflects the characteristics of the object being analyzed, such as the engine. If abnormal data or factors unrelated to the analysis target are included, the machine learning results may diverge from the intended goal, so reliable data preprocessing is necessary. Therefore, a data preprocessing algorithm that considers the propulsion engine control characteristics was constructed, and the collected data were preprocessed before being used in the research.
Building a database for AMS data analysis and machine learning requires propulsion engine data processing steps such as merging data, removing abnormal data (for example, null data), and classifying data based on the engine control characteristics. Therefore, the propulsion engine data preprocessing algorithm was constructed as shown in Figure 1.
The files for each data group (LO, EXH, GAS, etc.) acquired for each data extraction section in the AMS were merged by extraction section. After the name of each column (factor) was set, the data from all voyages were merged into a single data file. Null values, duplicate data, and factors for which no data existed, such as the LO feed rate cylinders, were deleted from the collected data, and engine non-operation and idling data were removed because they were unnecessary for machine learning of the propulsion engine. In addition, the algorithm was constructed to extract only the engine ahead (forward) operating conditions and to classify the data by engine control mode, yielding the data required for machine learning of the propulsion engine. The algorithm includes CPP-related data processing procedures because the propulsion engine used in this study was equipped with a special CPP. Except for the CPP-related procedures and the engine control mode setting values, the proposed algorithm can be used as an AMS data preprocessing algorithm for general ships.
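The preprocessing steps described above can be sketched with pandas as follows. This is a minimal sketch, not the algorithm of Figure 1: the column names `timestamp`, `engine_rpm`, `cpp_pitch`, `lo_feed_rate`, and `exh_temp`, the merge key, the rule that positive pitch means ahead operation, and the threshold `NAV_FULL_RPM` are all hypothetical and would have to be replaced by the actual AMS factor names and engine control mode setting values.

```python
import pandas as pd

NAV_FULL_RPM = 150.0  # hypothetical mode boundary, not a value from the study

def preprocess(groups_per_voyage):
    """Merge, clean, and classify AMS data.

    groups_per_voyage: list of voyages, each a list of per-group DataFrames
    (LO, EXH, ...) sharing a 'timestamp' column (assumed layout).
    """
    voyages = []
    for groups in groups_per_voyage:
        merged = groups[0]
        for g in groups[1:]:
            merged = merged.merge(g, on="timestamp", how="inner")
        voyages.append(merged)
    data = pd.concat(voyages, ignore_index=True)   # one file for all voyages

    data = data.dropna(axis=1, how="all")   # factors for which no data exist
    data = data.dropna(axis=0, how="any")   # rows containing null values
    data = data.drop_duplicates()

    # Drop engine stop (rpm == 0), idling (pitch == 0), and astern (pitch < 0);
    # only ahead operation remains. The sign convention is an assumption.
    data = data[(data["engine_rpm"] > 0) & (data["cpp_pitch"] > 0)].copy()

    # Classify by engine control mode using the hypothetical RPM boundary
    data["mode"] = "maneuvering"
    data.loc[data["engine_rpm"] >= NAV_FULL_RPM, "mode"] = "full_navigation"
    return data.reset_index(drop=True)

# Tiny synthetic example with two data groups from one voyage
lo = pd.DataFrame({"timestamp": [1, 2, 3, 4, 5],
                   "engine_rpm": [0.0, 120.0, 160.0, 160.0, 165.0],
                   "lo_feed_rate": [None] * 5})
exh = pd.DataFrame({"timestamp": [1, 2, 3, 4, 5],
                    "cpp_pitch": [0.0, 10.0, 20.0, -5.0, 21.0],
                    "exh_temp": [50.0, 300.0, 350.0, 340.0, None]})
clean = preprocess([[lo, exh]])
print(clean)
```

Keeping the CPP-specific filter and the mode boundary in clearly separated lines mirrors the point made above: those are the only parts that would need to change for a ship without a CPP.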
2.6. Standard Data Analysis and Abnormal Operation Data Detection Algorithm
A standard data analysis algorithm and an abnormal operation data detection algorithm were constructed and verified using the propulsion engine data, as described below.
The preprocessed propulsion engine data comprised 104 factors (140,176 rows × 104 columns), so checking each factor individually for abnormal data, or checking whether the data had been processed appropriately for each engine control mode, would take a long time. Therefore, a standard data analysis algorithm for normal operation conditions, comprising PCA and KNN, was constructed to determine whether the data had been effectively processed by the data preprocessing algorithm and whether they could be used as normal state data (standard data).
The preprocessed data were analyzed by reducing the data dimension through PCA (Equation (1)). Abnormal data among the collected data were then identified using KNN by calculating the distance from each data point to its K nearest data points (Equation (2)) and comparing it with a set standard distance. To verify data validity, that is, to confirm that the data separate according to their characteristics and to identify abnormal data, a PCA/KNN standard data processing algorithm was constructed (Figure 2).
The K value was set through data analysis because the optimal K value varies with the analysis target and the data distribution. The algorithm also includes a function that extracts the indices of abnormal data so that they can be removed. The distance used as the outlier criterion is optimized by analyzing the PCA variance ratio and the distribution graph of distances between data points.
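A PCA/KNN outlier check of this kind can be sketched with scikit-learn as shown here. The sketch rests on several assumptions that are not taken from the study: factors are standardized before PCA, the abnormality score is the mean distance to the K nearest neighbors, and the function name `find_outlier_indices`, the synthetic data, and the values of `k` and `distance_limit` are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

def find_outlier_indices(X, n_components=2, k=5, distance_limit=2.0):
    """Return (outlier indices, mean k-NN distances, PCA variance ratio)."""
    scaled = StandardScaler().fit_transform(X)   # assumption: standardize factors
    pca = PCA(n_components=n_components)
    Z = pca.fit_transform(scaled)                # reduced-dimension data

    # Ask for k + 1 neighbors because each point is its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(Z)
    distances, _ = nn.kneighbors(Z)
    mean_dist = distances[:, 1:].mean(axis=1)    # drop the zero self-distance

    outliers = np.flatnonzero(mean_dist > distance_limit)
    return outliers, mean_dist, pca.explained_variance_ratio_

# Synthetic data (assumption): a tight cluster of steady-state points
# plus two injected abnormal rows at indices 300 and 301.
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=0.1, size=(300, 3))
abnormal = np.array([[5.0, 5.0, 5.0], [-5.0, 5.0, -5.0]])
X = np.vstack([normal, abnormal])

outliers, mean_dist, ratio = find_outlier_indices(X)
print(outliers, ratio)
```

In practice, the returned `mean_dist` values would be plotted as a distance distribution and inspected together with `ratio` to choose `k`, the number of components, and the distance limit, in line with the tuning procedure described above.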
The abnormal operation data detection algorithm was designed and tested to detect data generated by the engine’s abnormal operation during engine operation.
Based on the data acquired from the AMS, the engine operated in full-navigation mode for 94% of the total voyage time and in maneuvering mode for 6%. The maneuvering mode accounted for an extremely small share of operation and consisted mainly of low-load engine conditions, in which the data vary widely with the operating environment, such as departure (engine cold condition) and arrival; this made it difficult to establish stable standards for normal operation. Therefore, the abnormal operation data detection algorithm was constructed using only the full-navigation mode operation data, excluding the maneuvering mode.
The abnormal operation data detection algorithm also used the PCA/KNN technique shown in Figure 2: the principal components of the preprocessed full-navigation mode normal operation data were derived through PCA, and the distances between data points were then analyzed with KNN according to the K value. The K value and the outlier threshold of the normal operation data were selected to detect abnormal operation data. The collected data were then applied to the algorithm to analyze whether any abnormal operation data exceeding the outlier threshold existed, and the corresponding indices could be extracted when such data were detected.
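The detection step can be sketched as a small scikit-learn class that is fitted on normal full-navigation data and then applied to newly collected data. This is an illustrative sketch under stated assumptions, not the algorithm of the study: the class name `KnnAbnormalDetector`, the `margin` rule for setting the outlier threshold, the three synthetic load-driven factors, and their gains are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

class KnnAbnormalDetector:
    """Flags samples that lie far from normal operation data in PCA space."""

    def __init__(self, n_components=2, k=5, margin=3.0):
        self.k = k
        self.margin = margin  # hypothetical safety factor on the threshold
        self.projector = make_pipeline(StandardScaler(),
                                       PCA(n_components=n_components))
        self.neighbors = NearestNeighbors(n_neighbors=k + 1)

    def fit(self, X_normal):
        Z = self.projector.fit_transform(X_normal)
        self.neighbors.fit(Z)
        # Column 0 is the zero distance of each training point to itself
        dist, _ = self.neighbors.kneighbors(Z)
        train_score = dist[:, 1:].mean(axis=1)
        self.limit_ = self.margin * train_score.max()  # outlier threshold
        return self

    def detect(self, X_new):
        Z = self.projector.transform(X_new)
        dist, _ = self.neighbors.kneighbors(Z, n_neighbors=self.k)
        score = dist.mean(axis=1)
        return np.flatnonzero(score > self.limit_), score

# Synthetic normal data (assumption): three factors driven by engine load
rng = np.random.default_rng(7)
gains = np.array([100.0, 400.0, 3.0])
load = rng.uniform(0.7, 1.0, size=(500, 1))
X_normal = load * gains + rng.normal(scale=[0.2, 1.0, 0.01], size=(500, 3))

# Two in-range samples and one far outside the normal load range
X_new = np.array([0.85 * gains, 0.90 * gains, 1.60 * gains])

detector = KnnAbnormalDetector().fit(X_normal)
abnormal_idx, scores = detector.detect(X_new)
print(abnormal_idx)
```

Note that a score computed in the reduced PCA space mainly reacts to deviations along the retained components; how many components to keep is therefore part of the threshold selection discussed above.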