Dynamic Response Prediction Model for Jack-Up Platform Pile Legs Based on Random Forest Algorithm

Cui, Xiaohui; Liu, Hui; Lin, Xiang; Zou, Jiahe; Wang, Yu; Zhou, Bo

doi:10.3390/jmse12101829


by Xiaohui Cui ¹, Hui Liu ¹, Xiang Lin ¹, Jiahe Zou ¹, Yu Wang ¹,² and Bo Zhou ¹,*

¹ State Key Laboratory of Structural Analysis, Optimization and CAE Software for Industrial Equipment, School of Naval Architecture Engineering, Dalian University of Technology, Dalian 116024, China

² COSCO Shipping Heavy Industry (Dalian) Co., Ltd., Dalian 116113, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2024, 12(10), 1829; https://doi.org/10.3390/jmse12101829

Submission received: 24 September 2024 / Revised: 7 October 2024 / Accepted: 8 October 2024 / Published: 14 October 2024

(This article belongs to the Special Issue Advanced Condition Monitoring and Intelligent Operation & Maintenance Technologies in Ships and Offshore Facilities)


Abstract

Jack-up offshore platforms are widely used in many fields, and it is of great importance to predict the dynamic response of platform pile leg structures quickly and accurately in real time. Current analytical techniques are founded upon numerical modelling of the platform structure. Although these methods can accurately analyze the dynamic response of the platform, they require a large quantity of computational resources and cannot meet the requirements of real-time prediction. In this paper, a predictive model for the dynamic response of the pile leg of a jack-up platform based on the random forest algorithm is proposed. Firstly, a pile leg dynamic response database is established based on high-fidelity numerical model simulations. The data are subjected to cleaning and dimensionality reduction in order to facilitate the training of the random forest model. Cross-validation and Bayesian optimization are used to select the random forest parameters. The results show that the prediction model outputs response results for new environmental load inputs within a few milliseconds, and that the predictions remain highly accurate, including at extreme values.

Keywords:

machine learning; jack-up platform; random forest; prediction model

1. Introduction

With the advantages of flexible movement, low manufacturing cost, and a wide range of operating water depths, jack-up platforms are widely used in many fields such as marine oil and gas exploration and development, offshore construction, and island construction. As marine resources are increasingly developed, jack-up platforms are undergoing continual advancement for use in deeper waters. The long-term action of wind, wave, and current loads, the complex platform structure, and extreme sea conditions present a significant challenge to the reliability of the platform. It is therefore of paramount importance to predict and evaluate the operational status and safety of jack-up platforms. A jack-up platform generally consists of a main hull and several pile legs that provide support. The pile legs not only have to bear the huge weight of the main hull, but also have to bear the effects of wind, waves, currents, and other environmental loads. The health condition of the pile legs has a very important influence on the safe operation of the whole platform. Therefore, it is necessary to analyze the dynamic response of the pile leg structure under environmental loads, that is, its force condition. Traditional methods for dynamic response analysis are mainly numerical simulations, but they often consume a lot of time and computational resources. The development of artificial intelligence has provided new ideas for the assessment of platform pile leg operational status. The focus of this paper is to use a powerful ensemble learning method, random forest, to predict the dynamic response of jack-up platform pile legs under marine environmental loads.

Regarding the numerical simulation of the dynamic response of pile leg structures in jack-up platforms, numerous scholars have conducted research utilizing the equivalent beam theory, demonstrating its accuracy. Bienen et al. [1,2] developed the SOS_3D numerical computation program, which is based on the beam–column formulation to establish the stiffness matrix of the jack-up platform. Additionally, the program possesses the capability to conduct three-dimensional dynamic simulations, incorporating factors such as soil–structure interaction and environmental loads. Meanwhile, it is also able to investigate the overturning response of the platform based on the displacement-hardened plasticity theory. Their study demonstrated that reliable results can be obtained for dynamic response analysis and overturning analysis of jack-up platforms based on the equivalent beam theory. Cassidy et al. [3] introduced the constrained NewWave method of stochastic wave theory to consider the spectral content of wave loading based on the equivalent beam theory. A comprehensive consideration of nonlinear factors, including structural behavior, pile–soil interaction, and wave loading, was conducted to investigate the dynamic response of the platform in extreme sea conditions. Cassidy et al. [4] developed an equivalent beam finite element model for jack-up platforms to fully express the load–displacement response of pile boots in the form of combined forces for full 3D dynamic response analysis. The effects of spatial variations and directional wave diffusion on the response of jack-up rigs were also investigated. Pisano et al. [5] developed a 3D continuum model based on equivalent beam theory to capture nonlinear soil–structure interactions in a jack-up installation. The 3D continuum modelling proved to be a reliable method to analyze the stationarity and overall operational performance of jack-up platforms through experiments and numerical simulations. Wang, Y. et al. [6] combined macro-element models with equivalent beam theory and applied them to the structural response analysis of a jack-up platform. The advantage of using the macro-element base model to accurately assess the capacity of the jack-up device under the external loads of ocean wind, waves, and currents was verified. He et al. [7] developed an equivalent model for the whole jack-up platform, which simplified the main hull and pile legs using beam units that represent the geometric and overall stiffness characteristics of the actual structure. All the above studies are based on the equivalent beam theory for simulation modelling of platforms and have been proven to have high accuracy and reliability through experiments and numerical simulations. However, the numerical solution requires relatively large computational resources. As an example, for the high-fidelity model in this study, it takes 3 to 4 min to compute the dynamic response of the pile leg of a jack-up platform under a single environmental load on a 28-core CPU. The numerical solution’s time lag makes it difficult to meet the real-time state detection requirements of the platform.

Structural health monitoring (SHM) has been utilized in numerous studies [8,9] to analyze the operational status of structures and to detect and locate structural damage, in conjunction with operational modal analysis (OMA). Nonetheless, these monitoring techniques are heavily reliant on sensors. The intricate structure of the pile legs of jack-up platforms renders sensor deployment infeasible at certain critical points, and maintaining sensor availability over an extended period in the complex marine environment is challenging. The dependence on sensors limits the application of these methods.

The development of artificial intelligence technology brings new ideas to break the limitations of SHM and the numerical computation solution. A mapping was established between input operating conditions and output dynamic response based on the random forest machine learning algorithm. The trained prediction model can provide fast results within milliseconds. This will help to ensure the reliability of jack-up platform operations, improve economic efficiency, and safeguard the lives of the personnel working on the platform.

A prediction model based on a jack-up platform simulation calculation database and combining intelligent optimization algorithms was developed to reflect the platform force field quickly. Firstly, a high-fidelity simulation model of the jack-up platform was established, and the dynamic analysis of the platform under different combined loads of wind, waves, and currents was calculated to establish a database of the dynamic response of the platform. Bayesian optimization and random forest algorithms were applied to train the prediction model based on the database to predict the response of the jack-up platform under specific loads. Lastly, the impact of various sample sizes and train/test ratios on the performance of the training model was assessed and compared. The prediction model gives a highly accurate prediction of the dynamic response of the pile leg within a few milliseconds and is also highly accurate at extreme values. The framework of the prediction model is shown in Figure 1.

The remainder of this paper is organized as follows: Section 2 introduces the algorithms used in this article, including principal component analysis, the random forest algorithm, and the Bayesian optimization algorithm. In Section 3, the database is constructed, and the parameters of the algorithm are adjusted to train the prediction model. Section 4 evaluates the performance of the prediction model, and the effects of the number of samples in the training set and the train/test ratio on the accuracy of the prediction model are compared. Section 5 summarizes the whole study.

2. Methods

2.1. Data Dimension Reduction

Predictive models analyze datasets and use the patterns identified in them to forecast outcomes that are not yet known. Noise and redundant features within extensive datasets can adversely impact the predictive outcomes of the model [10], so the number of features in the dataset must be reduced substantially in order to isolate the pivotal ones. This helps to improve the accuracy of model prediction, while shortening the computation time and saving computational resources. According to the transformation used, dimensionality reduction methods can be classified as linear or nonlinear. Nonlinear methods can capture complex internal structure in the data, whereas principal component analysis (PCA), a representative linear method, offers powerful dimensionality reduction with a simple orthogonal linear transformation. PCA does not require labelled data, has a wide range of application scenarios, and can substantially improve computational efficiency.

In this study, PCA is employed to reduce the dimensionality of the dataset comprising the simulation results for each operating condition. It characterizes the dominant correlated activity using the mathematical algorithm that underlies singular value decomposition (SVD) [11,12]. The principle of PCA is as follows: the original $n$ samples of dimension $d$, $\{x_{1}, x_{2}, \dots, x_{n}\}$, are projected onto a new coordinate system spanned by $k$ principal components, $\{\alpha_{1}, \alpha_{2}, \dots, \alpha_{k}\}$. After the coordinate transformation $\bar{x}_{ij} = \alpha_{j}^{T} x_{i}$, the projected samples are $\{\bar{x}_{1}, \bar{x}_{2}, \dots, \bar{x}_{n}\}$, where each $\bar{x}_{i}$ is a $k$-dimensional column vector.

In order to keep the most important characteristics of the original samples and remove redundant information, the sample points should lie close enough to the projection hyperplane; that is, the reconstruction error should be small enough:

$$\min \|X - \bar{X}\|^{2} \tag{1}$$

where $X$ is the original sample and $\bar{X}$ is the sample after dimensionality reduction.
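The projection and the reconstruction error in Equation (1) can be sketched with a plain SVD. This is a minimal illustration on synthetic data, not the authors' code: the function name `pca_reconstruct`, the matrix sizes, and the noise level are all assumptions made for the example, and the reduced samples are mapped back to the original space before the error is measured.

```python
import numpy as np

def pca_reconstruct(X, k):
    """Project X (n samples x d features) onto k principal components and map back."""
    mean = X.mean(axis=0)
    Xc = X - mean                          # center each feature
    # Rows of Vt are the principal directions alpha_1, ..., alpha_d
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    A = Vt[:k].T                           # d x k matrix of the first k directions
    Z = Xc @ A                             # n x k reduced samples
    X_rec = Z @ A.T + mean                 # back-projection into d dimensions
    return Z, X_rec

rng = np.random.default_rng(0)
# Synthetic stand-in for a response database: 200 samples, 30 correlated features
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 30)) + 0.01 * rng.normal(size=(200, 30))

Z, X_rec = pca_reconstruct(X, k=3)
err = np.linalg.norm(X - X_rec) ** 2       # squared reconstruction error, as in Eq. (1)
total = np.linalg.norm(X - X.mean(axis=0)) ** 2
```

Because the synthetic features are driven by three latent factors, three components recover almost all of the variance, so `err / total` is close to zero.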

2.2. Abnormal Value Detection

Abnormal values are data points that deviate significantly from the others in terms of their numerical values, distribution patterns, or trends. Their presence affects the distribution and statistical characteristics of the whole dataset and has a great impact on the results of data analysis, so anomaly detection is an important part of data preprocessing [13]. Anomaly detection involves the unsupervised or weakly supervised classification of the dataset based on specific criteria, with the aim of isolating outliers, noise, and other anomalous points from the entire dataset. The anomaly detection methods used in this paper are One-Class Support Vector Machine and Isolation Forest.

One-Class Support Vector Machine (OCSVM) is a support vector machine-based algorithm that solves the novelty detection problem by constructing an optimal separating hyperplane [14]. OCSVM tries to find a hyperplane in the high-dimensional feature space that separates the images of the training samples from the origin with the largest margin. The core idea is to treat normal samples as belonging to a dense distribution whose boundary excludes abnormal samples. Once the hyperplane is constructed, sample points that fall outside it are defined as abnormal samples. The algorithm is able to handle many types of outliers, such as abnormal distributions and noise. It can also identify anomalies when only a small quantity of anomalous data is available, which gives it wide applicability.

The Isolation Forest approach exploits the fact that outliers are few and different [15]. The fundamental idea is to use the isolation property of data points for outlier detection. Firstly, isolation trees are constructed by randomly selecting sub-samples from the whole dataset and recursively partitioning the instances. The number of partitions needed to isolate an instance is called the path length, which is a key indicator of the degree of anomaly. The shorter the path length, the easier the instance is to isolate, and thus the more likely it is to be an abnormal value. The Isolation Forest algorithm is suitable for handling large datasets and can be computed in parallel, making it very efficient.
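As a rough sketch of how the two detectors can be combined for data cleaning, the snippet below uses scikit-learn's `IsolationForest` and `OneClassSVM` on synthetic two-dimensional data with five planted outliers. The data, the `contamination` and `nu` values, and the rule of discarding a point flagged by either detector are assumptions for illustration; the paper does not state these settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 2))                    # dense cluster of normal samples
planted = np.array([[8.0, 8.0], [-8.0, 7.0], [9.0, -8.0],
                    [-7.0, -9.0], [0.0, 10.0]])       # obvious outliers (synthetic)
X = np.vstack([normal, planted])

# Isolation Forest: short average path length -> label -1 (outlier)
iso = IsolationForest(contamination=0.05, random_state=0)
iso_labels = iso.fit_predict(X)

# One-Class SVM with an RBF kernel: points outside the learned boundary -> label -1
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
svm_labels = ocsvm.fit_predict(X)

# Assumed cleaning rule: drop a sample if either detector flags it
outlier_mask = (iso_labels == -1) | (svm_labels == -1)
X_clean = X[~outlier_mask]
```

In a supervised setting the same mask would also be applied to the labels, so that each removed sample is dropped together with its target values.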

2.3. Random Forest

Random forest is a bagging algorithm that uses decision trees as base learners. A decision tree analyzes a problem by making decisions through a tree structure. Starting from the root node, the sample attributes are tested from top to bottom, and samples are assigned to child nodes according to the test results. This process is repeated until a leaf node is reached, which gives the result. Decision trees are divided into classification trees and regression trees, which make decisions on discrete and continuous variables, respectively [16].

The random forest algorithm is an ensemble learning method [17]. The term ‘forest’ refers to the combination of multiple decision trees, while ‘random’ signifies that a subset of samples is randomly selected from the original dataset for each tree. Given the dataset $D = \{X_{i}, Y_{i}\}$, $m$ decision trees $\{g(D, \theta_{j}),\ j = 1, 2, \dots, m\}$ are used as the base learners, and the random forest is obtained by ensemble learning. For a classification problem, the results of the individual decision trees are combined by majority vote to give the result of the random forest. For a regression problem, the output of the random forest is the mean of the results of the individual decision trees [18]:

$$\hat{y} = \frac{1}{m} \sum_{i = 1}^{m} y_{i}(x) \tag{2}$$

where $\hat{y}$ is the output of the random forest algorithm, $m$ is the number of decision trees, and $y_{i}(x)$ is the output of the $i$th tree. The flow of the random forest algorithm is shown in Figure 2.
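Equation (2) can be checked directly with scikit-learn, where the fitted trees of a `RandomForestRegressor` are exposed through its `estimators_` attribute. The sketch below trains on a synthetic regression problem (the data and the choice of 50 trees are assumptions for the example) and compares the forest prediction with the manual average over the individual trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 4))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=300)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

x_new = rng.uniform(-1.0, 1.0, size=(5, 4))
# y_i(x): prediction of each of the m = 50 trees, shape (m, number of new samples)
tree_preds = np.stack([tree.predict(x_new) for tree in forest.estimators_])
manual_mean = tree_preds.mean(axis=0)      # right-hand side of Eq. (2)
forest_pred = forest.predict(x_new)        # library prediction
```

The two arrays agree to floating-point precision, which confirms that the regression forest output is the plain average of its trees.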

The goodness of fit of the digital twin model was evaluated using the coefficient of determination $R^{2}$, which quantifies the proportion of the total sum of squares of deviations (SST) that is accounted for by the regression sum of squares (SSR), thereby indicating how closely the test sample data points cluster around the regression line:

$$R^{2} = \frac{SSR}{SST} = \frac{\sum_{i = 1}^{n} (\hat{y}_{i} - \bar{y})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}} \tag{3}$$

where the true values are $y_{1}, y_{2}, \dots, y_{n}$, the predicted values are $\hat{y}_{1}, \hat{y}_{2}, \dots, \hat{y}_{n}$, and $\bar{y}$ is the average of all the true values. The coefficient of determination represents the fraction of total variability that is explained by the model, and a value closer to 1 signifies a better fit of the model to the data.
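The ratio in Equation (3), regression sum of squares over total sum of squares, is straightforward to compute by hand. The sketch below does so on four made-up values and, for comparison, also evaluates the residual-based form $1 - SSE/SST$, which is the definition used by scikit-learn's `r2_score`. The two forms coincide for an ordinary least-squares fit with an intercept but can differ for other models such as a random forest; which form the authors evaluated numerically is not stated, so the function names and the comparison are illustrative assumptions.

```python
import numpy as np

def r2_explained(y_true, y_pred):
    """R^2 as in Eq. (3): regression sum of squares over total sum of squares."""
    y_bar = y_true.mean()
    ssr = np.sum((y_pred - y_bar) ** 2)
    sst = np.sum((y_true - y_bar) ** 2)
    return ssr / sst

def r2_residual(y_true, y_pred):
    """Residual-based R^2 = 1 - SSE/SST (the form used by sklearn.metrics.r2_score)."""
    y_bar = y_true.mean()
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_bar) ** 2)
    return 1.0 - sse / sst

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

r2_a = r2_explained(y_true, y_pred)   # SSR = 4.5, SST = 5.0
r2_b = r2_residual(y_true, y_pred)    # SSE = 0.1, SST = 5.0
```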

2.4. Bayesian Optimization Algorithm

Bayesian optimization is an efficient global optimization algorithm capable of finding satisfactory solutions at low evaluation cost [19]. It is widely used for problems such as algorithm selection and sensor placement, and it balances exploration and exploitation effectively. Here, Bayesian optimization is used to select the parameters of the random forest algorithm, such as the number of decision trees, the maximum depth of the decision trees, the minimum number of samples, and the minimum weighted score.

The Bayesian optimization algorithm comprises two primary components: a probabilistic surrogate model and an acquisition function. The probabilistic surrogate model approximates the true objective function, and the next evaluation point is selected by maximizing the acquisition function. Bayesian tuning uses a Gaussian process as the surrogate, which takes the previously evaluated parameter information into account: starting from a prior, the posterior distribution of the objective function is updated each time a new sample point is added. This gives the advantages of relatively few iterations and high speed [20].

Bayesian optimization algorithms usually treat the optimization problem as the maximization of an objective function, i.e.:

$$x' = \underset{x \in \chi \subseteq R^{n}}{\arg\max}\, f(x) \tag{4}$$

where $x$ is an $n$-dimensional decision vector, which represents the random forest algorithm parameter vector in this problem; $\chi$ denotes the decision space; and $f$ denotes the objective function, which represents the evaluation index of the random forest algorithm in this problem.
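The interplay of surrogate model and acquisition function can be illustrated with a small one-dimensional loop. The sketch below is a toy, not the tuning code used in the paper: the quadratic `objective` stands in for the cross-validated score, the Matérn kernel, the expected-improvement acquisition function, the candidate grid, and the 5 initial plus 15 guided evaluations are all assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Toy stand-in for the evaluation index f(x); its true maximum is at x = 2
    return -(x - 2.0) ** 2

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the best value observed so far (maximization)."""
    sigma = np.maximum(sigma, 1e-9)            # avoid division by zero
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 5.0, 501).reshape(-1, 1)   # discretized decision space

X_obs = rng.uniform(0.0, 5.0, size=(5, 1))     # random initial evaluations
y_obs = objective(X_obs).ravel()

for _ in range(15):
    # Surrogate: Gaussian process posterior given all evaluations so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
    gp.fit(X_obs, y_obs)
    mu, sigma = gp.predict(candidates, return_std=True)
    # Acquisition: evaluate next where the expected improvement is largest
    ei = expected_improvement(mu, sigma, y_obs.max())
    x_next = candidates[np.argmax(ei)].reshape(1, 1)
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, objective(x_next).ravel())

best_x = X_obs[np.argmax(y_obs), 0]
best_f = y_obs.max()
```

In hyperparameter tuning, `objective` would be replaced by a function that trains the random forest with a given parameter vector and returns its cross-validated score.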

3. Database Construction and Model Training

3.1. Database Construction

The main hull of the target jack-up platform is a box-type structure, and the platform is supported by four truss pile legs located at the four corners of the platform. Each pile leg has a triangular cross-section and is composed of main chords, diagonal braces, and internal braces. Figure 3 illustrates the pile leg structure of the target jack-up platform. The platform is a steel, non-self-propelled structure equipped with cranes and other necessary equipment, primarily designed for exploration tasks, with an operational water depth ranging from 45 m to 65 m. In the pile leg part, the inner braces and diagonal braces are standard circular pipe fittings, modelled at their real size. The main chord consists of two semi-circular tubes clamped around a rack and is modelled as an equivalent circular tube. The sectional area, torsional moment of inertia, and bending moment of inertia of the equivalent tubes are calculated from the actual cross section. The main hull is simulated using a beam unit plate frame model, and the bulkheads of the compartments are simplified into box-type beam units, with the width of the box beams equal to half of the width of the compartments and the height equal to the height of the compartments.

In the lifting device part, the upper and lower guides and the rack and pinion are modelled equivalently, as shown in Figure 4. The upper and lower guides are modelled in a simplified way: the vertical translational degree of freedom and the three rotational degrees of freedom of the nodes connecting the guides to the pile legs are released, so that only the horizontal force is transmitted at the guides. To simplify the modelling of the rack and pinion, the two translational degrees of freedom in the horizontal plane and the three rotational degrees of freedom of the node at the rack and pinion contact are released, so that only vertical forces are transmitted at the rack and pinion.

The working conditions and loads that affect the dynamic response of the platform mainly include the height of the jack-up platform, wind speed, wind direction, wave height, period, current speed, and current direction. In order to ensure that the digital twin model can give relatively stable prediction results under different working environments, the input wind, wave, and current loads for numerical calculations should be made to cover most of the working conditions that may occur in the work as much as possible. According to the environment of the workplace, the most dangerous loads are selected as wind speed 36 m/s, wave height 5 m, period 6 s, water surface flow velocity 0.77 m/s, and mud surface flow velocity 0.26 m/s. The loads of other working conditions cover a wide range of possibilities as far as possible. A total of 1680 combinations of working conditions were selected for calculation, and the specific selection of working conditions is shown in Figure 5.

Under each working condition, the forces and moments applied to both ends of each rod within the pile leg structure are calculated. These are extracted and exported to a TXT file named after the working condition, in a uniform format of position, rod type, and force, to form a database for subsequent processing and use. A part of the database is shown in Table 1.

In Table 1, the first column is the number of a rod in the pile leg structure, which is represented by the points connected at both ends of the rod; the second column is the node name; columns 3 to 8 show, respectively, the forces in three directions and the moments in three directions at this node of the member in the local coordinate system.

3.2. Data Preprocessing

To avoid inaccurate analyses and unreliable decisions, it is crucial to perform operations for detecting and correcting dirty data prior to their utilization [21]. The One-Class Support Vector Machine and Isolation Forest methods were employed to identify data points in the dataset that deviated from the majority in terms of value, trend, and distribution, thereby tagging them as outliers and subsequently removing both the data points and their corresponding labels.

The resultant dynamic response files of the platform contain extensive data on the forces and moments at the ends of each bar, which are highly correlated. To avoid introducing redundant information and excessive noise into the model, dimensionality reduction of the data files is necessary. The dimensionality reduction process inevitably results in loss of information. The reconstruction error is the discrepancy between the original high-dimensional data and the data recovered by transforming the low-dimensional representation back to the original space, and it serves as a crucial metric for evaluating the effectiveness of dimensionality reduction in PCA. The reconstruction error was calculated in a loop for numbers of principal components from 1 to 100, and the results obtained are shown in Figure 6.

From Figure 6, it can be seen that once the number of principal components reaches 10, the loss of information is small enough, and adding further principal components reduces the reconstruction error only very slightly. The number of principal components after dimensionality reduction is therefore set to 10.
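A sweep of this kind can be sketched in a few lines, using the fact that the squared reconstruction error after keeping k components equals the sum of the discarded squared singular values. The data below are synthetic (ten latent factors, 120 features), and the 1% tolerance used to pick the number of components is a hypothetical selection rule; the paper chooses 10 by inspecting Figure 6.

```python
import numpy as np

def relative_reconstruction_errors(X, max_k):
    """Relative squared reconstruction error for k = 1, ..., max_k components."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    energy = s ** 2
    total = energy.sum()
    kept = np.cumsum(energy)[:max_k]           # energy captured by the first k components
    return (total - kept) / total              # energy left out = reconstruction error

rng = np.random.default_rng(0)
# Synthetic stand-in: 300 samples, 120 correlated features driven by 10 latent factors
latent = rng.normal(size=(300, 10))
X = latent @ rng.normal(size=(10, 120)) + 0.01 * rng.normal(size=(300, 120))

errors = relative_reconstruction_errors(X, max_k=100)
tol = 0.01                                     # hypothetical tolerance on the relative error
k_selected = int(np.argmax(errors < tol)) + 1  # smallest k meeting the tolerance
```

Plotting `errors` against the component count reproduces the kind of elbow curve described for Figure 6: a steep drop followed by a long flat tail.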

3.3. Training Models

The machine learning algorithm was implemented using Python 3.12 and open-source machine learning tools, including the scikit-learn library. The experimental environment was based on the Windows 10 operating system. The dataset after dimensionality reduction is randomly partitioned into a training set and a testing set with a training set ratio of 0.8, which means the ratio of training set to testing set is 8:2. To prevent the random division of the dataset from interfering with comparisons of model accuracy, the random state is fixed at 20.
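In scikit-learn, a split of this kind is typically done with `train_test_split`. The sketch below uses placeholder arrays with 1680 rows; the column counts (7 input features, 10 reduced response components) and the variable names are assumptions for illustration, while `train_size=0.8` and `random_state=20` follow the values given in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1680, 7))     # placeholder inputs: one row per working condition
Y = rng.normal(size=(1680, 10))    # placeholder targets: reduced response components

# 8:2 split; fixing random_state makes the partition reproducible across runs
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, train_size=0.8, random_state=20
)
```

With 1680 samples this gives 1344 training rows and 336 testing rows, and rerunning the script yields exactly the same partition.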

When the model captures too many specific details of the training set, the random forest algorithm may overfit, making it unable to predict situations outside of the training dataset. Conversely, when the model is too simple, it may underfit and fail to capture the relationships present in the training data. Therefore, it is important to make a reasonable choice of parameters, such as the depth of the decision trees and the maximum number of leaf nodes in the random forest algorithm.

The Bayesian optimization algorithm is used to tune the parameters of the random forest, and 10-fold cross validation is used to test the performance on the training set under a given combination of parameters. As shown in Figure 7, 10-fold cross validation starts by dividing the dataset D into 10 similarly sized and consistently distributed subsets: D₁, D₂, …, D₁₀. Each time, nine of them are taken as the training set and the remaining one as the test set, so that ten training/testing groups are obtained and ten training and testing cycles can be carried out. Finally, after the 10 cycles, the average of the ten results is taken as the final output [14].
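The mechanics of the 10-fold procedure can be written out with NumPy alone. In this sketch a least-squares line stands in for the random forest so that the example stays dependency-free; the helper names (`k_fold_indices`, `cross_validate`, `fit_ls`, `mse`) and the synthetic data are hypothetical, and only the fold logic mirrors the description above.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(X, y, k, fit, score):
    """Train on k-1 folds, test on the held-out fold, and average the k scores."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))

# Stand-in model: ordinary least squares with an intercept term
def fit_ls(X, y):
    A = np.c_[X, np.ones(len(X))]
    return np.linalg.lstsq(A, y, rcond=None)[0]

def mse(w, X, y):
    pred = np.c_[X, np.ones(len(X))] @ w
    return float(np.mean((pred - y) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(1344, 3))                 # same size as the 80% training split
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1344)

cv_mse = cross_validate(X, y, k=10, fit=fit_ls, score=mse)
```

Every sample is used for testing exactly once, so the averaged score is less sensitive to one lucky or unlucky split than a single hold-out estimate.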

The objective function for Bayesian optimization is chosen as the mean square error (MSE):

$$MSE = \frac{1}{n} \sum_{i = 1}^{n} (\hat{y}_{i} - y_{i})^{2} \tag{5}$$

where $\hat{y}_{i}$ is the $i$th predicted value, $y_{i}$ is the true value, and $n$ is the number of data.

The MSE captures the difference between the predicted and true values. The cross-validation routine treats all error metrics as losses, so although the MSE is inherently non-negative, it is the negative MSE (NMSE) that is used in the calculations. The larger the NMSE, the smaller the difference between the predicted and true values, and so maximizing this objective function is chosen as the goal of the Bayesian optimization.
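In scikit-learn this sign convention appears as the `"neg_mean_squared_error"` scoring option of `cross_val_score`. The sketch below wraps it in an objective function of the kind an optimizer would maximize; the function name `rf_objective`, the two tuned parameters, and the synthetic data are assumptions for illustration rather than the paper's actual search space.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 4))
y = X[:, 0] ** 2 + X[:, 1] + 0.05 * rng.normal(size=200)

def rf_objective(n_estimators, max_depth):
    """Mean negative MSE over 10 folds; larger (closer to zero) is better."""
    model = RandomForestRegressor(
        n_estimators=int(n_estimators),   # optimizers propose floats, so cast to int
        max_depth=int(max_depth),
        random_state=0,
    )
    scores = cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")
    return scores.mean()

shallow = rf_objective(n_estimators=30, max_depth=1)   # underfitting forest
deeper = rf_objective(n_estimators=30, max_depth=6)    # more expressive forest
```

Both scores are negative, and the deeper forest scores higher because its MSE is smaller, which is the ordering a maximizing optimizer relies on.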

A total of 200 iterations were performed, and a five-step random search was executed to expand the exploration space. The value space of each parameter and the optimal parameters of the random forest (RF) selected after the iterations are shown in Table 2.

The RandomForestRegressor class from the ensemble module in the scikit-learn library is used to fit the data in the training set, with parameter configurations as shown in Table 2.

4. Model Evaluation

4.1. Performance of the Prediction Model

The relative error $\delta$ is used in engineering problems as a measure of the confidence level associated with the data. Due to the mechanical nature of the pile leg structure of the jack-up platform, some of the data values are zero, and zero must be replaced with a very small value when calculating the relative error. After this replacement, however, the relative error at these zero-valued points can easily reach 100%, which prompts the introduction of a weighting factor when calculating the relative error. The weight is obtained by min-max normalization of the absolute value of each condition in the sample data:

$$y' = \frac{y - \min(y)}{\max(y) - \min(y)} \tag{6}$$

where the maximum and minimum values are taken over each column of data in the response variable, and the physical meaning of each column is a force or moment in one direction. Thus, the weighted relative error is as follows:

$$\delta_{\omega} = y' \times \delta \tag{7}$$

where $y'$ is the weight and $\delta$ is the relative error.

In pile leg structures, the greater the force on a member, the greater the likelihood of damage, so the most heavily loaded locations are the most important to focus on when predicting the dynamic response. Using the weighted relative error to measure the accuracy of the prediction model not only eliminates the effect of zero values in the response variable, but is also physically meaningful, because it increases the focus on the dangerous points.
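Equations (6) and (7) can be combined in a short NumPy function. This is a sketch under stated assumptions: the weight is computed from the absolute true values column by column, zeros in the denominator are replaced by a small `eps`, and the guard for constant columns is an added safeguard that does not come from the paper. The sample numbers are made up.

```python
import numpy as np

def weighted_relative_error(y_true, y_pred, eps=1e-12):
    """Weighted relative error: min-max weight (Eq. 6) times relative error (Eq. 7)."""
    abs_true = np.abs(y_true)
    col_min = abs_true.min(axis=0)
    col_max = abs_true.max(axis=0)
    # Safeguard (assumption): avoid 0/0 when a column is constant
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    weight = (abs_true - col_min) / span            # y' in [0, 1] per column
    denom = np.where(abs_true == 0.0, eps, abs_true)  # replace zeros by a tiny value
    rel_err = np.abs(y_pred - y_true) / denom       # relative error, delta
    return weight * rel_err

# Two response columns (e.g., one force and one moment component), three conditions
y_true = np.array([[0.0, 10.0], [50.0, 20.0], [100.0, 40.0]])
y_pred = np.array([[0.5, 11.0], [49.0, 20.0], [102.0, 38.0]])

w_err = weighted_relative_error(y_true, y_pred)
```

The zero-valued entry has an enormous raw relative error but a weight of zero, so it no longer dominates the statistics, while the largest load in each column keeps its full relative error.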

The trained model fits well, with an R² determination coefficient of 0.935 on the test set. This indicates that the model has high prediction accuracy on the test set and captures the patterns and trends of the data well. The mean value of the weighted relative error for all data points is 0.413%, which is much less than 1%; the maximum weighted relative error is 5.277%, which is much less than 10%, satisfying the accuracy requirement.

To further analyze the capability of the prediction model, including its prediction accuracy at extreme values, a scatter plot was generated for condition 901, selected from the test set. In condition 901, the wind direction and wave direction are the same, which makes this condition more dangerous for the platform. The scatter plot is shown in Figure 8.

The horizontal coordinates in Figure 8 are the indexes of the data points, the red circles represent the calculated true values, and the blue asterisks represent the predicted values. It can be noticed that all the points almost coincide. Especially at extreme values, the model can also accurately predict the force state, which indicates the high availability of the predicted data. It is proved that the model is able to predict the force condition of the pile leg structure under a certain working condition, bringing it in line with the problem to be solved in this paper.

4.2. Importance of Features

Feature importance evaluation is built into the random forest algorithm: the contribution of each feature is averaged over all trees, and the importance of the features is determined by comparing these averages. The contribution can be calculated using either mean decrease impurity (MDI) or mean decrease accuracy (MDA) [22]. The scikit-learn machine learning library used in this paper compares the importance of features using the MDI method.
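Both measures are available in scikit-learn: the fitted forest exposes impurity-based (MDI) scores through `feature_importances_`, and `sklearn.inspection.permutation_importance` provides a permutation-based measure in the spirit of MDA. The sketch below uses three synthetic features with hypothetical names, where the first is constructed to dominate the response.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
names = ["hull_height", "wind_speed", "wind_direction"]   # hypothetical labels
X = rng.uniform(0.0, 1.0, size=(400, 3))
# Synthetic response: strongly driven by column 0, weakly by column 2
y = 5.0 * X[:, 0] + 0.5 * X[:, 2] + 0.05 * rng.normal(size=400)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

mdi = forest.feature_importances_       # impurity-based scores, normalized to sum to 1
perm = permutation_importance(forest, X, y, n_repeats=5, random_state=0)

ranking = [names[i] for i in np.argsort(mdi)[::-1]]       # most important first
```

MDI scores are computed from the training data and can favor features with many distinct values, so comparing them with a permutation-based ranking is a common sanity check.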

The importance of features in the prediction model was evaluated and ranked as shown in Figure 9.

As shown in Figure 9, the most influential feature for the dynamic response of the jack-up platform is the height of the main hull of the platform, with an importance score of 0.432, which is much higher than the remaining five features. The main hull weighs a great deal, and these weights are carried by the rods at locations below the main hull, so the height of the main hull affects the stresses on the entire pile leg structure. Among the remaining five features, the direction of wind loads has the highest importance score of 0.295, which affects the prediction results much more than the remaining four features. The main hull of the target jack-up platform has a huge wind area and the four pile leg structures are complex. Among all the environmental loads considered, the force of wind load on the whole structure is much larger compared to the other loads. The aspect ratio of the main hull is as high as 1.8, which makes the dynamic response of the platform very sensitive to the direction of the wind load. The feature of wind direction makes the second largest contribution in the model prediction process, which is in line with the characteristics of the jack-up platform structure.

4.3. Effect of Sample Size of Dataset on Precision

The impact of sample size on the accuracy of the model predictions was investigated. Datasets of 1680, 1512, and 1344 sample points were randomly selected, and each was split into an 80% training set and a 20% testing set. Scatter plots of the prediction results for the three scenarios, with the true values on the horizontal axis and the predicted values on the vertical axis, are presented in Figure 10, Figure 11 and Figure 12.

As shown in Figure 10, Figure 11 and Figure 12, when the model is trained with 1680 samples, the points are all closely distributed around the line y = x and the weighted relative errors of all the points are within 10%, which is the best result among the three models. When the sample size is decreased to 1512, some points deviate further from the line y = x, and a minority of points have weighted relative errors exceeding 10%. When the sample size is further reduced to 1344, the data points become more dispersed, and more points have weighted relative errors exceeding 10%. This indicates that training the prediction model with 1680 samples yields the most accurate predictions, and that the predictions become worse as the number of sample points decreases.

In order to further compare the effects of different numbers of sample points on the accuracy of the prediction model, the average values of the weighted relative errors for all conditions under the three sample numbers were calculated, as shown in Table 3.

As can be seen from Table 3, the overall error rises as the number of samples decreases, and the prediction performance of the model drops sharply when the sample size is reduced to 1344. This may be attributed to the fact that the environmental loads considered in the simulation calculations of the pile leg forces were uniformly distributed. When sample points are randomly selected, this uniform distribution is destroyed, leaving the sample points very sparse at some environmental loads. Such a non-uniform distribution makes it difficult for the prediction model to capture the trends and details of the data, which reduces its overall fitting ability. When the sample size is decreased to 1344, the uneven distribution is further exacerbated, resulting in a significant decline in the model’s fitting ability.
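The sparsity argument can be illustrated with a small sketch. It assumes a one-dimensional, evenly spaced grid of 1680 load conditions, which is a simplification of the multi-dimensional load space in the paper, and measures the largest gap between neighbouring retained points after random subsampling. The helper `max_gap` is a hypothetical name introduced only for this illustration.

```python
import random

def max_gap(indices):
    """Largest spacing between neighbouring retained grid points."""
    ordered = sorted(indices)
    return max(b - a for a, b in zip(ordered, ordered[1:]))

random.seed(0)
full_grid = list(range(1680))  # evenly spaced load conditions (assumed 1-D)

# Randomly retain 1680, 1512, and 1344 points and record the largest gap.
gaps = {n: max_gap(random.sample(full_grid, n)) for n in (1680, 1512, 1344)}
for n, gap in gaps.items():
    print(f"{n} samples: largest gap = {gap} grid steps")
```

With the full grid every gap is one step; once points are dropped at random, some regions of the load space are left without nearby training samples, which is the effect described above.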

4.4. Effect of Training/Testing Ratio on Precision

The training-to-testing ratios for the random forest algorithm were set to 60:40, 70:30, 80:20, and 90:10, respectively, with all other parameters remaining unchanged to train the models and compare their accuracies. For each condition with different training-to-testing ratios, the maximum weighted relative error and the average weighted relative error were calculated, and a heat map was plotted, as shown in Figure 13.

In order to more fully compare the effect of the train/test ratio on the predictive model, the maximum values of the maximum weighted relative error and the mean weighted relative error were compared, as shown in Table 4.

As can be seen from Table 4, the training/testing ratio has a significant effect on the accuracy of the prediction model. As the train/test ratio increases, the error of the prediction model decreases and the fit improves. When the training ratio is 60%, the maximum weighted relative error over all samples is 11.375%, which exceeds 10%, and the model predicts poorly. When the training ratio is increased to 70%, the average weighted error is reduced by 0.021% and the maximum weighted relative error falls within the acceptable range. When the training ratio is further increased to 80% or 90%, the average weighted error is reduced by 0.019% and 0.104%, respectively. The accuracy of the predictive model therefore continues to improve as the train/test ratio increases, and training the model with 70% or more of the data yields a more accurate model.

When the proportion of the training set was increased from 70% to 80%, the maximum weighted relative error decreased by 1.932%, and the model’s ability to fit extreme values was enhanced. When the proportion of training data was further increased to 90%, the model’s ability to fit both global and extreme points improved. However, excessive training data may lead to overfitting, thereby reducing the predictive model’s generalization ability. Furthermore, a testing set of only 10% of the data is insufficient for evaluating the model’s global fitting ability, so the evaluation has low confidence. Therefore, considering both fitting ability and generalization ability, it is more appropriate to choose 80% of the data as the training set for the target prediction model.
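The ratio comparison can be sketched as a loop over training fractions. The sketch assumes synthetic data with 1680 samples and six features, and it reports the mean absolute error as a stand-in metric, since the weighted relative error used in the paper is not reproduced here; neither the data nor the metric is the authors'.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the paper's simulation database).
rng = np.random.default_rng(0)
X = rng.uniform(size=(1680, 6))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=1680)

results = {}
for train_fraction in (0.6, 0.7, 0.8, 0.9):
    n_train = round(train_fraction * len(X))  # integer count avoids float rounding
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=n_train, random_state=0
    )
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
    error = mean_absolute_error(y_te, model.predict(X_te))
    results[train_fraction] = (len(y_te), error)

for fraction, (n_test, error) in results.items():
    print(f"train {fraction:.0%}: {n_test} test points, MAE = {error:.4f}")
```

The test-set sizes make the trade-off concrete: at a 90% training share only 168 of the 1680 samples remain for evaluation, compared with 336 at 80%.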

5. Conclusions

In this paper, a dynamic response prediction model based on a simulation database and the random forest algorithm is established for the safety prediction and damage prevention of the pile leg structure of a jack-up platform. After the simulation database is constructed, the machine learning algorithm is used to derive the predicted response under the prevailing environmental load, which is then compared with the permissible material strength. When a predicted value exceeds the threshold, the system triggers an early warning, prompting immediate inspection of the structure so that issues can be identified and rectified promptly. The relevant conclusions are summarized as follows:

A predictive model for the dynamic response of the pile leg of a jack-up platform based on the random forest algorithm was developed. The coefficient of determination R² of the model is 0.935, which indicates that the model describes and predicts the data well and has a high fitting performance. A comparison of the predicted values with the accurate values obtained from numerical calculation shows that the model also performs well at extreme points and that its predictions are highly credible.
The importance of each feature for the prediction model was studied. The height of the main hull was found to have the greatest influence on the prediction results, because the very large weight of the main hull is supported by the pile legs located below the hull. The direction of the wind load had the second largest effect on the prediction results, because the platform is subjected to a large wind force and is very sensitive to the wind direction.
The effect of different sample sizes on the prediction accuracy of the model was studied. As the number of sample points decreases, the fitting performance of the model gradually deteriorates and the accuracy decreases. This is due to the fact that randomly drawing new samples from the original samples leads to the sample points becoming sparse in some places and the model cannot capture the characteristics of the data well.
The effect of the training-to-testing ratio on the model’s fitting ability was investigated. As the proportion of the training set rises, the predictive ability of the model gradually increases. However, to save training resources and avoid overfitting, an excessively high proportion of training data should be avoided. For this prediction model, it is more appropriate to choose 80% of the data as the training set.
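The early-warning check described at the start of this section, in which predicted responses are compared with a permissible strength, can be sketched as follows. The function name, the stress values, and the 355 MPa threshold are hypothetical and serve only as illustration; the member labels are borrowed from Table 1.

```python
def find_overstressed_members(predicted_stress, allowable_stress):
    """Return the members whose predicted response magnitude exceeds the limit."""
    return sorted(
        member
        for member, stress in predicted_stress.items()
        if abs(stress) > allowable_stress
    )

# Hypothetical predicted stresses in MPa; the values are illustrative only.
predicted = {"0113-0104": 120.0, "0023-0121": -410.0, "0021-0117": 285.0}

warnings = find_overstressed_members(predicted, allowable_stress=355.0)
if warnings:
    print("Early warning, inspect members:", warnings)
```

The absolute value is used so that both tensile and compressive responses are checked against the same limit; a real system would likely use member-specific limits.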

The findings of this study offer insights into the framework of the prediction model for pile leg structural forces in jack-up platforms, encompassing database establishment, algorithm selection, and parameter tuning. The method can predict the forces acting on the pile leg structures based on an established database. The prediction model is very responsive and can derive the structural stresses for the input load conditions within a few milliseconds. The predictions are globally accurate and reliable at the extremes of the forces. This prediction model can be combined with techniques such as digital twins and SHM to obtain better information about the structural forces.

Author Contributions

Conceptualization, X.C. and B.Z.; methodology, B.Z. and H.L.; software, X.C.; validation, J.Z. and X.L.; formal analysis, J.Z. and Y.W.; investigation, Y.W. and X.L.; data curation, X.C.; writing—original draft preparation, X.C.; writing—review and editing, B.Z. and H.L.; validation, Y.W.; supervision, H.L.; project administration, B.Z.; funding acquisition, B.Z. and H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China (Grant No. 52071059, 52192692, 52061135107); Dalian Innovation Research Team in Key Areas (Grant No. 2020RT03); China Postdoctoral Science Foundation (Grant No. 2023TQ0041, 2023M7404771); Postdoctoral Fellowship Program of CPSF (Grant No. GZC20230347); and the Belt and Road Special Foundation of The National Key Laboratory of Water Disaster Prevention (No. 2022490211).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and models used during this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

Author Yu Wang was employed by the company COSCO Shipping Heavy Industry (Dalian) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Bienen, B.; Cassidy, M. Advances in the three-dimensional fluid–structure–soil interaction analysis of offshore jack-up structures. Mar. Struct. 2006, 19, 110–140.
2. Bienen, B.; Cassidy, M.J.; Gaudin, C. Push-Over Response of a Jack-Up on Sand of Different Relative Densities. In Proceedings of the International Conference on Offshore Mechanics and Arctic Engineering, Honolulu, HI, USA, 31 May–5 June 2009; pp. 211–219.
3. Cassidy, M.; Taylor, R.E.; Houlsby, G. Analysis of jack-up units using a constrained NewWave methodology. Appl. Ocean Res. 2001, 23, 221–234.
4. Cassidy, M.J. Offshore foundation systems for resource recovery: Assessing the three-dimensional response of jack-up platforms. KSCE J. Civ. Eng. 2011, 15, 623–634.
5. Pisanò, F.; Schipper, R.; Schreppers, G.J. Input of fully 3D FE soil-structure modelling to the operational analysis of jack-up structures. Mar. Struct. 2018, 63, 269–288.
6. Wang, Y.; Cassidy, M.; Bienen, B. Numerical pushover analysis of jack-up units in soft clay overlying sand. Ocean Eng. 2022, 258, 111762.
7. He, L.; Xie, Y.; Du, Y.; Liu, Y.; Liu, M. Simplified mechanical calculation model for an entire jack-up platform system. Ocean Eng. 2024, 295, 116925.
8. Zhang, P.; He, Z.; Cui, C.; Xu, C.; Ren, L. An edge-computing framework for operational modal analysis of offshore wind-turbine tower. Ocean Eng. 2023, 287, 115720.
9. Nicoletti, V.; Arezzo, D.; Carbonari, S.; Gara, F. Dynamic monitoring of buildings as a diagnostic tool during construction phases. J. Build. Eng. 2022, 46, 103764.
10. Bin, Z. Research on the Application of Machine Learning Classification Based on Data Reduction. Mod. Inf. Technol. 2018, 2, 2.
11. Bro, R.; Smilde, A.K. Principal component analysis. Anal. Methods 2014, 6, 2812–2831.
12. Dorabiala, O.; Aravkin, A.; Kutz, J.N. Ensemble Principal Component Analysis. IEEE Access 2024, 12, 6663–6671.
13. Smiti, A. A critical overview of outlier detection methods. Comput. Sci. Rev. 2020, 38, 100306.
14. Schölkopf, B.; Williamson, R.C.; Smola, A.J.; Shawe-Taylor, J.; Platt, J.C. Support Vector Method for Novelty Detection. In Proceedings of the Advances in Neural Information Processing Systems 12, NIPS Conference, Denver, CO, USA, 29 November–4 December 1999.
15. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation-Based Anomaly Detection. ACM Trans. Knowl. Discov. Data 2012, 6, 1–39.
16. Bing, Y.X.; Jun, Z. Decision Tree and Its Key Techniques. Comput. Technol. Dev. 2007, 17, 3.
17. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
18. Xu, X.Y.; Li, B.X. Research on the Effect of Selection of Dependent Variables on R² Statistic. J. Taiyuan Univ. Sci. Technol. 2007, 28, 363–365.
19. Cui, J.-X.; Bo, Y. Survey on Bayesian Optimization Methodology and Applications. J. Softw. 2018, 29, 23.
20. Ryan, E.G.; Drovandi, C.C.; McGree, J.M.; Pettitt, A.N. A Review of Modern Computational Algorithms for Bayesian Optimal Design. Int. Stat. Rev. 2016, 84, 128–154.
21. Chu, X.; Ilyas, I.F.; Krishnan, S.; Wang, J. Data Cleaning: Overview and Emerging Challenges. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD ′16), San Francisco, CA, USA, 26 June–1 July 2016.
22. Gregorutti, B.; Michel, B.; Saint-Pierre, P. Correlation and variable importance in random forests. Stat. Comput. 2017, 27, 659–678.

Figure 1. Predictive modelling framework.

Figure 2. Random forest flow.

Figure 3. Schematic diagram of pile leg structure.

Figure 4. Schematic diagram of the platform and detail view of the lifting mechanism.

Figure 5. Selection of working loads.

Figure 6. Reconstruction error of the data as the number of principal components goes from 1 to 100.

Figure 7. Tenfold cross-validation process.

Figure 8. Comparison of true and predicted values for condition 901.

Figure 9. Importance scores of the features.

Figure 10. Scatterplot of prediction results at 1680 samples.

Figure 11. Scatterplot of prediction results at 1512 samples.

Figure 12. Scatterplot of prediction results at 1344 samples.

Figure 13. Heat map of maximum weighted relative error and average weighted relative error for each working condition at different training/testing ratios.

Table 1. Part of the dataset.

Member	Joint	Fx (kN)	Fy (kN)	Fz (kN)	Mx (kN·m)	My (kN·m)	Mz (kN·m)
0113–0104	0113	−5.6071	0.0000	0.0000	0.0000	0.0000	0.0000
0113–0104	0104	−5.6066	0.0000	0.0000	0.0000	0.0002	−0.0002
……
0019–0118	0118	−150.1941	0.0000	0.0000	0.0000	0.0000	0.0000
0023–0121	0023	−556.2784	0.0000	0.0000	0.0000	0.0002	0.0002
0023–0121	0121	−556.2789	0.0000	0.0000	0.0000	0.0000	0.0000
0026–0125	0026	−71.8181	0.0000	0.0000	0.0000	0.0002	0.0002
0026–0125	0125	−71.8186	0.0000	0.0000	0.0000	0.0000	0.0000
0021–0117	0021	285.7962	0.0000	0.0000	0.0000	0.0002	0.0002
0021–0117	0117	285.7957	0.0000	0.0000	0.0000	0.0000	0.0000
0022–0122	0022	−119.4007	0.0000	0.0000	0.0000	0.0002	0.0002
0022–0122	0122	−119.4012	0.0000	0.0000	0.0000	0.0000	0.0000
0124–0025	0124	−150.1941	0.0000	0.0000	0.0000	0.0000	0.0000
……
4334–4303	4303	−21.4896	0.4720	1.3457	−0.1865	3.5807	0.8687
4335–4203	4335	244.2713	0.3376	0.2026	−0.0906	1.8755	−0.3908

Table 2. Random forest algorithm parameter value ranges and optimal parameters after Bayesian optimization.

Parameter	Value Ranges	Optimum Parameter
Max depth	(5, 100)	30
Max features	(1, 10)	5
Max leaf nodes	(50, 400)	300
Min samples leaf	(1, 20)	1
Min samples split	(2, 20)	2
n estimators	(50, 400)	200
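As an illustration of how the optimum values in Table 2 map onto scikit-learn, the sketch below passes them to `RandomForestRegressor`. The training data are synthetic with six input features, and the fixed `random_state` is added only for reproducibility; both are assumptions of this sketch and not part of the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Optimum parameters reported in Table 2.
model = RandomForestRegressor(
    max_depth=30,
    max_features=5,
    max_leaf_nodes=300,
    min_samples_leaf=1,
    min_samples_split=2,
    n_estimators=200,
    random_state=0,  # assumption: fixed seed for a repeatable sketch
)

# Synthetic stand-in data with six input features.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 6))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1]

model.fit(X, y)
prediction = model.predict(X[:1])  # response for one input condition
print(prediction)
```

Note that `max_features=5` is interpreted by scikit-learn as a count of features considered at each split, so the input must have at least five columns.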

Table 3. Mean of weighted relative errors for training models with different sample sizes.

Sample Size	Mean of Weighted Relative Errors
1680	0.41%
1512	0.77%
1344	1.77%

Table 4. Maximum values of maximum weighted relative error and mean weighted relative error for different train/test ratios.

Training/Testing Ratio	Items for Comparison	Value
6:4	Maximum weighted relative error	11.375%
6:4	Mean weighted relative error	0.445%
7:3	Maximum weighted relative error	7.209%
7:3	Mean weighted relative error	0.424%
8:2	Maximum weighted relative error	5.277%
8:2	Mean weighted relative error	0.413%
9:1	Maximum weighted relative error	4.399%
9:1	Mean weighted relative error	0.306%

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
