1. Introduction
The oceans cover 71% of the Earth's surface, and wave energy forms a major share of ocean energy, offering large reserves, high power density, and low pollution. In the 20th century, the large-scale use of fossil fuels such as coal and oil led to global warming and the depletion of natural resources; consequently, renewable energy sources such as wave power have come into focus [1]. The global potential output of ocean wave energy is estimated at 337 gigawatts [2], with an energy density several times higher than that of traditional green energies such as solar and wind power [3]. Against this backdrop, interest in harnessing electricity from waves using wave energy converters has surged, and wave energy has become a formidable contender among renewable energies, playing a crucial role in the industrial development of coastal cities. However, weather, seasons, sea surface temperature, and atmospheric pressure affect the output power of wave energy devices, leading to instability; the resulting randomness and intermittency pose challenges to the stable operation of the power grid [4,5]. Therefore, accurate power output prediction is of great significance for ensuring the safe and stable operation of wave energy devices once integrated into the grid.
Currently, researchers have conducted extensive studies on the prediction of wave energy converter output, with the primary technical approaches falling into three main categories [6]: physical modeling [7], statistical methods [1,8,9,10], and machine learning techniques [6,11,12,13]. Physical modeling methods rely on validated physical models to establish mathematical models without depending on extensive historical data, making them suitable for large-scale wave energy predictions. For example, Zheng et al. [14] developed a wave forecast product for Chinese seas based on the WAVEWATCH-III (WW3) physical model. Similarly, Shao et al. [15] developed a hexagonal-array wave energy converter based on wave power and predicted its power generation efficiency using physical models.
Statistical methods use parameter estimation and curve fitting of historical data to establish predictive models, offering robust, versatile, and fast modeling techniques. Wu et al. [16] employed an Auto Regressive Integrated Moving Average (ARIMA) model to predict wave heights. However, the effectiveness of statistical methods is closely tied to the quality of historical data: during actual sampling, some historical wave data may be missing or incorrect, significantly degrading model performance. To address this, the Discrete Wavelet Transform (DWT) has been combined with ARMA models to predict the true output power.
With advances in computer science and hardware, machine learning techniques are increasingly recognized for their advantages in renewable energy prediction. Artificial intelligence and machine learning methods can autonomously learn the association between input features and output results [17], and accurate predictions can be achieved by applying deep learning methods to target objects. Yan et al. [5,18] reduced network energy consumption and minimized error through actor-critic deep reinforcement learning. These methods possess strong resilience to interference and high prediction accuracy and are now widely applied in solar [19] and wind energy [18,20] forecasting. The development of ocean observation technology has made abundant datasets of sea waves and oceanic meteorological data available, providing a foundation for machine learning-based wave energy prediction. For instance, Elbisy et al. [21] used a Support Vector Machine (SVM) optimized by the Licorne algorithm to predict wave parameters, demonstrating strong generalization ability and minimal prediction error. Feng et al. [16] compared the performance of Recurrent Neural Networks (RNNs), Gated Recurrent Unit (GRU) networks, and Long Short-Term Memory (LSTM) networks in predicting wave parameters, finding that LSTM and GRU models significantly outperformed traditional RNN models. Additionally, Yang et al. [22] applied a Convolutional Neural Network (CNN) combined with Seasonal-Trend Decomposition (STL) and Positional Encoding (PE) to predict significant wave heights. Building on these findings, this paper proposes a short-term wave power prediction method that integrates CNN, BiLSTM, and Deformable Efficient Local Attention (DELA) mechanisms, as well as a power-fitting matrix based on BiGRU. The main contributions of this paper include the following:
- 1.
Designing a BiGRU power-fitting model that converts input wave parameters into wave energy power output, based on the power generation matrix obtained from simulation software.
- 2.
Utilizing CNN and BiLSTM to extract feature maps from multidimensional inputs derived from time series data.
- 3.
Designing a new attention mechanism to enhance the feature extraction capability of BiLSTM, and comparing it with seven mainstream attention mechanisms, with results showing that the DELA attention mechanism has strong feature extraction capabilities.
- 4.
Comparing the proposed CNN-BiLSTM-DELA model with other established wave prediction models and conducting generalization experiments using January–June 2024 data; the results show that the proposed model outperforms the benchmark models in wave power prediction.
2. Power Conversion Module
This chapter analyzes the relationship between significant wave height, wave period, and power through simulation software, thereby constructing the power matrix for the point absorber wave energy converter. The BiGRU model was used for fitting to achieve high-accuracy power prediction.
2.1. Mathematical Model of Wave Energy Converter Device
This study utilizes a point absorber wave energy converter device, the structure of which is shown in
Figure 1. The floater is rigidly linked to a permanent magnet synchronous motor, and both perform synchronous heaving motion, with the motor rotor cutting through the magnetic field to generate electricity.
According to Newton's second law, analyzing the vertical forces on the direct-drive wave energy converter system allows for the derivation of the floater's time-domain hydrodynamic model:

$$m\ddot{z}(t) = f_e(t) + f_r(t) + f_b(t) + f_g(t) \quad (1)$$

In the equation, $m$ is the total mass of the system's moving parts; $\ddot{z}(t)$ is the heave acceleration of the floater; $f_e(t)$ is the wave excitation force; $f_r(t)$ is the radiation force; $f_b(t)$ is the hydrostatic restoring force; and $f_g(t)$ is the counter-electromotive force.
When the float is in an equilibrium position, the hydrostatic restoring force can be expressed as:

$$f_b(t) = -\rho g \pi r^2 z(t) = -K z(t) \quad (2)$$

In the equation, $r$ is the float radius; $z(t)$ is the heave displacement; and $K = \rho g \pi r^2$ is the hydrostatic spring coefficient of the float. The radiation force can be expressed as:

$$f_r(t) = -m_\infty \ddot{z}(t) - B \dot{z}(t) \quad (3)$$

In the equation, $m_\infty$ is the additional mass of the system in the infinite frequency domain, and $B$ is the damping coefficient caused by the radiation force. The wave excitation force and radiation force can be expressed in terms of the velocity potentials as:

$$f_e(t) = -\rho \iint_S \frac{\partial (\varphi_i + \varphi_d)}{\partial t}\, n \, \mathrm{d}S, \qquad f_r(t) = -\rho \iint_S \frac{\partial \varphi_r}{\partial t}\, n \, \mathrm{d}S \quad (4)$$

In the equation, $\rho$ is the density of seawater; $n$ is the unit vertical normal of the floater; $S$ is the wetted surface area of the floater; and $\varphi_r$, $\varphi_i$, and $\varphi_d$ are the incident wave radiation potential, incident wave velocity potential, and incident wave diffraction potential, respectively. The wave excitation force $f_e(t)$ is the sum of the Froude–Krylov force and the diffraction force derived from the incident and diffraction potentials. The electromagnetic damping force acting on the floater can be expressed as:

$$f_g(t) = -\beta_g \dot{z}(t) \quad (5)$$

In the equation, $\beta_g$ is the electromagnetic damping coefficient.
By substituting (2) to (5) into (1), the motion equation of the entire system can be expressed as:

$$(m + m_\infty)\ddot{z}(t) + (B + \beta_g)\dot{z}(t) + K z(t) = f_e(t) \quad (6)$$

The average electromagnetic power $\bar{P}_e$ generated by the direct-driven point absorber wave energy converter system can be calculated by the following formula:

$$\bar{P}_e = \frac{1}{T} \int_0^T \beta_g \dot{z}^2(t) \, \mathrm{d}t \quad (7)$$

where $T$ is the wave period.
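As a rough numerical illustration of (6) and (7), the heave equation can be integrated with a semi-implicit Euler scheme under a sinusoidal excitation force. Only the 17 t floater mass and the 6 m outer diameter (r = 3 m) are taken from this study; all other parameter values below are illustrative assumptions:

```python
import numpy as np

# Illustrative parameters (hypothetical values, except m and r from the text)
m = 17.0e3         # moving mass (kg), matching the 17 t floater
m_inf = 2.0e3      # assumed added mass at infinite frequency (kg)
B = 1.5e4          # assumed radiation damping coefficient (N*s/m)
beta = 2.0e4       # assumed electromagnetic damping coefficient (N*s/m)
K = 1025 * 9.81 * np.pi * 3.0**2   # hydrostatic stiffness rho*g*pi*r^2, r = 3 m
T = 6.0            # wave period (s)
omega = 2 * np.pi / T
Fe = 5.0e4         # assumed sinusoidal excitation force amplitude (N)

# Integrate (m + m_inf) z'' + (B + beta) z' + K z = Fe*cos(w t), i.e. Eq. (6)
dt = 1e-3
n_steps = int(60 / dt)
z, v = 0.0, 0.0
v_hist = []
for k in range(n_steps):
    t = k * dt
    a = (Fe * np.cos(omega * t) - (B + beta) * v - K * z) / (m + m_inf)
    v += a * dt          # semi-implicit Euler: velocity first, then position
    z += v * dt
    v_hist.append(v)

# Average electromagnetic power over the last full period, Eq. (7)
v_arr = np.array(v_hist)
last = v_arr[-int(T / dt):]
P_avg = beta * np.mean(last**2)
print(f"steady-state average electromagnetic power: {P_avg:.1f} W")
```

Averaging over the final period discards the start-up transient, so the result can be checked against the closed-form steady-state response of the driven damped oscillator.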
The three-phase instantaneous power generated by the permanent magnet linear generator can be calculated using the following formula:

$$p(t) = u_a(t) i_a(t) + u_b(t) i_b(t) + u_c(t) i_c(t) \quad (8)$$

In the equation, $u_a$, $u_b$, and $u_c$ are the three-phase output voltages generated by the permanent magnet linear generator; $i_a$, $i_b$, and $i_c$ are the corresponding three-phase currents. The average power generation can be calculated using the following formula:

$$\bar{P} = \frac{1}{T} \int_0^T \left[ u_a(t) i_a(t) + u_b(t) i_b(t) + u_c(t) i_c(t) \right] \mathrm{d}t \quad (9)$$
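Equations (8) and (9) can be checked numerically with an assumed balanced three-phase waveform; the amplitudes and the unity power factor below are hypothetical, chosen only to make the arithmetic transparent:

```python
import numpy as np

# Hypothetical balanced three-phase voltages and currents over one cycle
t = np.linspace(0.0, 1.0, 2001)          # one electrical period, normalized
U, I = 120.0, 8.0                        # assumed amplitudes (V, A)
phases = [0.0, -2 * np.pi / 3, 2 * np.pi / 3]
u = [U * np.sin(2 * np.pi * t + p) for p in phases]
i = [I * np.sin(2 * np.pi * t + p) for p in phases]  # unity power factor assumed

# Instantaneous three-phase power, Eq. (8), then its period average, Eq. (9)
p_inst = sum(ua * ia for ua, ia in zip(u, i))
P_avg = p_inst.mean()                    # uniform sampling, so mean = integral/T
print(f"average three-phase power: {P_avg:.1f} W")
```

For a balanced system the instantaneous three-phase power is constant at $3UI/2$, which the averaged value reproduces.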
2.2. Pearson Correlation Analysis of Factors Affecting Wave Energy Converter
Factors affecting power generation include significant wave height, seawater temperature, sea surface temperature, wind speed, atmospheric pressure, and wave period. To determine the input variables for the model, the Pearson correlation coefficient is used to quantify the correlation between the power generation of the wave energy converter and the other factors. The formula is:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

In the equation, $x_i$ and $y_i$ are the variables for correlation analysis, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the sample size; here $x$ is the wave height and $y$ represents each of the other factors in turn.
A heatmap of the wave feature correlations is shown in
Figure 2. Power generation (power) has the highest correlation with significant wave height (swh) and mean wave period (mwp). It also has a correlation greater than 0.4 with the wave drag coefficient (cdww), air density above the ocean (p140209), and mean sea level pressure (msl), while other factors are negatively correlated. Therefore, swh and mwp are selected as input features for the prediction model.
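As a sketch of this selection step, the Pearson coefficient can be computed directly from its definition on synthetic stand-ins for the wave variables; the data below are illustrative random series, not the buoy measurements used in this study:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for the wave variables (illustrative only)
swh = rng.uniform(0.5, 3.0, n)                 # significant wave height (m)
mwp = rng.uniform(3.0, 9.0, n)                 # mean wave period (s)
msl = rng.normal(1013.0, 5.0, n)               # mean sea level pressure (hPa)
power = 200 * swh**2 + 150 * mwp + rng.normal(0, 40, n)  # toy power series (W)

def pearson_r(x, y):
    # r = sum((x - xbar)(y - ybar)) / sqrt(sum((x - xbar)^2) * sum((y - ybar)^2))
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc * xc).sum() * (yc * yc).sum()))

for name, col in [("swh", swh), ("mwp", mwp), ("msl", msl)]:
    print(f"r(power, {name}) = {pearson_r(power, col):+.3f}")
```

On this toy data, the variables that actually drive the power series show high coefficients while the unrelated pressure series stays near zero, mirroring the selection of swh and mwp as model inputs.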
2.3. Simulation Model Establishment
Wave energy converter devices generally consist of three energy conversion parts: the wave energy capture system, the mechanical transmission system, and the generator [
23]. However, as this study uses a direct-drive wave energy generator, it eliminates the energy loss associated with the intermediate mechanical transmission part. To obtain the power conversion matrix for the direct-drive point absorber wave energy converter device, a simulation approach combining ANSYS AQWA and COMSOL was used, with the floater simulation model established first, as shown in
Figure 3a. The capture device has an outer diameter of 6 m, a height of 1.6 m, a draft of 0.68 m in a stationary state, and a model weight of 17 tons.
In the modeling process, the X-Y plane of the global coordinate system was aligned with the still water surface, with the floater extending 0.68 m below and 0.92 m above the waterline. The ANSYS AQWA meshing tool was then employed to generate separate grid elements for the regions below and above the waterline. Given that AQWA calculations predominantly occur beneath the waterline, the maximum mesh size was set to 0.4 m above the waterline and 0.2 m below it, yielding a total of 3015 grid elements, as illustrated in
Figure 3b. The wave conditions were defined as regular waves, with
Figure 4a depicting the response curves of the floater subjected to waves of varying amplitudes and periods. Subsequently, a direct-drive generator model was developed in COMSOL, utilizing the response curves as input for the generator rotor. Finite element simulations were then conducted to derive the three-phase voltages.
Figure 4b displays the three-phase voltage waveforms generated by COMSOL, and the power output within a cycle was calculated using (9), as detailed in
Table 1.
2.4. Power Conversion Matrix
The power conversion matrix can be obtained by performing simulation experiments through the above steps, as shown in
Table 1. The unit of power is watts (W).
Table 1 illustrates the relationship between significant wave height, wave period, and power generation for the wave energy conversion system; empty entries indicate that the wave height exceeds the maximum value the simulation software can compute at that period. The relationship between these three variables can be observed more visually in
Figure 5.
As shown in Figure 5, the relationship between the three variables is discrete; when a polynomial is used to fit the 146 discrete entries in the table, the fit exhibits large errors and does not accurately reflect the actual power generated by the equipment. In this study, a deep learning method is therefore used to fit the power conversion matrix.
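Before turning to deep learning, a simple baseline for querying such a discrete matrix is bilinear interpolation over the (significant wave height, period) grid. The matrix values below are toy numbers, not those of Table 1, and serve only to show the lookup mechanics, including the empty (NaN) cells:

```python
import numpy as np

# Toy stand-in for the simulated power matrix (values are illustrative)
hs_grid = np.array([0.5, 1.0, 1.5, 2.0])        # significant wave height (m)
t_grid = np.array([4.0, 5.0, 6.0, 7.0])         # wave period (s)
# rows: Hs, cols: T; NaN marks a cell outside the simulable range
P = np.array([[ 30.,  45.,  55.,  50.],
              [120., 180., 220., 200.],
              [260., 400., 500., 450.],
              [np.nan, 700., 880., 800.]])

def bilinear_power(hs, t):
    """Bilinear interpolation on the power matrix; NaN cells propagate."""
    i = np.clip(np.searchsorted(hs_grid, hs) - 1, 0, len(hs_grid) - 2)
    j = np.clip(np.searchsorted(t_grid, t) - 1, 0, len(t_grid) - 2)
    u = (hs - hs_grid[i]) / (hs_grid[i + 1] - hs_grid[i])
    v = (t - t_grid[j]) / (t_grid[j + 1] - t_grid[j])
    return ((1 - u) * (1 - v) * P[i, j] + u * (1 - v) * P[i + 1, j]
            + (1 - u) * v * P[i, j + 1] + u * v * P[i + 1, j + 1])

print(bilinear_power(1.25, 5.5))   # query between the four central cells
```

Such an interpolant is exact at the grid points but, like the polynomial fit, cannot capture structure between samples, which motivates the learned BiGRU fit below.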
2.5. Power Matrix Fitting Model
In this study, a Bidirectional Gated Recurrent Unit (BiGRU), a variant of the Recurrent Neural Network (RNN), was employed to address issues such as gradient vanishing and explosion when processing long sequence data; it also performs well in scenarios with limited data. By introducing update and reset gates, the BiGRU effectively enhances accuracy under small-data conditions. The core concept of the GRU is to use the update and reset gates at each time step to determine which information should be passed to the next time step.
The equations for the BiGRU model are presented as follows [24]:

$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$
$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \otimes h_{t-1}) + b_h)$$
$$h_t = (1 - z_t) \otimes h_{t-1} + z_t \otimes \tilde{h}_t$$

In the equations, $z_t$ represents the update gate, $r_t$ denotes the reset gate, $\tilde{h}_t$ is the candidate activation value, and $h_t$ is the output, where $\sigma$ denotes the sigmoid activation function. The BiGRU further enhances the information processing capabilities of the GRU by simultaneously handling both forward and backward information; compared to traditional RNN models, it significantly improves prediction accuracy. The structure of the BiGRU is illustrated in Figure 6.
The calculation formulas are as follows [25]:

$$\overrightarrow{h}_t = \mathrm{GRU}(x_t, \overrightarrow{h}_{t-1})$$
$$\overleftarrow{h}_t = \mathrm{GRU}(x_t, \overleftarrow{h}_{t+1})$$
$$h_t = W_{\overrightarrow{h}} \overrightarrow{h}_t + W_{\overleftarrow{h}} \overleftarrow{h}_t + b_t$$

In the equations, $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ represent the forward and backward outputs of the hidden layer at time $t$, respectively; $W_{\overrightarrow{h}}$ and $W_{\overleftarrow{h}}$ denote the weights for the forward and backward state layers; and $b_t$ represents the bias term.
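A minimal NumPy sketch of these equations is given below; the weights are random illustrations of the computation, not the trained model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step following the update/reset-gate equations."""
    Wz, Wr, Wh = W; Uz, Ur, Uh = U; bz, br, bh = b
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)   # candidate state
    return (1 - z) * h_prev + z * h_tilde                  # new hidden state

def bigru(xs, params_f, params_b, Wf, Wb, bias):
    """BiGRU: forward and backward passes combined at every time step."""
    T, H = len(xs), bias.shape[0]
    hf, hb = np.zeros((T, H)), np.zeros((T, H))
    h = np.zeros(H)
    for t in range(T):                       # forward direction
        h = gru_step(xs[t], h, *params_f); hf[t] = h
    h = np.zeros(H)
    for t in reversed(range(T)):             # backward direction
        h = gru_step(xs[t], h, *params_b); hb[t] = h
    return hf @ Wf.T + hb @ Wb.T + bias      # h_t = Wf*hf_t + Wb*hb_t + b

rng = np.random.default_rng(1)
D, H = 2, 4                                  # input dim (swh, mwp), hidden size
mk = lambda: ([rng.normal(0, 0.3, (H, D)) for _ in range(3)],
              [rng.normal(0, 0.3, (H, H)) for _ in range(3)],
              [np.zeros(H) for _ in range(3)])
out = bigru(rng.normal(size=(10, D)), mk(), mk(),
            rng.normal(size=(H, H)), rng.normal(size=(H, H)), np.zeros(H))
print(out.shape)
```

Each output row combines what the forward pass has seen of the past with what the backward pass has seen of the future at that time step.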
Figure 7 illustrates the predictions obtained from the BiGRU model when the relationships between significant wave height, wave period, and the effective power of the wave energy conversion device are input. The blue curve represents the actual values, while the red triangles indicate the predicted values for each point.
3. Wave Energy Converter Power Prediction Model Based on CNN-BiLSTM-DELA
In the previous section, the BiGRU model was used to fit the power conversion matrix of the point absorber wave energy device. This section further explores the optimization of the power prediction method, proposing a CNN-BiLSTM-DELA model that incorporates a Deformable Efficient Local Attention (DELA) mechanism. The model integrates CNN and BiLSTM, enhancing the ability of BiLSTM to identify nonlinear local features through the DELA attention mechanism. The model takes time series as input: since parameters such as significant wave height and wave period are independent data sequences, the fitted wave energy parameters are input into the power prediction module, and the power at each time step is represented by the associated wave factors.
In the CNN-BiLSTM-DELA prediction model, the CNN and BiLSTM-DELA modules are configured in a parallel structure. Wave parameters are separately input into the two frameworks; after a series of transformations, the features from both modules are fused, and the final prediction for the wave energy converter device is obtained through a fully connected layer. A structure diagram is shown in
Figure 8.
3.1. Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are a type of feedforward neural network designed to extract features from data. They primarily consist of an input layer, convolutional layers, pooling layers, and dropout layers. The convolutional layers perform convolution operations between the input and the weights:

$$y_c = w \ast x_c + b$$

In the equation, $y_c$ represents the output of the convolutional layer; $b$ denotes the bias; $w$ stands for the CNN module weights; $x_c$ is the input to the convolutional layer; and $\ast$ denotes the convolution operation.
The pooling layer optimizes the input from the convolutional layer by reducing its dimensionality, enabling effective coupling with the features’ output by BiLSTM-DELA. Dropout, on the other hand, involves randomly removing a portion of neurons during training. Through backpropagation, the weights of the removed neurons are updated and retrained, helping to prevent overfitting in the model.
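The convolution and pooling operations described above can be sketched in a few lines; the kernel weights and input values are illustrative:

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D convolution (cross-correlation form): y = w * x + b."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) + b
                     for i in range(len(x) - k + 1)])

def max_pool(x, size):
    """Non-overlapping max pooling, reducing the output dimensionality."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

x = np.array([0.2, 1.0, 0.5, -0.3, 0.8, 1.2, 0.1, -0.5])  # toy input sequence
w = np.array([0.5, -0.2, 0.3])                            # illustrative kernel
feat = max_pool(conv1d(x, w, b=0.1), size=2)
print(feat)
```

The pooled features are what get fused with the BiLSTM-DELA branch before the fully connected layer.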
3.2. Bidirectional Long Short-Term Memory Networks
LSTMs are also a variant of Recurrent Neural Networks (RNNs), designed to address the limitations of RNNs in predicting long sequences. RNNs are constrained by their structure, which can lead to issues such as vanishing and exploding gradients during backpropagation, and they can learn only short-term dependencies [26]. To overcome these limitations, LSTMs introduce memory cells, which consist of a cell state $c_t$, an input gate $i_t$, an output gate $o_t$, and a forget gate $f_t$. The structure of the LSTM is illustrated in Figure 9.
In the figure, $x_t$ represents the input at time $t$, while $h_t$ and $h_{t-1}$ denote the hidden states at times $t$ and $t-1$, respectively. Here, $h_t$ is computed based on the output of the previous hidden state and the current input. When $x_t$ is fed into the LSTM, it interacts with the previous hidden state $h_{t-1}$ through the forget gate, input gate, and output gate. The formulas for the input gate, output gate, forget gate, and hidden state are as follows:

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$h_t = o_t \otimes \tanh(c_t)$$

In the equations, $W$ represents the weight matrix, $U$ denotes the output matrix, and $b$ is the bias vector. The subscripts $f$, $i$, and $o$ correspond to the forget gate, input gate, and output gate, respectively. The symbol $\otimes$ indicates element-wise multiplication.
The cell state $c_t$ is the core component of the LSTM, represented by the horizontal line at the top of Figure 9. It features a minimal-branch conveyor-belt structure, allowing the input information to flow through the cell with minimal alteration, regulated by the input, output, and forget gates. The formulas for the candidate cell state and cell state are as follows:

$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$c_t = f_t \otimes c_{t-1} + i_t \otimes \tilde{c}_t$$

To enhance the nonlinearity of the network, the sigmoid and tanh functions are chosen as activation functions. Their formulas are as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
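The gate and state equations above can be exercised with a minimal NumPy LSTM step; the weights below are random illustrations, not learned values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: forget, input, output gates and cell-state update."""
    Wf, Wi, Wo, Wc = W; Uf, Ui, Uo, Uc = U; bf, bi, bo, bc = b
    f = sigmoid(Wf @ x_t + Uf @ h_prev + bf)        # forget gate
    i = sigmoid(Wi @ x_t + Ui @ h_prev + bi)        # input gate
    o = sigmoid(Wo @ x_t + Uo @ h_prev + bo)        # output gate
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev + bc)  # candidate cell state
    c = f * c_prev + i * c_tilde                    # cell state update
    h = o * np.tanh(c)                              # hidden state
    return h, c

rng = np.random.default_rng(2)
D, H = 2, 3
W = [rng.normal(0, 0.3, (H, D)) for _ in range(4)]
U = [rng.normal(0, 0.3, (H, H)) for _ in range(4)]
b = [np.zeros(H) for _ in range(4)]
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):                 # run five time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)
```

Because $h_t = o_t \otimes \tanh(c_t)$ with both factors bounded, the hidden state always stays inside $(-1, 1)$.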
In time series prediction tasks, it is essential to fully consider both forward and backward temporal information patterns to significantly enhance prediction accuracy. The CNN-BiLSTM-DELA model primarily consists of three forward LSTMs and three backward LSTMs. Unlike a standard LSTM, which transmits states unidirectionally from past to future, the BiLSTM framework incorporates both forward and backward data patterns, yielding significantly enhanced performance.
As illustrated in
Figure 9, the BiLSTM model comprises both forward and backward computations. The orange horizontal arrows represent the forward flow of time series information within the model, while the light blue arrows denote the backward flow of the same information. Additionally, the data information flows unidirectionally through the input layer, hidden layer, and output layer.
3.3. Attention Mechanism
The attention mechanism mimics the human brain's focus on specific regions at particular moments, allowing for the selective acquisition of more pertinent information while disregarding irrelevant data [27]. It achieves this by assigning different probabilistic weights to the hidden layer units of the neural network, thereby emphasizing the impact of critical information and improving the accuracy of the model's predictions. However, extended input sequences can cause model overfitting and impede the accurate learning of appropriate weights; to address this issue, this paper proposes a novel Deformable Efficient Local Attention (DELA) mechanism. DELA is a three-channel attention mechanism comprising spatial, channel, and local attention modules, as depicted in
Figure 10.
3.3.1. Spatial Attention Module
Given that the wave energy parameters involve long temporal sequences, identifying relevant features within these sequences is challenging when relying solely on LSTM. Consequently, a spatial attention module is introduced, which primarily weights features across extended time steps and determines the contribution of different features at each time step to the power output of the generator. Through average pooling, all feature values across the time dimension are averaged to obtain global information about each feature over the entire time series; this step outputs a vector whose length equals the number of features, representing the average value of each feature in the time series. Subsequently, a one-dimensional convolutional layer (Conv1D) generates a temporal attention map of the same size as the input features, which is used to weight each time step according to the importance of different features at that step. Finally, a sigmoid function normalizes the generated temporal attention map. The module's formula is as follows:

$$F_s = \sigma(\mathrm{GN}(\mathrm{Conv1D}(\mathrm{AvgPool}(x))))$$

In the equation, $\mathrm{GN}$ denotes the GroupNorm operation, $\mathrm{Conv1D}$ represents the Conv1D convolution operation, and $\sigma$ is the sigmoid function; $F_s$ denotes the resulting spatial attention features. The spatial attention mechanism identifies which features at specific time steps hold higher priority, thereby enhancing the ability of the model to capture temporal sequences more effectively.
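A simplified sketch of this module, assuming the pipeline described above (feature-axis average pooling, a same-length Conv1D over time, a plain normalization standing in for GroupNorm, and a sigmoid); the kernel weights are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(x, w):
    """Sketch: pool over features, Conv1D over time, normalize, sigmoid,
    then reweight every time step of the input."""
    g = x.mean(axis=1)                       # (T,) global info per time step
    k = len(w)
    gp = np.pad(g, k // 2)                   # zero-pad for same-length output
    c = np.array([np.dot(gp[t:t + k], w) for t in range(len(g))])
    c = (c - c.mean()) / (c.std() + 1e-5)    # simple stand-in for GroupNorm
    a = sigmoid(c)                           # temporal attention map in (0, 1)
    return x * a[:, None]                    # weight each time step

rng = np.random.default_rng(3)
x = rng.normal(size=(8, 2))                  # e.g., an (swh, mwp) sequence
out = spatial_attention(x, w=np.array([0.2, 0.5, 0.2]))
print(out.shape)
```

Since the attention map lies in (0, 1), the module can only attenuate time steps, never amplify them, which is the usual behavior of a sigmoid-gated attention branch.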
3.3.2. Channel Attention Module
Predicting wave energy power involves handling various parameters such as wave period and significant wave height. To enhance the ability of the model to extract the relative importance of different features, this study introduces a channel attention mechanism, which determines the contribution of each parameter to the overall model and applies appropriate weighting to these parameters. Initially, global pooling is applied to obtain the global average of each feature channel, providing an overall representation of each feature across the entire time series. The input is then reduced by a factor of 1/8 through two linear layers and subsequently restored; this dimensionality reduction and expansion process effectively captures the nonlinear relationships between feature channels. Finally, a sigmoid function generates the weights for each feature channel. The formula is as follows:

$$F_c = \sigma(W_2(W_1(\mathrm{LN}(\mathrm{AvgPool}(x)))))$$

In the equation, $\mathrm{LN}$ denotes the LayerNorm operation, while $W_1$ and $W_2$ represent two linear layers, with $W_1$ performing dimensionality reduction and $W_2$ performing dimensionality expansion. The channel attention mechanism highlights the most critical features in the prediction task while reducing the influence of irrelevant features. This approach enhances the ability of the model to accurately capture valuable information when dealing with complex temporal data, thereby improving prediction accuracy and robustness.
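A corresponding sketch of the squeeze-and-expand computation, with 16 channels reduced by the stated factor of 1/8; the weights are random illustrations, and the ReLU between the two linear layers is an assumption not spelled out in the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, W1, W2):
    """Sketch: global average per channel, reduce then expand, sigmoid
    weights applied back to each feature channel."""
    g = x.mean(axis=0)                        # (F,) global average per channel
    g = (g - g.mean()) / (g.std() + 1e-5)     # simple stand-in for LayerNorm
    a = sigmoid(W2 @ np.maximum(W1 @ g, 0.0)) # reduce -> (ReLU) -> expand
    return x * a[None, :]                     # reweight every channel

rng = np.random.default_rng(4)
F, R = 16, 2                                  # 16 channels, reduced to 16/8 = 2
x = rng.normal(size=(32, F))
W1 = rng.normal(0, 0.3, (R, F))               # dimensionality reduction
W2 = rng.normal(0, 0.3, (F, R))               # dimensionality expansion
out = channel_attention(x, W1, W2)
print(out.shape)
```

The bottleneck forces the module to summarize cross-channel structure in a low-dimensional code before assigning per-channel weights.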
3.3.3. Local Attention Module
In the processing of temporal data, recognizing the importance of long sequences is crucial; however, fine-grained attention to the combinations of specific time steps and features is equally important. Therefore, a local attention module is proposed to capture the interrelationships among significant wave height, wave period, and power generation across different time steps. Initially, an offset generation network predicts the convolution offsets along both the time and feature dimensions, enabling dynamic capture of significant changes and anomalies at specific time steps. These offsets are generated through Conv2D, followed by a deformable convolution operation, as shown by the arrows in Figure 10.

Unlike standard convolution, deformable convolution allows the kernel to dynamically adjust its sampling locations along both the time and feature axes, thus capturing finer temporal features. When combinations of significant wave height and wave period at specific time steps significantly impact power generation, the local attention mechanism can focus specifically on these combinations, thereby enhancing the capture of critical time steps and features. Finally, a standard convolution layer integrates the local features to generate the final output feature, as illustrated by the following formula:

$$F_l = \mathrm{Conv}(\mathrm{DC}_2(\mathrm{DC}_1(x)))$$

In the equation, $\mathrm{DC}$ denotes DeformConv, with the subscripts indicating the first and second dimensions. The equation highlights the weighting of features to emphasize critical information regarding time-step and feature combinations, especially when extreme weather conditions affect the influence of wave height and period on power generation.
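The deformable sampling idea can be illustrated in one dimension: each kernel tap reads the input at a fractionally offset position via linear interpolation, so zero offsets recover an ordinary convolution. This is a simplified stand-in for the 2-D DeformConv used in DELA, not the module itself:

```python
import numpy as np

def deformable_conv1d(x, w, offsets):
    """Simplified 1-D deformable convolution: tap j of the kernel samples the
    input at position t + j - k//2 + offsets[t, j] via linear interpolation."""
    T, k = len(x), len(w)

    def sample(pos):                   # linear interpolation, zero padding
        lo = int(np.floor(pos))
        frac = pos - lo
        v0 = x[lo] if 0 <= lo < T else 0.0
        v1 = x[lo + 1] if 0 <= lo + 1 < T else 0.0
        return (1 - frac) * v0 + frac * v1

    out = np.zeros(T)
    for t in range(T):
        for j in range(k):
            out[t] += w[j] * sample(t + j - k // 2 + offsets[t, j])
    return out

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.25, 0.5, 0.25])
zero = np.zeros((len(x), len(w)))
print(deformable_conv1d(x, w, zero))   # matches an ordinary centered convolution
```

In the real module the offsets come from the Conv2D offset-generation network rather than being fixed, letting the kernel bend toward the time steps that matter.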
The DELA module integrates the spatial, channel, and local attention mechanisms, as described by the following formula:

$$F_{\mathrm{DELA}} = F_s + F_c + F_l$$

In the equation, $F_s$, $F_c$, and $F_l$ represent the outputs of the spatial, channel, and local attention modules, respectively. Fusing these outputs enhances the model's ability to handle and predict complex temporal data, thereby improving the overall prediction performance of the module.
5. Conclusions
To mitigate the intermittency and stochastic nature of wave energy, which poses challenges to grid stability, a novel short-term wave energy forecasting model is proposed, integrating CNN, BiLSTM, and an innovative DELA mechanism. This model can accurately predict short-term wave power generation and support decision-makers in optimizing power dispatch, thereby enhancing the efficiency of wave energy conversion. This study achieved the following results:
- 1.
The relationship between wave height, wave period, and power output of point absorber wave energy converters was simulated. A power matrix was developed and optimized using a BiGRU model, allowing for the rapid estimation of power outputs across various marine environments.
- 2.
DELA is a three-channel attention mechanism that processes BiLSTM outputs through spatial, channel, and local attention mechanisms, merging their outputs. This mechanism outperformed seven established attention mechanisms in a comparative analysis.
- 3.
Outputs from the BiGRU model, derived from direct-drive wave energy converters and buoy-based wave parameters in the South China Sea, are fed into the CNN-BiLSTM-DELA model. Operating in parallel, the CNN component primarily identifies extreme wave conditions, while the BiLSTM-DELA component forecasts wave energy based on temporal data.
- 4.
Through comparative studies, the CNN-BiLSTM-DELA model showed the highest accuracy and goodness of fit, surpassing alternative models and demonstrating superior predictive performance.
In summary, the short-term wave energy forecasting model offers enhanced accuracy and adaptability, supporting decision-makers in optimizing scheduling strategies to maintain power system stability and economic efficiency.