1. Introduction
The world is currently facing the depletion of traditional energy resources such as oil and coal, alongside the challenges of climate change. At the same time, global energy demand continues to grow to meet economic and social needs. In response, clean energy is emerging as an alternative solution, emitting less carbon dioxide and drawing on virtually unlimited sources to drive sustainable development [1]. The initiative to achieve carbon neutrality by 2050, proposed in the Paris Agreement, has increased interest and investment in the renewable energy sector.
Wind energy, in particular, represents a viable alternative to fossil fuel power generation, using natural resources to produce electricity. The cost of producing energy with this technology is decreasing yearly, and this trend is expected to continue [2]. Moreover, the continued development of wind turbines (WTs), driven by the integration of advanced technologies, is expected to significantly increase their efficiency and generating capacity.
However, wind power generation faces challenges regarding the availability, reliability, and lifetime of WTs. For this reason, manufacturers have focused their efforts on prolonging the lifetime of electrical and mechanical systems, which has resulted in reduced failures during operation and consistent production of high-quality power [3].
Unexpected failures in these systems can negatively affect availability and production rate. Components such as blades [4], generators [5], power converters [3], and gearboxes [6] are especially vulnerable to failure due to harsh environmental and operating conditions, leading to extended downtime for maintenance. If unaddressed, these challenges could significantly impact the renewable energy industry.
Fault detection is a critical problem with two main aspects: accuracy and cost [7]. The high vibration and noise levels of generators complicate the accurate measurement of faults. In addition, the diversity of faults depending on environmental conditions requires an accurate classification of each situation [8]. Implementing multiple sensors and complex algorithms is costly, especially for large-scale wind farms, and challenging to apply in smaller installations. As WTs increase in size and number, the need to reduce monitoring and repair costs becomes more urgent [9,10].
One of the main WT components is the converter. Converters play a crucial role in facilitating the control and efficient use of energy, especially in complex terrain and with lower maintenance requirements [11]. The converter is critical to WT operation, as it converts the variable-frequency, variable-voltage output from the generator into a stable, grid-compatible alternating current (AC) with a consistent frequency and voltage, ensuring that the electrical power generated is suitable for transmission and use [12]. According to [13,14], converters are among the components that fail most often in WTs, responsible for approximately 23% of failures; the authors indicate that humidity, contamination, and other factors play an essential role in the incidence of these failures. Additionally, with the increasing development of direct-drive WTs, the converter remains one of the most failure-prone components. This aspect becomes especially critical in complex terrain, where environmental and operating conditions are significantly more challenging than in less demanding sites [15].
This study proposes an improved approach to fault detection using Supervisory Control and Data Acquisition (SCADA) system data. The goal is to detect faults in the converter of a WT located in complex terrain. This approach aims to optimize reliability and reduce the costs associated with operation and maintenance (O&M). The ultimate objective is to ensure the greater long-term viability and sustainability of wind energy.
Two common approaches to detecting WT faults are classification algorithms (supervised) and reconstruction models (unsupervised) [16]. Classification algorithms train machine learning (ML) models, such as decision trees and k-nearest neighbors, on labeled data and test them on new data. However, labeling data can be a complex and expensive process. An example of recent work in this area is that of Khan (2022) [17], who proposed a new classifier, the AdaBoost, K-nearest neighbors, and logistic regression-based stacking ensemble (AKL-SE), to classify faults in WT monitoring systems, obtaining promising results.
The aim of reconstruction models, in contrast, is to learn the time series and reconstruct the variables, using the reconstruction error as a measure to identify outliers. A notable example of this approach is the research by Xiang et al. [18], which proposes a method for WT fault detection that combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network based on an attention mechanism (AM); it is designed to alert on generator and gearbox faults, making it highly relevant to the field of WT systems.
In addition, deep learning (DL) networks have also been used in this area. Liu et al. [19] propose a new Deep Residual Network (DRN) for WT fault detection. In their method, raw data from the SCADA system are used directly as inputs to the DRN, which incorporates a Convolutional Residual Building Block (CRBB) with convolutional layers and squeeze-and-excitation units. This approach performs fault recognition and classification only when faults occur or are imminent, although, in practice, it is preferable to provide early warnings before a fault occurs.
A notable limitation is that most studies focus on detecting failures in mechanical or electrical components [20,21,22], while only a few address failures in electronic components using SCADA data [23,24]. In this context, one of the studies exploring converters is that by Xiao et al. [25], who propose an improved structure called attention octave convolution (AOctConv), applied to the ResNet50 backbone (AOC–ResNet50), for detecting faults in WT converters using SCADA data at 10 min intervals. In line with this research, we study this component because of its recurrent failures, as demonstrated in Section 2.3.
Meanwhile, hybrid models are more robust than their base models, as evidenced in recent studies [26,27]. However, a limitation of these approaches is that they tend to detect alarms rather than attempt to predict failures.
An emerging approach in this context is the use of attention mechanisms, as presented in the study by Wang et al. [28]. The authors propose a novel method using an integrated AM with a multivariate query pattern for anomaly detection and underlying cause identification. The proposed anomaly detection model comprises multiple cascaded encoders and decoders with a multihead self-attention mechanism to extract correlations between variables. Inspired by this approach, we propose integrating a similar mechanism into our architecture.
1.1. Contribution of the Present Work
The aim of implementing an autoencoder (AE) model is to reconstruct the input data. However, the literature has shown that hybrid models yield superior performance. For this reason, this work implements LSTM and Multihead Attention (MA) layers in the AE architecture, specifically in the encoding and decoding layers, to capture more complex temporal relationships.
Specifically, we propose an architecture for fault prediction in WT converters using SCADA data, with the objective of improving reliability and configurability through fault prediction. This study implements an unsupervised learning approach using an LSTM-AE with multihead attention (LSTM-MA-AE), which incorporates temporal features from SCADA data.
This model is evaluated on a real dataset from a wind farm located in a complex terrain in southern Ecuador. We evaluate it with different architecture configurations to measure its performance in anomaly detection.
Therefore, the main contributions of this research are:
Development of a DL model that integrates autoencoders, LSTM networks, and MA for fault detection in WT converters.
Converter fault detection using SCADA data from a wind farm in complex terrain.
A system for predicting converter failures under real WT operating conditions using unsupervised learning.
1.2. Background and Motivations
Several methods for fault detection in WTs are reported in [29]; these can be classified into model-based, signal-processing, and data-based approaches. Furthermore, ref. [11] presents a literature review on converter faults; the authors cover model-based, signal-based, and data-based methods and conclude that data-based methods have a high fault diagnosis capability.
Furthermore, in [25], a study is presented for detecting faults in WT converters from SCADA data using a CNN; the effectiveness of the approach was verified through a comparative study. In addition, ref. [30] details the most common WT converter faults and presents a Transfer Learning (TL)-based method; the results demonstrate the accuracy and efficiency of the TL method in diagnosing WT faults.
A strategy using the wavelet transform, feature analysis, and a Back Propagation Neural Network (BPNN) to accurately identify open-circuit faults in WT converters is presented in [31]; the results show that the proposed strategy can successfully classify converter faults. In addition, ref. [32] describes a data-based approach to detect WT converter faults using an LSTM; the results show that the proposed method has powerful data processing capabilities and high diagnostic accuracy. Similarly, early anomaly detection and root cause analysis in WTs using SCADA data are proposed in [33]. For this purpose, a hybrid model using an LSTM-based asymmetric variational autoencoding Gaussian mixture model (LSTM-AVAGMM) is employed. The robustness and competitiveness of the model are demonstrated in two case studies.
Moreover, a hybrid DL model combining recurrent neural networks (RNNs) and LSTM for WT condition monitoring is presented in [34]. An LSTM-AE is employed for data processing and feature extraction in the proposed model. SCADA data and simulated data are then used to provide a complete learning model of WT behavior. The experimental results demonstrate that the proposed model surpasses existing ML algorithms in fault prediction accuracy.
Likewise, the approach proposed in [35] is based on a supervised implementation of the variational AE model, which projects the WT system into a low-dimensional representation space for early fault prediction. Another similar work is presented in [36], where a deep AE (DAE), enhanced by fault cases, is developed for anomaly detection in WTs. With the help of fault cases, the DAE can capture normal operation data patterns and acquire deep embedding features. Experimental results show that the method outperformed current AE-based methods in WT anomaly detection across multiple evaluation metrics.
In [37], a technique is presented to monitor the aging of insulated-gate bipolar transistor (IGBT) modules in offshore WT converters using SCADA data and a hybrid autoencoder and attention-based LSTM (AT-LSTM) model. The AT-LSTM model learns from SCADA data to establish a temperature prediction, and the AE is used to detect anomalies. Experimental results validate the effectiveness of the proposed model.
Meanwhile, in a previous study, we determined that the scientific literature on using SCADA data to predict failures in WT converters located in complex terrain is limited [24]. The Villonaco Wind Farm (VWF), located in a mountainous area of Ecuador at approximately 2700 m.a.s.l., presents a challenging climate and irregular terrain, entailing unique characteristics that distinguish it from other wind farms [38]. These unique conditions highlight the need for advanced fault detection approaches and underlie the motivation for this study.
In modern WTs, the converter controls the speed and torque of the generator in addition to the power transfer to the grid [39]. Moreover, the converter is currently a relevant research topic because a failure in this component can cause an unexpected WT shutdown, resulting in a decrease in energy production. This situation is even more critical in wind farms located in complex terrain, where it can lead to long WT downtimes.
In addition, early fault detection is crucial for WT maintenance, as it saves significant time and costs. Implementing hybrid models and effectively using SCADA data can improve the reliability and efficiency of predictive maintenance, leading to a reduction in wind farm O&M costs.
Finally, Table 1 summarizes key studies on failure prediction, from which some conclusions can be drawn. Most studies do not indicate how far in advance the models predict a failure before it occurs. Furthermore, there is only a small body of scientific research focused on predicting WT converter failures, and more studies are needed on predicting failures in WTs located in complex terrain, such as the wind farm studied in this work. Therefore, our motivation is to help fill the knowledge gap mentioned above.
The remainder of the paper is structured as follows: Section 2 describes the methodological process, the model architecture, the failure prediction system, and the evaluation methods. Section 3 details the results of the evaluation of the fault prediction model. Finally, Section 4 offers conclusions and suggestions for future research.
2. Materials and Methods
Figure 1 shows the methodological process for failure prediction employed in this study. The figure describes the process from raw data acquisition to feature cleaning and selection, proposed model development, and fault prediction. The raw data undergo cleaning and filtering, after which variables related to the target component are selected. These variables are the input to the LSTM-MA-AE model, which reconstructs the output signal for each variable. Subsequently, the predicted signals enter a fault prediction system, where they are compared with each actual signal, and the reconstruction error is used to calculate the abnormality score. This system generates an alarm when a significant discrepancy is detected, allowing it to predict and alert to the possibility of a failure.
2.1. Description of the Data Set and Study Area
This paper uses SCADA data from the VWF, located in Ecuador between UTM coordinates 693,030 E 9,558,392 N and 693,526 E 9,556,476 N. The VWF lies in a mountainous area at approximately 2720 m.a.s.l. and includes 11 GOLDWIND GW70/1500 WTs of 1.5 MW nominal power with direct-drive technology [10,15]. Based on the available information, there are few wind farms located at that altitude or higher.
The SCADA operational data and fault records correspond to WT2 of the VWF, recorded between 1 January 2014, and 31 October 2021, representing 386,288 records and 69 variables.
2.2. Data Processing
For data processing, raw files in .txt and .tmp format were initially taken from the SCADA system, processed, and concatenated into a unified file. The steps performed for data cleaning and filtering to ensure data integrity are described below.
Variables with null values exceeding 10,000 records were removed from the dataset, as these could potentially skew the analysis.
Variables that, based on domain knowledge, do not contribute significant features to the analysis were eliminated.
Remaining null values were imputed row by row, affecting at most 7 records in the WT2 variables, a non-significant amount.
Data were filtered to include only records with an active power greater than 0 kW and less than 1600 kW, ensuring the WT is within the guaranteed operating range.
The operating mode of the WT must be 5, which indicates, in this case, that it is not in power limitation or maintenance.
The wind speed range was set between 3 m/s and 25 m/s, as these are the start and cut-off values of the GOLDWIND GW70/1500 WT, ensuring the data reflects typical operating conditions.
Temperature outliers were eliminated: specifically, IGBT temperatures higher than 120 °C, considering the average is 60 °C.
Temperatures should be greater than 0 °C, since no sub-zero temperatures have been recorded in the study area, which has an average historical temperature of approximately 15 °C.
This approach ensures that the data used for analysis and model training are consistent and high-quality, which is essential for accurate and reliable results.
After data processing, the final set was reduced to 317,323 records and 55 variables. That is, 17.85% of the data were eliminated.
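As a minimal illustration of these rules, the following pandas sketch applies the same filters in one pass; the column names (e.g., operating_mode) are illustrative stand-ins, since the paper does not list the exact SCADA field names for every filter.

```python
import pandas as pd

def filter_scada(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the operating-range filters of Section 2.2 (column names assumed)."""
    mask = (
        df["grid_active_power_avg"].gt(0)        # active power > 0
        & df["grid_active_power_avg"].lt(1600)   # and < 1600 (guaranteed range)
        & df["operating_mode"].eq(5)             # not in power limitation or maintenance
        & df["wind_speed_avg"].between(3, 25)    # cut-in/cut-out speeds of the GW70/1500
        & df["igbt_temperature_max"].le(120)     # drop IGBT temperature outliers
        & df["igbt_temperature_max"].gt(0)       # no sub-zero temperatures recorded on site
    )
    return df.loc[mask].reset_index(drop=True)
```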
2.3. Target Component
The converter failure analysis was performed following the recommendations of VWF operating technicians. A review of the maintenance records and SCADA alarms confirmed that the converter has the highest failure rate, at 89.3% (see Figure 2a). This percentage differs from the 23% reported in [14] for converter failures because the cited article includes failures of other WT components, such as the generator, gearbox, and pitch system. In our case, the WT is relatively new, so no failures in these other components have been recorded, resulting in a higher percentage of failures attributed to the converter.
Additionally, SCADA system alarms on the converter account for 86.4% of the total, significantly outnumbering those on the pitch system (see Figure 2b). In this study, only the converter and pitch components were analyzed, because both required replacement as critical parts of the WT.
In addition, it is worth noting the conceptual difference between failures and alarms. A failure involves the shutdown of the WT and generally requires the replacement of the affected component, while an alarm may cause a temporary shutdown of the WT but does not necessarily lead to component replacement. Failures can stop production and cause significant losses. It is thus essential to predict when a failure will occur, allowing preventive measures to be taken to minimize the impact on production.
2.4. Feature Selection
As mentioned above, the SCADA dataset used in this study consists of 55 variables. The variables were reduced to prevent the model from becoming too complex and thus to improve the learning of features related to the target component. For this purpose, the variables were selected based on Pearson’s correlation.
Figure 3 depicts a correlation matrix for several variables related to the operation of the wind turbines and their components, particularly the converter and the IGBT. This matrix shows the strength and direction of the relationships between pairs of variables, represented by colors, where red indicates a high positive correlation and blue a high negative correlation.
The variable igbt_temperature_max shows a strong positive correlation with several other variables, such as wind_speed_avg, grid_active_power_avg, generator_speed_avg, and ambient_temperature_avg. This suggests that the IGBT temperature is closely linked to the overall WT operation and environmental conditions.
Three converter faults were identified in the preliminary analysis, which prevented us from observing significant correlations between the IGBT failure variable and the other variables. To address this challenge, the 10,000 records prior to each failure were labeled as pre-failure, under the assumption that the component degrades over time and indications of failure may exist in those data. This allowed the dataset to be balanced and correlations to be observed. This procedure is similar to that employed in [24]. In addition, although the observed correlations are small, they provide insight into the variables that could be involved in the faults.
Despite the low direct correlation between IGBT failures and the other variables, labeling the pre-failure data provides valuable insight into the conditions that could contribute to converter failures.
In summary, this study chose variables based on their correlation with the target variable, correlation with converter failures, correlation with converter alarms, and the authors’ domain knowledge. Thus, nine variables related to the target component were selected, as shown in Table 2.
This selection ensures that the model effectively captures the operating conditions of the target component, anticipates possible failures, and minimizes the detection of failures in other components, improving WT reliability and efficiency.
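A compact sketch of this selection procedure is given below, assuming a pandas DataFrame and a binary pre-failure label built from the 10,000 records preceding each fault; the function names and integer failure positions are hypothetical.

```python
import pandas as pd

def label_pre_failure(df: pd.DataFrame, failure_positions, n: int = 10_000) -> pd.DataFrame:
    """Label the n records preceding each failure as pre-failure (1)."""
    df = df.copy()
    df["pre_failure"] = 0
    col = df.columns.get_loc("pre_failure")
    for pos in failure_positions:              # integer row positions of each failure
        df.iloc[max(0, pos - n):pos, col] = 1
    return df

def rank_by_correlation(df: pd.DataFrame, target: str = "pre_failure") -> pd.Series:
    """Rank variables by absolute Pearson correlation with the target label."""
    corr = df.corr(method="pearson")[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```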
2.5. Data Splitting
In this study, our dataset consists of 317,323 records and 9 variables after initial filtering and feature selection. During the time series analysis of the variables, we identified a problem with sensor reconfiguration in April 2017, which changed the measurement range in some cases. Therefore, we only considered data from after this date because including earlier data could have a negative impact on model training. After removing the affected data, we retained 176,707 records and 9 variables, which accounts for 55.69% of the original data.
For data splitting, a period of approximately one year of normal operation, between April 2017 and April 2018, was selected for training, which allows seasonal patterns and short- and long-term variations to be captured. Subsequently, the testing phase was performed using the data labeled with failures.
It is important to note that no failure data were included in the training, as the model focuses on reconstructing the input depending on the relationships with other variables. The objective is that the model attempts to reconstruct the expected behavior, with a high discrepancy potentially indicating an anomaly.
2.6. Data Scaling
To improve the performance of a DL model, it is essential to standardize or normalize the data before feeding it into the model. These preprocessing techniques help the model learn more efficiently and effectively.
Standardization transforms each input variable $x$ into a distribution with mean zero and standard deviation one, as shown in Equation (1):

$$z = \frac{x - \mu}{\sigma} \qquad (1)$$

where $\mu$ is the mean of the variable $x$ and $\sigma$ is its standard deviation.
This technique ensures that all features have a comparable scale, which can significantly improve the convergence and performance of the DL model. In this study, the StandardScaler function from the Scikit-learn library was used to perform data standardization.
After the model’s predictions were obtained, it was necessary to reverse the standardization process to interpret the results in their original scale. This step, known as back-standardization or rescaling, is crucial for understanding the model’s outputs in a practical context. This rescaling was implemented using the inverse transform function of the StandardScaler from the Scikit-learn library.
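The following sketch shows this standardization round trip with Scikit-learn, as described above; the arrays are dummy stand-ins for the actual SCADA splits.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Dummy stand-ins for the normal-operation training split and the test split.
rng = np.random.default_rng(0)
X_train = rng.normal(60.0, 5.0, size=(1000, 9))
X_test = rng.normal(60.0, 5.0, size=(200, 9))

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)        # fit on normal-operation data only
X_test_std = scaler.transform(X_test)              # reuse the training statistics
# ... the model reconstructs signals in standardized units ...
X_rescaled = scaler.inverse_transform(X_test_std)  # back-standardization to original units
```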
2.7. Long Short-Term Memory (LSTM) Network
LSTM networks can fuse temporal features from the states of different parts of a sequence. This study combines our model with LSTM layers to extract temporal features from WT data. An LSTM contains an input gate, an output gate, and a forget gate; its structure is shown in Figure 4.
The information to be forgotten is controlled by the forget gate $f_t$, defined as:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (2)$$

where $\sigma$ is the sigmoid function, and $W_f$ and $b_f$ represent the weight matrix and the bias of the forget gate, respectively. In the next step, the input gate $i_t$ receives $h_{t-1}$ and $x_t$ to determine the new data that should be stored in the cell state. At the same time, a vector of candidate values $\tilde{C}_t$ is created by a hyperbolic tangent layer (tanh), which returns a value between −1 and 1. The previous cell state $C_{t-1}$ is then updated to the new cell state $C_t$, as described in Equations (3)–(5) [40]:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (3)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \qquad (4)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (5)$$

Finally, the output gate $o_t$ determines the hidden state $h_t$ as follows:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \qquad (6)$$
$$h_t = o_t \odot \tanh(C_t) \qquad (7)$$

where $W_o$ and $b_o$ are the weight matrix and the output gate bias, respectively. The output $h_t$ is obtained according to the updated cell state $C_t$.
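For illustration, a single LSTM recurrence step can be reproduced with PyTorch's built-in cell, which computes the gates of Equations (2)–(7) internally; the batch size and dimensions below are arbitrary.

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=9, hidden_size=64)  # 9 SCADA variables, 64 hidden units
x_t = torch.randn(32, 9)                          # one time step for a batch of 32
h_t = torch.zeros(32, 64)                         # previous hidden state h_{t-1}
c_t = torch.zeros(32, 64)                         # previous cell state C_{t-1}
h_t, c_t = cell(x_t, (h_t, c_t))                  # one recurrence step
```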
2.8. Autoencoder (AE)
The AE is an unsupervised deep neural network that reconstructs the input data with minimal error from an encoded representation. The input data are encoded by mapping them to a low-dimensional space [41]. The network then attempts to minimize the loss between the input and the decoded data. As illustrated in Figure 5, the AE generally consists of two parts: an encoder and a decoder.
The encoder transforms the input data $x$ into the encoded representation $z$ (i.e., the code layer). According to the depicted architecture, the mathematical formulation for the code-layer encoding process is given by Equation (8); thus,

$$z = f\left(W x + b\right) \qquad (8)$$

where $W$ and $b$ are the weight matrix and the network bias vector, respectively. Next, the decoder attempts to reconstruct the input data from the code layer $z$ with the smallest discrepancy between the original input $x$ and the reconstructed output $\hat{x}$, calculated based on the following equation:

$$\hat{x} = g\left(W' z + b'\right) \qquad (9)$$

where $g$ is the activation function of the decoder.
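A minimal dense AE corresponding to Equations (8) and (9) might look as follows in PyTorch; the layer sizes and activation are illustrative, not those of the proposed model.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """Minimal autoencoder: z = f(Wx + b), x_hat = g(W'z + b')."""
    def __init__(self, d_in: int = 9, d_code: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_code), nn.Tanh())  # Equation (8)
        self.decoder = nn.Linear(d_code, d_in)                            # Equation (9)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))
```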
2.9. Multi-Head Attention (MA)
AM has become a vital component in neural networks for handling long sequential data. By computing attention weights, the network learns to focus on the most significant parts of the input. An important innovation was introduced with MA, proposed by Vaswani et al. [42]. This approach improves attention by using multiple parallel attention layers, or “heads”, to focus on different input segments. MA greatly improves the modeling of complex dependencies in the data and increases model performance.
Unlike conventional attention models, which compute attention scores between a query vector and key-value pairs from the input, self-attention generates the query, key, and value vectors through transformations of the input itself. This allows the model to effectively extract important features and relationships within the data through self-reference. Specifically, self-attention compares different items within a single input sequence against each other. It computes interaction scores between a query matrix $Q$, a key matrix $K$, and a value matrix $V$ from the input data using Equation (10):

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \qquad (10)$$

where $d_k$ is the dimension of the key matrix. MA extends this by employing multiple independent attention heads, with each head learning distinct patterns. Multiple heads essentially allow for parallelization within the attention layer, providing a richer representation of the input.
As depicted in Figure 6, the input sequence is linearly projected into $Q$, $K$, and $V$ by the learned weight matrices $W_i^Q$, $W_i^K$, and $W_i^V$. Then, $Q$ and $K$ are multiplied and scaled to obtain the attention scores. The attention weights are multiplied by $V$ to obtain the output values of each head $\text{head}_i$. These per-head outputs are concatenated and linearly projected using a learned matrix $W^O$ to obtain the final multi-head attention output. This process is described by Equations (11) and (12):

$$\text{head}_i = \text{Attention}\left(Q W_i^Q, K W_i^K, V W_i^V\right) \qquad (11)$$
$$\text{MultiHead}(Q, K, V) = \text{Concat}\left(\text{head}_1, \ldots, \text{head}_h\right) W^O \qquad (12)$$
Masking is an optional component that can be applied to the attention scores before the softmax to prevent certain positions from attending to others. It was not used here: the full sequence context was preferred to capture all global dependencies, since masking might exclude patterns helpful in detecting small changes across the entire sequence.
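As a usage sketch, PyTorch's built-in module implements exactly this multi-head computation; the dimensions below mirror the 8-head, 64-unit configuration described later, and no mask is passed, as in the text.

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)
x = torch.randn(32, 100, 64)      # (batch, sequence length, embedding dimension)
out, attn_weights = mha(x, x, x)  # self-attention: Q = K = V = x, no mask
```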
2.10. Positional Encoding (PE)
PE is an essential technique in Transformers that incorporates information about the position of input sequences. Unlike RNNs, which process sequence data in an ordered fashion, Transformers, with their AM, process sequences in parallel, which means that the position of elements within the sequence is not implicit in the model [43]. To address this limitation, PE is introduced, adding positional information to the input vectors of the attention layer. This encoding is performed using trigonometric functions (sine and cosine) to generate a unique vector for each position, which makes it easier for the model to capture the positional relationships of the elements in the sequence.
Equations (13) and (14) are used in this process:

$$PE_{(pos,\, 2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right) \qquad (13)$$
$$PE_{(pos,\, 2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right) \qquad (14)$$

where $pos$ is the position in the sequence, $i$ is the dimension, and $d_{model}$ is the size of the feature vector.
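A direct implementation of Equations (13) and (14) is shown below; it assumes an even feature dimension.

```python
import torch

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Sinusoidal PE of Equations (13) and (14); d_model must be even here."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)           # even dimensions
    angle = pos / torch.pow(torch.tensor(10000.0), i / d_model)    # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)  # sine on even indices
    pe[:, 1::2] = torch.cos(angle)  # cosine on odd indices
    return pe
```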
2.11. Hybrid Model LSTM-MA-AE
This model efficiently captures the spatial and temporal characteristics of SCADA data, improving fault prediction and diagnosis capabilities. The LSTM-MA-AE model developed in this study processes the input signals to reconstruct the time series and predict faults before they occur.
The LSTM-MA-AE model has two main parts: the encoder and the decoder. The structure of the model is presented in Figure 7. In the encoder, a combination of LSTM layers, PE, and MA processes the input sequences. The decoder uses a structure similar to the encoder to reconstruct the time series from the encoded representations. Specifically, two LSTM layers with 64 hidden units each are employed in the encoder; these are responsible for capturing complex short-term temporal dependencies in the data sequence, embedding patterns relevant to the reconstruction task. These layers are followed by normalization and a dropout of 0.1 to prevent overfitting. At the output of the LSTMs, a PE is integrated so that the 8-head MA mechanism can understand the positions of the time series data. The attention features are added to a residual connection from the previous output so that no information is lost. Each head of the MA allows the model to focus on different aspects of the input sequence simultaneously, enhancing the embedding of long-term features and temporality.
Furthermore, in the decoder, we initiate the extraction of the encoder’s embedded features using an 8-head MA mechanism supplemented with two additional LSTM layers of 64 hidden units. Initially, zero tensors are used as the query in the MA mechanism, whereas the encoder embeddings are employed as the key and value. These are then passed to the LSTM layers, which process the historical values of the dataset along with the attention output, allowing the relevant sequential structure to be extracted and the first value to be predicted through a fully connected layer. This output is subsequently fed back as the query of the MA mechanism for subsequent predictions. In this way, the predictions generated by the decoder are iteratively employed as queries, achieving accurate reconstruction throughout the entire sequence. Finally, we apply normalization and dropout to optimize training and prevent model overfitting.
The input to the model is a time series of length $T$ with $d$ features:

$$X = \{x_1, x_2, \ldots, x_T\}, \quad x_t \in \mathbb{R}^d \qquad (15)$$

The temporal sequence is then encoded through a 2-layer LSTM, with dropout in between and a normalization layer (LayerNorm) at the end of the second layer, to obtain the encoded representation, as represented in Equations (16) and (17):

$$H = \text{LSTM}_2\big(\text{Dropout}(\text{LSTM}_1(X))\big) \qquad (16)$$
$$H' = \text{LayerNorm}(H) \qquad (17)$$

The output series $H'$ has the PE added to it so that the MA can understand the position of the time series, as shown in Equation (18):

$$H_{pe} = H' + PE \qquad (18)$$

Once the PE is added, the MA is calculated to capture complex relationships in the time sequence, as shown in Equations (19)–(22):

$$Q = H_{pe} W^Q \qquad (19)$$
$$K = H_{pe} W^K \qquad (20)$$
$$V = H_{pe} W^V \qquad (21)$$
$$A = \text{MultiHead}(Q, K, V) \qquad (22)$$

where $W^Q$, $W^K$, and $W^V$ are matrices of learnable weights for the query $Q$, key $K$, and value $V$.

To avoid losing information, a residual connection is made from the output $H'$ to the output of the MA, using Equation (23):

$$Z = H' + A \qquad (23)$$

The encoded representation $Z$ is first decoded by means of an MA, which uses the previous decoder hidden state as the query:

$$A^{dec}_t = \text{MultiHead}\big(h^{dec}_{t-1}, Z, Z\big) \qquad (24)$$

where $h^{dec}_{t-1}$ is the previous hidden state of the LSTM unit at the end of the decoder, which at the beginning is a tensor of zeros. Next, the MA output of the decoder is concatenated with the historical data to be reconstructed:

$$c_t = \big[A^{dec}_t ;\, x_{t-1}\big] \qquad (25)$$

A fully connected layer (FL) is then used to learn a richer representation of the data, and its output is used as input to two LSTM layers, followed by LayerNorm:

$$h^{dec}_t = \text{LayerNorm}\big(\text{LSTM}(\text{FL}(c_t))\big) \qquad (26)$$

where $h^{dec}_t$ is fed back as the query $Q$ to the MA to obtain the reconstructed predicted data. Finally, the decoder output is projected to the original dimension with an FL to reconstruct the time series, using Equation (27):

$$\hat{x}_t = \text{FL}\big(h^{dec}_t\big) \qquad (27)$$
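The encoder half of this pipeline can be sketched in PyTorch as follows; the wiring reflects our reading of Equations (16)–(23) and Figure 7, not the authors' released code, and the iterative decoder is omitted for brevity.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Sketch of the LSTM-MA-AE encoder (Equations (16)-(23))."""
    def __init__(self, d_in: int = 9, d_h: int = 64, heads: int = 8):
        super().__init__()
        self.lstm = nn.LSTM(d_in, d_h, num_layers=2, dropout=0.1, batch_first=True)
        self.norm = nn.LayerNorm(d_h)
        self.mha = nn.MultiheadAttention(d_h, heads, batch_first=True)

    def forward(self, x: torch.Tensor, pe: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(x)       # Equation (16): 2-layer LSTM with dropout in between
        h = self.norm(h)          # Equation (17): LayerNorm
        q = h + pe                # Equation (18): add positional encoding
        a, _ = self.mha(q, q, q)  # Equations (19)-(22): multi-head self-attention
        return h + a              # Equation (23): residual connection
```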
The pseudocode for LSTM-MA-AE is presented in Algorithm 1.
Algorithm 1 Training procedure for LSTM-MA-AE with Early Stopping

1: Input: SCADA data $X \in \mathbb{R}^{T \times v}$, where $T$ is the time series length and $v$ is the number of variables.
2: Initialize:
3: Load configuration parameters: hidden layers $h$, batch size $b$, epochs $e$, patience $p$
4: Initialize model, loss criterion, optimizer Adam, and learning rate $lr$
5: Initialize variables: best_val_loss = ∞, patience_counter = 0
6: Load $X_{train}$, $X_{val}$, $X_{test} \in \mathbb{R}^{s \times q \times v}$, where $s$ is the number of sequences and $q$ is the sequence length.
7: Create DataLoader iterators for training and testing data with batch size $b$
8: for each epoch do
9:   Training Phase:
10:    Set model to training mode
11:    Initialize training loss to zero
12:    for each batch in the training DataLoader do
13:      Zero the parameter gradients
14:      $\hat{X}$ = LSTM-MA-AE($X$)
15:      Compute loss($X$, $\hat{X}$)
16:      Concatenate batch reconstructions
17:      Compute regularization term
18:      Perform backpropagation
19:      Update model parameters
20:      Accumulate training loss
21:    end for
22:    Compute average training loss
23:  Validation Phase:
24:    Set model to evaluation mode
25:    for each batch in the validation DataLoader do
26:      Disable gradient computation
27:      $\hat{X}$ = LSTM-MA-AE($X$)
28:      Compute loss($X$, $\hat{X}$)
29:      Accumulate validation loss
30:    end for
31:    Compute average validation loss
32:  Early Stopping:
33:    if validation loss < best_val_loss then
34:      Update best_val_loss
35:      Save model parameters and reset patience_counter
36:    else
37:      Increment patience_counter
38:    end if
39:    if patience_counter ≥ $p$ then
40:      Break training loop early
41:    end if
42: end for
43: Test the model using $X_{test}$
44: Reconstructed data $\hat{X}$ = LSTM-MA-AE($X_{test}$)
45: Fault prediction system:
46:   Compute the loss of each variable
47:   Apply smoothing
48:   Apply threshold with EWMA and sliding-window anomaly detection
49:   Raise fault warning
2.12. Exponentially Weighted Moving Average (EWMA)
Within the failure prediction mechanism, the EWMA is used. This technique applies greater weight to more recent data with higher errors, and is defined by Equation (28) [44]:

$$S_t = \alpha L_t + (1 - \alpha) S_{t-1} \qquad (28)$$

where $\alpha$ is the smoothing factor, $L_t$ is the value of the reconstruction loss at time $t$, and $S_{t-1}$ is the value of the EWMA at time $t-1$.

The threshold value at the upper bound of the EWMA is calculated as shown in Equation (29):

$$Th = \mu_S + k \sigma_S \qquad (29)$$

where $\sigma_S$ is the standard deviation of the EWMA values and $k$ is a constant that defines the position of the threshold. When training with normal data, $k$ can be adjusted to set an appropriate threshold.
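A minimal implementation of Equations (28) and (29) is sketched below; the values of alpha and k are illustrative, and the mean term in the threshold is our assumption for the upper bound.

```python
import numpy as np

def ewma_threshold(loss: np.ndarray, alpha: float = 0.1, k: float = 3.0):
    """EWMA smoothing (Equation (28)) and upper-bound threshold (Equation (29))."""
    s = np.empty_like(loss, dtype=float)
    s[0] = loss[0]
    for t in range(1, len(loss)):
        s[t] = alpha * loss[t] + (1.0 - alpha) * s[t - 1]
    threshold = s.mean() + k * s.std()  # mean term assumed for the upper bound
    return s, threshold
```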
2.13. Fault Prediction System
Initially, the model reconstructs all input variables to capture deviations in each one. As depicted in Figure 8, the fault prediction system calculates the reconstruction score for each data point of each variable. A smoothing vector is then used to distinguish between anomalies possibly caused by degradation and situations where the WT is operating at its power limits. Next, this new score is passed through the EWMA threshold. If the score exceeds this threshold, a label vector is created, where 1 indicates that the threshold has been exceeded and 0 indicates that it has not.
This label vector is processed through a sliding-window anomaly detector to improve anomaly labeling. This detector takes a binary vector as input; if it identifies more than $n$ anomalies within a time window $w$, it labels the entire span between the first and last anomaly within the window as anomalous (1). This method assumes that nearby anomalies within a maximum window size of $w$ can represent the same fault.
Once each variable’s final label vector is obtained, all vectors are concatenated and summed. The number of simultaneously anomalous variables that constitutes a failure warning is then determined using a fixed threshold.
For each variable in the dataset, this pipeline of reconstruction scoring, smoothing, EWMA thresholding, and sliding-window labeling is applied independently before the final vote; a sketch is given below.
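The sliding-window detector and the final vote described above might be implemented as follows; the parameters n, w, and the number of variables required for a warning are illustrative placeholders.

```python
import numpy as np

def sliding_window_detector(flags: np.ndarray, n: int = 3, w: int = 50) -> np.ndarray:
    """If more than n anomalies fall inside a window of size w, label the
    whole span between the first and last anomaly in that window as 1."""
    out = flags.copy()
    for start in range(len(flags) - w + 1):
        idx = np.flatnonzero(flags[start:start + w])
        if idx.size > n:
            out[start + idx[0]:start + idx[-1] + 1] = 1
    return out

def fault_warning(labels_per_variable: np.ndarray, min_variables: int = 4) -> np.ndarray:
    """labels_per_variable: (n_variables, n_samples) binary matrix.
    A warning is raised where enough variables are anomalous at once."""
    return labels_per_variable.sum(axis=0) >= min_variables
```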
2.14. Model Hyperparameters
In this study, the early stopping technique is adopted during the training process to avoid overfitting the model. Overfitting can prevent the model from adequately understanding the temporal and spatial relationships between variables, as it fits too closely to the true values and loses generalization ability. Early stopping ensures that the model fits with small errors in normal cases. In contrast, in anomalous situations, the model cannot fit the input features, resulting in a significant discrepancy with the normal data. This improves the model’s ability to detect anomalies effectively.
The hyperparameters with which the model achieved the best performance are shown in Table 3.
2.15. Model Evaluation Metrics
To evaluate the proposed model, we used error metrics common in prediction analysis: the root mean square error (RMSE), the mean absolute error (MAE), and the coefficient of determination ($R^2$). Equations (38)–(40) define these metrics:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (38)$$
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \qquad (39)$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (40)$$

where $y_i$ represents the actual values, $\hat{y}_i$ the predicted values, $n$ is the number of samples, and $\bar{y}$ is the mean value of $y$.
In addition, we use the Precision ($P$), Recall ($R$), and F1-Score ($F1$) metrics, which are commonly employed in anomaly detection [45]. These metrics are defined by Equations (41)–(43):

$$P = \frac{TP}{TP + FP} \qquad (41)$$
$$R = \frac{TP}{TP + FN} \qquad (42)$$
$$F1 = \frac{2 \cdot P \cdot R}{P + R} \qquad (43)$$

where True Positives ($TP$) represent alarms that correctly detected a fault within a specific time window, False Positives ($FP$) correspond to alarms that do not result in an actual fault, and False Negatives ($FN$) indicate faults that were not correctly detected.
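For reference, these six metrics can be computed directly as follows; the helper names are our own.

```python
import numpy as np

def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))  # Equation (38)

def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.mean(np.abs(y - y_hat)))          # Equation (39)

def r2(y: np.ndarray, y_hat: np.ndarray) -> float:
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)               # Equation (40)

def precision_recall_f1(tp: int, fp: int, fn: int):
    p = tp / (tp + fp)                                # Equation (41)
    r = tp / (tp + fn)                                # Equation (42)
    return p, r, 2 * p * r / (p + r)                  # Equation (43)
```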
To conclude this section, the experiments were conducted in Python using the PyTorch library within a Google Colab environment. The environment had an Intel(R) Xeon(R) CPU @ 2.20 GHz, 51.00 GB of RAM, and a Tesla K80 accelerator with 12 GB of GDDR5 VRAM.
4. Conclusions
Since electronic converters are critical components, the ability to predict failures months in advance is crucial to reducing O&M costs. Therefore, this study addresses the prediction of converter failures in WTs located in complex terrain.
We developed a hybrid model using SCADA data that combines LSTM, Multi-head Attention, and Autoencoder, using their strengths to learn temporal and spatial characteristics between variables. The LSTM provides better generalization, while the Multi-head Attention mechanism captures complex patterns inherent in the data. The Autoencoder mechanism allows the model to reconstruct features with spatial and temporal information, facilitating the detection of malfunctions in variables related to the target component.
The LSTM-MA-AE model can predict failures an average of approximately 3.3 months in advance, with an average F1-score of 90% in the evaluated WTs and a low false positive rate. These medium-term warnings could be indicative of converter degradation.
This method can facilitate early fault detection and provides a robust mechanism for predictive maintenance in wind farms, ensuring greater reliability and generation efficiency. Moreover, we are confident that the methodological process we have developed could be successfully replicated in other wind farms. In fact, we believe it could even be applied to other components of the WT, not just the converter, further extending the reach of our research.
As future work, we propose exploiting the parallelization capabilities of the Transformer model and the strengths of the self-attention mechanism to determine the variables intrinsically related to component failure and to identify the root cause.