1. Introduction
With advances in technology and falling costs, lithium-ion batteries have been widely adopted. However, their performance fades at an accelerating rate as usage accumulates. This fade is the consequence of multiple coupled degradation mechanisms, including electrode–electrolyte interface side reactions, lithium deposition, structural damage to the electrode materials, and electrolyte decomposition [1]. Battery capacity is considered the leading indicator for evaluating battery performance. The capacity fade curve describes how the capacity decreases as the battery is used. It generally follows an exponential fade trend: the longer the battery has been in use, the faster its capacity is lost. Predicting the capacity fade curve helps users understand battery life and performance so that batteries can be better managed, used, and maintained. It is therefore necessary to develop models that can predict the capacity fade trend from limited data.
Various models have been proposed to predict the fade trend of battery capacity. They fall into two main categories: electrochemical models and data-driven models based on battery measurements such as capacity and voltage [2]. Electrochemical models [3] integrate pseudo-two-dimensional (P2D) models [4] or single-particle models [5] with battery aging mechanism models to predict capacity fade trends. This approach uses partial differential equations [6] to describe physical phenomena such as mass transfer, charge transfer, thermodynamics, and chemical kinetics [7,8]. Although electrochemical models can provide highly accurate predictions in some cases, the electrochemical processes involve many uncertainties under different usage conditions, such as temperature, pressure, and depth of charge–discharge. These factors can alter the internal structure and chemical reactions of the battery and cause the model to fail.
Data-driven models have been widely applied. Some researchers used historical capacity data as input and fitted the capacity fade with mathematical models such as exponential [9] and double-exponential [10] functions. Others consider more diverse inputs and employ deep learning models to capture capacity fade trends [11,12,13,14]. These studies typically require sufficient charge–discharge data to predict future capacities accurately, which has limited the application of battery capacity prediction techniques in many scenarios.
Li et al. [15] were the first to predict the entire capacity fade trend with a capacity-based method that relies solely on the capacity series as input. Using data from the first 100 cycles, they trained a sequence-to-sequence model based on recurrent neural networks. While this method predicts the future capacity fade curve in a single pass, it does not consider the impact of different charging conditions. Furthermore, some high-performance batteries exhibit no capacity deterioration within the first 100 cycles [16]. Relying solely on capacity as a feature is therefore insufficient for accurately estimating future capacity fade trajectories; more comprehensive aging features must be extracted from voltage, charge quantity, and other measurements.
Considering the impact of fast charging strategies on batteries and the absence of capacity fade in the initial charge–discharge cycles, several methods have been successfully applied to predict end of life (EOL) [16,17,18], remaining useful life (RUL) [19], and the rollover cycle (knee point) [20]. These methods extract richer aging features from voltage, charge quantity, and other data, enabling the prediction of EOL, RUL, and the rollover cycle, and have promoted the development of capacity degradation trajectory prediction. Building on the aging features validated by these methods, researchers have demonstrated approaches that predict the capacity fade curve from early charge–discharge cycles. Liu et al. [21] proposed an elastic-net-based method to estimate the parameters of an empirical power function describing the capacity fade curve beyond 100 cycles. Similarly, Saurabh et al. [22] trained a 2D convolutional model to predict the unknown parameters of a capacity fade parameter equation. Both methods can predict the capacity degradation trajectory. However, a parameter equation cannot fit all of the original data without bias, which introduces inherent errors into the constructed labels and can reduce the accuracy of the final predictions.
Strange and dos Reis [23] used two different nine-layer 1D-CNN networks to predict the coordinates of multiple sampling points on the capacity degradation curve. Similarly, Herring et al. [24] used the first 100 cycles of data to build a multitask linear model that predicts the capacity fade curve. Although these label constructions introduce no fitting error, training machine learning models with small amounts of battery data remains challenging. Overfitting commonly plagues such approaches for three reasons rooted in the data itself. First, samples differ significantly from one another, which complicates modeling. Second, each sample is high-dimensional: the input features comprise multiple time series and the output label is itself a time series. Third, full life-cycle battery experiments are resource-intensive, so little training data is available. Fitting such data requires larger models, yet without accounting for the distribution characteristics of the data, these models often generalize poorly. How to build models with strong generalization from minimal battery data therefore remains an open problem.
To address these issues, a battery capacity degradation trajectory prediction method based on the TM-Seq2Seq (Trend Matching—Sequence-to-Sequence) model is proposed. The method predicts the future capacity fade curve in a single pass, without iteration. First, features are extracted from the first 100 charge–discharge cycles. Second, the capacity fade curve is sampled at equal capacity intervals and the sampled sequence is used as the label. Next, features are extracted with dilated convolution and SE-net and encoded by a GRU encoder; a GRU decoder then predicts the labels. Additionally, a trend feature matching loss function is designed based on the data characteristics of capacity fade curves. This loss constrains the output of the model's intermediate layers so that the intermediate features follow a pattern similar to the capacity fade curve; restricting the learning space in this way improves the model's robustness and learning capability. Finally, a parameter equation for the capacity degradation curve is designed to fit the predicted sample points and obtain the complete capacity degradation curve. The proposed method is verified on a public dataset with 132 battery cells and multiple fast charging strategies.
3. One-Time Prediction Method for Capacity Fade Curves
In this section, we explore a comprehensive approach to predict the capacity fade curve. This section comprises three subsections, each addressing a specific aspect of the methodology.
Section 3.1 provides an overview of the feature extraction method as well as the method of quantifying the capacity fade curve into labels.
Section 3.2 delves into the principles of the TM-Seq2Seq model, which serves as the core framework for our prediction methodology. Lastly,
Section 3.3 presents the capacity fade curve parameter equations used to fit the labels into smooth and accurate representations of the actual fade curves.
3.1. Feature Structure and Label Structure
Although the various fast charging strategies strongly affect the rate of battery capacity degradation, most batteries do not exhibit significant capacity fade within the first 100 charge–discharge cycles, as shown in Figure 3a. Capacity data alone therefore cannot reflect the battery's aging rate, and it is necessary to identify features that capture the aging rate and trend and map them onto the future capacity fade curve. The voltage–discharge capacity curve is an important curve that reflects the battery's discharge voltage at different states of charge. Many studies have extracted battery aging features from the Q(V) (discharge capacity–voltage) curve and used them for a range of prediction tasks. As the battery ages, its Q(V) curve changes. Figure 3b illustrates the difference between the battery's Q(V) curves in the first cycle and the 100th cycle: the horizontal axis shows the difference in capacity between the two cycles, and the vertical axis shows the discharge voltage.
The extracted features need to reflect the changes in the discharge Q(V) curve. Considering this aspect, two sets of temporal features are extracted. The specific feature extraction method is as follows.
Firstly, to ensure accurate feature extraction, it is important to establish a specific voltage range. The high-rate constant current discharge often leads to a reduction in the initial voltage, which can create difficulties in measuring voltages within the 3.5 V to 3.6 V range. Consequently, to address these challenges, we have opted to focus on extracting features within the voltage span of 2.0 V (discharge cut-off voltage) to 3.5 V. The discharge capacity values between 2.0 V and 3.5 V on the Q(V) curve are sampled at equal voltage intervals, generating a vector $Q_i = [Q_i(V_1), Q_i(V_2), \ldots, Q_i(V_m)]$ composed of the discharge capacity values, where $i$ is the cycle number and $Q_i(V_k)$ is the discharge capacity at a voltage of $V_k$.
Next, the difference $\Delta Q_i$ of the vector $Q_i$ for each cycle relative to the 100th cycle is calculated using Equation (1). This difference reflects the variations in the Q(V) curve.
Lastly, the mean and variance of the vector $\Delta Q_i$ are used to construct two features: the variance feature is calculated as shown in Equation (2), and the mean feature is computed following Equation (3), where $F_{\mathrm{var}}^{(j)}$ and $F_{\mathrm{mean}}^{(j)}$ are the features and $j$ indexes the $j$-th battery, which also corresponds to the $j$-th sample. These features reflect the speed and trend of battery aging within the first 100 cycles.
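For illustration, the feature construction described above can be sketched in a few lines of Python. The function and variable names (sample_qv, extract_features, the 200-point voltage grid) are illustrative assumptions rather than the original implementation; only the voltage window and the differencing against the 100th cycle follow the description above.

import numpy as np

def sample_qv(voltage, capacity, v_grid):
    # Resample one discharge Q(V) curve onto a fixed voltage grid.
    voltage = np.asarray(voltage)
    capacity = np.asarray(capacity)
    order = np.argsort(voltage)            # np.interp requires increasing x
    return np.interp(v_grid, voltage[order], capacity[order])

def extract_features(cycles, n_points=200, v_lo=2.0, v_hi=3.5):
    # cycles: list of (voltage, capacity) arrays for cycles 1..100.
    v_grid = np.linspace(v_lo, v_hi, n_points)
    q = np.stack([sample_qv(v, c, v_grid) for v, c in cycles])   # (100, n_points)
    dq = q - q[-1]                          # difference relative to the 100th cycle
    f_var = dq.var(axis=1)                  # variance feature, one value per cycle
    f_mean = dq.mean(axis=1)                # mean feature, one value per cycle
    return f_var, f_mean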
When the actual capacity of the battery reaches 80% of its rated capacity, the battery life ends. The end-of-life capacity of the battery can be calculated based on the rated capacity of the battery, as shown in Equation (4).
where $C_{\mathrm{nom}}$ is the nominal capacity of the battery and $C_{\mathrm{EOL}}$ is the end-of-life (EOL) capacity of the battery.
To achieve a reliable prediction of the capacity decay trajectory from the 100th cycle to the end of life, we establish a target label that accurately reflects this progression. To keep the label length uniform across samples, 10 points are evenly selected along the capacity decay curve; this choice mitigates the risk of overfitting caused by high label dimensionality when sample data are limited. Therefore, 10 capacity values are uniformly sampled between $C_{100}$ (the capacity at the 100th cycle) and $C_{\mathrm{EOL}}$. These points can be obtained using Equation (5).
Next, to determine the coordinates of these 10 capacity values, we only need their respective cycle numbers. The cycle numbers corresponding to these 10 capacity values are set as the labels for prediction, as shown in Equation (6).
where $n_k^{(j)}$ is the number of cycles required to reach the capacity value $C_k^{(j)}$ (in particular, the final element corresponds to $C_{\mathrm{EOL}}$ and is the EOL). The vector $y^{(j)} = [n_1^{(j)}, n_2^{(j)}, \ldots, n_{10}^{(j)}]$ is the label for prediction, and $j$ indexes the $j$-th battery, which also corresponds to the $j$-th sample.
The initial measured capacity of different batteries deviates from the nominal capacity, so the numerical range of the capacity vector $C^{(j)}$ varies from battery to battery. Additionally, the elements of the capacity vector $C^{(j)}$ and of the label vector $y^{(j)}$ are mutually related. For these reasons, the vector $C^{(j)}$ is used, together with the extracted features, to predict the label $y^{(j)}$. In summary, the features to be input into the model and the label to be predicted are shown in Equation (7).
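The label construction can be sketched similarly. This is a minimal example assuming the full measured capacity-per-cycle series is available; whether the endpoints $C_{100}$ and $C_{\mathrm{EOL}}$ are themselves included among the 10 points depends on Equation (5), which is not reproduced here, so the sketch simply excludes $C_{100}$ and includes $C_{\mathrm{EOL}}$.

import numpy as np

def build_label(capacity_per_cycle, c_nom, n_points=10):
    cap = np.asarray(capacity_per_cycle)      # capacity at cycle 1, 2, ... (full life)
    c_eol = 0.8 * c_nom                       # end-of-life capacity, Equation (4)
    c_100 = cap[99]                           # capacity at the 100th cycle
    # Target capacities between C_100 and C_EOL (endpoint handling is assumed).
    c_targets = np.linspace(c_100, c_eol, n_points + 1)[1:]
    label = []
    for c_k in c_targets:
        # First cycle whose capacity has dropped to (or below) the target;
        # assumes the measured curve actually reaches EOL.
        idx = int(np.argmax(cap <= c_k))
        label.append(idx + 1)                 # convert index to cycle number
    return c_targets, np.array(label)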
3.2. The Structure of TM-Seq2Seq Model
The original feature sequences used to predict the capacity fade curve constitute a high-dimensional input for this sequence-to-sequence regression task. Feeding these sequences directly into a recurrent neural network (RNN) could hinder the GRU's ability to retain information over long spans. Furthermore, the time-series features fed into the model carry diverse physical interpretations, so their relative importance must be weighed carefully to improve prediction accuracy. To address these challenges, the TM-Seq2Seq model was designed around the sequence-to-sequence architecture [26], as shown in Figure 4. First, the original features are learned and dimensionality-reduced through strided convolutions. An SE-net is then added after the convolutional layers to assign weights to the features derived from the different convolutional channels. Next, an encoder composed of GRU layers encodes the input sequence. Finally, a decoder composed of GRU layers predicts the label $y^{(j)}$. In addition, because the elements of the capacity vector $C^{(j)}$ and the label $y^{(j)}$ are mutually related, the vector $C^{(j)}$ is used as input to the decoder. Furthermore, considering that the capacity fade curve approximately follows an exponential fade, the trend features of the encoding vectors of different samples are constrained. The specific roles and principles of the SE-net, the GRU, and trend matching are introduced below.
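The following PyTorch-style sketch shows one way the data flow just described could be wired together (strided convolution, channel weighting, GRU encoder, and a GRU decoder that also receives the capacity vector C). Layer sizes, kernel sizes, and the way C is fed to the decoder are assumptions for illustration; the original hyperparameters and exact wiring are not reproduced here.

import torch
import torch.nn as nn

class SEBlock1d(nn.Module):
    # Channel attention (squeeze-and-excitation) for 1D feature maps.
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (batch, channels, length)
        z = x.mean(dim=-1)                     # squeeze: global average pooling
        s = self.fc(z)                         # excitation: channel weights
        return x * s.unsqueeze(-1)             # channel-wise re-weighting

class TMSeq2Seq(nn.Module):
    def __init__(self, in_channels=2, conv_channels=16, hidden=64, horizon=10):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, conv_channels, kernel_size=5, stride=2)
        self.se = SEBlock1d(conv_channels)
        self.encoder = nn.GRU(conv_channels, hidden, batch_first=True)
        # Decoder consumes one element of the capacity vector C per step.
        self.decoder = nn.GRU(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, features, c_vec):
        # features: (batch, in_channels, 100); c_vec: (batch, horizon)
        h = self.se(torch.relu(self.conv(features)))        # (batch, C, L')
        _, enc_state = self.encoder(h.transpose(1, 2))      # encode along time
        out, _ = self.decoder(c_vec.unsqueeze(-1), enc_state)
        # Returns the predicted label and the encoding vector (used for trend matching).
        return self.head(out).squeeze(-1), enc_state.squeeze(0)

# Example call with dummy data:
# model = TMSeq2Seq()
# pred_labels, encoding = model(torch.randn(4, 2, 100), torch.rand(4, 10))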
3.2.1. SE-net
SE-Net (Squeeze-and-Excitation Network) [27] is an attention mechanism used to enhance the performance of convolutional neural networks (CNNs). By introducing a channel attention mechanism, SE-Net enables the network to dynamically learn the importance of each channel. First, for the input features, SE-Net performs a global average pooling operation to compress the features of each channel into a single value, which represents the global statistical information of that channel. Second, SE-Net introduces a two-layer fully connected network to learn the weight of each channel. Finally, multiplying the learned channel attention weights with the original features strengthens important features and weakens useless ones. The formula for the "squeeze" operation is shown in Equation (8).
where $U$ is a set of input features, $u_c$ is the feature map of one channel, and $z_c$ is the corresponding global statistical information.
The “excitation” operation is shown in Equation (9). The channel dependency is obtained through a simple gating mechanism.
where $W_1$ and $W_2$ are the weight matrices to be optimized, and $s$ represents the channel weight vector.
The output of the SE-net is shown in Equation (10).
where $\odot$ represents channel-wise multiplication.
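To make the three SE steps concrete, the snippet below applies them to a dummy 1D feature map with explicit tensor operations; the batch size, channel count, and reduction ratio are arbitrary illustrative choices.

import torch

u = torch.randn(8, 16, 48)                      # (batch, channels, length) feature maps
w1 = torch.randn(4, 16)                         # first FC layer (reduction ratio 4)
w2 = torch.randn(16, 4)                         # second FC layer

z = u.mean(dim=-1)                              # "squeeze": global average pooling, Equation (8)
s = torch.sigmoid(torch.relu(z @ w1.T) @ w2.T)  # "excitation" gating, Equation (9)
out = u * s.unsqueeze(-1)                       # channel-wise re-weighting, Equation (10)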
3.2.2. GRU
GRU (Gated Recurrent Unit) [28] comprises two gates: the update gate and the reset gate. The reset gate determines the influence of the old hidden state on the current state, while the update gate controls the influence of the new input on the current state. The update equations are as follows:
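The equations themselves are not reproduced in this excerpt; for reference, a common formulation of the GRU update, in the notation used below (the sign convention for the update gate varies between references), is:

$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$
$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$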
where $x_t$, $h_t$, and $\tilde{h}_t$ are the input data, output hidden state, and candidate hidden state at time $t$. The parameters in the weight coefficient matrices, including $W_z$, $W_r$, and $W_h$ for the input data and $U_z$, $U_r$, and $U_h$ for the hidden states, are learned during training, together with the corresponding biases $b_z$, $b_r$, and $b_h$. Compared to LSTM, GRU uses fewer parameters, which results in faster training and testing. In addition, its simpler structure helps alleviate the vanishing-gradient problem, so GRU can potentially perform better than LSTM on long-sequence data.
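The parameter-count claim can be checked directly: for equal input and hidden sizes, a GRU layer has three gate blocks to an LSTM's four, i.e., roughly one quarter fewer parameters. A quick check (layer sizes are arbitrary):

import torch.nn as nn

gru = nn.GRU(input_size=16, hidden_size=64)
lstm = nn.LSTM(input_size=16, hidden_size=64)
print(sum(p.numel() for p in gru.parameters()))    # 15744: three gate blocks
print(sum(p.numel() for p in lstm.parameters()))   # 20992: four gate blocks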
3.2.3. Trend Matching
To fit complex and diverse input–output data effectively, appropriate constraints must be imposed on the model design, given the limited training data and the significant differences between samples. Such constraints should not alter the model's structure; rather, they should enhance its generalization ability by limiting the range of parameters it can learn from the data. In many domains, special loss functions are used to constrain the outputs of intermediate layers of a network and thereby restrict the model's learning space [29,30]. The loss function plays a pivotal role in this process and must be carefully designed to match the data conditions and the intended objective. With these factors in mind, the model is constrained as follows:
Firstly, based on the commonality that the capacity degradation curve roughly follows an exponential fade trend, Equation (15) is designed to quantify the trend feature of a vector.
where $y$ represents the label and $T(y)$ represents its trend feature.
Secondly, the labels of different batteries were compared and their trend features analyzed, as shown in Figure 5. Despite significant variation in the labels of different batteries, their trends are similar. By imposing constraints on the feature encodings of different samples, the encodings are made to follow a shared pattern with relatively low disorder, mirroring the labels. This reduces the confusion associated with the various feature encodings and makes the relationship between feature encodings and labels clearer and more effective, which in turn improves the performance and learning capability of the model.
Finally, Equation (16) is used as the loss function during model training. The purpose of this loss function is to minimize the differences in trend features of feature encodings across different samples. In addition, Equation (19) is also used as a loss function to fit the labels.
where $E(\cdot)$ is the feature extractor consisting of the CNN, SE-net, and GRU encoder, $D(\cdot)$ is the GRU decoder, and $E(x_i)$ is the encoding vector of the $i$-th battery.
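Because Equations (15), (16), and (19) are not reproduced in this excerpt, the snippet below is only a schematic sketch of how a trend-matching penalty on the encoding vectors can be combined with a label-fitting loss. The trend_feature function (first differences normalized by the total change) is a stand-in assumption rather than the paper's Equation (15), and the weighting factor alpha is likewise hypothetical.

import torch

def trend_feature(v, eps=1e-8):
    # Stand-in trend descriptor: first differences normalized by the total change.
    d = v[:, 1:] - v[:, :-1]
    return d / (v[:, -1:] - v[:, :1] + eps)

def tm_loss(encodings, pred_labels, true_labels, alpha=0.1):
    # Label-fitting MSE plus a penalty on the spread of trend features across samples.
    fit = torch.mean((pred_labels - true_labels) ** 2)
    t = trend_feature(encodings)
    match = torch.mean((t - t.mean(dim=0, keepdim=True)) ** 2)
    return fit + alpha * match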
3.3. Parameter Equation of Capacity Fade Curve
When parameterizing the capacity fade curve, a reasonable parameter equation should be selected according to the actual situation to ensure accuracy and reliability. The exponential model is a common mathematical model that many studies use to fit battery capacity curves. In the dataset of this paper, some batteries already experience capacity fade within the first 100 cycles, and the known information from these cycles, such as the initial capacity fade rate, should be fully utilized. Therefore, we represent the change in the battery capacity fade rate as the sum of the initial capacity fade rate $k_{100}$ and an exponential model, where $k_{100}$ is the capacity fade rate of the battery at the 100th cycle and the exponential model describes how the fade rate evolves. Specifically, the change in capacity fade rate is shown in Equation (20).
where $n$ represents the number of cycles experienced by the battery after the 100th cycle, $k_{100}$ is the capacity fade rate of the battery at the 100th cycle, $k(n)$ represents the capacity fade rate of the battery at the $(100+n)$-th cycle, and $a$, $b$, and $c$ are three parameters to be optimized. Integrating Equation (20) yields the relationship between the capacity $C(n)$ and $n$, as shown in Equation (21).
When $n = 0$, $C(0) = C_{100}$ (where $C_{100}$ is the capacity at the 100th cycle). Substituting this into Equation (21) determines the constant of integration:
The capacity of the first 100 cycles is used to determine the initial capacity fade rate. Afterwards, the unknown parameters in Equation (24) are optimized using the nonlinear least squares method so that Equation (24) best fits the 10 future capacity fade curve sampling points predicted by the TM-Seq2Seq model.
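As an illustration of this fitting step, the sketch below uses SciPy's nonlinear least squares routine (curve_fit). Because Equations (20)–(24) are not reproduced in this excerpt, the functional form fade_curve (fade rate $k_{100}$ plus an exponential term, integrated over the cycle number) is an assumed reconstruction, and the values of C_100, k_100, and the sampling points are dummies.

import numpy as np
from scipy.optimize import curve_fit

def fade_curve(n, a, b, c_100=1.0, k_100=1e-4):
    # Illustrative capacity model: fade rate k_100 + exponential term, integrated over n
    # (n counts cycles beyond cycle 100); c_100 and k_100 are known, a and b are fitted.
    return c_100 - k_100 * n - a * (np.exp(b * n) - 1.0)

# Dummy sampling points predicted by the model: cycle numbers (beyond 100) and capacities.
pred_cycles = np.array([150, 210, 270, 330, 390, 450, 500, 550, 600, 650]) - 100
pred_caps = np.linspace(0.98, 0.80, 10)

params, _ = curve_fit(fade_curve, pred_cycles, pred_caps, p0=[1e-3, 1e-2], maxfev=10000)
full_n = np.arange(0, pred_cycles[-1] + 1)
full_curve = fade_curve(full_n, *params)        # smooth predicted capacity fade curve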
4. Results and Discussion
In this section, the prediction results of our model and the fitting results of the capacity fade curve are presented, and the prediction errors of different models are compared.
To comprehensively evaluate the prediction of capacity fade curves, root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are used as error metrics to assess the accuracy of the estimation results. The predicted label $\hat{y}^{(j)}$ reflects the rate and trend of capacity fade, and its last element corresponds to the battery's end of life (EOL). The EOL prediction is crucial for formulating maintenance and recycling plans for the battery. Therefore, the prediction errors of the sampling sequence and of the EOL are calculated separately, as shown below:
where $n$ represents the total number of test samples. RMSE squares the estimation errors, penalizing larger errors more heavily and better reflecting the differences between model estimates and actual observations. MAE takes the absolute value of the estimation errors, directly measuring the average difference between estimated and observed values. MAPE expresses the estimation errors as percentages, intuitively reflecting their magnitude relative to the observed values and facilitating comparison of model performance across datasets.
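For completeness, the three metrics as they are conventionally computed (applied separately to the predicted sampling sequence and to the EOL element):

import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))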
To demonstrate the effectiveness of the model, comparative experiments were conducted against other popular machine learning architectures. All models were built, trained, and tested in Python 3.9. The models included in the comparison are as follows:
Seq2Seq: The Seq2Seq model consists of an encoder and a decoder. The encoder transforms the input sequence into a fixed-length vector, which the decoder then uses to generate the output sequence. This model has demonstrated excellent performance in tasks such as predicting capacity fade curves [15].
CNN: The CNN model captures local features through convolution operations and reduces the number of parameters and the computational complexity through pooling operations. It is also proficient at learning from time series data. This model has been widely applied to tasks such as battery EOL prediction [18] and capacity fade curve prediction [22,23].
CNN-BI-LSTM: BI-LSTM is a recurrent neural network model widely used in natural language processing tasks such as named entity recognition and sentiment analysis. It comprises forward and backward LSTM layers, allowing it to use contextual information at each time step and better capture long-term dependencies. This model has been applied to battery SOH prediction [31].
SE-CNN-LSTM: SE-CNN-LSTM is a model that combines Squeeze-and-Excitation (SE) module, Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM). It leverages the SE module to enhance the feature representation of CNN, which allows the model to capture more informative features.
Each model is trained and tested under similar conditions to ensure a fair comparison. We conducted 10 experiments. In each experiment, the 132 battery cells were randomly divided into a training set (~60%), a validation set (~20%), and a test set (~20%). The model parameters were trained on the training set, and the parameters with the highest accuracy on the validation set were saved to ensure performance on unseen data (generalization). The test data were then fed into the saved model to obtain prediction results. Finally, the average error over the 10 experiments was calculated for model comparison. The comparative results are presented in Table 1.
To justify incorporating the trend matching method and SE-Net, ablation experiments were performed on both components. The experimental results are shown in Table 2.
In addition, to visually represent the error distribution across batteries, the distribution of the MAPE of all batteries is plotted in Figure 6. The width of each bar represents the corresponding error interval, and the height represents the number of batteries in that interval.
Figure 7 shows the predicted results of the EOL.
The 27 batteries in the test set were ranked by prediction accuracy, with the first-ranked prediction being the most accurate and lower-ranked predictions having larger errors. To showcase the fitting results of the capacity fade curve, nine batteries were selected for demonstration: the 1st, 4th, 7th, 11th, 14th, 17th, 20th, 24th, and 27th, as shown in Figure 8.
The TM-Seq2Seq model achieves the best overall performance in predicting both the capacity degradation trajectory and EOL. The ultimately predicted capacity degradation curve accurately reflects the true rate and trend of capacity fade. Additionally, incorporating both SE-net and trend matching methods improves the overall performance of the model. However, adding SE-net alone increases prediction errors. SE-net introduces channel importance weights, which enhance the model’s learning capability but also increase the risk of overfitting.
5. Conclusions
In conclusion, our study focuses on predicting the capacity fade curves of lithium-ion batteries under different fast charging strategies. We proposed a battery capacity degradation trajectory prediction method that uses data from the first 100 cycles to predict the future capacity fade curve and end of life (EOL) in a single pass. First, features are extracted from the discharge voltage–capacity curve. Second, the TM-Seq2Seq model based on CNN, SE-net, and GRU is designed. Finally, a trend matching loss function is designed based on the common characteristics of capacity fade curves. The experiments support the following conclusions: the SE-net enhances the model's fitting capability, allowing it to capture more intricate patterns in the data; the trend matching method suits the data conditions of this study and improves the model's performance on unseen test data; and, overall, the TM-Seq2Seq model improves the accuracy of one-pass predictions of battery capacity fade curves under various fast charging strategies.
In the future, we will further optimize this method to improve prediction performance and generalization ability. We will also extend the idea of trend feature distribution matching to other machine learning and deep learning methods to further improve the accuracy and stability of capacity fade curve prediction. The method proposed in this paper is not only applicable to battery data processing but can also be applied to similar problems in other fields, with strong versatility and scalability. This method can improve the practicality and reliability of models and is of great significance for solving practical problems.