1. Introduction
Lithium-ion batteries have found extensive applications in electric vehicles and renewable energy storage systems [1,2] owing to their high energy density, prolonged cycle life, minimal self-discharge, and cost-effectiveness. However, lithium-ion batteries are constantly charged and discharged during use, resulting in inevitable aging [3], which manifests as decreased capacity and increased internal resistance. Monitoring the state of health (SOH) of lithium-ion batteries is crucial for ensuring their stability and safety. SOH is defined as the ratio of the current maximum discharge capacity to the initial capacity [4,5] and serves as a widely adopted metric for assessing battery performance.
Due to the nonlinear, time-varying nature and complex dynamics of lithium-ion batteries, the SOH cannot be obtained directly from observational measurements. Current SOH estimation methods are typically categorized into model-based and data-driven approaches. Model-based methods construct accurate equivalent battery models to simulate battery behavior, such as equivalent circuit models [6], electrochemical models [7], and empirical degradation models [8]. These models are usually combined with filtering techniques [9,10] to achieve precise SOH estimation. However, model-based approaches necessitate specialized knowledge, rely heavily on accurate parameterization, and are typically applicable only to specific types of lithium-ion batteries, limiting their generalizability across battery types.
Unlike model-based approaches, data-driven approaches can accurately estimate the SOH of lithium-ion batteries from historical data alone, without requiring mastery of the complex electrochemical mechanisms inside the battery. A data-driven approach establishes a mapping from a health indicator (HI) to SOH through an algorithm, and its accuracy depends largely on the quality of the selected HI and the learning ability of the training algorithm. Prominent data-driven techniques encompass support vector machines (SVMs) [11,12], Gaussian process regression [13,14,15], and neural network algorithms [16,17,18,19,20,21,22,23,24]. In [11], an SVM model with a radial basis kernel was developed to estimate SOH. Gaussian process (GP) regression was proposed in [13] for SOH estimation, highlighting the GP's advantages over other prediction methods. Among the neural network methods, Lin et al. [23] proposed a method to estimate SOH based on the BP neural network. Recurrent neural networks (RNNs) excel at processing time-series data, and the long short-term memory (LSTM) network, a variant of the RNN, mitigates the vanishing- and exploding-gradient problems; an LSTM-based SOH estimation method was proposed in [21]. The authors of [18] utilized convolutional neural networks (CNNs), which adaptively extract important features, to build an SOH estimation model, while [19] combined the advantages of the CNN and the GRU to estimate SOH. A graph convolutional network (GCN)-based approach was proposed in [24]. In addition, neural network algorithms such as the transformer [20] and the deep belief network (DBN) [17] have also been used for SOH estimation.
Even with a powerful nonlinear model, extracting an appropriate HI is the key to accurately estimating SOH. The relatively stable charging process of lithium-ion batteries is considered a significant source of HIs, as opposed to the erratic discharging behavior influenced by driving patterns and road conditions. Xiong et al. [25] identified the charging capacity within a specific voltage interval during constant-current charging as the HI. In [26], the maximum radius of curvature of the current during the constant-voltage stage was proposed as the HI, while [27] proposed the isochronous voltage change during constant-current charging and the isochronous current change during constant-voltage charging as HIs. Zhu et al. [28] extracted the relaxation voltage from post-charge data and used its variance, maximum, and skewness, among other metrics, as HIs. Shen et al. [18] utilized the discrete values of voltage, current, and capacity during the charging cycle as the HI.
In recent years, several studies have identified incremental capacity analysis (ICA) as a highly effective method for estimating SOH. IC curves provide high resolution, especially in the plateau region of charging and discharging, and information about the battery's operating process can be extracted from the peak value, peak position, envelope area, and peak location of the curve [29,30,31,32]. Li et al. [33] extracted features from incremental capacity curves and identified the best HI through grey relational analysis. In [34], the SOH of lithium-ion batteries was estimated based on the incremental capacity curve and GPR. These feature extraction methods often rely on the complete charging process. However, in practical scenarios, numerous stochastic charging and discharging behaviors mean that the initial state of charge (SOC) is essentially arbitrary when a vehicle begins charging. Consequently, obtaining comprehensive data on full charging and discharging cycles is challenging. Some researchers have instead input partial IC curves directly into the algorithmic model. In [23,35], GPR and BP neural networks, respectively, were fed partial incremental capacity curves to estimate SOH. The authors of [36,37] input partial incremental capacity curves into LSTM and CNN models, respectively, and explored the effect of the length and location of the partial curves on SOH estimation. These studies preliminarily verified the feasibility of estimating SOH using partial incremental capacity curves.
However, SOH estimation based on such data-driven approaches still faces critical challenges. The first is that model accuracy relies heavily on a large amount of specific charging and discharging data, while the initial SOC during vehicle charging is arbitrary, leading to inconsistencies between the collected partial-segment data and the data used for training. For example, if the training data cover the voltage range from 3.6 V to 3.8 V while the collected data cover 3.8 V to 4.0 V, this inconsistency can significantly degrade SOH estimation accuracy, and the problem is exacerbated when available data are limited. The second is that models often lack generalizability: there are frequently significant domain differences between training and test data, such as variations in battery type, charging and discharging rates, and ambient temperature, making it difficult to ensure prediction accuracy across domains. Regrettably, these issues commonly coexist in practical applications. Addressing these challenges, specifically leveraging limited partial data and enhancing generalizability in SOH estimation for lithium-ion batteries, is the primary emphasis of this paper.
To address these issues, this paper adopts a transfer learning approach to overcome the domain differences between test and training data, realizing cross-domain SOH estimation using only a limited portion of charging data. The main contributions are as follows:
(1) A method is proposed for estimating the SOH of lithium-ion batteries using partial incremental capacity (IC) curves. The study investigates the effectiveness of deriving input features from various voltage segments. This approach circumvents the necessity to analyze specific features of the IC curve and diminishes the requirement for voltage and current measurements to encompass a particular SOC range.
(2) With a transfer learning approach, it is possible to estimate the SOH across diverse battery types and different voltage segments under varying conditions, using only a small subset of the segmented data.
(3) Bayesian optimization is employed to determine the number of layers to be frozen for transfer learning while optimizing the other hyperparameters of the model in order to improve model generalization and learning ability.
3. Methodology
The schematic representation of the methodology proposed in this paper is illustrated in Figure 4, comprising three main components:
(1) Extraction of features: Raw physical quantities of the lithium-ion battery, such as voltage and current, are first measured; in the proposed method, the voltage segment may cover only part of the charging data. After gathering the voltage and current data, the partial IC curve is computed and then smoothed and filtered for further analysis (a minimal sketch of this step is given after this list).
(2) The pre-training phase: The study employs a stacked bidirectional gated recurrent unit (SBiGRU) model to extract features, which are then fed into a fully connected layer for direct estimation of SOH. Additionally, Bayesian optimization and K-fold cross-validation are utilized to optimize hyperparameters, thereby enhancing the model’s learning ability and generalizability.
(3) The transfer learning phase: A portion of the model parameters from the pre-training phase is frozen, and the model is fine-tuned using a limited amount of target domain data to eliminate the distributional differences of lithium-ion battery data across battery types and voltage segments under varying conditions. Bayesian optimization adjusts the number of frozen layers and the regularization factor α to optimize the performance of the transfer learning model.
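As a concrete illustration of step (1), the following minimal Python sketch computes and smooths a partial IC curve. The function name `ic_curve`, the 20-point grid (matching the input sequence length used in Section 3.1), and the Savitzky-Golay filter settings are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from scipy.signal import savgol_filter

def ic_curve(voltage, capacity, v_lo=3.6, v_hi=3.8, n_points=20):
    """Compute a partial incremental-capacity (dQ/dV) curve and smooth it.

    voltage/capacity: monotone arrays sampled during constant-current charging.
    Grid size, voltage window, and filter settings are illustrative.
    """
    grid = np.linspace(v_lo, v_hi, n_points + 1)
    q = np.interp(grid, voltage, capacity)   # charged capacity on a voltage grid
    ic = np.diff(q) / np.diff(grid)          # finite-difference dQ/dV per step
    return savgol_filter(ic, window_length=7, polyorder=3)  # smoothing
```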
3.1. The Gated Recurrent Unit
After extracting input features from the IC curves, an SOH estimation model must be constructed to establish the correlation between these features and SOH. This paper utilizes the gated recurrent unit (GRU), a variant of the recurrent neural network introduced by Cho et al. in 2014 [38]. The GRU addresses issues such as gradient vanishing and explosion in traditional models by incorporating two control gates, and it excels at modeling time-series data by capturing correlations across different time scales. Figure 5 illustrates the internal structure of the GRU.
When an input sequence $x = [x_1, x_2, \ldots, x_{20}]$ consisting of partially segmented charging data first enters the reset gate $R_t$, $R_t$ helps the GRU decide whether to retain previous information to capture short-term dependencies and is implemented through a sigmoid layer [38]:

$$R_t = \sigma\left(W_r \cdot [h_{t-1}, x_t] + b_r\right)$$

where $\sigma$ represents the sigmoid activation function, $h_{t-1}$ denotes the hidden state from the previous time step, and $W_r$, $b_r$ are the weights and bias of the reset gate, respectively. After the activation function, $R_t$ lies between 0 and 1. The next step is to obtain the candidate hidden state $\tilde{h}_t$ through the reset gate:

$$\tilde{h}_t = \tanh\left(W_h \cdot [R_t \odot h_{t-1}, x_t] + b_h\right)$$

where tanh is the hyperbolic tangent activation function and $\odot$ denotes element-wise (Hadamard) multiplication. After tanh, the candidate hidden state lies between −1 and 1. The parameter $R_t$ governs the extent to which information from the preceding hidden state is preserved (closer to 1) or discarded (closer to 0), which determines how much past information is carried forward to the current state. The candidate hidden state and the hidden state from the previous time step are then combined through the update gate to produce the updated hidden state. The update gate $Z_t$ controls how much of the current input is merged into the output hidden state, while weighing the previous long-term memory and capturing the long-term dependencies of the sequence. The update gate $Z_t$ and hidden state $h_t$ are expressed as follows:

$$Z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t] + b_z\right)$$

$$h_t = (1 - Z_t) \odot h_{t-1} + Z_t \odot \tilde{h}_t$$

where $W_z$ and $b_z$ denote the weights and bias of the update gate, respectively. As $Z_t$ approaches 0, the current hidden state relies increasingly on the previous hidden state; as $Z_t$ approaches 1, the current hidden state is predominantly influenced by the candidate hidden state.
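For readers who prefer code, the three gate equations can be transcribed directly. The sketch below uses explicit weight matrices acting on the concatenation $[h_{t-1}, x_t]$; tensor names are illustrative, and in practice PyTorch's built-in `torch.nn.GRU` fuses these operations.

```python
import torch

def gru_step(x_t, h_prev, W_r, b_r, W_z, b_z, W_h, b_h):
    """One GRU time step implementing the gate equations above.

    Each weight matrix acts on the concatenation [h_prev, x_t]."""
    hx = torch.cat([h_prev, x_t], dim=-1)
    R_t = torch.sigmoid(hx @ W_r + b_r)                       # reset gate
    Z_t = torch.sigmoid(hx @ W_z + b_z)                       # update gate
    h_cand = torch.tanh(
        torch.cat([R_t * h_prev, x_t], dim=-1) @ W_h + b_h)   # candidate state
    return (1 - Z_t) * h_prev + Z_t * h_cand                  # new hidden state
```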
In this paper, the stacked bidirectional gated recurrent unit (SBiGRU) is used for modeling. Figure 6 illustrates its bidirectional architecture [39]. This structure integrates information from both past and future contexts by stacking two independent GRUs: one processing the input sequence forward and the other in reverse. The final output is jointly determined by the hidden states of the two GRUs:

$$h_t = \left[\overrightarrow{h}_t,\, \overleftarrow{h}_t\right]$$

where $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ denote the forward hidden state and the backward hidden state, respectively. The bidirectional structure enhances the learning ability of the network and its ability to retain information from more distant moments.
Stacking GRUs constructs a deeper GRU model. Within the stacked model, the output of each bidirectional hidden layer not only propagates over time but also feeds into the subsequent bidirectional hidden layer. This enables the model to capture correlations between input features across different time scales more effectively, thereby establishing the nonlinear relationship between HI and SOH.
The proposed SBiGRU model is illustrated in Figure 7. In this model, the output of the final hidden layer is passed through a fully connected layer, and SOH is estimated by the output regression layer.
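A minimal PyTorch sketch of this architecture follows. The hidden size, layer count, and the choice of regressing from the final time step are illustrative assumptions; in the paper these hyperparameters are selected by Bayesian optimization.

```python
import torch.nn as nn

class SBiGRU(nn.Module):
    """Stacked bidirectional GRU followed by a fully connected regression head.

    Layer sizes here are placeholders for values chosen by Bayesian
    optimization."""
    def __init__(self, n_features=1, hidden=64, layers=2, dropout=0.2):
        super().__init__()
        self.bigru = nn.GRU(n_features, hidden, num_layers=layers,
                            bidirectional=True, dropout=dropout,
                            batch_first=True)
        self.fc = nn.Linear(2 * hidden, 1)   # 2x for forward+backward states

    def forward(self, x):            # x: (batch, seq_len=20, n_features)
        out, _ = self.bigru(x)
        return self.fc(out[:, -1])   # regress SOH from the last time step
```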
Precautions against overfitting are necessary due to the depth of the proposed SBiGRU model. Dropout [40] and weight decay [41] are combined to address this problem. Dropout randomly masks a subset of hidden neurons so that they do not participate in forward propagation during training, while all neurons contribute to the output in the final testing phase. Weight decay is a regularization technique wherein model complexity is reduced by penalizing large model weights, driving them toward smaller values. This is achieved by adding a penalty term to the loss function, typically the L2 norm of the model weights. The final loss function $L$ is represented as follows:

$$L = L_0 + \lambda \lVert w \rVert_2^2$$

where $L_0$ is the original loss function value, $\lambda$ is a hyperparameter controlling the strength of the weight decay, and $w$ is the model weight parameter.
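In PyTorch, both regularizers attach to standard components. Continuing the `SBiGRU` sketch above, the numeric values below are placeholders for hyperparameters that Bayesian optimization would tune.

```python
import torch

model = SBiGRU(dropout=0.2)                      # dropout ratio (tuned by BO)
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3,            # learning rate (tuned by BO)
                             weight_decay=1e-4)  # λ, strength of the L2 penalty
```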
3.2. Hyperparameter Optimization
In neural networks, hyperparameters are usually set manually before training. However, different combinations of hyperparameters can profoundly impact the model's learning ability and generalization; manual adjustment is inefficient, and the results may not be optimal. Bayesian optimization is widely recognized as an effective approach to this black-box problem of identifying optimal hyperparameter combinations. Unlike many other optimization algorithms, Bayesian optimization seeks the global optimum rather than settling for a local one. In this paper, the hyperparameters to be optimized are divided between the pre-training phase and the transfer learning phase. Pre-training hyperparameters include the number of units in each GRU hidden layer, the dropout ratio, the batch size, the learning rate, the number of epochs, and the weight decay coefficient. Transfer learning hyperparameters encompass the learning rate, the number of epochs, the batch size, the number of frozen layers, and the regularization term α.
Bayesian optimization takes its name from Bayes' theorem. Let $w$ denote the hyperparameters to be optimized. It is assumed that $w$ obeys a prior probability distribution $p(w)$ and that the likelihood of the $k$ observations given parameter $w$ is $p(D_{1:k} \mid w)$, so the posterior probability of $w$ is $p(w \mid D_{1:k})$:

$$p(w \mid D_{1:k}) = \frac{p(D_{1:k} \mid w)\, p(w)}{p(D_{1:k})}$$

where $D_{1:k}$ denotes the observed set, the subscript $1{:}k$ denotes observations 1 through $k$, and $p(D_{1:k})$ denotes the marginal likelihood. Bayesian optimization comprises two main components: the probabilistic surrogate model and the acquisition function. In this study, the surrogate is modeled by a Gaussian process (GP), and the objective function $f(x)$ represents the loss function of the SBiGRU model. The target observations $f(x_{1:k})$ are assumed to satisfy a Gaussian distribution [42]:

$$f(x_{1:k}) \sim \mathcal{N}\big(\mu(x_{1:k}),\, K(x_{1:k}, x_{1:k})\big)$$

where $\mu(\cdot)$ and $K(\cdot,\cdot)$ are the mean function and the covariance function (kernel) with learnable parameters. After $k$ observations, the posterior distribution of $f$ at a candidate point $x$ is

$$p\big(f(x) \mid D_{1:k}\big) = \mathcal{N}\big(\mu_k(x),\, \sigma_k^2(x)\big)$$

with mean and variance given by

$$\mu_k(x) = \mu(x) + K(x, x_{1:k})\, K(x_{1:k}, x_{1:k})^{-1} \big(f(x_{1:k}) - \mu(x_{1:k})\big)$$

$$\sigma_k^2(x) = K(x, x) - K(x, x_{1:k})\, K(x_{1:k}, x_{1:k})^{-1}\, K(x_{1:k}, x)$$
Given the posterior distribution, choosing the next candidate observation point $x_{k+1}$ requires the acquisition function. This paper adopts the expected improvement (EI) strategy, which assesses the potential improvement of the next observation relative to the current optimal solution in order to avoid local optima. Let the current optimal observation be $f_k^*$; the expected improvement $EI_k(x)$ can then be characterized as follows:

$$EI_k(x) = \mathbb{E}\big[\max\big(f_k^* - f(x),\, 0\big)\big]$$

The next observation point is then obtained from the following equation:

$$x_{k+1} = \operatorname*{arg\,max}_{x}\, EI_k(x)$$
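Under the GP posterior, the expectation above admits a well-known closed form. The following minimal sketch evaluates it for loss minimization; array names are illustrative, and `mu`/`sigma` stand for the GP posterior mean and standard deviation at candidate points.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for minimizing the validation loss.

    mu, sigma: GP posterior mean/std arrays at candidate points; f_best = f*_k."""
    sigma = np.maximum(sigma, 1e-12)                 # guard against zero variance
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```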
In addition, to enhance the model's generalization, K-fold cross-validation is implemented alongside Bayesian optimization. During training, the training data are divided into K segments; each iteration uses K − 1 segments for training and the remaining segment for validation, and the process is repeated K times. In this study, K is set to 4 to ensure equal sizes for the validation and test datasets. This approach helps obtain optimal hyperparameters that improve both predictive performance and model generalization.
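A minimal sketch of the 4-fold split with scikit-learn follows; `X_train` and `y_train` are placeholder arrays standing in for the IC-curve features and SOH labels.

```python
import numpy as np
from sklearn.model_selection import KFold

X_train = np.random.rand(52, 20)   # placeholder: 52 training cells x 20 IC points
y_train = np.random.rand(52)       # placeholder SOH labels

kf = KFold(n_splits=4, shuffle=True, random_state=0)
for tr_idx, val_idx in kf.split(X_train):
    X_tr, X_val = X_train[tr_idx], X_train[val_idx]
    y_tr, y_val = y_train[tr_idx], y_train[val_idx]
    # ... train a candidate model on (X_tr, y_tr) and score it on (X_val, y_val);
    # Bayesian optimization rates each hyperparameter set by the mean fold loss.
```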
3.3. Transfer Learning
Accurately estimating the state of health of lithium-ion batteries typically demands extensive, specific aging data for training a dedicated model; if the distribution of these data changes, the model's estimation accuracy diminishes. Such distribution shifts arise from variations in internal mechanisms and charging/discharging schemes among different battery types. Moreover, the input features in this study comprise IC curves from specific charging segments, introducing variability in the charging segments of the collected data. Directly applying a trained model for SOH estimation can therefore create a domain discrepancy between the source domain (the trained model) and the target domain (the new SOH estimation task), leading to inaccurate capacity estimates. Retraining the model is the common remedy, but it requires collecting sufficient new data in advance, which is time-consuming and expensive, rendering it impractical in industrial settings. Transfer learning (TL) is therefore better suited to practical industrial applications than traditional machine learning schemes.
Despite the domain differences between source and target data, similarities always remain between them; leveraging source-domain information that is pertinent to the target domain is valuable for enhancing capacity estimation performance when target data are limited. Therefore, this paper proposes a transfer learning method employing a fine-tuning strategy: the SBiGRU is pre-trained on the source domain, and the differences between the source and target domains are eliminated through parameter fine-tuning. Moreover, the robust learning capability cultivated with ample source data during pre-training effectively enhances performance in the target domain.
Freezing part of the pre-trained model parameters and training only the remainder accelerates model convergence, effectively reuses the information gleaned from the source domain, and saves training time and computational resources. Based on the specific prediction task and data distribution, the number of model layers to freeze (denoted as n) is automatically optimized using the Bayesian optimization described in the preceding section. Once the n layers are identified, their parameters remain static throughout training and are not updated by backward gradient propagation. Additionally, because data in the target domain are limited, samples from the source domain are integrated into training to assist the transfer learning process. To mitigate overfitting to the source domain and ensure alignment with the target task, a regularization term α is incorporated into the loss function, resulting in the following expression:

$$L_{\mathrm{TL}} = L_{\mathrm{target}} + \alpha\, L_{\mathrm{source}}$$

where $L_{\mathrm{target}}$ and $L_{\mathrm{source}}$ are the losses computed on the target-domain and source-domain samples, respectively.
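A hedged sketch of the fine-tuning mechanics follows, assuming the additive loss form given above. The layer-name parsing relies on PyTorch's `nn.GRU` parameter naming convention (`weight_ih_l0`, `weight_hh_l1_reverse`, etc.); `freeze_gru_layers` and `tl_loss` are illustrative helpers, not the paper's code.

```python
import torch
import torch.nn.functional as F

def freeze_gru_layers(model, n_frozen):
    """Freeze the first n_frozen stacked BiGRU layers of the pre-trained model.

    nn.GRU names parameters like 'weight_ih_l0' or 'weight_hh_l1_reverse',
    so the layer index can be parsed from the name suffix."""
    for name, p in model.bigru.named_parameters():
        layer = int(name.split("_l")[-1].split("_")[0])
        p.requires_grad = layer >= n_frozen   # frozen layers receive no gradients

def tl_loss(pred_t, y_t, pred_s, y_s, alpha):
    """Fine-tuning loss: target-domain error plus alpha-weighted source error."""
    return F.mse_loss(pred_t, y_t) + alpha * F.mse_loss(pred_s, y_s)
```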
4. Discussion and Analysis of Experimental Results
To validate the effectiveness of the proposed partial incremental capacity curve and transfer learning for SOH estimation, simulation experiments are conducted in this section using the above datasets. The hardware and software used include an Intel Core i5-12400F CPU, an NVIDIA GeForce RTX 3060 Ti GPU, 32 GB of RAM, the Windows 10 operating system, Python 3.9.12, and the PyTorch environment. Prediction performance is evaluated using the root mean square error (RMSE), the mean absolute percentage error (MAPE), and R². These metrics are defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

Here, $y_i$ and $\hat{y}_i$ are the actual and model-predicted SOH values for the $i$th sample, respectively, $\bar{y}$ denotes the sample mean, and $N$ is the number of samples. The closer the MAPE and RMSE values are to 0, the better the model prediction; similarly, the closer R² is to 1, the better the model prediction.
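For reference, a direct NumPy transcription of the three metrics (with MAPE expressed as a fraction, consistent with the values reported in the tables):

```python
import numpy as np

def soh_metrics(y_true, y_pred):
    """RMSE, MAPE (as a fraction), and R^2, matching the definitions above."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mape, r2
```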
4.1. Pre-Training Phase
Dataset 1 is partitioned into training and test sets using the stratified sampling method, maintaining a ratio of approximately 4:1 across all experimental conditions. Additionally, a 4-fold cross-validation is employed correspondingly. Specifically, 52 cells are randomly assigned to the training set from dataset 1, while the remaining 14 cells form the test set.
4.1.1. Comparison Results of Pre-Trained Models with Different Voltage Segments
The highly nonlinear nature of the IC curve necessitates investigating how extracting partial segment data from different voltage ranges affects SOH estimation. The starting voltage of the charging process increases gradually with the number of cycles; in this study, the starting voltage is consistently set at 3.4 V, with a cutoff voltage of 4.2 V. Each voltage segment spans 0.2 V to ensure comprehensive data capture, yielding the ranges 3.4–3.6 V, 3.6–3.8 V, 3.8–4.0 V, and 4.0–4.2 V. Comparative results are detailed in Table 2.
The findings in Table 2 clearly indicate that using the 3.6–3.8 V voltage segment as the input yields the most favorable outcomes across all three evaluation metrics, affirming the superiority of this proposed input voltage segment. It is also noteworthy that the 3.8–4.0 V and 4.0–4.2 V segments are comparatively less effective than the 3.4–3.6 V and 3.6–3.8 V segments, which can be attributed to the overlapping of the IC curves beyond 3.8 V. Figure 8 illustrates the estimated results for a randomly selected cell from the five cases in the test set; for clarity of presentation, 30 cycles of SOH values are depicted for each cell.
4.1.2. Comparison Results of Different Input Features
The input proposed in this study is a partial segment of the incremental capacity, specifically the IC values within the 3.6–3.8 V voltage range. To ascertain the superiority of these input features, they are compared against other inputs, including IC values across the entire charging process as well as current and voltage data for both the entire charging duration and partial sections thereof. The comparison results are detailed in Table 3.
The results in Table 3 demonstrate that using partial incremental capacity as the input yields superior performance across all three evaluation metrics, confirming the efficacy of the proposed input features. Furthermore, features extracted from partial curves consistently outperform those from complete curves; this superiority may stem from redundant information in excessively long segments, which can obscure critical details in peak regions. Figure 9 illustrates the estimation outcomes for a randomly selected cell among the five cases in the test set.
4.1.3. Comparison Results of Different Models
To validate the efficacy of the proposed model, it is compared with other deep learning models: a CNN model for feature extraction, as well as LSTM and RNN models with the same structure as the proposed model. Comparisons were also conducted with machine learning models such as SVR and XGBoost, using features extracted from the IC curves, including the maximum, variance, and skewness. To ensure fairness, all models underwent hyperparameter optimization and 4-fold cross-validation to enhance robustness. Evaluation metrics for each model on the test set are detailed in Table 4. Notably, the proposed model demonstrates outstanding performance across all metrics, achieving a MAPE of 0.0029, an RMSE of 0.0033, and an R² of 0.9968, underscoring its effectiveness. Figure 10 depicts the estimation outcomes for a randomly selected cell among the five cases in the test set.
In addition, the proposed method in the pre-training phase is compared with other voltage characteristics and other state-of-the-art models proposed in the literature [28]; as shown in Table 5, the proposed method achieves better accuracy.
4.2. Transfer Learning Phase
This subsection validates the effectiveness of the proposed SBiGRU-TL method on the target domain dataset. The same stratified sampling method is used to divide dataset 2 into training and test sets. To underscore the benefits of transfer learning, only 10% of the 10 cells in dataset 2 are designated as the test set, and the training samples constitute only 1.6% of the total samples, simulating scenarios with limited data. Three methods are first compared in the transfer learning experiment:
(1) TL: The method proposed in this paper.
(2) Zero-TL: Following training on the source domain data, the base model weights are not changed, and the trained model is directly applied to estimate the SOH in the target domain.
(3) No-TL: The model undergoes direct re-training using the limited target domain dataset, after which it is utilized to estimate SOH.
As shown in Table 6, the transfer learning method achieves the smallest error, with a MAPE of 0.0033, an RMSE of 0.0039, and an R² of 0.9952. In contrast, Zero-TL exhibits the worst performance, indicating that the base model cannot directly estimate SOH in the target domain. The No-TL approach shows acceptable overall performance but suffers from significant estimation errors initially, attributable to insufficient training data leading to suboptimal early-stage performance. Figure 11 presents the estimation outcomes for a randomly selected cell among the three cases in the test set.
In addition, the literature [28] proposed adding a linear transformation layer before the base model to implement transfer learning. Comparing its RMSE metrics with those of the proposed method, as shown in Table 7, the proposed method also achieves better accuracy in the transfer learning phase.
In real industrial applications, obtaining the desired data is not always feasible. Transfer learning offers a solution when the training and test sets cover different voltage ranges. Specifically, IC values from the 3.6–3.8 V voltage segment in dataset 1 serve as the source domain data, while the 3.4–3.6 V, 3.6–3.8 V, 3.8–4.0 V, and 4.0–4.2 V voltage segments in dataset 2 act as the target domain data. Table 8 displays the evaluation metrics for each voltage range, indicating successful estimation of SOH across varying voltage segments.
Figure 12 displays the estimation outcomes for a randomly selected cell across the three cases in the test set. Notably, at a temperature of 25 °C, the 3.4–3.6 V and 4.0–4.2 V voltage segments show slightly inferior performance compared to the others. This discrepancy arises from significant distribution differences between these voltage segments and the source domain data; nevertheless, the deviation for these segments remains within 0.01.
4.3. Discussion on Bayesian Optimization
The proposed model utilizes Bayesian optimization to tune hyperparameters during both the pre-training and transfer learning phases. To validate the effectiveness of Bayesian optimization, a GRU with a configuration identical to the proposed model but without Bayesian optimization is used for comparison. Table 9 lists the hyperparameters used during the pre-training phase, and Table 10 lists those of the transfer learning phase.
Table 11 presents the evaluation metric values before and after optimization for both phases. In the pre-training phase, MAPE showed a 47% improvement and RMSE a 52% improvement; in the transfer learning phase, MAPE improved by 71% and RMSE by 72%. Figure 13 visualizes the estimation outcomes for a randomly selected cell in each phase.