1. Introduction
As human society pays increasing attention to climate and the environment, the use of lithium-ion batteries has grown significantly. Lithium-ion batteries offer advantages such as high energy density and long cycle life [1], and are widely used in various electric devices, including electric vehicles. With the popularization of lithium-ion battery applications, research on the state of lithium-ion batteries and related parameters has gained attention. Take the state of health of lithium-ion batteries as an example: as mileage increases, the number of charge and discharge cycles also increases, repeating the electrochemical reactions inside the battery. These reactions lead to irreversible loss of active materials, resulting in gradual performance degradation, most notably reflected in the reduction of the maximum available capacity. The state of health (SOH) of a battery is typically defined as the ratio of the current maximum available capacity to that of a new battery. This ratio reflects the potential output power of the battery. Therefore, an accurate assessment of the battery status is of great significance for ensuring the safety and stability of its operation [
2]. Research on battery SOH has gone through different stages of development. There are currently three primary approaches to estimating battery SOH: experimental methods, model-based methods, and data-driven methods.
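The SOH definition above reduces to a simple ratio. The following is a minimal sketch (variable names are ours, not from any battery management library):

```python
def soh(current_max_capacity_ah: float, nominal_capacity_ah: float) -> float:
    """State of health: ratio of the current maximum available capacity
    to the capacity of a new (nominal) battery."""
    return current_max_capacity_ah / nominal_capacity_ah

# e.g. a 2.0 Ah cell whose measurable capacity has faded to 1.6 Ah
print(soh(1.6, 2.0))  # → 0.8
```

A cell is commonly considered to have reached end of life when this ratio drops below a threshold such as 0.8, although the exact threshold depends on the application.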
The experimental method involves conducting tests on lithium-ion batteries under specific conditions, typically charge-discharge and pulse experiments, and obtaining the battery capacity through experimental data [
3]. While this method is applicable across various battery types, it is constrained by the limitations of the experimental environment [
4]. For instance, Scipioni et al. [
5] obtained the aging process of lithium-ion batteries by analyzing the battery structure including a positive electrode, negative electrode, electrolyte, etc. Liu et al. [
6] studied the measurement of battery SOH under conditions that do not affect the integrity of lithium-ion batteries.
Model-based methods for estimating battery SOH typically involve equivalent circuit models [
7] and electrochemical models [
8]. These models simulate the chemical changes and the current and voltage variations during the operation of lithium batteries. They are used to study the relationship between key factors, such as material concentration and reaction rate, and battery SOH. In this process, optimization methods such as Kalman filtering are commonly applied to improve the robustness of prediction and reduce data noise. Lüders et al. [
9] conducted in-depth research on the process of lithium plating and lithium stripping and built models based on this process. Lai et al. [
10] analyzed a variety of equivalent circuit models and performed global optimization based on different parameter identification. Yang et al. [
11] used constant voltage charging current measurement to assess battery SOH online. Eddahech et al. [
12] used impedance spectroscopy measurement combined with recurrent neural networks to monitor the health status of lithium-ion batteries. Although model-based methods offer explainability of battery behavior, the models themselves are difficult to construct, the calculations are relatively complex, and they place high demands on data accuracy and quality. Consequently, their practical implementation can be challenging.
Data-driven methods have received extensive attention and research due to their simplicity and accuracy. Since the historical data recorded over battery cycling cover the entire battery degradation process, a series of health features extracted from them are used as neural network inputs to predict the battery SOH. These health features are generally categorized into direct features and indirect features [
13].
Traditional machine learning methods for battery SOH include Backpropagation (BP) neural networks [
14], Support Vector Machine (SVM), extreme learning machines, random forests and annealing algorithms, etc. Building on these approaches, deep learning methods such as Convolutional Neural Network (CNN) [
15], Long Short-Term Memory (LSTM) [
16,
17], Gated Recurrent Unit (GRU) [
18], etc. have been combined to achieve more accurate predictions of battery SOH. Weng et al. [
19] used support vector regression for incremental capacity analysis to predict the health status of vehicle batteries. Dong et al. [
20] introduced particle filters into support vector regression to predict battery health status. Sbarufatti et al. [
21] introduced particle filters into radial basis functions to predict battery SOH. Lipu et al. [
22] conducted a comprehensive analysis of deep learning for battery state estimation in a battery management system (BMS). Chen et al. [
23] focused on the relevant features of the constant voltage stage and combined features to complete the health status assessment of lithium-ion batteries. Raman et al. [
24] studied recurrent neural networks (RNNs) and their variants to predict the battery SOH. Bao et al. [
25] employed swarm intelligence optimization techniques to improve the accuracy of prediction.
Different studies vary significantly in the selection of input health features for battery parameter prediction. Li et al. [
26] extracted features, such as voltage differences measured at the same time points on the charge and discharge curves, as inputs. Jia et al. [
27] introduced indirect features for Gaussian process regression to improve the accuracy of prediction. Lu et al. [
28] extracted health features with slope as the main object as input. In the process of using neural networks, many different health feature extraction methods are employed.
As more and more methods are proposed and improved in the data-driven field, researchers typically verify the performance of their methods by comparing the accuracy of the final prediction results. Data-driven methods have become prevalent in various fields due to their impressive predictive capabilities. A comprehensive review of previous studies on battery-related parameters reveals a common trend in data-driven research: despite employing novel algorithms or optimization methods, such studies frequently rely on identical input data when conducting comparative analyses with other methods. For example, Zhang et al. [
29] demonstrated their findings on battery State of Health (SOH) prediction, as shown in
Table 1. Similarly, Lin et al. [
30] reported their results on battery capacity prediction, which are summarized in
Table 2.
Despite their success, these models often have low interpretability, leading to their categorization as “black box” approaches. This characteristic highlights the importance of understanding how different network architectures interact with input data. Given the inherent differences in their internal structures, different machine learning models and neural networks exhibit varying dependencies on specific features. This implies that the optimal input configuration may differ from network to network, and the best input setup for one model may not align with the requirements of another. Therefore, comparing the performance of different networks under identical input conditions may lead to biased or incomplete conclusions. However, there has been a lack of comprehensive comparative studies addressing this issue: in previous research, almost all papers on algorithm optimization have employed identical input conditions, including the types of input features, their dimensionality, and data volume, among other factors.
For instance, from the perspective of input dimensionality, certain networks may exhibit overfitting as the dimensionality increases, whereas others demonstrate greater adaptability to higher-dimensional input data, achieving more accurate results as the dimensionality grows.
From the perspective of health indicator categories, most researchers rely on Pearson correlation coefficients to assess the strength of the relationship between health features and the target parameter, implicitly assuming a linear correlation. Some researchers employ Grey Relational Analysis (GRA) for feature evaluation, with the common goal of identifying features most closely related to the predicted parameter. Nonetheless, such criteria may not be universally applicable to all networks, underscoring the need for differentiated studies. This study investigates the dependency of different networks on correlation coefficients, aiming to determine the most suitable correlation criterion for each type of network.
From the perspective of input data volume, some networks are expected to achieve relatively accurate experimental results with a smaller amount of data, provided that a certain application standard is met. Investigating and analyzing the performance differences of various networks under different data input volumes can provide valuable insights into optimizing the time and process of data collection in practical applications.
Battery capacity can be used to calculate the battery health status and can intuitively reflect the relevant physical characteristics of the battery. This study explores the battery capacity prediction problem by comparing the performance differences of various networks when selecting different input features. The focus of the research lies in identifying the optimal input configuration for each type of network, with an emphasis on three key influencing factors: the correlation of input features, input dimensionality, and input data volume.
This study represents an investigation into input variability, so both classical machine learning algorithms and advanced deep learning models were selected to yield more representative results. For machine learning, the Particle Swarm Optimization Backpropagation (PSO-BP) algorithm and the Support Vector Machine (SVM) algorithm were selected, representing well-established approaches known for their effectiveness in handling structured and moderately sized datasets. PSO-BP leverages particle swarm optimization to enhance the convergence and accuracy of the backpropagation network, while SVM excels in solving classification and regression problems with clear boundaries.
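To illustrate the optimization component that PSO-BP adds to a plain backpropagation network, the following is a minimal particle swarm optimizer sketch (our own toy implementation, not the paper's code; in PSO-BP the objective would be the BP network's training error over its weight vector, whereas here a simple shifted sphere function stands in):

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: each particle tracks its personal
    best, and the swarm tracks a global best that pulls all particles."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective with its minimum at (1, 1); the swarm should converge near it
best, val = pso_minimize(lambda p: sum((x - 1.0) ** 2 for x in p), dim=2)
```

In PSO-BP, the vector found this way initializes the BP weights, after which ordinary gradient-based training refines them.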
On the deep learning front, the study utilized several hybrid architectures designed to capture complex relationships in high-dimensional data. These included the CNN-LSTM-Attention network, the CNN-GRU-Attention model, and the CNN-BiLSTM-Attention network. The CNN layers in these models extract spatial features from raw data, while the LSTM and BiLSTM layers specialize in processing temporal sequences, capturing long-term dependencies. The Attention mechanism further refines the models by assigning dynamic weights to input features, enabling the networks to focus on the most relevant aspects of the data. This combination of feature extraction, sequence modeling, and attention-based optimization allows the deep-learning models to achieve enhanced accuracy and robustness in capacity estimation.
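The attention mechanism's dynamic weighting can be sketched in a few lines. The following is an illustrative softmax attention pooling over recurrent hidden states (a generic sketch, not the exact layer used in the paper's networks; the relevance scores would normally be learned):

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(hidden_states, scores):
    """Weight a sequence of hidden-state vectors (e.g. LSTM/GRU outputs)
    by softmax-normalized relevance scores and return their weighted sum."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return pooled, weights

# Three time steps with 2-dim hidden states; the step scored 2.0
# dominates the pooled representation
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, w = attention_pool(h, scores=[0.1, 0.2, 2.0])
```

Because the weights sum to one, the network can emphasize the most informative time steps without discarding the rest of the sequence.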
The main contributions of this paper are as follows:
- (1)
The input health features were divided into three categories based on data availability and computational complexity. The Pearson and Spearman correlations between each health feature and the battery capacity were calculated and used to create input groups with varying correlation coefficients, in order to study the dependency differences of different networks on the two correlation coefficients.
- (2)
Study the dependency of different networks on input dimensions and identify the optimal input dimensions for each network. Investigating the changes in results caused by variations in input dimensions demonstrates that different networks differ in their optimal input dimensions.
- (3)
Evaluate the input training data requirements for different networks, focusing on the amount of data needed to achieve a specified level of predictive accuracy. Study the dependency of different networks on input data volume.
4. Experiment
This section constructs three types of experiments to study the impact of different inputs on different neural networks and to determine the optimal input for each network.
4.1. Correlation Experiment
In the current field of battery parameter prediction, almost all studies use the Pearson correlation coefficients of health features as the main reference for evaluating health features.
This article employs both the Pearson correlation coefficient and the Spearman correlation coefficient to evaluate each health feature. The Pearson correlation coefficient assesses the linear correlation between two data sets, while the Spearman correlation coefficient assesses monotonic rather than linear correlation; it is also known as the rank correlation coefficient. The Pearson correlation coefficient requires that the statistical data be continuous variables conforming to a normal distribution, whereas the Spearman correlation coefficient has no such requirement. The Pearson correlation coefficient is also less stable and less reliable when outliers or long-tailed distributions appear, while the Spearman coefficient is comparatively more robust.
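The distinction between the two coefficients can be demonstrated with a short sketch (pure-Python versions for clarity; the rank helper assumes no tied values, which holds for this example):

```python
def pearson(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx ** 0.5 * vy ** 0.5)

def spearman(x, y):
    """Spearman correlation: Pearson computed on the ranks of the data."""
    def ranks(v):  # assumes no ties
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))

# A monotonic but nonlinear relation: Spearman is exactly 1.0,
# while Pearson falls below 1 because the relation is not linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [v ** 3 for v in x]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

This is why a health feature that tracks capacity fade monotonically but nonlinearly can rank high under Spearman yet only moderately under Pearson.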
In this experiment, the input data was first grouped based on Pearson and Spearman correlation coefficients, respectively. The study then analyzed whether the impact of different input groups on various networks was consistent. This approach aims to identify the dependency differences of different networks on correlation coefficients, serving as the basis for subsequent experiments.
The experiment groups the experimental data according to the two correlation coefficients with battery capacity and first studies how different network models perform on the different input groups, in order to obtain the dependence of each network on the two correlation coefficients. The following table lists the correlation coefficient data:
From
Table 3, it is evident that the Pearson correlation of each health feature is HF5, HF2, HF6, HF3, HF8, and HF1 from high to low, and the Spearman correlation is HF5, HF9, HF7, HF8, HF3, and HF1 from high to low.
Figure 9 shows the differences between the two coefficients for the different features.
To verify the degree of dependence of different networks on health features with different correlations, this step of the experiment divides the input data into two groups: one group contains the three health features with the highest Pearson correlation, and the other contains the three health features with the highest Spearman correlation. These two groups of data are used to train the PSO-BP, SVM, CNN-LSTM-Attention, CNN-GRU-Attention, and CNN-BiLSTM-Attention networks, respectively. The training set accounts for 80% and the test set for 20%. All experiments are completed in MATLAB 2022 [
35]. All the experiments were conducted on a Dell G16 7630 laptop, equipped with a 13th Gen Intel
® Core™ i7-13650HX processor operating at 2.60 GHz. The graphics card utilized is an Nvidia GeForce RTX 4060 Laptop GPU.
Because the machine learning algorithm is relatively simple, MAE and MSE are selected for analysis. B5 is the experimental group and B18 is the control group. The results are presented in
Table 4 and
Table 5.
Then, the deep learning experiments were conducted, with MAE, MSE, RMSE, and R² selected for evaluation of the experimental results, yielding
Table 6 and
Table 7. B5 is the experimental group and B18 is the control group.
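The four evaluation metrics are standard; a minimal sketch of their definitions (our own helper, with hypothetical example values rather than the paper's data):

```python
import math

def metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and R² for a set of capacity predictions."""
    n = len(y_true)
    errs = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / n          # mean absolute error
    mse = sum(e * e for e in errs) / n           # mean squared error
    rmse = math.sqrt(mse)                        # root mean squared error
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errs) / ss_tot # coefficient of determination
    return mae, mse, rmse, r2

mae, mse, rmse, r2 = metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

MAE and RMSE share the units of the target (here, Ah), while R² is dimensionless with 1.0 indicating a perfect fit.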
Through the analysis of experimental results from
Table 4,
Table 5,
Table 6 and
Table 7, for the PSO-BP, SVM, CNN-LSTM, and CNN-BiLSTM networks, inputting health features with high Spearman correlation coefficients yields more accurate results, whereas for the CNN-GRU network, inputting health features with relatively high Pearson coefficients yields more accurate results. This influence differs in degree across networks: it has a greater impact on the PSO-BP and CNN-GRU networks, with a 45% reduction in MAE, while it has a smaller impact on the other networks. The SVM network is the least affected, with an MAE deviation of about 10%. The results are clearly shown in
Figure 10. Different neural network constructions embody different algorithmic logic for processing different tasks. Therefore, different networks emphasize different correlations among the input health features, which has a considerable impact on the accuracy of the results. In practical applications, selecting an appropriate correlation coefficient based on the characteristics of the network can enhance the accuracy of the results.
4.2. Input Dimension Experiment
In current research in related fields, researchers often compare the performance of different neural networks using the same input features. However, different neural networks may have different optimal input dimensions.
Building on the first experiment, a second experiment was conducted by gradually increasing the number of input features for each network, in order to observe the relationship between the degree of overfitting and the number of inputs for the different algorithms. Regarding the order in which the dimension was increased, the distinction between feature types was introduced at the sixth dimension, where features of the first, second, and third types were added respectively. B5 is the experimental group and B18 is the control group. The results of PSO-BP and SVM were obtained in
Table 8,
Table 9,
Table 10 and
Table 11.
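The sweep procedure itself is straightforward: re-train the model while adding features one at a time in a fixed priority order and record the test error at each dimensionality. The following is a sketch on synthetic data with a plain least-squares model standing in for the networks (the real experiments use PSO-BP, SVM, and the deep networks; the data and feature order here are hypothetical):

```python
import numpy as np

def dimension_sweep(X, y, feature_order, max_dim, train_frac=0.8):
    """Record test MAE as input features are added one at a time,
    using a simple linear least-squares model as the predictor."""
    n_train = int(len(y) * train_frac)
    maes = {}
    for k in range(1, max_dim + 1):
        cols = feature_order[:k]
        Xk = np.column_stack([X[:, c] for c in cols])
        Xk = np.column_stack([Xk, np.ones(len(y))])  # bias column
        coef, *_ = np.linalg.lstsq(Xk[:n_train], y[:n_train], rcond=None)
        pred = Xk[n_train:] @ coef
        maes[k] = float(np.mean(np.abs(pred - y[n_train:])))
    return maes

# Synthetic example: only the first two features are informative,
# so test MAE should drop sharply once both are included
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.05 * rng.normal(size=100)
maes = dimension_sweep(X, y, feature_order=[0, 1, 2, 3, 4, 5], max_dim=6)
```

For a flexible network, unlike this linear stand-in, adding uninformative dimensions can also raise the test error again through overfitting, which is the effect the experiment measures.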
The B5 results of PSO-BP and SVM are shown in
Figure 11. The comparison results show that the overfitting phenomenon of the PSO-BP network and the SVM network is the smallest when the input health feature is three, and the MAE and RMSE values are also the smallest. The accuracy of these two networks decreases as the input dimensionality increases. Therefore, in practical applications, it is recommended to select three features as input to achieve optimal performance.
For the PSO-BP network, the results for inputs of different dimensions differ considerably, with MAE differences of about 30%, whereas for the SVM network the deviation across input dimensions is small, with MAE deviations of about 10%.
The experimental results for the CNN-GRU network show that its optimal input dimensionality is also three, similar to the PSO-BP and SVM networks; under this input dimensionality, the results achieve the highest accuracy. The experimental results are presented in
Table 12 and
Table 13. The comparison results of the three networks are shown in
Figure 12.
Then the deep learning network experiments were carried out, gradually increasing the number of health features to study the relationship between the degree of overfitting and the input dimension. Regarding the order in which the dimension was increased, when the input size is 6, the sixth input feature is drawn in turn from the first, second, and third feature types classified earlier. B5 is the experimental group and B18 is the control group.
For the CNN-LSTM network, the experimental results are obtained in
Table 14 and
Table 15, and the multi-dimensional CNN-LSTM network experiment results are displayed with a line chart, as shown in
Figure 13. The analysis of the experimental results for the multi-dimensional CNN-LSTM network reveals that accurate results can be obtained when the number of input health features is 4. However, as the input dimensionality increases beyond 6, significant overfitting becomes apparent. Comparing the results of the experimental group and the control group, the optimal outcomes in both cases were observed at an input dimensionality of 6.
For 6-dimensional input, the variations among the three types of features were minimal, indicating that under multi-dimensional input conditions, reasonably accurate results can be achieved by adding features that are easier to obtain. In both groups, the highest accuracy was achieved when the input dimensionality was 6 and the sixth input feature belonged to the second type.
The specific fitting graph for the experimental group is shown in
Figure 14. Therefore, in practical applications, using either 4-dimensional or 6-dimensional input for the CNN-LSTM network is recommended to achieve the most accurate results. This differs significantly from the PSO-BP and SVM networks.
The dimensionality experiment results for the CNN-BiLSTM network are presented in
Table 16 and
Table 17 and
Figure 15. The specific fitting graph for the experimental group is shown in
Figure 16. When the number of input health features ranges from 3 to 5, no significant overfitting is observed, and the differences in results are not significant. However, when the number of input health features reaches 6, the accuracy of the results decreases significantly, with the MAE value increasing by 34% compared to an input size of 3 and by 35% compared to an input size of 5. The optimal input dimensionality therefore lies between 3 and 5, where variations within this range have minimal impact on result accuracy. Accordingly, the best input choice for the CNN-BiLSTM network is between three and five dimensions, and it is recommended not to exceed this range.
The second experiment demonstrates that the optimal input dimensionality varies among different networks, and the dimensionality of health features significantly influences the prediction accuracy of the results. Hence, in practical applications, it is essential to determine the most suitable input dimensionality for the selected network to improve the accuracy of prediction outcomes.
4.3. Input Data Volume Experiment
All previous experiments were conducted with a training set of 80% and a test set of 20%. Because the algorithm needs to learn from the full range of the data during training, the data set should be shuffled before the training set is divided. Partial tests were conducted on shuffled and non-shuffled data sets, and the results confirmed that prediction accuracy was highest with the shuffled data set. Building on these findings, the third step of the experiment was completed. For the different networks, the training set proportion was gradually adjusted to 70%, 60%, and 30%, respectively, to observe the dependency of each network on the input data volume. The conclusions drawn from the experiment are presented in
Table 18,
Table 19,
Table 20 and
Table 21. The drawn image is shown in
Figure 17.
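The shuffle-then-split procedure described above can be sketched as follows (a generic stdlib sketch; the sample data and seed are illustrative, not the paper's battery data):

```python
import random

def shuffle_split(samples, train_frac, seed=42):
    """Shuffle the data set first, then divide it into training and
    test portions according to the given training proportion."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # fixed seed for reproducibility
    n_train = int(len(samples) * train_frac)
    train = [samples[i] for i in idx[:n_train]]
    test = [samples[i] for i in idx[n_train:]]
    return train, test

data = list(range(100))  # stand-in for per-cycle feature/capacity records
for frac in (0.8, 0.7, 0.6, 0.3):  # proportions used in this experiment
    train, test = shuffle_split(data, frac)
```

Shuffling before splitting matters for cycling data because consecutive cycles are strongly correlated; an unshuffled split would train only on early-life cycles and test only on late-life ones.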
From the data in the tables, it can be observed that when the training set proportion exceeds 60%, reducing the input data has a relatively smaller impact on the CNN-BiLSTM network than on the GRU-based network. This indicates that the CNN-BiLSTM network requires less data to achieve a given level of accuracy.
When the total number of samples is sufficient, it is unnecessary to input all the data to obtain relatively accurate conclusions. In practical applications, controlling the input sample size within a specific accuracy threshold can significantly reduce processing time while maintaining prediction accuracy. This not only improves efficiency but also simplifies the data management process.
Although the training times of the different networks fluctuate, the specific calculation time ranges are shown in
Table 22.