Article

KAN–CNN: A Novel Framework for Electric Vehicle Load Forecasting with Enhanced Engineering Applicability and Simplified Neural Network Tuning

1 State Grid Shaoxing Power Supply Company, Shaoxing 312000, China
2 School of Design, Hunan University, Changsha 410082, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(3), 414; https://doi.org/10.3390/electronics14030414
Submission received: 21 December 2024 / Revised: 13 January 2025 / Accepted: 14 January 2025 / Published: 21 January 2025

Abstract

Electric Vehicle (EV) load forecasting is critical for optimizing resource allocation and ensuring the stability of modern energy systems. However, traditional machine learning models, predominantly based on Multi-Layer Perceptrons (MLPs), encounter substantial challenges in modeling the complex, nonlinear, and dynamic patterns inherent in EV charging data, often leading to overfitting and high computational costs. To overcome these limitations, this study introduces KAN–CNN, a novel hybrid architecture that integrates Kolmogorov–Arnold Networks (KANs) into traditional machine learning frameworks, specifically Convolutional Neural Networks (CNNs). By combining the spatial feature extraction strength of CNNs with the adaptive nonlinearity of KAN, KAN–CNN achieves superior feature representation and modeling flexibility. The key innovations include bottleneck KAN convolutional layers for reducing parameter complexity, Self-Attention Kolmogorov–Arnold Network with Global Nonlinearity (Self-KAGN) Attention to enhance global dependency modeling, and Focal KAGN Modulation for dynamic feature refinement. Furthermore, regularization techniques such as L1/L2 penalties, dropout, and Gaussian noise injection are utilized to enhance the model’s robustness and generalization capability. When applied to EV load forecasting, KAN–CNN demonstrates prediction accuracy comparable to state-of-the-art methods while significantly reducing computational overhead and simplifying parameter tuning. This work bridges the gap between theoretical innovations and practical applications, offering a robust and efficient solution for dynamic energy system challenges.

1. Introduction

With the rapid proliferation of Electric Vehicles (EVs), their impact on Active Distribution Networks (ADNs) has become increasingly significant. This significance lies not only in the expanding load scale driven by the growing number of EVs [1], but also in the peak load superposition effects of charging behavior, which can lead to local voltage fluctuations [2], line overloads [3], and other operational challenges. Furthermore, the stochastic and spatiotemporal variability [4] of EV charging and discharging patterns disrupts traditional assumptions of load stability and imposes new demands on existing power dispatch and optimization strategies. However, this uncertainty also highlights the potential of EVs as flexible loads and distributed energy storage resources, providing new opportunities for demand response and active scheduling. Therefore, accurately predicting EV load behavior is not only fundamental to ensuring the secure operation of distribution networks but also critical to optimizing resource allocation, improving demand response efficiency [5], and advancing the development of future smart grids.
Traditional EV load forecasting primarily relies on model-driven approaches that utilize mathematical frameworks and predefined assumptions about charging patterns. For instance, the Monte Carlo method has been employed to simulate the spatiotemporal distribution of EV load demand [6], while the Gaussian mixture model and decision tree have been used to predict the load on charging stations [7]. Additionally, probability-based models incorporating detailed characteristics of different vehicle types and destinations have been developed to enhance forecasting accuracy [8]. While these methods perform adequately under predictable conditions, they often struggle to address the dynamic, nonlinear, and stochastic nature of real-world EV charging behavior. As EV adoption grows increasingly complex and diverse, the limitations of model-driven methods become apparent, underscoring the necessity of adaptive data-driven approaches to handle such uncertainties effectively.
Data-driven methods, propelled by advancements in machine learning (ML), have demonstrated significant promise in electric vehicle load forecasting. Among these, the Multi-Layer Perceptron (MLP) is widely employed due to its flexibility and ability to generalize. References [9,10,11,12] have applied various techniques, such as neural networks, deep learning networks, GRU-RNN, and transformers, to forecast EV load based on charging demand and user behavior constraints. However, the inherent structure of MLP is limited in capturing time dependencies and contextual relationships in time-series data, leading to suboptimal performance when modeling dynamic and complex EV load patterns. To address these challenges, many researchers have combined multiple data-driven models. For example, Reference [13] utilized Variational Mode Decomposition (VMD) for load decomposition and then applied Bi-LSTM for forecasting, while Reference [14] combined ConvLSTM and BiConvLSTM architectures to enhance prediction accuracy. Despite improvements in forecasting accuracy, these methods often demand significant computational resources and are highly sensitive to parameter tuning, limiting their practical applicability. This issue stems from the inherent structure of MLP, which remains difficult to optimize for these challenges, underscoring the need for a new prediction framework that simplifies the structure without compromising predictive performance, thereby improving the engineering feasibility of EV load forecasting.
To address the limitations of traditional MLP-based approaches, the Kolmogorov–Arnold Network (KAN) framework [15] has recently emerged as a promising alternative. KAN is grounded in the Kolmogorov–Arnold representation theorem, which provides a mathematical foundation for approximating complex multi-dimensional functions [16]. This network effectively captures intricate patterns and dependencies in data, offering enhanced modeling capabilities compared to MLPs. KAN excels in scenarios where data exhibits strong non-linear relationships and dynamic variations, as it adapts better to such characteristics, making it particularly well suited for EV load forecasting. Unlike purely data-driven models, KAN leverages its structured design to incorporate theoretical insights into the learning process [17]. This integration helps overcome challenges arising from insufficient or unbalanced training data, allowing the model to generalize effectively across diverse and uncertain system behaviors. Additionally, KAN can be combined with Convolutional Neural Networks (CNNs) [18] to further improve time-series forecasting. While CNNs are proficient at extracting spatial and local temporal features, their performance on long-term dependencies often requires additional refinement. By integrating KAN’s robust functional approximation with CNN’s feature extraction capabilities, the hybrid KAN–CNN architecture balances flexibility and computational efficiency. This combination not only retains the advantages of CNN in feature learning but also benefits from KAN’s ability to model intricate relationships, making it a compelling solution for EV load forecasting. The proposed framework improves prediction accuracy while reducing computational costs, offering practical value for real-world applications, especially in complex, uncertain environments such as active distribution networks.
In summary, this paper introduces the Kolmogorov–Arnold Network (KAN) to replace the traditional MLP structure, addressing inherent limitations in existing machine learning methods such as sensitivity to parameter tuning and high computational resource requirements. The proposed model integrates KAN convolution layers into the self-attention mechanism, enhancing feature representation while controlling model complexity through L1 or L2 regularization. The KAN–CNN model is applied to EV load forecasting, marking the first application of this architecture in complex power system load prediction. This innovative approach demonstrates the potential of KAN structures in advancing power system modeling. A comparative analysis with established machine learning techniques, including Bi-LSTM and Bi-GRU, reveals that KAN–CNN achieves comparable prediction accuracy while significantly reducing computational time and resource requirements. Its simplified parameter tuning further enhances its engineering practicality, positioning it as a valuable tool for real-world applications. This study highlights the promise of KAN-based architectures in bridging the gap between theoretical advancements and practical implementations in energy system modeling.

2. Preliminaries

This section lays the theoretical groundwork for understanding the proposed hybrid neural network framework. We begin by introducing the structure of MLPs in Section 2.1, detailing their fundamental principles, advantages, and inherent limitations when modeling complex functions. Section 2.2 transitions to the KAN, highlighting its unique structure that addresses the shortcomings of MLPs through learnable activation functions and reduced parameter requirements. In Section 2.3, we provide a comparative analysis of KANs and MLPs, focusing on their differences in structure, flexibility, and efficiency. Finally, Section 2.4 explores the structure of Convolutional Neural Networks (CNNs), emphasizing their local connection properties and hierarchical feature representation. By systematically presenting these foundational concepts, this section aims to bridge the gap between traditional neural network architectures and the proposed integration of KANs into CNNs, setting the stage for advancing the performance, interpretability, and efficiency of deep learning models in subsequent discussions.

2.1. Structure of MLPs

The core of a Multi-Layer Perceptron (MLP) [19] lies in its ability to capture complex patterns and features in data through a combination of linear transformations and nonlinear activation functions. The structure is illustrated in Figure 1.
Figure 1 depicts the process where a two-dimensional input variable is combined with the learnable parameters and passed through the activation functions to generate the output, as in Equation (1).
$f(X) = \sum_i a_i\,\sigma(W_i \cdot x + b_i)$ (1)
where $f(X)$ is the output of the MLP and $x$ is its input; $W_i$ are the learnable weights used to perform a linear transformation on the input; $\sigma(\cdot)$ is a fixed activation function, such as sigmoid or ReLU; $b_i$ is the bias vector of the hidden layer, providing greater flexibility for the model; and $a_i$ is the weight of each neuron, which sums the neuron outputs in a weighted fashion.
According to the universal approximation theorem, any continuous function can be approximated by sufficiently stacking the aforementioned single-layer structure [19], effectively forming a large feedforward neural network.
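To make Equation (1) concrete, here is a minimal PyTorch sketch of a single-hidden-layer MLP; the layer sizes and the sigmoid activation are illustrative choices, not values taken from the paper.

```python
import torch
import torch.nn as nn

class SingleHiddenMLP(nn.Module):
    """Computes f(X) = sum_i a_i * sigma(W_i . x + b_i), as in Equation (1)."""
    def __init__(self, in_dim: int = 2, hidden_dim: int = 8):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hidden_dim)         # W_i and b_i
        self.sigma = nn.Sigmoid()                           # fixed activation
        self.output = nn.Linear(hidden_dim, 1, bias=False)  # weights a_i

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.output(self.sigma(self.hidden(x)))

mlp = SingleHiddenMLP()      # two-dimensional input, as in Figure 1
y = mlp(torch.randn(4, 2))   # batch of 4 samples -> shape (4, 1)
```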

2.2. Structure of KAN

As demonstrated by the research findings of Soviet mathematicians Vladimir Arnold and Andrey Kolmogorov, any multivariable continuous function f can be represented as a combination of a finite number of single-variable continuous functions, as expressed in Equation (2) [15].
$f(X) = f(x_1, x_2, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$ (2)
where $f(X)$ is the original multivariable continuous function; $x_p$ is the $p$-th component of the input $X$; $\phi_{q,p}(x_p)$ is a single-variable continuous function that operates on $x_p$; the indices $q$ and $p$ identify the specific single-variable function being applied, with $q$ indexing the outer function combination and $p$ the input dimension; $\sum_{p=1}^{n} \phi_{q,p}(x_p)$ represents the intermediate linear combination feeding the $q$-th outer function; and $\Phi_q(\cdot)$ provides the higher-order nonlinear mapping. The theorem states that any $n$-dimensional continuous function can be expressed using exactly $2n+1$ outer single-variable functions $\Phi_q$.
Namely, Equation (2) can be interpreted as a two-layer neural network, as shown in Figure 2; however, unlike traditional neural networks, it does not rely on linear combinations. Instead, it directly activates the input through non-linear functions. Additionally, these activation functions are not fixed but learnable, distinguishing it from standard MLPs.
This approach decomposes complex high-dimensional functions into combinations of simple one-dimensional functions. By optimizing these one-dimensional functions individually rather than tackling the entire multivariate space, KAN effectively reduces both the complexity and the number of parameters required for accurate modeling. It should be noted that in contrast to traditional neural networks that employ fixed activation functions, KAN utilizes learnable activation functions positioned at the network’s edges. This innovative design replaces each weight parameter with a single-variable function, typically parameterized as a spline function.
In this way, KAN can be expressed by Equation (3):
$\mathrm{KAN}(X) = (\Phi_n \circ \Phi_{n-1} \circ \cdots \circ \Phi_1)(x)$ (3)
Moreover, each $\phi_{q,p}(x_p)$ is parameterized as a B-spline, as shown in Equation (4):
$\phi_{q,p}(x_p) = \mathrm{spline}(x_p) = \sum_i c_i B_i(x_p)$ (4)
where $\mathrm{spline}(x_p)$ denotes a spline function. During training, $c_i$ are the coefficients being optimized, while $B_i(x_p)$ are the B-spline basis functions defined on a grid. The grid points determine the intervals in which each basis function $B_i(x_p)$ is active and significantly influence the shape and smoothness of the spline. These grid points can be viewed as hyperparameters that affect the network's accuracy: a denser grid allows finer control and higher precision but also requires learning a greater number of parameters.
During training, the spline parameters $c_i$ (the coefficients of the basis functions $B_i(x_p)$) are optimized to minimize the loss function. This process adjusts the shape of the spline to best fit the training data. The optimization typically involves techniques like gradient descent, where the spline parameters are iteratively updated at each step to reduce the prediction error.
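As an illustration of Equation (4), the sketch below implements a single learnable spline edge. For brevity it uses piecewise-linear ("hat") basis functions on a fixed grid rather than the order-3 B-splines used in the paper; the trainable coefficients $c_i$ play the same role and are updated by gradient descent.

```python
import torch
import torch.nn as nn

class SplineEdge(nn.Module):
    """One KAN edge phi(x) = sum_i c_i * B_i(x), cf. Equation (4).
    Hat (order-1) bases on a fixed grid stand in for cubic B-splines."""
    def __init__(self, grid_min: float = -1.0, grid_max: float = 1.0, num_basis: int = 8):
        super().__init__()
        self.register_buffer("grid", torch.linspace(grid_min, grid_max, num_basis))
        self.spacing = (grid_max - grid_min) / (num_basis - 1)
        self.coeffs = nn.Parameter(0.1 * torch.randn(num_basis))  # c_i, optimized

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # B_i(x): triangular bump centered at grid point i, active on one interval.
        dist = (x.unsqueeze(-1) - self.grid).abs() / self.spacing
        basis = torch.clamp(1.0 - dist, min=0.0)   # (..., num_basis)
        return (basis * self.coeffs).sum(dim=-1)

edge = SplineEdge()
y = edge(torch.linspace(-1, 1, 5))  # a learnable 1-D activation, evaluated pointwise
```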

2.3. Comparison Between MLP and KAN

Thus, we summarize the differences between MLPs and KANs, as shown in Table 1.

2.4. Structure of CNN

While CNNs and MLPs differ in their structural design, both are based on neural networks. CNNs employ local connections, where each neuron is connected to a specific region of the previous layer, whereas MLPs utilize fully connected structures, linking each neuron to all neurons in the preceding layer. Despite these architectural distinctions, both frameworks are fundamentally designed to approximate functions using learnable weights and fixed activation functions. CNNs are particularly well suited to tasks involving spatial or temporal correlations. Their structure includes several key components: a convolutional layer, a pooling layer, and a fully connected layer. Below is a detailed breakdown of each component and its role; a combined code sketch follows the list. The structure is shown in Figure 3 [20].
  • Convolutional Layer
Convolution layers are the core components of CNNs, designed to extract local features from input data by applying small, learnable filters (kernels) across the input. These layers perform element-wise multiplication between the filter and a localized region of the input, followed by summation, generating feature maps that highlight patterns such as edges or textures, as shown in Equation (5).
$z_{ij} = \sum_{m=1}^{k} \sum_{n=1}^{k} K_{mn} \cdot X_{(i+m)(j+n)} + b$ (5)
where $z_{ij}$ is the output value of the feature map at position $(i, j)$; $K_{mn}$ is the weight of the kernel/filter at position $(m, n)$, learned during training; $X_{(i+m)(j+n)}$ is the input value from the localized region corresponding to the filter position; and $b$ is a bias term, as in the MLPs.
This operation is repeated across the input, producing a feature map that retains the spatial relationships within the data while reducing computational complexity compared to fully connected layers.
  • Pooling layer
Pooling layers reduce the spatial dimensions of feature maps by aggregating information within local regions, improving computational efficiency and providing translation invariance. Common pooling operations include selecting the maximum value (max pooling) or computing the average value (average pooling) from each local region:
$y_{pq} = \mathrm{pool}\big(F_{(p+m)(q+n)} \mid m, n \in [0, s-1]\big)$ (6)
where $y_{pq}$ is the output value of the pooled feature map at position $(p, q)$, $F_{(p+m)(q+n)}$ is the input value from the feature map within the pooling window, $s$ is the pooling window size, and $\mathrm{pool}(\cdot)$ is the pooling operation (e.g., max for max pooling or mean for average pooling).
Pooling reduces the resolution of feature maps while retaining important information, helping to prevent overfitting and reducing the computational burden of subsequent layers.
  • Fully connected layers
Fully connected layers serve as the final stage in most neural network architectures, transforming the extracted features into task-specific outputs. Each neuron in a fully connected layer is connected to every neuron in the previous layer, enabling the network to combine and interpret global information.
$o_k = \sigma\!\left(\sum_{j=1}^{N} W_{kj} \cdot v_j + b_k\right)$ (7)
where $o_k$ is the output of the $k$-th neuron in the layer; $v_j$ is the input from the $j$-th neuron of the previous layer; $W_{kj}$ is the weight connecting the $j$-th input neuron to the $k$-th output neuron, learned during training; $b_k$ is the bias of the $k$-th neuron; and $N$ is the number of input neurons from the previous layer.
Fully connected layers combine features extracted by previous layers into a comprehensive representation, allowing the network to perform classification, regression, or other specific tasks. While computationally intensive due to the large number of parameters, these layers excel at learning high-level representations by integrating global features.
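A minimal 1-D CNN tying Equations (5)-(7) together is sketched below. The channel counts and the sequence length (480 points, i.e., 120 h of load data at 15-min resolution, matching the setup in Section 4) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Convolution (Eq. (5)) -> pooling (Eq. (6)) -> fully connected (Eq. (7))."""
    def __init__(self, in_channels: int = 1, seq_len: int = 480):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=2)
        self.act = nn.ReLU()
        self.fc = nn.Linear(16 * (seq_len // 2), 96)  # e.g., next 24 h at 15-min steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.pool(self.act(self.conv(x)))   # (B, 16, seq_len // 2)
        return self.fc(h.flatten(start_dim=1))  # (B, 96)

y = TinyCNN()(torch.randn(2, 1, 480))  # two samples of 120 h history
```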

3. Forecasting Based on KAN–CNN

Electric Vehicle (EV) load forecasting requires models capable of capturing complex nonlinear dynamic characteristics while optimizing parameter efficiency and computational resource usage. However, traditional CNNs or MLPs face notable limitations: they rely on fixed activation functions (e.g., ReLU, Sigmoid) and linear transformations to model input–output relationships. While this design works well for basic spatial features, it lacks the flexibility and expressiveness needed to handle the intricate and nonlinear dynamics often encountered in electric vehicle load forecasting. Moreover, these models typically require a large number of parameters, making them prone to overfitting and demanding significant computational resources, especially in deeper networks. KAN addresses some of these challenges by introducing learnable spline functions (e.g., B-splines) in place of traditional linear weight matrices, reducing parameter requirements and enhancing generalization capabilities. However, KAN's high computational complexity and its lack of local connection and weight-sharing mechanisms limit its ability to handle spatially correlated load forecasting data.
KAN–CNN combines the spatial feature extraction capabilities of CNN with the flexible nonlinear representation of KAN. By embedding KAN’s learnable activation functions into CNN’s convolutional layers, KAN–CNN significantly enhances the modeling of complex dynamic load patterns while reducing the parameter count and computational complexity. Its local connection and weight-sharing mechanisms address the scalability issues of traditional KAN, improving adaptability to high-dimensional dynamic data. Furthermore, KAN’s learnable activation functions enhance model interpretability and robustness against data perturbations, making KAN–CNN an efficient and reliable solution for electric vehicle load forecasting.
The structure is shown in Figure 4.
As shown in Figure 4, KAN–CNN differs from a traditional CNN in several key ways. First, the fixed activation functions in CNN layers are replaced with learnable KAN activation functions, enabling the model to adaptively capture nonlinear and dynamic patterns in the data. Second, in the C-KAN convolutional layer, compression and expansion convolutions are introduced to reduce parameter complexity and improve computational efficiency. Additionally, standard convolution operations in the attention and modulation mechanism are replaced with KAN convolutions, enhancing the network's ability to model long-range dependencies and refine critical features, thereby uncovering the relationships in EV load prediction over extended time scales. Finally, to mitigate overfitting, Gaussian noise and regularization techniques are applied to enhance the model's robustness. The detailed operations are as follows.

3.1. C-KAN Convolutional Layer

The traditional convolution operation, as described in Section 2.4, Equation (5), is inherently a linear transformation. This approach necessitates the use of numerous convolutional layers to adequately model the complexity of electric vehicle load data, leading to a parameter matrix dimensionality issue. In KAN–CNN, this challenge is addressed by replacing the dense weight matrix in traditional convolution with the compact and learnable basis functions of KAN, as illustrated below:
$z_{ij} = \phi\!\left(\sum_{m=1}^{k}\sum_{n=1}^{k} K_{mn} \cdot X_{(i+m)(j+n)} + b\right) = \sum_{q=1}^{Q} \Phi_q\!\left(\sum_{m=1}^{k}\sum_{n=1}^{k} K_{mn} \cdot X_{(i+m)(j+n)} + b\right) = \sum_{q=1}^{Q} c_q B_q\!\left(\sum_{m=1}^{k}\sum_{n=1}^{k} K_{mn} \cdot X_{(i+m)(j+n)} + b\right)$ (8)
With the above approach, the traditional activation functions in CNNs are replaced with KAN's learnable basis functions, significantly enhancing the learning capability and reducing the required depth of convolutional layers. Consequently, the number of parameters in the improved architecture is determined by $Q$, which is substantially smaller than the dimensionality of the traditional convolutional layer's linear weight matrix $W$, i.e., $Q \ll \dim(W)$.
To further reduce computational parameters, the C-KAN convolution layer incorporates compression [21] and expansion convolution [22] techniques. These methods effectively decrease the number of parameters and the computational complexity while preserving the model's expressive capacity. The detailed operations are as follows; a code sketch combining both steps appears after the list:
  • Compression convolution
As the name suggests, compression convolution utilizes a lightweight encoder to perform convolution operations on the input data. By employing smaller convolution kernels or reducing the number of feature channels, it compresses the dimensionality of the input, extracting meaningful features while minimizing redundant information:
$z_{\mathrm{comp}} = \sum_{m=1}^{r} \sum_{n=1}^{r} K_{\mathrm{comp},mn} \cdot X_{(i+m)(j+n)} + b_{\mathrm{comp}}$ (9)
where $z_{\mathrm{comp}}$ is the output of the compression convolution, $K_{\mathrm{comp},mn}$ are the weights of the compression convolution kernel, $r \times r$ is the reduced kernel size, and $b_{\mathrm{comp}}$ is the bias term.
After compression, the data are passed through the learnable activation function of C-KAN, enabling the simulation of complex electric vehicle charging behavior curves.
  • Expansion convolution
To restore the processed features to their original dimensions and ensure compatibility with subsequent network layers, the output data from the C-KAN activation function is processed using an expansion convolution:
$z_{\mathrm{exp}} = \sum_{m=1}^{r} \sum_{n=1}^{r} K_{\mathrm{exp},mn} \cdot \phi(z_{\mathrm{comp}}) + b_{\mathrm{exp}}$ (10)
where $K_{\mathrm{exp},mn}$ and $b_{\mathrm{exp}}$ are the expansion kernel weights and bias, defined analogously to Equation (9).
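Chaining Equations (9) and (10) around the learnable activation gives the bottleneck sketched below. This is a hedged sketch: a hat-basis spline stands in for the paper's B-spline activation, and all channel sizes are illustrative.

```python
import torch
import torch.nn as nn

class HatSpline(nn.Module):
    """Minimal learnable phi(x) = sum_i c_i * B_i(x) with hat bases (cf. Eq. (4))."""
    def __init__(self, n: int = 8, lo: float = -1.0, hi: float = 1.0):
        super().__init__()
        self.register_buffer("grid", torch.linspace(lo, hi, n))
        self.h = (hi - lo) / (n - 1)
        self.c = nn.Parameter(0.1 * torch.randn(n))

    def forward(self, x):
        basis = torch.clamp(1 - (x.unsqueeze(-1) - self.grid).abs() / self.h, min=0)
        return (basis * self.c).sum(-1)

class CKANBottleneck(nn.Module):
    """Compression conv (Eq. (9)) -> learnable activation -> expansion conv (Eq. (10))."""
    def __init__(self, channels: int = 32, compressed: int = 8):
        super().__init__()
        self.compress = nn.Conv1d(channels, compressed, kernel_size=3, padding=1)
        self.phi = HatSpline()  # applied elementwise to the compressed features
        self.expand = nn.Conv1d(compressed, channels, kernel_size=3, padding=1)

    def forward(self, x):
        z_comp = self.compress(x)  # fewer channels, reduced parameter count
        return self.expand(self.phi(z_comp))

out = CKANBottleneck()(torch.randn(2, 32, 96))  # shape preserved: (2, 32, 96)
```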

3.2. Self-KAGN Attention

Given the volatility of electric vehicle charging data, the convolution layer in the traditional self-attention mechanism is replaced with the KAN convolution layer after the data passes through the pooling layer, enhancing the model’s ability to capture global dependencies. It is worth noting that the structure of Self-KAGN Attention is consistent with the traditional self-attention mechanism, as illustrated in Figure 5, with the key difference being that the linear transformations are replaced by KAN convolution operations. The specific transformations are as follows:
$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{Q K^{T}}{\sqrt{d}}\right) V$ (11)
where $Q$, $K$, and $V$ are the query, key, and value matrices, and $d$ is the dimensionality of the keys.
In Self-KAGN Attention, the linear transformations used to compute $Q$, $K$, and $V$ are replaced with KAN convolution operations to introduce adaptive nonlinearity and reduce parameter complexity, namely, $Q = \phi(W_Q X)$, $K = \phi(W_K X)$, and $V = \phi(W_V X)$.
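The sketch below mirrors Equation (11), with the Q/K/V projections computed as 1×1 convolutions followed by a learnable activation. PReLU serves here as a simple learnable stand-in for the KAN spline, and all dimensions are illustrative.

```python
import math
import torch
import torch.nn as nn

class SelfKAGNAttention(nn.Module):
    """Scaled dot-product attention with Q = phi(W_Q X), K = phi(W_K X), V = phi(W_V X)."""
    def __init__(self, channels: int = 32, d: int = 32):
        super().__init__()
        self.d = d
        self.q_proj = nn.Sequential(nn.Conv1d(channels, d, 1), nn.PReLU())
        self.k_proj = nn.Sequential(nn.Conv1d(channels, d, 1), nn.PReLU())
        self.v_proj = nn.Sequential(nn.Conv1d(channels, d, 1), nn.PReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (B, C, T)
        q = self.q_proj(x).transpose(1, 2)                  # (B, T, d)
        k = self.k_proj(x).transpose(1, 2)
        v = self.v_proj(x).transpose(1, 2)
        scores = q @ k.transpose(1, 2) / math.sqrt(self.d)  # Q K^T / sqrt(d)
        return torch.softmax(scores, dim=-1) @ v            # (B, T, d)

out = SelfKAGNAttention()(torch.randn(2, 32, 96))
```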

3.3. Focal KAGN Modulation

Focal KAGN Modulation enhances the traditional focal modulation layer in CNNs by replacing its standard convolution operations with KAN convolutions. This step, positioned after Self-KAGN Attention, further refines features by focusing on key elements and suppressing irrelevant or noisy information.
$F(x) = \sum_{q=1}^{Q} w_q\, \phi(x_q)$ (12)
where $w_q$ are learnable modulation weights and $\phi(\cdot)$ is defined as in Equation (3).

3.4. Regularization Techniques

To prevent overfitting and enhance the robustness of KAN–CNN, regularization techniques are applied throughout the network, particularly in fully connected layers and KAN convolution layers. The following strategies are employed; a short training sketch follows the list:
  • Weight and Activation Penalties
During training, L1 and L2 regularization terms [23] are added to the loss function to constrain the complexity of the weights and activations, preventing overfitting.
$L_{\mathrm{total}} = L_{\mathrm{task}} + \lambda_1 \|W\|_1 + \lambda_2 \|W\|_2^2$ (13)
where $L_{\mathrm{task}}$ is the task-related loss (e.g., MSE loss for regression tasks), $\|W\|_1$ is the L1 regularization term on the weights, $\|W\|_2^2$ is the L2 regularization term on the weights, and $\lambda_1$ and $\lambda_2$ are hyperparameters controlling the strength of the regularization terms.
  • Dropout
Dropout is applied to fully connected layers and some convolutional layers, randomly deactivating a proportion of neurons to prevent over-reliance on specific features.
$y = \mathrm{Dropout}(h, p)$ (14)
where $h$ is the vector of input activations and $p$ is the dropout rate, representing the proportion of randomly deactivated neurons.
  • Additive Gaussian Noise
Gaussian noise is added to activations and weights to improve the model’s robustness and adaptability to noisy data.
$\tilde{z} = z + \mathcal{N}(0, \sigma^2)$ (15)
where $z$ is the original activation or weight value, and $\mathcal{N}(0, \sigma^2)$ represents Gaussian noise with a mean of 0 and a variance of $\sigma^2$.
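Collecting the three strategies, the sketch below shows one way Equations (13)-(15) could enter a training step. The dropout rate and noise level are illustrative; the 1e-5 penalty coefficient matches the regularization coefficient reported in Section 4.1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def regularized_loss(model, pred, target, lam1=1e-5, lam2=1e-5):
    """Equation (13): task loss plus L1 and L2 weight penalties."""
    task = F.mse_loss(pred, target)
    l1 = sum(p.abs().sum() for p in model.parameters())
    l2 = sum((p ** 2).sum() for p in model.parameters())
    return task + lam1 * l1 + lam2 * l2

class NoisyDropoutHead(nn.Module):
    """Dropout (Eq. (14)) and additive Gaussian noise (Eq. (15)) on activations,
    both active only in training mode."""
    def __init__(self, dim: int, p: float = 0.2, sigma: float = 0.01):
        super().__init__()
        self.drop = nn.Dropout(p)
        self.sigma = sigma
        self.fc = nn.Linear(dim, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        h = self.drop(h)
        if self.training:
            h = h + torch.randn_like(h) * self.sigma  # z~ = z + N(0, sigma^2)
        return self.fc(h)

model = NoisyDropoutHead(dim=16)
loss = regularized_loss(model, model(torch.randn(4, 16)), torch.randn(4, 1))
```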

3.5. Summary of the KAN–CNN

Key improvements in KAN–CNN include replacing fixed activation functions with learnable KAN activation functions, which enhance the flexibility and adaptability of feature representation. The bottleneck KAN convolutional layers reduce the parameter count and computational complexity without compromising the model’s expressive ability. Self-KAGN Attention further enhances the network’s ability to capture global dependencies, while Focal KAGN Modulation dynamically refines critical features, ensuring that important data patterns are emphasized. Additionally, regularization techniques such as L1/L2 penalties, dropout, and Gaussian noise injection improve the model’s generalization and robustness against noise and overfitting.
Through these advancements, KAN–CNN not only maintains comparable prediction accuracy but also significantly reduces computational overhead and parameter sensitivity, making it a highly efficient and practical solution for real-world applications. This comprehensive design demonstrates its potential to bridge the gap between theoretical innovation and engineering implementation in EV charging load dynamic prediction tasks.

4. Case Studies

4.1. Experimental Environment

The experiment utilizes historical load data from all charging stations in a specific region of Zhejiang Province, spanning from 1 September 2023, 00:00, to 31 August 2024, 24:00, for modeling and prediction. The data are collected at 15 min intervals, comprising a total of 35,136 time points, and cover the charging loads of seven regions (A–G). The number of charging stations in each region ranges from 2487 to 12,986, effectively reflecting the diverse load characteristics of different areas. The hardware platform for model training and testing includes an Ubuntu 22.04 operating system, an AMD Ryzen 9 7950X 16-core 32-thread CPU, and an NVIDIA GeForce RTX 4090 GPU with 24 GB memory, 64 GB DDR5 RAM, and 2 TB NVMe SSD storage. The software environment comprises PyTorch 2.0 for deep learning and Python 3.10 for data preprocessing and analysis. This robust hardware and software configuration ensures efficient model training and accurate prediction results. The KAN architecture consists of 2 hidden layers with 28 and 42 neurons, respectively; the grid size for KAN is set to 3, and the spline order is 3. The model was trained for 400 steps, with a regularization coefficient of 0.00001.
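For reference, the hyperparameters reported above can be collected into a single configuration; the key names below are illustrative rather than taken from the authors' code.

```python
# Hyperparameters from Section 4.1 (key names are illustrative).
config = {
    "hidden_layers": [28, 42],  # two KAN hidden layers
    "grid_size": 3,             # KAN grid size
    "spline_order": 3,          # order of the B-splines
    "train_steps": 400,
    "reg_coeff": 1e-5,          # regularization coefficient
    "sample_interval_min": 15,  # load sampled every 15 min
}
```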

4.2. Evaluation Metrics

To comprehensively evaluate the performance of the models, the following evaluation metrics are employed; implementation sketches follow the list:
  • Root Mean Square Error (RMSE)
RMSE is the square root of the Mean Square Error (MSE), making it particularly sensitive to large errors. As a result, it effectively highlights significant deviations in the prediction results. A smaller RMSE value indicates lower prediction errors and better model performance. Moreover, since its unit is consistent with the original data, RMSE offers excellent interpretability. It is calculated as:
$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$ (16)
where $y_i$ and $\hat{y}_i$ represent the actual and predicted values, respectively, and $N$ is the total number of samples.
  • Mean Absolute Percentage Error (MAPE)
MAPE quantifies prediction errors as a percentage, providing a normalized metric that facilitates the comparison of predictive performance across different load ranges. A smaller MAPE value indicates better model prediction accuracy. The formula for MAPE is as follows:
$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$ (17)
  • Coefficient of Determination (R2)
R2 assesses the goodness of fit, indicating how well the predictions approximate the actual data. Its value typically lies in [0, 1], with values closer to 1 indicating better alignment and a stronger ability of the model to capture data variations. It is defined as:
$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$ (18)
where $\bar{y}$ is the mean of the actual values.
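The three metrics (Equations (16)-(18)) are straightforward to compute; a minimal NumPy sketch follows.

```python
import numpy as np

def rmse(y, y_hat):
    """Equation (16): root mean square error, in the units of the data."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Equation (17): mean absolute percentage error (assumes y != 0)."""
    return np.mean(np.abs((y - y_hat) / y)) * 100.0

def r2(y, y_hat):
    """Equation (18): coefficient of determination."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```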

4.3. Prediction of Typical Data Across Different Seasons

The dataset is divided into a training set (90%), a validation set (5%), and a test set (5%) based on time, ensuring that the training data chronologically precedes the validation and test sets in the context of time-series forecasting. To evaluate the model’s adaptability to seasonal variations, representative dates from different seasons are selected for load prediction, including spring (25 January 2024), summer (17 May 2024), autumn (6 August 2023), and winter (1 December 2023). The test set is carefully curated to ensure it contains challenging seasonal variations, providing a robust evaluation of the model’s performance under diverse conditions. The model leverages a sliding window approach, with input features comprising load data from the previous 120 h and a prediction target of the short-term load for the next 24 h. The forecast results are presented in Table 2.
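The sliding-window construction described above (120 h of history predicting the next 24 h at 15-min resolution) can be sketched as follows; the function and variable names are illustrative.

```python
import numpy as np

def sliding_windows(load, in_hours=120, out_hours=24, step_min=15):
    """Builds (input, target) pairs: 480 input points -> 96 target points."""
    n_in = in_hours * 60 // step_min
    n_out = out_hours * 60 // step_min
    X, Y = [], []
    for start in range(len(load) - n_in - n_out + 1):
        X.append(load[start : start + n_in])
        Y.append(load[start + n_in : start + n_in + n_out])
    return np.stack(X), np.stack(Y)

X, Y = sliding_windows(np.random.rand(35136))  # one year at 15-min intervals
```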
The KAN–CNN model exhibits the best prediction performance during winter. Metrics such as RMSE, MAPE, and R2 show that the model achieves the lowest error and the highest level of fit during this season. The low RMSE and high R2 values indicate that the model accurately captures EV load patterns in winter, likely due to the more stable and less volatile nature of winter loads. Moreover, the computation time is the same in different seasons, because the input data are the same (load data from the previous 120 h).
Autumn exhibits the largest prediction errors among all seasons, with the highest RMSE and MAPE values, coupled with relatively lower R2 values. This indicates that the model struggles to capture the fluctuating nature of EV loads during autumn. The higher variability in autumn loads likely introduces significant challenges for the model, leading to reduced accuracy and stability.
The KAN–CNN model demonstrates strong performance in EV load forecasting, particularly during winter and in Region G, highlighting its capability to handle stable load patterns and simpler data structures effectively. However, the model faces challenges during autumn, where higher variability and complexity in load characteristics reduce its predictive accuracy. These findings underline the importance of targeted optimizations for both seasonal and regional complexities in EV load forecasting.

4.4. Evaluation of Predictive Performance Under Complex and Fluctuating Scenarios

To evaluate the predictive capability of the algorithm under scenarios characterized by complex and drastic fluctuations—which are atypical for electric vehicle loads—virtual data with higher complexity are generated using a fixed random seed prior to incorporating real data. This approach provides an intuitive demonstration of the effectiveness of the proposed prediction method. In this process, real data are used as the basis for generating prediction data, with random fluctuation techniques applied to enhance complexity. The prediction results are presented in Figure 6.
Figure 6 illustrates a marked increase in data complexity after introducing a fixed random seed, presenting challenges for general prediction algorithms to achieve precise forecasting. In contrast, the proposed KAN–CNN algorithm demonstrates its ability to capture the intricate underlying patterns of the data under significant fluctuations. This advantage stems from KAN–CNN's hybrid structure, which integrates KAN-based attention mechanisms for enhanced feature extraction and convolutional layers for capturing spatial–temporal correlations. The model effectively identifies critical features in the fluctuating data, allowing it to maintain predictive accuracy even under high complexity. As shown in the figure, the KAN–CNN algorithm not only aligns closely with the actual trends but also outperforms conventional methods, highlighting its robustness and suitability for complex load prediction scenarios.

4.5. Comprehensive Prediction and Comparison in the Real World

In the previous section, it was observed that autumn exhibited the highest fluctuation, leading to lower prediction accuracy compared to other seasons. Considering the inherent correlations among different regions, this section revisits the prediction for the autumn dataset (6 August 2023). The input data remain those from the past 120 h, but the historical data from all seven regions are incorporated simultaneously. The prediction results are compared with those of the Bi-LSTM [24] and Bi-GRU [25] algorithms. The prediction trends are illustrated in Figure 7, Figure 8 and Figure 9, and the comparative statistical analysis is presented in Table 3.
From the results in Table 3, the prediction performance of KAN–CNN, Bi-LSTM, and Bi-GRU is compared across seven regions (A–G) using the autumn dataset (6 August 2023). The key performance metrics include RMSE, MAPE, R2, and computation time. Overall, KAN–CNN demonstrates superior performance compared to the other two algorithms. Due to space constraints, a representative region is selected for detailed analysis of each metric.
  • RMSE analysis
    In Region A, KAN–CNN achieves an RMSE of 530.220 MW, compared with Bi-LSTM (505.260 MW) and Bi-GRU (545.290 MW). Although Bi-LSTM has a slightly lower RMSE, its predictions deviate from the overall trend, especially in scenarios with complex load fluctuations, leading to a higher risk of error amplification over time. Bi-GRU, on the other hand, shows a significantly higher RMSE, indicating its limited sensitivity to load variations in Region A. KAN–CNN balances accuracy and robustness by leveraging its convolutional architecture to better capture both local and global patterns in the load data, making it the most reliable model for RMSE-based evaluations.
  • MAPE analysis
    In Region D, KAN–CNN achieves a MAPE of 6.583%, outperforming both Bi-LSTM (8.965%) and Bi-GRU (6.963%). MAPE reflects the percentage error in predictions, and a lower value indicates more stable and accurate predictions relative to the actual load. KAN–CNN’s lower MAPE highlights its ability to minimize prediction deviations across varying load ranges. This is particularly important in Region D, where the load exhibits significant fluctuations. By accurately capturing both overall trends and local variations, KAN–CNN demonstrates superior adaptability to volatile data. Its reduced error margin makes it a more practical choice for real-world applications, such as energy optimization and resource allocation.
  • R2 analysis
    In Region C, KAN–CNN achieves an R2 of 0.983, which is significantly higher than Bi-LSTM (0.942) and Bi-GRU (0.967). The R2 metric quantifies how well the model explains the variability in the data, with values closer to 1 indicating better fit. KAN–CNN’s high R2 demonstrates its superior ability to capture the underlying load trends and fluctuations in Region C. Bi-LSTM’s lower R2 suggests limited capability in modeling complex load patterns due to its sequential processing limitations. While Bi-GRU performs better than Bi-LSTM, it still falls short of KAN–CNN due to its reduced effectiveness in extracting fine-grained local features. KAN–CNN’s convolutional operations and attention mechanisms enable it to achieve a near-perfect fit, making it highly effective for regions with diverse load variations.
  • Computation time analysis
    KAN–CNN demonstrates a significant advantage in computation time compared to Bi-LSTM and Bi-GRU across all regions. For example, in Region A, KAN–CNN completes predictions in 15.6 s, while Bi-LSTM and Bi-GRU require 36.7 s and 38.6 s, respectively. This trend is consistent across other regions, where KAN–CNN reduces computation time by more than 50%. This efficiency is attributed to KAN–CNN’s architectural design, including bottleneck KAN convolutional layers and parameter-efficient techniques, which minimize the number of learnable parameters without sacrificing prediction accuracy. In contrast, Bi-LSTM and Bi-GRU rely on recursive sequential processing, which increases computational complexity and time, especially for long input sequences.
Across all four metrics (RMSE, MAPE, R2, and computation time), KAN–CNN consistently outperforms Bi-LSTM and Bi-GRU. Its ability to balance prediction accuracy, robustness, and adaptability to varying load conditions makes it the most suitable model for electric vehicle load forecasting. The results highlight KAN–CNN's potential for practical applications in complex and dynamic energy systems.

4.6. KAN–CNN Parameter Adjustment Advantage Analysis

The experimental results presented in Table 4 provide a comparative analysis of KAN–CNN and traditional models in terms of training efficiency, accuracy, parameter complexity, and hyperparameter tuning difficulty.
The comparison highlights KAN–CNN’s efficiency and compactness over Bi-LSTM and Bi-GRU. KAN–CNN achieves the shortest training time (2 h) and the lowest parameter count (8288) due to its compact representation and learnable activation functions, significantly reducing computational complexity. In contrast, Bi-LSTM and Bi-GRU require extensive parameter tuning with grid search, resulting in longer training times (5.3 and 4.5 h) and higher parameter counts (87,424 and 66,624). KAN–CNN’s low hyperparameter complexity further simplifies implementation, making it a superior choice for efficient and scalable applications.

4.7. Evaluating the Capability of EV Load Forecasting to Provide Actionable Insights Under Uncertainty

The accuracy of Electric Vehicle (EV) load forecasting directly impacts grid scheduling and operational planning [26,27]. To evaluate the practical effectiveness of the proposed forecasting models in real-world grid applications, this section is divided into two parts: the impact on grid performance and improvements in fleet operational scheduling.
In Part I, EV load data from a specific region of Zhejiang Province are utilized, and the IEEE 33-bus standard topology is adopted for grid simulation. To reflect uncertainties in real-world scenarios, user behavior fluctuations, weather variations, and sudden peak demands are modeled using random simulation techniques. A total of 1000 random scenarios are generated, and the statistical average results are analyzed. Table 5 summarizes the impact of different models (KAN–CNN, Bi-LSTM, and Bi-GRU) on grid performance metrics.
Table 5 quantitatively demonstrates the impact of different forecasting models on grid performance under uncertainty. KAN–CNN achieves the lowest voltage deviation at 2.5%, significantly outperforming Bi-LSTM (4.1%) and Bi-GRU (3.8%), indicating better grid stability. For overload rate, KAN–CNN reduces it to 5.2%, compared to 9.4% for Bi-LSTM and 7.8% for Bi-GRU, showing its effectiveness in mitigating risks during peak load scenarios. In terms of peak load reduction, KAN–CNN achieves the highest reduction at 12.5 MW, surpassing Bi-LSTM (9.8 MW) and Bi-GRU (10.2 MW), highlighting its capability in peak shaving. Additionally, KAN–CNN provides the smoothest load curve, with an index of 0.92, compared to Bi-LSTM (0.85) and Bi-GRU (0.88), ensuring more balanced grid operations. These results validate KAN–CNN’s superior forecasting performance, which directly contributes to improved grid stability, reduced operational risks, and enhanced efficiency in demand management under uncertain conditions.
Part II evaluates the impact of EV load forecasting models on improving fleet operational scheduling. The experiment assumes that all vehicles behave in a fully rational manner, strictly adhering to the optimal scheduling directives provided by the grid. The study utilizes data from Region A, which is estimated to have approximately 9000 electric vehicles. The results are shown in Table 6.
Table 6 demonstrates the impact of different forecasting models on fleet operational scheduling for a fleet of 9000 EVs. KAN–CNN exhibits superior performance across all evaluated metrics. Specifically, KAN–CNN achieves the lowest total operation cost at CNY 9180, reducing costs by 9.7% compared to Bi-LSTM (CNY 10,170) and by 6.4% compared to Bi-GRU (CNY 9810). This highlights its ability to minimize energy consumption and optimize charging schedules effectively.
In terms of charging wait time, KAN–CNN significantly reduces the average waiting time to 10.2 min, compared to Bi-LSTM (15.5 min) and Bi-GRU (13.8 min), indicating better congestion management and efficient scheduling. The station utilization rate is highest for KAN–CNN, at 86.5%, outperforming Bi-LSTM (78.4%) and Bi-GRU (81.2%) and reflecting improved resource allocation. Finally, KAN–CNN delivers the highest fleet travel efficiency of 47.8 km/h, a notable improvement over Bi-LSTM (42.5 km/h) and Bi-GRU (44.1 km/h).
These results validate KAN–CNN’s effectiveness in optimizing fleet operations, reducing costs, and improving efficiency, making it a valuable tool for large-scale EV fleet management.

5. Conclusions

This study proposed a novel KAN–CNN model for Electric Vehicle (EV) load forecasting, integrating the adaptive nonlinearity of Kolmogorov–Arnold Networks (KANs) with the spatial feature extraction capabilities of Convolutional Neural Networks (CNNs). The designed architecture incorporates bottleneck KAN convolutional layers, Self-KAGN Attention, and Focal KAGN Modulation to enhance feature representation, reduce computational complexity, and improve global dependency modeling. Regularization techniques such as L1/L2 penalties, dropout, and Gaussian noise were employed to further improve model robustness.
The experimental results demonstrated that KAN–CNN consistently outperformed traditional models such as Bi-LSTM and Bi-GRU across key evaluation metrics, including RMSE, MAPE, R2, and computation time. For instance, KAN–CNN achieved significantly better prediction accuracy while reducing computational overhead by over 50% compared to the baseline methods. Furthermore, the model showed strong adaptability to complex load patterns and high robustness to seasonal variations and regional fluctuations.
As the first application of the KAN–CNN structure in EV load forecasting, this study validates its feasibility and practical engineering value. The model can also be effectively extended to other time-series forecasting tasks in the energy sector, such as renewable energy generation forecasting or grid stability analysis. Future research will focus on further integrating more domain knowledge and multimodal data, exploring the broader application potential of KAN–CNN in complex energy scenarios.

Author Contributions

Conceptualization, Z.P. and Z.Z.; methodology, Z.Z.; software, J.C.; validation, W.L. and B.C.; formal analysis, Y.H.; investigation, H.Y.; resources, Y.L.; data curation, Z.P.; visualization, B.C.; supervision, B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the North China Electric Power Research Institution Co., Ltd. (project name: Research on the Temporal and Spatial Distribution Forecast of Electric Vehicle Charging Piles and Grid Interaction of Power Grid, grant number: H202494408300).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Authors Zhigang Pei, Zhiyuan Zhang, Jiaming Chen, and Weikang Liu were employed by State Grid Shaoxing Power Supply Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Su, Y.; Teh, J.; Chen, C. Optimal dispatching for AC/DC hybrid distribution systems with electric vehicles: Application of cloud-edge-device cooperation. IEEE Trans. Intell. Transp. Syst. 2024, 25, 3128–3139. [Google Scholar] [CrossRef]
  2. Liu, Q.; Ma, J.; Zhao, X.; Zhang, K.; Meng, D. Online diagnosis and prediction of power battery voltage comprehensive faults for electric vehicles based on multi-parameter characterization and improved K-means method. Energy 2023, 283, 129130. [Google Scholar] [CrossRef]
  3. Lai, C. The Influence of Battery Exchange Electric Vehicle and Transmission Line Age and Loading Related Failures on Power System Reliability. IEEE Syst. J. 2023, 17, 5774–5785. [Google Scholar] [CrossRef]
  4. Wang, S.; Chen, A.; Wang, P.; Zhuge, C. Predicting electric vehicle charging demand using a heterogeneous spatio-temporal graph convolutional network. Transp. Res. Part C Emerg. Technol. 2023, 153, 104205. [Google Scholar] [CrossRef]
  5. Zhong, J.; Li, Y.; Wu, Y.; Cao, Y.; Li, Z.; Peng, Y.; Qiao, X.; Xu, Y.; Yu, Q.; Yang, X.; et al. Optimal operation of energy hub: An integrated model combined distributionally robust optimization method with stackelberg game. IEEE Trans. Sustain. Energy 2023, 14, 1835–1848. [Google Scholar] [CrossRef]
  6. Yang, N.; Xun, S.; Liang, P.; Ding, L.; Yan, J.; Xing, C.; Wang, C.; Zhang, L. Spatial-temporal Optimal Pricing for Charging Stations: A Model-Driven Approach Based on Group Price Response Behavior of EVs. IEEE Trans. Transp. Electrific. 2024. Early Access. [Google Scholar] [CrossRef]
  7. Genov, E.; De Cauwer, C.; Van Kriekinge, G.; Coosemans, T.; Messagie, M. Forecasting flexibility of charging of electric vehicles: Tree and cluster-based methods. Appl. Energy 2024, 353, 121969. [Google Scholar] [CrossRef]
  8. Liu, H.; Xing, Z.; Zhao, Q.; Liu, Y.; Zhang, P. An Orderly Charging and Discharging Strategy of Electric Vehicles Based on Space–Time Distributed Load Forecasting. Energies 2024, 17, 4284. [Google Scholar] [CrossRef]
  9. Liu, J.; Lin, G.; Rehtanz, C.; Huang, S.; Zhou, Y.; Li, Y. Data-driven intelligent EV charging operating with limited chargers considering the charging demand forecasting. Int. J. Electr. Power Energy Syst. 2022, 141, 108218. [Google Scholar] [CrossRef]
  10. Rasheed, T.; Bhatti, A.R.; Farhan, M.; Rasool, A.; El-Fouly, T.H.M. Improving the efficiency of deep learning models using supervised approach for load forecasting of electric vehicles. IEEE Access 2023, 11, 91604–91619. [Google Scholar] [CrossRef]
  11. Xia, M.; Shao, H.; Ma, X.; de Silva, C.W. A stacked GRU-RNN-based approach for predicting renewable energy and electricity load for smart grid operation. IEEE Trans. Ind. Inform. 2021, 17, 7050–7059. [Google Scholar] [CrossRef]
  12. Koohfar, S.; Woldemariam, W.; Kumar, A. Performance comparison of deep learning approaches in predicting EV charging demand. Sustainability 2023, 15, 4258. [Google Scholar] [CrossRef]
  13. Li, C.; Liao, Y.; Sun, R.; Diao, R.; Sun, K.; Liu, J.; Zhu, L.; Jiang, Y. Prediction of EV charging load using two-stage time series decomposition and DeepBiLSTM model. IEEE Access 2023, 11, 72925–72941. [Google Scholar] [CrossRef]
  14. Mohammad, F.; Kang, D.-K.; Ahmed, M.A.; Kim, Y.-C. Energy demand load forecasting for electric vehicle charging stations network based on ConvLSTM and BiConvLSTM architectures. IEEE Access 2023, 11, 67350–67369. [Google Scholar] [CrossRef]
  15. Danish, M.U.; Grolinger, K. Kolmogorov–Arnold recurrent network for short term load forecasting across diverse consumers. Energy Rep. 2025, 13, 713–727. [Google Scholar] [CrossRef]
  16. Gong, Y.; Zhang, Y.-Z.; Fang, S.; Liu, C.; Niu, J.; Li, G.; Li, F.; Li, X.; Cheng, T.; Lai, W.-Y. Artificial intelligent optoelectronic skin with anisotropic electrical and optical responses for multi-dimensional sensing. Appl. Phys. Rev. 2022, 9, 021403. [Google Scholar] [CrossRef]
  17. Koenig, B.C.; Kim, S.; Deng, S. KAN-ODEs: Kolmogorov–Arnold network ordinary differential equations for learning dynamical systems and hidden physics. Comput. Methods Appl. Mech. Eng. 2024, 432, 117397. [Google Scholar] [CrossRef]
  18. Alshingiti, Z.; Alaqel, R.; Al-Muhtadi, J.; Haq, Q.E.U.; Saleem, K.; Faheem, M.H. A deep learning-based phishing detection system using CNN, LSTM, and LSTM-CNN. Electronics 2023, 12, 232. [Google Scholar] [CrossRef]
  19. Lago, J.; De Ridder, F.; De Schutter, B. Predicting spot electricity prices: Deep learning approaches and empirical comparison of traditional algorithms. Appl. Energy 2018, 221, 386–405. [Google Scholar] [CrossRef]
  20. Xie, Y.; Sun, W.; Ren, M.; Chen, S.; Huang, Z.; Pan, X. Stacking ensemble learning models for daily runoff prediction using 1D and 2D CNNs. Expert Syst. Appl. 2023, 217, 119469. [Google Scholar] [CrossRef]
  21. Dantas, P.V.; da Silva, W.S.; Cordeiro, L.C.; Carvalho, C.B. A comprehensive review of model compression techniques in machine learning. Appl. Intell. 2024, 54, 11804–11844. [Google Scholar] [CrossRef]
  22. Wang, S.; Liu, Z.; Chen, Y.; Hou, C.; Liu, A.; Zhang, Z. Expansion spectral–spatial attention network for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 6411–6427. [Google Scholar] [CrossRef]
  23. Yang, M.; Lim, M.K.; Qu, Y.; Li, X.; Ni, D. Deep neural networks with L1 and L2 regularization for high dimensional corporate credit risk prediction. Expert Syst. Appl. 2023, 213, 118873. [Google Scholar] [CrossRef]
  24. Mughees, N.; Mohsin, S.A.; Mughees, A.; Mughees, A. Deep sequence to sequence Bi-LSTM neural networks for day-ahead peak load forecasting. Expert Syst. Appl. 2021, 175, 114844. [Google Scholar] [CrossRef]
  25. Munawar, S.; Javaid, N.; Khan, Z.A.; Chaudhary, N.I.; Raja, M.A.Z.; Milyani, A.H.; Azhari, A.A. Electricity theft detection in smart grids using a hybrid BiGRU–BiLSTM model with feature engineering-based preprocessing. Sensors 2022, 22, 7818. [Google Scholar] [CrossRef]
  26. Jarvi, P.; Climent, L.; Arbelaez, A. Smart and sustainable scheduling of charging events for electric buses. TOP 2024, 32, 22–56. [Google Scholar] [CrossRef]
  27. Al-Ogaili, A.S.; Hashim, T.J.T.; Rahmat, N.A.; Ramasamy, A.K.; Marsadek, M.B.; Faisal, M.; Hannan, M.A. Review on scheduling, clustering, and forecasting strategies for controlling electric vehicle charging: Challenges and recommendations. IEEE Access 2019, 7, 128353–128371. [Google Scholar] [CrossRef]
Figure 1. Structure of MLPs.
Figure 2. Structure of the KAN.
Figure 3. Structure of a CNN.
Figure 4. Structure of KAN–CNN.
Figure 5. Structure of Self-KAGN Attention.
Figure 6. Electric vehicle load prediction results: illustrative example based on complex fluctuation virtual data generated from real data.
Figure 7. EV load prediction results using KAN–CNN.
Figure 8. EV load prediction results using Bi-LSTM.
Figure 9. EV load prediction results using Bi-GRU.
Table 1. Comparison between MLP and KAN.
Feature | Kolmogorov–Arnold Network (KAN) | Multi-Layer Perceptron (MLP)
Activation function | Learnable, often parameterized as spline or polynomial functions, providing high flexibility. | Fixed, such as ReLU, Sigmoid, or Tanh, limiting adaptability to complex patterns.
Parameter representation | Replaces weight parameters with single-variable functions, enabling compact and efficient representation. | Uses weight matrices for linear transformations, leading to high-dimensional parameter spaces.
Model complexity | Reduces the number of parameters by leveraging basis functions, making it efficient for high-dimensional tasks. | Relatively higher parameter requirements, especially as network width or depth increases.
Non-linear approximation | Decomposes high-dimensional functions into combinations of one-dimensional continuous functions for enhanced modeling. | Relies on stacked layers and fixed activation functions for non-linear transformations, often requiring deeper layers.
Interpretability | Offers better interpretability, as learnable activation functions can be analyzed explicitly. | Limited interpretability due to fixed activation functions and large parameter matrices.
Scalability | Efficient for high-dimensional problems due to compact function-based representation. | Scaling to high-dimensional problems requires significantly more parameters, leading to increased complexity.
Training complexity | Slightly higher due to optimization of spline or polynomial parameters but often results in fewer total parameters. | Generally simpler to train but can suffer from high resource demands in large networks.
Flexibility | High flexibility due to learnable basis functions, enabling adaptation to diverse problem types. | Less flexible, as the fixed nature of activations may not generalize well across varied problem types.
Robustness | More robust to input perturbations due to parameterized and adaptive activation functions. | May be sensitive to adversarial attacks and input noise, especially in deep architectures.
Table 2. Forecast results based on KAN–CNN across different seasons for a specific region in Zhejiang Province.
Season | Metric | A | B | C | D | E | F | G
Spring (25 January 2024) | RMSE (MW) | 439.209 | 342.012 | 121.751 | 545.996 | 255.367 | 221.072 | 82.321
 | MAPE (%) | 7.542 | 7.254 | 7.187 | 6.951 | 7.016 | 6.758 | 6.691
 | R2 | 0.947 | 0.962 | 0.953 | 0.976 | 0.973 | 0.960 | 0.963
 | Time (s) | 13.2 | 12.8 | 11.2 | 12.5 | 11.9 | 11.2 | 10.9
Summer (17 May 2024) | RMSE (MW) | 452.582 | 323.136 | 130.693 | 492.005 | 266.058 | 192.342 | 77.639
 | MAPE (%) | 7.215 | 7.103 | 7.077 | 6.637 | 7.038 | 6.792 | 6.709
 | R2 | 0.950 | 0.965 | 0.973 | 0.981 | 0.971 | 0.968 | 0.975
 | Time (s) | 13.2 | 12.8 | 11.2 | 12.5 | 11.9 | 11.2 | 10.9
Autumn (6 August 2023) | RMSE (MW) | 524.730 | 411.027 | 153.194 | 600.952 | 348.692 | 268.693 | 106.583
 | MAPE (%) | 7.563 | 7.302 | 7.351 | 6.980 | 7.231 | 6.950 | 6.902
 | R2 | 0.935 | 0.941 | 0.946 | 0.939 | 0.942 | 0.941 | 0.949
 | Time (s) | 13.2 | 12.8 | 11.2 | 12.5 | 11.9 | 11.2 | 10.9
Winter (1 December 2023) | RMSE (MW) | 403.052 | 281.005 | 106.167 | 421.009 | 246.317 | 198.766 | 68.107
 | MAPE (%) | 7.185 | 6.984 | 7.051 | 6.711 | 6.956 | 6.612 | 6.583
 | R2 | 0.953 | 0.968 | 0.978 | 0.971 | 0.978 | 0.976 | 0.956
 | Time (s) | 13.2 | 12.8 | 11.2 | 12.5 | 11.9 | 11.2 | 10.9
Table 3. Comparative statistical analysis of prediction results across algorithms for the autumn dataset (6 August 2023).
Model | Metric | A | B | C | D | E | F | G
Bi-LSTM | RMSE (MW) | 505.260 | 443.031 | 171.086 | 642.550 | 374.347 | 295.792 | 146.728
 | MAPE (%) | 8.991 | 8.065 | 8.150 | 8.965 | 8.215 | 8.354 | 8.316
 | R2 | 0.918 | 0.935 | 0.942 | 0.916 | 0.938 | 0.939 | 0.942
 | Time (s) | 36.7 | 32.5 | 31.6 | 36.7 | 32.3 | 31.8 | 31.4
Bi-GRU | RMSE (MW) | 545.290 | 381.037 | 137.006 | 582.502 | 314.370 | 225.601 | 126.781
 | MAPE (%) | 6.962 | 6.556 | 6.382 | 6.963 | 6.453 | 6.395 | 3.367
 | R2 | 0.953 | 0.965 | 0.967 | 0.955 | 0.960 | 0.958 | 0.968
 | Time (s) | 38.6 | 36.5 | 35.2 | 38.5 | 36.7 | 35.9 | 35.0
KAN–CNN | RMSE (MW) | 530.220 | 401.083 | 148.183 | 612.609 | 328.329 | 250.622 | 99.723
 | MAPE (%) | 6.573 | 6.392 | 6.306 | 6.583 | 6.352 | 6.340 | 6.059
 | R2 | 0.975 | 0.982 | 0.988 | 0.976 | 0.980 | 0.983 | 0.991
 | Time (s) | 15.6 | 14.5 | 13.6 | 15.5 | 14.3 | 13.9 | 13.2
Table 4. Comparison of model efficiency and parameter complexity.
Model | Parameter Tuning Method | Training Time (Hours) | Parameter Count | Hyperparameter Complexity
Bi-LSTM | Grid search and cross-validation | 5.3 | 87,424 | High
Bi-GRU | Grid search and cross-validation | 4.5 | 66,624 | High
KAN–CNN | Grid search and cross-validation | 2 | 8288 | Low
Table 5. Impact of different models on grid performance.
Model | Voltage Deviation (%) | Overload Rate (%) | Peak Load Reduction (MW) | Load Curve Smoothness (Index)
KAN–CNN | 2.5 | 5.2 | 12.5 | 0.92
Bi-LSTM | 4.1 | 9.4 | 9.8 | 0.85
Bi-GRU | 3.8 | 7.8 | 10.2 | 0.88
Table 6. Impact of forecasting models on EV scheduling (9000 EVs) in Region A.
Model | Total Operation Cost (CNY) | Average Charging Wait Time (min) | Station Utilization Rate (%) | Fleet Travel Efficiency (km/h)
KAN–CNN | 9180 | 10.2 | 86.5 | 47.8
Bi-LSTM | 10,170 | 15.5 | 78.4 | 42.5
Bi-GRU | 9810 | 13.8 | 81.2 | 44.1
