1. Introduction
There are many differences between the planning and scheduling policy paradigms, yet the two are complementary. A pure policy assignment can be defined in terms of its applicability in the relevant disciplines, for example: “The difficulty of policy is in identifying a set of actions that will transform the existing circumstance into one in which the target description is correct” [1,2]. Marsay, D. J., in turn, argues that moving the policy world from a specified beginning circumstance to a specified target state through a sequence of actions is the crux of the AI policy issue [3]. Another viewpoint evaluates the planning-policy task as a design or synthesis activity, which differs considerably from the usual approaches: according to the dictionary, a policy can be defined as inventing, styling, or forming something, such as ranking the pieces of a thing through the execution of actions [4]. After examining the numerous definitions above, a few inferences can be made about the nature of a policy assignment. When policy is treated as a job, the primary focus is developing the series of actions that enables the required goals to be achieved.
The generated actions are frequently restricted by the many constraints that limit the range of potential solutions. Moreover, a policy is not merely a synthesis of a sequence of activities drawn from the set of known actions, because the time element that defines the interdependence between these actions is uncharted.
One of the critical ways in which manufacturers promote and differentiate their EVs is by providing a vast network of charging stations throughout the marketed region or by reducing the charging-time schedule. Rapid technological advancement over the past several decades has brought huge improvements to daily living but also rising pollution levels. Extensive and radical sustainability policies have been adopted to simplify electromobility globally and to avoid polluting transportation emissions through secure actions (e.g., electric vehicles, EVs) that create clean cities, as discussed at the Global Economic Summit on the Energy Future and Mobility innovation in January 2018. This desired transformation pushes researchers to enhance deep learning mechanisms so as to reduce the error in obtaining the best solution (i.e., reward) when assigning an object to its destination at low consumption cost, for instance matching candidate electricity and EVs to grid stations, as in [5,6], which suggested reinforcement learning-based/model-free energy management systems. In those works, a Markov decision process represents the energy management system as a scheduling issue through its state space (i.e., the number of EVs), transition probability, action space (i.e., the assignment condition), and reward function. Clean cities require physical infrastructure layers (e.g., electric cars, charging stations, transformers, electric lines) and cyberinfrastructure layers (e.g., IoT devices, sensor nodes, meters, monitoring devices). These layers make up the majority of the electric vehicle (EV) system [7,8], with the successive relations sketched in Figure 1 and tackled through one of the branches shown in Figure 2.
The authors emphasize that supporting the charging infrastructure layers in congestion-ridden urban areas relies on understanding the long-term change in mobility in order to protect those areas from pollution risk. To close the gap between EV charging demands (i.e., requests) [9] and charging station availability (i.e., invitations), herein named the bidirectional connectivity network (biCN), a smart reinforcement scheduling policy learning (RL) approach is needed; RL is one of the three key machine learning paradigms, alongside supervised and unsupervised learning, for tackling the network elements [10]. RL can be written in many programming languages; the authors formulate the desired biCN in MATLAB and discuss its relations, which consist of the following: the operator (O; assigning) learns and trains to accomplish a goal by interacting with the environment through a selective policy (π; function steps) that manages the back-propagation for the operator's series of actions (a; the preferred assigning index). A policy, approximated here by deep neural networks, is a function that maps every action the operator takes to the anticipated result or reward. The action space may be discrete or continuous, combining all appropriate actions of a certain group; it is often described via the number of moves or the sequence of states available to the operator (i.e., the policy guides the trajectory τ of an electric vehicle) and is represented by real-valued vectors. The operator (O) uses a rule known as a policy to determine which actions to select; the policy may be deterministic, in which case it is typically denoted by μ, or stochastic, usually denoted by π. Accordingly, Figure 1 discusses the state, action, and parameter architecture and their distribution relationships based on the relation a_t ∼ π(·|s_t), where ∼ denotes sampling from the stochastic policy.
Under full observability, the policy selects actions directly from the states that generate the observations. A state s is an exhaustive account of how things stand in the world, so every state has a place in the ecosystem under study, while an observation o only partially describes that state and may leave out important details. The operator may therefore see a state completely or only in part; if only a portion is visible, the operator creates an internal state (or state estimate). To observe states mechanically, the deep RL approach represents them as a real-valued array, vector, or tensor. The states carry the significant (parameterized) features that affect achieving the objective in minimum time and with minimum error; policies that depend on such a set of parameters θ, as mentioned above, are called parameterized policies. State transitions depend exclusively on the most recent state and action and describe what changes occur in the environment between time t and the state at time t+1: if deterministic, s_{t+1} = f(s_t, a_t); if stochastic, s_{t+1} ∼ P(·|s_t, a_t). The operator then creates the valid actions according to its policy. The reward function R, which pulls the solution along specific paths (i.e., trajectories), is critically important in searching for the route: it gives back a reward r_t = R(s_t, a_t, s_{t+1}), which can be simplified to r_t = R(s_t, a_t) or r_t = R(s_t), i.e., the value of one movement in the propagation of the state, usually represented as a real positive or negative number. The cumulative reward for a whole trajectory τ can then be expressed as the finite-horizon undiscounted return R(τ) = Σ_{t=0}^{T} r_t. The selection policy frequently uses discounted rewards with a discount factor γ ∈ (0,1) over the candidate trajectory, expressed as R(τ) = Σ_{t=0}^{∞} γ^t r_t.
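As a concrete illustration of the return definitions above, the following is a minimal sketch (not taken from the paper's MATLAB code) that computes the undiscounted and discounted return of a recorded reward sequence; the reward values and the discount factor are hypothetical placeholders.

```python
def undiscounted_return(rewards):
    """Finite-horizon undiscounted return R(tau) = sum_t r_t."""
    return sum(rewards)

def discounted_return(rewards, gamma=0.95):
    """Discounted return R(tau) = sum_t gamma^t * r_t, with gamma in (0, 1)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Hypothetical rewards collected along one trajectory (one per action taken).
rewards = [1.0, -0.5, 2.0, 0.0, 3.0]
print(undiscounted_return(rewards))          # 5.5
print(round(discounted_return(rewards), 4))  # ~4.7735
```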
Nevertheless, the operator's ultimate objective is to choose an ideal action policy that optimizes the expected reward over all trajectories. Assuming a stochastic policy and stochastic environmental changes, the probability of a trajectory can be expressed as P(τ|π) = ρ0(s0) Π_t P(s_{t+1}|s_t, a_t) π(a_t|s_t), where ρ0(s0) is the initial-state probability, P(s_{t+1}|s_t, a_t) is the state-transition probability of the environment [11,12], and π(a_t|s_t) is the action probability of the agent. The expected return, denoted J(π), represents the value of the charging stations' actions and can be expressed as J(π) = E_{τ∼π}[R(τ)]. From there, as developed in the illustrative case study in Section 3, the primary optimization issue in RL may be articulated as π* = arg max_π J(π), where π* is the optimal scheduling policy; it emphasizes the importance of the value function (i.e., the selection criteria index discussed in Equation (8)) for a state-action pair (s, a), which represents the EVs' requests and the stations' invitations. This issue has several sources of uncertainty because of the interconnections across different operators, such as the EVs, the grid of electric power, the connectivity network (biCN), and the electricity supplier. The applications of deep RL in this field focus on the operational and demand-response control between the pair Q(s, a) above and their energy consumption management [13] with regard to the electric power grid [14].
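To make the objective J(π) = E_{τ∼π}[R(τ)] tangible, the sketch below estimates it by Monte Carlo rollouts on a small, randomly generated toy MDP; the state/action counts, dynamics, and policy are hypothetical stand-ins, not the paper's EV-charging environment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, horizon, gamma = 6, 3, 20, 0.95

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P(s'|s,a)
R = rng.normal(size=(n_states, n_actions))                        # r = R(s,a)
rho0 = rng.dirichlet(np.ones(n_states))                           # initial-state distribution
pi = rng.dirichlet(np.ones(n_actions), size=n_states)             # stochastic policy pi(a|s)

def rollout():
    """Sample one trajectory tau ~ pi and return its discounted return R(tau)."""
    s, ret = rng.choice(n_states, p=rho0), 0.0
    for t in range(horizon):
        a = rng.choice(n_actions, p=pi[s])
        ret += (gamma ** t) * R[s, a]
        s = rng.choice(n_states, p=P[s, a])
    return ret

J_hat = np.mean([rollout() for _ in range(5000)])  # Monte Carlo estimate of J(pi)
print(f"estimated J(pi) = {J_hat:.3f}")
```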
Therefore, researchers classify the policy value functions into four functions for reaching the destination along a minimum route and in less time. The first, the on-policy action-value function Q^π(s, a), provides the anticipated return if the operator starts from the given state s, performs an arbitrary action a, and thereafter acts according to policy π forever; the second, which omits the action, is the on-policy value function V^π(s), which delivers the return if the operator starts from the given state s and follows policy π directly. The third and fourth follow the optimal policy, whether on the value or on the action value, and are denoted V*(s) and Q*(s, a), respectively, where both act according to the optimal policy π* as given in Equations (4) and (5), with their Bellman forms in Equations (4a) and (5a). The fundamental tenet of the Bellman forms is that the worth of the starting state sums up “the reward expected from being in that specific position, plus the value of wherever the operator will arrive next.” The authors note that the connection between the value and action-value functions is V^π(s) = E_{a∼π}[Q^π(s, a)] and V*(s) = max_a Q*(s, a), and that the discussed functions obey special self-consistency equations, called Bellman equations, with respect to the relative action a. Therefore, the proportionate benefit (advantage) of such an action a under policy π is expressed by A^π(s, a) = Q^π(s, a) − V^π(s).
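For reference, the textbook forms of the value functions, Bellman optimality relations, and advantage function described above are written out below. This is a hedged reconstruction of the standard expressions, since the paper's own Equations (4)–(5a) are not reproduced in the extracted text; it should not be read as a verbatim copy of them.

\begin{align*}
V^{\pi}(s)   &= \mathbb{E}_{\tau \sim \pi}\left[ R(\tau) \mid s_0 = s \right], &
Q^{\pi}(s,a) &= \mathbb{E}_{\tau \sim \pi}\left[ R(\tau) \mid s_0 = s,\ a_0 = a \right],\\
V^{*}(s)     &= \max_{a}\ \mathbb{E}_{s' \sim P(\cdot\mid s,a)}\left[ r(s,a) + \gamma V^{*}(s') \right], &
Q^{*}(s,a)   &= \mathbb{E}_{s' \sim P(\cdot\mid s,a)}\left[ r(s,a) + \gamma \max_{a'} Q^{*}(s',a') \right],\\
V^{\pi}(s)   &= \mathbb{E}_{a \sim \pi}\left[ Q^{\pi}(s,a) \right], &
A^{\pi}(s,a) &= Q^{\pi}(s,a) - V^{\pi}(s).
\end{align*}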
The taxonomy of reinforcement learning illustrates several algorithms and shows the position of the proposed methodology, as in Figure 2. Figure 2 shows that when the operator has access to a model of the environment, forecasting state transitions via specific functions and the gained rewards [15], it can learn an optimal policy as described by the model-based RL algorithms without actually acting the policy out, assigning some of the movement states that request actions Q(s, a) [16]. According to Zhang H. et al., AlphaZero addresses the chess-piece movement issue discussed in the Google DeepMind project (its predecessor played Go to improve its expectations), which is considered a scheduling issue [17]. The lack of scheduling in that setting demands instantaneous handling of mayday-like requests and consumes a long time to generate a solution; the scheduling time is therefore a critical indicator [18]. With more EVs on the road, it will become harder to locate connectors. Although services such as ChargePoint or ChargeHub (e.g., cloud services) give EV drivers real-time guidance about available stations, the capability to reserve connectors for a later time has not yet been implemented [19]. Using the autonomous Internet of Things (AIoT) alleviates this drawback. The stations can be selected according to many models that help to shorten the route. AlphaZero's self-play-only training, unlike prior iterations of AlphaGo [20,21], starts with a neural network that does not initially know the rules of Go or shogi; the AI practices against itself (i.e., attempting to understand itself) through reinforcement learning until it can foresee its own moves and their impact on the game outcome, whereas AlphaGo involves slow training and consumes a long time to reach the same level [22]. In the model-free branch on the left side of Figure 2, the operators do not have access to a ground-truth representation of the environment's model, which is the fundamental drawback of these algorithms; they therefore need experience, gathered by behaving exploratorily in the environment. The operators then need learning either to optimize the selected route directly or to use Q-learning. In the model-free policy-optimization family, the functions are enhanced directly by optimizing the parameters θ by gradient ascent toward J(π), while Q-learning focuses on the value functions and their induced policies, which are managed by the neural network rules discussed in Equations (6) and (7).
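To make the value-based branch concrete, the following is a minimal tabular Q-learning sketch (a generic illustration with hypothetical sizes and a random toy environment, not the paper's DQN); it shows the one-step update that the deep variants approximate with a neural network.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 8, 4            # hypothetical problem size
alpha, gamma, epsilon = 0.1, 0.95, 0.1

Q = np.zeros((n_states, n_actions))   # tabular action-value estimates Q(s, a)

def step(s, a):
    """Toy environment: random next state and reward (placeholder dynamics)."""
    return rng.integers(n_states), rng.normal()

s = rng.integers(n_states)
for _ in range(10_000):
    # epsilon-greedy action selection
    a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # one-step Q-learning (off-policy TD) update toward the Bellman target
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print(np.round(Q, 2))
```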
Moreover, Lei, L. et al. describe implementations of the deep RL technique for a variety of new communication and networking issues, including managing the rate of aggregated data, converting the data into arguments, flexible dynamic network access, bidirectional wireless caching in real time, as illustrated on the right side of Figure 3, and network security. Lei, L. et al. also discuss managing the DRL environment, which consists of three layers (the perception, network, and application layers), by autonomous IoT (AIoT) systems, the next generation of IoT systems [23]. The main advantage of these model-free RL approaches [24] is that they are principled policy-optimization techniques that aim at the goal directly, which makes them more trustworthy and stable in producing results. Therefore, the authors integrate Q-learning with the policy-optimization branch of this taxonomy to build the proposed methodology. The proposed methodology relies on the policy gradient (stochastic or deterministic, evaluated by Monte Carlo or by actor-critic with multiple agents) [25]; it increases the performance of the function index by minimizing the error when treated as a Bellman equation (the gap between the destination and the obtained output position) and performs gradient ascent to directly maximize performance following Equation (8) in Section 3. If the policy optimization is proximal, i.e., it maximizes a clipped surrogate objective function, it is called PPO (proximal policy optimization). This integration aims to reduce the failure modes of the DQN (deep Q-learning network) mechanism when obtaining results, in the same spirit as enhancements such as the C51 method, and the hybridization extends to DDPG. The main objective is obtaining the shortest route between the selected starting position and the nearest destination (i.e., the electric charging station), which helps save service time. Therefore, the methods suggested by Lee et al. and Ahmed M. Abed support the paper's objective and reduce the total trip time while accounting for the dynamic nature of traffic and unforeseen future charging requirements [26,27]. They return the optimum route and charging stations, which is a challenging issue. The operator's ultimate objective is to create a policy that yields the highest reward over an extended period of time [28].
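Since PPO is named as one option for the policy-optimization side of the hybrid, the snippet below sketches the standard clipped-surrogate objective in NumPy; it is a generic illustration (the probability ratios and advantage estimates are hypothetical arrays), not the paper's implementation.

```python
import numpy as np

def ppo_clip_objective(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate: mean over samples of
    min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t), where r_t is the
    probability ratio pi_new(a|s) / pi_old(a|s)."""
    ratio = np.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))  # maximized by gradient ascent

# Hypothetical batch of log-probabilities and advantage estimates.
lp_new = np.array([-0.9, -1.2, -0.3, -2.0])
lp_old = np.array([-1.0, -1.0, -0.5, -1.8])
adv    = np.array([ 0.5, -0.2,  1.0, -0.7])
print(round(ppo_clip_objective(lp_new, lp_old, adv), 4))
```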
For EVs to charge properly, it is suggested to use RL to learn the generated governmental energy usage, which decreases energy consumption costs and also encourages EV marketing [29,30]. Therefore, many researchers discuss ways to increase power grid utilization so as to reach a steady load state and to prevent electric potential fluctuations by using RL approaches [31,32,33]. The classic model for selecting the shortest route between a source position and a destination is the Markov model, which is mimicked by the proposed methodology to schedule the assignment of the states (EVs) to the preferred charging stations. Accordingly, Wang, K. et al. and Wan, Y. et al. suggest tackling the intended scheduling as an MDP because of its random behavior, especially in the traffic parameter [12,34], while EV arrival times under uncertainty motivate the recommendation to use DRL [30] to overcome the shortcomings of the conventional model-based branch, which requires an accurate model to be efficient, so that the delivered EV charging approach is scalable and adaptable [35]. A model-free real-time scheduling of electric car charging relying on DRL, as placed in Figure 2, has been developed under uncertain transition probabilities [36]. The performance of the proposed policy relies on a forecasting method that predicts readiness for the EV using an LSTM neural network to create an intelligent schedule, which presents promising results [37]. Therefore, the authors verify those results [38] after mimicking their function formulation recommendations to adjust the operators and reproduce the methodology.
2. The Policy Formulation and Research Contribution
The majority of current research relies on inaccurate traditional econometric or time-series approaches with uncertain behavior, which motivated the authors to study the feasibility of EV stations and their usability management via the tuple of parameters (state, action, transition function, reward). The review points toward using RL with a motivated searching mechanism, preferably a heuristic, to expand the solution space [39,40]. Therefore, under the adopted policy, the rewarding argument is the value gained from the transition function through sequences of decisions (i.e., Markov principles) enhanced by a metaheuristic technique. The possibility of obliteration and inflation makes future value less valuable than a reward now, which is captured by the discount factor γ. The movement of a state (EV) relies on the available locations in the backlog list (e.g., the layout map), which has two cluster directions, unidirectional or bidirectional. The gained reward tends to support one policy or another when a state s was not on the proposed routes. The charter of the work is illustrated in Table 1.
Improving the searching mechanism of the various computations is among the significant factors in finding the best hybridization for the optimization process, and it is examined here on the basis of five different metaheuristic algorithms [41]. The first is Scatter search, made famous by Tabu search, hybridized with the immune algorithm. The second is the differential evolution (DE) algorithm, a form of evolutionary computation whose behavior relies on iteratively improving candidate solutions; it solves complex parameter optimization problems to optimize profitability regardless of time. As a third trial, researchers hybridize the Grey Wolf Optimizer (GWO) with the immune algorithm to minimize the total service time [42]. The fourth hybridization combines the Sine-Cosine and Whale Optimization Algorithms, which create initial random agents and make them fluctuate outwards or toward the best candidate solution based on sine and cosine functions [43,44,45,46]. Finally, the process parameters were optimized using reinforcement learning and a Bayesian-regularized neural network utilizing the beetle antennae search (BASNNC) algorithm, which outperformed the previously discussed approaches [38]. Therefore, the proposed methodology selects the “need-based” branch, as illustrated in Figure 2 above, which matches the need to understand the behavior of the tackled parameters through servicing (i.e., EV charging operations), mainly when multiple constraints are considered. This work therefore tries to enhance this mechanism to find a preferred solution in minimum time and to reduce the service time under the abovementioned considerations. The proposed design consists of two recruitment networks: a deep Q network represents the relation between states and actions, Q(s, a), giving the best action-value function index of the discriminative features for the idle time. In contrast to optimization-based techniques, the suggested system, which can be managed by the autonomous cloud technique, does not need to know variables such as arrival and departure times or power usage in advance, because the neural network can estimate the right choice based on the present parameters (s, a). The paper discusses the reinforcement learning mechanism for optimization algorithms in the review section and constructs the problem formulation that targets reducing the total service time of EVs at suitable stations while taking into account the minimum route distance for pairing them.
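As a concrete, simplified picture of how a deep Q network scores state-action pairs and is trained toward a Bellman target, the sketch below uses a tiny linear Q function with a periodically synchronized target network in NumPy; the feature sizes and transitions are hypothetical, and this is not the paper's MATLAB implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n_features, n_actions, gamma, lr = 10, 4, 0.95, 1e-2

W = rng.normal(scale=0.1, size=(n_actions, n_features))  # online Q-network (linear)
W_target = W.copy()                                       # frozen target network

def q_values(weights, state):
    """Q(s, .) for a feature-encoded state under a linear approximator."""
    return weights @ state

for step in range(2000):
    # Hypothetical transition (s, a, r, s') drawn at random in place of the
    # EV-charging environment.
    s, s_next = rng.normal(size=n_features), rng.normal(size=n_features)
    a, r = rng.integers(n_actions), rng.normal()

    # Bellman target using the target network: y = r + gamma * max_a' Q_target(s', a')
    y = r + gamma * np.max(q_values(W_target, s_next))
    td_error = y - q_values(W, s)[a]

    # Semi-gradient update of the online network toward the target.
    W[a] += lr * td_error * s

    if step % 200 == 0:   # periodic synchronization of the target network
        W_target = W.copy()

print(np.round(W[:, :3], 3))
```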
Section 2 provides the paper charter, which expounds the evaluation indicator and highlights the contribution presented in Section 3. Section 4 discusses how to analyze the DQN via fuzzy scheduling iterations to pick good distributions of EVs around the stations on the road using the proposed metaheuristic methodology. Finally, the results are discussed in Section 5. At the end, the conclusions are presented, the drawbacks are discussed, and future work directions are envisaged; the conclusion also presents the comparative results against the mimicked outputs.
4. The Methodology Description
The forward propagation of aggregated data inputs and the reverse propagation of errors are two bidirectional processes that make up the classic gradient-based BP neural network learning process. Training is repeated, and deviations and network weight changes are continually computed in the direction of the relative error function gradient. It takes time to get closer to the objective.
Traditional gradient-based BP neural networks do, however, have certain intrinsic drawbacks, such as slow convergence and a propensity toward local optima. Researchers therefore often experiment with different activation functions, change the network layout, and enhance the weight-determination approaches. The proposed methodology handles the case of many stations being requested at the same time. The aim is to select the closest five EVs to a specific station according to the distance along the main road trip and their requested services. The analysis of the requested procedures aims to reduce the total idle time of EVs and to increase the utilization of the charging stations, which is calculated by the OEE indicator. Since there is an effectively unlimited number of possible scenarios, it is impossible to keep the ideal answer for each of them in a cloud database. This issue served as the impetus to create the AIoT-DQN algorithm, which employs a function to identify the optimal course of action in each situation and is controlled by the autonomous Internet of Things (AIoT) response [39,50]; those works suggest the weight-and-structure-determination (WASD) algorithms, i.e., techniques employing linearly independent or orthogonal polynomials as activation functions, among several other approaches. The relative gradient-descent technique used by the heuristic random search algorithm gives it a powerful global search capability, integrating the APSO method with a neural network to improve the network parameters [40,51]. The proposed model is based on hybridizing three model-free rules, namely the DQN, DDPG, and the policy gradient as illustrated in Figure 2, to gain the advantage of tailoring some heuristic steps that empower the searching mechanism. The proposed AIoT-DQN methodology was written in MATLAB to tackle the drawbacks in the native DQN through two sequential phases [52].
The first phase builds the network that mimics the environment illustrated in Figure 5, selecting the EVs and the serviceable time at the available stations via specific policies to obtain the minimum route between the car's position and its destination [53]. The second phase concerns the cost analysis. The different zones of any charging request procedure, shown on the left side of Figure 5 and represented by the states, have a sequence of actions. Zones (a, b, and d) are the BVA actions (i.e., idle time rather than service time), represented by selecting among many available stations, the inspection procedures before charging, the average setup-time procedures, the arrival distance, and the waiting time. Zone (c) is the VA activity (i.e., the service time), which must be scheduled in a minimum period. These zones were deduced from previous studies. Because of the large solution space, the authors resort to a smart scheduling solution managed by the AIoT. If there are n EV requests and m stations, the total number of potentially viable solutions grows combinatorially with n and m. After analyzing all of the feasible alternatives, an optimal solution for a certain performance metric may be identified among these possibilities. For instance, if only five EVs make a request at the same time, the number of Q(s, a) alternatives is already large enough to consume a long computational time.
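To give a feel for why exhaustive evaluation is impractical, the sketch below enumerates candidate schedules under one simplifying assumption, namely that a candidate solution is an independent service ordering of the simultaneous EV requests at each station; the counts shown follow from that assumption only, since the paper's exact solution-space expression is not reproduced above.

```python
from itertools import permutations
from math import factorial

def count_candidate_schedules(n_evs: int, n_stations: int) -> int:
    """Assumed solution space: an independent ordering of the n_evs simultaneous
    requests at each of the n_stations, i.e. (n_evs!) ** n_stations."""
    return factorial(n_evs) ** n_stations

# Explicit enumeration quickly becomes infeasible even for small sizes.
for n in range(3, 8):
    print(n, "EVs, 3 stations ->", count_candidate_schedules(n, 3), "candidates")

# Enumerating the orderings for a single station with five simultaneous requests:
evs = ["EV1", "EV2", "EV3", "EV4", "EV5"]
orderings = list(permutations(evs))
print(len(orderings), "orderings for one station")   # 120
```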
The proposed smart scheduling helps EV drivers plan their trips by selecting suitable stations along their roads and the desired charging levels. These stations are not conditioned on full charging but on charging to a particular level, which reduces the queue time; the remaining charge can be compensated for on the road at the scheduled stations. The drivers can accept this plan or cancel it and select another station under another charging policy (i.e., from the dataset's list). The proposed methodology is specified through many parameters and decision variables, as shown in Table 3.
4.1. Phase I: The Smart Fuzzy Scheduling Formulation, Decrease EVs Service Time
Many authors have combined heuristic steps to enhance their found solutions with minimum error. A back-propagation neural network based on particle swarm optimization was suggested after investigating how to carefully choose the input parameters to obtain the desired outcomes [54]. The conjugate gradient approach [55], the least-squares method, and improved methods based on numerical optimization also exist in addition to those mentioned above. The heuristic step is therefore an important device for enhancing the solution. The smart fuzzy scheduling for the DQN managed by the autonomous Internet of Things (AIoT) depends on applying the proposed fuzzy metaheuristic steps expressed in Table 4, called the decrease-EVs-service-time rule; it arranges the output index and the relative index in descending order as expressed in Equation (8) and then constructs the Gantt chart in its first form. This schedule presents a more effective service-span path than some of the other rules published in the same context. Equation (9) covers all possible solutions for assigning the different EVs pairwise in sequence. The first step computes the result of this rule and determines its priority (the priority index). The second step arranges these indexes in descending order for groups of stations so as to reduce the waiting time. If two EVs that follow one another in the arrangement are loaded onto one station, the third step tests which starting EV saves more time. After that, the rule reschedules the EVs for every station individually within its waiting time by sliding EVs so that two processes finish simultaneously, and it keeps rescheduling until the total idle time stops decreasing. Optimality is reached when rescheduling the EVs over the same route under the next assumption. This rule, denoted by the acronym used in the proposed equation, is compared with Lee et al. and other published rules, and the model is effective in most examples. The formula relies on six parameters:
ESfinalEVs | Earliest start request time of the final EV estimated for a certain station. |
ETfinalEVs | Executing request time of the final EV. |
ESfirstEVs | Earliest start request time of the first EV estimated to be assigned to a station. |
ESpredecessorEVs | Earliest start request time of the predecessor of the first EV (estimated time). |
ETpredecessorEVs | Executing request time of the selected predecessor EV. |
ETfirstEVs | Executing request time of the first EV (estimated chosen time). |
The methodology's pseudo-code is composed of ten sequential steps, illustrated in simplified form by the sketch below.
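The following is a minimal, simplified sketch of a priority-index scheduling rule in the spirit described above; it is not the authors' ten-step pseudo-code, and the priority index used here (based only on earliest-start and execution times) is a hypothetical stand-in for Equation (8), as are all the data values.

```python
from dataclasses import dataclass

@dataclass
class Request:
    ev: str
    station: str
    earliest_start: float   # ES of the request (hours)
    exec_time: float        # ET, the charging/execution time (hours)

def priority_index(r: Request) -> float:
    """Hypothetical stand-in for the paper's Equation (8): favor requests that
    can start early and finish quickly."""
    return -(r.earliest_start + r.exec_time)

def schedule(requests):
    """Greedy per-station schedule: sort by descending priority index, then
    start each request no earlier than its ES and the station's free time."""
    free_at, plan = {}, []
    for r in sorted(requests, key=priority_index, reverse=True):
        start = max(r.earliest_start, free_at.get(r.station, 0.0))
        free_at[r.station] = start + r.exec_time
        plan.append((r.ev, r.station, start, start + r.exec_time))
    return plan

demo = [Request("EV1", "S1", 0.0, 1.5), Request("EV2", "S1", 0.5, 0.75),
        Request("EV3", "S2", 0.0, 2.0), Request("EV4", "S1", 1.0, 0.5)]
for ev, st, s, e in schedule(demo):
    print(f"{ev} at {st}: {s:.2f} -> {e:.2f} h")
```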
4.2. Phase II: Cost Analysis Formulation
According to the action distribution shown in Figure 6, the priority with which requests are sequenced to a station is its servicing time, where the request with the shortest service time finishes earliest. One drawback is that the EV request with the longest service time will be serviced last in the schedule, even though it may have a high priority. Therefore, the index expressed in Equation (10), i.e., the makespan index of the processing actions, is used to choose the serviced EVs. The route length under the cost consideration of a specific service span generated by the policy function is also expressed by Equation (10), where the cost is attributed to the branch rather than to the proposed trip route, plus all the requests that are serviceable, or excluding any that are dropped because of the time consideration.
The schedule is subject to Constraints (11)–(16), with the sets and decision variables defined in Table 3.
For each potential realization of the uncertain demand, Constraints (12) and (16) keep track until the end of the period or the return to the station at the end of the period. The capacity block represents the capacity of the requested service at the end of the schedule time, which follows Exponential and Weibull behavior throughout the day from 5 a.m. to 11 a.m. and from 5 p.m. to 11 p.m., respectively. Constraint (13) generates a priority index SI in the manner of Palmer's algorithm, i.e., it orders the jobs. The policy mechanism shown in Equations (11)–(16) should be reformulated into a tractable model according to the index value, and the suggested cost analysis should be used to regulate the uncertain behavior of the right-hand sides of the variables, as discussed in Equation (17).
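Since Constraint (13) is said to generate a priority index in the manner of Palmer's algorithm, the sketch below shows Palmer's classic slope-index heuristic for ordering jobs in a flow shop; the processing-time matrix is hypothetical, and the paper's own SI definition may differ from this textbook form.

```python
def palmer_slope_order(proc_times):
    """Palmer's heuristic: for m stages, job j gets slope index
    S_j = sum_{i=1..m} (2*i - m - 1) * t[j][i-1], and jobs are sequenced
    in decreasing order of S_j."""
    m = len(proc_times[0])
    def slope(times):
        return sum((2 * (i + 1) - m - 1) * t for i, t in enumerate(times))
    return sorted(range(len(proc_times)),
                  key=lambda j: slope(proc_times[j]), reverse=True)

# Hypothetical processing times (rows = EV requests, columns = service stages).
t = [[4, 3, 7],
     [2, 8, 2],
     [6, 4, 5],
     [3, 9, 8]]
print(palmer_slope_order(t))   # [3, 0, 1, 2]: highest slope index first
```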
Some researchers have reported that an ANN, because of the trajectory, tends to become entrapped in a local optimum, so it is not used for training alone; this is the main reason several researchers show that hybrid approaches perform better than the traditional ANN [28,56]. This paper observed the Zap map over a long period and generated 120 random examples of requests from three to six EVs arriving simultaneously at a specific station, selected according to Equations (9)–(16). The proposed methodology is explained with an illustrative example of serving six actions for six different EVs by tackling the DQN network so as to serve the EVs in minimum time. The proposed methodology focuses on the significant parameters illustrated in Figure 7 and Figure 8 and on the requested actions according to the policy π that guides the driver along the shortest path to the station. The AIoT handles this hybridization to enhance the efficiency of the biCN, the bidirectional relationship between the EVs and the available stations. The BASNNC search algorithm, in contrast, was used to optimize the process parameters via reinforcement learning only and was superior to the previously mentioned mechanisms [38]; therefore, the illustrative methodology examples are compared against it to assess the study's efficiency via the OEE indicator. The time needed to obtain results for examples of EVs that need some of the stations' actions ranges between 60 and 488 s, depending on complexity. The analysis of the results discussed in the next section shows which significant factors affect the reduction of the total service time, as illustrated in Figure 7 and Figure 8 above; actions such as the charging point dev., the inspection time, and the battery size must be controlled to manage the average service span-time illustrated in Figure 9. The recharge service (RcS) problem is modelled in this work and mimicked as Markov movement steps with uncertain transition probabilities. A deep Q(s, a) network with the function ranking discussed in Equation (8) as the approximation has been utilized to find the optimum EVCS selection policy.
The first fuzzy group set is arranged in descending order, meaning that each member precedes the one that follows it in the ranking. On the other hand, the remaining actions have been tested in the second group, which is arranged in ascending order, so the corresponding precedence holds in the opposite direction, as shown in Table 5, which expounds 180 direct potential relations and 6! indirect relations.
{1.0/1} | {0.9/4, 1.0/5} | {1.0/2, 0.9/3} |
{1.0/2, 0.2/4} | {1.0/5} | {0.7/2, 1.0/3} |
{0.5/4, 1.0/5} | {1.0/7} | {1.0/6} |
{1.0/4} | {1.0/9} | {1.0/3, 0.9/4} |
{1.0/5, 0.9/6} | {1.0/2, 0.8/3} | {1.0/4} |
Service after added | Service after added | Service after added |
{1.0/1} | {0.9/5, 1.0/6} | {0.9/7, 1.0/8, 0.9/9} |
{1.0/3, 0.2/5} | {0.9/10, 1.0/11} | {0.7/12, 0.9/13, 1.0/14} |
{0.5/7, 1.0/8, 0.2/9, 0.2/10} | {0.9/15, 1.0/16} | {0.9/21, 1.0/22} |
{0.5/11, 1.0/12, 0.2/13, 0.2/14} | {0.9/22, 1.0/23} | {0.9/25, 1.0/26, 0.9/27} |
{0.5/6, 1.0/17, 0.9/18, 0.2/19, 0.2/20} | {0.9/24, 1.0/25, 0.8/26} | {0.2/29, 1.0/30, 0.9/31} |
Therefore, requests from more than N EVs constitute a complex problem and need to be programmed into a suitable mobile application. The fuzzy intervals for the expected service time of the actions requested by the EVs (e.g., the inspection time), according to the bidirectional relationship between the stations and the EVs on the road, are given for the illustrative case study in Table 5 together with the aggregated data, which constructs the deep Q network illustrated in Figure 11; the other actions likewise have their own intervals.
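The cumulative entries in Table 5 appear consistent with discrete fuzzy addition under the sup-min extension principle (for instance, {1.0/1} added to {0.9/4, 1.0/5} gives {0.9/5, 1.0/6}, matching the first row); the sketch below implements that operation under this assumption rather than reproducing the paper's exact fuzzy arithmetic.

```python
def fuzzy_add(a, b):
    """Discrete fuzzy addition via the sup-min extension principle.
    Fuzzy sets are dicts {crisp value: membership}; the table's notation
    {membership/value} maps to {value: membership} here."""
    out = {}
    for va, ma in a.items():
        for vb, mb in b.items():
            v, m = va + vb, min(ma, mb)
            out[v] = max(out.get(v, 0.0), m)  # sup over all ways to reach v
    return dict(sorted(out.items()))

# First row of the Table 5 excerpt: action durations {value: membership}.
a1 = {1: 1.0}
a2 = {4: 0.9, 5: 1.0}
a3 = {2: 1.0, 3: 0.9}

s12 = fuzzy_add(a1, a2)
print(s12)                  # {5: 0.9, 6: 1.0}  -> matches {0.9/5, 1.0/6}
print(fuzzy_add(s12, a3))   # {7: 0.9, 8: 1.0, 9: 0.9} -> matches {0.9/7, 1.0/8, 0.9/9}
```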
5. Results Analysis
In this work, we use aggregated EV charging data from the city of Dundee's open data site, which presents statistics on various EV requests, to train the reinforcement learning of the suggested approach [53,57]. Each charging request provides the charger's moniker, start and end times, energy consumption, power output, and physical location, as well as the shortest distance, reaching time, and idle time required to accomplish the charging operation and the six different actions. Every studied station has three distinct types of chargers: slow (7 kW), fast (22 kW), and rapid (>43 kW) [49,57]. In the present study, a dataset of requests observed over four months (127 days) is used to produce the descriptive statistics. The excluded outliers, 0.83% of the data, are rapid-charger charging times that deviate far from the median (the total number of rapid-charger charging requests used in this investigation is 4645). According to Table 6, the rapid chargers' standard deviation is 24.08 min, whereas their typical charging time is 516.5 min. Five main factors affect the acceleration of the EV charging sequence: battery size (the larger the battery capacity, the longer it takes to charge), battery status (empty vs. full, or half full), a high vehicle charging rate, a high charging-rate point, and the weather (charging time tends to be longer at lower temperatures, especially when using a fast charger); moreover, EVs are less efficient at lower temperatures, so not much travel distance can be added per unit of charging time. The descriptive statistics of the EV charging requests used for training in the study region over the four months are shown in Table 6. The preferred scheduling obtained from the proposed rule, after 59 iterations for the 1-EV problem size up to 9116 iterations for the 6-EV size, advises the EV drivers and the stations, whose shared beneficial interests are paired together, through the AIoT management of the deep Q network illustrated in Figure 10, according to the utilization percentage illustrated in Figure 11. Assigning more than 150 EVs at the same time to a specific station generates non-available time. Moreover, if the problem size increases beyond 559 EVs, no scheduling solution is obtained by the BASNNC algorithm, while the proposed rule still generates a solution in a time slightly over 5.7 min and fails only beyond 1374 EVs. The absolute average of the ideal EV service time is (644.58/52) = 10.73 h/EV. The scheduling of the different actions for the requested charging orders according to the Bayesian-regularized BASNNC search algorithm, illustrated in Figure 12, gives an average charging time of 55 h per six EVs (9.16 h/EV), while the proposed AIoT scheduling, when integrated into managing the DQN, reduces the idle time of the stations and the waiting time of the EVs for their requested actions, as illustrated in Figure 13, to an average of 41.25 h per six EVs (6.875 h/EV). Both are affected by the behavior of the index sampled throughout the day every 10 min; its averages were tracked for the 5 a.m. to 11 a.m. period and for the 5 p.m. to 11 p.m. period over the four months. Tracing the behavior of the EVs' requests reveals that the exponential distribution followed in the morning changes to a Weibull distribution in the afternoon over the studied interval, which gives the lowest error in the expected servicing time, as illustrated in Figure 14 and Figure 15.
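A hedged sketch of how such a morning/afternoon distinction can be checked: fit exponential and Weibull distributions to the inter-request times of each period and compare their goodness of fit. The data below are synthetic placeholders, not the Dundee dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Synthetic inter-request times (minutes); stand-ins for the morning and
# afternoon windows of the real dataset.
morning = rng.exponential(scale=12.0, size=400)
afternoon = stats.weibull_min.rvs(c=1.8, scale=15.0, size=400, random_state=4)

def compare_fits(sample):
    """Fit exponential and Weibull, report Kolmogorov-Smirnov statistics (lower = better)."""
    loc_e, scale_e = stats.expon.fit(sample, floc=0)
    c, loc_w, scale_w = stats.weibull_min.fit(sample, floc=0)
    ks_e = stats.kstest(sample, "expon", args=(loc_e, scale_e)).statistic
    ks_w = stats.kstest(sample, "weibull_min", args=(c, loc_w, scale_w)).statistic
    return ks_e, ks_w

for name, sample in [("morning", morning), ("afternoon", afternoon)]:
    ks_e, ks_w = compare_fits(sample)
    print(f"{name}: KS(exponential)={ks_e:.3f}  KS(Weibull)={ks_w:.3f}")
```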
Part of the results of the 2145 generated hypothetical examples, which deal with the EVs' requests or the stations' invitations through the biCN managed by the AIoT over four months, where all EVs execute six potential actions at specific stations and the condition of the shortest route between the EVs' requests and the stations on the three main downtown roads stored in the dataset is checked, is listed in Table 7. MATLAB is used to predict the total service time for the proposed methodology and for BASNNC. The aggregated data are classified into four groups according to the problem size; the first ranges from only three EVs up to six EVs executing all main actions at the assigned stations. The average ideal time is approximately 9.167 h, while the proposed approach averages approximately 8.562 h and BASNNC averages 10.42 h. The OEE indicates that the proposed methodology reaches 72.4% over the 2145 (i.e., 43 × 50) generated hypothetical examples versus 59% for the other algorithm. The worst results of the hypothetical examples shown in Table 7 under the proposed methodology were then chosen and tackled using the Grey Wolf Optimization (GWO), extracting four further groups for re-solving their scheduling and obtaining the solutions shown in Table 8. The worst results of the hypothetical examples shown in Table 8, plus three further large-scale examples (more than eight EVs) under the proposed algorithm, were chosen again and tackled using a third metaheuristic optimization, the Sine-Cosine and Whale algorithm (SCW), to re-solve their scheduling and check its efficiency [58], obtaining the solutions shown in Table 9. The average service time of the proposed rule, when compared with the GWO, is superior by only 4%.
Table 7, Table 8 and Table 9 report the test of the average service time of the Q(s, a) groups, each consisting of 50 hypothetical examples that have the same number of EVs and requested actions in different arrangements, where the solution is expected to be extracted within 60 s of running the MATLAB code of the methodology. The authors noticed the failure of SCW to obtain solutions in time, taking as long as 26 min for all the examples with more than seventeen simultaneous EV requests. The solutions of GWO and SCW are close for problem sizes of fewer than six EVs per charging point per station, whereas the proposed methodology is superior to SCW and GWO from six EVs up to eight. The average service time of the proposed rule, when compared with the GWO, is superior by only 15%.