The Neural Network Classifier Works Efficiently on Searching in DQN Using the Autonomous Internet of Things Hybridized by the Metaheuristic Techniques to Reduce the EVs’ Service Scheduling Time

M. Abed, Ahmed; AlArjani, Ali

doi:10.3390/en15196992

Open AccessArticle

The Neural Network Classifier Works Efficiently on Searching in DQN Using the Autonomous Internet of Things Hybridized by the Metaheuristic Techniques to Reduce the EVs’ Service Scheduling Time

by

Ahmed M. Abed

^1,2,*

and

Ali AlArjani

¹

Department of Industrial Engineering, College of Engineering, Prince Sattam Bin Abdulaziz University, Alkharj 16273, Saudi Arabia

²

Industrial Engineering Department, Zagazig University, Zagazig 44519, Egypt

^*

Author to whom correspondence should be addressed.

Energies 2022, 15(19), 6992; https://doi.org/10.3390/en15196992

Submission received: 24 August 2022 / Revised: 15 September 2022 / Accepted: 19 September 2022 / Published: 23 September 2022

(This article belongs to the Section E: Electric Vehicles)

Download

Browse Figures

Versions Notes

Abstract

:

Since the rules and regulations strongly emphasize environmental preservation and greenhouse gas GHG reduction, researchers have progressively noticed a shift in the transportation means toward electromobility. Several challenges must be resolved to deploy EVs, beginning with improving network accessibility and bidirectional interoperability, reducing the uncertainty related to the availability of suitable charging stations on the trip path and reducing the total service time. Therefore, suggesting DQN supported by AIoT to pair EVs’ requests and station invitations to reduce idle queueing time is crucial for long travel distances. The author has written a proposed methodology in MATLAB to address significant parameters such as the battery charge level, trip distance, nearby charging stations, and average service time. The effectiveness of the proposed methodology is derived from hybridizing the meta-heuristic techniques in searching DQN learning steps to obtain a solution quickly and improve the servicing time by 34%, after solving various EV charging scheduling difficulties and congestion control and enabling EV drivers to policy extended trips. The work results obtained from more than 2145 training hypothetical examples for EVs’ requests were compared with the Bayesian Normalized Neural Network (BASNNC) algorithm, which hybridize the Beetle Antennae Search and Neural Network Classifier, and with other methods such as Grey Wolf Optimization (GWO) and Sine-cosine and Whale optimization, revealing that the mean overall comparison efficiencies in error reduction were 72.75%, 58.7%, and 18.2% respectively.

Keywords:

location-based scheduling; EV charging station; intelligent transport system; EV charging navigation system; Markov decision process; deep reinforcement learning DQN

1. Introduction

There are many differentiators between planning and scheduling policy paradigms, but these two paradigms are complementary to each other. A pure policy assignment can be defined in terms of its applicability in the relevant disciplines such as “The difficulty of Policy is in identifying a set of actions that will transform the existing circumstance into one in which the target description is correct.” [1,2]. While Marsay, D. J. focuses on moving the policy world from specified beginning circumstances to a specified target state is the crux of the AI Policy issue through sequence of actions.” [3]. Another viewpoint on the policy of planning task is to evaluate it as a design or synthesis activity. This perspective differs considerably from the usual approaches. The policy can be defined as an inventing, styling, or forming something, such as the pieces’ ranking of a thing through execution of actions, according to the dictionary [4]. There are a few inferences that may be made about the nature of a policy assignment after examining the numerous definitions aforementioned. When policy is a job, the primary focus is developing a series of actions that will enable the problems necessary to achieve.

The many constraints that restrict the range of potential solutions frequently impose restrictions on the generated actions.

The policy is not a synthesis of activities’ sequence through possible known actions merely because the time element which defines the interdependence between these actions is uncharted.

One of the critical ways that various manufacturers promote and differentiate their EVs is by providing a vast network of charging stations throughout the advertising region or reducing the charging time schedule. Rapid technological advancements over the past several decades have led to both huge beneficial improvements in daily living but rising in pollution levels. Extensive and radical sustainable policies have been done to simplify electromobility globally to avoid polluting transportation emissions through secure actions’ (e.g., EVs, electric vehicles) to create clean cities: as cited in Jan. 2018 about the Energy Future and Mobility innovation discussed by the Global Economic Summit. This desired transformation pushes the researchers to enhance the deep learning mechanisms to reduce the error in obtaining the best solution (i.e., reward) when assigning the object with low consumption cost to its destination, so candidate electricity and vehicle EVs to grid station

δ_{i}

, [5,6] who suggested a reinforcement learning-based/free energy management system. The Markov decision process has been used to represent the state space (i.e., the number of EVs), transition probability, action space (i.e., assignment condition), and reward function of the energy management system as a scheduling issue. Clean cities require some physical infrastructure layers (e.g., electric cars, charging stations, transformers, electric lines, etc.,) and cyberinfrastructure layers (e.g., IoT devices, sensor nodes, meters, monitoring devices, etc.,). These layers make up the majority of the electric vehicle system EVs [7,8] when using successive relations sketched as in Figure 1 based on tackling through one of branches appeared in Figure 2.

The authors accentuate that supporting the infrastructure layers of charging in congestion-ridden urban areas relies on understanding the long-term change in mobility to save them from pollution risk. To close the gap between the demands for EV charging (i.e., requests) [9] and charging station availability (i.e., invitations), which is named bidirectional connectivity network (biCN), the smart reinforcement scheduling policy learning (RL) is needed, which is one of the three keys of machine learning paradigms, whether supervised learning and unsupervised learning for tackling the network elements [10]. The RL can be written by many programming language, the authors tend to formulate the desired (biCN) by MATLAB and discuss their relations which consists of: the operator (O; assigning) learns and trains to accomplish a goal by interacting with the environment through selective policy (π; Function steps) that manage the back propagation for the operator’s series of actions (

a_{t}

; prefered assigning index. An alleged policy is a function that when applied to a function by deep neural networks, maps every action of an operator that takes to the anticipated result or reward. The Actions Space is limited or continuous, with a combination of all appropriate actions in a certain group that allows different actions and often describe the actions space via the number of moves or the sequence of states (i.e., the policy that guide trajectory

τ

; Electric vehicle) available to the operator and are represented by real-valued vectors. An operator (O) uses a rule known as a policy to determine what actions to be selected, which may be deterministic, in which case the alluded symbol

μ

is typically used, or stochastic, usually denoted by π. Therefore, Figure 1 discusses the state, actions, parameters architecture distribution relationships based on the following relation expressed by

a_{t} = μ (s_{t}) \to O_{t}, a_{t} ~ π_{θ} (a_{t} | O_{t})

(hint, denotes “sampling” from the stochastic process)

and π_{θ} (a_{t} | s_{t})

,

where

the policy of fully observed is based on some actions using the states to make the observation like these actions. A state

s_{t}

is an exhaustive account of how things are in the world. As a result, every state had a place in the ecosystem under study. While the observation

O_{t}

just partially describes this state and may leave out important details. The operator could be able to see a state completely or in part, though. If only a portion, the operator creates an internal state (or state estimation). In order to observe states mechanism, the deep RL approach always uses a genuine array, vector, or polynomial tensor to observe it. The states are the significant features (parameterized) that affect the objective achievement in minimum time and errors, which are called parameterized policies and depend on a set of parameters θ as aforementioned. State transitions are dependent exclusively on recent events and describe what changes occur in the environment timely

(s_{t})

and the state at time

(s_{t + 1})

, where if it determinstic

f (s_{t}, a_{t})

or stochastic

P (. | s_{t}, a_{t})

and create the valid actions by the operator according to its policy. The rewards function R which pull the solution in specific paths (i.e., Trajectories) is critically important in searching the route by giving back a reward

r_{t} = R (s_{t}, a_{t}, s_{t + 1})

, which can be simplified to

r_{t} = R (s_{t}) | R (s_{t}, a_{t})

i.e., the value for one movement in the propagation for the state is representative usually as a real +ve|−ve number = the cumulative

r_{t}

for a whole trajectory τ at the end) and can be expressed as follow:

R (τ) = \sum_{t = 0}^{T - 1} r_{t} = \sum_{t = 0}^{T - 1} R (s_{t}, a_{t}, s_{t + 1})

(1)

The selection policy frequently use discount benefits γ (0,1) received in the candidate trajectory as expressed as

R (τ) = \sum_{t = 0}^{\infty} γ^{t} \cdot r_{t} = γ^{0} \cdot r_{0} + γ^{1} \cdot r_{1} + γ^{2} \cdot r_{2} + \dots

.

Nevertheless, the operator’s ultimate objective is to choose an ideal policy of action that optimizes the due reward (overall trajectories) and define the probability distributions by assuming stochastic policy and environmental changes that can be expressed as:

P (τ | π) = ρ_{0} (s_{0}) \prod_{t = 0}^{T - 1} P (s_{t + 1} | s_{t}, a_{t}) \cdot π (a_{t} | s_{t})

(2)

(

ρ_{0}

initial state probability) and

P (s_{t + 1} | s_{t}, a_{t}) is

state transition probability of environment [11,12]

π (a_{t} | s_{t})

is action probability of agent. The expected

(r_{t})

can be denoted by J (π) that represent the charging stations’ actions

δ_{i}^{a}

that can be expressed by

J (π) = \int_{τ}^{n} P (τ | π) \cdot R (τ) = E_{τ ~ π} [R (τ)]

(3)

and from there in the illustrative case study in Section 3, the primary optimization issue in RL may be articulated as alternative

π^{*} = {a r g m a x |}_{π} J (π)

, where

π^{*}

being the optimal scheduling policy that emphasizes the importance of function value (i.e., selection criteria index discussed in Equation (8)) for a the pair state-action (

Q (s_{t}, a_{t})

), which represent the EVs’ requests and the stations’ invitations. This issue has several origin of uncertainty because interconnections across different operators, such as EVs,

δ_{i}

, the grid of electric power, connectivity networks (biCN), and the electricity supplier. The applications of deep RL in this field, focuses on the operational and demand response control between the pair Q above and their energy consumption management [13] with regard to the electric power grid [14]. Therefore, the researchers classified the policies to four functions to reach the destination through minimum route and consuming less time. The first function indexing value (+ve) is called Action-value that provides the anticipated result if starting from the provided state (

s_{t}^{τ}

), and performing any action (

a_{t}

) forever according to policy (π) and delivers the return (

r_{t}

) if starting from the given state (

s_{t}^{τ}

) and repeated along policy π directly, while the function without action is named negative policy value. The first function can be expressed by the

Q^{π} (s_{t}, a_{t}) = E_{τ ~ π} [R (τ) | s_{0} = s_{t}, a_{0} = a_{t}]

and the second is expressed by

V^{π} (s) = E_{τ ~ π} [R (τ) | s_{0} = s]

. The second, if the function followed optimal policy whether on value or on action value can be expressed by

V^{*} (s)

and

Q^{*} (s, a)

where both act according to optimal policy

π^{*}

at Equations (4) and (5), and the bellman style as in Equations (4a) and (5a).

V^{*} (s) = \max_{π} E_{τ ~ π} [R (τ) | s_{0} = s]

(4)

V^{*} (s) = \max_{a} E_{s^{'} ~ P} [r (s, a) + γ \cdot V^{*} (s^{'})]

(4a)

Q^{*} (s, a) = \max_{π} E_{τ ~ π} [R (τ) | s_{0} = s, a_{0} = a]

(5)

Q^{*} (s, a) = E_{s^{'} ~ P} [r (s, a) + γ \cdot \max_{a^{'}} Q^{*} (s^{'}, a^{'})]

(5a)

Its fundamental tenet is that “the reward which is expected to gain from being there in a specific position, plus the value of wherever will arrive next,” sums up the worth of the beginning state. The authors note that the connection between

V^{π o r *}

and

Q^{π o r *}

is as follows,

V^{π} (s) = E_{a ~ π} [Q^{π} (s, a)]

and the

V^{*} (s) = \max_{a} Q^{*} (s, a)

and special self-consistency equations called Bellman equations are obeyed by the discussed functions according to relative action

a_{t}

. Therefore, the proportionate benefit of such action is expressed by:

A^{π} (s, a) = Q^{π} (s, a) - V^{π} (s)

(6)

The taxonomy of reinforcement learning illustrates several algorithms that show the position of the proposed methodology as in Figure 2.

Figure 2 shows that when the operator has access to an environment’s model through forecasting state transitions via specific functions and gains rewards [15], it aims to learn an optimal policy as described by the model-based RL algorithms without actually acting out the policy aims to assigning some of the movement states that request some of the actions Q (

s_{t}^{τ}, a_{i}^{δ}

) [16]. According to Zhang H. et al. the AlphaZero is interested in chess pieces’ movement issues, which are discussed in the Google DeepMind project (i.e., the predecessor played go to enhance their expectation which considered a scheduling issue) [17]. The lack of scheduling in this issue demands instantaneous request behavior of the mayday case and consumes long time to generate solution. Therefore, the scheduling time is critical indicator [18]. With more EVs on the road, it will be harder to locate connections. Although services such as ChargePoint or ChargeHub (e.g., cloud) give EVs drivers actual-time information guide about available

δ_{i}

stations, the capability to reserve connectors for a later time has not yet been implemented [19]. Therefore, using the autonomous internet of things (AIoT) enhances this drawback. The stations were selected according to many models, which helps in reducing the route such as Zero-only action is unlike prior iterations of AlphaGo actions [20,21], whether Go or shogi rules were not initially known to the neural network training. The AI used reinforcement learning to practice playing against itself (i.e., attempting to understand oneself) until it could foresee its own movements and how they would impact the result of the game, in contrast AlphaGo involves slow training and consumes a long time to reach the same level [22]. In the free models in the left side of Figure 2, the operators often do not have access to a ground-truth representation of the environment’s model, which is the fundamental disadvantage and drawback of these algorithms, then need experience through behaving erratically in public environment. Therefore, the operators need learning to catch optimization for selected route or use called Q-learning. The functions of the free-model are enhanced directly by optimizing the parameters (

θ

) by gradient ascent toward

J (π_{θ})

, while Q-learning focuses on the value of the functions and their policies

π_{θ} (a | s)

that are managed by the neural network rules as discussed in Equations (6) and (7).

V^{π} (s) = E_{a ~ π, s^{'} ~ P} [r (s, a) + γ \cdot V^{π} (s^{'})]

(6a)

Q^{π} (s, a) = E_{s^{'} ~ P} [r (s, a) + γ \cdot E_{a^{'} ~ π} [Q^{π} (s^{'}, a^{'})]]

(7)

Moreover, described implementations of the deep RL technique to deal with a variety of new issues, including management rate of aggregated data, convert the data to arguments, flexible dynamical access for the network, bidirectional wireless catching in actual time as illustrated in the right side of Figure 3, network security, etc., and regarding communication and networking. The Lei, L. et al. discuss also the managing of the DRL environment that consists of three layers (e.g., the perception, the network, and the application) by the autonomous IoT (AIoT) systems, which are the next generation of IoT systems [23]. The main advantage of these models (i.e., model-Free RL) [24] are principled techniques for policy optimization that lead to goals directly, which are more trustworthy and stable in deducing the results. Therefore, the authors integrated the Q-Learning and use the Policy-Optimization branch in this taxonomy for finding their proposed methodology. The proposed methodology relies on the Policy Gradient (e.g., Stochastic or Deterministic evaluated by Monte-Carlo or Actor-Critic have multi agents) [25] and calculates

Q_{θ} (s, a)

that focuses on increasing the performance of function indexing used by minimizing error when tackled as Bellman equation

Q^{*} (s, a)

(gap among destination and the gaind output position) and performs gradient ascent to directly maximize performance when following Equation (8) in Section 3. If the policy optimization is proximal through maximizing a surrogate, objective function may be called PPO (proximal policy optimization). This integration aims to reduce the failure in mechanism behavior of DQN (deep Q-learning network) when obtaining the results to reach

Q^{*}

as trying to enhance C51 method. The hybridization occurs when DDPG is implemented. The main objective is obtaining the shortest route between the selected beginning position and nearest destination (i.e., Electric charging station), which helps save time in service. Therefore, the method suggested by the Lee et al. and Ahmed M. Abed supports the paper objective and reduces the total trip time while accounting for the dynamic nature of traffic situations and unforeseen future charging requirements [26,27]. It returns the optimum route (

r_{t}

) and charging stations, which is considered as a challenging issue. The operator’s ultimate objective is to create a policy that yields the highest reward over an extended period of time [28]. For the policy EV to charge properly, it is suggested to use RL to learn about generated government energy usage, which decreases energy consumption costs as well as encourages Evs’ marketing [29,30]. Therefore, many researchers discuss ways to increasing the power grid utilization to reach a steady state of load and to trying to prevent electric potential fluctuations reductions by using RL approaches [31,32,33]. The famous model for selecting the shortest route between source position and destination is the Markov model, which has been mimicked by the proposed methodology, that schedule the assignment of states (EVs) to the preferred charging stations (

δ_{π}

). Therefore, Wang, K. et al. and Wan, Y. et al. suggest to tackling the intended scheduling such as (MDP) because of its random behavior especially for the traffic parameter [12,34], while EVs arrival time in an uncertain case is based on the recommendations for using DRL [30] to solve the shortcomings of the conventional model-based methods branch that require a smart model to be efficient, where delivering an EV charging approach is scalable and adaptable [35]. A model-free real-time scheduling as illustrated in Figure 2 of electric car charging relies on DRL [36], which is considered an uncertain transition probability and is developed. The performance of proposed policy relies on forecasting method, predicting the readiness of

δ_{i}

by the EV using an LSTM neural network to create an intelligent scheduling [37], which present promising results. Therefore, the authors verify those results [38], after mimicking their function formulation recommendations to adjust the operators and mimic the methodology.

2. The Policy Formulation and Research Contribution

The majority of the current research relies on inaccurate traditional econometric or time series approaches that have uncertain behavior, which motivate the authors to study the feasibility of EVs stations and their usability management via tuple parameters (state, action, transaction function, reward). The review guides to use RL with motivated searching mechanism that preferred to be a heuristic to expand the solution space [39,40]. Therefore, adopted policy, a rewarding argument is the gained value from the transaction function

\sum_{t = 1}^{n} J {(π)}_{θ}

through sequences of decisions (i.e., Markov principles) enhanced by a metaheuristic technique. The possibility of obliteration and inflation makes the value of the future less valuable than a reward now,

r_{t \to 0} \geq r_{t \to n}

. The movement of state (EV) relies on the available locations

J {(π)}_{t}

in the backlogs-list (e.g., the layout map) having two cluster directions, unidirectional or bidirectional. The gained reward

r_{t}

tends to support policy

π_{A}

or policy

π_{B},

when

J (π)

s was not on the proposed routes. The charter of the work is illustrated in Table 1. To improve the searching mechanism of various computations is among the significant parameters to find the best hybridization to meet optimization processes based on fifth different metaheuristic algorithms [41]. The first is the Scatter search, made famous by the Tabu search, and is hybridized with the immune algorithm. The second is the differential evolution algorithm (DE), which is called evolutionary computation because its behavior relies on iteratively trying to improve the candidate’s solution and solve the complex parameter optimization problem to optimize profitability regardless of time. Therefore, the researchers hybridize the Grey Wolf Optimizer (GWO) with the immune algorithm to minimize the total service time, [42] as a third trial. On the other hand, the fourth hybridization is done between the Sine–cosine and Whale Optimization Algorithm, which creates initial random agents and requires them to fluctuate outwards or toward the best possible solution based on sine and cosine functions [43,44,45,46]. Finally, the process parameters were optimized using reinforcement learning and a Bayesian normalized NN utilizing the beetle antennae search (BASNNC) algorithm, which outperformed the previously discussed processes [38]. Therefore, the proposed methodology selects the branch of “need-based,” as illustrated in Figure 2 above, which matched with understanding the behavior of the tackling parameters through servicing (i.e., EVs charging operation) mainly if multiple constraints are considered. Therefore, this work tries to enhance this mechanism to find a preferred solution in the minimum time and reduce the service time under some of the abovementioned considerations. The proposed design consists of two recruitment networks; a deep Q network represent the relation between the state and actions Q (

s_{t}^{τ}, a_{t}^{i}

) for the best action-value function indexing discriminative features for the idle time. The suggested system does not need to know in advance variables such as arrival and departure times or power usage, in contrast to optimization-based techniques that can be managed by the autonomous clouding technique, because the neural network can estimate the right choice based on the present parameters (

θ_{i}

). The paper discusses the reinforcement learning mechanism for optimization algorithms in the review section and constructs the formulation for the problem that targets reducing the total service time for EVs in the suitable station taking into account the minimum route distance to pair them. Section 2 provides the paper charter that expounds the evaluation indicator, which highlights the contribution as presented in Section 3. Section 4 discusses how to analyze the DQN via fuzzy scheduling iterations to pick good distributions of EVs around the stations on the road by the proposed metaheuristic methodology. Finally, the results are discussed in Section 5. In the end, the conclusions are presented and the drawbacks are discussed, as well as future work directions are envisaged. The conclusion also presents the comparative results with outputs mimicked.

3. Contribution of This Study

The contribution is based on the inquiry about the advantages of using hybrid approaches to be used to maximize the research functions in uncertain networks through elective policy systems. This policy aims to determine minimum route length between the EVs and candidate stations and achieve the related objective which is minimum service time.

A new routing metaheuristic approach is suggested in order to save expenses, time, and eliminate hardship of travel.
Analyzing how suggested stations affect the effectiveness of routing heuristics.

λ_{t}^{Q}

and

λ_{t}^{V}

are specified as the target values of Q-functions and value functions, respectively, to derive the loss functions. When using value-based approaches, the regression loss may be used to assess how successfully NN mimics Q functions or value functions.

The Environment Layout Map Description

The proposed policy seeks for selecting the shortest route length between main successive EVs requests and their expected service time. There are five sequential actions discussed through Markov to identify the shortest route in lowest time, when modeling the RL (travelling state

(s_{t})

to a desired place, looking up pricing information, serviceable actions

(a_{t})

, setting up if necessary, and making other quick decisions). The effectiveness and execution of any subsequent actions policy have four tactical choices affecting the result [48]. The analyzed environment (i.e., layout design and its chosen locations) sketched as a rectangular-like illustration is presented in Figure 3.

Shape of Charger Station	Rectangular
Number of stations	$2 \leq J_{π} \leq 2100$
Main starting position	Bottom, Right, Up
Street span	As safety instructed
Policy of Service	$\sum_{t = 0}^{n} \sum_{a = 1}^{m} π_{t}^{a}$
Proposed system	Dynamic and stochastic

In assignment policy, priority facility movement and routing policy are considered the basis for the suggested methodology. The layout design is stressed as an essential component impacting the process performance and routing distance in the proposed partitioning of the map according to the studied characteristics, where the policy consuming time to complete assigning n (locations) to

δ_{i}

indicated equals the number of desired selected locations. The rear and front walls of the borders are close by and parallel to the two end streets. Each place on the layout has a definition of each symbol, which is also represented in Table 2. Numerous potential stations are suggested on the right or left of the roadway in the proposed layout plan, which was created using Google Maps and is framed in a fixed two-dimensional rectangular interior [19], as shown in Figure 4. Through “Zap Map,” a website displays charging stations accessible in several European nations and a wealth of beneficial information to assist the EVs drivers charge the vehicle quickly and securely via creating most preferred scheduling map according to all requests done in the same time road, to locate the finest station for your needs, such as minimum charging time, waiting time, and closer to the main flight route.

The charging stations

δ_{i} (π)

is ascending where

i \in {1, 2, \dots, m}

is

(m - 1) N + 1

, the second station may be available as

δ_{2}

is

(J - 1) N + 3

, …,

(J - 1) N + i,

and so on. The distance between the two stations

δ_{i}

and

δ_{i + 1}

must be the shortest and calculated by Equation (8).

λ_{i} = m i n \sum_{j = 1}^{n} [∆ l (2 N - δ i_{x} - δ i_{y}) + 2 ∆ B + | δ_{x} - δ_{y} | ∆ A + 2 ∆ D, ∆ l (δ i_{x} + δ i_{y} - 2) + 2 ∆ F + | δ_{x} - δ_{y} | ∆ A

(8)

where

δ i_{x} = [\frac{x}{m}], \dots i_{x} = x - [\frac{x}{m}] N, \dots δ j_{y} = [\frac{y}{m}], \dots i_{y} = y - [\frac{y}{m}] N

.

At the start of each period

(t)

when requested, the drivers must determine the number of serviceable stations

(δ)

and if they are serviceable.

The originality of the suggested method in creating recruitment datasets needed to train the algorithm via simulator because of dearth of datasets [27,32,49]. The sequential stages of proposed methodology characteristics are illustrated in Figure 5.

4. The Methodology Description

The forward propagation of aggregated data inputs and the reverse propagation of errors are two bidirectional processes that make up the classic gradient-based BP neural network learning process. Training is repeated, and deviations and network weight changes are continually computed in the direction of the relative error function gradient. It takes time to get closer to the objective.

Traditional gradient-based BP neural networks do, however, have certain intrinsic drawbacks, such as delayed convergence and a propensity toward local optimums. Researchers often experiment with different activation functions, change the network layout, and enhance weight-definite approaches. The proposed methodology works in case of many requested stations at the same time. The aim is selecting the closest five EVs to specific station according to distance on the main road trip and their requested service. The analysis of the procedures requested to reduce the total idle time of EVs and increase the utilization of the charging stations, which is calculated by the OEE indicator. Since there are an infinite number of possible scenarios, it is impossible to keep the ideal answer for each one of them in a cloud database. This issue served as the impetus for us to create the AIoT-DQN algorithm, which employs a function to identify the optimal course of action in each situation and is controlled by the autonomous Internet of Things AIoT response [39,50], which suggests the weight-and-structure-determination (WASD) algorithms, i.e., technique of employing linearly independent or orthogonal polynomials as activation functions, and several other ways. The relative gradient descent technique used by the heuristic random search algorithm gives it a powerful global search capability integrating the APSO method with a neural network to improve network parameters [40,51]. The proposed model is based on hybridizing with three model-free rules to gain an advantage to tailor some of the heuristic steps empowering the searching mechanism, such as the DQN, and DDPG, and policy gradient as illustrated in Figure 2. The proposed AIoT-DQN methodology was written in MATLAB to tackle the drawbacks in native DQN through two sequential phases [52]. The first phase is interested in building the network for mimicking the environment illustrated in Figure 5 when selecting the EVs and serviceable time at available stations via specific policies to get the minimum route between the car’s position and destination [53]. The second phase is an interest in cost analysis. The different zones of any charging request procedures shown in left side of Figure 5 represented by (

J i

) have a sequence of actions (

a_{i}

). Zones (a, b, and d) are the BVA actions (i.e., idle time -service time) that are represented by (select many available stations, inspection procedures before charging, average setup time procedures, arrival distance, and waiting time). At the same time, zone (c) is the VA activity (i.e., service time) which must be scheduled in a minimum period. These zones were deduced from previous studies. Because of its large solution space, the authors resort to smart scheduling solution managed by the AIoT. If there are

N

of EVs requests and

m

stations

δ_{i}

, the total number of potentially viable solution is equal to

{(N!)}^{m}

. After analyzing all of the feasible alternatives, an optimal solution for a certain performance metric may be identified among these possibilities. For instance, if there are five EVs making a request at the same time, the Q (

s_{t}^{5}, a_{t}^{5}

) =

2.48 \times 10^{8}

alternatives, which consumes long computational time.

The proposed smart scheduling helps the EV drivers to plan their trip by selecting suitable stations on their roads and desired charging levels. These stations are not conditioned for full charging but charging to a particular level which reduces the queue time, and the remaining charge can be compensated on the road from the scheduled stations. The drivers can accept this plan or cancel to select another station for another charging policy (i.e., dataset’s list). The proposed methodology is identified through many parameters and decision variables as shown in Table 3.

4.1. Phase I: The Smart Fuzzy Scheduling Formulation, Fecrease EVs Service Time

Many authors met to integrate the heuristic steps to enhance their founding solution with the minimum errors. A back propagation neural network based on particle swarm optimization was suggested after they investigated how to carefully choose input parameters to obtain desired outcomes [54]. The conjugate gradient approach [55], the least squares method, and better methods based on numerical optimization exist in addition to those mentioned above. Therefore, the heuristic step is an important approach that must be used in enhancing the solution. The smart fuzzy scheduling for DQN managed by autonomous Internet of Things (AIoT) depends on applying the proposed fuzzy metaheuristic steps expressed in Table 4 called the decrease EVs service time (

d E V s S_{t}

), and arranges the output index [

α

] and relative index [

α_{r}

] in descending order as expressed in Equation (8), then constructs the Gantt chart in its first construction. This schedule presents more effective service-span path than some of the other published rules in the same context. Equation (9) discusses all possible solutions for assigning different EVs in a pair aspect in sequence. The first step computes the result of this rule and determines its priority (priority index). The second step arranges these indexes in descending order for groups of stations that aim to reduce waiting time. If two EVs follow in the arrangement loaded by one station, then test the best starting EV that saves time. This testing is the third step. After that, it reschedules EVs for every station alone in its waiting time by sliding EVs to finish two processes simultaneously, still rescheduling till it stops reducing the total idle time. Optimality is achieved when rescheduling the EVs by the same route with the next assumption. The acronym used in the proposed equation, (

d E V s S_{t}

), this rule is compared with Lee et al. and other published rules; this model is effective in most examples. The formula relies on six parameters as:

ES _finalEVs	Earliest start request time of final EVs estimated for certain station.
ET_finalEVs	Executing request time of final EVs.
ES _firstlEVs	Earliest start request time to first EVs estimated assigned to a station.
ES _{predecessorEVs}	Earliest start request time of predecessor of first EVs estimated time.
ET_{predecessorEVs}	Executing request time for EVs selected.
ET_firstlEVs	Executing request time for first EVs estimated chosen time.

{(d E V s S_{t})}_{α} = \frac{{ES}_{finalEVs} + {ET}_{finalEVs} - {ES}_{firstlEVs}}{{ES}_{predecessorEVs} + {ET}_{predecessorEVs} + {ET}_{firstlEVs}}

(9)

The methodology has pseudo-code composed of ten sequential steps as follow:

4.2. Phase II: Cost Analysis Formulation

According to actions distribution shown in Figure 6, the priority for requests to be sequenced to

δ_{i}

is its servicing time, where the shortest one for a request is finishing the request service earlier. One drawback is that the EVs request that picked the longest service time will be serviced last in scheduling, even though it has a high priority. Therefore, find the

C_{m a x} (π)

expressed in Equation (10) i.e., the make span index of the processing actions to choose the serviced EVs.

C_{m a x} (π) = \max_{0 \leq t_{1} \leq t_{2} \dots \leq t_{m - 1} \leq N} (\sum_{j = 0}^{t 1} p_{π} (a_{j}) 1 + \sum_{j = t_{1}}^{t 2} p_{π} (a_{j}) 2 + \dots + \sum_{j = t_{m - 1}}^{N} p_{π} (a_{j}) m)

(10)

The route length under cost consideration of specific service span generated by policy function can be expressed by Equation (10), where the cost is according to branch

(π_{A})

but not to the proposed trip route, plus to all

δ_{i, j}

that are serviceable, or if one of them are disposed due to time consideration.

λ_{i}^{d} = \min \sum_{t = 1}^{T} [h^{π A} (I_{0}^{S} + \sum_{i = 1}^{n} (r_{i} - ∆ B (C_{i}^{π A} - C_{i}^{π B}))) + I_{t}^{S} + C_{i}^{π A} λ_{i} S_{t}^{π A} + C_{i}^{π B} λ_{i} S_{t}^{π B} + C^{d π B} X_{t}^{D}] + h^{π A} 𝛽_{t} 𝜏_{t}^{r} + T h^{π B} I_{0}^{R} + \sum_{i = 1}^{t} (T - t + 1) {\bar{r}}_{t}^{A B}

(11)

S_{t}^{π A} = S_{0}^{π A} + \sum_{k = 1}^{t - 1} (d_{k} + {\bar{r}}_{t}^{A B}) S_{t k}^{π A}, \dots \forall 1 < t < T

(12)

S_{t}^{π B} = S_{0}^{π B} + \sum_{k = 1}^{t - 1} (d_{k} + {\bar{r}}_{t}^{A B}) S_{t k}^{π B}, \dots \forall 1 < t < T

(13)

X_{t}^{D} = X_{0}^{D} + \sum_{k = 1}^{t - 1} (d_{k} + {\bar{r}}_{t}^{A B}) X_{t k}^{D}, \dots \forall 1 < t < T

(14)

I_{t}^{S} = I_{t 0}^{S} + \sum_{k = 1}^{t - 1} (d_{k} + {\bar{r}}_{t}^{A B}) I_{t k}^{π b}, \dots \forall 1 < t < T

(15)

Subject to:

I_{t}^{S} \leq h^{π A} [I_{0}^{S} + \sum_{i = 1}^{t} (S_{i}^{π A} + S_{i}^{π B} - {\bar{d}}_{i}) + q_{t} τ_{t}^{d} + \sum_{i = 1}^{t} ϑ_{i t}] \dots 1 \leq t \leq T, d \in λ_{i}^{d}

(15a)

I_{t}^{S} \leq b [I_{0}^{S} + \sum_{i = 1}^{t} (S_{i}^{π A} + S_{i}^{π B} - {\bar{d}}_{i}) + q_{t} τ_{t}^{d} + \sum_{i = 1}^{t} ϑ_{i t}] \dots 1 \leq t \leq T, d \in λ_{i}^{d}

(15b)

S I_{i} = - \sum_{δ = 1}^{m} \frac{[δ - (2 J - 1)] t_{i j}}{2}

(16)

where

S_{t}^{π A}, S_{t}^{π B}, X_{t}^{D} > 0

and

S I_{i} \geq S I_{i + 1} \geq S I_{i + 2} \dots S I_{N}

.

For each potential realization of

δ

, Constraints (12) and (16) maintain track till the end of period

t

or return to station at the end of period

t

. The block of

\sum_{i = 1}^{t} (S_{i}^{π A} + S_{i}^{π B} - {\bar{d}}_{i}) = τ_{t}^{d} / I_{0}^{R | S}

represents the capacity of the requested service at the end of the schedule time

t

, which follows Exponential and Weibull behavior throughout the day from 5 a.m.: 11 a.m. and from 5 p.m.: 11 p.m. respectively. Constraint (13) generates a priority index SI as Palmer’s algorithm, i.e., the job ordering. The policy mechanism shown in Equations (11)–(16) should be reformulated to be a traceable model according to index

α

value, and the suggested cost analysis should be used to regulate the uncertainty behavior of the right-hand sides of the variables

S_{t}^{π A}, S_{t}^{π B}, X_{t}^{D}

,

I_{t}^{S},

as discussed in Equation (17).

α_{0} + \sum_{t = 1}^{T} (α_{t} a_{t} + γ_{t} {\hat{a}}_{t}) \leq 0 \forall - γ_{t} \leq a_{t} \leq γ_{t}, a n d 1 \leq t < T

(17)

Some researchers reported that ANN, but due to the trajectory, tended to become entrapped in a local optimum, therefore is not used for training alone. This is the main reason that some researchers show that hybrid approaches perform better than the traditional ANN [28,56]. This paper observed the Zap map for a long time and generated 120 random examples for requests from three to six EVs simultaneously at a specific station as selected according to Equations (9)–(16). The proposed methodology explains an illustrative example of serving six actions for six different EVs by tackling the DQN network to serve the EVs in minimum time. The proposed methodology focuses on significant parameters illustrated in Figure 7 and Figure 8 and requested actions according to their policy

π

that guides the driver to the shortest path to the station. The AIoT tackles this hybridization to enhance the efficiency of (biCN) or the bi-directional relationship between the EVs and available stations. While BASNNC search algorithm was used to optimize the process parameters only via reinforcement learning and were superior to the previous mechanisms aforementioned [38]. Therefore, the illustrative methodology examples are compared by them to the study’s efficiency via the OEE indicator.

The time of obtaining results for examples of some EVs that need some of the stations’ actions to be served ranges between 60 and 488 s according to its complexity. The analysis of the results as discussed in the next section of the paper show that the significant factors affect the response of reducing total service time as illustrated in Figure 7 and Figure 8 above, where the actions such as charging point dev., inspection time, and the Battery size must be controlled to manage the average service span-time illustrates in Figure 9. The recharge service RcS problem is modelled in this work and mimicked as Markov movement steps with uncertain transition probability. A deep Q (

s_{t}^{n}, a_{t}^{τ}

) network with function ranking as discussed in Equation (8) approximation has been utilized to find the optimum EVCS selection policy.

The fuzzy group

F G_{i} = {α_{a 1 \to a 3} = J {(E V s)}_{1} = 6.25, α_{a 4 \to a 2} = 3.8, α_{a 4 \to a 6} = 2}

set is arranged in a descending order, this means the

a_{1 ~} J {(E V s)}_{1}

precedes

a_{4 ~} J {(E V s)}_{4} \approx J_{4}

, and

a_{4}

precedes

a_{6 ~} J {(E V s)}_{6} \approx J_{6}, a_{2 ~} J {(E V s)}_{2} \approx J_{2} .

On the other hand, the remaining actions have been tested in

F G_{j}

which are arranged in ascending order, where

α_{a 3 \to a 6} = 0.3142

,

α_{a 6 \to a 3} = 0.44

. Therefore,

a_{6 ~} J {(E V s)}_{6} \approx J_{6}

precedes

a_{3 ~} J {(E V s)}_{3} \approx J_{3}

and so on as shown in Table 5, which expounds 180 direct potential relations and 6! indirect relations.

{1.0/1}	{0.9/4, 1.0/5}	{1.0/2, 0.9/3}
{1.0/2, 0.2/4}	{1.0/5}	{0.7/2, 1.0/3}
{0.5/4, 1.0/5}	{1.0/7}	{1.0/6}
{1.0/4}	{1.0/9}	{1.0/3, 0.9/4}
{1.0/5, 0.9/6}	{1.0/2, 0.8/3}	{1.0/4}
Service after added $λ_{1}^{d}$	Service after added $λ_{2}^{d}$	Service after added $λ_{3}^{d}$
{1.0/1}	{0.9/5, 1.0/6}	{0.9/7, 1.0/8, 0.9/9}
{1.0/3, 0.2/5}	{0.9/10, 1.0/11}	{0.7/12, 0.9/13, 1.0/14}
{0.5/7, 1.0/8, 0.2/9, 0.2/10}	{0.9/15, 1.0/16}	{0.9/21, 1.0/22}
{0.5/11, 1.0/12, 0.2/13, 0.2/14}	{0.9/22, 1.0/23}	{0.9/25, 1.0/26, 0.9/27}
{0.5/6, 1.0/17, 0.9/18, 0.2/19, 0.2/20}	{0.9/24, 1.0/25, 0.8/26}	{0.2/29, 1.0/30, 0.9/31}

Therefore, requests more than N of EVs are a complex problem and needs to be programed on a suitable mobile application. The fuzzy intervals for expected service time according to requested actions (e.g., inspection time) by the EVs according to the bidirectional relationship between the stations and EVs on the road for the illustrative case study are discussed in Table 5 with aggregated data, which constructs the deep Q network illustrated in Figure 11, and other actions also have its intervals.

5. Results Analysis

In this work, we use data aggregation to train the reinforcement learning of the suggested approach using counting of EV charging data from the city of Dundee’s open data site, which presents statistics expound various EV requests [53,57]. Each charging request provides details about the chargers’ monikers, start and end times, energy consumption, power output, and physical locations, as well as the shortest distance, reaching time, and idle time required to accomplish the charging operation and the various six actions. Every studied station has three distinct types of chargers: chargers (7 kW, slow), (22 kW, fast), and (>43 kW, rapid) [49,57]. In the present study, a four-months observed requests dataset aggregating across (127 days) is used to produce the descriptive statistics. The excluded outliers, or 0.83%, are the charging times at rapid type that differ by

\pm 3 σ

from the median (the total number of quick charger charging requests used in this investigation is 4645). According to Table 6, the rapid chargers’ standard deviation is 24.08 min, whereas their typical charging time is 516.5 min. There are five main factors that affect the charging sequence acceleration of EVs such as battery size (the larger the battery capacity, the longer it takes to charge), battery status (empty vs. full, or when it is half full), high vehicle charging rate, high charging rate point, and weather (charging time tends to be longer at lower temperatures, especially when using a fast charger). Moreover, EVs are less efficient at lower temperatures. So, too much travel distance cannot be added according to the charging time. The descriptive statistics of EVs charging requests used in the training study region are described as shown in the Table 6 for a long four months. The preferred scheduling obtained from (

dEVs S_{t}

) after 59 iterations in 1-EVs size to 9116 iterations in 6-EVs size to advice the EV drivers and stations

δ_{i}

by the shared beneficial interests paired together by managing of the AIoT to the deep Q network illustrated in Figure 10 according to the utilization% illustrated in the Figure 11. In case of assigning more than 150 EVs in the same time for specific station which generate non available time. Moreover, if the problem size increases more than 559, EVs will not get any scheduling solution by (BASNNC) algorithm, while the (

dEVs S_{t}

) generates solution in time over 5.7 min, while BASNNC fails over 1374 EVs. The absolute average for ideal EVs service time is (644.58/52) = 10.73 h/EVs. The scheduling of the different actions for requested charging orders according to Bayesian regularized BASNNC search algorithm illustrated in Figure 12 to calculate the average time of charging six EVs per hours i.e., 55 h/Six EVs (9.16 h/EV). While the proposed scheduling according to AIoT if integrated in managing the DQN reduces the idle time of the stations and the waiting time of the EVs to gain their requests actions as illustrated in Figure 13 to average 41.25 h/Six EVs (6.875 h/EV). Both are affected by the behavior of the index

τ_{t}^{d} / I_{0}^{R | S}

throughout the day every 10 min. The average of

τ_{t}^{d} = 35

and

I_{0}^{R | S} = 12

at period 5 a.m.: 11 p.m., and the

τ_{t}^{d} = 45

and

I_{0}^{R | S} = 8

at the period 5 p.m.: 11 p.m. for four months. The tracing of the EVs’ requests behavior reveals that the following exponential distribution in the morning changes to Weibull in the afternoon along the studying interval, which presents the lowest error in servicing time expectation as illustrated in Figure 14 and Figure 15.

The part of results of the 2145 generated hypothetical examples dealing with the EVs requests or stations invitations through (biCN) managed by the AIoT for four months from all EVs to execute six potential actions for specific stations checks the condition of the shortest route between EVs’ requests and the stations on the main three roads stored in dataset at the downtown are listed in the Table 7.

MATLAB is used to predict the total service time to proposed methodology and BASNNC. The aggregated data are classified into four groups according to the problem size, the first is only three EVs up to six EVs and executed all main actions at assigned stations. The average ideal time approximates 9.167 h, while the proposed discuss average approximates 8.562 h, while BASNNC presents average 10.42 h OEE indicates that (

dEVs S_{t}

) methodology has 72.4% for generated 2145 (i.e., 43 × 50) hypothetical examples and 59% for other algorithm. The worst results of the hypothetical examples shown in Table 7 according to the proposed methodology (

dEVs S_{t}

) have been chosen and tackled by using grey wolf optimization (GWO) and extracting other four groups (

F G_{31} : F G_{34}

) for resolving their scheduling and obtaining the solutions shown in Table 8. The worst results of the hypothetical examples are shown in Table 8 and extract other three examples having large scale (more than eight EVs) according to the proposed algorithm (

dEVs S_{t}

) have been chosen again and tackled by using the third metaheuristic optimization named (Sine-cosine & Whale; SCW) for resolving their scheduling to check its efficiency [58] and obtain the solutions shown in Table 9. The average service time of proposed (

dEVs S_{t}

) if compared by the GWO is superior by only 4%.

Table 7, Table 8 and Table 9 indicate the test of the average service time of Q (s, a) groups, consisting of 50 hypothetical examples that have the same number of EVs and requested actions with different arrangements, and wait for the solution to be extracted after 60 s or less after running the methodology by the MATLAB code. The authors noticed the failure of SCW in obtaining solutions in time and for a long time, reaching 26 min for all these examples that have more than seventeen EVs’ requests simultaneously. While the solutions of GWO and SCW are close to the size of the problems, they are less than six EVs/point charger/station. The proposed methodology is superior to SCW, GWO, and over six EVs up to eight. The average service time of proposed (

dEVs S_{t}

) if compared by the GWO is superior by only 15%.

6. Conclusions

Many authors tried to integrate the heuristic steps to enhance their founding solution with the minimum errors [39,54,55]. This work tries to modify the DQN searching mechanism for tackling the uncertain EVs requests and the electric charging stations’ invitations to find the preferred assignment in the minimum time reliving it by guiding it to the closest electrical charging station. This bidirectional connectivity is managed by the AIoT, which tries to achieve two objectives. The first is to reduce its scheduling of service time and must be on the trip’s road through analyzing the route distance length. Therefore, the proposed methodology consists of two sequential phases, the first aims to constructing a fuzzy metaheuristic scheduling steps enhancing the searching in the DQN for the best action-value function (Equation (8)). Figure 9 discusses the network managing by autonomous Internet of Things (AIoT) to manage the requests and invitations for the EVs during their trips for decreasing EVs service time (

dEVs S_{t}

) [29], and gained important feature, which are not required in advance variables such as arrival and departure times or power usage, in contrast to optimization-based techniques that can be manage by the autonomous clouding technique, because the neural network can estimate the right choice based on the present parameters

(θ_{i})

. The second phase carried out the cost analysis for assigning specific

N

of EVs to preferred m of station to begin extracting the Gantt chart for distributing EVs at the same time to receive request actions, especially if taken advantage of scheduling EVs’ servicing distribution exponential or Weibull for the charging actions rate during service as illustrated in Figure 14 and Figure 15. The proposed method is verified by computing the OEE for proposed and comparative presented by Qing Wu, [38]. Because of its large solution space

{(N!)}^{m}

, the authors resort to smart scheduling solution programming. The proposed AIoT-DQN methodology was written in MATLAB to tackle the drawbacks in native DQN [52,59]. Therefore, the metaheuristic was a preferred mechanism to tackling the problem. The aggregated data are classified into four groups according to the problem size, the first is only three EVs up to six EVs and executed all main actions at assigned station. The average ideal time approximates 9.167 h and the proposed method has an average of approximately 6.875 h, while BASNNC presents an average of 10.47 h. The OEE indicator indicates that equal 79.06% for (

dEVs S_{t}

) methodology and 58.7% for other algorithms discussed above. Table 10 shows some comparative indicators to check the scope of superiority of (

dEVs S_{t}

) over BASNNC and BASNNC algorithm in finding scheduling distribution for different EVs that need some service actions at the same time.

Author Contributions

Conceptualization, A.M.A.; Data curation, A.M.A. and A.A.; Formal analysis, A.M.A.; Funding acquisition, A.M.A.; Investigation, A.M.A.; Methodology, A.M.A.; Project administration, A.M.A. and A.A.; Software, A.A.; Supervision, A.M.A.; Validation, A.M.A.; Visualization, A.M.A. and A.A.; Writing—original draft, A.M.A.; Writing—review & editing, A.M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

Allen, J.; Koomen, J. Planning using a temporal world model. In Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, Germany, 8–12 August 1983; Volume 2, pp. 741–747. [Google Scholar]
Bartak, B. Slot Models for Schedulers Enhanced by planning Capabilities. In Proceedings of the 19th Workshop of the UK Planning and Scheduling Special Interest Group, Milton Keynes, UK, 14–15 December 2000; pp. 11–24. [Google Scholar]
Marsay, D.J. Uncertainty in Planning: Adapting the framework of Game Theory. In Proceedings of the 19th Workshop of the UK Planning and Scheduling Special Interest Group, Milton Keynes, UK, 14–15 December 2000; pp. 101–107. [Google Scholar]
Noronha, S.J.; Sarma, V.V.S. Knowledge-Based Approaches for Scheduling Problems: A Survey. IEEE Trans. Knowl. Data Eng. 1991, 3, 160–171. [Google Scholar] [CrossRef]
Kim, S.; Lim, H. Reinforcement Learning Based Energy Management Algorithm for Smart Energy Buildings. Energies 2018, 11, 2010. [Google Scholar] [CrossRef]
Qian, T.; Shao, C.; Wang, X.; Shahidehpour, M. Deep Reinforcement Learning for EV Charging Navigation by Coordinating Smart Grid and Intelligent Transportation System. IEEE Trans. Smart Grid 2020, 11, 1714–1723. [Google Scholar] [CrossRef]
Zhang, Y.; Yu, X.; Guo, D.; Yin, Y.; Zhang, Z. Weights and structure determination of multiple-input feed-forward neural network activated by Chebyshev polynomials of Class 2 via cross-validation. Neural Comput. Appl. 2014, 25, 1761–1770. [Google Scholar] [CrossRef]
Silva, F.C.; Ahmed, M.A.; Martínez, J.M.; Kim, Y.-C. Design and Implementation of a Blockchain-Based Energy Trading Platform for Electric Vehicles in Smart Campus Parking Lots. Energies 2019, 12, 4814. [Google Scholar] [CrossRef]
Schwemmle, N. Short-Term Spatio-Temporal Demand Pattern Predictions of Trip Demand. Master’s Thesis, Katholieke Universiteit Leuven, Leuven, Belgium, 2021. Available online: https://zenodo.org/record/4514435#.YRZTNYgzbIU (accessed on 6 February 2021).
Wang, R.; Chen, Z.; Xing, Q.; Zhang, Z.; Zhang, T. A Modified Rainbow-Based Deep Reinforcement Learning Method for Optimal Scheduling of Charging Station. Sustainability 2022, 14, 1884. [Google Scholar] [CrossRef]
Soldan, F.; Bionda, E.; Mauri, G.; Celaschi, S. Short-term forecast of EV charging stations occupancy probability using big data streaming analysis. arXiv 2021, arXiv:2104.12503. [Google Scholar]
Wan, Y.; Qin, J.; Ma, Q.; Fu, W.; Wang, S. Multi-agent DRL-based data-driven approach for PEVs charging/discharging scheduling in smart grid. J. Frankl. Inst. 2022, 359, 1747–1767. [Google Scholar] [CrossRef]
Lee, K.-B.; AAhmed, M.; Kang, D.-K.; Kim, Y.-C. Deep Reinforcement Learning Based Optimal Route and Charging Station Selection. Energies 2020, 13, 6255. [Google Scholar] [CrossRef]
Yang, J.-Y.; Chou, L.-D.; Chang, Y.-J. Electric-Vehicle Navigation System Based on Power Consumption. IEEE Trans. Veh. Technol. 2016, 65, 5930–5943. [Google Scholar] [CrossRef]
Motz, M.; Huber, J.; Weinhardt, C. Forecasting BEV charging station occupancy at work places. In Informatik 2020; Reussner, R.H., Koziolek, A., Heinrich, R., Eds.; Gesellschaft für Informatik: Bonn, Germany, 2021; p. 771e81. [Google Scholar] [CrossRef]
Schrittwieser, J.; Antonoglou, I.; Hubert, T.; Simonyan, K.; Sifre, L.; Schmitt, S.; Guez, A.; Lockhart, E.; Hassabis, D.; Graepel, T.; et al. Mastering Atari, Go, chess and shogi by Policyning with a learned model. Nature 2020, 588, 604–609. [Google Scholar] [CrossRef] [PubMed]
Zhang, H.; Yu, T. AlphaZero. In Deep Reinforcement Learning; Dong, H., Ding, Z., Zhang, S., Eds.; Springer: Singapore, 2020. [Google Scholar] [CrossRef]
Engel, H.; Hensley, R.; Knupfer, S.; Sahdev, S. Charging Ahead: Electric-Vehicle Infrastructure Demand; McKinsey Center for Future Mobility: New York, NY, USA, 2018. [Google Scholar]
Sawers, P. (2022). Google Maps Will Now Show Real-Time Availability of Electric Vehicle Charging Stations. 2019. Available online: https://venturebeat.com/2019/04/23/google-maps-will-now-show-real-time-availability-of-charging-stations-for-electric-cars/ (accessed on 1 April 2022).
Shioda, M.; Ito, T. Learning of Evaluation Functions on Mini-Shogi Using Self-playing Game Records. In Proceedings of the International Conference on Technologies and Applications of Artificial Intelligence (TAAI), Taipei, Taiwan, 3–5 December 2020; pp. 41–46. [Google Scholar] [CrossRef]
Amara-Ouali, Y.; Goude, Y.; Massart, P.; Poggi, J.M.; Yan, H. A review of electric vehicle load open data and models. Energies 2021, 14, 2233. [Google Scholar] [CrossRef]
François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An Introduction to Deep Reinforcement Learning. Found. Trends Mach. Learning 2018, 11, 219–354, arXiv:1811.12560. Available online: https://arxiv.org/abs/1811.12560 (accessed on 30 November 2018). [CrossRef]
Ji, Y.; Wang, J.; Xu, J.; Fang, X.; Zhang, H. Real-time energy management of a microgrid using deep reinforcement learning. Energies 2019, 12, 2291. [Google Scholar] [CrossRef]
Sadeghianpourhamami, N.; Deleu, J.; Develder, C. Definition and Evaluation of Model-Free Coordination of Electrical Vehicle Charging with Reinforcement Learning. IEEE Trans. Smart Grid 2020, 11, 203–214. [Google Scholar] [CrossRef]
Gu, S.; Lillicrap, T.; Ghahramani, Z.; Turner, R.E.; Levine, S. Qprop: Sample-efficient policy gradient with an off-policy critic. arXiv 2016, arXiv:1611.02247. [Google Scholar]
Lei, L.; Tan, Y.; Zheng, K.; Liu, S.; Zhang, K.; Shen, X. Deep Reinforcement Learning for Autonomous Internet of Things: Model, Applications and Challenges. IEEE Commun. Surv. Tutor. 2020, 22, 1722–1760. [Google Scholar] [CrossRef]
Abed, A.M.; Elattar, S. Minimize the Route Length Using Heuristic Method Aided with Simulated Annealing to Reinforce Lean Management Sustainability. Processes 2020, 8, 495. [Google Scholar] [CrossRef]
Subramanian, A.; Chitlangia, S.; Baths, V. Reinforcement learning and its connections with neuroscience and psychology. Neural Netw. 2022, 145, 271–287. [Google Scholar] [CrossRef]
Lee, S.; Choi, D.-H. Energy Management of Smart Home with Home Appliances, Energy Storage System and Electric Vehicle: A Hierarchical Deep Reinforcement Learning Approach. Sensors 2020, 20, 2157. [Google Scholar] [CrossRef]
Abdullah, H.M.; Gastli, A.; Ben-Brahim, L. Reinforcement Learning Based EV Charging Management Systems–A Review. IEEE Access 2021, 9, 41506–41531. [Google Scholar] [CrossRef]
Mostafa, S.; Loay, I.; Ahmed, M. Machine Learning-Based Management of Electric Vehicles Charging: Towards Highly-Dispersed Fast Chargers. Energies 2020, 13, 5429. [Google Scholar]
Liu, Y.; Chen, W.; Huang, Z. Reinforcement Learning-Based Multiple Constraint Electric Vehicle Charging Service Scheduling. Math. Probl. Eng. 2021, 2021, 1401802. [Google Scholar] [CrossRef]
Konstantina, V.; Wolfgang, K.; John, C. Smart Charging of Electric Vehicles Using Reinforcement Learning. In Proceedings of the Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence, Bellevue, WA, USA, 14–18 July 2013; pp. 41–48. Available online: https://www.researchgate.net/publication/286726772_Smart_charging_of_electric_vehicles_using_reinforcement_learning (accessed on 24 February 2022).
Wang, K.; Wang, H.; Yang, J.; Feng, J.; Li, Y.; Zhang, S.; Okoye, M.O. Electric vehicle clusters scheduling strategy considering real-time electricity prices based on deep reinforcement learning. Energy Rep. 2022, 8 (Suppl. 4), 695–703. [Google Scholar] [CrossRef]
Tuchnitz, F.; Ebell, N.; Schlund, J.; Pruckner, M. Development and Evaluation of a Smart Charging Strategy for an Electric Vehicle Fleet Based on Reinforcement Learning. Appl. Energy 2021, 285, 116382. [Google Scholar] [CrossRef]
Wan, Z.; Li, H.; He, H.; Prokhorov, D. Model-Free Real-Time EV Charging Scheduling Based on Deep Reinforcement Learning. IEEE Trans. Smart Grid 2019, 10, 5246–5257. [Google Scholar] [CrossRef]
Ma, T.; Faye, S. Multistep electric vehicle charging station occupancy prediction using hybrid LSTM neural networks. Energy 2022, 244 Pt B, 123217. [Google Scholar] [CrossRef]
Wu, Q.; Ma, Z.; Xu, G.; Li, S.; Chen, D. A Novel Neural Network Classifier Using Beetle Antennae Search Algorithm for Pattern Classification. IEEE Access 2019, 7, 64686–64696. [Google Scholar] [CrossRef]
Zhang, L.; Li, K.; Bai, E.W.; Irwin, G.W. Two-stage orthogonal least squares methods for neural network construction. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 1608–1621. [Google Scholar] [CrossRef]
Han, H.G.; Lu, W.; Hou, Y.; Qiao, J.F. An adaptive-PSO-based self organizing RBF neural network. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 104–117. [Google Scholar] [CrossRef]
Yıldız, A.R. A novel hybrid immune algorithm for global optimization in design and manufacturing. Robot. Comput. Manuf. 2009, 25, 261–270. [Google Scholar] [CrossRef]
Khalilpourazari, S.; Khalilpourazary, S. Optimization of production time in the multi-pass milling process via a robust grey wolf optimizer. Neural Comput. Appl. 2018, 29, 1321–1336. [Google Scholar] [CrossRef]
Mirjalili, S. SCA: A sine cosine algorithm for solving optimization problems. Knowl.-Based Syst. 2016, 96, 120–133. [Google Scholar] [CrossRef]
Nguyen, T.T.; Nguyen, T.A.; Trinh, Q.H. Optimization of milling parameters for energy savings and surface quality. Arab. J. Sci. Eng. 2020, 45, 9111–9125. [Google Scholar] [CrossRef]
Kaur, G.; Dhillon, J. Economic power generation scheduling exploiting hill-climbed Sine–Cosine algorithm. Appl. Soft Comput. 2021, 111, 107690. [Google Scholar] [CrossRef]
World Economic Forum. Electric Vehicles for Smarter Cities: The Future of Energy and Mobility; World Economic Forum: Cologny, Switzerland, 2018; p. 32. Available online: https://www3.weforum.org/docs/WEF_2018_%20Electric_For_Smarter_Cities.pdf (accessed on 28 January 2022).
Ghosh, A. Possibilities and Challenges for the Inclusion of the Electric Vehicle (EV) to Reduce the Carbon Footprint in the Transport Sector: A Review. Energies 2020, 13, 2602. [Google Scholar] [CrossRef]
Blair, E.H. Regulation time Culture. Professional Regulation time. J. Prof. Saf. 2013, 58, 59–65. [Google Scholar]
EU Science Hub. Electric Vehicles: A New Model to Reduce Time Wasted at Charging Points. 2019. Available online: https://ec.europa.eu/jrc/en/news/electric-vehicles-newmodel-reduce-time-wasted-charging-points (accessed on 28 January 2022).
Zhang, J.; Yan, J.; Liu, Y.; Zhang, H.; Lv, G. Daily electric vehicle charging load profiles considering demographics of vehicle users. Appl. Energy 2020, 274, 115063. [Google Scholar] [CrossRef]
Zhang, X.; Peng, L.; Cao, Y.; Liu, S.; Zhou, H.; Huang, K. Towards holistic charging management for urban electric taxi via a hybrid deployment of battery charging and swap stations. Renew. Energy 2020, 155, 703–716. [Google Scholar] [CrossRef]
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.A.; Fidjeland, A.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
Wang, J.Q.; Du, Y.; Wang, J. LSTM based long-term energy consumption prediction with periodicity. Energy 2020, 197, 117197. [Google Scholar] [CrossRef]
Ren, C.; An, N.; Wang, J.; Li, L.; Hu, B.; Shang, D. Optimal parameters selection for BP neural network based on particle swarm optimization: A case study of wind speed forecasting. Knowl.-Based Syst. 2014, 56, 226–239. [Google Scholar] [CrossRef]
Khadse, C.B.; Chaudhari, M.A.; Borghate, V.B. Conjugate gradient back-propagation based articial neural network for real time power quality assessment. Int. J. Electr. Power Energy Syst. 2016, 82, 197–206. [Google Scholar] [CrossRef]
Schwemmle, N.; Ma, T.Y. Hyper parameter optimization for neural network based taxi demand prediction. In Proceedings of the BIVEC-GIBET Transport Research Days 2021, Delft, The Netherlands, 27–28 May 2021. [Google Scholar]
Guo, Q.; Xin, S.; Sun, H.; Li, Z.; Zhang, B. Rapid-Charging Navigation of Electric Vehicles Based on Real-Time Power Systems and Traffic Data. IEEE Trans. Smart Grid 2014, 5, 1969–1979. [Google Scholar] [CrossRef]
Mirjalili, S. The whale optimization algorithm. Adv. Eng. Softw. 2016, 9, 51–67. [Google Scholar] [CrossRef]
Yang, H.; Deng, Y.; Qiu, J.; Li, M.; Lai, M.; Dong, Z.Y. Electric Vehicle Route Selection and Charging Navigation Strategy Based on Crowd Sensing. IEEE Trans. Ind. Inform. 2017, 13, 2214–2226. [Google Scholar] [CrossRef]

Figure 1. The architecture of the deep learning.

Figure 2. The taxonomy of machine learning models and methodologies.

Figure 3. The map layout of state available direction and electric charger stations.

Figure 4. The spatial distribution of chargers in the city of Dundee and EVs allocation via Zap Map (source: https://data.dundeecity.gov.uk/dataset/ev-charging-data, accessed on 28 January 2022).

Figure 5. The stages of the methodology framework.

Figure 6. The perspective of procedures for EVs charging process at requested.

Figure 7. The significant factors on service span time.

Figure 8. The significant factors on utilization.

Figure 9. The queue of the EVs and their service time The DQN network was built for specific station

δ_{1}

from three different stations

δ_{i}

selected on the Zap Map and one of them meets the distance considerations discussed in Equations (4)–(10), and illustrated in Figure 10, to distribute the six different EVs to be served by the six actions

a_{i}

until complete charging according to a specific policy

π

that selects a specific station

δ_{i}

.

Figure 9. The queue of the EVs and their service time The DQN network was built for specific station

δ_{1}

from three different stations

δ_{i}

selected on the Zap Map and one of them meets the distance considerations discussed in Equations (4)–(10), and illustrated in Figure 10, to distribute the six different EVs to be served by the six actions

a_{i}

until complete charging according to a specific policy

π

that selects a specific station

δ_{i}

.

Figure 10. The DQN network for specific station from three different stations.

Figure 11. The iterations to get scheduling by proposed methodology.

Figure 12. The solution of service span time by BASNNC algorithm.

Figure 13. The solution of service span time by proposed methodology aided by AIoT management.

Figure 14. The lowest error in expected EVs’ service time is exponential during the period [5 A.m.: 11 A.m.].

Figure 15. The lowest error in expected EVs’ service time is Weibull during the period [5 P.m.: 11 P.m.].

Table 1. The work charter design sheet.

Title	Scheduling EVs Charging Stations on Their Trip Route Using AIoT
Goals	Reduce idle time actions (e.g., setup time, inspection time, response time, etc.) Reduce the service time by divide the charging span into fuzzy zones. Select the preferred station which closer to the main flight route.
Problem Scope: The scope of the work will focus on scheduling the different selection of requests for EVs stations to achieve the goals above through modelling fuzzy network trained by reinforcement learning bidirectional DQN and manage it by the AIoT. The Q, V loss function is used to generate the initial searching values formulated as follow: $λ_{t}^{Q} = (Q (s_{t}$ , $a_{t}$ ); $θ - λ_{t}^{Q})^{2}$ and $λ_{t}^{V} = (V (s_{t}$ , $a_{t}$ ); $θ - λ_{t}^{V})^{2}$ The OEE is a verification indicator revealing the effectiveness of the proposed scheduling by three terms: Availability = Station Run Time/Planned Service Time. Performance = (Ideal Charging Cycle Time × Total Satisfied EVs Count)/Station Run Time. Quality = Good EVs servicing Count/Total serviced Count.
Government/Customer Impact: Deployment of EVs to help clean the environment Smart Scheduling Q Network Map Transportation without hindrances.
Business Impact: The expected savings time via comparing the results by Qing Wu, (2019) [38] and Lee et al. (2020) [29] to reduce trip time and increase the stations utilization by measuring the OEE [47].
Current State (Baseline)						Objective and Stretch Target
$O E E = [(S a t i s f i e d E V \times I d e a l C h a r g i n g C y c l e T i m e) / P l a n n e d s e r v i c e T i m e]$ $O E E = A v a i l a b i l i t y \times P e r f o r m a n c e \times Q u a l i t y$
Metrics		“As-is”		“Should-be”		Tolerance	Set Mfg. condition
Standardization		63%		97%		No control	No procedures
Idle-Time		23%		6%		Control	Fixed procedures
Construct cause and effect						Construct FMEA
Policy function: Y = f (battery charge level, route length, trip distance, the role on queue list L)
Major contribution of this work smart fuzzy scheduling for DQN managed by AIoT
Zap Map on google engine search			Pareto Chart		Proposed Solutions

Table 2. Symbols describe the illustrated map discussed in Figure 3.

$x, y$	Coordinates of available serviceable stations.	$∆ A$	The gap between two streets.
$δ_{i, j}$	The number of stations on the map that were tallied (1, 2, 3, …, i, …, m)	$∆ F$	The separation between the major thorough fare and the first candidate station.
$N$	The number of EVs places on each street, regardless of whether $π_{A}$ or $π_{B}$ is used.	$∆ B$	The distance between the last $i, j$ on the map’s vertical street and the rear street.
$F G_{i}$	The fuzzy group for selected $\sum_{i = 1}^{N = 3} δ_{i}$	$∆_{l}$	The separation between two nearby stations $J_{i}$ and $J_{i + 1}$
$λ_{i}^{d}$	The route length under cost consideration	$L$	The list of suggested stations

Table 3. The AIoT-DQN equations parameters and Decision Variables.

Parameters
$C^{π A}$	The cost of distance of arrive the EVs to $δ_{i}$ as suggested via $π_{A}$ .
$C^{π B}$	The cost of distance of arrive the EVs to $δ_{i}$ as suggested via $π_{B}$ .
$C^{d π B}$	The cost of disposing stations $δ_{j}$ due to distance or to EVs driver canceled and switch to another $π_{B}$ .
$h^{π A}$	Processing fees for usable station to be available list of EVs in ( $π_{A}$ ) at period $(t) .$
$h^{π B}$	Processing fees for usable station to be available list of EVs in ( $π_{B}$ ) at period $(t) .$
$I_{0}^{S}$	Number of stations $δ_{i}$ candidate at time ( $t$ ) according of ( $π_{A}$ ).
$I_{0}^{R}$	Number of stations $δ_{i}$ candidate at time ( $t^{'}$ ) according of ( $π_{B}$ ).
$r_{t}^{A B}$	Uncertain serviceable number of $δ_{j}$ in ( $π_{B}$ ) and/or hybrid at ( $π_{A}$ ) in period $t + ∆ t$
${\bar{r}}_{t}^{A B}$	The average number of $d_{i} + d_{j}$ sites that were upgraded to be serviceable during period t is estimated.
${\bar{\hat{r}}}_{t}^{A B}$	Maximum variation from the mean for the number of $δ_{i}$ upgraded to serviceable in period $t$ ,
$τ_{t}^{d}$	The maximum number of uncertain requests and (re)requests for the $\sum_{i = 1}^{n} δ_{i}$ along the route that can diverge from the mean concurrently until the end of period $(t)$ .
$d_{t}$	Uncertain $δ_{i}$ ’ service time requested for the list $L$ .
$b$	EVs backlogging cost to serviceable $δ_{j}$ station at ( $π_{B}$ ) after finishing ( $π_{A}$ ).
$γ$	The discount parameters for control reward gained
Decision Variables
$S_{t}^{π A}$	The number of serviceable $δ_{i}$ that are distributed in period $t$ of ( $π_{A}$ ).
$S_{t}^{π B}$	The number of serviceable $δ_{j}$ at ( $π_{B}$ ) that is updated of ( $π_{A}$ ) in period $(t ’)$ .
$X_{t}^{D}$	The number of $δ_{i}$ that form list L and eliminated during period $t ’$ owing to cancellation or going against the cost analysis step and being delayed until route actions were complete
$I_{t}^{S}$	The number of backlogging EVs $i + j$ at the end of period $(t)$ for ( $π_{A & B}$ ).

Table 4. The (

d E V s S_{t}

) pseudo steps to construct the DQ Network.

Table 4. The (

d E V s S_{t}

) pseudo steps to construct the DQ Network.

Step 1:	Construct the serviceable network that represents the EVs requesting the charger stations.
Step 2:	Sort the ( $π_{A}$ ) generated requested list $L$ for available stations $δ_{i}$ in ascending order from nearest to farthest.
Step 3:	The route begins from the request point till reaching all station $δ_{i}$ generated by the policy $π_{A \| B}$ .
Step 4:	All serviceable $δ_{i}^{d}$ and $δ_{j}^{d}$ generated by ( $π_{A}$ ) and ( $π_{B}$ ) list for stations on road and on opposite side respectively are subject to constraints as in Equation (7) that lead to select shortest route $λ = d_{i} + d_{j}$ , and go to step 8.
Step 5:	$λ_{i m} a n d λ_{j n}$ are the Euclidean distance between the current $δ_{n}$ and $δ_{m}$ , where $δ_{m}$ and $δ_{n}$ are the first and last station according to the list L of ( $π_{B}$ ).
Step 6:	Choose the serviceable stations $δ_{j}$ along the suggested route to achieve the goals of being closest and serving in the least amount of time and saving money in ( $π_{B}$ ), and take into account the last ( $i)$ as the starting point to search others, keeping the previous calculated distance route as $\sum λ_{i}$ .
Step 7:	If the $\sum_{c o s t = 0}^{d} π_{A} \leq \sum_{c o s t = 0}^{d} π_{B}$ , select $π_{A}$ all $δ_{i}$ branch and neglect $π_{B}$ for all $δ_{j}$ branch and save it in the backlogging list to check the objective attainability from $π_{A}$ . Otherwise, hybrid of EVs up to first six object on the route.
Step 8:	If min [( $λ_{i m}$ > $λ_{j n}$ + $∆ F$ )] < y, arrange the stations in list L in descending order, and follow the function indexing Equation (8) to classify the $r_{t}^{τ}$ .
Step 9:	If all subsequent items in L are serviced, then repeat steps 3 and 4.
Step 10:	Return to the main street before proceeding to the feeding place.

Table 5. {

{FG}_{i}

} set; the fuzzy index for the groups of jobs {

{FG}_{i}

} set of actions.

Table 5. {

{FG}_{i}

} set; the fuzzy index for the groups of jobs {

{FG}_{i}

} set of actions.

The Fuzzy Index for the Groups of Jobs {Ui} Set
$J_{act} (E V s)$	EV(1)	EV(2)	EV(3)	EV(4)	EV(5)	EV(6)	$J_{act} (E V s)$	EV(1)	EV(2)	EV(3)	EV(4)	EV(5)	EV(6)
a₁ then a₂	10.5	0.4	13	2.2	0.5	0.8	a₄ then a₁	−ve	2	−ve	0.112	0.31	−ve
a₁ then a₃	6.25	2.3	5	−ve	0.54	−ve	a₄ then a₂	3.8	1.6	0.34	0.78	0.19	0.2857
a₁ then a₄	2.25	0.1	15	0.47	0.231	0.9	a₄ then a₃	2.1	5.4	−ve	−ve	0.61	−ve
a₁ then a₅	5.75	0.8	9	0.88	−ve	0.2	a₄ then a₅	1.9	2.48	−ve	0.56	−ve	−ve
a₁ then a₆	6	−ve	30	−ve	0.35	−ve	a₄ then a₆	2	0.6	1.34	−ve	0.42	−ve
a₂ then a₁	−ve	1.25	−ve	−ve	0.57	−ve	a₅ then a₁	−ve	0.083	0.112	−ve	0.82	0.143
a₂ then a₃	−ve	3.375	−ve	−ve	0.9	−ve	a₅ then a₂	0.42	−ve	1.67	0.92	0.65	0.762
a₂ then a₄	−ve	0.625	0.47	−ve	0.567	0.364	a₅ then a₃	0.21	1.5	0.656	−ve	1.3	−ve
a₂ then a₅	−ve	1.5	0.07	−ve	0.174	−ve	a₅ then a₄	−ve	−ve	1.67	−ve	0.82	0.857
a₂ then a₆	−ve	0.375	1.47	−ve	−ve	−ve	a₅ then a₆	−ve	−ve	3.34	−ve	1	−ve
a₃ then a₁	−ve	−ve	0.2	1.34	−ve	0.647	a₆ then a₁	−ve	3.34	−ve	2.34	0.03	0.934
a₃ then a₂	1	−ve	2.6	4.7	−ve	1.411	a₆ then a₂	1.12	2.67	−ve	7.34	−ve	1.802
a₃ then a₄	−ve	−ve	3	1.45	−ve	1.529	a₆ then a₃	0.44	9	−ve	1	0.31	0.734
a₃ then a₅	0.274	−ve	1.8	2.3	−ve	0.705	a₆ then a₄	−ve	1.67	−ve	2.5	0.03	1.933
a₃ then a₆	0.312	−ve	6	0.11	0.051	0.353	a₆ then a₅	0.36	4	−ve	3.67	−ve	1.267

Table 6. The descriptive analysis according to training data.

Seven Charger Point Type in Station	# of Studied Points of Charging in Selected Stations	# of Charging Requests via Bidirectional Cloud Management	# of Requests per Day per Charging Point over Horizon Period t	Service Span Time (h).
Seven Charger Point Type in Station	# of Studied Points of Charging in Selected Stations	# of Charging Requests via Bidirectional Cloud Management		$μ$	$σ$
Slow (7 KW)	36 (69.2%)	2145	0.446875	194.1	319.7
Fast (22 KW)	9 (17.3%)	716	0.745833	426.4	195.38
Rapid (43 KW or up)	7 (13.5 %)	4645	1.523148	24.08	7.62
Total	52	7506		644.58

Table 7. Illustrative solutions for the first group FG for four EVs or less size in [hour].

No.	$(d E V s S_{t})$	BASNNC	No.	$(d E V s S_{t})$	BASNNC	No.	$(d E V s S_{t})$	BASNNC	No.	$(d E V s S_{t})$	BASNNC	No.	$(d E V s S_{t})$	BASNNC
1	12.48	13	7	9.48	12.725	13	7.91	10.78	19	7.217	10.462	25	6.364	8.784
2	8.217	10.627	8	7.217	10.737	14	8.219	11.799	20	9.219	12.929	26	6.969	10.679
3	8.15	12.395	9	9.15	12.67	15	8.364	7.654	21	12.219	12.089	27	11.05	10.92
4	12.55	12.03	10	9.55	12.96	16	7.969	10.099	22	12.15	11.63	28	8.364	8.234
5	10.695	10.175	11	10.695	10.94	17	11.05	10.63	23	9.55	12.602	29	6.969	10.389
6	9.48	12.89	12	7.91	11.33	18	10.91	11.62	24	7.695	10.285	30	7.05	11.76

Table 8. Another verification for proposed methodology with GWO scheduling in [hour].

No.	$(d E V s S_{t})$	GWO	No.	$(d E V s S_{t})$	GWO	No.	$(d E V s S_{t})$	GWO	No.	$(d E V s S_{t})$	GWO	No.	$(d E V s S_{t})$	GWO	No.	$(d E V s S_{t})$	GWO
4	12.55	14.1	15	8.364	9.65	21	12.219	13.089	27	11.05	11.92	31	11.62	11.98	33	11.05	13.1
5	10.695	10.32	17	11.05	11.6	22	12.15	12.63	28	8.364	8.173	32	12.78	11.72	34	8.31	9.14

Table 9. Another verification for proposed methodology with Sine-cosine and Whale optimization scheduling in [hour].

No.	$(d E V s S_{t})$	SCW	No.	$(d E V s S_{t})$	SCW	No.	$(d E V s S_{t})$	SCW	No.	$(d E V s S_{t})$	SCW	No.	$(d E V s S_{t})$	SCW	No.	$(d E V s S_{t})$	SCW
5	10.695	11.52	28	8.364	8.37	32	12.78	13.01	35	13.04	13.3	36	12.7	12.7	37	11.05	-----
38	7.62	9.1	39	7.92	11.6	40	7.6	12.7	41	8.04	11.78	42	7.54	10.65	43	9.67	-----

Table 10. The comparative average of thirty examples for four different problem sizes.

	Three EVs	Four EVs	Five EVs	Six EVs	Eight EVs	Up to
Extracting solution time [s]	6.6	8.1	21.3	34.7	58.6	mins
Average distance among the perfect three stations and EVs’ request position [km]	7.8	8.1	8.4	10.6	16.1	21
Ideal servicing time [h]	7.121	7.535	7.918	8.075	10.2	Over day
Average Span time of ( $dEVs S_{t}$ ) [h]	6.921	7.143	7.474	6.876	7.9	infeasible
Average Span time of BASNNC [h]	9.006	9.221	10.743	11.761	16.1	-------
Average Span time of GWO [h]	9.215	9.7	10.74	11.361	18.7	infeasible
Average Span time of SCW [h]	9.153	9.825	10.241	11.162	14.6	-------
OEE ( $dEVs S_{t}$ )	78.1%	75.2%	72.1%	76.5%	54.1%	-------
OEE of BASNNC	30.1%	29.1%	43.7%	71.0%	52.4%	-------
OEE of GWO	33.1%	35.8%	43.7%	65.2%	51.2%	-------
OEE of SCW	32.2%	37.5%	37.0%	62.3%	51.1%	-------

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

M. Abed, A.; AlArjani, A. The Neural Network Classifier Works Efficiently on Searching in DQN Using the Autonomous Internet of Things Hybridized by the Metaheuristic Techniques to Reduce the EVs’ Service Scheduling Time. Energies 2022, 15, 6992. https://doi.org/10.3390/en15196992

AMA Style

M. Abed A, AlArjani A. The Neural Network Classifier Works Efficiently on Searching in DQN Using the Autonomous Internet of Things Hybridized by the Metaheuristic Techniques to Reduce the EVs’ Service Scheduling Time. Energies. 2022; 15(19):6992. https://doi.org/10.3390/en15196992

Chicago/Turabian Style

M. Abed, Ahmed, and Ali AlArjani. 2022. "The Neural Network Classifier Works Efficiently on Searching in DQN Using the Autonomous Internet of Things Hybridized by the Metaheuristic Techniques to Reduce the EVs’ Service Scheduling Time" Energies 15, no. 19: 6992. https://doi.org/10.3390/en15196992

APA Style

M. Abed, A., & AlArjani, A. (2022). The Neural Network Classifier Works Efficiently on Searching in DQN Using the Autonomous Internet of Things Hybridized by the Metaheuristic Techniques to Reduce the EVs’ Service Scheduling Time. Energies, 15(19), 6992. https://doi.org/10.3390/en15196992

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

The Neural Network Classifier Works Efficiently on Searching in DQN Using the Autonomous Internet of Things Hybridized by the Metaheuristic Techniques to Reduce the EVs’ Service Scheduling Time

Abstract

1. Introduction

2. The Policy Formulation and Research Contribution

3. Contribution of This Study

The Environment Layout Map Description

4. The Methodology Description

4.1. Phase I: The Smart Fuzzy Scheduling Formulation, Fecrease EVs Service Time

4.2. Phase II: Cost Analysis Formulation

5. Results Analysis

6. Conclusions

Author Contributions

Funding

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI