1. Introduction
With the increasing amount and variety of application data, users’ demand for high-quality services has been growing. As a new computing model for the IoT, edge computing has become a highly virtualized platform that provides computing, storage, and network services between terminal devices and traditional cloud data centers. Edge nodes, including switches, routers, and embedded servers, form the important infrastructure of the edge computing network. With the continuous development of Internet terminal devices, smartphones and other terminal devices are widely used; so far, the penetration rate of smartphones in the United States has reached 80%. According to results released by Cisco, the average number of connected devices per capita will reach 3.6 by 2023 [1]. In edge computing, the increasing variety and quantity of data produced by IoT devices raises end-users’ demand for high-quality mobile services. In addition, as the number of devices connected to edge nodes grows, insufficient resource supply leads to high costs and serious load imbalance between edge nodes. Therefore, a complete and comprehensive task offloading strategy is critical for the development of the edge computing network and for better application performance.
Figure 1 shows an overview of the system architecture referred to in this paper. We suppose that the system architecture contains three layers: cloud, edge, and user. In the edge layer, there are several edge nodes with different limited capacities. We suppose that the connections and locations of the edge nodes have been fixed by third-party service providers or the cloud data centers. For each user, the connection scope is within a certain area, as shown by the dotted circle in Figure 1. In this area, users can offload tasks to the corresponding edge node or process them on the local device. This paper studies application-driven tasks constructed from several sub-tasks with strong dependencies, and the sub-tasks that belong to one application can be processed on different devices. In our system architecture, the provisioning process of an application is to decide the locations of its sub-tasks. For example, we use $A$ to denote an application that contains three sub-tasks $v_1$, $v_2$, and $v_3$, whose dependencies form the chain $v_1 \to v_2 \to v_3$. One extreme solution is the offloading strategy that minimizes the transmission cost for $A$, which processes all sub-tasks on the local device in the order $v_1, v_2, v_3$. However, this solution has the highest delay due to the limited capacity of local devices. Another extreme solution is the offloading strategy that minimizes the processing delay for $A$, which processes all sub-tasks on edge nodes: the user transmits the sub-tasks to the edge node through the wireless channel, and the results are returned after all sub-tasks are completed. However, when the sizes of the sub-tasks are large, the total transmission cost will be correspondingly large due to the dependencies between each precursor and its successor, which leads to a decrease in the quality of service for users. In this paper, we propose an efficient offloading strategy for multiple users that jointly considers the processing delay and transmission cost.
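To make this trade-off concrete, the following sketch compares the two extreme strategies for a three-sub-task chain. All workloads, processing rates, and channel costs below are hypothetical illustrations, not values from our model.

```python
# A minimal sketch of the two extreme strategies for the chain v1 -> v2 -> v3.
# All parameter values here are hypothetical and purely illustrative.
LOCAL_RATE = 1.0      # local CPU cycles processed per ms (assumed)
EDGE_RATE = 5.0       # edge CPU cycles processed per ms (assumed)
TX_COST_PER_MB = 2.0  # transmission cost per MB over the wireless channel (assumed)

subtasks = [          # (workload in cycles, transferred data size in MB)
    (10.0, 4.0),      # v1
    (20.0, 8.0),      # v2
    (10.0, 2.0),      # v3
]

# Extreme 1: process every sub-task locally -- zero transmission cost,
# but the limited local capacity yields the highest delay.
local_delay = sum(w / LOCAL_RATE for w, _ in subtasks)

# Extreme 2: offload every sub-task to the edge node -- minimal delay,
# but each precursor/successor transfer crosses the wireless channel.
edge_delay = sum(w / EDGE_RATE for w, _ in subtasks)
edge_tx = sum(s * TX_COST_PER_MB for _, s in subtasks)

print(f"all-local: delay={local_delay:.1f} ms, transmission cost=0.0")
print(f"all-edge : delay={edge_delay:.1f} ms, transmission cost={edge_tx:.1f}")
```

Neither extreme minimizes the weighted sum of delay and energy consumption, which motivates a per-sub-task offloading decision.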
In this paper, we concentrate on the application-driven task offloading problem in edge computing by considering the strong dependencies among sub-tasks. The key point of this problem is to determine the offloading locations of sub-tasks for users, so as to jointly optimize the total delay and energy consumption generated by applications while guaranteeing the quality of service for users. Our problem poses several unique challenges: (i) Since the capabilities of edge nodes and local devices are limited and heterogeneous, it is nontrivial to find a feasible strategy that completes the sub-tasks while improving the total cost for users during the offloading process. (ii) We consider applications with strong dependencies, which cannot achieve complete parallelism; thus, it is challenging to find a solution that satisfies the dependency relationships at lower cost. The major novel contributions of this paper are as follows:
We discuss the offloading problem for application-driven tasks in edge computing and optimize the total cost of users by jointly considering the delay and energy consumption.
We propose an application-driven task offloading strategy (ATOS) based on deep reinforcement learning (DRL) that adds a preliminary sorting mechanism to realize the joint optimization of delay and energy consumption. Specifically, we analyze the characteristics of application-driven tasks and propose a heuristic algorithm that introduces a new factor to determine the processing order of parallel sub-tasks. Based on that, we propose a task offloading strategy based on the deep Q-learning network (DQN) by training a fully connected neural network.
We design a simulator to evaluate our strategy ATOS by comparing it with several state-of-the-art strategies, and we present the results from different perspectives to draw conclusions.
The remainder of this paper is organized as follows. Section 2 surveys related work. Section 3 describes the model and then formulates the problem. Section 4 investigates the application-driven task offloading strategy based on DRL. Section 5 presents the experiments. Finally, Section 6 concludes the paper.
2. Related Work
The concept of edge computing was introduced to extend the cloud to the edge of the network, thus enabling a new breed of applications and services. Numerous novel strategies have been proposed for the task offloading problem in edge computing. Mao et al. [2,3] introduced an online learning assignment strategy based on dynamic computation offloading that is applicable to a single user. During the offloading process, the execution cost (time delay and offloading failure rate) was measured at each time interval. The online learning allocation strategy depends only on the current system state and does not need the task feedback results, the distribution information of the wireless channel, or the energy collection. Chen et al. [4] discussed a mobile edge computing solution to meet the low-latency requirements of ultra-dense networks. Using the idea of a software-defined network, the task offloading problem was expressed as a mixed-integer nonlinear program, and the delay reduction problem was decomposed into two sub-problems: task offloading and resource allocation. Yu et al. [5] considered application scenarios of the IoT (Internet of Things) and reduced the computing delay by allocating resources reasonably for the program; they proposed a fully polynomial-time approximation scheme, which was more effective in shortening the computing delay than heuristic algorithms. Spatharakis et al. [6] proposed a two-level edge computing architecture to offer computing resources for the remote execution of location-based services (LBS). Xu et al. [7] proposed a distributed computation offloading algorithm designed with game theory, and the calculation delay index was quantified to achieve a lower computation time cost.
In recent years, with the continuous development of machine learning methods, they have gradually infiltrated various fields; in particular, reinforcement learning has found good applications in offloading decisions to reduce the time delay. Meng et al. [8] proposed a delay-sensitive task offloading algorithm based on deep reinforcement learning (DRL) to improve the task processing speed and reduce task timeouts. A new reward function was designed to guide the algorithm to learn offloading decisions from the environment by combining timeout signals and deceleration signals. In addition, intelligent algorithms have been applied to various fields. Li et al. [9] proposed a joint optimization method for the task allocation ratio, channel bandwidth, and computing resources of mobile edge servers based on a genetic algorithm, aiming at the situation in which computing tasks can be partially allocated to the mobile edge server. Under the constraints of wireless transmission resources and mobile edge server processing resources, a genetic algorithm was used to solve the optimization problem of minimizing user task completion time, and the optimal offloading strategy and resource allocation scheme were obtained. All of the above offloading decisions achieve the purpose of reducing the time delay, but they fail to consider the energy consumption of terminal devices during task offloading. Terminal devices may not be able to operate normally due to lack of power, which has a huge impact on the user experience.
There are also many solutions to the task offloading problem in different environmental scenarios from the perspective of optimizing energy consumption. Zhang et al. [10] adopted the artificial fish swarm algorithm to design an offloading strategy for energy consumption optimization under a delay constraint. This strategy takes full account of the link conditions in the task data transmission network and effectively reduces the energy consumption of the equipment; however, it suffers from high algorithmic complexity. In a multi-resource environment, Xu et al. [11] designed an energy-minimizing particle swarm task scheduling algorithm for multi-resource matching to reduce the energy consumption of edge terminal devices. Wei et al. [12] proposed that the task offloading problem can be divided into mobility management and energy-saving optimization, and they used a greedy algorithm to minimize the energy consumption of mobile devices. Lu et al. [13] focused on minimizing the total cost for multiple mobile users and provided an efficient resource provisioning scheme by considering three different cases in edge computing. Yu et al. [14] studied the task offloading problem in ultra-dense network scenarios; they proposed a task offloading algorithm based on Lyapunov optimization theory, which minimizes the overall energy consumption of the base stations and equipment. To solve the privacy leakage problem that may occur in the computation offloading decisions of mobile edge computing, Zhao et al. [15] proposed a privacy-aware computation offloading algorithm with low computational complexity that maintains low terminal energy consumption while ensuring high privacy security. Liu et al. [16] studied the offloading problem based on deep learning and proposed a group sparse beamforming framework to optimize network power consumption.
Some studies jointly considered energy consumption and delay as a trade-off optimization problem and put forward ideas and solutions. Zhang et al. [17] proposed an SDN-V-assisted offloading mechanism suitable for IoV (Internet of Vehicles) scenarios. The mechanism considered task diversity, established a mathematical model of the importance degree, and designed a task offloading sorting algorithm according to the model; finally, an offloading algorithm based on Q-learning was constructed to optimize the energy consumption and time delay during task offloading. For mobile edge computing, there are many reinforcement learning methods for solving such optimization problems. Zhang et al. [18] proposed a policy-based DRL scheme for the problem of a single mobile device offloading tasks to a single mobile edge server. However, there remains the question of how much to adjust the network each time the policy is updated: too large an update amplitude may lead to non-convergence, while too small an amplitude may lead to slow convergence. Song et al. [19] proposed soCoM, a semi-online computation offloading model based on a dueling deep Q-network, to explore user behaviors in a sophisticated action space by reinforcement learning and capture unknown environment information. Liu et al. [20] proposed an improved scheme in which one artificial neural network learns the strategy and makes decisions, while another artificial neural network scores those decisions [21]. To improve on this, Zhan et al. [22] proposed an off-policy scheme: two artificial neural networks first approximate the behavior policy and the target policy, respectively; learning data are then generated from the behavior policy to train the neural network of the target policy; finally, the parameters of the trained target policy are assigned to the behavior policy. After repeated iterative learning of the target policy [23], this approach introduces more artificial neural networks and more parameters. In this paper, we are committed to designing an offloading strategy based on DRL for application-driven tasks that jointly optimizes the total delay and energy consumption.
4. An Application-Driven Task Offloading Strategy Based on DRL
In this section, we propose an Application-driven Task Offloading Strategy (ATOS) based on DRL. The main idea of ATOS is to add a preliminary sorting mechanism and to realize the joint optimization of delay and energy consumption through a task offloading strategy based on deep Q-learning. A detailed description of ATOS follows.
4.1. Preliminary Sorting Mechanism (PSM)
In this paper, we assume that the applications generated by the users are composed of several fine-grained sub-tasks. Although these sub-tasks have strong dependencies, there still exist parallel sub-tasks whose execution order affects the result of subsequent task offloading. An illustration of PSM for an application $A$ is shown in Figure 3. In this subsection, we introduce a Preliminary Sorting Mechanism (PSM) to determine the sequence of sub-tasks. We first initialize the preliminary sorting set $S$ in line 1. For each sub-task $v_i$ in application $A$, we check the in-degree $d^{in}(v_i)$. If the in-degree is 0, we add this sub-task into the queue $Q_s$. Otherwise, we check the out-degree $d^{out}(v_i)$; if the out-degree is 0, this is the last sub-task in the application, and we return the sequence queue $Q_s$. If neither of the above cases holds, we add the subsequent and sibling sub-tasks of $v_i$ to the preliminary sorting set $S$. According to the structure and characteristics of the application, we define a priority factor $\rho_i$.
Definition 1 (Priority Factor). The priority factor $\rho_i$ decides the execution order of the parallel sub-tasks $v_i$ in application $A$, where $v_i \in S$.
Based on that, we calculate the priority factors $\rho$ for the subsequent sub-tasks in set $S$. In line 9, we update $S$ in descending order of $\rho_i$, where $v_i \in S$. Then, we update $Q_s$ by adding the preliminary sorting set $S$ into the queue. In line 11, we return the sequence queue $Q_s$.
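The following Python sketch illustrates the spirit of PSM on a DAG of sub-tasks. It is a Kahn-style rendering rather than a line-by-line transcription of Algorithm 2, and the out-degree-based priority factor used here is merely an assumed stand-in for the factor $\rho$ of Definition 1; all identifiers are illustrative.

```python
from collections import deque

def psm(succ):
    """A minimal PSM-style sketch: succ maps each sub-task id to its list
    of successor ids (the application DAG). Parallel (ready) sub-tasks are
    ordered by a stand-in priority factor: descending out-degree."""
    indeg = {v: 0 for v in succ}
    for children in succ.values():
        for c in children:
            indeg[c] += 1

    # Sub-tasks with in-degree 0 are ready; order ties by the priority factor.
    ready = sorted((v for v in succ if indeg[v] == 0),
                   key=lambda v: len(succ[v]), reverse=True)
    queue = deque()
    while ready:
        v = ready.pop(0)
        queue.append(v)                      # fix v's position in the sequence
        for c in succ[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)              # successor becomes ready
        ready.sort(key=lambda v: len(succ[v]), reverse=True)
    return list(queue)

# Example: v1 -> {v2, v3}, v2 -> v4, v3 -> v4 (v2 and v3 are parallel siblings).
print(psm({"v1": ["v2", "v3"], "v2": ["v4"], "v3": ["v4"], "v4": []}))
# ['v1', 'v2', 'v3', 'v4']
```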
4.2. Task Offloading Based on Deep Q-Learning
In this subsection, we introduce our task offloading strategy based on DQN. To describe the environment of the edge network correctly and concisely for the agent, the state space should include the knowledge of the applications and the status of the total cost. So, the state is designed as follows.
Definition 2 (State). The state $s_t$ is a vector $(v_i, v_{i+1}, \dots, v_n, c_t)$, where $v_i, v_{i+1}, \dots, v_n$ are the sub-tasks waiting to be scheduled, and $c_t$ is the total cost of the scheduled sub-tasks $v_1, \dots, v_{i-1}$.
We realize the offloading by training an agent that chooses a destination (edge node or local device) for each sub-task of every application. The action is designed as follows.
Definition 3 (Action). The action $a_t$ is the adjusting action, where $a_t = 1$ or $a_t = 0$ means that the target location of the adjustment is the edge node or the local device, respectively.
At each time slot $t$, the agent receives a reward $r_t$ in a certain state $s_t$ after executing action $a_t$. Since the objective is to minimize the total cost of delay and energy consumption, which conflicts with the RL goal of maximizing the long-term reward, the reward function should be negatively related to the weighted sum of delay and energy consumption. The reward function is designed as follows.
Definition 4 (Reward). The immediate reward is $r_t = c^{base} - c_t$, where $c_t$ is the total cost of the scheduled sub-tasks, and $c^{base}$ is a baseline cost obtained by offloading with a greedy strategy.
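Definitions 2–4 can be expressed compactly in code. The sketch below is illustrative only: the state encoding, the binary action, and the epsilon-greedy selection are assumptions consistent with the definitions above, not our implementation.

```python
import random

def make_state(pending, scheduled_cost):
    """State s_t (Definition 2): the sub-tasks still waiting to be
    scheduled plus the total cost c_t of the already scheduled ones."""
    return (tuple(pending), scheduled_cost)

def epsilon_greedy_action(q_values, epsilon=0.1):
    """Action a_t (Definition 3): 1 = offload to the edge node,
    0 = process on the local device; epsilon-greedy over Q values."""
    if random.random() < epsilon:
        return random.choice([0, 1])
    return max((0, 1), key=lambda a: q_values[a])

def reward(scheduled_cost, baseline_cost):
    """Reward r_t (Definition 4): negatively related to the accumulated
    cost, measured against a greedy-offloading baseline."""
    return baseline_cost - scheduled_cost
```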
Algorithm 1 summarizes ATOS; the main idea is to use a deep reinforcement learning agent to perform the dynamic offloading of sub-tasks in applications so as to minimize the total cost of delay and energy consumption. We first initialize some preliminary parameters, which includes setting the replay memory $D$ to capacity $N$. Meanwhile, we initialize the action-value function $Q$ with random weights $\theta$ and the target action-value function $\hat{Q}$ with weights $\theta^- = \theta$. In lines 2 to 15, we train the agent by running a number of episodes in our environment. During each episode, we initialize the sequence $Q_s$ based on Algorithm 2 in line 3. The training process spans lines 4 to 14. In line 4, the agent selects a random action $a_t$ with probability $\epsilon$; otherwise, it selects the action $a_t$ with the maximum $Q$ value in line 5. In line 6, we set $r_t$ and $s_{t+1}$ and preprocess $\phi_{t+1}$, and we store the transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in the replay memory $D$. After that, we sample a random minibatch of transitions $(\phi_j, a_j, r_j, \phi_{j+1})$ from $D$ in lines 7 to 8. The objective of our problem is to minimize the total cost of the users, which is contrary to the cumulative reward received by the agent. In lines 12 to 14, the agent performs a gradient descent step on $(y_j - Q(\phi_j, a_j; \theta))^2$ with respect to the network parameters $\theta$ and resets $\hat{Q} = Q$ every $C$ steps. The offloading results are returned in line 15.
Algorithm 1 Application-driven Task Offloading Strategy based on DQN (ATOS).
Input: The applications generated by the users with sequences $Q_s$;
Output: Offloading strategy $\pi$;
1: Initialize $D$ to capacity $N$, $Q$ with random weights $\theta$, and $\hat{Q}$ with weights $\theta^- = \theta$;
2: for episode from 1 to $E$ do
3:   Initialize sequence $Q_s$ based on Algorithm 2;
4:   With probability $\epsilon$ select a random action $a_t$;
5:   Otherwise select $a_t = \arg\max_a Q(\phi_t, a; \theta)$;
6:   Set $r_t$, $s_{t+1}$, and preprocess $\phi_{t+1}$;
7:   Store transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in $D$;
8:   Sample a random minibatch of transitions $(\phi_j, a_j, r_j, \phi_{j+1})$ from $D$;
9:   if episode terminates at step $j+1$ then
10:    Set $y_j = r_j$;
11:  else
12:    Set $y_j = r_j + \gamma \max_{a'} \hat{Q}(\phi_{j+1}, a'; \theta^-)$;
13:  Perform a gradient descent step on $(y_j - Q(\phi_j, a_j; \theta))^2$ with respect to the network parameters $\theta$;
14:  Every $C$ steps reset $\hat{Q} = Q$;
15: return Offloading strategy $\pi$;
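For readers who prefer code, the sketch below compresses the DQN loop of Algorithm 1 into PyTorch. The network width, all hyperparameters, and the random stand-in environment are illustrative assumptions; the real agent would draw states, rewards, and terminal flags from the offloading environment defined above rather than from random tensors.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Illustrative dimensions and hyperparameters (assumed, not from this paper).
STATE_DIM, N_ACTIONS = 8, 2          # action 1: edge node, action 0: local device
GAMMA, EPSILON, C = 0.9, 0.1, 50     # discount, exploration rate, target-sync period
BATCH = 32

def build_net():
    # A small fully connected network, as ATOS trains a fully connected NN.
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

q_net, target_net = build_net(), build_net()
target_net.load_state_dict(q_net.state_dict())     # theta^- = theta (line 1)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                      # replay memory D of capacity N

def select_action(state):
    if random.random() < EPSILON:                  # line 4: explore
        return random.randrange(N_ACTIONS)
    with torch.no_grad():                          # line 5: argmax_a Q(s, a; theta)
        return q_net(state).argmax().item()

def train_step():
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)           # line 8: minibatch from D
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch])
    s2 = torch.stack([b[3] for b in batch])
    done = torch.tensor([b[4] for b in batch], dtype=torch.float32)
    with torch.no_grad():                          # lines 9-12: compute targets y_j
        y = r + GAMMA * target_net(s2).max(dim=1).values * (1.0 - done)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, y)            # line 13: (y_j - Q(s_j, a_j))^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

for step in range(1, 1001):
    s = torch.randn(STATE_DIM)                     # toy stand-in for the real state
    a = select_action(s)
    r, s2 = random.random(), torch.randn(STATE_DIM)
    done = step % 20 == 0                          # toy episode boundary
    replay.append((s, a, float(r), s2, done))      # line 7: store transition in D
    train_step()
    if step % C == 0:                              # line 14: reset Q_hat = Q
        target_net.load_state_dict(q_net.state_dict())
```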
Algorithm 2 Preliminary Sorting Mechanism (PSM).
Input: The application $A$ generated by the user;
Output: The sequence queue $Q_s$ of the sub-tasks in $A$;
1: Initialize the preliminary sorting set $S \leftarrow \emptyset$;
2: for each sub-task $v_i$ in $A$ do
3:   if $d^{in}(v_i) = 0$ then
4:     Add sub-task $v_i$ into queue $Q_s$;
5:   else if $d^{out}(v_i) = 0$ then
6:     Go to line 11;
7:   Add the subsequent and sibling sub-tasks of $v_i$ to the preliminary sorting set $S$;
8:   Calculate the priority factors $\rho$ for the subsequent sub-tasks in set $S$;
9:   Update $S$ in descending order of $\rho_i$, where $v_i \in S$;
10:  Update $Q_s$ by adding set $S$ into the queue;
11: return Sequence queue $Q_s$;