4.2. Experimental Results and Analysis
In the experiment, four algorithm models were set up and two environments were designed to demonstrate the good performance of the proposed algorithm in terms of latency, energy consumption and reward. Maximizing system revenue is the objective of this article; the higher the revenue, the better the performance. The latency and energy consumption of the system, as well as the unfinished rate of tasks and the utilization of cache resources, are taken as indicators to evaluate algorithm performance.
In Figure 3, we plot the average system revenue for the eight scenarios in each episode. The icon “D3QN−cache” refers to the proposed joint edge computing and caching method based on the D3QN; “D3QN” refers to the D3QN algorithm; “D2QN” refers to the D2QN algorithm; “D2QN−cache” refers to the D2QN algorithm combined with the caching models based on popularity and on the D2QN; and so on for the other icons.
It can be seen from Figure 3 that the average revenue of D3QN−cache in each episode is much higher than that of the other DRL algorithms, which means that D3QN−cache can better achieve the goal of maximizing system revenue. This is because, in the D3QN, the RL agent uses a dueling, double-network framework to observe the changed state of the environment and the reward obtained from executing the action, which makes its Q-value estimates more accurate than those of the D2QN, DDQN and DQN. In the D2QN, the RL agent uses the target network to obtain the action values of all actions in the next state and then calculates the target value based on the optimal action value; due to this maximization operation, the algorithm suffers from an overestimation problem, which affects the accuracy of decision making. In the DDQN, the agent only considers the action used to update the Q-value function. The DQN, in turn, has limited capability within the high-dimensional state and action space, making it difficult to explore and find the optimal joint offloading and caching strategy. Compared with the other DRL algorithms, the agent of the D3QN can therefore find a better joint offloading and caching policy for accomplishing tasks. From episode 0 to about episode 6000, the average system revenue of the proposed algorithm tends to decline because the agent is constantly learning and exploring a good strategy, reaching a stable state at around episode 6000. It can be seen that, compared with the other DRL algorithms, the D3QN converges quickly and reaches a steady state more easily. This is because the passive cache model based on the D3QN is used to cache tasks; it uses the advantage function and the target Q value to choose a suitable strategy for executing the caching action, prompting the agent to explore better strategies and actions in order to obtain higher rewards. To reduce system latency and energy consumption, appropriate tasks can be cached in RSUs, which reduces the number of computation and transmission tasks, thus reducing system costs and improving system revenue. For this reason, the average revenue of a system with a DRL-based caching model is much higher than that of a system without a caching model.
As can be seen from Figure 4, compared with DQN−cache, DDQN−cache, D2QN−cache, DQN, DDQN, D2QN and D3QN, the total average system revenue of D3QN−cache is increased by 65%, 35%, 66%, 257%, 142%, 312% and 57%, respectively. This shows that the proposed algorithm is more effective than the other DRL algorithms.
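To make the difference between these value estimates concrete, the sketch below shows, in PyTorch, a dueling Q-network and the double-Q target that the D3QN family is commonly built on; the layer sizes, hidden width and function names are illustrative assumptions rather than the exact networks used in our implementation.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling Q-network: a shared trunk feeds separate value and advantage streams."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)

def double_q_target(online: nn.Module, target: nn.Module,
                    reward: torch.Tensor, next_state: torch.Tensor,
                    done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double-Q target: the online network selects the next action and the
    target network evaluates it, which curbs the overestimation problem."""
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        q_next = target(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * q_next
```

A plain DQN or D2QN-style target would instead take the maximum of the target network's own outputs, which is the source of the overestimation discussed above.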
An active cache based on popularity and a passive cache based on the D3QN are used together; the parallel use of the two caching models enables more tasks to be cached in the RSU, with their calculation results also retained. If a requested task matches a task in the cache area, the task incurs no computation or transmission latency and energy consumption; it only generates the transmission latency and energy consumption of returning the result. With the active cache, the most popular tasks are cached in the cache area, which reduces the computation of part of the tasks; however, when the cache area is large enough, caching only the most popular tasks is not enough to maximize the use of the cache area. A passive cache is therefore proposed, which uses the caching policy of the D3QN to cache tasks that are not the most popular in order to maximize the use of the cache space.
The latency and energy consumption of transmitting results are very small and negligible compared to those of computing and transmitting tasks.
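The combination of the two caching models can be pictured with the following sketch; the task fields (`size`, `popularity`), the `top_k` cutoff and the `passive_agent.decide()` interface are hypothetical placeholders for the actual popularity ranking and the D3QN caching action.

```python
def fill_cache(tasks, cache_capacity, passive_agent, top_k=5):
    """Active cache: keep the most popular tasks; passive cache: let the DRL
    agent decide on the remaining tasks while cache space is still available."""
    cache, used = [], 0
    ranked = sorted(tasks, key=lambda t: t["popularity"], reverse=True)

    # Active cache based on popularity: the top_k most popular tasks.
    for task in ranked[:top_k]:
        if used + task["size"] <= cache_capacity:
            cache.append(task)
            used += task["size"]

    # Passive cache: the D3QN-style agent decides whether to cache the rest.
    for task in ranked[top_k:]:
        if used + task["size"] > cache_capacity:
            continue  # never exceed the cache capacity
        if passive_agent.decide(task) == 1:  # hypothetical cache/no-cache action
            cache.append(task)
            used += task["size"]
    return cache, used
```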
Figure 5 and
Figure 6 illustrate that the average latency and energy consumption of the system with caching models are much lower than those without the caching policy. This is because we use two caching models together to cache tasks, which increases the probability that tasks will be cached. Caching as many tasks as possible without exceeding cache resources improves the utilization of cache resources and reduces system latency and energy consumption, since the numbers of computation and offloading tasks are reduced; this proves the necessity and importance of the caching model for computing tasks. In addition, the proposed algorithm always has much lower average latency and energy consumption than the other DRL algorithms in each episode, and it reaches a steady state more easily. This is because the agent of the D3QN executes the cache action based on the greedy strategy, selecting an action that maximizes the reward, and offloads tasks to an appropriate location for calculation based on the characteristics of the tasks. At the same time, distributed MEC enables nearby MEC servers to help with computing tasks, which relieves computing pressure. After calculation, compared with DQN−cache, DDQN−cache, D2QN−cache, DQN, DDQN, D2QN and D3QN, the total average system latency of D3QN−cache is reduced by 53%, 42%, 53%, 66%, 59%, 66% and 49%, respectively, and the total average energy consumption of the system is reduced by 15%, 28%, 22%, 44%, 38%, 37% and 57%, respectively. Therefore, from what has been discussed above, the proposed algorithm can greatly reduce system latency and energy consumption.
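For illustration, a minimal ε-greedy selection over the joint offload/cache action space could look as follows; the ε value and the flat action encoding are assumptions for the sketch rather than the settings used in our experiments.

```python
import random
import torch

def select_action(q_net, state, n_actions, epsilon=0.1):
    """Epsilon-greedy: explore a random joint offload/cache action with
    probability epsilon, otherwise pick the action with the highest Q value."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0))  # batch of one state
        return int(q_values.argmax(dim=1).item())
```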
In the whole system, we chose to discard unfinished tasks, with the agent then receiving a corresponding negative reward because the task was not completed. In order to receive a more positive reward, the agent must choose appropriate actions to complete tasks.
The probability of unfinished tasks can be calculated as
$$P_{\mathrm{un}}(t)=\frac{1}{W}\sum_{k=1}^{W}\varphi_k(t),$$
where $P_{\mathrm{un}}(t)$ is the probability of unfinished tasks, $\varphi_k(t)=1$ indicates that task $k$ is not completed and $\varphi_k(t)=0$ otherwise, and $W$ is the number of requested tasks in slot $t$.
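As a simple illustration of this metric, it can be computed from per-slot completion flags; the variable names below are placeholders.

```python
def unfinished_rate(completed):
    """completed[k] is True if requested task k finished within its deadline.
    Returns the fraction of the W requested tasks that were not completed."""
    W = len(completed)
    return sum(1 for done in completed if not done) / W
```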
Figure 7 shows the probability of unfinished tasks under the eight algorithms in each slot. As can be seen from
Figure 7, the probability of unfinished tasks with the D3QN algorithm is much lower than that with the other algorithms. This is because the D3QN considers not only the effect of the state on the Q value, but also the effect of the action on the Q value. When a task needs to be cached, the state has a great impact on the Q value: the popularity of the task, the size of the task and other state components strongly affect the Q value. When a task needs to be offloaded, the action has a great impact on the Q value: the choice of which action produces greater revenue also greatly affects the Q value. Combining the advantages of both, the performance of the D3QN is improved. In addition, it changes the way the target Q value is chosen by decoupling action selection from action evaluation when computing the target Q value, which eliminates overestimation. Compared with the D3QN, although the probability of unfinished tasks with D3QN−cache is higher at the beginning, as the number of iterations increases, the probability of unfinished tasks with D3QN−cache keeps decreasing, because the agent is constantly exploring better actions to complete tasks. However, the lower probability of unfinished tasks with the D3QN comes at the cost of high latency, high energy consumption and low profit; in our study, the agent of the D3QN chose high-cost, low-return actions to complete tasks. Additionally, as the number of iterations increases, the agent, in pursuit of high profit, also begins to explore actions that provide low latency and low energy consumption, so the probability of unfinished tasks begins to rise. Therefore, considering the latency, energy consumption, revenue and probability of unfinished tasks together, the proposed algorithm is better.
In order to complete as many tasks as possible and obtain maximum system revenue, it is necessary to make full use of the cache space without exceeding cache capacity, caching as many tasks as possible in the process. The utilization of the cache area is the ratio of the size of all tasks in the cache area to the size of the cache space. The utilization of the cache area can be calculated as
$$U_{\mathrm{cache}}(t)=\frac{\sum_{k=1}^{W} c_k(t)\, s_k(t)}{C},$$
where $c_k(t)$ represents the caching decision of task $k$ in slot $t$, $s_k(t)$ represents the size of task $k$ in slot $t$ and $C$ represents the size of the cache space. As long as the size of the cache space is not exceeded, this ratio should be as large as possible.
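A corresponding sketch of the utilization computation, using placeholder names for the caching decisions and task sizes:

```python
def cache_utilization(cache_decisions, task_sizes, capacity):
    """Utilization = (total size of cached tasks) / (cache capacity).
    cache_decisions[k] is 1 if task k is cached in the current slot, else 0."""
    used = sum(c * s for c, s in zip(cache_decisions, task_sizes))
    assert used <= capacity, "cached tasks must not exceed the cache capacity"
    return used / capacity
```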
Figure 8 shows the average utilization of the cache area under the four algorithms. As can be seen from
Figure 8, compared with DQN−cache, D2QN−cache and DDQN−cache, D3QN−cache (the proposed algorithm) has the highest average utilization of cache space, with its average utilization improved by 28%, 42% and 36%, respectively. It makes use of two caching models, the active cache and the passive cache, and not only caches the most popular tasks but also uses the D3QN to cache other tasks. The difference in cache utilization between these four algorithms results from the different DRL algorithms used in the passive cache. In the proposed algorithm, the best action is selected to cache more tasks according to the advantages of the D3QN mentioned above; by doing so, the utilization of the cache space is improved.
The average utilization of the cache space of D3QN−cache is only 63% because, when cache resources are exhausted, all tasks of a certain type in the cache space are deleted in order to release a large amount of idle cache resources. This means a large number of tasks are removed at once, leading to a sharp decrease in the number of tasks in the cache space. Since cache resources are released every episode or every few episodes, the average utilization of cache resources is low. However, the advantage of D3QN−cache is that the number of deletion operations and the operational burden on the server are reduced.
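The type-based deletion described above could be sketched as follows; which type is evicted is an assumption made for illustration (the type occupying the most space), since the text only states that all tasks of one type are removed at once.

```python
from collections import defaultdict

def evict_by_type(cache):
    """When cache resources are exhausted, delete every cached task of one
    type at once, freeing a large block of space in a single operation."""
    size_per_type = defaultdict(int)
    for task in cache:
        size_per_type[task["type"]] += task["size"]
    # Assumption: evict the type currently occupying the most cache space.
    victim = max(size_per_type, key=size_per_type.get)
    remaining = [t for t in cache if t["type"] != victim]
    freed = size_per_type[victim]
    return remaining, freed
```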
4.2.1. The Influence of Different Training Parameters in D3QN−Cache
We compared the effects of batch size, memory size, gamma and learning rate (lr) on the simulation results. The batch size, memory size, gamma and learning rate of D3QN−cache are 512, 3000, 0.99 and 0.0001, respectively. A larger batch size allows the agent to perform calculations in parallel to a greater extent, and the reduced training time per period can significantly accelerate model training. However, the larger the batch size, the more slowly the training loss decreases, and the overall training time becomes longer. The memory records previous training experience; when similar states occur, the value function network can be updated based on previous successful experience and the model learns faster. If too little experience is stored, some good experiences are discarded, whereas if too much experience is stored, some bad experiences are learned. Therefore, to improve profit, a suitable memory size must be chosen. If the gamma value is small, only the current reward is considered, which amounts to a short-sighted strategy. To balance current and future rewards, a larger gamma value is used, which means that long-term future rewards are taken into account in the value generated by the current behavior. However, if the gamma value is too large, the horizon considered is too long, even beyond the range that the current behavior can affect, which is obviously unreasonable. The learning rate controls the learning progress of the model and determines how long the network takes to reach the global optimal solution. A learning rate set too high can prevent the network from converging, whereas a learning rate set too low reduces the speed of network convergence and increases the time taken to find the optimal value. Therefore, in summary, from Figure 9 and Figure 10, it can be seen that the selected training parameters of D3QN−cache are optimal for the average system revenue.
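For reference, the selected training parameters can be collected in a single configuration dictionary; the field names are illustrative.

```python
# Training parameters used for D3QN-cache in this section.
d3qn_cache_training = {
    "batch_size": 512,      # larger batches exploit parallel computation
    "memory_size": 3000,    # replay memory capacity (stored transitions)
    "gamma": 0.99,          # balances current and long-term future rewards
    "learning_rate": 1e-4,  # small enough to converge, large enough to be fast
}
```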
From
Figure 11, it can be seen that the average system latency of D3QN−cache is much lower than that under the other parameter settings, because a larger batch size can accelerate the convergence of the model through a large number of parallel calculations, and a memory size of 3000 means that the experience data are neither too plentiful nor too scarce, so the agent learns good behaviors and ignores bad ones. Additionally, gamma is set to 0.99, which effectively balances current rewards and future rewards, and a learning rate of 0.0001 allows the optimal model parameters to be found quickly. Although the average latency of D3QN−cache is the lowest, this does not mean that its energy consumption is the lowest. From
Figure 12, it can be seen that the average energy consumption of D3QN−cache does not reach the minimum. This is because D3QN−cache aims to maximize the optimization objective, which involves cache costs; to reduce cache costs, some energy consumption must be sacrificed.
Figure 13 and
Figure 14 compare the probability of unfinished tasks and the utilization of cache space for different training parameters in D3QN−cache, respectively. As the batch size increases, the speed of processing the same amount of data increases, but the number of epochs required to achieve the same accuracy also increases. Because of the contradiction between these two factors, the batch size should be increased only up to a certain value in order to achieve the optimal training time. From
Figure 13 and
Figure 14, it can be seen that when the batch size is 512, the agent finds the optimal joint offloading and caching decision at the fastest speed, resulting in the lowest probability of unfinished tasks and the highest utilization of cache space. If the memory size is very small, there is very little experience, which is not conducive to training. However, if too much experience is stored, useless experiences are added to the training, which is also not conducive to training. From
Figure 13, it can be seen that a memory size of 3000 has the best effect, but from
Figure 14, it can be seen that the utilization of cache space with a memory size of 2000 exceeds that of D3QN−cache, which indicates that a memory size of 2000 is the most advantageous for cache decision making. However, we are considering the overall system revenue, so in order to achieve this goal, some cache cost must be sacrificed. A gamma equal to 1 means that the algorithm lacks convergence ability and the agent considers all future possibilities, which increases complexity. However, when the gamma value is too small, the agent only focuses on current information and estimates actions poorly. Therefore, it can be seen from
Figure 13 that the probability of unfinished tasks with gamma values of 1 and 0.9 is lower, and the utilization of cache space as shown in
Figure 14 is also lower. If the learning rate (lr) is set too high, it will lead to overfitting and a poor convergence effect, but if it is set too small, the convergence speed will be slow. From
Figure 13 and
Figure 14, it can be seen that the probability of unfinished tasks and the utilization of cache space with the learning rate set as 0.00001 or 0.001 are lower.
4.2.2. The Influence of Different Simulation Parameters in D3QN−Cache
We compared the impact of the number W, size I and type L of tasks on the simulation results. The number W, size I and type L of tasks for D3QN−cache are 30, 8–10 KB and 4, respectively. From
Figure 15 and
Figure 16, it can be seen that the average revenue of D3QN−cache is the highest for the following reasons. When the number of tasks increases from 10 to 30, the average revenue greatly increases, because there are more tasks to process, which leads to more revenue. However, when the number of tasks increases from 30 to 50, the revenue decreases, because more tasks place greater pressure on the backhaul link and require more computing resources; the computing resources of SDVs and RSUs are limited and cannot handle a large number of tasks, which results in a decrease in revenue. When the size of the task decreases from 10–20 KB to 8–10 KB, the average revenue greatly increases, because the larger the task, the more resources it requires, making it impossible to calculate and store tasks that are too large. When the size of the task decreases from 8–10 KB to 1–8 KB, the average revenue decreases, but only slightly, because the currently designed resources of the SDVs and RSUs are suitable for sizes between 1 and 10 KB. Since the deletion rule of tasks is type-based deletion, the type of task affects the cache hit rate of each task. When there are very few types of tasks, each type contains many tasks, so performing a type deletion removes a large number of tasks, resulting in a decrease in the cache hit rate and average revenue. When there are many types of tasks, the number of identical tasks is very small, and the experience data that the agent can obtain are very limited, which can lead to a lower average revenue. Therefore, the average revenue when L is 4 is higher than when L is 2 or 6.
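The baseline simulation parameters compared in this subsection can likewise be summarized in an illustrative configuration dictionary; the field names are placeholders.

```python
# Baseline task parameters for D3QN-cache; Figures 15-20 vary one at a time.
d3qn_cache_simulation = {
    "num_tasks_W": 30,        # compared against W = 10 and W = 50
    "task_size_KB": (8, 10),  # compared against (1, 8) and (10, 20)
    "task_types_L": 4,        # compared against L = 2 and L = 6
}
```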
Figure 17 and
Figure 18 compare the average latency and energy consumption of D3QN−cache under different simulation parameters. Too many tasks lead to an increase in the action space and state space, making it more difficult to find the optimal solution and thus resulting in higher latency and energy consumption. However, if the number of tasks is too small, the agent cannot obtain a large amount of experience data and cannot formulate optimal strategies, also resulting in high latency and energy consumption. So, from
Figure 17 and
Figure 18, it can be seen that the latency and energy consumption of D3QN−cache are lower than those when W is 10 or 50. The size of the task affects latency and energy consumption: the larger the task, the more offloading latency and computing resources it occupies, leading to an increase in system latency and energy consumption. So, the latency and energy consumption of tasks with a size of 1–8 KB are lower than those of D3QN−cache. The type of task can affect the cache hit rate, leading to changes in latency and energy consumption. Although the energy consumption and latency of D3QN−cache are not the lowest, the proposed algorithm focuses on long-term system revenue rather than the energy consumption and latency per slot, demonstrating the importance of considering long-term optimization in dynamic networks.
Figure 19 and
Figure 20 compare the probability of unfinished tasks and utilization of cache space of D3QN−cache under different simulation parameters, respectively. If the number of tasks is too large, it will result in a high demand for resources. However, with limited computing and storage resources, the probability of not being able to complete tasks will increase, and the number of tasks cached in the RSUs will also decrease. If the number of tasks is small, the experience data obtained by the agent will be small, making it difficult to find the optimal decisions, so the number of cached tasks will also decrease. Therefore, from
Figure 19 and
Figure 20, it can be seen that the probability of unfinished tasks with a task number of 30 is lower than that when W is 10 or 50, and the utilization of the cache space of D3QN−cache is higher than that when W is 10 or 50. The fewer the types of tasks, the larger the number of tasks of a certain type; due to the type-based deletion rule, fewer tasks remain in the cache space, resulting in a lower utilization of cache space and a lower cache hit rate, and therefore a high probability of unfinished tasks. Conversely, the more task types there are, the fewer tasks are deleted by the type-based deletion rule, so more tasks remain in the cache space and fewer cache resources remain, resulting in the delayed caching of subsequent tasks. So, from
Figure 19 and
Figure 20, it can be seen that the probability of unfinished tasks when L is 4 is lower than when L is 2 or 6. The larger the size of the task, the greater the likelihood of content delivery failure within a fixed deadline, which can result in penalty costs. So, the probability of unfinished tasks with a task size of 1–8 KB is the lowest. However, the larger the size of the task, the more cache resources it occupies, resulting in a higher utilization of cache space.
As can be seen from the simulation results in
Section 4.2.1, the training parameters we selected perform well on several evaluation indexes, such as average reward. Although the results for some evaluation indexes are not as good, they do not affect the overall performance of the system. It can be seen from
Section 4.2.2 that the simulation parameters we designed are appropriate. If the scalability of the proposed algorithm is to be expanded, the number of RSUs can be increased to enhance the auxiliary computing capacity between them.