Cooperative Service Caching and Task Offloading in Mobile Edge Computing: A Novel Hierarchical Reinforcement Learning Approach
Abstract
1. Introduction
- We propose a novel HRL algorithm, hierarchical service caching and task offloading (HSCTO), to solve the cooperative service caching and task offloading problem in MEC. The lower layer is in charge of task offloading strategies, while the upper layer makes decisions on service caching. We design the algorithm by analyzing the intrinsic structure and a priori knowledge of the MEC scenario, focusing on specificity rather than generality. In contrast to current HRL algorithms, the action of the upper-layer agent in HSCTO not only represents the temporal and behavioral abstraction of the primitive actions of the lower-layer agent but also carries a concrete meaning, i.e., service caching decisions, which can be manually defined in advance. Within the hierarchical organization, the lower-layer module learns ordinary offloading policies by interacting with its environment, while the upper-layer module learns its policies directly by exploiting the Q value functions of the lower-layer agents. This tightly coupled training method guarantees the algorithm's performance.
- HSCTO arranges agents at two layers working at different time granularities. The lower-layer agent acts upon the MEC environment in each time slot, while the upper-layer agent makes one decision over multiple time slots. In addition, we adopt a fixed multiple-time-step approach in the upper layer, so that the framework does not rely on SMDP theory. We show theoretically that the upper-layer decision process is also an MDP and derive the corresponding Bellman optimality equations (a hedged sketch of their form is given after this list). As a result, current DRL algorithms can be employed in our hierarchical architecture directly and seamlessly. Furthermore, this two-time-scale framework ensures the stability of service deployment while reducing the energy cost of frequently switching services.
- As an additional constraint arising in practical scenarios, we consider the protection and licensing of intellectual property, i.e., the number of MEC servers that can legally host the same type of service at the same time is limited. We integrate this constraint into our MEC model and address it.
- We conduct extensive numerical experiments to verify the performance of HSCTO. The results show that our algorithm effectively reduces computational delay while optimizing energy consumption, achieving significant performance improvements over multiple baseline approaches.
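Because the upper layer uses a fixed macro step rather than option-dependent durations, it remains a standard MDP and the usual optimality equation applies with the discount compounded over the macro step. The block below is a minimal sketch of its assumed form; the symbols $K$ (slots per macro step), $S_T$, $A_T$, and $r_t$ are our notation, not necessarily the paper's:

```latex
% Assumed form of the upper-layer Bellman optimality equation with a fixed
% macro step of K slots; all symbols here are our notation.
\begin{equation}
  Q^{*}(S_T, A_T) \;=\;
  \mathbb{E}\!\left[\,\sum_{t=TK}^{(T+1)K-1} \gamma^{\,t-TK}\, r_t
  \;+\; \gamma^{K} \max_{A'} Q^{*}(S_{T+1}, A') \;\middle|\; S_T, A_T\right].
\end{equation}
```

Because $K$ is fixed, the compound discount $\gamma^{K}$ is a constant, which is what allows off-the-shelf DRL methods to be plugged into the upper layer without SMDP machinery.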
2. Mathematical Model and Problem Formulation
2.1. MEC Scenario
2.2. Decision Model
2.3. Computation Delay and Energy Consumption Model
2.3.1. Transmission Model
2.3.2. Computation Model
- Local computing: Based on the offloading decision, a computational task is partitioned into two parts, one processed locally and one offloaded. The local part is processed directly on the MT, and its computing delay is determined by the task's CPU-cycle demand and the MT's computational capacity (see the hedged reconstruction after this list).
- Offloading to the edge server: Given the transmission rate in Equation (4), the delay of offloading from MT d to ES e follows directly. Assuming the transmission power of MT d remains unchanged within one time slot, the energy consumed by transmitting the task from MT d to ES e can be estimated from that power and the transmission time.
- Computing on the edge server: Given the computational resource that ES e allocates to MEC computing, the time needed to compute the task from MT d follows from the offloaded workload and that allocation.
- Offloading to the cloud center: If the required service is not available, ES e transmits the task to the cloud computing center for computation. Since a wired backbone network connects the edge servers and the cloud center, the transmission rate is assumed to be constant. Therefore, in time slot t, the transmission delay can be estimated from the offloaded data size and this rate. Furthermore, given the transmission power of ES e, the energy consumption of this relay process follows.
- Service caching: For brevity, we denote separately the transmission delay and the energy consumption of downloading and deploying a service; the delay and energy required to update the caching state of ES e then follow. Consequently, the time and energy consumption of the offloading process are computed case by case. If the required service is cached on ES e in time slot t, the indicator function equals 1, and the edge transmission and computing costs are added to the total offloading delay and energy, respectively. Otherwise, the indicator function equals 0, and the cost of updating the cache or relaying the task to the cloud is considered instead. In summary, taking both local and offloaded computing into account, the total delay and energy consumption of the task generated by MT d can be estimated. Furthermore, considering the set of newly cached services on ES e that no terminal uses, the corresponding wasted energy consumption can be expressed accordingly.
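The displayed equations of this subsection did not survive extraction. The block below is a hedged reconstruction of the delay and energy model under assumed notation; every symbol ($\alpha_d(t)$ offloading ratio, $s_d(t)$ task size in bits, $c_d$ CPU cycles per bit, $f_d$ and $f_e$ computational capacities, $p_d$ and $p_e$ transmission powers, $r_{d,e}(t)$ the uplink rate of Equation (4), $r_c$ the constant ES-to-cloud rate) is our choice, not necessarily the paper's original notation:

```latex
% Hedged reconstruction of the Section 2.3 computation model (assumed symbols).
\begin{align}
  T^{\mathrm{loc}}_{d}(t)   &= \frac{\bigl(1-\alpha_d(t)\bigr)\, s_d(t)\, c_d}{f_d}
      && \text{local computing delay} \\
  T^{\mathrm{tx}}_{d,e}(t)  &= \frac{\alpha_d(t)\, s_d(t)}{r_{d,e}(t)}, \qquad
  E^{\mathrm{tx}}_{d,e}(t)  = p_d\, T^{\mathrm{tx}}_{d,e}(t)
      && \text{uplink delay and energy} \\
  T^{\mathrm{mec}}_{d,e}(t) &= \frac{\alpha_d(t)\, s_d(t)\, c_d}{f_e}
      && \text{edge computing delay} \\
  T^{\mathrm{cl}}_{e}(t)    &= \frac{\alpha_d(t)\, s_d(t)}{r_c}, \qquad
  E^{\mathrm{cl}}_{e}(t)    = p_e\, T^{\mathrm{cl}}_{e}(t)
      && \text{relay to the cloud} \\
  T_d(t) &= \max\bigl(T^{\mathrm{loc}}_{d}(t),\, T^{\mathrm{off}}_{d}(t)\bigr), \qquad
  E_d(t) = E^{\mathrm{loc}}_{d}(t) + E^{\mathrm{off}}_{d}(t)
      && \text{per-task totals}
\end{align}
```

Here $T^{\mathrm{off}}_{d}(t)$ and $E^{\mathrm{off}}_{d}(t)$ collect the edge or cloud branch selected by the caching indicator, plus any cache-update cost; the max in the total delay reflects the common assumption that the local and offloaded parts are processed in parallel.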
2.4. Problem Formulation
3. Hierarchical Reinforcement Learning Framework
3.1. MDP
3.1.1. The Lower Layer
3.1.2. The Upper Layer
3.2. Bellman Optimality Equation
4. Algorithm
4.1. Task Offloading Algorithm
- Observation: First, we discuss the observation of each agent in the multi-agent environment. Each ES e holds a partial observation, and the total state at each time step t is the collection of all such observations. The observation of ES e comprises the sets of distances, channel gains, and bandwidths between ES e and each MT d it serves; the computing capacity of each of those MTs; the set of all tasks observed by ES e, as defined in the previous section; and the service set currently cached on ES e. The cached service set is exactly the current super action of the upper-layer algorithm.
- Action: As mentioned before, we consider a partial offloading method, so the corresponding decision for MT d is an offloading ratio, and the action of ES e collects these ratios for all of its terminals. The joint action of all ESs constitutes the action of the lower-layer algorithm.
- Reward: Since the optimization objective of the task offloading decision is to minimize the average delay and energy consumption across all mobile terminal tasks, the reward function of each ES e is defined as the weighted sum of the overall energy consumption for computation and the latency penalty, taken negatively. The smaller the weighted sum of completion delay and energy consumption, the greater the reward of the task offloading decision.
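A minimal sketch of one assumed form of this reward, where $w_T$, $w_E$, and $\mathcal{D}_e$ (the set of terminals served by ES e) are our notation rather than the paper's:

```latex
% Assumed scalarized reward of ES e in slot t (our notation).
\begin{equation}
  r_e(t) \;=\; -\Bigl( w_T \sum_{d \in \mathcal{D}_e} T_d(t)
              \;+\; w_E \sum_{d \in \mathcal{D}_e} E_d(t) \Bigr).
\end{equation}
```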
Algorithm 1. Task offloading algorithm.
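The pseudocode box did not survive extraction. As a substitute illustration only, here is a minimal PyTorch sketch of a per-ES lower-layer actor consistent with the description above: a DDPG-style network mapping the local observation to partial-offloading ratios in [0, 1]. The class name, observation layout, and layer count are our assumptions; the hidden width of 512 and the noise scale of 0.1 follow the hyperparameter table below.

```python
import torch
import torch.nn as nn

class OffloadActor(nn.Module):
    """Hypothetical lower-layer actor for one ES: maps its observation to
    offloading ratios, one per served mobile terminal, each in [0, 1]
    (partial offloading)."""

    def __init__(self, obs_dim: int, n_terminals: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_terminals),
            nn.Sigmoid(),  # squash to [0, 1]: fraction of each task offloaded
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def act(actor: OffloadActor, obs: torch.Tensor, noise_std: float = 0.1) -> torch.Tensor:
    """DDPG-style exploration: clipped Gaussian noise on the deterministic action."""
    with torch.no_grad():
        a = actor(obs)
    return (a + noise_std * torch.randn_like(a)).clamp(0.0, 1.0)
```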
4.2. Service Caching Algorithm
- State: According to the design principle of FSHRL, the state observed by the service caching layer is assembled from the observations of the lower-layer agents at the boundary of each macro time step.
- Super action: The super action is the service caching operation applied in the next macro time step, i.e., the set of services each ES will cache.
- Reward: The reward function is the sum of the lower-layer agents' rewards over every time step t within the current macro time step, together with an extra cost term for executing the super action.
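In symbols (ours, not the paper's), and assuming the switching cost enters with a negative sign:

```latex
% Assumed macro-step reward of the upper layer: lower-layer rewards summed
% over the K slots of macro step T, minus the cost of the super action A_T.
\begin{equation}
  R_T \;=\; \sum_{t = TK}^{(T+1)K - 1} r(t) \;-\; C(A_T).
\end{equation}
```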
Algorithm 2. Service caching algorithm.
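Again as an illustration only (this pseudocode box is also missing), the self-contained loop below sketches how the two time scales interleave during training. All names (ToyEnv, ToyAgent, K) are hypothetical stand-ins, and the update calls stand in for whatever DRL methods are plugged into each layer:

```python
import random

class ToyEnv:
    """Hypothetical stand-in environment; replace with the real MEC simulator."""
    def macro_state(self): return 0
    def observe(self): return 0.0
    def apply_caching(self, a): pass
    def step(self, a): return -random.random(), 0.0   # (reward, next observation)
    def switch_cost(self, a): return 0.1

class ToyAgent:
    """Hypothetical stand-in for the DRL learner of either layer."""
    def select(self, s): return 0
    def act(self, obs): return 0.5
    def update(self, *transition): pass

env, upper_agent, lower_agent = ToyEnv(), ToyAgent(), ToyAgent()
K, num_macro_steps = 10, 100   # fixed slots per macro step (10 is our placeholder)

for T in range(num_macro_steps):
    S_T = env.macro_state()
    A_T = upper_agent.select(S_T)            # super action: caching decision
    env.apply_caching(A_T)                   # held fixed for the whole macro step
    lower_return = 0.0
    for _ in range(K):                       # lower layer acts every time slot
        obs = env.observe()
        a_t = lower_agent.act(obs)           # partial-offloading ratio(s)
        r_t, next_obs = env.step(a_t)
        lower_agent.update(obs, a_t, r_t, next_obs)
        lower_return += r_t
    R_T = lower_return - env.switch_cost(A_T)   # macro reward (assumed sign)
    upper_agent.update(S_T, A_T, R_T, env.macro_state())
```

Holding the caching decision fixed for K slots is what gives the service deployment its stability and lets the upper agent learn from the lower layer's accumulated returns.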
5. Evaluation
5.1. Evaluation Configuration
- DDPG. A typical single-agent reinforcement learning algorithm. In our experiment, DDPG makes action decisions for task offloading and service caching simultaneously.
- H-DQN [35]. A typical HRL algorithm. To cope with the continuous action space of task offloading, we first discretize the action space and then make offloading decisions.
- Random-DQN. An HRL algorithm, in which a random policy is adopted to handle service caching, while task offloading uses DQN for decision making.
- Random-DDPG. An HRL algorithm, similar to the previous one, but using the DDPG algorithm for decision making.
- Random. A hierarchical algorithm, which randomly makes both service caching and task offloading decisions.
5.2. Performance Analysis
6. Related Works
6.1. Service Caching and Task Offloading in MEC
6.2. Hierarchical Reinforcement Learning in MEC
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Qiu, T.; Chi, J.; Zhou, X.; Ning, Z.; Atiquzzaman, M.; Wu, D.O. Edge Computing in Industrial Internet of Things: Architecture, Advances and Challenges. IEEE Commun. Surv. Tutor. 2020, 22, 2462–2488.
2. Lin, H.; Zeadally, S.; Chen, Z.; Labiod, H.; Wang, L. A Survey on Computation Offloading Modeling for Edge Computing. J. Netw. Comput. Appl. 2020, 169, 102781.
3. Guo, F.; Zhang, H.; Ji, H.; Li, X.; Leung, V.C.M. An Efficient Computation Offloading Management Scheme in the Densely Deployed Small Cell Networks With Mobile Edge Computing. IEEE/ACM Trans. Netw. 2018, 26, 2651–2664.
4. Hortelano, D.; De Miguel, I.; Barroso, R.J.D.; Aguado, J.C.; Merayo, N.; Ruiz, L.; Asensio, A.; Masip-Bruin, X.; Fernández, P.; Lorenzo, R.M.; et al. A Comprehensive Survey on Reinforcement-Learning-Based Computation Offloading Techniques in Edge Computing Systems. J. Netw. Comput. Appl. 2023, 216, 103669.
5. Xu, X.; Liu, K.; Dai, P.; Jin, F.; Ren, H.; Zhan, C.; Guo, S. Joint Task Offloading and Resource Optimization in NOMA-Based Vehicular Edge Computing: A Game-Theoretic DRL Approach. J. Syst. Archit. 2023, 134, 102780.
6. Wang, L.; Jiao, L.; He, T.; Li, J.; Mühlhäuser, M. Service Entity Placement for Social Virtual Reality Applications in Edge Computing. In Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications, Honolulu, HI, USA, 16–19 April 2018; pp. 468–476.
7. Pasteris, S.; Wang, S.; Herbster, M.; He, T. Service Placement with Provable Guarantees in Heterogeneous Edge Computing Systems. In Proceedings of the IEEE INFOCOM 2019-IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019; pp. 514–522.
8. Liu, S.; Yu, Y.; Lian, X.; Feng, Y.; She, C.; Yeoh, P.L.; Guo, L.; Vucetic, B.; Li, Y. Dependent Task Scheduling and Offloading for Minimizing Deadline Violation Ratio in Mobile Edge Computing Networks. IEEE J. Sel. Areas Commun. 2023, 41, 538–554.
9. Plageras, A.P.; Psannis, K.E. IoT-Based Health and Emotion Care System. ICT Express 2023, 9, 112–115.
10. Xu, J.; Chen, L.; Zhou, P. Joint Service Caching and Task Offloading for Mobile Edge Computing in Dense Networks. In Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications, Honolulu, HI, USA, 16–19 April 2018; pp. 207–215.
11. Zhou, R.; Wu, X.; Tan, H.; Zhang, R. Two Time-Scale Joint Service Caching and Task Offloading for UAV-Assisted Mobile Edge Computing. In Proceedings of the IEEE INFOCOM 2022-IEEE Conference on Computer Communications, London, UK, 2–5 May 2022; pp. 1189–1198.
12. Ko, S.W.; Kim, S.J.; Jung, H.; Choi, S.W. Computation Offloading and Service Caching for Mobile Edge Computing Under Personalized Service Preference. IEEE Trans. Wirel. Commun. 2022, 21, 6568–6583.
13. Premsankar, G.; Ghaddar, B. Energy-Efficient Service Placement for Latency-Sensitive Applications in Edge Computing. IEEE Internet Things J. 2022, 9, 17926–17937.
14. Geng, Y.; Liu, E.; Wang, R.; Liu, Y. Hierarchical Reinforcement Learning for Relay Selection and Power Optimization in Two-Hop Cooperative Relay Network. IEEE Trans. Commun. 2022, 70, 171–184.
15. Zhou, H.; Long, Y.; Zhang, W.; Xu, J.; Gong, S. Hierarchical Multi-Agent Deep Reinforcement Learning for Backscatter-Aided Data Offloading. In Proceedings of the 2022 IEEE Wireless Communications and Networking Conference (WCNC), Austin, TX, USA, 10–13 April 2022; pp. 542–547.
16. Yao, Z.; Xia, S.; Li, Y.; Wu, G. Cooperative Task Offloading and Service Caching for Digital Twin Edge Networks: A Graph Attention Multi-Agent Reinforcement Learning Approach. IEEE J. Sel. Areas Commun. 2023, 41, 3401–3413.
17. Zhang, Y.; Mou, Z.; Gao, F.; Xing, L.; Jiang, J.; Han, Z. Hierarchical Deep Reinforcement Learning for Backscattering Data Collection With Multiple UAVs. IEEE Internet Things J. 2021, 8, 3786–3800.
18. Shi, W.; Li, J.; Wu, H.; Zhou, C.; Cheng, N.; Shen, X. Drone-Cell Trajectory Planning and Resource Allocation for Highly Mobile Networks: A Hierarchical DRL Approach. IEEE Internet Things J. 2021, 8, 9800–9813.
19. Ren, T.; Niu, J.; Dai, B.; Liu, X.; Hu, Z.; Xu, M.; Guizani, M. Enabling Efficient Scheduling in Large-Scale UAV-Assisted Mobile-Edge Computing via Hierarchical Reinforcement Learning. IEEE Internet Things J. 2022, 9, 7095–7109.
20. Birman, Y.; Ido, Z.; Katz, G.; Shabtai, A. Hierarchical Deep Reinforcement Learning Approach for Multi-Objective Scheduling With Varying Queue Sizes. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–10.
21. Fang, C.; Xu, H.; Zhang, T.; Li, Y.; Ni, W.; Han, Z.; Guo, S. Joint Task Offloading and Content Caching for NOMA-Aided Cloud-Edge-Terminal Cooperation Networks. IEEE Trans. Wirel. Commun. 2024, 23, 15586–15600.
22. Lin, N.; Han, X.; Hawbani, A.; Sun, Y.; Guan, Y.; Zhao, L. Deep Reinforcement Learning Based Dual-Timescale Service Caching and Computation Offloading for Multi-UAV Assisted MEC Systems. IEEE Trans. Netw. Serv. Manag. 2024.
23. Xu, Y.; Peng, Z.; Song, N.; Qiu, Y.; Zhang, C.; Zhang, Y. Joint Optimization of Service Caching and Task Offloading for Customer Application in MEC: A Hybrid SAC Scheme. IEEE Trans. Consum. Electron. 2024.
24. Sutton, R.S.; Precup, D.; Singh, S. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artif. Intell. 1999, 112, 181–211.
25. Parr, R.; Russell, S. Reinforcement Learning with Hierarchies of Machines. Adv. Neural Inf. Process. Syst. 1997, 10, 1043–1049.
26. Dietterich, T.G. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. J. Artif. Intell. Res. 2000, 13, 227–303.
27. Al-Eryani, Y.; Hossain, E. Self-Organizing mmWave MIMO Cell-Free Networks With Hybrid Beamforming: A Hierarchical DRL-Based Design. IEEE Trans. Commun. 2022, 70, 3169–3185.
28. Jahanshahi, A.; Sabzi, H.Z.; Lau, C.; Wong, D. GPU-NEST: Characterizing Energy Efficiency of Multi-GPU Inference Servers. IEEE Comput. Archit. Lett. 2020, 19, 139–142.
29. Caron, E.; Chevalier, A.; Baillon-Bachoc, N.; Vion, A.L. Heuristic for License-Aware, Performant and Energy Efficient Deployment of Multiple Software in Cloud Architecture. In Proceedings of the 2021 12th International Conference on Information and Communication Systems (ICICS), Valencia, Spain, 24–26 May 2021; pp. 297–304.
30. Chevalier, A.; Caron, E.; Baillon-Bachoc, N.; Vion, A.L. Towards Economic and Compliant Deployment of Licenses in a Cloud Architecture. In Proceedings of the 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), San Francisco, CA, USA, 2–7 July 2018; pp. 718–724.
31. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
32. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.M.O.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. arXiv 2015, arXiv:1509.02971.
33. van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2094–2100.
34. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 8026–8037.
35. Kulkarni, T.D.; Narasimhan, K.R.; Saeedi, A.; Tenenbaum, J.B. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3682–3690.
Parameter | Numerical Values |
---|---|
Number of mobile terminals | 27 |
Number of MEC servers | 3 |
Number of service types | 5 |
Transmission power of MT | [1,2] W |
Transmission power of MEC server | 10 W |
Transmission rate between MEC server and cloud center | 20 Mbps |
Bandwidth of MEC server | 100 MHz |
Computational capacity of MT | 2 GHz |
Computational capacity of MEC server | [15,20] GHz |
Number of service authorizations | 3 |
Number of available resources in server | 3 |
Energy consumption of downloading and deploying a service | 5 |
Size of one task | [1,2] MB |
CPU cycles for one bit task | [100,200] cycle/bit |
Energy consumption coefficient of MT | J/cycle |
Energy consumption coefficient of MEC server | J/cycle |
Large-scale path loss parameter | 3 |
Parameter | Numerical Values |
---|---|
Replay buffer size | 20,000 |
Batch size | 256 |
Episode | 10,000 |
Dimension of hidden layer | 512 |
Learning rate | 0.0001 |
Discount factor | 0.95 |
 | 0.9 |
Soft update rate | 0.1 |
Explore noise | 0.1 |
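For readers reproducing the setup, the hyperparameters above map directly onto a training configuration. The dictionary below restates the table values; the key names are ours, and the unnamed 0.9 entry from the table is omitted because its meaning cannot be recovered:

```python
# Table hyperparameters restated as a config dict (key names are ours).
train_config = {
    "replay_buffer_size": 20_000,
    "batch_size": 256,
    "episodes": 10_000,
    "hidden_dim": 512,
    "learning_rate": 1e-4,
    "discount_factor": 0.95,
    "soft_update_rate": 0.1,   # tau for target-network Polyak averaging
    "explore_noise": 0.1,      # std of Gaussian action noise
}
```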