Distributed Inference Models and Algorithms for Heterogeneous Edge Systems Using Deep Learning
Abstract
1. Introduction
2. Related Work
2.1. DNN Model Modification
2.2. Cloud–Edge Collaboration
2.3. Distributed Inference
3. System Model and Problem Description
3.1. System Overview
3.2. Estimation of Inference Latency and Energy Consumption of Edge Devices
3.2.1. Latency
3.2.2. Energy Consumption
3.3. Problem Description
3.4. Linear Programming Relaxation Algorithm
3.5. Workload-Partitioning Method Based on Linear Programming Relaxation
Algorithm 1 Workload Partition Algorithm (WPA)
4. Convolution-Layer Partitioning Algorithm
4.1. OD-FTP
Algorithm 2 OD-FTP
4.2. Convolutional-Layer Partitioning Method Based on the OD-FTP Algorithm
Algorithm 3 Layer Fused Partitioning Algorithm (LFP)
5. Experimental Results and Analysis
5.1. Evaluation Metrics
5.2. Experimental Setup
5.3. Effectiveness of the LFP Method
5.4. Efficiency of the LFP Method
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Variable Symbol | Meaning | Variable Symbol | Meaning |
---|---|---|---|
– | Edge device number | – | The number of input channels of a layer's input feature map |
– | Workload-partitioning decision | – | The number of output channels of a layer's output feature map |
– | The j-th layer of the network | – | DNN model |
H | The height of the output feature map of a layer | – | The computational power of an edge device |
W | The width of the output feature map of a layer | – | The transmission power of an edge device |
K | The kernel size of the convolutional layer | B | The bandwidth between edge devices |
– | The coefficients of the latency regression model | – | The coefficients of the energy regression model |
p | The partition points of the network layer | – | The proportion of the workload covered by an edge device's partitioning decision |
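The symbols above drive the latency and energy estimates of Section 3.2. Below is a minimal sketch of that style of estimate, assuming a per-device linear-regression latency model over a layer's FLOPs and a power-times-time energy model; the function names and coefficient values are illustrative stand-ins, not the authors' implementation.

```python
# Hedged sketch of per-device latency/energy estimation in the style of
# Section 3.2. The linear form t = a * FLOPs + b and the power-times-time
# energy model are assumptions; all coefficients below are illustrative.

def conv_flops(H: int, W: int, K: int, c_in: int, c_out: int) -> float:
    """FLOPs of one convolutional layer with an H x W output feature map,
    K x K kernels, c_in input channels, and c_out output channels."""
    return 2.0 * H * W * K * K * c_in * c_out

def compute_latency(flops: float, a: float, b: float) -> float:
    """Computation latency from a device's fitted regression coefficients (a, b)."""
    return a * flops + b

def transfer_latency(data_mb: float, bandwidth_mb_s: float) -> float:
    """Time to ship intermediate feature maps over a link of bandwidth B (MB/s)."""
    return data_mb / bandwidth_mb_s

def energy(t_comp: float, t_trans: float, p_comp: float, p_trans: float) -> float:
    """Energy = computing power * compute time + transmission power * transfer time."""
    return p_comp * t_comp + p_trans * t_trans

# Example: one 3x3 conv layer (56x56 output, 64 -> 128 channels) on a 10 W device.
t_c = compute_latency(conv_flops(56, 56, 3, 64, 128), a=2e-10, b=1e-3)
t_x = transfer_latency(data_mb=1.5, bandwidth_mb_s=125.0)
print(t_c, t_x, energy(t_c, t_x, p_comp=10.0, p_trans=2.0))
```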
Type | CPU | Memory | Number of Instances |
---|---|---|---|
V1 | 4 Core Intel Xeon Gold 6149 2.5 GHz (Intel, Santa Clara, CA, USA) | 4 GB | 2 |
V2 | 8 Core Intel Xeon (Skylake) Platinum 8163 2.5 GHz | 8 GB | 2 |
V3 | 8 Core Intel Xeon Platinum 8369 3.3 GHz | 16 GB | 1 |
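For concreteness, the testbed above can be encoded as plain data that the partitioning experiments consume; a minimal sketch, where the `DeviceType` structure and its field names are illustrative assumptions rather than the paper's code:

```python
# The heterogeneous testbed encoded as data (field names are illustrative).
from dataclasses import dataclass

@dataclass
class DeviceType:
    name: str
    cores: int
    cpu: str
    memory_gb: int
    instances: int

TESTBED = [
    DeviceType("V1", 4, "Intel Xeon Gold 6149 @ 2.5 GHz", 4, 2),
    DeviceType("V2", 8, "Intel Xeon (Skylake) Platinum 8163 @ 2.5 GHz", 8, 2),
    DeviceType("V3", 8, "Intel Xeon Platinum 8369 @ 3.3 GHz", 16, 1),
]

# Five edge devices in total, matching N = 5 in the parameter table below.
assert sum(d.instances for d in TESTBED) == 5
```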
Parameter | Value | Definition |
---|---|---|
B | 125 | Total channel bandwidth (MB/s) |
N | 5 | Number of edge devices |
– | {10, 10, 12} | Computing power (W) of each device type |
– | {2, 2, 3} | Transmission power (W) of each device type |
p | {17, 19, 6, 7} | Partition points for VGG16, VGG19, ResNet18, and ResNet34 |
– | (0.5, 0.5) | Weights of latency and energy consumption |
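Sections 3.4 and 3.5 obtain the workload split by linear programming relaxation under the latency/energy weights (0.5, 0.5) above. The following is a hedged sketch of that kind of formulation: minimize a weighted sum of the parallel completion time and total energy, subject to the per-device fractions covering the whole workload. The per-unit time/energy coefficients stand in for the paper's regression-based estimates, and the rounding step that turns the relaxed solution into the paper's final partition is omitted.

```python
# Hedged sketch of an LP-relaxation workload split (Sections 3.4-3.5):
# choose fractions x_i of a layer's workload per device to minimize
# w_t * T + w_e * sum_i e_i * x_i, where T bounds every device's time.
import numpy as np
from scipy.optimize import linprog

def partition_workload(time_per_unit, energy_per_unit, weights=(0.5, 0.5)):
    """time_per_unit[i]: seconds per unit of workload on device i;
    energy_per_unit[i]: joules per unit of workload on device i."""
    w_t, w_e = weights
    n = len(time_per_unit)
    # Decision vector [x_1, ..., x_n, T]; T equals the parallel completion
    # time at the optimum because it upper-bounds every device's time.
    c = np.append(w_e * np.asarray(energy_per_unit), w_t)
    A_ub = np.hstack([np.diag(time_per_unit), -np.ones((n, 1))])  # t_i*x_i - T <= 0
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)  # fractions sum to 1
    b_eq = [1.0]
    bounds = [(0.0, 1.0)] * n + [(0.0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:n]  # fractional split; WPA would round this to a feasible partition

# Five devices (2 x V1, 2 x V2, 1 x V3): assumed per-unit latencies, with
# per-unit energies derived from the computing powers {10, 10, 12} W above.
t = np.array([1.0, 1.0, 0.6, 0.6, 0.5])
p = np.array([10.0, 10.0, 10.0, 10.0, 12.0])
print(partition_workload(t, energy_per_unit=p * t, weights=(0.5, 0.5)))
```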