Joint Optimization for Mobile Edge Computing-Enabled Blockchain Systems: A Deep Reinforcement Learning Approach
Abstract
1. Introduction
- (1) A novel MEC-enabled blockchain framework for IoT networks is developed to address the latency and scalability issues that arise from the throughput requirements of future wireless networks and blockchain systems. We analyze the computation efficiency of MEC and the critical performance indicators of blockchain, i.e., decentralization, latency, throughput, and adversarial fraction, which guide the joint optimization of the framework.
- (2) A novel joint optimization algorithm for MEC and blockchain is developed to maximize the computational efficiency of MEC and the transaction throughput of the blockchain system, and the problem is formulated as an MDP. In contrast to most existing research [15,16,17], in which MEC and blockchain systems are modeled and optimized independently, the proposed algorithm jointly considers the block interval, block size, data transaction throughput, power allocation for local execution and task offloading, latency, and security constraints. The resulting scheme is therefore more comprehensive and addresses the challenges of deploying blockchain in IoT scenarios.
- (3) The MDP is solved using a deep deterministic policy gradient (DDPG)-based learning algorithm, which copes with the dynamic and high-dimensional properties of IoT networks that are intractable for classic learning approaches such as Q-learning [25]. In particular, the DDPG-based algorithm performs joint resource allocation for MEC and blockchain systems in a continuous action domain, which allows the MDP to be solved with better convergence.
- (4) Extensive simulation results demonstrate that the proposed performance optimization framework significantly enhances the transaction processing efficiency of MEC-enabled blockchain networks. The superiority of the DDPG-based algorithm over the deep Q network (DQN)-based algorithm [26] and other conventional schemes is verified.
2. Related Works
2.1. Blockchain-Enabled IoT Networks
2.2. MEC-Enabled Blockchain for IoT Networks
3. Preliminaries on Blockchain
3.1. Overview of Blockchain
3.2. Consensus Protocol Based on DPoS and PBFT
4. System Model
4.1. Network Model
4.2. MEC Model
4.2.1. Local Computation
4.2.2. Edge Computation
4.3. Blockchain System
4.3.1. Decentralization
4.3.2. Latency Time to Finality and Throughput
4.3.3. Adversarial Fraction
5. DDPG-Based Performance Optimization Framework
5.1. Problem Formulation
5.2. State Space
5.3. Action Space
5.4. Reward Function
5.5. DDPG-Based Algorithm Design
5.5.1. DDPG Background
5.5.2. The Proposed Algorithm
Algorithm 1 DDPG-based Optimization Framework for MEC-enabled Blockchain IoT Systems.
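
The steps of Algorithm 1 are not reproduced in this text. As a hedged illustration only, the sketch below shows three generic ingredients of DDPG as described by Lillicrap et al.: the soft target update, the bootstrapped critic target, and Ornstein-Uhlenbeck exploration noise. It is not the authors' Algorithm 1. The soft update rate of 0.001 is the value listed in the simulation settings; the discount factor, the noise parameters, and all function and class names are assumptions made for illustration.

```python
import numpy as np

TAU = 0.001   # soft update rate for the target networks (simulation settings)
GAMMA = 0.99  # discount factor: an assumed value, not given in this text


def soft_update(target_params, online_params, tau=TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def critic_target(reward, next_q, gamma=GAMMA):
    """Bootstrapped critic target y = r + gamma * Q'(s', mu'(s'))."""
    return reward + gamma * next_q


class OUNoise:
    """Ornstein-Uhlenbeck process for temporally correlated exploration."""

    def __init__(self, dim, theta=0.15, sigma=0.2, seed=0):
        # theta and sigma are assumed, commonly used values.
        self.theta, self.sigma = theta, sigma
        self.state = np.zeros(dim)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # Mean-reverting step toward zero plus a Gaussian perturbation.
        self.state = (self.state - self.theta * self.state
                      + self.sigma * self.rng.standard_normal(self.state.shape))
        return self.state


def explore(action, noise, low, high):
    """Add exploration noise, then clip to the feasible continuous range."""
    return np.clip(action + noise.sample(), low, high)
```

With such a small update rate, the target networks track the online networks slowly, which is the usual way DDPG stabilizes the critic's bootstrapped targets.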
6. Simulation Results and Discussions
6.1. Simulation Setup
- (1) Greedy local-execution-first scheme (GD-Local): In each decision epoch n, each node agent m first attempts to execute the buffered data bits locally and then offloads the remaining computation tasks to the MEC server.
- (2) Greedy computation-offloading-first scheme (GD-Offload): In each decision epoch n, each node agent m first attempts to offload the buffered task bits to the MEC server and then processes the remaining computation tasks locally.
- (3) DQN-based dynamic offloading scheme (DQN): DQN is a DRL algorithm with a discrete action space [26]. Specifically, each node agent m selects the power allocated for local computing and for computation offloading from two finite candidate sets, respectively, each containing a fixed total number of discrete levels.
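
The three baselines above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each epoch is abstracted into a local processing capacity and an offloading capacity in bits, and the DQN candidate power levels are assumed to be evenly spaced between zero and the maximum power. All function names are hypothetical.

```python
import numpy as np


def gd_local(buffer_bits, local_capacity, offload_capacity):
    """Local-execution-first: fill local capacity, then offload the rest."""
    local = min(buffer_bits, local_capacity)
    offload = min(buffer_bits - local, offload_capacity)
    return local, offload


def gd_offload(buffer_bits, local_capacity, offload_capacity):
    """Offloading-first: fill offload capacity, then process the rest locally."""
    offload = min(buffer_bits, offload_capacity)
    local = min(buffer_bits - offload, local_capacity)
    return local, offload


def power_levels(p_max, num_levels):
    """Candidate power set for DQN; even spacing is an assumption."""
    return np.linspace(0.0, p_max, num_levels)


def nearest_level(power, levels):
    """Map a continuous power value to the closest discrete candidate."""
    return float(levels[np.argmin(np.abs(levels - power))])


# 1000 buffered bits, room for 600 locally and 700 via offloading.
print(gd_local(1000, 600, 700))    # (600, 400)
print(gd_offload(1000, 600, 700))  # (300, 700)
```

The quantization step in `nearest_level` is what separates DQN from DDPG here: a discrete agent can only pick one of the candidate levels, whereas a continuous-action agent can output any power in the feasible range.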
6.2. Performance of the Proposed Scheme
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
BS | Base Station |
CPU | Central Processing Unit |
DDPG | Deep Deterministic Policy Gradient |
DPoS | Delegated Proof-of-Stake |
DQN | Deep Q Network |
DRL | Deep Reinforcement Learning |
GN | General Node |
IoT | Internet of Things |
LTF | Latency Time to Finality |
MAC | Message Authentication Code |
MDP | Markov Decision Process |
MEC | Mobile Edge Computing |
PBFT | Practical Byzantine Fault Tolerance |
PN | Primary Node |
PoS | Proof of Stake |
PoW | Proof of Work |
SINR | Signal to Interference-plus-Noise Ratio |
VN | Validation Node |
ZF | Zero Forcing |
References
- Zanella, A.; Bui, N.; Castellani, A.; Vangelista, L.; Zorzi, M. Internet of Things for smart cities. IEEE Internet Things J. 2014, 1, 22–32.
- Kang, J.; Yu, R.; Huang, X.; Wu, M.; Maharjan, S.; Xie, S.; Zhang, Y. Blockchain for secure and efficient data sharing in vehicular edge computing and networks. IEEE Internet Things J. 2019, 6, 4660–4670.
- Liu, H.; Zhang, Y.; Yang, T. Blockchain enabled security in electric vehicles cloud and edge computing. IEEE Netw. Mag. 2018, 32, 78–83.
- Wang, H.; Osen, O.L.; Li, G.; Li, W.; Dai, H.; Zeng, W. Big data and industrial Internet of Things for the maritime industry in northwestern Norway. In Proceedings of the TENCON 2015—2015 IEEE Region 10 Conference, Macau, China, 1–4 November 2015; pp. 1–5.
- He, J.; Wei, J.; Chen, K.; Tang, Z.; Zhou, Y.; Zhang, Y. Multitier fog computing with large-scale IoT data analytics for smart cities. IEEE Internet Things J. 2018, 5, 677–686.
- Kshetri, N. Can blockchain strengthen the Internet of Things? IT Prof. 2017, 19, 68–72.
- Pereira, F.; Crocker, P.; Leithardt, V.R. PADRES: Tool for PrivAcy, Data REgulation and Security. SoftwareX 2022, 17, 100895.
- Miller, D. Blockchain and the Internet of Things in the industrial sector. IT Prof. 2018, 20, 15–18.
- Aitzhan, N.Z.; Svetinovic, D. Security and privacy in decentralized energy trading through multi-signatures, blockchain and anonymous messaging streams. IEEE Trans. Dependable Secur. Comput. 2018, 15, 840–852.
- Li, Z.; Kang, J.; Yu, R.; Ye, D.; Deng, Q.; Zhang, Y. Consortium blockchain for secure energy trading in industrial Internet of Things. IEEE Trans. Ind. Inform. 2018, 14, 3690–3700.
- Liu, Y.; Wang, K.; Lin, Y.; Xu, W. LightChain: A Lightweight Blockchain System for Industrial Internet of Things. IEEE Trans. Ind. Inform. 2019, 15, 3571–3581.
- Cong, L.W.; He, Z. Blockchain disruption and smart contracts. Rev. Financ. Stud. 2019, 32, 1754–1797.
- Zheng, Z.; Xie, S.; Dai, H.; Chen, X.; Wang, H. An overview of blockchain technology: Architecture, consensus, and future trends. In Proceedings of the 2017 IEEE International Congress on Big Data (BigData Congress), Honolulu, HI, USA, 25–30 June 2017; pp. 557–564.
- Chen, W.; Ma, M.; Ye, Y.; Zheng, Z.; Zhou, Y. IoT service based on joint cloud blockchain: The case study of smart traveling. In Proceedings of the 2018 IEEE Symposium on Service-Oriented System Engineering (SOSE), Bamberg, Germany, 26–29 March 2018; pp. 216–221.
- Liu, M.; Yu, F.R.; Teng, Y.; Leung, V.C.M.; Song, M. Performance optimization for blockchain-enabled industrial Internet of Things (IIoT) systems: A deep reinforcement learning approach. IEEE Trans. Ind. Inf. 2019, 15, 3559–3570.
- Kang, J.; Xiong, Z.; Niyato, D.; Ye, D.; Kim, D.I.; Zhao, J. Toward Secure Blockchain-Enabled Internet of Vehicles: Optimizing Consensus Management Using Reputation and Contract Theory. IEEE Trans. Veh. Technol. 2019, 68, 2906–2920.
- Fan, K.; Wang, S.; Ren, Y.; Yang, K.; Yan, Z.; Li, H.; Yang, Y. Blockchain-Based Secure Time Protection Scheme in IoT. IEEE Internet Things J. 2019, 6, 4671–4679.
- Li, W.; Su, Z.; Li, R.; Zhang, K.; Wang, Y. Blockchain-Based Data Security for Artificial Intelligence Applications in 6G Networks. IEEE Netw. 2020, 34, 31–37.
- Satyanarayanan, M. The emergence of edge computing. Computer 2017, 50, 30–39.
- Li, J.; Gao, H.; Lv, T.; Lu, Y. Deep reinforcement learning based computation offloading and resource allocation for MEC. In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), Barcelona, Spain, 15–18 April 2018; pp. 1–6.
- Zhao, N.; Wu, H.; Chen, Y. Coalition game-based computation resource allocation for wireless blockchain networks. IEEE Internet Things J. 2019, 6, 8507–8518.
- Nguyen, D.C.; Pathirana, P.N.; Ding, M.; Seneviratne, A. Secure computation offloading in blockchain based IoT networks with deep reinforcement learning. arXiv 2019, arXiv:1908.07466.
- Atzori, L.; Iera, A.; Morabito, G. The Internet of Things: A survey. Comput. Netw. 2010, 54, 2787–2805.
- EOS.IO. EOS.IO Technical White Paper v2. Available online: https://github.com/EOSIO/Documentation/blob/master/TechnicalWhite-Paper.md (accessed on 16 March 2018).
- Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G. Human-level control through deep reinforcement learning. Nature 2015, 518, 529.
- Sadawi, A.A.; Hassan, M.S.; Ndiaye, M. A Survey on the Integration of Blockchain With IoT to Enhance Performance and Eliminate Challenges. IEEE Access 2021, 9, 54478–54497.
- Yang, Q.; Wang, H. Privacy-Preserving Transactive Energy Management for IoT-Aided Smart Homes via Blockchain. IEEE Internet Things J. 2021, 8, 11463–11475.
- Nguyen, L.D.; Leyva-Mayorga, I.; Lewis, A.N.; Popovski, P. Modeling and Analysis of Data Trading on Blockchain-Based Market in IoT Networks. IEEE Internet Things J. 2021, 8, 6487–6497.
- Cao, B.; Liu, W.; Peng, M. Integration of MEC and Blockchain. In Wireless Blockchain: Principles, Technologies and Applications IEEE; Wiley: Hoboken, NJ, USA, 2022; pp. 161–177.
- Qiu, X.; Liu, L.; Chen, W.; Hong, Z.; Zheng, Z. Online Deep Reinforcement Learning for Computation Offloading in Blockchain-Empowered Mobile Edge Computing. IEEE Trans. Veh. Technol. 2019, 68, 8050–8062.
- Guo, F.; Yu, F.R.; Zhang, H.; Ji, H.; Liu, M.; Leung, V.C.M. Adaptive Resource Allocation in Future Wireless Networks With Blockchain and Mobile Edge Computing. IEEE Trans. Wirel. Commun. 2020, 19, 1689–1703.
- Chen, J.; Wang, W.; Zhou, Y.; Ahmed, S.H.; Wei, W. Exploiting 5G and Blockchain for Medical Applications of Drones. IEEE Netw. 2021, 35, 30–36.
- Lv, Z.; Qiao, L.; Hossain, M.S.; Choi, B.J. Analysis of Using Blockchain to Protect the Privacy of Drone Big Data. IEEE Netw. 2021, 35, 44–49.
- Wang, H.; Wang, Q.; He, D.; Li, Q.; Liu, Z. BBARS: Blockchain-Based Anonymous Rewarding Scheme for V2G Networks. IEEE Internet Things J. 2019, 6, 3676–3687.
- Gu, W.; Li, J.; Tang, Z. A Survey on Consensus Mechanisms for Blockchain Technology. In Proceedings of the 2021 International Conference on Artificial Intelligence, Big Data and Algorithms, Xi’an, China, 28–30 May 2021; pp. 46–49.
- Mingxiao, D.; Xiaofeng, M.; Zhe, Z.; Xiangwei, W.; Qijun, C. A review on consensus algorithm of blockchain. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; pp. 2567–2572.
- Castro, M.; Liskov, B. Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. 2002, 20, 398–461.
- Meshcheryakov, Y.; Melman, A.; Evsutin, O.; Morozov, V.; Koucheryavy, Y. On Performance of PBFT Blockchain Consensus Algorithm for IoT-Applications With Constrained Devices. IEEE Access 2021, 9, 80559–80570.
- Li, W.; Feng, C.; Zhang, L.; Xu, H.; Cao, B.; Imran, M.A. A Scalable Multi-Layer PBFT Consensus for Blockchain. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 1146–1160.
- Yang, F.; Zhou, W.; Wu, Q.; Long, R.; Xiong, N.N.; Zhou, M. Delegated Proof of Stake With Downgrade: A Secure and Efficient Blockchain Consensus Algorithm With Downgrade Mechanism. IEEE Access 2019, 7, 118541–118555.
- Yoo, T.; Goldsmith, A. On the optimality of multiantenna broadcast scheduling using zero-forcing beamforming. IEEE J. Sel. Areas Commun. 2006, 24, 528–541.
- Ngo, H.Q.; Larsson, E.G.; Marzetta, T.L. Energy and spectral efficiency of very large multiuser MIMO systems. IEEE Trans. Commun. 2013, 61, 1436–1449.
- Suraweera, H.A.; Tsiftsis, T.A.; Karagiannidis, G.K.; Nallanathan, A. Effect of feedback delay on amplify-and-forward relay networks with beamforming. IEEE Trans. Veh. Technol. 2011, 60, 1265–1271.
- Kwak, J.; Kim, Y.; Lee, J.; Chong, S. Dream: Dynamic resource and task allocation for energy minimization in mobile cloud systems. IEEE J. Sel. Areas Commun. 2015, 33, 2510–2523.
- Burd, T.D.; Brodersen, R.W. Processor design for portable systems. J. VLSI Signal Process. Syst. Signal Image Video Technol. 1996, 13, 203–221.
- Miettinen, A.P.; Nurminen, J.K. Energy efficiency of mobile clients in cloud computing. HotCloud 2010, 10.
- Gini, C. Variability and mutability. J. R. Stat. Soc. 1913, 76, 619–622.
- Lin, Z.; Wen, F.; Ding, Y.; Xue, Y. Data-driven coherency identification for generators based on spectral clustering. IEEE Trans. Ind. Inform. 2018, 14, 1275–1285.
- Dai, L.; Jia, Y.; Liang, L.; Chang, Z. Metric and control of system fairness in heterogeneous networks. In Proceedings of the 23rd Asia-Pacific Conference on Communications, Perth, Australia, 11–13 December 2017; pp. 1–5.
- Clement, A.; Wong, E.; Alvisi, L.; Dahlin, M. Making Byzantine fault tolerant systems tolerate Byzantine faults. In Proceedings of the Usenix Symposium on Networked Systems Design and Implementation, Boston, MA, USA, 22–24 April 2009; pp. 153–168.
- Bagaria, V.; Kannan, S.; Tse, D.; Fanti, G.; Viswanath, P. Deconstructing the Blockchain to Approach Physical Limits. arXiv 2019, arXiv:1810.08092.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. In Proceedings of the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016.
- Tse, D.; Viswanath, P. Fundamentals of Wireless Communication; Cambridge University Press: Cambridge, UK, 2005.
- Adelman, D.; Mersereau, A.J. Relaxations of weakly coupled stochastic dynamic programs. Oper. Res. 2008, 56, 712–727.
- Abadi, M.; Agarwal, A.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. Available online: https://www.tensorflow.org/ (accessed on 9 November 2015).
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Uhlenbeck, G.E.; Ornstein, L.S. On the theory of the Brownian motion. Phys. Rev. 1930, 36, 823.
- Feng, J.; Yu, F.R.; Pei, Q.; Chu, X.; Du, J.; Zhu, L. Cooperative Computation Offloading and Resource Allocation for Blockchain-Enabled Mobile-Edge Computing: A Deep Reinforcement Learning Approach. IEEE Internet Things J. 2020, 7, 6214–6228.
Work | Long-Term Reward | Joint Optimization | Blockchain Agent | State Space Design | ML Approach |
---|---|---|---|---|---|
[21] | × | × | × | / | × |
[22] | ✓ | × | × | | DQN |
[31] | ✓ | × | IoT Community | | AGA |
[32] | ✓ | ✓ | Base Station (BS) | | Double-Dueling DQN |
Our proposed scheme | ✓ | ✓ | IoT Device | | DDPG |
Notation | Description |
---|---|
The set of GNs. | |
The set of VNs. | |
M | The number of GNs. |
K | The number of VNs. |
The number of time slots included in decision epoch n. | |
Channel vector between GN m and the BS at time slot . | |
Received signal of the BS at time slot . | |
Normalized temporal channel correlation coefficient of GN m. | |
Channel matrix from all the nodes to the BS at time slot . | |
The receiving SINR of GN m at time slot . | |
ZF detection vector for GN m at time slot . | |
Task arrival rate of GN m. | |
Queue length of GN m’s task buffer at decision epoch n. | |
Number of task arrivals of GN m at decision epoch n. | |
CPU frequency scheduled for local execution of GN m at time slot . | |
CPU cycles required per one task bit (allowable CPU-cycle frequency) at GN m. | |
Transmission power of GN m for computation offloading at decision epoch n. | |
Data transmitted by GN m for computation offloading at time slot (decision epoch n). | |
Power consumption of GN m for local execution at decision epoch n. | |
Data processed by GN m via local execution at time slot (decision epoch n). | |
Maximum transmission power (local execution power) of GN m. | |
Computation rate of GN m during time slot . | |
The set of stakes IoT nodes hold at decision epoch n. | |
Block interval of GN m (PN) at decision epoch n. | |
Latency time to finality at decision epoch n. | |
Block size at decision epoch n. | |
Blockchain transaction throughput at decision epoch n. |
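
The notation above includes the set of stakes held by the IoT nodes, and the reference list cites Gini's work on variability. Assuming, as that citation suggests, that decentralization is quantified by the Gini coefficient of the stake distribution (0 for perfectly equal stakes, approaching 1 when a single node holds everything), a minimal sketch is given below. The function name and the example stake vectors are illustrative and are not taken from the paper.

```python
import numpy as np


def gini(stakes):
    """Gini coefficient of a non-negative stake vector (0 = perfectly equal)."""
    x = np.sort(np.asarray(stakes, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0.0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Closed form for sorted data: sum_i (2i - n - 1) * x_i / (n * sum(x)).
    return float(((2 * ranks - n - 1) * x).sum() / (n * total))


print(gini([1, 1, 1, 1]))  # 0.0, all nodes hold equal stakes
print(gini([0, 0, 0, 1]))  # 0.75, one of four nodes holds everything
```

Under this reading, a decentralization threshold such as 0.2 or 0.3 would act as an upper bound on the coefficient, so that stakes cannot concentrate in a few nodes.
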
Parameter | Value |
---|---|
The number of action levels in DQN | 8 |
The path-loss constant in channel vector | −30 dB |
The reference distance in channel vector | 1 m |
The length of one time slot | 10 ms |
The path-loss exponent in channel vector | 3 |
The channel correlation coefficient | 0.95 |
The Doppler frequency of GN m | 70 Hz |
System bandwidth, W | 1 MHz |
Average transaction size | 200 B |
The maximum transmission power | 4 W |
The maximum power available for local execution | 4 W |
The maximum number of time slots for one epoch | 20 |
The noise power | W |
The effective switched capacitance | |
The required CPU cycles for each task bit | 500 cycles/bit |
The maximum allowable CPU-cycle frequency | 1.26 GHz |
The weight factor | 0.5 |
The soft update rate for the target networks | 0.001 |
The number of episodes | 1000 |
The maximum steps of each episode | 1000 |
The task arrival rate | 2.5 Mbps |
The thresholds of decentralization | 0.2, 0.3 |
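
As a quick sanity check on these settings, the sketch below compares how many task bits a node can process locally in one time slot at the maximum CPU frequency with how many bits arrive in that slot. It assumes the common model in which the bits processed per slot equal the CPU frequency times the slot length divided by the cycles required per bit, and it reads 1 Mbps as 10^6 bits per second. The function names are hypothetical.

```python
def local_bits_per_slot(cpu_hz, cycles_per_bit, slot_s):
    """Bits processed locally in one slot at a fixed CPU frequency."""
    return cpu_hz * slot_s / cycles_per_bit


def arriving_bits_per_slot(arrival_bps, slot_s):
    """Average number of task bits arriving in one slot."""
    return arrival_bps * slot_s


SLOT_S = 10e-3          # length of one time slot: 10 ms
F_MAX_HZ = 1.26e9       # maximum allowable CPU-cycle frequency: 1.26 GHz
CYCLES_PER_BIT = 500    # required CPU cycles for each task bit
ARRIVAL_BPS = 2.5e6     # task arrival rate: 2.5 Mbps

processed = local_bits_per_slot(F_MAX_HZ, CYCLES_PER_BIT, SLOT_S)
arrived = arriving_bits_per_slot(ARRIVAL_BPS, SLOT_S)
print(f"local capacity: {processed:.0f} bits/slot, arrivals: {arrived:.0f} bits/slot")
```

Under these assumptions the local capacity is about 25,200 bits per slot against about 25,000 arriving bits, so local execution alone keeps up with the default arrival rate only when the CPU runs at its maximum frequency, which is where offloading becomes useful.
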
Parameter | Value | DDPG | DQN | GD-Offload | GD-Local |
---|---|---|---|---|---|
Training episodes | 100 | 3856.02 | 3381.07 | 2784.23 | 2371.89 |
Training episodes | 500 | 4156.14 | 3414.74 | 2885.67 | 2357.42 |
Training episodes | 1000 | 4177.97 | 3445.40 | 3012.81 | 2519.65 |
Task arrival rate (Mbps) | 1.5 | 1742.25 | 1683.47 | 1460.23 | 1310.20 |
Task arrival rate (Mbps) | 2.5 | 3950.14 | 3430.28 | 2955.37 | 2621.09 |
Task arrival rate (Mbps) | 3.5 | 6940.87 | 5687.55 | 4554.54 | 3989.32 |
Threshold of LTF (ms) | 2.0 | 3761.19 | 3289.24 | 2691.85 | 2478.72 |
Threshold of LTF (ms) | 7.5 | 4443.35 | 3644.41 | 3250.13 | 2799.44 |
Threshold of LTF (ms) | 12.5 | 4857.52 | 4228.17 | 3569.60 | 3070.02 |
Local power consumption limit (W) | 2 | 3444.58 | 2954.47 | 2686.43 | 2219.36 |
Local power consumption limit (W) | 5 | 4133.25 | 3471.36 | 3053.31 | 2684.68 |
Local power consumption limit (W) | 8 | 4317.63 | 3698.01 | 3223.30 | 2916.34 |
The number of mobile users | 10 | 3950.56 | 3430.09 | 2955.40 | 2621.34 |
The number of mobile users | 20 | 7461.89 | 6862.15 | 6110.04 | 4630.71 |
The number of mobile users | 30 | 10,006.53 | 8553.49 | 8742.11 | 4621.98 |
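
To make the comparison concrete, the sketch below computes the relative gain of DDPG over each baseline from the tabulated values at 1000 training episodes. The numbers are copied from the table; the dictionary and function names are hypothetical, and the calculation is plain arithmetic rather than part of the authors' evaluation code.

```python
# Tabulated values at 1000 training episodes.
RESULTS_1000_EPISODES = {
    "DDPG": 4177.97,
    "DQN": 3445.40,
    "GD-Offload": 3012.81,
    "GD-Local": 2519.65,
}


def relative_gain(results, scheme="DDPG"):
    """Relative improvement of `scheme` over every other entry."""
    ref = results[scheme]
    return {name: (ref - value) / value
            for name, value in results.items() if name != scheme}


for name, gain in relative_gain(RESULTS_1000_EPISODES).items():
    print(f"DDPG vs {name}: {gain:.1%}")
```

For this row the gains are roughly 21% over DQN, 39% over GD-Offload, and 66% over GD-Local.
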
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hu, Z.; Gao, H.; Wang, T.; Han, D.; Lu, Y. Joint Optimization for Mobile Edge Computing-Enabled Blockchain Systems: A Deep Reinforcement Learning Approach. Sensors 2022, 22, 3217. https://doi.org/10.3390/s22093217