Runtime Adaptive IoMT Node on Multi-Core Processor Platform
Abstract
1. Introduction
2. Related Works
- A hardware/software/firmware architecture template for a remotely controlled wearable IoMT device that performs cognitive data processing.
- Validation of the template on a state-of-the-art data analysis task, using a CNN as an example computational load.
- Evaluation of the effectiveness of dynamic optimization techniques on multi-core devices using anomaly classification from an ECG signal as a use case.
3. Reference Architecture
3.1. Sensor Nodes
- Modern multi-processor IoT nodes, especially the plethora of prototype solutions currently designed by the community to support AI-related workloads and optimized for low power, have limited OS support. To try our approach on Orlando, we had to implement adaptivity on a bare-metal system, exploiting the platform-specific set of APIs to manage the application model, the process network, and the related operating modes.
- When more cores are available, switching operating mode also requires adapting the level of parallelism exploited within the application structure. The workload imposed by a given mode must be partitioned optimally among the available processing elements, using splitting/merging and pipelining methods.
3.1.1. Hardware Platform
- An on-chip reconfigurable data-transfer fabric to improve data reuse and reduce on-chip and off-chip memory traffic.
- An ARM-based host subsystem with peripherals.
- A range of high-speed IO interfaces for imaging and other types of sensors.
- A chip-to-chip multi-link to pair multiple devices together.
- A power-efficient array of Digital Signal Processors (DSPs) to support complete real-world computer vision applications. Eight DSP clusters are present, each composed of 2 DSPs, 4-way 16 kB instruction caches, 64 kB local RAMs, and a 64 kB shared RAM.
- MB SRAM banks.
3.1.2. Middleware
3.1.3. Application Model
- Get data task: deals with the sampling of the signal acquired by the sensor.
- Process task: there may be multiple processing tasks that allow multiple processing levels to be enabled. The choice of a different level of processing affects the required transmission bandwidth, the detail of the information that can be obtained from the node, and the energy consumption.
- Threshold task: allows filtering the information transmitted to the cloud, further reducing the energy consumption related to the communication with the server. In fact, once the signal has been processed, the system evaluates whether it is useful to send the results or not.
- Send task: takes care of packaging and sending data to the cloud.
- enabling/stopping the periodic execution of the involved task;
- reconfiguring the FIFOs to reshape the process chain accordingly.
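The two reconfiguration actions above can be illustrated with a minimal Python model. This is a sketch under stated assumptions, not the firmware running on the node: the class `TaskChain`, its methods, and the example tasks are hypothetical names introduced only for illustration. Disabling a task both stops its execution and reroutes its input FIFO to the next enabled task.

```python
from collections import deque


class TaskChain:
    """Toy model of a task chain whose stages are linked by FIFOs."""

    def __init__(self, tasks):
        # tasks: dict mapping a task name to a callable(item) -> item or None
        self.tasks = dict(tasks)
        self.enabled = list(self.tasks)  # every task is enabled at start

    def set_mode(self, enabled_names):
        """Switch operating mode by naming the tasks that stay enabled."""
        unknown = set(enabled_names) - set(self.tasks)
        if unknown:
            raise ValueError(f"unknown tasks: {sorted(unknown)}")
        # Chain order is preserved; a disabled task is simply bypassed,
        # which models rerouting its input FIFO to the next enabled task.
        self.enabled = [name for name in self.tasks if name in enabled_names]

    def run(self, samples):
        """Push samples through the enabled tasks; return the final FIFO."""
        fifo = deque(samples)
        for name in self.enabled:
            out = deque()
            while fifo:
                result = self.tasks[name](fifo.popleft())
                if result is not None:  # None means the item was filtered out
                    out.append(result)
            fifo = out
        return list(fifo)


# Hypothetical four-stage chain mirroring the task types listed above.
chain = TaskChain({
    "get_data": lambda x: x,
    "process": lambda x: x * 2,
    "threshold": lambda x: x if x > 4 else None,
    "send": lambda x: ("sent", x),
})
full_mode = chain.run([1, 2, 3])       # all tasks enabled
chain.set_mode({"get_data", "send"})   # raw-streaming mode
raw_mode = chain.run([1, 2, 3])
```

In the full mode only the sample that passes the threshold reaches the send stage, while in the raw-streaming mode every sample is sent, which makes the trade-off between on-node processing and transmission bandwidth visible.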
3.1.4. Adaptive Runtime Manager
- receiving reconfiguration messages from the cloud.
- the workload. For example, a variation of the detection rate of an event in the acquired signal can lead to a more frequent invocation of a processing task. Consequently, there may be a need to reconfigure the device in case the real-time constraints are no longer respected.
- other system variables, such as the remaining battery charge.
- act on individual tasks or on the entire task chain by enabling or disabling the constituent elements;
- decide when to enable the sleep mode state of peripherals, computing units, or the entire device;
- act on the system frequency and supply voltage;
- reroute the FIFOs data flow according to the selected operating mode;
- efficiently split the workload across the available resources.
4. Adaptivity in Advanced Multi-Core Hardware Platforms
4.1. ADAM for Multi-Cores
4.2. Splitting Policies
- the first policy, which we call ADAM-FF (Frequency First), has the minimization of the system operating frequency as its main objective;
- the second policy, which we call ADAM-IF (Idle First), tries to put as many processing elements as possible in sleep mode and is better suited to systems with less reactive frequency management.
Algorithm 1: ADAM-FF policy algorithm.
Algorithm 2: ADAM-IF policy algorithm.
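The difference between the two objectives can be sketched in a few lines of Python. This is not a reproduction of Algorithms 1 and 2: it is a simplified model that assumes an ideal linear speedup (execution time equal to cycles divided by cores times frequency), a discrete set of available frequencies, and a lexicographic objective. The function names `frequency_first` and `idle_first` and all parameters are hypothetical.

```python
def frequency_first(cycles, deadline_s, freqs_hz, max_cores):
    """Lowest feasible frequency first, then the fewest cores at that frequency.

    Assumes ideal linear speedup: time = cycles / (cores * frequency).
    Returns (frequency_hz, active_cores), or None if the deadline cannot be met.
    """
    for f in sorted(freqs_hz):
        for n in range(1, max_cores + 1):
            if cycles / (n * f) <= deadline_s:
                return f, n
    return None


def idle_first(cycles, deadline_s, freqs_hz, max_cores):
    """Fewest active cores first, then the lowest frequency for that core count.

    Same timing assumption as frequency_first; cores not returned as active
    are the ones that could be put in sleep mode.
    """
    for n in range(1, max_cores + 1):
        for f in sorted(freqs_hz):
            if cycles / (n * f) <= deadline_s:
                return f, n
    return None
```

With an illustrative workload of 80 million cycles, a 0.3 s deadline, frequencies of 100, 200, and 400 MHz, and four cores, the first function selects 100 MHz on three cores, whereas the second selects 400 MHz on a single core and leaves the other three idle.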
4.3. Splitting Model on Orlando
- The ADAM system constantly calculates how many helper cores must be enabled for each i-th block;
- The core dedicated to the i-th block is constantly informed by the ADAM system of how many helper cores are assigned to its block. This core issues one rpc_call() for each helper core that ADAM assigns to the i-th block;
- Furthermore, the core dedicated to the i-th block takes care of passing data to the helper cores in a coherent way. For example, if two helper cores are assigned to the i-th block, the convolutional kernel pointer passed to each rpc_call() that wakes a helper core is changed: one-third of the kernels is assigned to each helper core (so that each helper core calculates one-third of the output features), and the remaining third is processed by the calling core.
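The kernel partitioning described in the last point can be sketched as an index computation. The function `split_kernels` is a hypothetical helper, and two details are assumptions not stated in the text: the calling core takes the first range, and any remainder from an uneven division is given to the first workers.

```python
def split_kernels(n_kernels, n_helpers):
    """Split kernel indices among the calling core and its helper cores.

    Returns a list of (start, stop) ranges, stop excluded. Entry 0 belongs
    to the calling core (an assumption), the others to the helper cores.
    """
    workers = n_helpers + 1                 # helpers plus the calling core
    base, extra = divmod(n_kernels, workers)
    ranges, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)   # spread the remainder
        ranges.append((start, start + size))
        start += size
    return ranges
```

For 12 kernels and two helper cores, the result is three ranges of four kernels each, which matches the one-third split of the example above. With no helper cores the calling core keeps the full range.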
5. Results
5.1. Use Case
- Get data: takes care of acquiring the signal from the AD8232 module;
- Peak: analyzes the ECG signal to detect peaks and calculate the heart rate, which greatly reduces the amount of information sent to the server;
- CNN: detects cardiac abnormalities in the ECG trace through cognitive analysis based on convolutional neural networks. It considers signal frames around the peaks detected by the previous processing task. In this case too, the amount of information sent to the server is greatly reduced;
- Threshold: decides whether or not the results from the enabled processing levels should be sent to the cloud; for example, if the heart rate is within a normal range there is no need to transmit the data;
- Send: packages and sends the data to the server.
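The interplay between the Peak and Threshold tasks can be illustrated as follows. This is a minimal sketch: the function names are hypothetical, and the 60 to 100 bpm interval used as the normal range is an illustrative assumption, not a value taken from this work.

```python
def heart_rate_bpm(peak_times_s):
    """Mean heart rate computed from peak timestamps expressed in seconds."""
    if len(peak_times_s) < 2:
        raise ValueError("need at least two peaks")
    intervals = [b - a for a, b in zip(peak_times_s, peak_times_s[1:])]
    return 60.0 / (sum(intervals) / len(intervals))


def should_send(bpm, low=60.0, high=100.0):
    """Threshold decision: transmit only when the rate is outside [low, high].

    The default bounds are placeholders chosen for illustration.
    """
    return not (low <= bpm <= high)
```

Peaks detected one second apart give 60 bpm, so nothing is transmitted. Peaks half a second apart give 120 bpm, which falls outside the assumed range and triggers the Send task.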
5.2. Experimental Setup
- FT: splitting support is not available. The application is partitioned according to the baseline setup, and a starting system frequency is selected to meet real-time constraints at the maximum heart rate (200 bpm). The resulting mapping is kept fixed during execution, while the frequency can be tuned to optimize consumption.
- SSF: neither frequency scaling nor splitting support is available. The application is partitioned according to the baseline setup, and the system frequency is selected to meet real-time constraints at the maximum heart rate (200 bpm). The resulting mapping and frequency are kept fixed during execution.
- ADAM-FF: the proposed ADAM approach for runtime adaptation is enabled, and the frequency-first policy is considered, with the main goal of minimizing the system operating frequency while meeting real-time constraints.
- ADAM-IF: the proposed ADAM approach for runtime adaptation is enabled, and the idle-first policy is considered, with the main goal of maximizing the number of idle cores while meeting real-time constraints.
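The real-time constraint shared by these setups can be made concrete with a short calculation. At 200 bpm a new beat arrives every 60/200 = 0.3 s, so the processing of one beat must finish within that period. The sketch below assumes a fixed number of cycles per beat, an ideal linear speedup over the cores of the baseline mapping, and a discrete frequency list; `lowest_feasible` and its parameters are hypothetical.

```python
def lowest_feasible(freqs_hz, cycles_per_beat, n_cores, bpm):
    """Lowest frequency that finishes one beat before the next one arrives.

    Assumes time = cycles_per_beat / (n_cores * frequency).
    Returns None when no listed frequency meets the constraint.
    """
    period_s = 60.0 / bpm                   # time between consecutive beats
    for f in sorted(freqs_hz):
        if cycles_per_beat / (n_cores * f) <= period_s:
            return f
    return None


freqs = [50_000_000, 100_000_000, 200_000_000, 400_000_000]
static_f = lowest_feasible(freqs, 40_000_000, 1, 200)   # sized for 200 bpm
tuned_f = lowest_feasible(freqs, 40_000_000, 1, 60)     # re-evaluated at 60 bpm
```

Under these illustrative numbers a statically sized system stays at 200 MHz, while a setup that can tune the frequency at runtime may drop to 50 MHz when the heart rate is 60 bpm.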
5.3. Single Channel
5.4. Multi-Channel
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Research, A.M. Internet of Things (IoT) Healthcare Market-Global Opportunity Analysis and Industry Forecast, 2014–2020. 2016. Available online: https://www.alliedmarketresearch.com/iot-healthcare-market (accessed on 3 August 2021).
- Maskeliūnas, R.; Damaševičius, R.; Segal, S. A Review of Internet of Things Technologies for Ambient Assisted Living Environments. Future Internet 2019, 11, 259.
- Zouai, M.; Kazar, O.; Haba, B.; Saouli, H. Smart house simulation based multi-agent system and internet of things. In Proceedings of the 2017 International Conference on Mathematics and Information Technology (ICMIT), Adrar, Algeria, 4–5 December 2017; pp. 201–203.
- Scrugli, M.A.; Loi, D.; Raffo, L.; Meloni, P. A Runtime-Adaptive Cognitive IoT Node for Healthcare Monitoring; Association for Computing Machinery: New York, NY, USA, 2019; pp. 350–357.
- Yang, Z.; Zhou, Q.; Lei, L.; Zheng, K.; Xiang, W. An IoT-cloud Based Wearable ECG Monitoring System for Smart Healthcare. J. Med. Syst. 2016, 40, 286.
- Roberts, L.; Michalák, P.; Heaps, S.; Trenell, M.; Wilkinson, D.; Watson, P. Automating the Placement of Time Series Models for IoT Healthcare Applications. In Proceedings of the 2018 IEEE 14th International Conference on e-Science (e-Science), Amsterdam, The Netherlands, 29 October–1 November 2018; pp. 290–291.
- Macis, S.; Loi, D.; Pani, D.; Raffo, L.; Manna, S.L.; Cestone, V.; Guerri, D. Home telemonitoring of vital signs through a TV-based application for elderly patients. In Proceedings of the 2015 IEEE International Symposium on Medical Measurements and Applications (MeMeA) Proceedings, Torino, Italy, 7–9 May 2015; pp. 169–174.
- Kaewkannate, K.; Kim, S. The Comparison of Wearable Fitness Devices; IntechOpen: London, UK, 2018.
- Kaewkannate, K.; Kim, S.C. A comparison of wearable fitness devices. BMC Public Health 2016, 16, 433.
- Ghasemzadeh, H.; Jafari, R. Ultra Low-power Signal Processing in Wearable Monitoring Systems: A Tiered Screening Architecture with Optimal Bit Resolution. ACM Trans. Embed. Comput. Syst. 2013, 13, 9:1–9:23.
- Tekeste, T.; Saleh, H.; Mohammad, B.; Ismail, M. Ultra-Low Power QRS Detection and ECG Compression Architecture for IoT Healthcare Devices. IEEE Trans. Circuits Syst. I Regul. Pap. 2019, 66, 669–679.
- Wang, C.; Qin, Y.; Jin, H.; Kim, I.; Granados Vergara, J.D.; Dong, C.; Jiang, Y.; Zhou, Q.; Li, J.; He, Z.; et al. A Low Power Cardiovascular Healthcare System with Cross-layer Optimization from Sensing Patch to Cloud Platform. IEEE Trans. Biomed. Circuits Syst. 2019, 13, 314–329.
- Adimulam, M.K.; Srinivas, M.B. Ultra Low Power Programmable Wireless ExG SoC Design for IoT Healthcare System. In Wireless Mobile Communication and Healthcare; Perego, P., Rahmani, A.M., TaheriNejad, N., Eds.; Springer: Cham, Switzerland, 2018; pp. 41–49.
- Labati, R.D.; Muñoz, E.; Piuri, V.; Sassi, R.; Scotti, F. Deep-ECG: Convolutional Neural Networks for ECG biometric recognition. Pattern Recognit. Lett. 2018, 126, 78–85.
- Baloglu, U.B.; Talo, M.; Yildirim, O.; Tan, R.S.; Acharya, U.R. Classification of myocardial infarction with multi-lead ECG signals and deep CNN. Pattern Recognit. Lett. 2019, 122, 23–30.
- Li, Y.; Pang, Y.; Wang, J.; Li, X. Patient-specific ECG classification by deeper CNN from generic to dedicated. Neurocomputing 2018, 314, 336–346.
- Tabal, K.M.R.; Caluyo, F.S.; Ibarra, J.B.G. Microcontroller-Implemented Artificial Neural Network for Electrooculography-Based Wearable Drowsiness Detection System. In Advanced Computer and Communication Engineering Technology; Sulaiman, H.A., Othman, M.A., Othman, M.F.I., Rahim, Y.A., Pee, N.C., Eds.; Springer: Cham, Switzerland, 2016; pp. 461–472.
- Magno, M.; Pritz, M.; Mayer, P.; Benini, L. DeepEmote: Towards multi-layer neural networks in a low power wearable multi-sensors bracelet. In Proceedings of the 2017 7th IEEE International Workshop on Advances in Sensors and Interfaces (IWASI), Vieste, Italy, 15–16 June 2017; pp. 32–37.
- Flamand, E.; Rossi, D.; Conti, F.; Loi, I.; Pullini, A.; Rotenberg, F.; Benini, L. GAP-8: A RISC-V SoC for AI at the Edge of the IoT. In Proceedings of the 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Milano, Italy, 10–12 July 2018; pp. 1–4.
- Desoli, G.; Chawla, N.; Boesch, T.; Singh, S.; Guidetti, E.; De Ambroggi, F.; Majo, T.; Zambotti, P.; Ayodhyawasi, M.; Singh, H.; et al. 14.1 A 2.9TOPS/W deep convolutional neural network SoC in FD-SOI 28 nm for intelligent embedded systems. In Proceedings of the 2017 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 5–9 February 2017; pp. 238–239.
- Google®. Google TPU. 2020. Available online: https://cloud.google.com/tpu (accessed on 3 August 2021).
- NVIDIA®. Embedded Systems for Next-Generation Autonomous Machines. 2019. Available online: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems (accessed on 3 August 2021).
- Meloni, P.; Capotondi, A.; Deriu, G.; Brian, M.; Conti, F.; Rossi, D.; Raffo, L.; Benini, L. NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs. ACM Trans. Reconfig. Technol. Syst. 2018, 11, 1–24.
- Vissers, K. Versal: The Xilinx Adaptive Compute Acceleration Platform (ACAP). In Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA ’19, Seaside, CA, USA, 24–26 February 2019; Association for Computing Machinery: New York, NY, USA, 2019; p. 83.
- Przybył, A. Fixed-Point Arithmetic Unit with a Scaling Mechanism for FPGA-Based Embedded Systems. Electronics 2021, 10, 1164.
- NVIDIA®. NVIDIA cuDNN. 2020. Available online: https://developer.nvidia.com/cudnn (accessed on 3 August 2021).
- arm Developer. Cortex Microcontroller Software Interface Standard. 2016. Available online: https://developer.arm.com/tools-and-software/embedded/cmsis (accessed on 3 August 2021).
- Liang, T.; Glossner, J.; Wang, L.; Shi, S.; Zhang, X. Pruning and Quantization for Deep Neural Network Acceleration: A Survey. arXiv 2021, arXiv:cs.CV/2101.09671.
- Ward-Foxton, S. Artificial Intelligence Gets Its Own System of Numbers. 2020. Available online: https://www.eetimes.com/artificial-intelligence-gets-its-own-system-of-numbers/ (accessed on 3 August 2021).
- Dziwiński, P.; Przybył, A.; Trippner, P.; Paszkowski, J.; Hayashi, Y. Hardware Implementation of a Takagi-Sugeno Neuro-Fuzzy System Optimized by a Population Algorithm. J. Artif. Intell. Soft Comput. Res. 2021, 11, 243–266.
- Tuveri, G.; Meloni, P.; Palumbo, F.; Seu, G.P.; Loi, I.; Conti, F.; Raffo, L. On-the-fly adaptivity for process networks over shared-memory platforms. Microprocess. Microsyst. 2016, 46, 240–254.
- Jahn, J.; Henkel, J. Pipelets: Self-organizing software Pipelines for many-core architectures. In Proceedings of the 2013 Design, Automation Test in Europe Conference Exhibition (DATE), Grenoble, France, 18–22 March 2013; pp. 1516–1521.
- Choi, Y.; Li, C.H.; Silva, D.D.; Bivens, A.; Schenfeld, E. Adaptive Task Duplication Using On-Line Bottleneck Detection for Streaming Applications. In Proceedings of the 9th Conference on Computing Frontiers, CF ’12, Cagliari, Italy, 15–17 May 2012; Association for Computing Machinery: New York, NY, USA, 2012; pp. 163–172.
- arm. Arm Compute Library. Available online: https://developer.arm.com/ip-products/processors/machine-learning/compute-library (accessed on 3 August 2021).
- OAID. Tengine. Available online: https://github.com/OAID/Tengine (accessed on 3 August 2021).
- Tencent. NCNN. Available online: https://github.com/Tencent/ncnn (accessed on 3 August 2021).
- Wang, S.; Ananthanarayanan, G.; Zeng, Y.; Goel, N.; Pathania, A.; Mitra, T. High-Throughput CNN Inference on Embedded ARM big.LITTLE Multi-Core Processors. arXiv 2019, arXiv:1903.05898.
- Wu, H.I.; Guo, D.Y.; Chin, H.H.; Tsay, R.S. A Pipeline-Based Scheduler for Optimizing Latency of Convolution Neural Network Inference over Heterogeneous Multicore Systems. In Proceedings of the 2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Genova, Italy, 31 August–2 September 2020; pp. 46–49.
- Huang, H.; Chaturvedi, V.; Quan, G.; Fan, J.; Qiu, M. Throughput Maximization for Periodic Real-Time Systems under the Maximal Temperature Constraint. ACM Trans. Embed. Comput. Syst. 2014, 13, 1–22.
- Yu, H.; Ha, Y.; Wang, J. Thermal-aware frequency scaling for adaptive workloads on heterogeneous MPSoCs. In Proceedings of the 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 24–28 March 2014; pp. 1–6.
- Yu, H.; Ha, Y.; Wang, J. Quality Optimization of Resilient Applications under Temperature Constraints. In Proceedings of the Computing Frontiers Conference, CF’17, Siena, Italy, 15–17 May 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 9–16.
- Ma, Y.; Chantem, T.; Dick, R.P.; Hu, S. Improving System-Level Lifetime Reliability of Multicore Soft Real-Time Systems. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2017, 25, 1895–1905.
- Weissel, A.; Bellosa, F. Process Cruise Control: Event-Driven Clock Scaling for Dynamic Power Management; Association for Computing Machinery: New York, NY, USA, 2002.
- Vogeleer, K.D.; Memmi, G.; Jouvelot, P.; Coelho, F. The Energy/Frequency Convexity Rule: Modeling and Experimental Validation on Mobile Devices. arXiv 2014, arXiv:cs.OH/1401.4655.
- Nabavinejad, S.M.; Hafez-Kolahi, H.; Reda, S. Coordinated DVFS and Precision Control for Deep Neural Networks. IEEE Comput. Archit. Lett. 2019, 18, 136–140.
- Motamedi, M.; Fong, D.; Ghiasi, S. Machine Intelligence on Resource-Constrained IoT Devices: The Case of Thread Granularity Optimization for CNN Inference. ACM Trans. Embed. Comput. Syst. 2017, 16, 1–19.
- Bong, K.; Choi, S.; Kim, C.; Yoo, H.J. Low-Power Convolutional Neural Network Processor for a Face-Recognition System. IEEE Micro 2017, 37, 30–38.
- Santoro, G.; Casu, M.R.; Peluso, V.; Calimera, A.; Alioto, M. Design-Space Exploration of Pareto-Optimal Architectures for Deep Learning with DVFS. In Proceedings of the 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27–30 May 2018; pp. 1–5.
- Lee, E.; Parks, T. Dataflow process networks. Proc. IEEE 1995, 83, 773–801.
- Pimentel, A.D. Exploring Exploration: A Tutorial Introduction to Embedded Systems Design Space Exploration. IEEE Des. Test 2017, 34, 77–90.
- Meloni, P.; Loi, D.; Deriu, G.; Pimentel, A.D.; Sapra, D.; Moser, B.; Shepeleva, N.; Conti, F.; Benini, L.; Ripolles, O.; et al. ALOHA: An Architectural-aware Framework for Deep Learning at the Edge. In Proceedings of the Workshop on INTelligent Embedded Systems Architectures and Applications, INTESA ’18, Turin, Italy, 13–18 October 2018; ACM: New York, NY, USA, 2018; pp. 19–26.
- Goodfellow, S.; Goodwin, A.; Eytan, D.; Greer, R.; Mazwi, M.; Laussen, P. Towards understanding ECG rhythm classification using convolutional neural networks and attention mappings. In Proceedings of the 3rd Machine Learning for Healthcare Conference, Palo Alto, CA, USA, 17–18 August 2018.
Work/Framework | CNN Workload Partitioning and Mapping | Runtime | |
---|---|---|---|
Kernel-Level | Layer-Level | ||
ARM-CL [34] | √ | ||
tengine [35] | √ | ||
NCNN [36] | √ | ||
Pipe-it [37] | √ | √ | |
[38] | √ | √ | |
Our work | √ | √ | √ |
Work | DVFS Constraints | DVFS on CNN | Dynamic Partitioning and Mapping | |
---|---|---|---|---|
Temperature | Energy | |||
[39] | √ | |||
[40] | √ | |||
[41] | √ | |||
[42] | √ | |||
[43] | √ | |||
[44] | √ | |||
[45] | √ | √ | ||
[46] | √ | √ | ||
[47] | √ | √ | ||
[48] | √ | √ | ||
Our work | √ | √ | √ |
Function Name | Description |
---|---|
sleep() | It is invoked by the DSP that intends to go to sleep; once invoked, the DSP is placed in a low-power state and remains in this state until it receives a wake-up signal. |
wakeup(…) | Once this function is invoked, a wake-up signal is sent to the specified core. |
Function Name | Description |
---|---|
mutex_init(…) | Initialization of the mutual exclusion. |
mutex_lock(…) | Request mutual exclusion. |
mutex_unlock(…) | Release mutual exclusion. |
Function Name | Description |
---|---|
rpc_call(…) | Execute a function passed as an input on a remote processor. From the inputs, it is possible to choose whether rpc_call is blocking or non-blocking. |
rpc_check(…) | Check the execution status of a certain function call on a specific core (non-blocking). |
rpc_wait(…) | Wait for the conclusion of a function on a specific core (blocking). |
Input Parameter | Description |
---|---|
int flags | The first parameter specifies how the function is executed. The two main options are: RPC_SYNC, the request blocks until completion; RPC_ASYNC, the request executes asynchronously. |
void *func | Pointer to the function to be executed on the specified core. |
int core_caller | Indication of the core that executed the rpc_call function. |
int core_helper | Core on which the pointed function will be executed. |
int n_parameters | Number of parameters that the pointed function takes as input. |
int *ret | Pointer to the return variable of the specified function. |
varargs | Input parameters to the previously specified function. |
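The blocking and non-blocking behaviour of the remote-call primitives listed in the tables above can be modelled with threads. The sketch below is only a behavioural model and not the platform API: the class `RpcModel` is hypothetical, the numeric flag values are placeholders, and the `core_caller`, `n_parameters`, and `ret` arguments of the real `rpc_call` are omitted because Python returns results directly.

```python
from concurrent.futures import ThreadPoolExecutor

RPC_SYNC, RPC_ASYNC = 0, 1  # placeholder values, not taken from the platform


class RpcModel:
    """Thread-based stand-in for rpc_call, rpc_check, and rpc_wait."""

    def __init__(self, n_cores):
        # One single-threaded executor models each remote core.
        self._cores = [ThreadPoolExecutor(max_workers=1) for _ in range(n_cores)]
        self._pending = {}

    def rpc_call(self, flags, func, core_helper, *args):
        """Run func(*args) on the modelled helper core."""
        future = self._cores[core_helper].submit(func, *args)
        self._pending[core_helper] = future
        if flags == RPC_SYNC:
            return future.result()  # block until the helper finishes
        return None                 # asynchronous: collect with rpc_wait

    def rpc_check(self, core_helper):
        """Non-blocking: True once the last call on that core has finished."""
        return self._pending[core_helper].done()

    def rpc_wait(self, core_helper):
        """Blocking: return the result of the last call on that core."""
        return self._pending[core_helper].result()

    def shutdown(self):
        for core in self._cores:
            core.shutdown()
```

A caller that offloads part of a block would typically issue the call with `RPC_ASYNC`, process its own share, and then use `rpc_wait` to join the helper before passing results downstream.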
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Scrugli, M.A.; Meloni, P.; Sau, C.; Raffo, L. Runtime Adaptive IoMT Node on Multi-Core Processor Platform. Electronics 2021, 10, 2572. https://doi.org/10.3390/electronics10212572