1. Introduction
Efficient energy management is very challenging in Internet of Things (IoT) edge devices [
1]. On one hand there is the increasing demand of more near-sensor computing capabilities, on the other hand, strict constraints have to be set on the power consumption to maximize the lifetime of an IoT node, which is in many cases battery-supplied.
Researchers have responded to this challenge by proposing new SoC architectures, e.g., parallel- ultra-low-power multi-core computing platforms, and power management strategies [
2]. Similar approaches are proposed also by industry, e.g., by adopting heterogeneous multi-core architectures where a low-power control core (e.g., Arm® Cortex®-M4 core (Arm and Cortex are registered trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere)) with modest computing capabilities is coupled with a more capable core (e.g., Arm Cortex-A7 core) for compute-intensive tasks.
In this scenario, well-known techniques such as clock-gating and power-gating are widely used to minimize the power consumption of inactive sub-modules of an SoC. However, restoring the functionality of SoC subsystems that are clock-gated or power-gated during energy saving states could require a non-negligible amount of time and energy. Additionally, in many event-driven applications, requiring fast time response or continuous event processing, duty-cycling computation phases, or even occasionally entering a sleep state can be unfeasible.
Shut-down-based power management can be complemented by more advanced energy saving techniques such as Dynamic Voltage and Frequency Scaling (DVFS). DVFS is the process of adapting, at run-time, the frequency and the supply voltage of a digital circuit; typically, this is obtained in a closed loop regulation, using as a feedback parameters such as core workload or desired core idleness. The dynamic power consumption of digital circuits has linear dependency from the frequency, and it is in a quadratic relationship with the supply voltage. Therefore, such a technique is very effective to significantly reduce the energy consumption of digital circuits.
Unfortunately, the hardware and software infrastructure for DVFS available in heterogeneous edge devices is not yet as mature as the one integrated into many high-end and mobile multi-core application processors. Such development is not trivial since, in sharp contrast with high-end multi-cores, dual-core MCUs have most of the times heterogeneous ISAs, non-uniform memory hierarchy and neither cache coherency nor shared virtual memory support.
The NXP i.MX 7ULP is a low-power low-cost Arm Cortex-M4/Cortex-A7-based SoC which belongs to this device category. The chip has been fabricated in 28 nm Fully Depleted Silicon On Insulator (FD-SOI) technology, and it features an advanced power management infrastructure that enables dynamic power mode adaptation. The main contribution of this paper is the development and qualification of the first (to our knowledge) DVFS-based power management software infrastructure for this exemplary heterogeneous dual-core MCUs. Our power manager achieves up to 45% of power consumption reduction during the active state of the SoC, without any penalty on the application execution performance.
The remainder of this paper is organized as follows.
Section 2 gives an overview of the related work.
Section 3 describes the i.MX 7ULP SoC.
Section 4 describes the power manager. In
Section 5 we describe the experimental setup used to characterize the power manager.
Section 6 presents the results in terms of power consumption reduction, while
Section 7 provides concluding remarks.
2. Related Work
Among several techniques to increase the energy efficiency, Dynamic Voltage and Frequency Scaling (DVFS) represents a well established, effective [
3] and low-complexity strategy. Despite technological scaling, which reduces the margins for supply voltage regulation, DVFS still represents a viable energy saving techniques for modern platforms operating near threshold [
4]. Those platforms reach high energy efficiencies thanks to a combination of voltage scaling and adaptive body biasing to avoid performance degradation [
5,
6]. In [
7,
8], authors present two approaches where DVFS is controlled by dedicated hardware units on embedded microprocessors. Similarly, in [
9] DVFS is applied on high-end CPUs for cloud infrastructures.
More sophisticated techniques rely on gathering information on the status of the system to guarantee a desired quality of service. In [
10], authors exploit a combination of DVFS and workload profiling in a closed loop configuration.In [
11,
12], to optimize the efficiency of Linux-based systems, DVFS is operated in combination with advanced power-aware scheduling algorithms that exploit a combination of system-level statistics. Finally, in [
13] machine learning algorithms are used to gather higher level information from the applications running on the core [
13].
In this paper, we propose a power management unit implementation that takes inspiration from the power management policies surveyed above and adopted on many Linux-capable application processors. The proposed power manager can perform DVFS on a new low-cost low-power ARM A7-M4 SoC architecture for IoT edge (e.g., the NXP i.MX 7ULP SoC). Compared to the available power management implementations [
14], our approach targets lower-end devices where techniques such as DVFS are typically not applied because of the lack of hardware/software support. As with what happens in high-end CPUs based systems, the proposed power management gathers system level information, e.g., the idleness of the system, and exploit this information to optimize the use of available energy. The regulation of the power mode is operated by a software implemented power manager running in a low-power real-time domain of the SoC. Unlike application-specific DVFS techniques used in special-purpose accelerators, our approach relies on minimal knowledge of the system, since it exploits very high-level information such as the CPU idle time, making it suitable to be ported on different platforms with similar hardware features.
3. The i.MX 7ULP Platform
i.MX 7ULP is a new low-cost low-power Arm Cortex-M4/Cortex-A7-based SoC architecture for IoT edge. The chip has been developed by NXP and fabricated in 28 nm FD-SOI technology. The architecture of the microprocessor is divided into two processing domains hosted by two separate power domains. (i) The application domain is built around an Arm Cortex-A7 core. This processing domain can boot Linux-based embedded operating systems, offering high flexibility for application development. The main target of this processing domain are general purpose and computationally demanding workloads. (ii) The real-time domain is built around an Arm Cortex-M4 core. This domain targets low-power low-complexity sections of an application that presents strict requirements in terms of predictable tasks execution time, e.g., timing-critical tasks such as sensor sampling and IO event handling.
The SoC features flexible hardware support for power management. Each of the two power domains has a dedicated register set that can be used to store pre-defined parameters such as supply voltage, clock frequency divider and body-biasing. The switching between pre-configured power modes is governed by a dedicated System Mode Controller (SMC).
4. Power Manager
The goal of the power manager proposed in this paper is to minimize the power consumption when the system is in an active state, all the buses are clocked, and cores have complete access to all peripherals. In the rest of the paper, we will refer to this system configuration as RUN power state. The power manager operates an opportunistic voltage and frequency scaling on the application power domain to reduce the energy consumption when the average workload of the A7 core is lower than 100%. In this context, a statically selected power mode, tailored for a worst-case workload, is inefficient, and the same task could be executed operating the core at a lower voltage and frequency.
To limit the intrusiveness of the approach, and to have a deterministic time response to workload variations, we implemented the power manager on the real-time application domain. More specifically, the power manager is part of the firmware application running on the M4 core, which serves as a software Power Management Unit (PMU).
Figure 1 illustrates the block diagram of the power manager and the main sub-modules.
To operate the DVFS, we statically selected the RUN power mode. In this power state, the system uses the main PLL to generate a reference clock frequency in the range from 800 MHz to 300 MHz; the forward body-bias can be applied to the transistors, the RAM is powered on, and the system can tolerate DVFS. In this scenario, we introduced multiple Virtual Power Modes (VPM). Each VPM is a set of values that the power manager uses to override configuration values for Voltage, clock Frequency and Body-bias activation of the RUN power mode. The number of VPMs can be arbitrarily configured by the user, their configuration values are stored in the M4 data memory.
The criterion on which the VPM of the system is switched is the idleness of the A7 core. This information is obtained directly from the operating system CPU statistics. The power manager compares the measured idleness with a user-specified target idleness, and determines how to adjust the performance of the system to match the target idleness. Algorithm 1 describes the performance adjustment procedure. The idleness observation time can vary in a range from 100 ms to 5 s depending on the application requirements.
Algorithm 1:This pseudo-code procedure illustrates the steps that are performed by the power manager at every iteration of the power mode regulation. |
- 1:
while !new power mode requestdo - 2:
wait for a new request - 3:
end while - 4:
if target idleness > curr. idleness then - 5:
increase supply voltage - 6:
while !Supply voltage change do - 7:
wait - 8:
end while - 9:
increase clock freq. - 10:
while !clock freq. change do - 11:
wait - 12:
end while - 13:
else - 14:
decrease clock freq. - 15:
while !clock freq. change do - 16:
wait - 17:
end while - 18:
decrease supply voltage - 19:
while !Supply voltage change do - 20:
wait - 21:
end while - 22:
end if - 23:
power mode transition complete
|
4.1. Real-Time Domain
The kernel of the power manager is part of the firmware running on the real-time domain, and it is constituted by 4 main sub-modules that operate synchronously to perform (i) the idleness acquisition from the application domain, (ii) the target versus current idleness comparison, (iii) the power mode selection and the power mode configuration registers modification, (iv) the voltage and the frequency scaling.
4.2. Application Domain
The power manager uses the idleness of the core in a given time interval preceding the power mode regulation as a feedback signal. This information is made available by the operating system that collects statistics on the use of the core. To retrieve this information and transfer it to the real-time domain, we developed (i) a module to track the idleness and (ii) a Linux-driver module that acts as an interface with the power manager.
5. Experimental Setup
In this section, we describe the experimental setup for the measurements presented in
Section 6. In our experiments, we focused on the dynamic reduction of the energy consumption of the application power domain only, since it represents more than 90% of the power consumption of the entire SoC. Operating frequency and voltage of the real-time power domain have been statically selected as the minimum values that allow accomplishing the power management task.
In our power management implementation, we introduced six virtual power modes available for dynamic power mode selection.
Table 1 reports the configuration values related to the core supply voltage and clock frequency of every VPM. Supply voltages have been arbitrarily chosen according to a quadratic curve, operating frequencies have been selected as the maximum ones achievable by the core at the related supply voltages. To maximize performance, forward body-bias is active in all VPMs [
15]. To verify that the system was able to operate at a given frequency, we executed several benchmarks on the core, selected among those provided by the
stress-ng benchmark suite [
16].
The power mode regulation period has been chosen as 5 s. This value can be modified to adjust responsiveness of the controller to the idleness variation, the minimum regulation period is approximately of 100 ms; this value is bounded by the communication latency with the external voltage regulator hosted on the i.MX 7ULP evaluation board and does not constitute an intrinsic limitation of the i.MX 7ULP system.
To fully characterize the power manager and evaluate the benefits in terms of power saving we performed the following measurements on the i.MX 7ULP evaluation board:
(i) we fixed the VPM, and we measured its power consumption. We performed the measurements for each of the VPMs reported in
Table 1, and, at each VPM, we swept the CPU usage in a range of 5% to 95%. The desired load of the core has been precisely changed by varying the number of tasks launched in a fixed time period of 5 s. The use information of the core has been obtained by accessing the core use and idleness statistic made available by the Linux kernel. In our experiment, we verified that by executing a small number of tasks the core use was below 5%, and we linearly increased the number of task launched on the core within the reference period until the core use reached 95%.
(ii) we evaluated the power consumption at different workload levels when the controller is adjusting the power mode. We analyzed the response of the power manager when the target idleness was in a range of 5% to 95%, and similarly to the previous case, the workload on the core was swept in a range of 5% to 95%.
Section 6 shows the results of the two experiments; the power consumption has been measured at the evaluation board 5 V power supply connector. The measurement conditions are the following:
Operating temperature T = 25 °C
Test benchmark for operating frequency validation and power estimation = stress-ng test suite
Power measurement equipment = Keysight N6705B power analyzer (time resolution of 20 s)
6. Results
In this section, we present the results in terms of power consumption reduction when the power manager is active.
Figure 2a shows the power consumption of the i.MX 7ULP evaluation board when a VPM is fixed. From the static analysis, we can conclude that the dynamic power mode adaptation allows saving up to 28% of power when the system is in RUN mode, and in absence of core activity.
The power consumption reported in the plot includes the power consumption of other components soldered on the board, e.g., the external regulator. The only contribution excluded from this analysis is the power consumption of the on-board external DRAM chip.
In the second experiment we measured the power consumption of the board when the power manager is regulating the VPM.
Figure 2b reports the power consumption when the workload on the core increases linearly from from 5% (leftmost part of the plot) to 95% (rightmost part of the plot). The goal of this experiment is show, at run-time, how the PMU can promptly adapt the VPM to the workload scenario, without reducing the number of tasks completed in a reference time interval, and eventually settling on a desired average idleness level.
Figure 2b shows that regardless of the selected target idleness, tasks can be released always with the same timing on the core. During the dynamic VPM adaptation, the actual core use depends on the power mode selected by the controller, which tries to restore the target idleness value. In this operating mode, the time to complete a tasks is not known in advance, because the CPU operates at a variable frequency. Please note that the quality of service, which in this context is represented by meeting the task deadlines is not degraded when the VPM is dynamically changed (i.e., the core computational bandwidth is never saturated). The
Figure 2b reports that the average power consumption is always equal or lower than the case where a VPM is statically selected. Always in
Figure 2b, it can be observed that the target idleness specified by the user affects the power manager behavior.
Table 2 summarizes the peak power consumption reduction for several user-specified values of target idleness.
From the results reported in
Table 2, it is evident that when the target idleness is too high with respect to the actual core workload, the power manager cannot explore the entire set of available states. On the contrary, in all those cases where the power manager can operate the system at the lowest power modes (Target idleness = 50%, 30%, 10%), it is possible to achieve more than 43% of average power consumption reduction and a peak power consumption reduction of 45%. The power consumption improvement reported in our measurements refers to the case where VPM0 was statically selected, and the core executed the same workload.
In our experiment, we assumed that the number of operation performed at each task execution is constant. In
Figure 2b, we observe that every 5 s, a new set of task is released on the core, originating a spike in the power trace caused by the core wake-up. At each experiment run, which was performed with a different target idleness, the core never saturated the computational bandwidth. Therefore, the number of task executed by the core was constant for all the experiments, while the power consumption significantly decreased, resulting in less energy consumed by the core.
7. Conclusions
In this paper, we presented the implementation of a power management unit capable of operating an idleness aware dynamic power mode adaptation. We validated the proposed approach on a novel low-cost low-power heterogeneous MCU for IoT edge fabricated in 28 nm SOI technology. The proposed power manager allows achieving a theoretical power consumption reduction higher than the 28% in absence of core activity. We confirmed the expectations during the operation of the system, reporting more than 43% of average power consumption reduction when the target idleness is selected according to the real idleness of the application, and a peak power consumption reduction of 45%.