Article

Task-Level Aware Scheduling of Energy-Constrained Applications on Heterogeneous Multi-Core System

1
Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China
2
Digital Grid Research Institute, Artificial Intelligence and Chip Application Research Department, CSG, Guangzhou 510623, China
3
Electric Power Research Institute, China Southern Power Grid (CSG), Guangzhou 510623, China
4
Hangzhou Sec-Chip Technology Co., Ltd., Hangzhou 310027, China
*
Author to whom correspondence should be addressed.
Electronics 2020, 9(12), 2077; https://doi.org/10.3390/electronics9122077
Submission received: 10 November 2020 / Revised: 1 December 2020 / Accepted: 3 December 2020 / Published: 5 December 2020
(This article belongs to the Section Computer Science & Engineering)

Abstract

Minimizing the schedule length of parallel applications that run on a heterogeneous multi-core system under energy consumption constraints has recently attracted much attention. The key to this problem is the strategy used to pre-allocate energy consumption to unscheduled tasks. Previous articles used the minimum value, the average value or a power consumption weight value as the pre-allocated energy consumption of tasks. However, they all ignored the different levels of tasks. Tasks in different task levels have different impacts on the overall schedule length when they are allocated the same energy consumption. Considering the task levels, we design a novel task energy consumption pre-allocation strategy that is conducive to minimizing the schedule length and develop a novel task scheduling algorithm based on it. After obtaining the preliminary scheduling results, we also propose a task execution frequency re-adjustment mechanism that re-adjusts the execution frequency of tasks to further reduce the overall schedule length. We carried out a considerable number of experiments with practical parallel application models. The experimental results show that our method achieves better performance than the existing algorithms.

1. Introduction

Computer systems today must perform far better than before while running many applications simultaneously. By using heterogeneous multi-core processors and increasing the number of processor cores, it is possible to improve performance while keeping energy consumption at bay [1,2,3,4,5]. Heterogeneous multi-core systems are widely used, from small embedded devices to large data centers. It is expected that in the near future, the number of heterogeneous processors and cores in these systems will increase dramatically [6,7,8]. On the other hand, although the performance of such systems has been greatly improved, their power consumption is also increasing. Huge energy consumption has caused various economic, environmental and technical problems [9,10,11,12,13]. Therefore, energy consumption is one of the main design constraints for such heterogeneous multi-core systems. A well-known mechanism for reducing the power consumption of computing systems is dynamic voltage and frequency scaling (DVFS), which balances energy consumption against performance by lowering the supply voltage and the frequency of the processor together while a task is running [1,14,15,16,17,18,19]. As a result, energy consumption constrained task scheduling on processors with adjustable voltage and frequency has attracted extensive research [20,21,22,23].
Among existing works on the energy consumption of heterogeneous multi-core processors, some aim to minimize energy consumption under given performance requirements. Others aim to optimize other indicators, such as performance and reliability, under given energy consumption constraints [24,25,26,27,28,29,30,31]. Our study falls into the second category. More precisely, we study the following problem: minimizing the schedule length of energy consumption constrained parallel applications on heterogeneous multi-core systems. Several recent studies share this goal. To determine the energy consumption constraint of each task, the common method in previous studies is to pre-allocate the energy consumption of the unscheduled tasks [20,21,22,23]. These studies therefore focus on how to distribute the overall energy consumption constraint among the tasks reasonably. Xiao et al. [20] came up with an algorithm called MSLECC (minimizing schedule length of energy consumption constrained). It takes the minimum energy consumption of a task as its pre-allocated energy consumption. The drawback of this method is that it is unfair to low priority tasks. High priority tasks take up too much energy, so low priority tasks can only be allocated to low-power processor cores at close to the lowest execution frequency because of the energy consumption constraint, which increases the overall schedule length. Later studies improved the energy pre-allocation method so that the pre-allocation is fair to each task.
One study pre-allocated the energy consumption according to the proportion of task execution time [21], another used the average energy consumption as the pre-allocated energy consumption of each task [22], and a third, whose algorithm is called ISAECC, used a defined task energy consumption weight value as the pre-allocated energy consumption of tasks [23]. However, treating each task fairly is not the best practice, and the task level is an important factor when pre-allocating energy consumption to tasks. We pre-allocate more energy consumption to the tasks in levels that contain fewer tasks, because when their execution time becomes shorter, the overall schedule length is more likely to decrease. In addition, previous articles ignored the negative impact of the locally optimal nature of the scheduling algorithm. Therefore, we develop a task execution frequency re-adjustment mechanism that also uses DVFS to further reduce the schedule length. To summarize, the main contributions of this article are as follows.
  • We design a novel energy pre-allocation strategy considering task level and prove its feasibility.
  • We develop a novel scheduling algorithm to minimize the schedule length under energy consumption constraints based on the new energy pre-allocation strategy.
  • We introduce a frequency re-adjustment mechanism after task scheduling to reduce the negative impact of local optimization.
  • We evaluate our algorithm based on real parallel applications. The experimental results consistently prove the superiority and competitiveness of our algorithm.
The structure of this article is as follows. Section 2 reviews some existing studies that are relevant to us. Section 3 gives some preliminaries related to the problem of minimizing the schedule length for energy consumption constrained parallel applications. In Section 4, we present our approach for this problem. In Section 5, we discuss and analyze the experimental results. Finally, we conclude the paper in Section 6.

2. Related Work

Energy saving design technology based on DVFS was first proposed by [1]. Nowadays, DVFS has been widely used in multi-core task scheduling problems related to energy consumption. Reference [2] studied the problem of minimizing the schedule length of independent sequential applications with energy consumption constraints. In Reference [29], the task scheduling problem with energy consumption constraints is considered as a combinatorial optimization problem. In Reference [31], the authors consider three constraints (energy consumption, deadline and reward). These studies are mainly focused on homogeneous systems, so they are different from our study.
In addition to the above research, many researchers have studied the task scheduling problem on heterogeneous multi-core systems. For example, reference [32] proposed an energy-saving workflow task scheduling algorithm based on DVFS. Huang et al. [33] proposed an enhanced energy-saving scheduling algorithm to minimize energy consumption under the condition of satisfying a certain performance level. Rusu et al. [31] added constraints and proposed an efficient algorithm for minimizing energy consumption under multiple constraints. The goal of these studies is generally contrary to ours. We study the minimization of the schedule length under energy consumption constraints, while those studies focus on minimizing energy consumption under other constraints.
There are also many studies that are closely related to ours. For example, a representative paper proposed the classical Heterogeneous Earliest Finish Time (HEFT) algorithm, which was developed to minimize the schedule length in heterogeneous multi-core systems. The application model and energy consumption model they use are consistent with ours, but they do not consider energy consumption constraints. At present, the studies closest to ours are the four articles mentioned in Section 1 [20,21,22,23]. They pre-allocate energy consumption to each task according to the minimum energy consumption of the task [20], the execution time ratio of the task [21], the overall average energy consumption [22], or the defined task energy consumption weight [23]. What these studies do not consider is that tasks in different levels have different impacts on the overall schedule length. Moreover, they ignore the negative impact of the locally optimal nature of the scheduling algorithm on the schedule length. Based on this, we propose a novel energy pre-allocation strategy that considers the task level and, to reduce the negative impact of local optimality, we develop a task execution frequency re-adjustment mechanism. As a result, our method achieves better performance than previous studies.

3. Models and Preliminaries

In this section, we first introduce the application model (Section 3.1), then the energy consumption model (Section 3.2), and next we describe in detail the issues that need to be addressed (Section 3.3). Finally, we briefly introduce the current situation and reveal its limitations (Section 3.4). Table 1 shows the main notations we use.

3.1. Application Model

As in previous studies [20,21,22,23,24,34], we use a directed acyclic graph (DAG) to represent parallel application models. We define $U = \{u_1, u_2, \ldots, u_k, \ldots, u_{|U|}\}$ as the set of processor cores, where $|U|$ is the number of processor cores. Note that for any set $X$, we use $|X|$ to denote its size. We define the DAG application model as $G = \{N, W, C\}$. $N = \{n_1, n_2, \ldots, n_i, \ldots, n_{|N|}\}$ represents the set of nodes in the graph, that is, the set of tasks in the application. Due to the heterogeneous nature of the processor, the execution time of $n_i \in N$ differs across processor cores. $W$ is a matrix of size $|N| \times |U|$, where $w_{i,k}$ denotes the execution time of $n_i$ on $u_k$ at the maximum frequency. $C$ denotes the weights of the edges between connected nodes in the DAG, that is, the communication times between tasks. $c_{i,j} \in C$ represents the communication time from $n_i$ to $n_j$. If $c_{i,j} = 0$, there is no communication from $n_i$ to $n_j$. We define $pred(n_i)$ and $succ(n_i)$ as the set of direct predecessor tasks and the set of direct successor tasks of task $n_i$, respectively. For example, $pred(n_2) = \{n_1\}$ and $succ(n_2) = \{n_8, n_9\}$. We define $n_{entry}$ and $n_{exit}$ as the entry task and the exit task of an application. In Figure 1, $n_{entry}$ and $n_{exit}$ are $n_1$ and $n_{10}$.
Figure 1 shows an example of a DAG-based parallel application with ten tasks. Each node in Figure 1 represents a task, and the value on the edge between two connected nodes represents the communication time between them if they are not assigned to the same processor core. For example, the value 18 on the edge between $n_1$ and $n_2$ indicates that the communication time between $n_1$ and $n_2$ is 18.
Assuming that there are three heterogeneous processor cores $\{u_1, u_2, u_3\}$ in the system, Table 2 shows the execution time of the tasks in Figure 1 on each processor core at the maximum frequency. For example, the first number 14 in Table 2 indicates that the execution time of $n_1$ on $u_1$ at the maximum frequency is 14.
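As a minimal sketch of how the model $G = \{N, W, C\}$ could be held in memory, the Python fragment below stores $W$ as a per-task list of execution times and $C$ as a dictionary keyed by edges, then derives $pred$ and $succ$ from the edges. The four-task graph is hypothetical: only $w_{1,1} = 14$ and $c_{1,2} = 18$ come from the text, and every other number, as well as the helper name `build_pred_succ`, is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical 4-task DAG on 3 cores. Only W[1][0] = 14 and C[(1, 2)] = 18
# are taken from the text; all other values are invented for illustration.
W = {1: [14, 16, 9], 2: [13, 19, 19], 3: [11, 13, 18], 4: [21, 7, 17]}
C = {(1, 2): 18, (1, 3): 12, (2, 4): 11, (3, 4): 13}


def build_pred_succ(comm):
    """Derive direct predecessor and successor sets from the edge dictionary."""
    pred, succ = defaultdict(set), defaultdict(set)
    for (i, j) in comm:
        succ[i].add(j)
        pred[j].add(i)
    return pred, succ


PRED, SUCC = build_pred_succ(C)
# Entry tasks have no predecessors; exit tasks have no successors.
ENTRY = [n for n in W if not PRED[n]]
EXIT = [n for n in W if not SUCC[n]]
```

An edge dictionary keeps a missing edge distinct from a zero-cost edge, which matches the convention that $c_{i,j} = 0$ means no communication.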

3.2. Energy Model

In DVFS technology, the relationship between supply voltage and operating frequency is almost linear. Therefore, DVFS also adjusts the supply voltage when adjusting the clock frequency. Similar to [20,21,22,23], we use frequency regulation to indicate simultaneous regulation of supply voltage and frequency. In this article, we use the same energy model as references [20,21,22,23]. Therefore, the system power consumption as a function of frequency is calculated as follows:
$$P(f) = P_s + h(P_{ind} + P_d) = P_s + h(P_{ind} + C_{ef} f^m).$$
In the above equation, $P_s$ denotes static power, which can only be removed when the system is completely powered down. $P_{ind}$ is a constant that represents the frequency-independent dynamic power, that is, the power that is independent of CPU processing speed. $P_d$ denotes frequency-dependent power, including the power primarily consumed by the CPU and any power that depends on the system processing frequency $f$. $h$ denotes the system state: $h = 1$ means the system is active and the application is executing; $h = 0$ means the system is in sleep mode or powered down. $C_{ef}$ denotes the effective capacitance, and $m$ denotes the dynamic power exponent, which is no smaller than 2. $C_{ef}$ and $m$ are constants related to the processor system.
Our study considers the active state of the system ($h = 1$), in which dynamic power consumption is the main part of the whole energy consumption. Because static power consumption cannot be managed, this article, like references [20,21,22,23], does not consider it. Therefore, the system power consumption in this article is calculated as follows:
$$P(f) = P_{ind} + C_{ef} f^m.$$
Due to the heterogeneity of processors, each processor has its own parameters. Assuming for now that the frequency of processor $u_k$ ranges from its lowest frequency $f_{k,min}$ to its maximum frequency $f_{k,max}$, we define the following sets of parameters:
  • The set of $P_{ind}$: $\{P_{1,ind}, P_{2,ind}, \ldots, P_{|U|,ind}\}$;
  • The set of $P_d$: $\{P_{1,d}, P_{2,d}, \ldots, P_{|U|,d}\}$;
  • The set of $C_{ef}$: $\{C_{1,ef}, C_{2,ef}, \ldots, C_{|U|,ef}\}$;
  • The set of $m$: $\{m_1, m_2, \ldots, m_{|U|}\}$;
  • The set of execution frequencies:
$$\{\{f_{1,min}, f_{1,a}, \ldots, f_{1,max}\}, \{f_{2,min}, f_{2,a}, \ldots, f_{2,max}\}, \ldots, \{f_{|U|,min}, f_{|U|,a}, \ldots, f_{|U|,max}\}\}.$$
The execution time of task $n_i$ on processor core $u_k$ at frequency $f_{k,h}$ can be obtained by the following equation:
$$w_{i,k,h} = w_{i,k} \times \frac{f_{k,max}}{f_{k,h}}.$$
Then the energy consumption $E(n_i, u_k, f_{k,h})$ of task $n_i$ on processor core $u_k$ at frequency $f_{k,h}$ can be obtained by the following equation:
$$E(n_i, u_k, f_{k,h}) = \left(P_{k,ind} + C_{k,ef} \times (f_{k,h})^{m_k}\right) \times w_{i,k} \times \frac{f_{k,max}}{f_{k,h}}.$$
Therefore, the energy consumption of application $G$ is
$$E(G) = \sum_{i=1}^{|N|} E(n_i, u_{pr(i)}, f_{pr(i),hz(i)}).$$
Because of $P_{ind}$, $E$ is not monotonic in $f$, and a lower $f$ does not always result in less energy. The energy-efficient frequency is the frequency at which Equation (4) reaches its minimum. Similar to [20,21,22,23], we denote this minimum energy-efficient frequency by $f_{ee}$. Setting the derivative of Equation (4) with respect to the frequency to zero gives
$$f_{ee} = \sqrt[m]{\frac{P_{ind}}{(m - 1) C_{ef}}}.$$
When the execution frequency is lower than $f_{ee}$, reducing the frequency further is pointless, because it increases energy consumption. Therefore, the execution frequency varies in the range $[f_{low}, f_{max}]$, where $f_{low} = \max(f_{min}, f_{ee})$. The new set of execution frequencies becomes
$$\{\{f_{1,low}, f_{1,a}, \ldots, f_{1,max}\}, \{f_{2,low}, f_{2,a}, \ldots, f_{2,max}\}, \ldots, \{f_{|U|,low}, f_{|U|,a}, \ldots, f_{|U|,max}\}\}.$$
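The energy model above can be sketched in a few lines of Python. The function names, the parameter values ($P_{ind} = 0.03$, $C_{ef} = 0.8$, $m = 2.9$) and the ten-step frequency grid are assumptions chosen for illustration and are not taken from the article. The sketch evaluates the power, the scaled execution time, the task energy and $f_{ee}$ for a single core, and filters a discrete frequency set down to $[f_{low}, f_{max}]$.

```python
def power(f, p_ind, c_ef, m):
    """Active-state power P(f) = P_ind + C_ef * f**m (static power ignored)."""
    return p_ind + c_ef * f ** m


def exec_time(w_max, f, f_max):
    """Execution time at frequency f, given the time w_max at f_max."""
    return w_max * f_max / f


def energy(w_max, f, f_max, p_ind, c_ef, m):
    """Task energy on one core: power at f times execution time at f."""
    return power(f, p_ind, c_ef, m) * exec_time(w_max, f, f_max)


def energy_efficient_frequency(p_ind, c_ef, m):
    """f_ee = (P_ind / ((m - 1) * C_ef)) ** (1 / m), the energy minimizer."""
    return (p_ind / ((m - 1) * c_ef)) ** (1.0 / m)


def usable_frequencies(freqs, p_ind, c_ef, m):
    """Keep only frequencies at or above f_low = max(f_min, f_ee)."""
    f_low = max(min(freqs), energy_efficient_frequency(p_ind, c_ef, m))
    return [f for f in freqs if f >= f_low]


# Hypothetical core parameters and a frequency grid 0.1, 0.2, ..., 1.0.
P_IND, C_EF, M = 0.03, 0.8, 2.9
FREQS = [round(0.1 * step, 1) for step in range(1, 11)]
F_EE = energy_efficient_frequency(P_IND, C_EF, M)
USABLE = usable_frequencies(FREQS, P_IND, C_EF, M)
```

With these assumed parameters $f_{ee}$ lies between 0.2 and 0.3, so the two lowest grid frequencies are dropped because running there would cost both more time and more energy.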

3.3. Preliminaries

3.3.1. Problem Description

The problem to be solved in this study is to assign a suitable frequency and processor core to each task, and minimize the schedule length of the application under the condition that the energy consumption of the application does not exceed the energy consumption constraint [35,36,37,38].
First, we define the earliest start time (EST) and the earliest finish time (EFT) of tasks. Given a task $n_i$ executed on processor core $u_k$, its earliest start time is denoted as $EST(n_i, u_k)$, which is computed as
$$EST(n_i, u_k) = \max\left(avail[k], \max_{n_j \in pred(n_i)} \{AFT(n_j) + c'_{i,j}\}\right),$$
where $avail[k]$ is the earliest available time of processor core $u_k$, that is, the time at which all tasks assigned to $u_k$ have completed and $u_k$ is ready to execute a new task. $AFT(n_j)$ is the actual finish time of task $n_j$. $c'_{i,j}$ is the actual communication time between tasks $n_i$ and $n_j$: if $n_i$ and $n_j$ are assigned to the same processor core, $c'_{i,j} = 0$; otherwise, $c'_{i,j} = c_{i,j}$.
The earliest finish time of task $n_i$ executed on processor core $u_k$ at frequency $f_{k,h}$ is its earliest start time plus its execution time, which is computed as
$$EFT(n_i, u_k, f_{k,h}) = EST(n_i, u_k) + w_{i,k} \times \frac{f_{k,max}}{f_{k,h}}.$$
We define $SL(G)$ as the schedule length of the application, where
$$SL(G) = AFT(n_{exit}).$$
We define $E_{given}(G)$ as the given energy consumption constraint of application $G$. Therefore, the problem to solve can be expressed as minimizing $SL(G)$ subject to
$$E(G) = \sum_{i=1}^{|N|} E(n_i, u_{pr(i)}, f_{pr(i),hz(i)}) \leq E_{given}(G),$$
where $u_{pr(i)}$ denotes the processor core assigned to task $n_i$, and $f_{pr(i),hz(i)}$ denotes the execution frequency assigned to task $n_i$.
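The EST and EFT definitions above can be sketched as two small Python functions. All names and the data layout are assumptions for illustration: `aft` and `proc_of` hold the finish time and core of already scheduled tasks, and the communication time is looked up on the edge from the predecessor to the task. The numbers 14 and 18 echo the Figure 1 examples quoted in the text; the remaining execution times are invented.

```python
def est(task, core, avail, pred, aft, proc_of, comm):
    """Earliest start time of `task` on `core` (cores are 0-based indices)."""
    ready = 0.0
    for p in pred.get(task, ()):
        # Communication is free when predecessor and task share a core.
        c = 0.0 if proc_of[p] == core else comm[(p, task)]
        ready = max(ready, aft[p] + c)
    return max(avail[core], ready)


def eft(task, core, f, f_max, w, avail, pred, aft, proc_of, comm):
    """Earliest finish time: EST plus the execution time scaled to frequency f."""
    start = est(task, core, avail, pred, aft, proc_of, comm)
    return start + w[task][core] * f_max / f


# Hypothetical state: n1 ran on core 0 and finished at time 14.
PRED = {2: {1}}
COMM = {(1, 2): 18}
W = {1: [14, 16, 9], 2: [13, 19, 18]}
AFT = {1: 14.0}
PROC_OF = {1: 0}
AVAIL = [14.0, 0.0, 0.0]

EST_SAME_CORE = est(2, 0, AVAIL, PRED, AFT, PROC_OF, COMM)
EST_OTHER_CORE = est(2, 1, AVAIL, PRED, AFT, PROC_OF, COMM)
EFT_HALF_SPEED = eft(2, 0, 0.5, 1.0, W, AVAIL, PRED, AFT, PROC_OF, COMM)
```

In this example, staying on the predecessor's core avoids the communication delay of 18, so $n_2$ can start at time 14 there but only at time 32 on another core.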

3.3.2. Effective Range of Energy Consumption Constraint

Since the execution time of each task on each processor core is known, we can obtain the minimum and maximum energy consumption of $n_i$, denoted by $E_{min}(n_i)$ and $E_{max}(n_i)$ respectively, by traversing all processors. $E_{min}(n_i)$ and $E_{max}(n_i)$ are obtained by executing task $n_i$ at the minimum and maximum frequencies, respectively. The equations are as follows:
$$E_{min}(n_i) = \min_{u_k \in U} E(n_i, u_k, f_{k,min}),$$
$$E_{max}(n_i) = \max_{u_k \in U} E(n_i, u_k, f_{k,max}).$$
Therefore, the minimum and maximum energy consumption of application $G$ can be computed as follows:
$$E_{min}(G) = \sum_{i=1}^{|N|} E_{min}(n_i),$$
$$E_{max}(G) = \sum_{i=1}^{|N|} E_{max}(n_i).$$
It should be noted that the given energy consumption constraint has a reasonable range. If $E_{given}(G) < E_{min}(G)$, the energy consumption constraint can never be satisfied; if $E_{given}(G) > E_{max}(G)$, the energy constraint can always be met. Neither situation is reasonable, so the reasonable range of the given energy consumption constraint is $E_{min}(G) \leq E_{given}(G) \leq E_{max}(G)$.
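A short Python sketch of these bounds follows. The two cores, their parameters and the two-task execution-time table are hypothetical, and the helper names are assumptions rather than part of the article. Each core is a dictionary holding its own $P_{ind}$, $C_{ef}$, $m$, lowest and highest frequency.

```python
def energy(w_max, f, core):
    """Energy of a task with max-frequency time w_max on `core` at frequency f."""
    run_time = w_max * core["f_max"] / f
    return (core["p_ind"] + core["c_ef"] * f ** core["m"]) * run_time


def task_energy_bounds(w_row, cores):
    """E_min(n_i) over cores at f_min and E_max(n_i) over cores at f_max."""
    e_min = min(energy(w_row[k], c["f_min"], c) for k, c in enumerate(cores))
    e_max = max(energy(w_row[k], c["f_max"], c) for k, c in enumerate(cores))
    return e_min, e_max


def application_energy_bounds(w, cores):
    """E_min(G) and E_max(G) as sums of the per-task bounds."""
    bounds = [task_energy_bounds(row, cores) for row in w.values()]
    return sum(b[0] for b in bounds), sum(b[1] for b in bounds)


def is_reasonable(e_given, w, cores):
    """True when E_min(G) <= E_given(G) <= E_max(G)."""
    low, high = application_energy_bounds(w, cores)
    return low <= e_given <= high


# Hypothetical platform and workload.
CORES = [
    {"p_ind": 0.03, "c_ef": 0.8, "m": 2.9, "f_min": 0.3, "f_max": 1.0},
    {"p_ind": 0.04, "c_ef": 0.8, "m": 2.5, "f_min": 0.3, "f_max": 1.0},
]
W = {1: [14, 16], 2: [13, 19]}
E_MIN_G, E_MAX_G = application_energy_bounds(W, CORES)
```

A caller would typically pick $E_{given}(G)$ somewhere inside this interval, for example as a fraction of the way from `E_MIN_G` to `E_MAX_G`.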

3.3.3. Task Priority Determination

Before scheduling, we need to determine the priority of tasks. Similar to [20,21,22,23], we use the upward rank value ($rank_u$) as the criterion to determine the priority of tasks. $rank_u$ is defined as follows:
$$rank_u(n_i) = \frac{\sum_{k=1}^{|U|} w_{i,k}}{|U|} + \max_{n_j \in succ(n_i)} \{c_{i,j} + rank_u(n_j)\}.$$
Tasks are prioritized in descending order of $rank_u$, that is, the higher the $rank_u$ of a task, the higher its priority. Table 3 shows the upward rank values of all the tasks in Figure 1. Therefore, the task priority list of the application in Figure 1 is $\{n_1, n_3, n_4, n_2, n_5, n_6, n_9, n_7, n_8, n_{10}\}$.
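The recursion in $rank_u$ can be sketched with a memoized Python function. The four-task diamond graph below is hypothetical (it is not the ten-task graph of Figure 1), and the function name `upward_ranks` is an assumption. For the exit task the maximum over an empty successor set is taken as 0.

```python
from functools import lru_cache


def upward_ranks(w, comm, succ):
    """Return rank_u for every task: mean execution time plus the longest
    (communication + rank_u) path through any direct successor."""

    @lru_cache(maxsize=None)
    def rank(i):
        average = sum(w[i]) / len(w[i])
        tail = max((comm[(i, j)] + rank(j) for j in succ.get(i, ())), default=0.0)
        return average + tail

    return {i: rank(i) for i in w}


# Hypothetical diamond-shaped DAG: n1 -> {n2, n3} -> n4, on three cores.
W = {1: [14, 16, 9], 2: [13, 19, 19], 3: [11, 13, 18], 4: [21, 7, 17]}
COMM = {(1, 2): 18, (1, 3): 12, (2, 4): 11, (3, 4): 13}
SUCC = {1: (2, 3), 2: (4,), 3: (4,)}

RANKS = upward_ranks(W, COMM, SUCC)
# Priority list: tasks in descending order of rank_u.
PRIORITY = sorted(RANKS, key=RANKS.get, reverse=True)
```

For this graph the mean execution times are 13, 17, 14 and 15, which gives $rank_u$ values of 74, 43, 42 and 15 and the priority list $n_1, n_2, n_3, n_4$. Because a predecessor always ranks higher than its successors, the list is also a valid topological order.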

3.4. The ISAECC Method

In this subsection, we review the existing method that is closest to ours and reveal its limitations.

3.4.1. Method Description

The ISAECC method is proposed in [23], and it consists of several major steps:
  • It prioritizes tasks by using the upward rank value which is defined in Section 3.3.3;
  • It uses a self-defined energy consumption weight value to give each unscheduled task a pre-allocated energy consumption;
  • It calculates the energy consumption constraint of each task according to the given energy consumption constraint of the whole application and the pre-allocated energy consumption value of each task;
  • According to the order of the task priority list, it assigns each task the processor core and frequency that can minimize its EST time by traversing each processor core and optional execution frequency.
For the sake of generality, we use $\{n_{o(1)}, n_{o(2)}, \ldots, n_{o(|N|)}\}$ to represent the task priority order. Assuming that the task currently being scheduled is $n_{o(j)}$, the scheduled task set is $\{n_{o(1)}, n_{o(2)}, \ldots, n_{o(j-1)}\}$ and the unscheduled task set is $\{n_{o(j+1)}, n_{o(j+2)}, \ldots, n_{o(|N|)}\}$. Therefore, when scheduling task $n_{o(j)}$, the overall energy consumption of application $G$ can be expressed as
$$E_{o(j)}(G) = \sum_{x=1}^{j-1} E(n_{o(x)}, u_{pr(o(x))}, f_{pr(o(x)),hz(o(x))}) + E(n_{o(j)}) + \sum_{y=j+1}^{|N|} E_{pre}(n_{o(y)}),$$
where $E_{pre}(n_{o(y)})$ denotes the pre-allocated energy consumption of task $n_{o(y)}$.
According to the energy consumption constraints shown in Equation (8), we can get
$$E_{o(j)}(G) \leq E_{given}(G).$$
Therefore, we can get
$$E(n_{o(j)}) \leq E_{given}(G) - \sum_{x=1}^{j-1} E(n_{o(x)}, u_{pr(o(x))}, f_{pr(o(x)),hz(o(x))}) - \sum_{y=j+1}^{|N|} E_{pre}(n_{o(y)}).$$
Let the energy consumption constraint of task $n_{o(j)}$ be
$$E_{given}(n_{o(j)}) = E_{given}(G) - \sum_{x=1}^{j-1} E(n_{o(x)}, u_{pr(o(x))}, f_{pr(o(x)),hz(o(x))}) - \sum_{y=j+1}^{|N|} E_{pre}(n_{o(y)}).$$
With Equation (17), we only need to consider the energy consumption constraint of each task, which is as follows:
$$E(n_{o(j)}) \leq \min\{E_{given}(n_{o(j)}), E_{max}(n_{o(j)})\}.$$
Therefore, the key problem is how to determine the pre-allocated energy consumption $E_{pre}(n_{o(j)})$ of each task. The central idea of ISAECC is to pre-allocate the energy consumption of unscheduled tasks by a weight mechanism. First, they define the improvable energy of application $G$, called $E_{ie}(G)$, which is computed as
$$E_{ie}(G) = E_{given}(G) - E_{min}(G).$$
Then, they define $E_{ave}(n_i)$ and $E_{ave}(G)$ as the energy consumption level of task $n_i$ and the energy consumption level of application $G$, respectively. $E_{ave}(n_i)$ and $E_{ave}(G)$ are computed as
$$E_{ave}(n_i) = \frac{E_{min}(n_i) + E_{max}(n_i)}{2},$$
$$E_{ave}(G) = \frac{E_{min}(G) + E_{max}(G)}{2}.$$
Next, they define $el(n_i)$ as the weight of the energy consumption level of task $n_i$, which is computed as
$$el(n_i) = \frac{E_{ave}(n_i)}{E_{ave}(G)}.$$
After that, they calculate the pre-allocated energy consumption of task $n_i$ as follows:
$$E_{pre}(n_i) = \min\{E_{wa}(n_i), E_{max}(n_i)\},$$
where $E_{wa}(n_i)$ is computed as
$$E_{wa}(n_i) = E_{ie}(G) \times el(n_i) + E_{min}(n_i).$$
After determining the pre-allocated energy consumption of each task, the task scheduling can be completed according to steps 3 and 4, described at the beginning of this section (Section 3.4.1).
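The weight mechanism described above can be sketched as follows. This is an illustrative reading of the formulas in this subsection, not the reference implementation of [23]. The function name and the per-task energy bounds are hypothetical. The example also checks a property that follows from the definitions: because the weights $el(n_i)$ sum to 1, the values $E_{wa}(n_i)$ sum to $E_{given}(G)$.

```python
def isaecc_preallocation(e_min, e_max, e_given):
    """Pre-allocate energy per task by the energy-level weight el(n_i)."""
    e_min_g = sum(e_min.values())
    e_max_g = sum(e_max.values())
    e_ie_g = e_given - e_min_g            # improvable energy of G
    e_ave_g = (e_min_g + e_max_g) / 2     # energy consumption level of G
    e_pre = {}
    for n in e_min:
        e_ave_n = (e_min[n] + e_max[n]) / 2
        e_wa = e_ie_g * (e_ave_n / e_ave_g) + e_min[n]
        e_pre[n] = min(e_wa, e_max[n])
    return e_pre


# Hypothetical per-task bounds for four tasks, and a budget inside
# [E_min(G), E_max(G)] = [12, 44].
E_MIN = {1: 2.0, 2: 4.0, 3: 4.0, 4: 2.0}
E_MAX = {1: 10.0, 2: 12.0, 3: 12.0, 4: 10.0}
E_GIVEN = 26.0

E_PRE = isaecc_preallocation(E_MIN, E_MAX, E_GIVEN)
```

With these numbers $E_{ie}(G) = 14$ and $E_{ave}(G) = 28$, so tasks 1 and 4 each receive 5 and tasks 2 and 3 each receive 8, for a total of 26. Each task's share depends only on its own energy level, regardless of where it sits in the graph.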

3.4.2. Limitations of ISAECC

ISAECC solves the problem in MSLECC [20] that the unfair energy constraint allocation to low priority tasks increases the schedule length. However, we find that treating each task fairly is not the best practice, because tasks in different levels have different impacts on the whole application. For example, in Figure 1, task $n_1$ and task $n_{10}$ should be assigned more energy than other tasks, because their execution time directly affects the overall schedule length of application $G$. If the execution time of task $n_1$ or task $n_{10}$ is shortened or lengthened with all other conditions unchanged, the overall schedule length shortens or lengthens by the same amount. Other tasks such as $n_2$ do not have this property. If the execution time of task $n_2$ is shortened or lengthened with all other conditions unchanged, the schedule length of application $G$ may change only slightly, or not at all. Therefore, we should consider the levels of the tasks and the number of tasks in each level when pre-allocating the energy consumption of tasks. In addition, previous studies ignored the negative impact of the locally optimal nature of the scheduling algorithm on the schedule length. Our design addresses both problems.

4. Our Solution

In this section, we introduce our new strategy in detail. First, we introduce a new task energy pre-allocation method considering task level (Section 4.1). Then, we give a new task scheduling algorithm to minimize the schedule length under the constraint of energy consumption (Section 4.2). Finally, we describe the task execution frequency re-adjustment mechanism we added after getting the preliminary scheduling results (Section 4.3).

4.1. The New Task Energy Pre-Allocation Method

The central idea of our pre-allocation method is to consider the levels of tasks. Tasks in different levels have different impacts on the overall schedule length of an application. For example, in Figure 1, if we shorten the execution time of task $n_1$ by increasing its execution frequency, the schedule length of application $G$ will certainly shorten by the corresponding time; but if we shorten the execution time of task $n_2$, the schedule length of application $G$ is unlikely to change much, because there are still many tasks in a similar position to it. Therefore, a more reasonable approach in the case of Figure 1 is to pre-allocate more energy consumption to task $n_1$. Based on this idea, we put forward a new energy pre-allocation method based on the weight value of energy consumption and the levels of tasks.

4.1.1. Method Description

We define the level of tasks as follows:
$$\begin{cases} L(n_{entry}) = 0 \\ L(n_i) = \max_{n_x \in pred(n_i)} \{L(n_x) + 1\} \end{cases}.$$
We define $N_l = \{n_i, n_j, \ldots\}$ as the set of tasks contained in level $l$, where $L(n_i) = L(n_j) = L(\ldots) = l$. The number of tasks in level $l$ is $|N_l|$. The maximum task level is $L(n_{exit})$.
We define the improvable energy of application $G$ ($E_{ie}(G)$) and the energy consumption level of task $n_i$ ($E_{ave}(n_i)$) in the same way as ISAECC. We define the energy consumption level of task level $l$ as the sum of the energy consumption levels of the tasks in level $l$, which is computed as
$$E_{ave}(N_l) = \sum_{n_i \in N_l} E_{ave}(n_i).$$
We define $el(n_i, l)$ as the energy consumption weight of task $n_i$ in its level $l$, which is computed as
$$el(n_i, l) = \frac{E_{ave}(n_i)}{E_{ave}(N_l)}.$$
We define $E_{var}(N_l)$ as the variation energy consumption of $N_l$, which is computed as
$$E_{var}(N_l) = \frac{E_{ave}(N_l)}{K}, \quad K = \begin{cases} |N_l|, & |N_l| < |U| \\ |U|, & |N_l| \geq |U| \end{cases}.$$
Correspondingly, the variation energy consumption of application $G$ is computed as
$$E_{var}(G) = \sum_{l} E_{var}(N_l).$$
The energy consumption weight of $N_l$ in application $G$ can be defined as
$$el(N_l) = \frac{E_{var}(N_l)}{E_{var}(G)}.$$
We define $E_{ie}(N_l)$ as the improvable energy of $N_l$, which is computed as
$$E_{ie}(N_l) = E_{ie}(G) \times el(N_l).$$
Therefore, we can get the new energy pre-allocation formula of $n_i$ as follows:
$$E_{pre}(n_i) = \min\{E_{wa}(n_i), E_{max}(n_i)\},$$
where
$$E_{wa}(n_i) = E_{ie}(N_l) \times el(n_i, l) + E_{min}(n_i).$$
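The level-aware pre-allocation can be sketched in Python as below. This is a minimal reading of the formulas in this subsection under stated assumptions: the function names, the diamond-shaped four-task graph, the per-task energy bounds and the choice $|U| = 3$ are all hypothetical. Note that $K$ reduces to $\min(|N_l|, |U|)$.

```python
from collections import defaultdict


def task_levels(tasks, pred):
    """L(n) = 0 for tasks without predecessors, else 1 + max level of preds."""
    level = {}

    def visit(n):
        if n not in level:
            level[n] = max((visit(p) + 1 for p in pred.get(n, ())), default=0)
        return level[n]

    for n in tasks:
        visit(n)
    return level


def level_aware_preallocation(e_min, e_max, e_given, pred, num_cores):
    """E_pre(n_i) = min(E_ie(N_l) * el(n_i, l) + E_min(n_i), E_max(n_i))."""
    level = task_levels(e_min, pred)
    e_ave = {n: (e_min[n] + e_max[n]) / 2 for n in e_min}
    e_ave_level, size = defaultdict(float), defaultdict(int)
    for n, l in level.items():
        e_ave_level[l] += e_ave[n]
        size[l] += 1
    # E_var(N_l) = E_ave(N_l) / K with K = min(|N_l|, |U|).
    e_var = {l: e_ave_level[l] / min(size[l], num_cores) for l in size}
    e_var_g = sum(e_var.values())
    e_ie_g = e_given - sum(e_min.values())
    e_pre = {}
    for n, l in level.items():
        e_ie_level = e_ie_g * e_var[l] / e_var_g
        e_wa = e_ie_level * e_ave[n] / e_ave_level[l] + e_min[n]
        e_pre[n] = min(e_wa, e_max[n])
    return e_pre


# Hypothetical data: n1 -> {n2, n3} -> n4, three cores, budget 26.
PRED = {2: {1}, 3: {1}, 4: {2, 3}}
E_MIN = {1: 2.0, 2: 4.0, 3: 4.0, 4: 2.0}
E_MAX = {1: 10.0, 2: 12.0, 3: 12.0, 4: 10.0}
LEVELS = task_levels(E_MIN, PRED)
E_PRE = level_aware_preallocation(E_MIN, E_MAX, 26.0, PRED, num_cores=3)
```

For this example the single-task levels 0 and 2 receive 6.2 each, while the two tasks sharing level 1 receive 6.8 each. With the same bounds and budget, the ISAECC weight formula of Section 3.4.1 would give 5 to the entry and exit tasks and 8 to each middle task, so energy is shifted toward the levels with fewer tasks while the total stays at 26.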

4.1.2. Feasibility of the Task Energy Pre-Allocation Mechanism

In order to prove the feasibility of our method, we need to prove the following theorem: Given an application $G$, if the unscheduled tasks are pre-allocated energy consumption according to our method, then each task $n_j$ can satisfy Equation (15).
We use mathematical induction to prove the above theorem. First, we need to prove that task $n_{o(1)}$ can satisfy Equation (15) when all the other tasks are unscheduled. By Equations (14), (22)–(29), we have
$$E_{o(1)}(G) = E(n_{o(1)}) + \sum_{y=2}^{|N|} E_{pre}(n_{o(y)}) \leq E(n_{o(1)}) + \sum_{y=2}^{|N|} E_{wa}(n_{o(y)}) = E(n_{o(1)}) + \sum_{y=1}^{|N|} E_{wa}(n_{o(y)}) - E_{wa}(n_{o(1)}) = E(n_{o(1)}) + E_{given}(G) - E_{wa}(n_{o(1)}).$$
The last step holds because the weights $el(n_i, l)$ sum to 1 within each level and the weights $el(N_l)$ sum to 1 over all levels, so that $\sum_{y=1}^{|N|} E_{wa}(n_{o(y)}) = E_{ie}(G) + E_{min}(G) = E_{given}(G)$.
We know that, with $l$ denoting the level of $n_{o(1)}$,
$$E_{wa}(n_{o(1)}) = E_{ie}(N_l) \times el(n_{o(1)}, l) + E_{min}(n_{o(1)}) \geq E_{min}(n_{o(1)}).$$
We can find at least one situation in which
$$E(n_{o(1)}) = E_{min}(n_{o(1)}).$$
In other words, at least when $E(n_{o(1)}) = E_{min}(n_{o(1)})$, we have
$$E_{o(1)}(G) \leq E(n_{o(1)}) + E_{given}(G) - E_{wa}(n_{o(1)}) \leq E_{given}(G).$$
From the above derivation, task $n_{o(1)}$ can satisfy Equation (15).
Then, we assume that task $n_{o(j)}$ can satisfy Equation (15), that is,
$$E_{o(j)}(G) = \sum_{x=1}^{j-1} E(n_{o(x)}, u_{pr(o(x))}, f_{pr(o(x)),hz(o(x))}) + E(n_{o(j)}) + \sum_{y=j+1}^{|N|} E_{pre}(n_{o(y)}) = \sum_{x=1}^{j} E(n_{o(x)}, u_{pr(o(x))}, f_{pr(o(x)),hz(o(x))}) + \sum_{y=j+1}^{|N|} E_{pre}(n_{o(y)}) \leq E_{given}(G).$$
The above formulation can be written as
$$\sum_{x=1}^{j} E(n_{o(x)}, u_{pr(o(x))}, f_{pr(o(x)),hz(o(x))}) \leq E_{given}(G) - \sum_{y=j+1}^{|N|} E_{pre}(n_{o(y)}).$$
Next, we prove that task $n_{o(j+1)}$ can satisfy Equation (15). By Equation (30), we have
$$E_{o(j+1)}(G) = \sum_{x=1}^{j} E(n_{o(x)}, u_{pr(o(x))}, f_{pr(o(x)),hz(o(x))}) + E(n_{o(j+1)}) + \sum_{y=j+2}^{|N|} E_{pre}(n_{o(y)}) \leq E_{given}(G) - \sum_{y=j+1}^{|N|} E_{pre}(n_{o(y)}) + E(n_{o(j+1)}) + \sum_{y=j+2}^{|N|} E_{pre}(n_{o(y)}) = E_{given}(G) + E(n_{o(j+1)}) - E_{pre}(n_{o(j+1)}).$$
From Equations (28) and (30), we have
$$E_{pre}(n_{o(j+1)}) = \min\{E_{wa}(n_{o(j+1)}), E_{max}(n_{o(j+1)})\} \geq E_{min}(n_{o(j+1)}).$$
Therefore, at least when $E(n_{o(j+1)}) = E_{min}(n_{o(j+1)})$, we have
$$E_{o(j+1)}(G) \leq E_{given}(G) + E(n_{o(j+1)}) - E_{pre}(n_{o(j+1)}) \leq E_{given}(G).$$
In summary, given an application $G$, if the unscheduled tasks are pre-allocated energy consumption according to our method, then each task $n_j$ can satisfy Equation (15). This proves the feasibility of our method.

4.2. The Proposed Algorithm for Minimizing Schedule Length

In this section, we show our new task scheduling algorithm in Algorithm 1. In the algorithm, Line 1 is to prioritize tasks in the input application; Lines 2–10 calculate some required values for each task, each level and the application G; Lines 11 and 12 calculate the pre-allocation energy consumption of each task; Lines 13–26 are to select processor and frequency for each task; Lines 27 and 28 are to calculate the actual energy consumption E(G) and the final schedule length SL(G).
Algorithm 1. A new scheduling algorithm for minimizing schedule length.
Input: $G = \{N, W, C\}$, $U$, $E_{given}(G)$
Output: $SL(G)$, $E(G)$
 1:  Sort tasks in a list $tl$ by descending order of $rank_u$;
 2:  for ($\forall i$, $n_i \in N$) do
 3:    Compute $E_{min}(n_i)$ and $E_{max}(n_i)$; //By (10) (11)
 4:    Compute $E_{ave}(n_i)$; //By (21)
 5:  for ($\forall l$, $1 \leq l \leq L(n_{exit})$) do
 6:    Compute $E_{ave}(N_l)$; //By (23)
 7:    Compute $E_{var}(N_l)$; //By (25)
 8:  Compute $E_{min}(G)$ and $E_{max}(G)$; //By (12) (13)
 9:  Compute $E_{ave}(G)$; //By (22)
 10:  Compute $E_{var}(G)$; //By (26)
 11:  for ($\forall i$, $n_i \in N$) do
 12:    Compute $E_{pre}(n_i)$; //By (29)
 13:  while ($tl \neq \varnothing$) do
 14:    $n_i = tl$.out();
 15:    $AFT(n_i) = \infty$;
 16:    Compute $E_{given}(n_i)$; //By (18)
 17:    for ($\forall k$, $u_k \in U$) do
 18:      for ($\forall f_{k,h}$, $f_{k,h} \in [f_{k,low}, f_{k,max}]$) do
 19:        Compute $E(n_i, u_k, f_{k,h})$; //By (4)
 20:        if ($E(n_i, u_k, f_{k,h}) > E_{given}(n_i)$) then
 21:          continue;
 22:        Compute $EFT(n_i, u_k, f_{k,h})$; //By (8)
 23:        if ($EFT(n_i, u_k, f_{k,h}) < AFT(n_i)$) then
 24:          Let $u_{pr(i)} = u_k$ and $f_{pr(i),hz(i)} = f_{k,h}$;
 25:          $E(n_i, u_{pr(i)}, f_{pr(i),hz(i)}) = E(n_i, u_k, f_{k,h})$;
 26:          $AFT(n_i) = EFT(n_i, u_{pr(i)}, f_{pr(i),hz(i)})$;
 27:  Compute actual energy consumption $E(G)$;
 28:  Compute the schedule length $SL(G)$;
 29:  return $SL(G)$, $E(G)$.
For each task, selecting the processor with the minimum EFT has complexity $O(|N| \times |U| \times |F|)$, where $|F|$ represents the maximum number of discrete frequencies from $f_{k,low}$ to $f_{k,max}$. Therefore, the complexity of Algorithm 1 is $O(|N|^2 \times |U| \times |F|)$, the same as that of ISAECC in [23].
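The selection loop in Lines 13–26 of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' C implementation. Equations (4) and (8) are defined outside this section, so the energy model below is an assumption inferred from the maximum-energy expression and the execution-time scaling used in Section 4.3, and the earliest start time on each core is taken as an input. The `Core` class, the function names and the lowest frequency are hypothetical.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Core:
    """Hypothetical per-core parameters; the field names are illustrative."""
    p_ind: float   # frequency-independent power
    c_ef: float    # effective capacitance
    m: float       # dynamic power exponent
    f_low: float   # lowest usable frequency (GHz)
    f_max: float   # maximum frequency (GHz)


def exec_time(w, core, f):
    # w is the execution time at f_max; running slower stretches it.
    return w * core.f_max / f


def energy(w, core, f):
    # Assumed energy model: (P_ind + C_ef * f^m) * execution time.
    return (core.p_ind + core.c_ef * f ** core.m) * exec_time(w, core, f)


def frequencies(core, step=0.01):
    # Discrete frequency levels from f_low to f_max in steps of `step`.
    lo, hi = round(core.f_low / step), round(core.f_max / step)
    return [i * step for i in range(lo, hi + 1)]


def select_core_and_frequency(w, est, cores, budget):
    """Return (core index, frequency, energy, EFT) with minimum EFT, or None.

    w[k] and est[k] are the execution time at f_max and the earliest start
    time of the task on core k; budget plays the role of E_given(n_i).
    """
    best, best_eft = None, inf
    for k, core in enumerate(cores):
        for f in frequencies(core):
            e = energy(w[k], core, f)
            if e > budget:          # violates the task's energy constraint
                continue
            eft = est[k] + exec_time(w[k], core, f)
            if eft < best_eft:      # keep the earliest finish time seen
                best, best_eft = (k, f, e, eft), eft
    return best
```

The two nested loops visit every core and every discrete frequency once per task, which is where the $|U| \times |F|$ factor in the complexity comes from.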

4.3. The Task Execution Frequency Re-Adjustment Mechanism

Through the new task energy pre-allocation method in Section 4.1 and the new task scheduling algorithm in Section 4.2, we can get the preliminary scheduling results. Other methods, such as those in [20,21,22,23], generally stop here and do not exploit the fact that the scheduling results can be optimized further to shorten the schedule length. Like the scheduling algorithms in [20,21,22,23], the algorithm in Section 4.2 makes each task finish as early as possible, which is not entirely reasonable: finishing some tasks early does not shorten the overall schedule length, but it does consume more energy. Therefore, we introduce the concept of the latest finish time (LFT) of tasks [28,39,40], which is defined as follows:
$$\begin{cases} LFT(n_{exit}) = AFT(n_{exit}) \\ LFT(n_i) = \min\left\{ \min\limits_{n_j \in succ(n_i)} \left\{ AST(n_j) - c_{i,j} \right\},\ AST\left(n_{dn(i)}\right) \right\}, \end{cases}$$
where $n_{dn(i)}$ represents the downward neighbor task of $n_i$, that is, $n_{dn(i)}$ is assigned to the same processor core as $n_i$ and is the first task executed after $n_i$ on that core.
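As an illustration, the definition of the latest finish time can be sketched in Python. The dictionary-based data layout and all names are assumptions made for this example; communication times are used exactly as they appear in the formula, so a caller who treats same-core communication as free would pass 0 for those edges.

```python
def latest_finish_time(i, exit_task, succ, ast, aft, comm, next_on_core):
    """Latest finish time of task i without delaying any later task.

    succ[i]          successors of task i
    ast[j], aft[j]   actual start / finish times from the preliminary schedule
    comm[(i, j)]     communication time from task i to task j
    next_on_core[i]  task that runs right after i on the same core (optional)
    """
    if i == exit_task:
        return aft[i]
    # Each successor must still receive its input before it starts.
    bounds = [ast[j] - comm[(i, j)] for j in succ[i]]
    # The downward neighbor on the same core must not be pushed back.
    nxt = next_on_core.get(i)
    if nxt is not None:
        bounds.append(ast[nxt])
    return min(bounds)


# Hypothetical three-task schedule: 1 -> 2 -> 3 and 1 -> 3.
succ = {1: [2, 3], 2: [3], 3: []}
ast = {1: 0, 2: 12, 3: 30}
aft = {1: 8, 2: 20, 3: 40}
comm = {(1, 2): 3, (1, 3): 5, (2, 3): 4}
next_on_core = {1: 2}  # task 2 follows task 1 on the same core

lft = {i: latest_finish_time(i, 3, succ, ast, aft, comm, next_on_core)
       for i in (3, 2, 1)}
# lft == {3: 40, 2: 26, 1: 9}
```

In this example task 2 finishes at time 20 but could finish as late as 26, so it has 6 time units of slack that can be traded for a lower frequency.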
Through Equation (32), we can replace the AFT of tasks with the LFT to delay the finish time of some tasks without increasing the schedule length. As the execution time of a task is prolonged, its running frequency is reduced accordingly, which saves some energy consumption. Therefore, the new execution time of task $n_i$ changes from $AFT(n_i) - AST(n_i)$ to $LFT(n_i) - AST(n_i)$, and the new frequency of task $n_i$ can be changed as follows:
$$f_{pr(i),nhz(i)} = f_{pr(i),hz(i)} \times \frac{AFT(n_i) - AST(n_i)}{LFT(n_i) - AST(n_i)}.$$
The frequency range of task $n_i$ is $[f_{pr(i),low}, f_{pr(i),max}]$, so $f_{pr(i),nhz(i)}$ should be
$$f_{pr(i),nhz(i)} = \max\left\{ f_{pr(i),hz(i)} \times \frac{AFT(n_i) - AST(n_i)}{LFT(n_i) - AST(n_i)},\ f_{pr(i),low} \right\}.$$
Therefore, the actual execution time (AET) of task $n_i$ will be
$$AET(n_i) = w_{i,pr(i)} \times \frac{f_{pr(i),max}}{f_{pr(i),nhz(i)}}.$$
Finally, the new $AST(n_i)$ should be updated as
$$AST(n_i) = LFT(n_i) - AET(n_i) = LFT(n_i) - w_{i,pr(i)} \times \frac{f_{pr(i),max}}{f_{pr(i),nhz(i)}}.$$
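The frequency re-adjustment for a single task can be sketched as follows. The function name and argument order are hypothetical, and the sketch keeps the frequency continuous; with the discrete frequency levels used in the experiments, the result would presumably have to be rounded up to the next supported level so that the task still finishes by its LFT, which is an assumption not spelled out in the text.

```python
def readjust_frequency(w, f_cur, f_low, f_max, ast, aft, lft):
    """Stretch one task to its latest finish time.

    w      execution time of the task on its core at f_max
    f_cur  frequency chosen by the preliminary schedule
    Returns (new frequency, new actual execution time, new actual start time).
    """
    # Scale the frequency by the ratio of old to new execution window,
    # but never go below the lowest frequency of the core.
    f_new = max(f_cur * (aft - ast) / (lft - ast), f_low)
    aet = w * f_max / f_new
    # The task now finishes exactly at LFT, so it starts AET earlier.
    return f_new, aet, lft - aet


# A task with 10 time units of slack halves its frequency.
print(readjust_frequency(10.0, 1.0, 0.3, 1.0, 0.0, 10.0, 20.0))
# (0.5, 20.0, 0.0)
```

When the slack is very large, the lower bound on the frequency takes over and the task starts later instead of running longer.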
On the basis of the above equations, the algorithm to save energy after preliminary scheduling is shown in Algorithm 2. In Line 2, we reorder the tasks in descending order of their actual finish time (AFT) according to the scheduling results of Algorithm 1. In Lines 4–10, $AFT(n_i)$, $AST(n_i)$ and $f_{pr(i),hz(i)}$ are updated. In Line 11, we compute the new $E(n_i)$ using the updated values. In Lines 12 and 13, we compute the saved energy $E_{save}(G)$.
Algorithm 2. The method to save energy.
Input: $G = \{N, W, C\}$, $U$, $E_{given}(G)$
Output: $E_{save}(G)$
 1:  Call Algorithm 1 to get the preliminary scheduling results;
 2:  Sort the tasks into a list $al$ in descending order of $AFT(n_i)$;
 3:  while ($al \ne \emptyset$) do
 4:    $n_i = al$.out();
 5:    Compute $LFT(n_i)$; //By (33)
 6:    Compute the new frequency $f_{pr(i),nhz(i)}$; //By (34)
 7:    Compute the new $AET(n_i)$; //By (35)
 8:    Update $AFT(n_i) \leftarrow LFT(n_i)$;
 9:    Update $AST(n_i) \leftarrow LFT(n_i) - AET(n_i)$; //By (36)
 10:    Update $f_{pr(i),hz(i)} \leftarrow f_{pr(i),nhz(i)}$;
 11:    Compute the new $E(n_i)$; //By (4)
 12:  Compute the new energy consumption of G, $E_{new}(G)$; //By (5)
 13:  Compute the saved energy $E_{save}(G) = E(G) - E_{new}(G)$;
 14:  return $E_{save}(G)$.
Generally speaking, after the task scheduling is completed, some of the energy budget remains unused:
$$E_{rem}(G) = E_{given}(G) - E(G).$$
Therefore, the energy consumption that can be reused is
$$E_{reu}(G) = E_{rem}(G) + E_{save}(G).$$
After calculating the energy consumption that can be reused, we need to re-allocate it to the tasks that can directly affect the overall schedule length. A task directly affects the overall schedule length if any change in its running time changes the overall schedule length by the same amount. For example, for task $n_1$ in Figure 1, the total schedule length is shortened or extended by exactly as much as its execution time is shortened or extended. However, due to the diversity of task models, it is very difficult to find out which tasks can directly affect the overall schedule length. Therefore, we adopt a more direct approach. If the level structure of the application model is strict (the tasks in level $l$ only communicate with tasks in the adjacent levels $l+1$ and $l-1$), we re-allocate the reused energy consumption $E_{reu}(G)$ to the tasks that have not reached the highest execution frequency in the levels that contain a single task ($|N_l| = 1$); otherwise, we only re-allocate the reused energy consumption to $n_{entry}$ and $n_{exit}$. For ease of description, we define $N_{das}$ as the set of tasks that can directly affect the overall schedule length.
After the recipients of $E_{reu}(G)$ are determined, we need to determine the allocation proportion. We define the maximum energy consumption of $n_i \in N_{das}$ on processor core $u_{pr(i)}$ as
$$E_{max}\left(n_i, u_{pr(i)}\right) = \left(P_{pr(i),ind} + C_{pr(i),ef} \times \left(f_{pr(i),max}\right)^{m_{pr(i)}}\right) \times w_{i,pr(i)}.$$
Therefore, the growable energy consumption of $n_i \in N_{das}$ on processor core $u_{pr(i)}$ will be
$$E_{gro}(n_i) = E_{max}\left(n_i, u_{pr(i)}\right) - E(n_i).$$
We take the values of $E_{gro}(n_i)$ as the allocation proportion of $E_{reu}(G)$. Therefore, the reused energy consumption of $n_i \in N_{das}$ will be
$$E_{reu}(n_i) = E_{reu}(G) \times \frac{E_{gro}(n_i)}{\sum_{i=1}^{|N|} E_{gro}(n_i)}.$$
Adding $E_{reu}(n_i)$ to the $E(n_i)$ computed in Algorithm 2, we get the energy consumption that $n_i \in N_{das}$ can use:
$$E_{use}(n_i) = E_{reu}(n_i) + E(n_i).$$
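The re-allocation of the reusable energy can be sketched in Python as follows. The function names and the dictionary layout are hypothetical. The sketch normalizes over the tasks in $N_{das}$ only, on the assumption that $E_{gro}$ is meaningful only for those tasks, and it returns the current energies unchanged when no task can grow.

```python
def reusable_energy(e_given_g, e_g, e_save_g):
    """E_reu(G): unused budget plus the energy saved by Algorithm 2."""
    e_rem = e_given_g - e_g
    return e_rem + e_save_g


def usable_energy(e_reu_g, e_max, e_cur):
    """E_use(n_i) for every task in N_das.

    e_max[t]  maximum energy of task t on its assigned core
    e_cur[t]  energy of task t after Algorithm 2
    """
    e_gro = {t: e_max[t] - e_cur[t] for t in e_cur}
    total = sum(e_gro.values())
    if total == 0:  # every task already runs at maximum frequency
        return dict(e_cur)
    # Share E_reu(G) in proportion to how much each task can still grow.
    return {t: e_cur[t] + e_reu_g * e_gro[t] / total for t in e_cur}


e_reu = reusable_energy(e_given_g=100.0, e_g=90.0, e_save_g=5.0)  # 15.0
e_use = usable_energy(e_reu, {"a": 10.0, "b": 20.0}, {"a": 8.0, "b": 14.0})
# e_use == {"a": 11.75, "b": 25.25}
```

A share can exceed the maximum energy of a task, as it does for both tasks here; the frequency search that follows caps the task at its maximum frequency in that case.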
According to $E_{use}(n_i)$, we can find the new frequency $f_{pr(i),nh}$ of $n_i \in N_{das}$ by traversing the execution frequencies on processor core $u_{pr(i)}$. Then we can compute the shortened actual execution time of $n_i \in N_{das}$ as follows:
$$AET_{shor}(n_i) = w_{i,pr(i)} \times f_{pr(i),max} \times \left(\frac{1}{f_{pr(i),h}} - \frac{1}{f_{pr(i),nh}}\right).$$
Therefore, the new schedule length of the application G will be
$$SL_{new}(G) = SL(G) - \sum_{n_i \in N_{das}} AET_{shor}(n_i).$$
Building on Algorithms 1 and 2, the task execution frequency re-adjustment mechanism is shown in Algorithm 3. Lines 1 and 2 call Algorithms 1 and 2 to get the preliminary scheduling results and $E_{save}(G)$. Line 3 computes the reused energy consumption $E_{reu}(G)$. Lines 6 and 7 compute the required values for each task belonging to $N_{das}$. Lines 8–14 select the new frequency for each task belonging to $N_{das}$. Finally, Line 15 computes the new schedule length of the application G, $SL_{new}(G)$, after the frequency re-adjustment. The value of $SL_{new}(G)$ is the final minimum schedule length we get.
In general, our method includes Algorithms 1–3. Algorithm 1 implements the task energy pre-allocation strategy and the scheduling based on it, and Algorithms 2 and 3 constitute the task execution frequency re-adjustment mechanism. For a given application, we run Algorithms 1, 2 and 3 in that order to obtain the scheduling result that minimizes the schedule length of the application.
Algorithm 3. The task execution frequency re-adjustment mechanism.
Input: $G = \{N, W, C\}$, $U$, $E_{given}(G)$
Output: $SL(G)$
 1:  Call Algorithm 1 to get the preliminary scheduling results;
 2:  Call Algorithm 2 to get $E_{save}(G)$;
 3:  Compute $E_{reu}(G)$; //By (38)
 4:  for ($\forall n_i$, $n_i \in N_{das}$) do
 5:    $E(n_i) = 0$;
 6:    Compute $E_{gro}(n_i)$; //By (40)
 7:    Compute $E_{use}(n_i)$; //By (42)
 8:    for ($\forall f_{pr(i),h}$, $f_{pr(i),h} \in [f_{pr(i),low}, f_{pr(i),max}]$) do
 9:      Compute $E(n_i, u_{pr(i)}, f_{pr(i),h})$; //By (4)
 10:      if ($E(n_i, u_{pr(i)}, f_{pr(i),h}) > E_{use}(n_i)$) then
 11:        continue;
 12:      if ($E(n_i, u_{pr(i)}, f_{pr(i),h}) > E(n_i)$) then
 13:        Let $E(n_i) = E(n_i, u_{pr(i)}, f_{pr(i),h})$;
 14:        Let $f_{pr(i),nh} = f_{pr(i),h}$;
 15:    Compute $SL_{new}(G)$; //By (44)
 16:    Update $SL(G) \leftarrow SL_{new}(G)$;
 17:  return $SL(G)$.
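The frequency search in Lines 8–14 and the final schedule length update can be sketched in Python. This is an illustration under assumptions: Equation (4) is defined outside this section, so the energy model is inferred from the expression for the maximum energy combined with the execution-time scaling, and all function names are hypothetical.

```python
def energy(w, p_ind, c_ef, m, f, f_max):
    # Assumed energy model: (P_ind + C_ef * f^m) * execution time at f.
    return (p_ind + c_ef * f ** m) * w * f_max / f


def pick_frequency(w, p_ind, c_ef, m, f_low, f_max, e_use, step=0.01):
    """Frequency whose energy is the largest one not exceeding e_use.

    Returns (frequency, energy), or (None, 0.0) if no level fits.
    """
    best_f, best_e = None, 0.0
    lo, hi = round(f_low / step), round(f_max / step)
    for i in range(lo, hi + 1):
        f = i * step
        e = energy(w, p_ind, c_ef, m, f, f_max)
        if e > e_use:       # over the usable energy of this task
            continue
        if e > best_e:      # use as much of the usable energy as possible
            best_f, best_e = f, e
    return best_f, best_e


def shortened_time(w, f_max, f_old, f_new):
    # Reduction of the execution time when moving from f_old to f_new.
    return w * f_max * (1.0 / f_old - 1.0 / f_new)


def new_schedule_length(sl, shortened_times):
    # Subtract the time saved by every task in N_das.
    return sl - sum(shortened_times)


f_new, e_new = pick_frequency(10.0, 0.05, 1.0, 3.0, 0.5, 1.0, e_use=100.0)
saved = shortened_time(10.0, 1.0, 0.65, f_new)
sl_new = new_schedule_length(100.0, [saved])
```

With these sample parameters the energy grows with the frequency over the whole range, so choosing the largest admissible energy is the same as choosing the highest admissible frequency.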

5. Experiments

In this section, we compare our proposed method with four algorithms that pursue the same goal as this article: MSLECC [20], WALECC [21], EECC [22] and ISAECC [23]. The experimental platform is an AMD Ryzen 5 2500U CPU @ 2.00 GHz with 8 GB RAM (Santa Clara, CA, USA), running 64-bit Windows 10 Home Edition. The code is implemented mainly in C and scripts. The final schedule length SL(G) is the only evaluation metric for these algorithms.
The parameters of the processors and applications are as follows: $10\ \text{ms} \le w_{i,k} \le 100\ \text{ms}$, $10\ \text{ms} \le c_{i,j} \le 100\ \text{ms}$, $0.03 \le P_{k,ind} \le 0.07$, $0.8 \le C_{k,ef} \le 1.2$, $2.5 \le m_k \le 3.0$, and $f_{k,max} = 1$ GHz. The execution frequency is discrete, with a precision of 0.01 GHz. The simulated heterogeneous platform used to test the schedule length minimization problem has four processor cores.
We chose two DAG models from real-world applications to evaluate our algorithm: fast Fourier transform and Gaussian elimination.

5.1. Fast Fourier Transform Application

We first consider the fast Fourier transform (FFT). Figure 2a shows an example of the FFT parallel application with $\rho = 4$. The parameter $\rho$ represents the size of the application model. For the FFT application, the number of tasks is $|N| = (2 \times \rho - 1) + \rho \times \log_2 \rho$, where $\rho = 2^x$ and $x$ is an integer. There are four exit tasks (tasks 12, 13, 14 and 15) in the FFT graph in Figure 2a, and an FFT graph with parameter $\rho$ has $\rho$ exit tasks. To match the application model in Section 3.1, we add a dummy exit task whose execution time is 0, connect it to the $\rho$ original exit tasks, and set the corresponding communication times to 0. For example, Figure 2b shows the modified FFT parallel application with $\rho = 4$, to which a dummy exit task has been added.
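The task count formula can be evaluated with a short Python helper; the function name is hypothetical, and the count excludes the dummy exit task.

```python
def fft_task_count(rho):
    """Number of tasks in the FFT graph of size rho (rho must be 2**x)."""
    x = rho.bit_length() - 1          # log2(rho) for a power of two
    if rho < 1 or (1 << x) != rho:
        raise ValueError("rho must be a power of two")
    return (2 * rho - 1) + rho * x


print(fft_task_count(4))  # 15, matching tasks 1 to 15 in Figure 2a
```

Adding the dummy exit task described above gives one more node, so the modified graph for $\rho = 4$ has 16 nodes.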
Experiment 1. To observe the performance under different energy consumption constraints, this experiment compares the final schedule lengths of the FFT application for varying energy consumption constraints. We use the FFT application with $\rho = 32$, that is, the number of tasks is 233 ($|N| = 233$). We set the energy consumption constraint to $E_{given}(G) = E_{min}(G) + \frac{M}{233}\left(E_{max}(G) - E_{min}(G)\right)$, where $1 \le M \le 232$ and $M$ is an integer.
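The sweep over energy consumption constraints can be generated as in the following Python sketch. The function name and the `parts` parameter are hypothetical; `parts` stands for the denominator of the formula (233 in this experiment), and the sample bounds are made up.

```python
def energy_constraints(e_min_g, e_max_g, parts):
    """E_given(G) for M = 1 .. parts - 1, evenly spaced between the bounds."""
    span = e_max_g - e_min_g
    return [e_min_g + m * span / parts for m in range(1, parts)]


print(energy_constraints(100.0, 200.0, 4))  # [125.0, 150.0, 175.0]
```

Every generated value lies strictly between $E_{min}(G)$ and $E_{max}(G)$, so each constraint is feasible but never loose enough to let all tasks run at maximum frequency.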
Table 4 lists the final schedule lengths of the FFT application with $\rho = 32$ for varying $E_{given}(G)$ under all the algorithms, and Figure 3 plots them. Our algorithm has a clear advantage in schedule length $SL(G)$ over the other algorithms: it outperforms MSLECC by about 28.13%–36.35% and the newest method, ISAECC, by about 3.65%–10.48%. The results of WALECC and EECC are similar to those of ISAECC.
Further, the gaps between our method and the other algorithms increase as $E_{given}(G)$ decreases. This is because, when the energy consumption constraint is large, every task can be assigned a relatively large energy budget, so the impact of task level is smaller. Moreover, in that case most tasks belonging to $N_{das}$ have already reached the maximum frequency and cannot raise it further to shorten the schedule length. In addition, as expected, the larger $E_{given}(G)$ is, the shorter the schedule length we can obtain.
Experiment 2. To observe the algorithm performance for different numbers of tasks, this experiment compares the final schedule lengths of the FFT application for a varying number of tasks. The parameter $\rho$ is changed from 8 to 256. To get relatively clear results, we set the energy consumption constraint to a relatively small value: $E_{given}(G) = E_{min}(G) + \frac{E_{max}(G) - E_{min}(G)}{8}$.
Table 5 lists the results of the FFT applications for different numbers of tasks under all the algorithms, and Figure 4 plots them. The results show that our method performs better than the other algorithms: it outperforms MSLECC by about 21.24%–31.10% and ISAECC by about 4.80%–7.32%. The results also show that the more tasks there are, the better our method performs. This is because FFT graphs have more levels as the number of tasks increases, so our new energy pre-allocation strategy becomes more advantageous.
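The improvement ranges quoted for Experiment 2 can be recomputed from the values in Table 5. The Python sketch below assumes that improvement is measured as the relative reduction in schedule length with respect to the baseline; the variable and function names are hypothetical.

```python
# Table 5: rho -> (MSLECC, WALECC, EECC, ISAECC, our method)
TABLE_5 = {
    8: (805, 654, 679, 666, 634),
    16: (1619, 1331, 1358, 1294, 1227),
    32: (3664, 2960, 2777, 2814, 2641),
    64: (8433, 6340, 6314, 6326, 5922),
    128: (18954, 13892, 13975, 14131, 13168),
    256: (43524, 31982, 32814, 32354, 29987),
}


def improvement(baseline, ours):
    """Relative reduction of the schedule length, in percent."""
    return 100.0 * (baseline - ours) / baseline


vs_mslecc = [improvement(row[0], row[4]) for row in TABLE_5.values()]
vs_isaecc = [improvement(row[3], row[4]) for row in TABLE_5.values()]

print(f"{min(vs_mslecc):.2f}% to {max(vs_mslecc):.2f}%")  # 21.24% to 31.10%
print(f"{min(vs_isaecc):.2f}% to {max(vs_isaecc):.2f}%")  # 4.80% to 7.32%
```

Both ranges grow monotonically with $\rho$ in this table, which is consistent with the observation that the method performs better as the number of tasks increases.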

5.2. Gaussian Elimination Application

Similarly, for the Gaussian elimination (GE) application, we define $\rho$ as the size of the application, and the number of tasks is $|N| = \frac{\rho^2 + \rho - 2}{2}$. Figure 5 shows the GE application model with $\rho = 5$.
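As a quick check of the formula, the following Python helper (a hypothetical name) computes the number of tasks of a GE graph of size $\rho$.

```python
def ge_task_count(rho):
    """Number of tasks in the Gaussian elimination graph of size rho."""
    # rho**2 + rho is always even, so integer division is exact.
    return (rho * rho + rho - 2) // 2


print(ge_task_count(5))   # 14
print(ge_task_count(21))  # 230
```

The second value matches the 230 tasks used for the GE experiments with $\rho = 21$.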
Experiment 3. This experiment compares the final schedule lengths of the GE application for varying energy consumption constraints. We use the GE application with $\rho = 21$, that is, the number of tasks is 230 ($|N| = 230$). We set the energy consumption constraint to $E_{given}(G) = E_{min}(G) + \frac{M}{230}\left(E_{max}(G) - E_{min}(G)\right)$, where $1 \le M \le 229$ and $M$ is an integer.
Table 6 lists the final schedule lengths of the GE application with $\rho = 21$ for varying $E_{given}(G)$ under all the algorithms, and Figure 6 plots them. Our method still performs better than the other algorithms: it outperforms MSLECC by about 14.69%–34.55% and ISAECC by about 3.94%–9.60%. However, the improvement is less pronounced than for the FFT models. This is because the level structure of the GE application is not as strict as that of the FFT application, so the effect of our method is diminished.
Experiment 4. This experiment compares the final schedule lengths of the GE application for a varying number of tasks. We set $E_{given}(G) = E_{min}(G) + \frac{E_{max}(G) - E_{min}(G)}{4}$ and change the number of tasks.
Table 7 lists the final schedule lengths of the GE application for a varying number of tasks under all the algorithms, and Figure 7 plots them. Our method again yields shorter schedule lengths than the other algorithms: it outperforms MSLECC by 16.00%–28.85% and ISAECC by 3.36%–10.98%.

5.3. Analysis and Summary of Experimental Results

The experimental results show that our method performs better than the other algorithms. However, when $E_{given}(G)$ becomes larger or the number of tasks is relatively small, the advantage of our method is not particularly obvious. The former is because every task can be allocated a relatively large energy budget when $E_{given}(G)$ is large, so the impact of task level is smaller. The latter is because, when the number of tasks is small, the experimental results are more subject to chance and do not reflect the general trend. We can also see that our method performs better on FFT models than on GE models with similar $E_{given}(G)$ and numbers of tasks. This is because the level structure of the GE application is not as strict as that of the FFT application, so the effect of our method is diminished. In summary, our method is more applicable when the energy constraints are stringent, the number of tasks is large, or the level structure of the application is strict.

6. Conclusions

In this article, we propose a novel method to minimize the schedule length of energy-constrained applications that run on a heterogeneous multi-core system. Our method consists of two main parts: a novel task energy pre-allocation strategy together with the scheduling algorithm based on it, and a re-adjustment mechanism for the task execution frequency after preliminary scheduling. The core idea of our method is that tasks in different levels have different impacts on the whole application and that the negative impact of locally optimal scheduling should be reduced. Our method can be integrated into actual multi-core embedded systems, and it is particularly suitable for wearable devices, mobile robots and other products with high requirements for energy saving and performance. We carried out a considerable number of experiments with two practical parallel application models (FFT and GE). The results show that our method is generally superior to the existing algorithms. However, the results also reveal the limitations of our method: it does not offer much of an advantage when the energy constraints are not stringent, the number of tasks is small, or the level structure of the application is not strict.
In the future, we will improve and extend our method, and some further studies will be done. The points that can be further studied are as follows:
  • Consider other factors that affect the length of application scheduling, such as the way to determine the priority of tasks.
  • Explore ways to improve the limitations of our method and make it more universal.
  • Integrate our method into an actual embedded multi-core system and test its performance.
  • Extend our method to study other indicators in multi-core task scheduling, such as reliability.

Author Contributions

Conceptualization, M.J. and X.J.; methodology, K.H., M.J. and X.J.; software, M.J.; validation, K.H., M.J. and S.C.; formal analysis, M.J. and X.J.; investigation, M.J., S.C., X.L. and W.T.; resources, K.H., X.J. and Z.L.; data curation, M.J., X.L. and W.T.; writing—original draft preparation, M.J.; writing, review and editing, K.H., M.J., X.J. and D.X.; visualization, K.H.; supervision, K.H. and X.J.; project administration and funding acquisition, K.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program of China (2020YFB0906000, 2020YFB0906001).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Weiser, M.; Welch, B.; Demers, A.; Shenker, S. Scheduling for Reduced CPU Energy; Springer: Boston, MA, USA, 1994.
  2. Li, K. Scheduling Precedence Constrained Tasks with Reduced Processor Energy on Multiprocessor Computers. IEEE Trans. Comput. 2012, 61, 1668–1681.
  3. Xie, G.; Zeng, G.; Li, R.; Li, K. Energy-Aware Processor Merging Algorithms for Deadline Constrained Parallel Applications in Heterogeneous Cloud Computing. IEEE Trans. Sustain. Comput. 2017, 2, 62–75.
  4. Kang, L.; Wang, Z.J.; Quan, Z.; Wu, W.; Guo, S.; Li, K.; Li, K. An Efficient Method for Optimizing PETSc on the Sunway TaihuLight System. In Proceedings of the 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), Guangzhou, China, 8–12 October 2018.
  5. Islam, M.A.; Ren, S.; Mahmud, A.H.; Quan, G. Online Energy Budgeting for Cost Minimization in Virtualized Data Center. IEEE Trans. Serv. Comput. 2016, 9, 421–432.
  6. Zhang, J.; Wang, Z.J.; Quan, Z.; Yin, J.; Chen, Y.; Guo, M. Optimizing power consumption of mobile devices for video streaming over 4G LTE networks. Peer Peer Netw. Appl. 2017, 11, 1101–1114.
  7. Ranjan, R.; Wang, L.; Zomaya, A.Y.; Georgakopoulos, D.; Sun, X.H.; Wang, G. Recent advances in autonomic provisioning of big data applications on clouds. IEEE Trans. Cloud Comput. 2015, 3, 101–104.
  8. Xu, C.; Wang, K.; Li, P.; Guo, S.; Luo, J.; Ye, B.; Guo, M. Making Big Data Open in Edges: A Resource-Efficient Blockchain-Based Approach. IEEE Trans. Parallel Distrib. Syst. 2019, 30, 870–882.
  9. Jin, P.; Hao, X.; Wang, X.; Yue, L. Energy-Efficient Task Scheduling for CPU-Intensive Streaming Jobs on Hadoop. IEEE Trans. Parallel Distrib. Syst. 2018, 30, 1298–1311.
  10. Choi, Y.; Rhu, M. PREMA: A Predictive Multi-Task Scheduling Algorithm for Preemptible Neural Processing Units. In Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), San Diego, CA, USA, 22–26 February 2020.
  11. Xu, C.; Wang, K.; Guo, M. Intelligent Resource Management in Blockchain-Based Cloud Datacenters. IEEE Cloud Comput. 2018, 4, 50–59.
  12. Fu, Z.; Tang, Z.; Yang, L.; Liu, C. An Optimal Locality-Aware Task Scheduling Algorithm Based on Bipartite Graph Modelling for Spark Applications. IEEE Trans. Parallel Distrib. Syst. 2020, 31, 2406–2420.
  13. Xie, G.; Zeng, G.; Liu, Y.; Zhou, J.; Li, R.; Li, K. Fast Functional Safety Verification for Distributed Automotive Applications during Early Design Phase. IEEE Trans. Ind. Electron. 2017, 65, 4378–4391.
  14. Cho, H.; Kim, C.; Sun, J.; Easwaran, A.; Park, J.D.; Choi, B.C. Scheduling Parallel Real-Time Tasks on the Minimum Number of Processors. IEEE Trans. Parallel Distrib. Syst. 2019, 31, 171–186.
  15. Gharehbaghi, K.; Koçer, F.; Külah, H. Optimization of Power Conversion Efficiency in Threshold Self-Compensated UHF Rectifiers with Charge Conservation Principle. IEEE Trans. Circuits Syst. I Regul. Pap. 2017, 64, 2380–2387.
  16. Sandoval, F.; Poitau, G.; Gagnon, F. Hybrid Peak-to-Average Power Ratio Reduction Techniques: Review and Performance Comparison. IEEE Access 2017, 5, 27145–27161.
  17. Chen, C.Y. An Improved Approximation for Scheduling Malleable Tasks with Precedence Constraints via Iterative Method. IEEE Trans. Parallel Distrib. Syst. 2018, 29, 1937–1946.
  18. Huang, K.; Zhang, X.; Zheng, D.; Yu, M.; Jiang, X.; Yan, X.; de Brisolara, L.B.; Jerraya, A.A. A Scalable and Adaptable ILP-Based Approach for Task Mapping on MPSoC Considering Load Balance and Communication Optimization. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2018, 38, 1744–1757.
  19. Zahaf, H.E.; Lipari, G.; Bertogna, M.; Boulet, P. The Parallel Multi-Mode Digraph Task Model for Energy-Aware Real-Time Heterogeneous Multi-Core Systems. IEEE Trans. Comput. 2019, 68, 1511–1524.
  20. Xiao, X.; Xie, G.; Li, R.; Li, K. Minimizing Schedule Length of Energy Consumption Constrained Parallel Applications on Heterogeneous Distributed Systems. In Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, 23–26 August 2016.
  21. Hu, F.; Quan, X.; Lu, C. A Schedule Method for Parallel Applications on Heterogeneous Distributed Systems with Energy Consumption Constraint. In Proceedings of the 3rd International Conference on Multimedia Systems and Signal Processing, Shenzhen, China, 28 April 2018; pp. 134–141.
  22. Li, J.; Xie, G.; Li, K.; Tang, Z. Enhanced Parallel Application Scheduling Algorithm with Energy Consumption Constraint in Heterogeneous Distributed Systems. J. Circuits Syst. Comput. 2019, 28, 1950190.
  23. Quan, Z.; Wang, Z.J.; Ye, T.; Guo, S. Task Scheduling for Energy Consumption Constrained Parallel Applications on Heterogeneous Computing Systems. IEEE Trans. Parallel Distrib. Syst. 2019, 31, 1165–1182.
  24. Ren, J.; Su, X.; Xie, G.; Yu, C.; Tan, G.; Wu, G. Workload-Aware Harmonic Partitioned Scheduling of Periodic Real-Time Tasks with Constrained Deadlines. In Proceedings of the 2019 56th ACM/IEEE Design Automation Conference (DAC), Las Vegas, NV, USA, 2–6 June 2019.
  25. Rapp, M.; Sagi, M.; Pathania, A.; Herkersdorf, A.; Henkel, J. Power- and Cache-Aware Task Mapping with Dynamic Power Budgeting for Many-Cores. IEEE Trans. Comput. 2019, 69, 1–13.
  26. Li, K. Power and performance management for parallel computations in clouds and data centers. J. Comput. Syst. Sci. 2016, 82, 174–190.
  27. Ye, T.; Wang, Z.J.; Quan, Z.; Guo, S.; Li, K.; Li, K. ISAECC: An Improved Scheduling Approach for Energy Consumption Constrained Parallel Applications on Heterogeneous Distributed Systems. In Proceedings of the 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), Singapore, 11–13 December 2018.
  28. Xie, G.; Jiang, J.; Liu, Y.; Li, R.; Li, K. Minimizing Energy Consumption of Real-Time Parallel Applications using Downward and Upward Approaches on Heterogeneous Systems. IEEE Trans. Ind. Inform. 2017, 13, 1068–1078.
  29. Li, K. Performance Analysis of Power-Aware Task Scheduling Algorithms on Multiprocessor Computers with Dynamic Voltage and Speed. IEEE Trans. Parallel Distrib. Syst. 2008, 19, 1484–1497.
  30. Li, K.; Tang, X.; Li, K. Energy-Efficient Stochastic Task Scheduling on Heterogeneous Computing Systems. IEEE Trans. Parallel Distrib. Syst. 2014, 25, 2867–2876.
  31. Rusu, C. Maximizing rewards for real-time applications with energy constraints. ACM Trans. Embed. Comput. Syst. 2003, 2, 537–559.
  32. Tang, Z.; Qi, L.; Cheng, Z.; Li, K.; Khan, S.U.; Li, K. An Energy-Efficient Task Scheduling Algorithm in DVFS-enabled Cloud Environment. J. Grid Comput. 2016, 14, 55–74.
  33. Huang, Q.; Su, S.; Li, J.; Xu, P.; Shuang, K.; Huang, X. Enhanced energy-efficient scheduling for parallel applications in cloud. In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), Ottawa, ON, Canada, 13–16 May 2012; pp. 781–786.
  34. Meng, J.; Tan, H.; Li, X.Y.; Han, Z.; Li, B. Online Deadline-Aware Task Dispatching and Scheduling in Edge Computing. IEEE Trans. Parallel Distrib. Syst. 2020, 31, 1270–1286.
  35. He, Q.; Guan, N.; Guo, Z. Intra-Task Priority Assignment in Real-Time Scheduling of DAG Tasks on Multi-Cores. IEEE Trans. Parallel Distrib. Syst. 2019, 30, 2283–2295.
  36. Lin, X.; Wang, Y.; Xie, Q.; Pedram, M. Task Scheduling with Dynamic Voltage and Frequency Scaling for Energy Minimization in the Mobile Cloud Computing Environment. IEEE Trans. Serv. Comput. 2015, 8, 175–186.
  37. Xie, G.; Chen, Y.; Liu, Y.; Wei, Y.; Li, R.; Li, K. Resource Consumption Cost Minimization of Reliable Parallel Applications on Heterogeneous Embedded Systems. IEEE Trans. Ind. Inform. 2017, 13, 1629–1640.
  38. Anselmi, J.; Doncel, J. Asymptotically Optimal Size-Interval Task Assignments. IEEE Trans. Parallel Distrib. Syst. 2019, 30, 2422–2433.
  39. Cheng, D.; Zhou, X.; Lama, P.; Ji, M.; Jiang, C. Energy Efficiency Aware Task Assignment with DVFS in Heterogeneous Hadoop Clusters. IEEE Trans. Parallel Distrib. Syst. 2017, 29, 70–82.
  40. Canon, L.C.; Marchal, L.; Simon, B.; Vivien, F. Online Scheduling of Task Graphs on Heterogeneous Platforms. IEEE Trans. Parallel Distrib. Syst. 2019, 31, 721–732.
Figure 1. An example of a directed acyclic graph (DAG)-based parallel application with ten tasks.
Figure 2. Fast Fourier transform (FFT) application model with $\rho = 4$: (a) original FFT application model; (b) FFT application model with a dummy exit task.
Figure 3. Final schedule length of FFT application for varying $E_{given}(G)$.
Figure 4. Final schedule length of FFT application for varying number of tasks.
Figure 5. Gaussian elimination (GE) application model with $\rho = 5$.
Figure 6. Final schedule length of GE application for varying $E_{given}(G)$.
Figure 7. Final schedule length of GE application for varying number of tasks.
Table 1. The main notations we use.
| Notation | Description |
| --- | --- |
| $w_{i,k}$ | Execution time of task $n_i$ on the processor core $u_k$ with the maximum frequency |
| $c_{i,j}$ | Communication time from $n_i$ to $n_j$ |
| $pred(n_i)$ | The set of direct predecessor tasks of task $n_i$ |
| $succ(n_i)$ | The set of direct successor tasks of task $n_i$ |
| $n_{entry}$ | Entry task of an application |
| $n_{exit}$ | Exit task of an application |
| $E(n_i, u_k, f_{k,h})$ | The energy consumption of the task $n_i$ on the processor core $u_k$ with the frequency $f_{k,h}$ |
| $EST(n_i, u_k)$ | The earliest start time of task $n_i$ running on processor core $u_k$ |
| $EFT(n_i, u_k, f_{k,h})$ | The earliest finish time of task $n_i$ running on processor core $u_k$ with frequency $f_{k,h}$ |
| $AST(n_i)$ | The actual start time of task $n_i$ |
| $AET(n_i)$ | The actual execution time of task $n_i$ |
| $AFT(n_i)$ | The actual finish time of task $n_i$ |
| $LFT(n_i)$ | The latest finish time of task $n_i$ |
| $L(n_i)$ | The level of task $n_i$ |
| $u_{pr(i)}$ | The processor core allocated to task $n_i$ |
| $f_{pr(i),hz(i)}$ | The execution frequency allocated to task $n_i$ on processor core $u_{pr(i)}$ |
| $E_{given}(n_i)$ | The calculated energy consumption constraint of task $n_i$ |
| $E_{pre}(n_i)$ | The pre-allocated energy consumption of task $n_i$ |
| $E_{given}(G)$ | The given energy consumption constraint of application G |
| $E(G)$ | The energy consumption of application G |
| $SL(G)$ | The schedule length of application G |
Table 2. Execution time of tasks on different processors with the maximum frequency of the application in Figure 1.
| Core \ Task | $n_1$ | $n_2$ | $n_3$ | $n_4$ | $n_5$ | $n_6$ | $n_7$ | $n_8$ | $n_9$ | $n_{10}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $u_1$ | 14 | 13 | 11 | 13 | 12 | 13 | 7 | 5 | 18 | 21 |
| $u_2$ | 16 | 19 | 13 | 8 | 13 | 16 | 15 | 11 | 12 | 7 |
| $u_3$ | 9 | 18 | 19 | 17 | 10 | 9 | 11 | 14 | 20 | 16 |
Table 3. Upward rank values for tasks of the application in Figure 1.
| Task | $n_1$ | $n_2$ | $n_3$ | $n_4$ | $n_5$ | $n_6$ | $n_7$ | $n_8$ | $n_9$ | $n_{10}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $rank_u$ | 108 | 77 | 80 | 80 | 69 | 63.33 | 42.67 | 35.67 | 44.33 | 14.67 |
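The upward rank values in Table 3 can be reproduced from the execution times in Table 2 with the standard HEFT-style recursion, $rank_u(n_i) = \overline{w_i} + \max_{n_j \in succ(n_i)} \left( c_{i,j} + rank_u(n_j) \right)$, where $\overline{w_i}$ is the mean of $w_{i,k}$ over the cores. The following is a minimal sketch under stated assumptions, not the authors' implementation. The edge set and communication times in `C` are not listed in this section; they are assumed from the widely used ten-task HEFT example graph, which Figure 1 appears to follow because the resulting ranks agree with Table 3. The names `W`, `C`, `successors` and `rank_u` are illustrative.

```python
from functools import lru_cache

# Execution times at maximum frequency from Table 2: task -> (u1, u2, u3).
W = {
    1: (14, 16, 9), 2: (13, 19, 18), 3: (11, 13, 19), 4: (13, 8, 17),
    5: (12, 13, 10), 6: (13, 16, 9), 7: (7, 15, 11), 8: (5, 11, 14),
    9: (18, 12, 20), 10: (21, 7, 16),
}

# ASSUMPTION: edges and communication times c[i, j] are not given in this
# section; these values come from the classic ten-task HEFT example DAG.
C = {
    (1, 2): 18, (1, 3): 12, (1, 4): 9, (1, 5): 11, (1, 6): 14,
    (2, 8): 19, (2, 9): 16, (3, 7): 23, (4, 8): 27, (4, 9): 23,
    (5, 9): 13, (6, 8): 15, (7, 10): 17, (8, 10): 11, (9, 10): 13,
}


def successors(i):
    """Direct successor tasks succ(n_i) under the assumed edge set."""
    return [j for (a, j) in C if a == i]


@lru_cache(maxsize=None)
def rank_u(i):
    """Mean execution time plus the longest (communication + rank) tail."""
    avg_w = sum(W[i]) / len(W[i])
    tail = max((C[i, j] + rank_u(j) for j in successors(i)), default=0.0)
    return avg_w + tail


# Rounded to two decimals for comparison with Table 3.
ranks = {i: round(rank_u(i), 2) for i in W}
print(ranks)
```

The exit task has no successors, so its rank is simply its mean execution time; every other rank is built bottom-up from that value, which is why the memoised recursion is sufficient here.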
Table 4. Results of FFT applications with $\rho = 32$ for varying $E_{given}(G)$.
Schedule length $SL(G)$ by method:
| $E_{given}(G)$ | MSLECC | WALECC | EECC | ISAECC | Our method |
| --- | --- | --- | --- | --- | --- |
| 2086 | 10,989 | 8023 | 7955 | 7814 | 6846 |
| 2714 | 5915 | 4491 | 4564 | 4417 | 3887 |
| 3342 | 4644 | 3302 | 3487 | 3390 | 3108 |
| 4048 | 4052 | 2918 | 2937 | 2966 | 2779 |
| 5226 | 3381 | 2546 | 2609 | 2484 | 2388 |
| 6168 | 2947 | 2253 | 2315 | 2273 | 2190 |
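To read the comparison in Table 4 more directly, the schedule lengths can be converted into relative reductions achieved by our method over each baseline. The snippet below is a small illustrative sketch that only re-expresses the tabulated values; the names `METHODS`, `TABLE4` and `reduction_vs_baselines` are hypothetical and not part of the original work.

```python
# Schedule lengths SL(G) from Table 4 (FFT, rho = 32), keyed by E_given(G).
METHODS = ("MSLECC", "WALECC", "EECC", "ISAECC", "Ours")
TABLE4 = {
    2086: (10989, 8023, 7955, 7814, 6846),
    2714: (5915, 4491, 4564, 4417, 3887),
    3342: (4644, 3302, 3487, 3390, 3108),
    4048: (4052, 2918, 2937, 2966, 2779),
    5226: (3381, 2546, 2609, 2484, 2388),
    6168: (2947, 2253, 2315, 2273, 2190),
}


def reduction_vs_baselines(row):
    """Return {baseline: fractional SL(G) reduction of 'Ours' vs. that baseline}."""
    ours = row[-1]
    return {m: (sl - ours) / sl for m, sl in zip(METHODS[:-1], row[:-1])}


for e_given, row in TABLE4.items():
    reductions = reduction_vs_baselines(row)
    print(e_given, {m: f"{100 * r:.1f}%" for m, r in reductions.items()})
```

A positive fraction means our method produced a shorter schedule than the baseline at that energy constraint.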
Table 5. Results of FFT applications for varying number of tasks.
Schedule length $SL(G)$ by method:
| $\rho$ | MSLECC | WALECC | EECC | ISAECC | Our method |
| --- | --- | --- | --- | --- | --- |
| 8 | 805 | 654 | 679 | 666 | 634 |
| 16 | 1619 | 1331 | 1358 | 1294 | 1227 |
| 32 | 3664 | 2960 | 2777 | 2814 | 2641 |
| 64 | 8433 | 6340 | 6314 | 6326 | 5922 |
| 128 | 18,954 | 13,892 | 13,975 | 14,131 | 13,168 |
| 256 | 43,524 | 31,982 | 32,814 | 32,354 | 29,987 |
Table 6. Results of GE applications with $\rho = 21$ for varying $E_{given}(G)$.
Schedule length $SL(G)$ by method:
| $E_{given}(G)$ | MSLECC | WALECC | EECC | ISAECC | Our method |
| --- | --- | --- | --- | --- | --- |
| 2436 | 6826 | 6144 | 5912 | 5973 | 5692 |
| 2763 | 6155 | 5389 | 5603 | 5398 | 5128 |
| 3665 | 5519 | 4457 | 4521 | 4324 | 4119 |
| 6535 | 4721 | 3776 | 3502 | 3418 | 3090 |
| 7191 | 3864 | 3093 | 3214 | 3186 | 3043 |
| 8749 | 3397 | 3012 | 3129 | 3017 | 2898 |
Table 7. Results of GE applications for varying number of tasks.
Schedule length $SL(G)$ by method:
| $\rho$ | MSLECC | WALECC | EECC | ISAECC | Our method |
| --- | --- | --- | --- | --- | --- |
| 8 | 1056 | 897 | 911 | 893 | 847 |
| 13 | 2185 | 1794 | 1805 | 1737 | 1599 |
| 21 | 3698 | 3269 | 3082 | 3214 | 3106 |
| 41 | 18,845 | 14,504 | 14,729 | 14,563 | 13,534 |
| 83 | 39,584 | 31,375 | 31,551 | 31,422 | 30,032 |
| 100 | 59,951 | 46,210 | 48,238 | 47,916 | 42,657 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Huang, K.; Jing, M.; Jiang, X.; Chen, S.; Li, X.; Tao, W.; Xiong, D.; Liu, Z. Task-Level Aware Scheduling of Energy-Constrained Applications on Heterogeneous Multi-Core System. Electronics 2020, 9, 2077. https://doi.org/10.3390/electronics9122077