1. Introduction
The use of embedded systems is nowadays spreading at an increasing pace to all aspects of modern life, as well as to all phases of industrial production. The processing capability of multicore systems permits running multiple embedded applications on a single shared hardware platform. Nevertheless, multicore systems add many sources of indeterminism, leading to a number of execution delays. These sources of indeterminism mainly involve shared hardware resources, such as buses, caches, and memories. It is necessary to analyse the temporal model in the context of a multicore system and not just when it is running without contenders [1]. Specifically, interference appears when cores contend for these shared resources. In addition, in highly critical systems, the static plan must take the interference into account; otherwise, the system's feasibility may be jeopardized.
When this interference is taken into account, the model and the schedulability analysis are often limited to a particular type of hardware resource, and even to a particular type of memory.
Recently, in [2], a new task model was proposed that takes into account the interference of shared hardware resources in the context of multicore hard real-time systems. This model is general for any kind of hardware resource. However, there is no schedulability test for this new task model. The interference delay can be large and highly variable, posing a major challenge to the schedulability of real-time critical multicore systems.
Contribution: This paper proposes a schedulability test for multicore real-time systems under the model presented in Ref. [2], which did not include any schedulability analysis.
An allocation algorithm to minimise this interference is also proposed and compared with existing allocators. Our proposal is valid for both fixed and dynamic priorities.
The novelty of the contribution is the consideration of dynamic priority scheduling in a model that accounts for interference due to shared hardware resources. We use a general model that can be applied to different types of shared hardware resources, in contrast with other works that assume a very specific kind of resource. Other works assume only fixed priorities, or interference models that are only valid for a specific type of shared resource [2].
The paper is organized as follows: Section 2 presents the relevant works on partitioned multicore scheduling. Section 3 presents the system model. Section 4 presents the contention aware utilisation factor that is the basis for the schedulability analysis in Section 5. Section 6 proposes the allocation algorithm that minimises the interference. The evaluation of the proposal is presented in Section 7, while the conclusions and further work are given in Section 8.
2. Related Works
There is extensive research on real-time multicore scheduling; one of the most relevant surveys in this area is [3]. In multicore scheduling there are two main branches, depending on the criticality required: partitioned and global scheduling. Since our scope of application is hard real-time, from now on we will assume partitioned scheduling. Partitioned multicore scheduling involves two phases: task-to-core allocation and task scheduling on each core.
The allocation of tasks to cores can be solved with bin packing techniques. This problem is known to be NP-hard in the strong sense [4]. Some of the most popular heuristics, cited in Refs. [5,6], are Worst Fit (WF), First Fit (FF), and Best Fit (BF). Coffman et al. [7] describe many of these algorithms in detail.
It is important to note that the previous allocators do not take into account the unpredictability produced by shared hardware resources. In Ref. [8], a comprehensive analysis of all possible sources of indeterminism is presented. These sources are classified into primary and secondary: primary sources include caches, memory, the FSB, and the memory controller, while secondary sources include power-saving strategies, hardware prefetching, system management interrupts, and the translation look-aside buffer.
Additionally, in [1], a state of the art on contention delays is presented. This topic is also analysed in [9], where the sources of timing interference in single-core, multicore, and distributed systems are presented. As stated in that paper, memory interference can render a system infeasible. It is shown in [10] that memory interference can cause a worst-case response time increase of almost 300%, even for tasks that spend only 10% of their time fetching memory, on an eight-core system.
Some works reduce interference delays by using modified task models. In [11], a new model called PREM (predictable execution model) is proposed, which divides a task into two phases: communication and execution. A similar technique is used in [12], which schedules the system with the goal of minimising the makespan by letting the scheduler decide when it is appropriate to avoid interference. For DAG task models, Ref. [13] proposes a scheduling method that applies the LET (Logical Execution Time) paradigm and considers the communication timing between nodes to reduce the interference delay due to shared hardware resources.
Other works, such as Ref. [14], propose a feedback control scheme where critical and non-critical tasks are separated and assigned to different partitions to ensure the execution of the critical tasks; a hypervisor manages the multicore hardware in order to limit memory bus accesses in non-critical cores, measured with Performance Monitor Counters. In Ref. [15], the authors formulate the analysis of memory contention as an optimisation problem that tries to minimise memory interference. They split tasks into three phases and consider multiple memory transactions issued during each phase. Other approaches, such as the one presented in Ref. [16], reduce contention using synchronisation-based interference models and specific memory allocation schemes.
The survey in Ref. [17] provides an overview of timing verification techniques for multicore real-time systems up to 2018. This survey considers single shared resources (memory bus, shared cache, DRAM) as well as multiple resources, which is what this work focuses on. The most relevant works come from the Multicore Response Time Analysis framework in Ref. [18], which provides a general approach to timing verification for multicore systems. They omit the notion of worst-case execution time (WCET) and instead directly target the calculation of task response times through execution traces. They start from a given mapping of tasks to cores and assume fixed-priority preemptive scheduling.
Other works cited in the survey, such as Refs. [19,20], consider the amount of time for shared resource accesses and the maximum number of access segments, which is out of the scope of this work.
Regarding mapping and scheduling, the survey groups several works by the techniques used: bin-packing algorithms, genetic algorithms, ILP and MILP (Integer/Mixed Integer Linear Programming), etc. The ILP and MILP techniques consider scratch-pad memories with different objectives: to minimise the initiation intervals of task graphs, to minimise the WCETs, to minimise the worst-case response time (WCRT) of task graphs, etc. Since these techniques are not scalable to large numbers of cores, the authors also present scalable heuristics.
Some works introduce interference due to memory contention as a new parameter of the temporal model. In Ref. [21], the WCRA (Worst Case number of shared Resource Accesses) is defined and added to the WCET. In the same way, Ref. [22] proposes the concept of interference-sensitive Worst-Case Execution Time (isWCET). A dynamic approach is presented in Ref. [23] so that, depending on the progress of each kernel, the dependencies of the isWCET schedules are reduced or completely eliminated. The concept of isWCET is similar to our work, but those proposals are centred on minimising the effect of interference with new scheduling methods, while our work focuses on allocation algorithms and schedulability conditions. In Ref. [24], memory interference is analysed in a similar way to our work, as it is also represented as a parameter of the temporal model. However, that work only considers the interference produced by the DRAM memory, while our proposal is agnostic with respect to the shared hardware resource used. It also considers fixed-priority scheduling, while our proposal considers dynamic priorities.
The work in Ref. [25] provides the Multicore Resource Stress and Sensitivity (MRSS) task model, which characterises how much stress each task places on resources and its sensitivity to such resource stress. This work considers different types of interference (limited, direct, and indirect) and fixed-priority scheduling policies. Unlike in our work, the task-to-core allocation there is known a priori.
In Ref. [2], the interference due to contention is added to the temporal model. Instead of adding it to the WCET, the authors propose a scheduling algorithm that computes the exact value of the interference and an allocator that tries to reduce the total interference.
In Ref. [26], a partitioned scheduling approach that considers interference while making partitioning and scheduling decisions is presented. The authors present a mixed integer linear programming model to obtain the optimal solution, but with a high computation cost, so they also propose approximation algorithms. They only consider implicit-deadline models. The paper differentiates between the isolated WCET and the WCET with interference and overhead, and defines an inter-task interference matrix in which each element is the interference utilisation between two tasks, considering the inflated WCET when the two tasks run together. This work is similar to Ref. [2], but Ref. [2] considers a general model, valid for any type of shared hardware resource, while Ref. [26] only considers interference due to cache sharing.
Our work continues the research of Ref. [2], so we use the same temporal model to provide a schedulability bound. Our proposal not only provides a schedulability analysis for both fixed and dynamic priorities with a general interference model, but also a new allocation algorithm that minimises this bound.
3. Problem Definition and Task Model
The aim of this work is to provide a schedulability condition and an interference-conscious allocator for the task model presented below. This task model is the same as the one presented in Ref. [2].
We suppose a multicore system with $m$ cores ($P_1, P_2, \ldots, P_m$) where a task set $\tau$ of $n$ independent tasks should be allocated. Each task $\tau_i$ is represented by the tuple:

$$\tau_i = (C_i, T_i, I_i)$$

where $C_i$ is the WCET, $T_i$ is the period, and $I_i$ is the interference. We assume implicit deadlines, so the deadlines are equal to the periods. When we refer to $P_{\tau_i}$, we mean the core to which $\tau_i$ is allocated.

The hyperperiod of the task set, $H$, is defined as the least common multiple (lcm) of the periods of all the periodic tasks:

$$H = \mathrm{lcm}(T_1, T_2, \ldots, T_n)$$

We define $A_i$ as the number of activations that $\tau_i$ has throughout $H$:

$$A_i = \frac{H}{T_i} \qquad (4)$$
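For illustration purposes only, these two quantities translate directly into code. The following Python sketch uses a list-of-tuples task representation and invented parameter values, both of which are our assumptions rather than part of the model:

```python
from math import lcm

# Each task is assumed to be a tuple (C, T, I): WCET, period, interference.
# The values below are invented for the example.
tasks = [(2, 6, 1), (3, 12, 1), (1, 8, 0)]

# Hyperperiod H: least common multiple of all periods.
H = lcm(*(T for (_, T, _) in tasks))

# A_i: number of activations of task i within a hyperperiod (Equation (4)).
A = [H // T for (_, T, _) in tasks]

print(H, A)  # 24 [4, 2, 3]
```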
Since the goal of this paper is to obtain a schedulability test, we will assume a synchronous task system.
To visualize the concept of $I_i$, we can observe Figure 1, where the computation time ($C_i$) of task $\tau_i$ is represented. As part of this computation time, the time in which the task performs read and/or write operations in memory has been differentiated (depicted with diagonal lines), as an example of access to a shared hardware resource. It is possible that when the task wants to perform one of these r/w operations, the requested resource is busy; or, the other way around, that while the task is accessing this hardware resource, other tasks are blocked from accessing it. In any case, this time is the time considered as interference caused to other tasks on other cores. In general, the term $I_i$ is the time the task takes to access shared hardware resources. Although $I_i$ is part of $C_i$, during the time the task is accessing the shared resource, other tasks on other cores will be delayed. So this interference time is defined independently of $C_i$, as it will be used to represent the delay caused to other tasks. Although the parameter $I_i$ is defined in [2], it is worth describing it in detail by means of an example.
Example of task set execution with interference: We have a task set composed of three tasks and three cores. Every task is allocated to a different core. $\tau_1$ and $\tau_2$ request access to the same shared hardware resource, as $I_1, I_2 \geq 1$. However, $\tau_3$ will not be affected by the interference of the other tasks, as $I_3 = 0$. The scheduling of this system is represented in Figure 2.
As we can observe in Figure 2, on the one hand, task $\tau_3$ does not receive any interference from the other tasks, so its execution time is not affected by the contention. On the other hand, $\tau_1$ and $\tau_2$ suffer an increase in their execution time. At the beginning, both tasks ($\tau_1$ and $\tau_2$) suffer a delay of 1 unit because of the interference between them. At $t = 6$, one of them is released again and coincides with the first job of the other. At that moment, each task provokes a unit of interference in the execution of the other task. We represent the interference as unit times in unified blocks, positioned at the beginning of the execution in order to facilitate the calculation, but always preserving the real magnitude of the contention.
Once the model has been defined, we will now present some parameters necessary for the development of the following sections.
From Figure 2, it can be deduced that, if contention is considered, the total utilisation of a task depends not only on its computation time and period, but also on the interference received from other tasks.

We consider that the interference parameter refers to a single type of hardware resource. If there are several types of hardware resources, the model would have to be extended with as many interference parameters as there are hardware resources. Using a single parameter $I$ for all interference types would result in an even more pessimistic model.
It should be taken into account that a task $\tau_j$ will cause interference to another task $\tau_i$ when the following conditions are met (a static check of these conditions is sketched below):

- $\tau_i$ and $\tau_j$ are not allocated to the same core ($P_{\tau_i} \neq P_{\tau_j}$);
- $\tau_i$ and $\tau_j$ are active at the same time;
- $\tau_i$ and $\tau_j$ have at least 1 unit of interference ($I_i \geq 1$, $I_j \geq 1$).
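As a minimal sketch (the list-based representation of cores and interference parameters is our assumption), the conditions read:

```python
def may_interfere(i, j, core, I):
    """Static part of the interference conditions: True if task j can,
    in principle, cause interference to task i.

    core[k] is the core of task k and I[k] its interference parameter.
    The remaining condition (both tasks being active at the same time)
    depends on the actual schedule, so this check is necessary but not
    sufficient for interference to occur.
    """
    return core[i] != core[j] and I[i] >= 1 and I[j] >= 1
```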
Therefore, the real utilisation (we use real utilisation to mean the utilisation that includes the interference delay) of a task, $U_i^{real}$, is:

$$U_i^{real} = U_i + U_i^I$$

where $U_i = C_i / T_i$ and $U_i^I$ is the utilisation due to the interference caused by other tasks to $\tau_i$:

$$U_i^I = \frac{I_i^{total}}{T_i}$$

where $I_i^{total}$ is the total interference that a task $\tau_i$ receives from other tasks in a period $T_i$.
Hence, the utilisation of a core is the sum of the utilisations of all the tasks allocated to it:

$$U^{P_k} = \sum_{\tau_i \in P_k} U_i^{real}$$

And the utilisation of the whole system is the sum of all the core utilisations:

$$U = \sum_{k=1}^{m} U^{P_k}$$
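These definitions also translate directly into code; in the sketch below, `I_total[i]` (the interference received per period) is assumed to be known, for instance bounded as in the next section:

```python
def real_utilisations(C, T, I_total):
    """U_i^real = (C_i + I_i^total) / T_i, with I_total[i] the interference
    task i receives per period (e.g., the bound derived in Section 4)."""
    return [(C[i] + I_total[i]) / T[i] for i in range(len(C))]

def core_utilisation(U_real, core, k):
    """Utilisation of core k: sum of the real utilisations of its tasks."""
    return sum(u for i, u in enumerate(U_real) if core[i] == k)

def system_utilisation(U_real):
    """System utilisation: sum over all tasks (equivalently, over cores)."""
    return sum(U_real)
```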
4. Contention Aware Utilisation Factor
In this section, we provide an upper bound on $I_i^{total}$, which will allow us to derive a schedulability test. This bound will be called $\overline{I}_i^{total}$. First, we need two definitions, which were also introduced in Ref. [2]:
Definition 1 ([2]). A task is defined as a receiving task when it accesses shared hardware resources and it suffers an increment of its computation time due to the interference produced by other tasks allocated to other cores.

Definition 2 ([2]). A task is defined as a broadcasting task when it accesses shared hardware resources and it provokes an increment of computation time in other tasks allocated to other cores due to contention.

If $I_i = 0$, $\tau_i$ is neither a broadcasting nor a receiving task. If $I_i \geq 1$, $\tau_i$ will be both a broadcasting and a receiving task if there is at least one task on another core whose interference parameter is also at least 1.
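As a small illustration (again assuming the list-based task representation used above), this classification reads:

```python
def classify(i, core, I):
    """Definitions 1 and 2 in code: a task with I_i = 0 is neither
    broadcasting nor receiving; a task with I_i >= 1 is both, provided
    some task on another core also has a positive interference parameter.
    """
    if I[i] == 0:
        return "neither"
    if any(j != i and core[j] != core[i] and I[j] >= 1
           for j in range(len(I))):
        return "broadcasting and receiving"
    return "neither"
```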
To estimate $\overline{I}_i^{total}$, we will calculate the maximum total interference, which depends on the maximum number of activations of a broadcasting task that fall within an activation of a receiving task.
Definition 3. Let $I_{j \rightarrow i}$ (in what follows, the expression $j \rightarrow i$ means that $\tau_j$ is a broadcasting task and $\tau_i$ is a receiving task) be the maximum total interference that $\tau_j$ can cause to $\tau_i$ in a period of $\tau_i$, $T_i$.

Definition 4. Let $N_{j \rightarrow i}$ be the maximum number of activations of the broadcasting task $\tau_j$ that fall within an activation of the receiving task $\tau_i$.
From the two previous definitions, it is clear that:

$$I_{j \rightarrow i} = N_{j \rightarrow i} \cdot I_j$$

Note that $I_{j \rightarrow i}$ is an upper bound, because the maximum number of interferences does not have to occur in all activations. Besides, $I_{j \rightarrow i}$ only takes into account the interference caused by $\tau_j$. To calculate $\overline{I}_i^{total}$, we need to consider all tasks allocated to other cores. Therefore:

$$\overline{I}_i^{total} = \sum_{\forall \tau_j : P_{\tau_j} \neq P_{\tau_i}} I_{j \rightarrow i} \qquad (10)$$

From Equation (10), it is easy to see that we have reduced the study of $\overline{I}_i^{total}$ to the study of $I_{j \rightarrow i}$.
Worst Case Estimation of $I_{j \rightarrow i}$

To calculate $\overline{I}_i^{total}$, we need to know the exact value of $I_{j \rightarrow i}$. To do this, we will calculate $N_{j \rightarrow i}$ assuming that the period of the broadcasting task is greater than or equal to the period of the receiving task, $T_j \geq T_i$. The case in which $T_j < T_i$ is considered later.
Let us study the total interference received by $\tau_i$. First, the maximum number of activations of $\tau_j$ that fall within an activation of $\tau_i$ is calculated:

$$N_{j \rightarrow i} = \left\lceil \frac{T_j}{T_i} \cdot K \right\rceil - 1 \qquad (11)$$

where $K$ is a factor that corrects the division of the periods according to its residue. Equation (11) expresses the relationship between the periods of the broadcasting and receiving tasks, modified by the factor $K$ and decremented by one, to reflect the number of activations of the broadcasting task that fall within a period of the receiving task. With the division of periods, we obtain the whole number of times that one task is included in the other, and we apply the ceiling function to round up to an integer number of times. Moreover, the worst scenario happens if one of the tasks is shifted one unit of time; that is why the minus one appears in the formula.
Then, two cases are possible:

- Harmonic periods, with zero or small residues in the division of periods;
- Non-harmonic periods.

The factor $K$ takes a different value in each case. The above definition of $N_{j \rightarrow i}$ is very pessimistic in the sense that it considers that the maximum interference occurs equally in all activations. This is not always true, as it depends on how the activations of the two tasks coincide.
Once the definition of $N_{j \rightarrow i}$ is provided, let us consider the case in which $T_j < T_i$ (the period of the broadcasting task is shorter than the period of the receiving task) with the following example: let us suppose two tasks $\tau_1$ and $\tau_2$ allocated to a dual-core platform, with $T_2 > T_1$. Figure 3 and Figure 4 represent the maximum interference that may be produced when tasks $\tau_1$ and $\tau_2$ are executed simultaneously. Note that, for the sake of simplicity, we have not depicted the tasks' computation times, but only the interference.
Figure 3 considers that $\tau_2$ is the broadcasting task and $\tau_1$ the receiving task. In this case, the period of the broadcasting task ($T_2$) is greater than the period of the receiving task ($T_1$), and the total received interference is equal to $I_{2 \rightarrow 1} = N_{2 \rightarrow 1} \cdot I_2$. This scenario has been studied previously.
Let us now consider that $\tau_1$ is the broadcasting task and $\tau_2$ is the receiving task, as depicted in Figure 4. In this case, the period of the broadcasting task is shorter than the period of the receiving task. Could we apply Equation (11)? We are going to check it numerically. Suppose the periods $T_1$ and $T_2$ depicted in Figure 4. Applying Equation (11), the maximum number of activations of the broadcasting task that fall within an activation of the receiving task would be three. As seen in Figure 4, $\tau_1$ never falls three times within an activation of $\tau_2$. So Equation (11) cannot be applied when the period of the broadcasting task is shorter than the period of the receiving task.
We therefore propose a methodology to relate the total interference between a broadcasting and a receiving task, regardless of the lengths of their periods.
Theorem 1. The interference received and the interference broadcast between $\tau_i$ and $\tau_j$ are related by the ratio of their interference parameters:

$$\frac{I_{j \rightarrow i}}{I_{i \rightarrow j}} = \frac{I_j}{I_i}$$

Proof. At time 0, as both tasks are released, interference is always introduced. Moreover, every time either task is released, new units of interference may be introduced. In particular, this happens as many times as activations the tasks have in a hyperperiod, minus one, because the first activation has already been considered. Then, the total number of times is:

$$A_i + A_j - 1$$

Then, the maximum interference that the task $\tau_j$ provokes in $\tau_i$ throughout the hyperperiod is:

$$(A_i + A_j - 1) \cdot I_j$$

Then, let us have a look at the total interference received by task $\tau_j$, $I_{i \rightarrow j}$. If we repeat the previous process, the maximum interference that the task $\tau_i$ provokes in $\tau_j$ is:

$$(A_i + A_j - 1) \cdot I_i$$

It is easy to deduce that the only difference between both situations is the interference coefficient of the task that provokes the interference. Therefore, we can conclude that the total interference that task $\tau_j$ provokes in task $\tau_i$ is related to the total interference that task $\tau_i$ provokes in task $\tau_j$ as follows:

$$I_{j \rightarrow i} = \frac{I_j}{I_i} \cdot I_{i \rightarrow j} \qquad (12)$$

□
Then, the total interference between two tasks may be calculated interchangeably, from the point of view of either task. Thus, from now on and for calculation purposes, this work considers that, when two tasks interfere, the task with the longest period provokes the interference on the task with the shortest period. If that is not the case, we apply Equation (12).
To conclude with the example in Figure 3 and Figure 4, since we already deduced $I_{2 \rightarrow 1}$, from Equation (12):

$$I_{1 \rightarrow 2} = \frac{I_1}{I_2} \cdot I_{2 \rightarrow 1}$$

and this result coincides with that depicted in Figure 4.
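The computational rule behind Theorem 1 can be sketched as follows. The sketch takes $N_{j \rightarrow i}$ as an externally supplied function (Equation (11) for a broadcaster with the longer period); the function signature and the guard for non-interfering tasks are our assumptions:

```python
def interference_bound(i, j, T, I, N):
    """Upper bound I_{j->i} on the interference task j causes task i in
    one period of i. N(b, r) must return N_{b->r} (Equation (11)) for a
    broadcaster b with T[b] >= T[r]; for the opposite case, Theorem 1
    (Equation (12)) is applied.
    """
    if I[i] == 0 or I[j] == 0:
        return 0                    # one task neither sends nor receives
    if T[j] >= T[i]:
        return N(j, i) * I[j]       # I_{j->i} = N_{j->i} * I_j
    i_to_j = N(i, j) * I[i]         # broadcaster with the longer period
    return I[j] / I[i] * i_to_j     # Equation (12)
```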
5. Schedulability Analysis
Once an upper bound is given for $I_i^{total}$, we can estimate an upper bound of the utilisation of a receiving task $\tau_i$, $\overline{U}_i$, taking into account the interference due to contention:

$$\overline{U}_i = U_i + \frac{\overline{I}_i^{total}}{T_i}$$

Note that, if $\tau_i$ is not a receiving task, then $\overline{U}_i = U_i$.

This upper bound is always greater than or equal to the real utilisation of the receiving task $\tau_i$, $\overline{U}_i \geq U_i^{real}$. This results from the fact that $\overline{I}_i^{total} \geq I_i^{total}$.
Therefore, the upper bound of the utilisation of each core is defined as the sum of the upper bounds of the utilisations of the tasks that belong to that core, $\overline{U}^{P_k} = \sum_{\tau_i \in P_k} \overline{U}_i$. Similarly, the upper bound of the system utilisation is defined as the sum of the upper bounds of the utilisations of all cores (or all tasks), $\overline{U} = \sum_{k=1}^{m} \overline{U}^{P_k}$. Then, the system will be schedulable if:

(i) The upper bound of the utilisation of each task is less than or equal to $n(2^{1/n} - 1)$ for fixed priorities and to one for dynamic priorities [27];

(ii) The upper bound of the utilisation of each core is less than or equal to one: $\overline{U}^{P_k} \leq 1$;

(iii) As a consequence of (i) and (ii), the upper bound of the system utilisation is less than or equal to the number of cores $m$ [28]: $\overline{U} \leq m$.
As the utilisations are overestimated, we can conclude that if the previous conditions are fulfilled, the system will be schedulable. Note that this is a sufficient but not a necessary condition: a system that fails the test may still be schedulable. These conditions translate into the checker sketched below.
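A minimal sketch of the test, assuming the upper bounds $\overline{U}_i$ have already been computed and applying the bound $n(2^{1/n} - 1)$ for fixed priorities as reconstructed above:

```python
def is_schedulable(U_bound, core, m, dynamic=True):
    """Sufficient utilisation-based test on the upper bounds U_bound[i]:
    (i)   each task's bound within the per-policy limit,
    (ii)  each core's bound <= 1,
    (iii) the system bound <= m (implied by (i) and (ii)).
    """
    n = len(U_bound)
    task_limit = 1.0 if dynamic else n * (2 ** (1 / n) - 1)
    if any(u > task_limit for u in U_bound):            # condition (i)
        return False
    for k in range(m):                                  # condition (ii)
        if sum(u for i, u in enumerate(U_bound) if core[i] == k) > 1:
            return False
    return sum(U_bound) <= m                            # condition (iii)
```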
Example
Let us show the procedure to be followed with an example. Consider a task set of three tasks, $\tau_1$, $\tau_2$, and $\tau_3$, with the parameters depicted in Figure 5, allocated to a system with three cores: $\tau_1$ is allocated to core $P_1$, $\tau_2$ to $P_2$, and $\tau_3$ to $P_3$. Once the tasks are allocated to cores, the Earliest Deadline First (EDF) algorithm [27] schedules the tasks in each core. The actual execution of the task set is shown in Figure 5.
We now apply the method described in Section 4 to calculate the upper bound of the system utilisation, and then compare it with the actual system utilisation shown in Figure 5.
Step 1: Define the broadcasting and the receiving tasks. The task with the greatest period is the broadcasting task, $\tau_2$, and the receiving task is the task with the shortest period, $\tau_1$. As $I_3 = 0$, $\tau_3$ is neither receiving nor broadcasting.
Step 2: Calculate the maximum number of times that task $\tau_2$ falls within an activation of $\tau_1$, $N_{2 \rightarrow 1}$. As depicted in Figure 6, $\tau_2$ falls twice within the second activation of $\tau_1$, and this value may be calculated by applying Equation (11).
Step 3: Calculate the maximum total interference that $\tau_2$ can cause to $\tau_1$, $I_{2 \rightarrow 1}$. First, we calculate the number of activations of task $\tau_1$ in a hyperperiod, $A_1$, following Equation (4). The maximum total interference is only received in the second activation of $\tau_1$, as shown in Figure 6, but we consider the worst case, in which all activations receive the maximum interference from $\tau_2$.
Step 4: Calculate the total interference received by the other receiving task, $I_{1 \rightarrow 2}$, as shown in Figure 7. As the period of $\tau_2$ is greater than the period of $\tau_1$, we apply Equation (12): $I_{1 \rightarrow 2} = \frac{I_1}{I_2} \cdot I_{2 \rightarrow 1}$.
Step 5: Calculate the upper bound of the utilisation of each task.

From Figure 5, the actual utilisations of the tasks can be obtained. It can be seen that the actual utilisations of the tasks are equal to or less than the upper bounds estimated in this paper. Therefore, the system is schedulable. The sketch below reproduces the whole procedure in code.
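The following sketch puts the five steps together, reusing `interference_bound` and `is_schedulable` from the earlier sketches. All parameters are invented for illustration (the original values belong to the figures), and the placeholder for Equation (11) is our assumption:

```python
from math import ceil

# Hypothetical parameters (C, T, I); tau_3 has I = 0, as in the example.
C, T, I = [1, 2, 1], [5, 10, 8], [1, 1, 0]
core, m = [0, 1, 2], 3        # tau_i allocated to core P_i

# Placeholder for Equation (11); in the paper N_{b->r} is computed from
# the periods (and checked against Figure 6).
def N(b, r):
    return ceil(T[b] / T[r])

# Steps 1-4: bound the interference received by every task. tau_3 drops
# out because I_3 = 0; Equation (12) is applied inside interference_bound
# whenever the broadcaster has the shorter period.
I_total = [sum(interference_bound(i, j, T, I, N)
               for j in range(3) if core[j] != core[i])
           for i in range(3)]

# Step 5: upper bounds of the utilisations and the schedulability test.
U_bound = [(C[i] + I_total[i]) / T[i] for i in range(3)]
print(U_bound)                                          # [0.6, 0.4, 0.125]
print(is_schedulable(U_bound, core, m, dynamic=True))   # True
```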
6. Task Allocation Algorithms
In the following, we first briefly discuss several existing allocation techniques and then propose a new mapping strategy. Allocation is a problem that appears with multicore systems; it answers the question: which processor will execute each task?
The main disadvantage of the partitioned approach to multicore scheduling is that the task allocation problem is analogous to the bin packing problem, which is known to be NP-hard [29].
6.1. Overview of Existing Heuristic Bin-Packing Algorithms
Refs. [5,6] detail some of the most well-known bin-packing heuristics:

- First Fit (FF). Each item is packed into the first bin where it fits. A new bin is opened if an item does not fit in any open bin;
- Worst Fit (WF). Each item is placed into the open bin with the largest amount of room remaining. If it does not fit in any bin, a new bin is opened.
One problem with the previously mentioned heuristics is that, if the items are not ordered properly, large items may not be packed efficiently. Therefore, items should be ordered before packing. One common way is to pack the items in order of decreasing weights or utilisations. This way, WF becomes WFDU (Worst Fit Decreasing Utilisation) and FF becomes FFDU (First Fit Decreasing Utilisation).
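As a reference point, a compact sketch of FFDU and WFDU over task utilisations follows; here the bins are cores of capacity 1, and the representation (a list of utilisations, a fixed number of cores) is our assumption:

```python
def ffdu(utils, m):
    """First Fit Decreasing Utilisation: visit tasks by decreasing
    utilisation; each goes to the first core where it still fits."""
    load, alloc = [0.0] * m, {}
    for i in sorted(range(len(utils)), key=lambda i: -utils[i]):
        k = next((k for k in range(m) if load[k] + utils[i] <= 1), None)
        if k is None:
            return None          # no core can host the task
        load[k] += utils[i]
        alloc[i] = k
    return alloc

def wfdu(utils, m):
    """Worst Fit Decreasing Utilisation: like FFDU, but each task goes
    to the core with the largest remaining room (lowest load)."""
    load, alloc = [0.0] * m, {}
    for i in sorted(range(len(utils)), key=lambda i: -utils[i]):
        k = min(range(m), key=lambda k: load[k])
        if load[k] + utils[i] > 1:
            return None
        load[k] += utils[i]
        alloc[i] = k
    return alloc
```

Note how the two heuristics embody opposite policies: FFDU concentrates load on the first cores, while WFDU balances it, which is exactly the behaviour the discrepancy-based allocators of the next subsection mimic.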
6.2. Overview of Aceituno’s Method
In Ref. [2], a task model that takes into account interference delays due to contention for shared hardware resources is proposed, together with three task-to-core allocation methods.

In that paper, a discrepancy metric is defined as the difference between the maximum and minimum core utilisations of a multicore system. One of the algorithms, UDmin, focuses on reducing the discrepancy, and the other, UDmax, on maximising it. On the one hand, UDmin behaves like the heuristic WFDU, as both balance the load among cores. They present excellent schedulability ratios (98.1% and 100%, respectively) but reach high increments of utilisation due to the system interference (2.3%). On the other hand, UDmax behaves like FFDU, as it unbalances the load among cores. In contrast to UDmin/WFDU, it reaches a low increment of utilisation, but its schedulability ratio is very low (around 43%). The Wmin allocator accounts for the possible interference produced by each task, and it provides a low increment of utilisation (0.266%) and a high schedulability ratio (up to 89%) with respect to the previous algorithms.
6.3. Proposed Allocator Considering the Interference: Imin
The previous bin-packing and discrepancy-based algorithms do not take into account the interference produced when two or more tasks allocated to different cores coincide in execution. Wmin does take interference into account, although not exactly: it is based on a binary matrix (W) of possible interference between tasks on other cores, and it tries to minimise the number of 1s in the matrix, each of which represents a possible interference.
In this subsection, a new allocation algorithm to minimise the interference is proposed: Imin. As analysed in the previous sections, $\overline{U}$ is the upper bound of the utilisation taking into account the maximum possible interference. The allocator Imin obtains the task-to-core allocation that minimises this upper bound as much as possible. It is based on integer linear programming to obtain an optimal solution. For that, we define the set of parameters and variables shown in Table 1.
As has been previously stated, the objective is to minimise the upper bound of the system utilisation:

$$\min \overline{U}$$

subject to the constraints (15)-(21) described below.
Constraint (15) ensures that a task is allocated to one and only one core. Equation (16) sets the extra utilisation of a task provoked by the interference at its upper bound. Equation (17) sets the total utilisation, taking the interference into account, at its upper bound, that is, at its maximum value, as established in this paper. The total utilisation per core is calculated as the sum of the utilisations of the tasks that belong to that core (Equation (18)), and its value should be less than or equal to 1 (Equation (19)). Equations (20) and (21) define the decision variable domains.
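The formal ILP model is not reproduced here, but a hedged sketch of how constraints (15)-(21) could be expressed with an off-the-shelf library may clarify the structure. PuLP and the CBC solver are our tooling choices, not the paper's; likewise, the linearisation of the task-pair products through different-core indicators and a big-M term is our assumption. The interference bound `ibound[i][j]` (the maximum interference task j causes task i per period of i when they are on different cores) is assumed precomputed as in Section 4:

```python
import pulp

def imin(C, T, ibound, m):
    """Sketch of the Imin allocator as an ILP (constraints mirror the
    paper's (15)-(21) in spirit; the linearisation is our assumption)."""
    n = len(C)
    U = [C[i] / T[i] for i in range(n)]
    prob = pulp.LpProblem("Imin", pulp.LpMinimize)

    # x[i][k] = 1 iff task i is allocated to core k.
    x = [[pulp.LpVariable(f"x_{i}_{k}", cat="Binary") for k in range(m)]
         for i in range(n)]
    # d[i, j] = 1 iff tasks i and j (i < j) end up on different cores.
    d = {(i, j): pulp.LpVariable(f"d_{i}_{j}", cat="Binary")
         for i in range(n) for j in range(i + 1, n)}

    # (15): each task is allocated to exactly one core.
    for i in range(n):
        prob += pulp.lpSum(x[i][k] for k in range(m)) == 1
    # Force d[i, j] = 1 whenever i and j sit on different cores.
    for (i, j), dij in d.items():
        for k in range(m):
            prob += dij >= x[i][k] - x[j][k]
            prob += dij >= x[j][k] - x[i][k]

    # (16): extra utilisation of each task at its upper bound.
    UI = [pulp.lpSum((ibound[i][j] / T[i]) * d[min(i, j), max(i, j)]
                     for j in range(n) if j != i) for i in range(n)]

    # (18)-(19): per-core utilisation (big-M on the interference term,
    # active only for the core hosting the task) must not exceed 1.
    M = [sum(ibound[i][j] for j in range(n) if j != i) / T[i]
         for i in range(n)]
    for k in range(m):
        z = [pulp.LpVariable(f"z_{i}_{k}", lowBound=0) for i in range(n)]
        for i in range(n):
            prob += z[i] >= UI[i] - M[i] * (1 - x[i][k])
        prob += pulp.lpSum(U[i] * x[i][k] + z[i] for i in range(n)) <= 1

    # (17) as objective: minimise the upper bound of system utilisation.
    prob += pulp.lpSum(U[i] + UI[i] for i in range(n))
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    if pulp.LpStatus[prob.status] != "Optimal":
        return None
    return [[k for k in range(m) if x[i][k].value() > 0.5][0]
            for i in range(n)]
```

Note that minimisation naturally drives the different-core indicators to zero for heavily interfering pairs, which is the intended effect of Imin: co-locating tasks whose mutual interference would otherwise inflate the utilisation bound.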
8. Conclusions
This paper has proposed a schedulability analysis for task models that consider the delay produced by contention for shared hardware resources in a hard real-time multicore system. Then, an allocation algorithm that minimises this contention, Imin, has been proposed and evaluated by comparison with other existing allocators.
From the results of the experimental evaluation, we can conclude that the allocation algorithm proposed in this paper, Imin, achieves the best rates in terms of schedulability percentage and utilisation increase. In addition, Imin presents good results in terms of solution times. It is therefore reasonable to affirm that Imin is an eligible option as an allocator in the implementation of a multicore system, especially in those systems where the priority is to maximise the schedulability rate or to minimise the contention produced in shared hardware resources.
We plan to further investigate how to improve Imin in order to achieve a lower utilisation rate without decreasing schedulability. We also plan to find a less pessimistic upper bound for the utilisation and to extend the model to constrained deadlines.