Article

A Hardware Security-Monitoring Architecture Based on Data Integrity and Control Flow Integrity for Embedded Systems

School of Electronic and Information Engineering, Beihang University, Beijing 100191, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(15), 7750; https://doi.org/10.3390/app12157750
Submission received: 2 July 2022 / Revised: 26 July 2022 / Accepted: 26 July 2022 / Published: 1 August 2022

Abstract

As technology evolves, embedded systems access more networks and devices, which means more security threats. Existing security-monitoring methods with a single parameter (data or control flow) are not effective in detecting attackers tampering with the data or control flow of an embedded system. However, simply overlaying multiple security methods will result in excessive performance overhead for embedded systems. In this paper, we propose a novel hardware security-monitoring architecture that extracts DI (data integrity) digests and CFI (control flow integrity) tags to generate reference information when the program is offline. To monitor the indirect jumping behavior, this paper maps the legal target addresses into the bitmap, thus saving the search time. When the program is loaded, the reference information and the bitmap are safely loaded into the on-chip memory. The hardware monitoring module designed in this paper will check the DI summary and CFI tags in real time while executing the program. The architecture proposed in this paper has been implemented on the Xilinx Virtex 5 FPGA platform. Experimental results show that, compared with existing protection methods, the proposed approach in this paper can effectively detect multiple tampering-type attacks on the data and control flow of the embedded system, with a performance overhead of about 6%.

1. Introduction

With the development of embedded systems toward intelligence and networking, more and more embedded devices have recently been used in many security-sensitive fields, such as modern industry, transportation, aerospace, and military fields [1,2,3]. However, these embedded systems access more networks and devices, which means more potential security threats, such as code injection attacks (CIAs) [4], code reuse attacks (CRAs) [5], hardware Trojan (HT) [6], buffer overflow attacks [7], and side-channel attacks [8]. Generally, these threats mainly arise from untrusted IPs, vulnerable hardware and software, and insecure communications with other devices [1]. By tampering with key data (including opcodes, global variables, dynamic data, etc.) or hijacking the control flow of the program, attackers can use these threats to achieve their goals, including system functions changing, information leakage, and worse system damage. Therefore, to cope with these security challenges, the embedded system designer needs to propose a novel embedded system protection method which can resist multiple security attacks with relatively low resource overhead and performance overhead [2,3].
The key data and control flow of embedded systems during program execution are the main targets of attackers. Therefore, the data integrity (DI) and control flow integrity (CFI) of embedded systems must be guaranteed. Malicious attacks can be divided into hardware-level attacks and software-level attacks based on their sources [1]. Hardware Trojan attacks are one of the typical representatives of hardware-level attacks [6]. An HT hidden in the internal logic of the System-on-Chip (SoC) is pre-designed to be activated through certain parameters to perform specific functions, such as destroying key data and hijacking the control flow. Software-level attacks mainly use security vulnerabilities to damage the DI or CFI of embedded systems by CIAs [4] or CRAs [5], thereby destroying instruction execution or hijacking the control flow.
To defend against these attacks, various software- or hardware-based protection technologies have been proposed to monitor the DI and the CFI of embedded systems when the program is executed. Compared with the software-based methods, hardware-assisted security-monitoring methods use fewer processor resources and have faster processing speeds [9]. Therefore, our research is based on hardware-assisted security-monitoring technology. In 2018, members of our lab proposed a hardware security method for monitoring code integrity, and the performance overhead is less than 3.45% according to the selected benchmarks [10]. Later, they proposed a hardware-enhanced protection method that protects the integrity and confidentiality of dynamic data at runtime with a performance overhead of less than 2.27% [11]. In the past two years, other members of our lab have made several optimizations to the hardware-assisted security-monitoring unit to reduce the performance overhead to about 2% while protecting the data integrity of embedded systems [1,2,3]. However, after testing, these approaches have struggled to detect attacks against the control flow integrity. Therefore, we added jump address monitoring to the original DI protection mechanism, which could protect the CFI of test programs to some extent, but the performance loss increases to about 9% [12].
This paper takes the data integrity and control flow integrity of the executing program as the main protection objects, and focuses on how hardware Trojans, code reuse attacks, and buffer overflow attacks tamper with key data and hijack the control flow. We propose the hybrid information integrity-monitoring architecture (HIIMA) to prevent an attacker from tampering with an embedded system's key data or hijacking its control flow during program execution. The basic block (BB), delimited by jump instructions, is the minimum monitoring granularity of HIIMA. At compile time, the binary code of each BB is processed by a hash function to obtain an integrity digest, and the possible execution order of BBs is mapped into a control flow graph (CFG) and a bitmap. Each BB in the CFG is assigned a unique label. At runtime, the hardware-monitoring module (HMM), which is the core part of the overall architecture, checks whether the digest has changed and whether the Hamming distance (HD) between labels matches the expected value. For indirect jump instructions, the HMM checks whether the jump is legal by reading the bitmap bit that corresponds to the target address of the instruction.
The summary of our contributions is as follows:
  • We propose a hardware-based data integrity (DI) and control flow integrity (CFI) security-monitoring architecture. The architecture extracts the digest representing the DI information, and the label and bitmap representing the CFI information when the program is offline, and implements the integrity check of a variety of parameters through the hardware-monitoring module when the program is running, ensuring that malicious attacks on the data and control flow will be detected.
  • The proposed CFI protection mechanism that takes into account indirect jumps can effectively resist attacks against the control flow of embedded systems, such as CRAs, with lower resource overhead. For direct-jump-type instructions, the mechanism connects the BBs containing these instructions into CFG, and assigns a label to each BB according to the specific HD. For indirect-jump-type instructions, this mechanism maps the legal target address into a bitmap, which reduces the required storage space and speeds up the search.
  • We implemented the proposed architecture in hardware and evaluated the security capabilities, hardware implementation overhead, and performance overhead of the architecture based on the selected benchmarks. Experimental results show that the architecture can effectively resist tampering attacks on the data and control flow of embedded systems, and the performance overhead is about 6%.
The rest of this paper is organized as follows. Section 2 introduces the related work on the DI protection methods and CFI protection methods of embedded systems based on hardware monitoring. Section 3 introduces the preliminary knowledge about our proposed architecture, including BB extraction, CRAs, and the threat model. In Section 4, the proposed method is analyzed in detail, including the overall structure, data integrity protection mechanism, control flow integrity protection mechanism, and the HMM. Section 5 evaluates the security capability, resource overhead, and performance overhead of the HIIMA. Section 6 concludes the work completed in this paper and the results achieved.

2. Related Work

In recent decades, a considerable number of integrity protection methods based on hardware monitoring have been proposed to ensure the security of embedded systems when executing programs. According to the protected object, these methods can be divided into two groups: those for data integrity (DI) and those for control flow integrity (CFI).
DI protection methods based on hardware monitoring usually use an integrity check algorithm, such as a hash function, to map the protected data block into a fixed-size binary number, called a digest [13]. The hardware-based security monitor calculates the real-time digest and compares it with the offline digest. If the two digests are inconsistent, it is considered that the data block has been tampered with by attackers, and its DI has been destroyed [11]. Yan et al. proposed a method to protect the integrity of dynamic data by using split sequence numbers [14]. Its tree structure was used to protect dynamic data from replay attacks, but multiple accesses to the tree nodes cause a large time slot delay. In [15], an architectural support for protecting the DI of an application was proposed, which could protect application data from physical attacks. A hardware-assisted detection method was proposed to protect DI [16]. By enhancing the memory hierarchy, a security tag was assigned to each data to indicate the DI. In [17], a cost-effective DI tag-generation design was proposed. This method provided flexibility for various security levels, but the generated data tag had a certain correlation with the data itself, resulting in a higher rate of tag conflicts. In [18], an improved method for generating DI digests for embedded processor memory was proposed. Through bit flipping and nonlinear Galois Field Multiplication (GFM) operations, the input data added randomness to protect the design from integrity attacks of any selected value. Researchers proposed a hardware-based DI verification mechanism for embedded systems [19]. They attached an extended memory to the on-chip data cache to store integrity tags. However, this method increased the amount of chip area. Moreover, this hardware structure needed to modify the cache in the CPU core, which was not conducive to the transplantation of embedded systems. 
The existing DI protection methods can effectively detect the behavior of tampering with the program data, but cannot detect the behavior of hijacking the control flow in time.
Since CRAs can tamper with the control flow without injecting malicious code, the traditional DI protection methods, such as Write xor eXecute (W⊕X), Address Space Layout Randomization (ASLR), and stack canaries, fail against them [20,21]. Attackers hijack the control flow of embedded systems mainly by modifying the return address in the stack, as in return-oriented programming (ROP) attacks [22], or by directly tampering with the jump address, as in jump-oriented programming (JOP) attacks [23]. To resist these attacks, researchers have proposed many solutions, such as the buffer-based shadow call stack [24,25] and the hardware-assisted flow integrity extension method [26]. Label-based CFI protection methods monitor the control flow integrity of program execution by checking a label implanted in advance in the start code of each BB [27,28,29,30]. These methods add extra code to the Instruction Set Architecture (ISA), and therefore require modifying not only the hardware circuit in use but also the entire compilation and linking toolchain. Zhang et al. proposed a new hardware-assisted CFI protection method which used physical unclonable functions (PUF) to encrypt the return address of the function and the target address of the jump instruction to protect the CFI of embedded systems [31].
These protection methods mainly monitor a single parameter (data or control flow), and cannot protect both the DI and the CFI of embedded systems. Therefore, Arun et al. not only monitored the DI label, but also assigned a CFI label to each BB [32]. When the program was running, the hardware-monitoring module would calculate the DI label and the CFI label of the currently executed BB. If the DI label or the CFI label was not the expected value, it was considered that the DI or CFI had been destroyed. However, this method did not consider indirect jumps. We previously tried to combine the CFI protection mechanism with the DI protection mechanism to not only monitor any tampering with the key data, but also to discover the jump address modified by the attacker [12]. However, this approach led to high system-performance overhead.
Different from existing security studies that only protect data or the control flow of embedded systems, we propose a new architecture that can monitor the data integrity and control flow integrity of the execution program in real time. The architecture extracts the basic blocks of the program at the compiling time and calculates the static data digest. At the same time, the architecture allocates CFI labels to BBs containing direct jump instructions, and builds bitmaps of legal target addresses based on the type of indirect jump instructions. The HMM will calculate the dynamic data digest and compare it with the static digest to complete data integrity verification. For CFI verification, the HMM will calculate the Hamming distance (HD) between the CFI labels of BBs and compare it with the initial design value. In addition, for indirect jump instructions, the HMM will search for the value of the corresponding point of the target address in the bitmap to determine whether the jump is legal.

3. Preliminaries

In this section, the extraction of BBs is elaborated on first, followed by two specific implementation methods of CRAs: ROP attacks and JOP attacks. In the end, the threat model of this paper is described in detail.

3.1. Basic Block Extraction

Considering the security and performance overhead, we choose basic blocks (BBs) as the granularity of the proposed integrity-monitoring architecture. The BB is defined as a fine-grained code fragment which only contains instructions that will be executed sequentially [12]. To avoid modifying the ISA, the jump instructions are used to divide BBs. In general, the start address of each BB is the target address of the previous jump instruction, and the end address is the address of the next jump instruction. Without a memory management unit, the BB uniquely corresponds to a start address, which means that the corresponding BB can be found by searching for the start address.
Different instruction sets have different types of jump instructions. To obtain the internal details of the embedded processor core, we select the open-source embedded processor OpenRISC1200 (OR1200) for research, which is a 32-bit scalar RISC processor with a Harvard micro architecture [11]. In OR1200 ISA, there is a clock cycle gap from the execution of a jump instruction to the code running within the jump target address, which is called the delay slot. Therefore, the actual start address and end address of BBs need to consider the delay slot. The jump instruction types of OR1200 ISA are shown in Table 1. The BB extraction in OR1200 ISA is shown in Figure 1.
In Table 1, there are mainly three types of jump instructions in the OR1200 processor instruction set, namely, direct jump (l.j and l.jal), branch jump (l.bf and l.bnf), and indirect jump (l.jr and l.jalr). There is one target address for direct jump, which is the effective address (EA). There are two target addresses for branch jumps. According to the flag being 1 or 0, the target address may be an EA, or it may be the instruction address immediately following the end address of the current BB, which is the end address plus 0x4. For example, in Figure 1, the jump instruction contained in BB1 is l.bf 2020. When the flag is 1, the target address of l.bf is EA, which is 0x2020. When the flag is 0, the target address of l.bf is the end address of BB1 plus 0x4, that is, 0x202c + 0x4 = 0x2030. Due to the start address of BB being the target address of the previous jump instruction, the start address of BB2 is 0x2020, and the start address of BB3 is 0x2030. The jump target address of an indirect jump instruction is related to the register value used in the instruction. The interrupt return instruction (l.rfe) is a special jump instruction. Since the proposed method does not involve interruption, l.rfe is not considered.
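The branch-target rule described above can be illustrated with a short Python sketch. The function name `branch_targets` and the constant `INSTR_BYTES` are illustrative names, not part of the proposed architecture; the numbers are taken from the BB1 example in the text.

```python
INSTR_BYTES = 4  # the text adds 0x4 to the end address to reach the next instruction


def branch_targets(bb_end_addr: int, effective_addr: int) -> tuple[int, int]:
    """Return (taken, not_taken) start addresses of the BBs that may follow
    a BB ending in a branch jump (l.bf or l.bnf)."""
    taken = effective_addr                  # condition met: jump to EA
    not_taken = bb_end_addr + INSTR_BYTES   # condition not met: fall through
    return taken, not_taken


# BB1 ends at 0x202c and contains "l.bf 2020"
taken, not_taken = branch_targets(0x202C, 0x2020)
print(hex(taken), hex(not_taken))  # 0x2020 0x2030
```

The two returned values are the start addresses of BB2 and BB3 in the example.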

3.2. Code Reuse Attacks

Code Reuse Attacks (CRAs) use the original code in the program to build small fragments of the existing code, which are called gadgets. By ingeniously linking the gadgets, attackers can use the buffer overflow vulnerability to overwrite the return address or jump address on the stack with the start address of the gadget chain, hijack the control flow of the program and ultimately obtain system privileges. In what follows, we will discuss the implementation of ROP attacks and JOP attacks in the OR1200 embedded processor.

3.2.1. ROP Attacks

When executing a function call (l.jal or l.jalr), the OR1200 processor stores the return address in the general register r9 and then executes the instruction at the first address of the target function. To return, the CPU executes the instruction l.jr r9, which jumps to the return address held in r9, and continues execution from there. However, when the attacker uses a buffer overflow vulnerability to overwrite the value of r9, the OR1200 processor cannot execute the program normally. Under an ROP attack, the value stored in r9 is a return address designed by the attacker. These return addresses all point to gadgets constructed by the attacker from the code of the program itself. In the end, the attacker can hijack the control flow of the embedded system and destroy the program execution.

3.2.2. JOP Attacks

The principle of JOP attacks is similar to that of ROP. The difference is that JOP attacks mainly use indirect jump instructions to hijack the control flow and execute malicious actions. During program execution, the attacker can change the values in registers by a buffer overflow vulnerability. When the OR1200 processor executes the indirect jump instruction or indirect function call instruction, the target address taken from the register is the address that the attacker constructed.

3.3. Threat Model

We assume that the source code is compiled and linked in a secure software environment, so the binary file generated in this way has not been tampered with by any attacker. This assumption is acceptable because the programmer can ensure that the binary code is consistent with the expected design when the software development is completed. We define the internal area of the OR1200 embedded processor as the trusted zone, and all areas outside the chip, including all interfaces and wires connected to the chip, as the untrusted zone. In other words, attackers cannot tamper with the pipeline, registers, cache, or any signals inside the embedded processor. Since the hardware-monitoring module we designed is on-chip, it is also assumed to be immune to these attacks. Figure 2 illustrates the threat model considered in this paper.
In Figure 2, attackers can attack the embedded system in various forms, including software attacks, physical attacks, and side-channel attacks. Physical attacks include hardware Trojans. Software attacks include CIAs, CRAs, and buffer overflow attacks. Side-channel attacks can steal the key information of an external security module and crack its encryption algorithm. Through the above attacks, malicious attackers can destroy data integrity and hijack the control flow, ultimately leading to invalid program execution or a leakage of key information. Even the external security module is considered untrustworthy due to security risks such as information leakage.

4. Proposed Method

In this section, we will elaborate on the proposed hybrid information integrity-monitoring architecture (HIIMA), including the overall structure, the DI protection mechanism, the CFI protection mechanism, and the Hardware-Monitoring Module (HMM).

4.1. Hybrid Information Integrity Monitoring Architecture

By monitoring DI and CFI, the HIIMA proposed in this paper combines the DI protection mechanism with the CFI protection mechanism, and can protect the embedded systems’ key data from being tampered with or their control flow hijacked by any attackers when executing programs. The OR1200 processor is selected as the CPU Core, which has an in-order five-stage pipeline consisting of the instruction fetch (IF) stage, the instruction decode (ID) stage, the execution (EX) stage, the memory access (MA) stage, and the write-back (WB) stage. To protect the CPU Core, the HMM will monitor OR1200 pipeline signals, especially the signals of the program counter (PC) and the instruction register (IR).
In Figure 3, when the program is offline, through the BB extraction process shown in Figure 1, the detailed information about each BB is generated, including the start address, the end address, the binary code of all internal instructions, and the only jump instruction. By processing this information, we obtain the 64-bit hybrid integrity reference information (HIREF) of each BB, the function entry address bitmap (FEAddress Bitmap), and the BB start address bitmap (BBAddress Bitmap). The HIREF includes the 16-bit start address of the BB (BBstart), the 16-bit integrity digest calculated by an LHash algorithm (LHashStatic), the 16-bit control flow integrity label (CFIlabel), and the 16-bit special Hamming distance (HDspecial).
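As a rough illustration of the 64-bit HIREF layout, the following Python sketch packs and unpacks the four 16-bit fields. The field order (BBstart in the most significant bits, HDspecial in the least significant bits) and the function names are assumptions made for this example; the text only states which fields exist and how wide they are.

```python
FIELD_BITS = 16
FIELD_MASK = (1 << FIELD_BITS) - 1


def pack_hiref(bb_start: int, lhash_static: int, cfi_label: int, hd_special: int) -> int:
    """Pack the four 16-bit fields into one 64-bit word (assumed field order)."""
    word = 0
    for field in (bb_start, lhash_static, cfi_label, hd_special):
        if not 0 <= field <= FIELD_MASK:
            raise ValueError("each HIREF field must fit in 16 bits")
        word = (word << FIELD_BITS) | field
    return word


def unpack_hiref(word: int) -> tuple[int, int, int, int]:
    """Return (bb_start, lhash_static, cfi_label, hd_special)."""
    return tuple((word >> shift) & FIELD_MASK for shift in (48, 32, 16, 0))


# Placeholder field values, chosen only to make the layout visible
hiref = pack_hiref(0x0808, 0xBEEF, 0b110, 0)
print(hex(hiref))           # 0x808beef00060000
print(unpack_hiref(hiref))  # (2056, 48879, 6, 0)
```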
If all the values of PC are included in the HIREF, it will cause a large storage overhead. Therefore, HIIMA only monitors the lower 16 bits of PC, which means BBstart has only 16 bits. Since the actual useful value of the start address is PC [31:2], the address extracted by the HMM at runtime is PC [17:2]. This will provide programmers with up to 256 KB of addressable space. For applications with a larger amount of code, it is necessary to increase the width of BBstart.
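The address arithmetic can be checked with a few lines of Python. The function name `bb_start_field` is illustrative; the bit slice PC [17:2] and the 256 KB figure come from the paragraph above.

```python
def bb_start_field(pc: int) -> int:
    """Extract PC[17:2], the 16-bit value stored as BBstart."""
    return (pc >> 2) & 0xFFFF


# 2**16 word addresses, 4 bytes per instruction word
ADDRESSABLE_BYTES = (1 << 16) * 4

print(hex(bb_start_field(0x2020)))  # 0x808
print(ADDRESSABLE_BYTES // 1024)    # 256 (KB)

# Addresses that differ only above bit 17 map to the same BBstart, which is
# why larger programs need a wider field.
print(bb_start_field(0x2020) == bb_start_field(0x42020))  # True
```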
The binary code of all instructions in the BB is processed through the hash function (LHash) [33] to obtain an integrity digest (LHashStatic). If an attacker tampers with key data, such as instructions or static variables, the binary code of the BB changes. In that case the digest computed by LHash also changes, barring a hash collision. Therefore, the HMM can determine whether the data has been tampered with by checking whether the integrity digest has changed.
Figure 1 and Table 1 illustrate that the target address of a direct jump instruction can be determined through static analysis. These target addresses correspond to the start addresses of BBs. Linking these BBs according to the target addresses constructs the control flow graph (CFG). A label allocation algorithm then assigns a CFI label (CFIlabel) to each BB in the CFG. By calculating the Hamming distance (HD) between CFIlabels, the HMM can tell whether two BBs are allowed to execute one after the other. In addition, the Hamming distance that arises in some special cases (HDspecial) is also recorded.
Since the target address of the indirect jump instruction cannot be obtained through static analysis, these instructions are protected by limiting the target address range. In this paper, we restrict the target address of an indirect jump instruction to only be the start address of a BB, and the target address of an indirect function call instruction can only be the entry address of a function. Through the bitmap algorithm, we map the legal target address into the FEAddress Bitmap and the BBAddress Bitmap.
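A minimal software model of such a bitmap is sketched below. The class name `AddressBitmap` and the indexing scheme (one bit per 32-bit word address, indexed by PC[17:2]) are assumptions for illustration; the text does not specify how addresses are mapped to bit positions.

```python
class AddressBitmap:
    """One bit per instruction word; a set bit marks a legal jump target."""

    def __init__(self, index_bits: int = 16):
        self.index_bits = index_bits
        self.bits = bytearray((1 << index_bits) // 8)

    def _index(self, addr: int) -> int:
        # Assumed mapping: drop the two byte-offset bits, keep index_bits bits
        return (addr >> 2) & ((1 << self.index_bits) - 1)

    def mark_legal(self, addr: int) -> None:
        i = self._index(addr)
        self.bits[i >> 3] |= 1 << (i & 7)

    def is_legal(self, addr: int) -> bool:
        i = self._index(addr)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))


# BB start addresses from the earlier example serve as legal indirect-jump targets
bb_bitmap = AddressBitmap()
for start in (0x2020, 0x2030):
    bb_bitmap.mark_legal(start)

print(bb_bitmap.is_legal(0x2030))  # True
print(bb_bitmap.is_legal(0x2024))  # False: not the start of a BB
print(len(bb_bitmap.bits))         # 8192 bytes for 2**16 word addresses
```

A lookup is a single bit test, independent of how many legal targets exist, which is the search-time saving the bitmap is meant to provide.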
When the binary file of the source program is loaded into the Flash, all HIREF entries are securely loaded into the Integrity Parameter Memory of the HMM in ascending order. The FEAddress Bitmap and BBAddress Bitmap are likewise securely loaded into the Bitmap Memory. The HMM monitors abnormal conditions during program execution by checking the values of IR and PC in real time. When the program is running, the HMM extracts the binary instruction and PC address from the ID stage, and compares the start address of the currently executed BB with BBstart. Then, the HMM compares the dynamically calculated LHash value (LHashDynamic) with LHashStatic. At the same time, if the currently executed instruction is a direct-jump-type instruction, the HMM calculates the HD between the CFIlabel of the BB containing the instruction and the CFIlabel of the BB to be executed next, and judges whether the HD is legal. If the currently executed instruction is an indirect-jump-type instruction, the HMM checks whether the bitmap bit corresponding to the target address is 1 and judges whether the jump is legal. If all checks succeed, the DI and CFI are considered intact. If any check fails, the DI and/or CFI are considered to have been destroyed.

4.2. Data Integrity Protection Mechanism

The DI protection mechanism refers to calculating the dynamic integrity digest (LHashDynamic) of each BB when the program is running, and then comparing it with the static digest (LHashStatic) calculated when the program is offline. If the results are inconsistent, the instructions contained in the BB have been tampered with. The hash function can be used to calculate the integrity digest of each BB. However, hardware implementation of the traditional hash function will take up a substantial amount of resources. LHash can meet the DI calculation of embedded systems at a low cost and limited hardware resources [33]. Considering security and area consumption, the LHash calculation module uses a 128-bit internal permutation structure, and the output is a 16-bit binary number as the DI digest of each BB. The size of the output digest can also be adjusted if higher security is desired, but the consumption of on-chip hardware resources will also increase.
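The compare step can be sketched in Python as follows. LHash itself is not reproduced here: the sketch substitutes the standard-library CRC-16 (`binascii.crc_hqx`) purely as a 16-bit stand-in, and a CRC is not a cryptographic hash. The instruction words are arbitrary 32-bit values and the function names are illustrative.

```python
import binascii


def digest16(instr_words: list[int]) -> int:
    """16-bit digest of a BB's instruction words.
    Stand-in only: CRC-16 replaces LHash for illustration."""
    data = b"".join(w.to_bytes(4, "big") for w in instr_words)
    return binascii.crc_hqx(data, 0xFFFF)


def di_intact(instr_words: list[int], lhash_static: int) -> bool:
    """Runtime check: recompute the digest and compare with the offline value."""
    return digest16(instr_words) == lhash_static


# Offline: compute the reference digest for one BB (arbitrary example words)
bb_words = [0x9C210004, 0x15000000, 0x44004800]
lhash_static = digest16(bb_words)

# Runtime: an untouched BB passes, a BB with one flipped bit does not
tampered = [bb_words[0] ^ 0x1] + bb_words[1:]
print(di_intact(bb_words, lhash_static))  # True
print(di_intact(tampered, lhash_static))  # False
```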

4.3. Control Flow Integrity Protection Mechanism

We propose a CFI protection mechanism that considers indirect jumps for OR1200. BBs containing direct-jump-type instructions are linked according to the target addresses of their jump instructions, which yields the CFG. Each node in the CFG represents a BB and is assigned a unique binary number as its CFI label. The HD between the label of a parent node and the label of its child node is fixed, and the value of HD is set by the user in advance.
HD is a number that denotes the difference between two binary strings. Equation (1) explains how to calculate the HD between two binary numbers of the same size. In Equation (1), r and s are n-bit binary numbers, and ⊕ denotes an exclusive OR. For example, the HD of 0110 and 1100 is 2.
$$\mathrm{HD}(r, s) = \sum_{i=1}^{n} r[i] \oplus s[i] \qquad (1)$$
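In software, the Hamming distance of two equal-width binary numbers is the number of 1 bits in their exclusive OR. A minimal Python sketch (the function name is illustrative):

```python
def hamming_distance(r: int, s: int) -> int:
    """Number of bit positions in which r and s differ."""
    return bin(r ^ s).count("1")


print(hamming_distance(0b0110, 0b1100))  # 2
```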
For indirect-jump-type instructions, all the address spaces of the program will be traversed, and the possible target addresses of these instructions will be marked. Moreover, these legal target addresses will be mapped into bitmaps.

4.3.1. Control Flow Graph

The CFG represents all the paths that may be traversed during the execution of a program. It uses a graph to show the possible flow of execution of all BBs when the program is executed. Each node in the CFG represents a BB, and the directed arcs represent the jumps between BBs in the control flow. As shown in Table 1 and Figure 1, the target addresses of direct jump instructions, branch jump instructions, and direct function call instructions can be obtained when extracting BBs. Among them, there is only one target address for the direct jump instruction and the direct function call instruction, and there are two target addresses for the branch jump instruction. For example, the target address of l.jal 159b4 is 0x159b4, and the target addresses of l.bf 2020 are 0x2020 and 0x2030.
To evaluate the HIIMA, we select real-life embedded applications of various scales from MiBench [1,18,28] as the test programs. Table 2 shows the characteristics of the selected benchmarks. In Table 2, only a few BBs end in indirect jump instructions or indirect function call instructions, indicating that most of the BBs are in the CFG. However, because the target addresses of indirect jumps are unknown, the CFG is not completely connected.
In addition, the jump addresses of different BBs may be repeated. If the BBs with repeated jump targets are included when constructing the CFG, the entire CFG will be very large and complex, which is not conducive to assigning a CFIlabel to each BB. Therefore, when constructing the CFG for the BB whose jump target address is the second occurrence, we no longer link it with other BBs. After the above operations, the CFG will be divided into subgraphs. Each subgraph does not contain indirect jump information, and there are no repeated jump targets in subgraphs.
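One way to read this splitting rule is sketched below in Python: an edge is kept only the first time its target address appears, and later edges to the same target are set aside. The BB addresses are hypothetical, `None` marks a target that is unknown at compile time (for example, an indirect jump), and the function name `split_cfg` is illustrative.

```python
def split_cfg(bbs):
    """bbs: list of (name, start_addr, target_left, target_right).
    Returns (tree_edges, repeated_edges); edges whose target address has
    already been linked once are treated as repeated and left unlinked."""
    start_to_name = {start: name for name, start, _, _ in bbs}
    seen_targets = set()
    tree_edges, repeated_edges = [], []
    for name, _, left, right in bbs:
        for target in (left, right):
            if target is None or target not in start_to_name:
                continue
            edge = (name, start_to_name[target])
            if target in seen_targets:
                repeated_edges.append(edge)  # second occurrence of this target
            else:
                seen_targets.add(target)
                tree_edges.append(edge)
    return tree_edges, repeated_edges


# Hypothetical addresses for a five-BB program
bbs = [
    ("BB1", 0x2000, 0x2020, 0x2030),
    ("BB2", 0x2020, 0x2020, 0x2030),  # loops to itself, also reaches BB3
    ("BB3", 0x2030, 0x2040, None),
    ("BB4", 0x2040, 0x2050, None),
    ("BB5", 0x2050, None, None),
]
tree_edges, repeated_edges = split_cfg(bbs)
print(tree_edges)      # [('BB1', 'BB2'), ('BB1', 'BB3'), ('BB3', 'BB4'), ('BB4', 'BB5')]
print(repeated_edges)  # [('BB2', 'BB2'), ('BB2', 'BB3')]
```

The kept edges contain no repeated jump targets, and the repeated edges are the ones that need separate handling.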
Figure 4 shows an example of extracting the CFI reference information, including CFIlabel and HDspecial. The first step is to extract the CFG. Each node in the CFG has three parameters. Start_Addr is the start address of the BB. For direct jump instructions and direct function call instructions, Target_Addr_Left is the target address. For branch instructions, Target_Addr_Left is the target address when the jump condition is met, and Target_Addr_Right is the target address when the jump condition is not met. According to these parameters, BBs can be linked to form a CFG. To avoid the influence of repeated jumps and indirect jumps, the CFG is divided into subgraphs. Equation (2) describes how to calculate the size of the CFIlabel [31]. NumChild is the maximum number of child nodes of a BB, and NumParent is the maximum number of parent nodes. [log_2 N] is the integer part of log_2 N, where N is the number of BBs in the subgraph. In Figure 4, the subgraph contains five BBs and three levels below the root node BB1. The first level has two BBs (BB2 and BB3), the second level has one BB (BB4), and the third level has one BB (BB5). Its NumChild is 2 and NumParent is 1. Thus, SIZElabel is max(2, 1, [log_2 5]) + 1 = max(2, 1, 2) + 1, which is 3. The HD and the code of the root node (CodeRoot) are user-defined parameters that affect the value of the label. It should be noted that HD ranges from 1 to [SIZElabel/2], and the size of CodeRoot is SIZElabel bits.
$$\mathrm{SIZE}_{label} = \max\left(\mathrm{Num}_{Child},\ \mathrm{Num}_{Parent},\ [\log_2 N]\right) + 1 \qquad (2)$$
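The label-size calculation can be reproduced in a few lines of Python. The function names are illustrative; the integer part of the base-2 logarithm is computed with `int.bit_length`, which avoids floating-point rounding.

```python
def label_size(num_child: int, num_parent: int, n_bbs: int) -> int:
    """SIZElabel = max(NumChild, NumParent, [log2 N]) + 1, for N >= 1."""
    floor_log2 = n_bbs.bit_length() - 1  # integer part of log2(N)
    return max(num_child, num_parent, floor_log2) + 1


def allowed_hd_values(size_label: int) -> list[int]:
    """HD ranges from 1 to the integer part of SIZElabel / 2."""
    return list(range(1, size_label // 2 + 1))


size = label_size(num_child=2, num_parent=1, n_bbs=5)
print(size)                     # 3
print(allowed_hd_values(size))  # [1]
```

For the five-BB subgraph, the only admissible HD is therefore 1.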
Through the label allocation algorithm, each BB in the subgraph corresponds to a unique binary code as its CFIlabel. In addition, for special cases, the corresponding HD will also be recorded. Algorithm 1 introduces the detailed steps of the label allocation algorithm. For display convenience, we assume that the CodeRoot is 000 and the HD is 1. Therefore, CFIlabel(1) is 000. In the first level, since BB2 is the left child node of BB1, c can be 001, 010, or 100. We choose 001 as the CFIlabel of BB2. Since BB3 is the right child node of BB1, at this time, the eligible c is 010 or 100. We choose 100 as CFIlabel(3). In the second level, since BB4 is the left child node of BB3, c can be 101 or 110. If CFIlabel(4) is 101, the HD between CFIlabel(4) and CFIlabel(2) will be 1. However, according to the subgraph, it is illegal to jump from BB2 to BB4. Thus, CFIlabel(4) must be 110. In the third level, c can only be 111, which is assigned to CFIlabel(5). Some special cases must be considered:
  • A BB loops back to itself, such as BB2 in Figure 4. In this case, the HD between codes assigned will always be equal to zero, not the value selected by the user. This will cause false alarms.
  • The jump address destination is duplicated. The jump destination addresses of BB1 and BB2 are both BB3. In this case, the HD between the CFIlabel of BB2 and BB3 will be 2, not the value selected by the user. However, in fact, it is possible for the program to execute BB2 first and then BB3.
Therefore, the special HD (HDspecial) generated in these cases will also be recorded and used together with the CFIlabel as the CFI reference information of each BB.
Algorithm 1 Assigning Labels CFIlabels to BBs
Input: Subgraph, HD and CodeRoot
Output: CFIlabel and HDspecial
1:    CFIlabel(i) ← The label of the BBi in the Subgraph.
2:    HDspecial(i)(j) ← The jth special HD of the BBi in the Subgraph.
3:    CFIlabel(1) = CodeRoot; // CodeRoot has SIZElabel bits.
4:    for Subgraph still has BBs at the current level do          // It has d levels.
5:              if      BBi is the left child node of BBx then        // BBi is in the bth level
6:                        Generate a SIZElabel-bit binary number c such that
                       the Hamming distance between c and CFIlabel(x) is HD and
                       c ∉ {CFIlabel(y) | 1 ≤ y ≤ i};                  // c is not among the existing labels
7:                       CFIlabel(i) = c;
8:              else   Jump to Step 6;
9:              end if
10:            Output CFIlabel(i);
11:    end for
12:    Retrieve the BBs of the next level in Subgraph and jump to Step 4;
13:    if     BBi has M special cases
14:            for 1 ≤ j ≤ M do
15:                  Calculate the HD of the jth cases and obtain HDspecial(i)(j);
16:                  Output HDspecial(i)(j);
17:            end for
18:    end if
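The label allocation can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the input format (`tree_children`, `extra_edges`), the choice of the smallest eligible code, and the keying of HDspecial by the entered BB are assumptions. The additional rule that a new label must not lie exactly HD away from any labelled BB other than its parent is inferred from the worked example, where 101 is rejected for BB4 because of BB2. The subgraph edges are reconstructed from the textual description of Figure 4.

```python
from collections import deque


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two labels."""
    return bin(a ^ b).count("1")


def assign_labels(tree_children, extra_edges, size_label, hd, code_root=0):
    """Breadth-first label assignment in the spirit of Algorithm 1.

    tree_children: {parent BB: [child BBs]} giving the level structure (assumed format).
    extra_edges:   (src, dst) pairs for special cases (self-loops, shared targets).
    """
    labels = {1: code_root}          # Step 3: the root BB receives CodeRoot
    queue = deque([1])
    while queue:                     # Steps 4-12: walk the subgraph level by level
        parent = queue.popleft()
        for child in tree_children.get(parent, []):
            for c in range(1 << size_label):
                if c in labels.values():
                    continue         # c must not repeat an existing label
                if hamming(c, labels[parent]) != hd:
                    continue         # c must sit exactly HD away from its parent
                # Assumed extra rule, read off the worked example: no other
                # already-labelled BB may sit exactly HD away from c.
                if any(hamming(c, lab) == hd
                       for bb, lab in labels.items() if bb != parent):
                    continue
                labels[child] = c
                break
            else:
                raise ValueError(f"no eligible label for BB{child}")
            queue.append(child)

    hd_special = {}                  # Steps 13-18: record the special distances
    for src, dst in extra_edges:
        hd_special.setdefault(dst, []).append(hamming(labels[src], labels[dst]))
    return labels, hd_special


# Subgraph reconstructed from the text's description of Figure 4 (an assumption).
labels, hd_special = assign_labels(
    tree_children={1: [2, 3], 3: [4], 4: [5]},
    extra_edges=[(2, 2), (2, 3)],
    size_label=3,
    hd=1,
)
print({bb: format(code, "03b") for bb, code in labels.items()})
print(hd_special)
```

Because this sketch always takes the smallest eligible code, BB3 receives 010 rather than the 100 chosen in the text; both satisfy the constraints. BB4 and BB5 still receive 110 and 111, and the special distances are 0 for the self-loop of BB2 and 2 for the jump from BB2 to BB3.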

4.3.2. Indirect Jump

Because the target address of an indirect jump cannot be determined accurately through static analysis, the CFG-based protection mechanism cannot monitor the CFI of BBs that contain indirect jump instructions or indirect function call instructions.
We propose a CFI protection mechanism with lower time complexity for indirect jumps. When BBs are extracted, we can obtain the addresses of all instructions of the benchmark, the entry addresses of all functions, and the start addresses of all BBs. For the OR1200 embedded system, there are only three special indirect jump situations:
  • l.jalr rB. The target address of an indirect function call instruction must be the entry address of a function.
  • l.jr r9. This instruction functions as a return instruction. Its target address can only be the start address of a BB.
  • l.jr rB. The target address of an indirect jump instruction must also be the start address of a BB.
After the above analysis, the legal address range of all indirect jump instructions can be determined. When performing an indirect jump execution, the HMM can detect whether the target address is within the legal address range. This situation can be abstracted as a mathematical problem, that is, knowing N addresses arranged in sequence, of which M addresses are marked, and quickly judging whether a certain address A is within the marked address range. The bitmap algorithm can be used to solve this mathematical problem. It uses a bit to mark the value corresponding to an element. If this element exists in a certain space, its value is set to 1; otherwise, it is set to 0.
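$$
\begin{aligned}
b &= B / \mathrm{0x4}, && B \text{ is in the range of } M \text{ marked addresses} \\
i &= [\, b / 64 \,], && i = 0, 1, 2, \ldots, 1 + [100{,}000 / 64] \\
j &= b \mathbin{\%} 64, && j = 0, 1, 2, \ldots, 63
\end{aligned}
\qquad (3)
$$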
For example, let N = 100,000. We assume that one bit in the bitmap corresponds to 8 bytes. Create an array a[] of length 1 + [100,000/64]. Each element of a[] occupies 64 bits in memory: a[0] corresponds to the decimal numbers 0 to 63, a[1] corresponds to the decimal numbers 64 to 127, and so on. Create an array b[] with a length of 64, representing the 64 bit positions within one element of a[]. Traverse the 100,000 addresses. If a certain address B is marked, the value of its corresponding point a[i]b[j] is set to 1. The calculation method is shown in Equation (3). This paper assumes that the instructions of the program start from address 0x0 and are stored in the memory in ascending order. Since the OR1200 processor is a 32-bit processor, the value of address B divided by 0x4 gives b, the index of the address within the program. For example, if B = 0x2000, then b = 0x2000/0x4 = 0x800 = 2048, i = [2048/64] = 32, and j = 2048 % 64 = 0. Thus, the value of a[32]b[0], which corresponds to address 0x2000, is 1. The array element a[i]b[j] is a point on the bitmap. Only for marked addresses is the value of the corresponding point 1; the values of all other points on the bitmap are 0. For an address A, the corresponding values of i and j can be calculated according to Equation (3). By checking whether the value of a[i]b[j] is 1, the HMM can quickly determine whether the address A is within the marked address range. Because the total number of addresses for each benchmark is fixed, the time complexity of the bitmap lookup is O(1). In addition, the bitmap size is the size of the benchmark divided by 64.
Figure 5 describes the process of extracting the function entry address bitmap (FEAddress Bitmap) and the BB start address bitmap (BBAddress Bitmap). For the FEAddress Bitmap, if an 8-byte unit contains a function entry address, the corresponding bit in the bitmap is set to 1; otherwise, it is set to 0. When executing an indirect jump instruction, the HMM only needs to calculate the i and j of the target address through Equation (3) and check whether the value of the corresponding a[i]b[j] is 1, which saves considerable search time and storage resources.
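The bitmap construction and lookup can be illustrated with a short Python sketch. It is a software model under stated assumptions, not the hardware design: the names `bitmap_index`, `build_bitmap`, and `is_marked` are hypothetical, and the default granularity follows Equation (3) (one bit per 4-byte instruction word, b = B / 0x4). The 8-byte granularity mentioned in the text would correspond to `bytes_per_slot=8`.

```python
WORD_BITS = 64  # bits per bitmap element a[i], as in Equation (3)


def bitmap_index(addr: int, bytes_per_slot: int = 4) -> tuple[int, int]:
    """Return (i, j) for an address: i selects a[i], j selects the bit b[j]."""
    b = addr // bytes_per_slot
    return b // WORD_BITS, b % WORD_BITS


def build_bitmap(marked, program_bytes: int, bytes_per_slot: int = 4) -> list[int]:
    """Offline step: set one bit for every marked (legal target) address."""
    n_slots = (program_bytes + bytes_per_slot - 1) // bytes_per_slot
    words = [0] * (n_slots // WORD_BITS + 1)
    for addr in marked:
        i, j = bitmap_index(addr, bytes_per_slot)
        words[i] |= 1 << j
    return words


def is_marked(words: list[int], addr: int, bytes_per_slot: int = 4) -> bool:
    """Runtime step: a single indexed read decides whether the target is legal."""
    i, j = bitmap_index(addr, bytes_per_slot)
    return i < len(words) and (words[i] >> j) & 1 == 1


# Hypothetical program of 0x4000 bytes with three legal target addresses.
bb_bitmap = build_bitmap([0x0, 0x2000, 0x2040], program_bytes=0x4000)
print(bitmap_index(0x2000))        # the worked example from the text
print(is_marked(bb_bitmap, 0x2000), is_marked(bb_bitmap, 0x2004))
```

For B = 0x2000 the index pair is (32, 0), as in the worked example. The lookup cost does not depend on how many addresses are marked, which is the property the text relies on.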

4.4. Hardware Monitoring Module

The HMM checks the IR and PC in the pipeline stage to detect abnormal behavior of the embedded processor. The IR of the ID stage is more stable and appears earlier than in other stages. Therefore, it is appropriate to select the signals in the ID stage as the monitoring signals. To monitor DI and CFI when the program is executed, the HMM mainly implements four functions: BB check, DI check, CFI check, and bitmap search.

4.4.1. Basic Block Check

When the program is running, the HMM starts to work. By analyzing the binary code of each instruction in the ID stage, the program is also divided into multiple BBs. Each BB starts from the instruction in the target address after the last jump and ends with the delay slot instruction after the next jump instruction. The HMM will check whether the effective value (PC [17:2]) of the starting address of the currently executed BB has corresponding pre-stored data (BBstart) in the Integrity Parameter Memory. If not, this means that the data has been maliciously attacked, affecting the normal extraction of BBs.

4.4.2. Data Integrity Check

During program execution, when the HMM detects a new BB and starts to execute, the LHash calculation unit and DI check unit in the HMM will be activated. The instructions that are executed sequentially in the BB and the corresponding storage address are continuously inputted into the hardware-implemented LHash calculation unit until the instruction (branch or jump instruction) that marks the end of the BB is detected. At the same time, from the arrival of the first instruction of the BB, the starting address is sent to the Integrity Parameter Memory to find the pre-stored static integrity digest (LHashStatic). The LHash calculation unit maps the binary codes of all instructions of the currently executed BB into 96-bit binary numbers. Through the same selection method as when offline, the 16-bit dynamic data integrity digest (LHashDynamic) is obtained. Then, it will be compared with the LHashStatic. If there are differences, the DI has been destroyed by attackers.
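The BB check and the DI check described above can be modelled together in a few lines of Python. This is only a behavioural sketch: the digest below is a simple XOR fold used as a placeholder, not the LHash algorithm, and the dictionary standing in for the Integrity Parameter Memory, the function names, and the instruction words are all hypothetical.

```python
def bb_key(pc: int) -> int:
    """Effective start-address bits PC[17:2] used to index the reference data."""
    return (pc >> 2) & 0xFFFF


def placeholder_digest(block) -> int:
    """Placeholder for LHash: XOR-fold each (address, instruction) pair to 16 bits."""
    digest = 0
    for addr, instr in block:
        mixed = (addr ^ instr) & 0xFFFFFFFF
        digest ^= (mixed >> 16) ^ (mixed & 0xFFFF)
    return digest


def build_reference(blocks) -> dict[int, int]:
    """Offline step: map each BBstart to its static digest."""
    return {bb_key(block[0][0]): placeholder_digest(block) for block in blocks}


def check_block(reference: dict[int, int], block) -> str:
    """Runtime step: BB check first, then compare dynamic and static digests."""
    key = bb_key(block[0][0])
    if key not in reference:
        return "bb_check_failed"      # no pre-stored BBstart for this address
    if placeholder_digest(block) != reference[key]:
        return "di_check_failed"      # dynamic digest differs from the static one
    return "ok"


# One hypothetical BB: (address, 32-bit instruction word) pairs.
block = [(0x100, 0x9C210004), (0x104, 0x44004800), (0x108, 0x15000000)]
reference = build_reference([block])

tampered = [(0x100, 0x9C210004), (0x104, 0x44004801), (0x108, 0x15000000)]
unknown = [(0x200, 0x15000000)]
print(check_block(reference, block))
print(check_block(reference, tampered))
print(check_block(reference, unknown))
```

The three calls report an intact block, a block whose instruction word was altered, and a block whose start address has no reference entry. A real design would replace `placeholder_digest` with the hardware LHash unit and its 96-bit-to-16-bit selection step.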

4.4.3. Control Flow Integrity Check

HMM can identify the type of jump instruction through its opcode. If the currently executed jump instruction is a direct-jump-type instruction, the HMM will start the CFI check unit. When the program is loaded, the register named Label 1 is loaded with the CFIlabel of the first BB in the program. When the first instruction of the new BB is detected, the CFIlabel of the BB is stored in a register named Label 2, and the Hamming Distance Unit (HDU) is enabled. After that, the instruction recognition unit sends a signal to increase the value of the Label Controller from 0 to 1, and move the CFIlabel in Label 2 to Label 1. HDU calculates the HD between the CFIlabel in Label 1 and Label 2 and compares it with the HD determined offline. If the result is consistent, the CFI has not changed. If it is inconsistent, extract the HDspecial of the BB in its current execution, and check whether the HD calculated by the HDU is within the range of the HDspecial. If not, it is considered that the control flow has been hijacked by attackers. When it is judged that the CFI has not changed, the value of the Label Controller changes from 1 to 0, and the Label 2 register is cleared.
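The direct-jump CFI check can be modelled as below. The sketch uses the labels from the worked example (CodeRoot 000, HD 1) and assumes that HDspecial is keyed by the BB being entered; the class and method names are hypothetical. It computes the Hamming distance before moving the new label into Label 1, which is one plausible reading of the register sequence described above.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two labels."""
    return bin(a ^ b).count("1")


class CfiChecker:
    """Software model of the Label 1 / Label 2 comparison (names are illustrative)."""

    def __init__(self, labels, hd, hd_special, first_bb):
        self.labels = labels
        self.hd = hd
        self.hd_special = hd_special
        self.label1 = labels[first_bb]      # Label 1 holds the previous BB's label

    def enter_block(self, bb) -> bool:
        label2 = self.labels[bb]            # Label 2 holds the new BB's label
        distance = hamming(self.label1, label2)
        legal = distance == self.hd or distance in self.hd_special.get(bb, ())
        if legal:
            self.label1 = label2            # the new BB becomes the reference
        return legal


# Labels and special distances taken from the worked example in Section 4.3.1.
labels = {1: 0b000, 2: 0b001, 3: 0b100, 4: 0b110, 5: 0b111}
hd_special = {2: [0], 3: [2]}               # BB2 self-loop, BB2 -> BB3

legal_run = CfiChecker(labels, 1, hd_special, first_bb=1)
print([legal_run.enter_block(bb) for bb in (2, 2, 3, 4, 5)])

hijacked = CfiChecker(labels, 1, hd_special, first_bb=1)
hijacked.enter_block(2)
print(hijacked.enter_block(4))              # BB2 -> BB4 is not an edge of the subgraph
```

The legal path passes every check, while the jump from BB2 to BB4 gives a distance of 3 and is flagged.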

4.4.4. Bitmap Search

If the jump instruction of the currently executed BB is an indirect-jump-type instruction, the CFI checking method will no longer be applicable to this type of situation due to the lack of a CFIlabel during offline analysis. To solve this problem, the HMM will determine whether the jump instruction is an indirect jump instruction (l.jr) or an indirect function call instruction (l.jalr) according to the opcode, and the HMM will extract the jump target address of these instructions. If it is an indirect jump instruction, the HMM will find the point corresponding to the target address in the BBAddress Bitmap. Only when the value of the point is 1 is the target address judged to be legal. If it is an indirect function call instruction, the HMM will retrieve the point corresponding to the target address in the FEAddress Bitmap. Compared with the searching algorithm proposed in [12], the bitmap-based search method proposed in this paper requires only one clock cycle to determine whether the target address of the indirect jump instruction is legal.
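The opcode-based selection between the two bitmaps can be sketched as follows. The opcode values 0x11 (l.jr) and 0x12 (l.jalr) in the top six instruction bits follow our reading of the OpenRISC 1000 instruction encoding and should be verified against the architecture manual. Each bitmap is modelled as a Python integer with one bit per 4-byte word, and all function names are hypothetical.

```python
OPCODE_L_JR = 0x11    # assumed encoding of l.jr (bits 31..26)
OPCODE_L_JALR = 0x12  # assumed encoding of l.jalr (bits 31..26)


def slot(addr: int) -> int:
    """Bit position of an address, one bit per 4-byte instruction word."""
    return addr // 4


def make_bitmap(addrs) -> int:
    """Build a bitmap as an integer with one bit set per legal address."""
    bits = 0
    for addr in addrs:
        bits |= 1 << slot(addr)
    return bits


def check_indirect_target(instr: int, target: int, bb_bitmap: int, fe_bitmap: int):
    """Return True/False for indirect jumps and calls, None for other instructions."""
    opcode = (instr >> 26) & 0x3F
    if opcode == OPCODE_L_JR:
        bitmap = bb_bitmap          # indirect jump or return: BB start addresses
    elif opcode == OPCODE_L_JALR:
        bitmap = fe_bitmap          # indirect call: function entry addresses
    else:
        return None                 # handled by the direct-jump CFI check instead
    return (bitmap >> slot(target)) & 1 == 1


# Hypothetical layout: three BB start addresses, one of which is a function entry.
bb_bitmap = make_bitmap([0x100, 0x120, 0x200])
fe_bitmap = make_bitmap([0x200])

print(check_indirect_target(0x44004800, 0x120, bb_bitmap, fe_bitmap))  # l.jr r9
print(check_indirect_target(0x48001800, 0x120, bb_bitmap, fe_bitmap))  # l.jalr r3
print(check_indirect_target(0x48001800, 0x200, bb_bitmap, fe_bitmap))  # l.jalr r3
```

A call to 0x120 is rejected because that address is a BB start but not a function entry. This also illustrates the residual risk discussed in Section 5.2.1: any marked address is accepted, even one the program would never legitimately reach from that instruction.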

5. Experiments and Results

In this section, we build a hardware verification platform based on the OR1200 processor to evaluate the effectiveness of the HIIMA, the cost of hardware implementation, and the system performance overhead.

5.1. Platform and Benchmarks

We use a Digilent Genesys development board to build the embedded SoC on a Xilinx Virtex 5 FPGA, and implement the hardware-based integrity-monitoring module inside the OR1200 processor. The SoC is implemented in a fully synchronous manner, and its configuration is shown in Table 3. To evaluate the performance overhead, a timer is embedded in the SoC to record the program execution time and the extra time consumed by the integrity protection method proposed in this paper.
Since the proposed security architecture is oriented to security-critical industrial control scenarios, we selected nine groups of programs or algorithms frequently used in the industrial control field from the MiBench benchmark suite. In addition, to evaluate the functions and costs of the proposed architecture when executing larger-scale programs, the open-source OpenECC is selected as a benchmark [34]. We use the OR1200-based GNU cross-compilation toolchain to compile and link these programs on a completely safe host computer, extract BBs, and generate HIREF and bitmaps.

5.2. Performance Evaluation

This section mainly evaluates the security mechanism proposed in this paper from three aspects: security capability, hardware implementation overhead, and performance overhead. These tests are all carried out on the OR1200 processor-based embedded system experimental platform.

5.2.1. Security Capability Evaluation

The Hybrid Information Integrity Monitoring Architecture (HIIMA) designed by us can realize the integrity protection of key data and control flows when the embedded system executes programs, and detect malicious behaviors of attackers tampering with data or hijacking the control flow in real time. Figure 2 illustrates the attack methods and attack locations of the OR1200 embedded systems. Since the extracted hybrid integrity reference information (HIREF) and bitmaps (FEAddress Bitmap and BBAddress Bitmap) are loaded into the designated memory of the HMM through the secure channel when the program is loaded, no external security module is required to encrypt the reference information. Therefore, side-channel attacks are not effective for HIIMA. In this section, software attacks and physical attacks will be used to simulate the tampering of key data and control flows by an attacker, and the security analysis will be performed separately.
We mainly simulate data tampering attacks in three ways:
  • To simulate the tampering of the program code by an attacker, we randomly select a parameter in the source file of each benchmark test program, such as input variables and static variables, and tamper with the value of the parameter. Then, we ran the tampered program on the test platform to observe whether HIIMA can detect the attack. Each experiment was repeated 1000 times.
  • To simulate the attacker’s attack on the external memory, we designed a fault implant circuit on the Flash. When the binary code is loaded on the Flash, the binary code and address of the program can be tampered with. Each set of experiments randomly selected 1000 attack samples to launch attacks on corresponding locations.
  • To simulate an attacker’s attack on dynamic data, we have implanted an attack circuit inside DDR2. When the embedded system is running the program, if too many dynamic data are generated, this part of the data will be stored in an off-chip memory, such as DDR2. When the data are needed, the embedded system reads the data from the off-chip memory. The attack circuit will randomly tamper with certain data written to DDR2. Each experiment will be repeated 1000 times.
Table 4 shows the detection rate of the HIIMA proposed in this paper for three types of data attacks. DI Check represents the probability that HIIMA discovers that data has been tampered with through the DI check mechanism. CFI Check indicates the probability that HIIMA finds that the control flow has been tampered with through the CFI check mechanism. The data in Table 4 show that even if the attack only aims at the data, the control flow of programs may change. However, the rate of discovering tampering attacks on data through the CFI protection mechanism is lower than the rate of discovering them through the DI protection mechanism. Therefore, many tampering-type attacks against data will not be detected if only the CFI check mechanism is used.
In addition, compared to attacks on data in the memory, attacks on code and attacks on dynamic data are easier to detect. The reason is that the binary may contain library functions that the program does not use. Even if attackers tampered with the data of these functions, their code would not have been executed. Thus, the HIIMA did not detect attacks. This does not mean that attackers have bypassed the HIIMA and tampered with the data. As soon as these tampered data are extracted from the memory and ready for execution, the HIIMA can discover them at the ID stage.
The attack methods against the control flow are as follows:
  • Modifications for direct jumps. The attackers can use the buffer overflow attack to overwrite the value of the general-purpose register, so that the target address of the direct jump instruction and the direct function call instruction becomes the value set by the attacker.
  • Modifications for branch jumps. The attacker can attack the value of the register to change the target address of the branch instruction. In addition, the attacker can also change the judgment condition of the branch jump by changing the value of the flag.
  • Modification for the return address. Using the ROP attack, the attacker can modify the return address of the program to hijack the control flow.
  • Modifications for indirect jumps. For indirect jump instructions and indirect function call instructions, an attacker can use the JOP attack to modify the target address of the indirect jump and hijack the control flow.
In order to simulate the above-mentioned attacks, we designed a fault implant circuit inside the embedded system, which can modify the binary code and general registers, and realize the malicious tampering of direct jumps, branch jumps, return addresses, and indirect jumps. For each benchmark, 1000 attack samples were randomly selected for attack. Every time the program is executed, the fault implant circuit implants an attack sample into the embedded processor and judges whether the HMM has completed the real-time monitoring function of the attack according to whether the HMM gives a fault abnormal signal. The statistical results of attacks on the control flow are shown in Table 5.
Facing the direct jump and branch jump attacks, the HMM has a high attack-detection rate. For the attack behavior of the return address, indirect jump, and indirect function call, HMM has a missed detection rate. For indirect function call instructions, since we only limit the legal range of the destination address, when the address represented by the register happens to be a function entry that should not be called, the HMM will miss the detection. For the return address and indirect jump instructions, the HMM will also miss the detection due to similar reasons. However, in general, the HMM has high detection rates for attacks on return addresses, indirect jumps, and indirect calls, which are all above 90%.

5.2.2. Hardware Implementation Overhead

The consumption of on-chip hardware resources by HIIMA proposed in this paper mainly comes from two parts:
First of all, the Integrity Parameter Memory and Bitmap Memory inside the embedded processor consume a lot of storage space. Table 6 describes the storage space required for the reference information and bitmaps of the selected benchmarks. In Table 6, the size of the CFIlabel is calculated by Equation (2). The more BBs the benchmark contains, the larger the size that is required for its CFIlabel. Since the CFIlabel size of OpenECC is 16 bits, in order to facilitate the unified extraction of reference information, the CFIlabel size of all benchmarks is set to 16 bits. The storage space of the HIREF, the storage space of the FEAddress Bitmap, and the storage space of the BBAddress Bitmap can be obtained by Equation (4).
In Equation (4), TotalBB is the total number of BBs. The total number of BBs for each benchmark can be found in Table 2. SIZEprogram is the size of the benchmark. The size of each benchmark can be found in Table 2. As shown in Figure 3, the size of the HIREF of each BB is 64 bits. Therefore, the storage space required for the reference information of each benchmark is the total number of BBs multiplied by 64 bits, and then converted into KB. According to the bitmap algorithm, we can map the address of the benchmark to points. Since we set one bit on the bitmap to correspond to 8 bytes, the data that originally occupied 64 bits now only occupy 1 bit. Therefore, the size of the bitmap is the size of the benchmark divided by 64. Because all addresses need to be traversed when creating the FEAddress Bitmap and BBAddress Bitmap, the size of the two bitmaps is the same. Total is the storage space required for the monitor model of each benchmark. The monitor model contains reference information and two bitmaps. Thus, the size of Total is the size of the reference information plus the size of the two bitmaps.
The hardware-monitoring module (HMM) also consumes some on-chip resources. We use Xilinx ISE 14.6 to synthesize the secure embedded processor designed in this paper, and its hardware resource consumption is shown in Table 7. The logic resources consumed by the proposed HMM (excluding FPGA BRAM) account for about 10% of the SoC, while on-chip storage resources consume more than 90%. The reason is that all the monitor models are loaded into the on-chip memory, which causes a large on-chip memory overhead. To tackle this issue, monitor models can be encrypted and loaded into the off-chip memory. However, this requires additional encryption and decryption modules on the chip [1,11], which also consume a certain amount of on-chip logic resources. In addition, since an off-chip security module is exposed to side-channel attacks and physical attacks, an off-chip monitoring model cannot guarantee security.
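$$
\begin{aligned}
\text{The size of HIREF:}\quad \mathrm{SIZE}_{REF} &= \frac{64 \times \mathrm{Total}_{BB}}{8 \times 1024} \\
\text{The size of FEAddress Bitmap} &= \frac{\mathrm{SIZE}_{program}}{64} \\
\text{The size of BBAddress Bitmap} &= \frac{\mathrm{SIZE}_{program}}{64} \\
\mathrm{Total} &= \frac{64 \times \mathrm{Total}_{BB}}{8 \times 1024} + \frac{\mathrm{SIZE}_{program}}{64} + \frac{\mathrm{SIZE}_{program}}{64}
\end{aligned}
\qquad (4)
$$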
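Equation (4) can be evaluated with a small helper such as the one below. The function name is hypothetical, the program size is assumed to be given in KB so that all terms share one unit, and the input values in the example are made up for illustration; they are not taken from Table 2 or Table 6.

```python
def monitor_model_kb(total_bb: int, program_kb: float, href_bits: int = 64):
    """Storage in KB for the HIREF, one bitmap, and the whole monitor model (Equation (4))."""
    ref_kb = href_bits * total_bb / (8 * 1024)   # 64 bits per BB, converted to KB
    bitmap_kb = program_kb / 64                  # one bit per 64 bits of program
    total_kb = ref_kb + 2 * bitmap_kb            # HIREF + FEAddress + BBAddress bitmaps
    return ref_kb, bitmap_kb, total_kb


# Hypothetical benchmark: 1024 BBs in a 128 KB program.
print(monitor_model_kb(total_bb=1024, program_kb=128))
```

For these made-up inputs the reference information takes 8 KB, each bitmap takes 2 KB, and the monitor model totals 12 KB.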

5.2.3. Performance Overhead Evaluation

The embedded system has a relatively simple structure and limited performance. The introduction of security mechanisms will cause the performance of embedded systems to decrease. The embedded processor runs these benchmarks in turn, and records the average Cycles Per Instruction (CPI) of the processor to evaluate the performance overhead caused by the HMM. Table 8 lists the average CPI of these 10 benchmarks with and without HMM, and with the corresponding performance overhead. It can be seen from Table 8 that for each benchmark, the performance overhead caused by HMM is less than 9%, with an average value of 6.18%.
There are two factors that cause performance overhead:
  • The LHash algorithm. After receiving all the instructions of a BB, it needs three clock cycles to calculate the digest (LHashDynamic). Therefore, the HMM requires more clock cycles to check the data integrity of each BB.
  • The search for BB reference information. When the program is running, the HMM needs to search for the corresponding reference information in the Integrity Parameter Memory according to the start address of the currently executed BB. The larger the size of the HIREF, the longer the search time required.
To reduce the performance overhead, embedded system developers can use the hash functions with faster calculation speeds. However, these methods will inevitably increase on-chip resource overhead. At present, most of the hash functions with faster calculation speeds adopt the parallel structure, and their hardware implementation cost is much higher than that of the LHash algorithm [1].

5.3. Comparison with Other Integrity Protection Methods

We compare HIIMA with recent integrity protection methods, mainly considering their security and practicability [1,5,10,11,12,23,31,32]. Their security means that the protection method can deal with multiple security threats and prevent attackers from tampering with data or hijacking the control flow. Practicality is evaluated through the resource overhead and performance overhead. Table 9 demonstrates the comparison results. Researchers usually use different metrics to evaluate the resource overhead of the proposed methods, such as the occupied slices, the increase in the code size or the binary size, the area, and the storage overhead. However, when evaluating the performance overhead, they usually use a metric such as CPI, which can illustrate the increase in the time it takes for the embedded system to execute the benchmarks before and after adding the proposed method. The purposes of [1,11] are to protect the integrity and confidentiality of dynamic data when embedded systems execute programs. The purpose of [10] is to ensure the runtime code’s integrity in the embedded systems. Ref. [5] proposed the MicroGuard to securely execute applications and resist code reuse attacks (CRAs). Ref. [23] aimed to protect the CFI of the executing program. Ref. [31] proposed the HCIC to resist CRAs without extending ISAs, modifying the compiler, or leaking the encryption/decryption key. Ref. [12] proposed the Security-Monitoring Unit (SMU) to monitor both the DI and the CFI of running instructions. Ref. [32] proposed the Dynamic Sequence Checker (DSC) to verify the validity of the control flow, and the DSC could also check the DI of BBs. The methods proposed in [1,10,11], and [12] are the previously published results by members of our lab. Furthermore, the FPGA, benchmarks, and the design software used in these references are essentially the same as those used in this paper.
These methods [1,10,11] mainly protect the data integrity. Both [1] and [11] protect the normal operation of the program by checking the integrity of dynamic data. The performance overhead of [1] is only 2.65%, and the performance overhead of [11] is less than 2.27%. However, neither [1] nor [11] can prevent attackers from hijacking the control flow. Because the protected object is outside the chip, both [1] and [11] require additional encryption modules, which consume on-chip resources. In addition, Refs. [1] and [11] also have difficulty resisting side-channel attacks. Ref. [10] checks the integrity of the binary code, and its performance overhead is less than 3.45%. Ref. [10] also cannot prevent attackers from destroying the CFI.
These methods [5,23,31] mainly protect the control flow integrity. Ref. [5] can resist CRAs for large programs, but this method will greatly increase the program code size, and the performance overhead is more than 19%. Ref. [23] presents a lightweight CFI solution for bare-metal embedded systems. This method only needs an increase of less than 4% in additional code to resist ROP attacks and JOP attacks. However, its performance overhead is unstable. For some programs, the performance overhead of [23] is less than 1%, while for some programs, the performance overhead will be 193%. Ref. [31] proposes a new hardware-assisted control-flow-checking method to resist CRAs, introducing a negligible 0.95% runtime overhead and 0.78% binary size overhead on average. The above-mentioned methods ignore the protection of data integrity, and sometimes, it is impossible to detect the tampering behavior of the data by the attacker.
These methods [12,32] take into account both DI and CFI. They all use hash functions to monitor the integrity of program data. Ref. [32] proposes a Dynamic Sequence Checker (DSC) to check the CFI of BBs. A DSC can detect CRA gadgets, and the average performance overhead is 4.7%. However, Ref. [32] only uses the shadow stack to protect the return address of the program, ignoring the protection of indirect jumps. Ref. [12] protects the CFI of indirect jumps by limiting the target address range. The legal addresses are stored in the form of a secondary list. Despite the optimized search algorithm, they still need to perform multiple searches to determine whether the target address is in the secondary address list. Its performance overhead is 9.33% on average. In addition, the size of reference information in [12] is equal to the size of the protected program.
Different from these methods, the proposed HIIMA can monitor the DI and CFI of the execution program at the same time. Once an attacker tampered with key data or manipulated the control flow, HMM would discover these abnormal behaviors in real time. The experimental results show that HIIMA satisfies the balance between security and practicability. However, it is undeniable that our proposed method also has limitations. The program may be interrupted when it is actually running. After the interrupt is completed, the embedded processor will return to the previous state to continue processing the program. In this case, our proposed CFI protection mechanism cannot identify whether the control flow change caused by the interrupt is legal. The CFI after the interruption can be protected by backing up and verifying the status parameters of the program. In addition, if external data participates in the program while it is running, HIIMA cannot monitor the integrity of the external data. In this case, the proposed method can be combined with dynamic information flow tracking technology to solve the problem.

6. Conclusions

This paper proposes a hybrid information integrity-monitoring architecture (HIIMA) to protect the embedded system’s key data from being tampered with or their control flow hijacked when executing the program. This hardware-based security architecture introduces a data integrity (DI) protection mechanism and a control flow integrity (CFI) protection mechanism. When the program is offline, the binary code of the program is divided into basic blocks (BBs), and the start address of each BB, the static digest generated by the LHash, and the CFI label generated by the label allocation algorithm are extracted, and the final composition is the hybrid integrity reference information. To monitor the indirect jump instructions, this paper maps the legal target address into bitmaps. The reference information and bitmaps will be stored in the designated on-chip memory through a secure path when the program is loaded. When the program is running, the HIIMA will check the DI and CFI of the currently executing BB based on the address signal and the instruction signal. By building a hardware test platform to evaluate the security, resource overhead, and performance overhead of the proposed method, the results show that the HIIMA can effectively resist attacks on the data and control flow with a low performance overhead and low resource overhead.

Author Contributions

Conceptualization, Q.H.; methodology, Q.H.; software, Q.H. and Z.Z.; validation, Q.H., J.W. and D.X.; formal analysis, Z.Z. and J.M.; investigation, Q.H. and D.X.; resources, X.W.; data curation, J.L. and J.Z.; writing—original draft preparation, Q.H.; writing—review and editing, Q.H.; supervision, X.W.; project administration, X.W.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grants No. 60973106 and No. 81571142), the Key Project of the National Natural Science Foundation of China (Grant No. 61232009), and the National 863 Project of China under Grant No. 2011AA010404.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We sincerely appreciate the Shaoxing Yangyu AI Chip CO., LTD, Zhejiang 312035, China, for providing financial support and technical resources. The method proposed in this paper may be applied to the smart chip and its equipment of this company in the future.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, Z.; Wang, X.; Hao, Q.; Xu, D.; Zhang, J.; Liu, J.; Ma, J. High-Efficiency Parallel Cryptographic Accelerator for Real-Time Guaranteeing Dynamic Data Security in Embedded Systems. Micromachines 2021, 12, 560.
  2. Zhang, Z.; Wang, X.; Hao, Q.; Xu, D.; Wang, J.; Liu, J.; Ma, J.; Zhang, J. Hardware-Implemented Security Processing Unit for Program Execution Monitoring and Instruction Fault Self-Repairing on Embedded Systems. Appl. Sci. 2022, 12, 3584.
  3. Wang, X.; Zhang, Z.; Hao, Q.; Xu, D.; Wang, J.; Jia, H.; Zhou, Z. Hardware-Assisted Security Monitoring Unit for Real-Time Ensuring Secure Instruction Execution and Data Processing in Embedded Systems. Micromachines 2021, 12, 1450.
  4. Kaur, M.; Raj, M.; Lee, H.N. Cross Channel Scripting and Code Injection Attacks on Web and Cloud-Based Applications: A Comprehensive Review. Sensors 2022, 22, 1959.
  5. Salehi, M.; Danny, H.; Crispo, B. MicroGuard: Securing Bare-Metal Microcontrollers against Code-Reuse Attacks. In Proceedings of the IEEE Conference on Dependable and Secure Computing (DSC), Hangzhou, China, 18–20 November 2019; pp. 1–8.
  6. Yang, D.; Gao, C.; Huang, J. Quantitative Assessment and Grading of Hardware Trojan Threat Based on Rough Set Theory. Appl. Sci. 2022, 12, 5576.
  7. Pedreira, V.; Barros, D.; Pinto, P. A Review of Attacks, Vulnerabilities, and Defenses in Industry 4.0 with New Challenges on Data Sovereignty Ahead. Sensors 2021, 21, 5189.
  8. Mukhtar, N.; Mehrabi, M.A.; Kong, Y.; Anjum, A. Machine-Learning-Based Side-Channel Evaluation of Elliptic-Curve Cryptographic FPGA Processor. Appl. Sci. 2019, 9, 64.
  9. Wang, W.; Liu, M.; Du, P.; Zhao, Z.; Tian, Y.; Hao, Q.; Wang, X. An Architectural-Enhanced Secure Embedded System with a Novel Hybrid Search Scheme. In Proceedings of the 2017 International Conference on Software Security and Assurance (ICSSA), Altoona, PA, USA, 24–25 July 2017; pp. 116–120.
  10. Wang, X.; Wang, W.; Xu, B.; Du, P. A fine-grained hardware security approach for runtime code integrity in embedded systems. J. Univers. Comput. Sci. 2018, 24, 515–536.
  11. Wang, W.; Zhang, X.; Hao, Q.; Zhang, Z.; Xu, B.; Dong, H.; Xia, T.; Wang, X. Hardware-enhanced protection for the runtime data security in embedded systems. Electronics 2019, 8, 52.
  12. Wang, X.; Zhao, Z.; Xu, D.; Zhang, Z.; Hao, Q.; Liu, M. An M-Cache based security monitoring and fault recovery architecture for embedded processor. IEEE Trans. Large Scale Integr. Syst. 2020, 28, 2314–2327.
  13. Du, P.; Wang, X.; Wang, W.; Li, L.; Xia, T.; Li, H. Hardware-assisted integrity monitor based on lightweight hash function. IEICE Electron. Express 2018, 15, 20180107.
  14. Yan, C.; Englender, D.; Prvulovic, M.; Rogers, B.; Solihin, Y. Improving cost, performance, and security of memory encryption and authentication. In Proceedings of the 33rd Annual International Symposium on Computer Architecture, Boston, MA, USA, 17–21 June 2006; pp. 179–190.
  15. Gelbart, O.; Leontie, E.; Narahari, B.; Simha, R. Architectural support for securing application data in embedded systems. In Proceedings of the IEEE International Conference on Electro/Information Technology, Ames, IA, USA, 18–20 May 2008; pp. 19–24.
  16. Arora, D.; Ravi, S.; Raghunathan, A.; Jha, N.K. Architectural support for run-time validation of program data properties. IEEE Trans. Large Scale Integr. Syst. 2007, 15, 546–559.
  17. Hong, M.; Guo, H.; Hu, S.X. A cost-effective tag design for memory data authentication in embedded systems. In Proceedings of the 2012 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Tampere, Finland, 7–12 October 2012; pp. 17–26.
  18. Liu, T.; Guo, H.; Parameswaran, S. iCETD: An Improved tag generation design for memory data authentication in embedded processor systems. IEEE Trans. Large Scale Integr. Syst. 2017, 56, 96–104.
  19. Wang, X.; Zhou, C.; Pang, S.; Li, M. Hardware assisted protection for data validation at run-time on embedded processors. In Proceedings of the Joint Conference on Information Science and Technology, Guilin, China, 8–9 August 2016; pp. 680–685.
  20. Fiskiran, A.M.; Lee, R.B. Runtime Execution Monitoring (REM) to Detect and Prevent Malicious Code Execution. In Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors, San Jose, CA, USA, 11–13 October 2004; pp. 452–457.
  21. Arun, K.K.; Ramesh, K.; Gaston, O.; Sateesh, K.A. A High-Performance, Low-Overhead Microarchitecture for Secure Program Execution. In Proceedings of the 2012 IEEE 30th International Conference on Computer Design (ICCD), Montreal, QC, Canada, 30 September–3 October 2012; pp. 102–107.
  22. Stephen, C.; Lucas, D.; Alexandra, D.; Ahmad-Reza, S.; Hovav, S.; Marcel, W. Return-oriented programming without returns. In Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS’10), Chicago, IL, USA, 4–8 October 2010; pp. 559–572.
  23. Nicolò, M.; Paolo, P.; Gianluca, R.; Antonio, V. A FPGA-based Control-Flow Integrity Solution for Securing Bare-Metal Embedded Systems. In Proceedings of the 15th Design & Technology of Integrated Systems in Nanoscale Era (DTIS), Marrakech, Morocco, 1–3 April 2020; pp. 1–10.
  24. Das, S.; Zhang, W.; Liu, Y. A fine-grained control flow integrity approach against runtime memory attacks for embedded systems. IEEE Trans. Large Scale Integr. Syst. 2016, 24, 3193–3207.
  25. He, W.; Das, S.; Zhang, W.; Liu, Y. No-jump-into-basic-block: Enforce basic block CFI on the fly for real-world binaries. In Proceedings of the IEEE 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, TX, USA, 18–22 June 2017; pp. 1–6.
  26. Davi, L.; Hanreich, M.; Paul, D.; Sadeghi, A.R.; Koeberl, P.; Sullivan, D.; Arias, O.; Jin, Y. HAFIX: Hardware-assisted flow integrity extension. In Proceedings of the IEEE 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 8–12 June 2015; pp. 1–6.
  27. Christoulakis, N.; Christou, G.; Athanasopoulos, E.; Ioannidis, S. HCFI: Hardware-enforced control-flow integrity. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, New Orleans, LA, USA, 16–18 March 2016; pp. 38–49.
  28. Sullivan, D.; Arias, O.; Davi, L.; Larsen, P.; Sadeghi, A.R.; Jin, Y. Strategy without tactics: Policy-agnostic hardware-enhanced control-flow integrity. In Proceedings of the IEEE 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, TX, USA, 5–9 June 2016; pp. 1–6.
  29. Lee, J.; Heo, I.; Lee, Y.; Paek, Y. Efficient security monitoring with the core debug interface in an embedded processor. ACM Trans. Des. Autom. Electron. Syst. (TODAES) 2016, 22, 1–29.
  30. Lee, Y.; Lee, J.; Heo, I.; Hwang, D.; Paek, Y. Integration of ROP/JOP monitoring IPs in an ARM-based SoC. In Proceedings of the IEEE 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 14–18 March 2016; pp. 331–336.
  31. Zhang, J.; Qi, B.; Qin, Z.; Qu, G. HCIC: Hardware-assisted Control-flow Integrity Checking. IEEE Internet Things J. 2018, 6, 458–471.
  32. Arun, K.; Jeyavijayan, R.; Ramesh, K. Controlling your control flow graph. In Proceedings of the IEEE International Symposium on Hardware Oriented Security & Trust, McLean, VA, USA, 3–5 May 2016; pp. 43–48.
  33. Wu, W.; Wu, S.; Zhang, L.; Zou, J.; Dong, L. LHash: A lightweight hash function. In Proceedings of the Springer International Conference on Information Security and Cryptology, Guangzhou, China, 27–30 November 2013; pp. 291–308.
  34. Wang, D. An Open Source Library for Elliptic Curve Cryptosystem. Available online: https://github.com/wangdali/OpenECC (accessed on 10 April 2012).

Figure 1. The example of BB extraction in OR1200 ISA.
Figure 2. Threat model of the proposed work.
Figure 3. The structure of the Hybrid Information Integrity Monitoring Architecture.
Figure 4. Extracting the control flow integrity reference information of BasicMath.
Figure 5. Extracting the bitmap of the legal jump target address.

Table 1. Jump instruction types of OR1200 ISA.

| Instruction | Type | Target Address |
|---|---|---|
| l.j EA | direct jump instruction | EA |
| l.bf EA | direct branch instruction | If the flag is 1: effective address (EA). If the flag is 0: End Address + 0x4 |
| l.bnf EA | direct branch instruction | If the flag is 0: effective address (EA). If the flag is 1: End Address + 0x4 |
| l.jal EA | direct function call instruction | EA |
| l.jalr rB | indirect function call instruction | The value of general register rB |
| l.jr rB | indirect jump instruction | The value of general register rB |
| l.rfe | interrupt return jump instruction | The state before exception handling |

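The target-address rules in Table 1 can be written down as a small lookup, which may help when reading the detection results later on. The following is a minimal Python sketch and not the monitoring hardware itself: the names `JumpContext` and `target_address` are hypothetical, "End Address" is assumed to mean the address of the last instruction of the current basic block, and the state restored by `l.rfe` is modeled simply as a saved program counter.

```python
from dataclasses import dataclass


@dataclass
class JumpContext:
    """Inputs needed to resolve a jump target (all names are illustrative)."""
    ea: int = 0            # effective address encoded in a direct jump/branch/call
    end_address: int = 0   # assumed: address of the last instruction of the current BB
    flag: bool = False     # branch flag tested by l.bf / l.bnf
    rb: int = 0            # value of general register rB
    saved_pc: int = 0      # assumed stand-in for "the state before exception handling"


def target_address(mnemonic: str, ctx: JumpContext) -> int:
    """Return the next address for each jump type listed in Table 1."""
    if mnemonic in ("l.j", "l.jal"):
        # Direct jump and direct call: the target is EA.
        return ctx.ea
    if mnemonic == "l.bf":
        # Taken when the flag is 1, otherwise fall through.
        return ctx.ea if ctx.flag else ctx.end_address + 0x4
    if mnemonic == "l.bnf":
        # Taken when the flag is 0, otherwise fall through.
        return ctx.ea if not ctx.flag else ctx.end_address + 0x4
    if mnemonic in ("l.jr", "l.jalr"):
        # Indirect jump and indirect call: the target is the value of rB.
        return ctx.rb
    if mnemonic == "l.rfe":
        # Return from exception: resume from the saved state.
        return ctx.saved_pc
    raise ValueError(f"not a jump instruction in Table 1: {mnemonic}")


if __name__ == "__main__":
    ctx = JumpContext(ea=0x100, end_address=0x20, flag=False, rb=0x340)
    print(hex(target_address("l.bf", ctx)))   # flag is 0, so the branch falls through
    print(hex(target_address("l.jr", ctx)))   # indirect jump follows rB
```

Only the two indirect forms (`l.jr`, `l.jalr`) and `l.rfe` have targets that cannot be read from the instruction encoding, which is why the indirect cases are the ones handled through a bitmap of legal targets.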
Table 2. Characteristics of the selected benchmarks.

| Benchmarks | Total Instructions | Direct Jump and Branch and Call | Indirect Jump and Indirect Call | Total BB | Program Size (KB) |
|---|---|---|---|---|---|
| OpenECC | 56,313 | 5439 | 1295 | 6734 | 197 |
| SHA1 | 20,455 | 2822 | 578 | 3400 | 86 |
| FFT | 13,506 | 1818 | 325 | 2143 | 91 |
| CRC16 | 18,941 | 2672 | 559 | 3231 | 79 |
| BasicMath | 26,515 | 3667 | 660 | 4327 | 149 |
| AES | 22,170 | 2926 | 609 | 3535 | 92 |
| Bitcount | 19,684 | 2760 | 584 | 3344 | 83 |
| Blowfish | 19,128 | 2685 | 562 | 3247 | 83 |
| Patricia | 23,130 | 3288 | 565 | 3853 | 97 |
| QuickSort | 6707 | 854 | 164 | 1018 | 79 |

Table 3. The configuration of the SoC.

| Content | Specification |
|---|---|
| Processor Core | OR1200 (svn rev 853); clock frequency: 100 MHz; data cache: 4 KB; instruction cache: 4 KB |
| Memory | DDR2 SDRAM: 256 MB; Flash: 16 MB |
| Board | Digilent Genesys |
| FPGA Chip | Xilinx Virtex 5, XC5VLX50T |

Table 4. Statistics on the detection rate of data attacks.

| Benchmarks | Attacks on Code (DI Check) | Attacks on Code (CFI Check) | Attacks on Memory (DI Check) | Attacks on Memory (CFI Check) | Attacks on Dyn. Data (DI Check) | Attacks on Dyn. Data (CFI Check) |
|---|---|---|---|---|---|---|
| OpenECC | 100% | 72.3% | 98.7% | 59.2% | 100% | 83.4% |
| SHA1 | 100% | 84.2% | 99.1% | 69.4% | 100% | 92.8% |
| FFT | 100% | 77.5% | 99.6% | 65.7% | 100% | 85.4% |
| CRC16 | 100% | 80.4% | 99.5% | 67.4% | 100% | 87.1% |
| BasicMath | 100% | 76.3% | 98.1% | 63.1% | 100% | 88.3% |
| AES | 100% | 81.7% | 98.9% | 68.9% | 100% | 86.7% |
| Bitcount | 100% | 85.1% | 98.6% | 70.6% | 100% | 91.2% |
| Blowfish | 100% | 75.8% | 98.2% | 60.3% | 100% | 81.3% |
| Patricia | 100% | 74.5% | 98.8% | 64.5% | 100% | 83.7% |
| QuickSort | 100% | 81.6% | 99.3% | 68.8% | 100% | 89.5% |
| Average | 100% | 78.9% | 98.9% | 65.8% | 100% | 86.9% |

Table 5. Statistics on the detection rate of control flow attacks.

| Benchmarks | Direct Jump | Branch Jump | Return Address | Indirect Jump | Indirect Call |
|---|---|---|---|---|---|
| OpenECC | 100% | 100% | 91.2% | 95.3% | 95.4% |
| SHA1 | 100% | 100% | 92.3% | 96.5% | 96.1% |
| FFT | 100% | 100% | 90.4% | 94.8% | 94.3% |
| CRC16 | 100% | 100% | 93.6% | 95.6% | 95.2% |
| BasicMath | 100% | 100% | 94.6% | 95.2% | 96.3% |
| AES | 100% | 100% | 94.5% | 95.6% | 94.8% |
| Bitcount | 100% | 100% | 93.7% | 94.3% | 95.6% |
| Blowfish | 100% | 100% | 92.8% | 95.2% | 97.9% |
| Patricia | 100% | 100% | 96.2% | 97.1% | 96.8% |
| QuickSort | 100% | 100% | 95.1% | 96.8% | 96.5% |
| Average | 100% | 100% | 93.4% | 95.6% | 95.9% |

Table 6. The size of the reference information and the bitmap for the selected benchmarks.

| Benchmarks | Size of CFIlabel (bits) | Size of HIREF (KB) | FEAddress Bitmap (KB) | BBAddress Bitmap (KB) | Total (KB) |
|---|---|---|---|---|---|
| OpenECC | 16 | 52.61 | 3.08 | 3.08 | 58.77 |
| SHA1 | 13 | 26.56 | 1.34 | 1.34 | 29.24 |
| FFT | 12 | 16.74 | 1.42 | 1.42 | 19.58 |
| CRC16 | 12 | 25.24 | 1.23 | 1.23 | 27.7 |
| BasicMath | 14 | 33.80 | 2.33 | 2.33 | 38.46 |
| AES | 15 | 27.62 | 1.44 | 1.44 | 30.5 |
| Bitcount | 12 | 26.12 | 1.30 | 1.30 | 28.72 |
| Blowfish | 12 | 25.36 | 1.30 | 1.30 | 27.96 |
| Patricia | 12 | 30.10 | 1.52 | 1.52 | 33.14 |
| QuickSort | 11 | 7.96 | 1.23 | 1.23 | 10.42 |
| Average | - | 27.21 | 1.62 | 1.62 | 30.45 |

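The abstract describes mapping legal target addresses into a bitmap so that an indirect jump can be checked without a search. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions rather than the paper's hardware layout: the class name `AddressBitmap`, the base address, and the granularity of one bit per 4-byte word are all hypothetical choices.

```python
class AddressBitmap:
    """One bit per word-aligned address in [base, base + size_bytes).

    Assumption: 4-byte granularity; the real design may use a different mapping.
    """

    def __init__(self, base: int, size_bytes: int, word_bytes: int = 4):
        self.base = base
        self.word_bytes = word_bytes
        self.n_bits = size_bytes // word_bytes
        self.bits = bytearray((self.n_bits + 7) // 8)

    def _index(self, addr: int):
        """Map an address to its bit index, or None if it cannot be a legal target."""
        offset = addr - self.base
        if offset < 0 or offset % self.word_bytes:
            return None  # below the region or not word-aligned
        idx = offset // self.word_bytes
        return idx if idx < self.n_bits else None

    def mark_legal(self, addr: int) -> None:
        """Offline step: record addr as a legal jump target."""
        idx = self._index(addr)
        if idx is None:
            raise ValueError(f"address {addr:#x} is outside the mapped region")
        self.bits[idx >> 3] |= 1 << (idx & 7)

    def is_legal(self, addr: int) -> bool:
        """Run-time step: constant-time check of a jump target."""
        idx = self._index(addr)
        return idx is not None and bool(self.bits[idx >> 3] & (1 << (idx & 7)))


if __name__ == "__main__":
    bitmap = AddressBitmap(base=0x1000, size_bytes=64)
    for target in (0x1010, 0x102C):      # hypothetical legal targets
        bitmap.mark_legal(target)
    print(bitmap.is_legal(0x1010))       # a marked target
    print(bitmap.is_legal(0x1014))       # an unmarked, aligned address
```

The lookup is a shift, a mask, and one memory read, which is the property that makes a bitmap attractive compared with searching a list of legal targets. The cost is storage proportional to the size of the address region rather than to the number of targets.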
Table 7. The FPGA resources used for the security-enhanced embedded system.

| Slice Logic Utilization | SoC | CPU Core | HMM |
|---|---|---|---|
| Slice Registers | 6584 | 2604 | 612 |
| Slice LUTs | 14,259 | 7208 | 1374 |
| Occupied Slices | 5213 | 3312 | 486 |
| BlockRAM/FIFO | 52 | 13 | 48 |
| BUFG/BUFGCTRLs | 7 | 1 | 2 |

Table 8. The performance overhead of the processor configured with HMM.

| Benchmarks | CPI without HMM | CPI with HMM | Performance Overhead |
|---|---|---|---|
| OpenECC | 2.94 | 3.07 | 9.52% |
| SHA1 | 2.12 | 2.23 | 5.19% |
| FFT | 2.32 | 2.41 | 6.03% |
| CRC16 | 1.68 | 1.78 | 3.57% |
| BasicMath | 2.41 | 2.53 | 8.71% |
| AES | 3.46 | 3.59 | 6.36% |
| Bitcount | 1.52 | 1.64 | 5.92% |
| Blowfish | 3.54 | 3.87 | 4.80% |
| Patricia | 1.52 | 1.65 | 7.89% |
| QuickSort | 1.86 | 1.98 | 3.76% |
| Average | 2.34 | 2.48 | 6.18% |

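The Average row of Table 8 is the column-wise arithmetic mean of the ten benchmark rows, which is also where the 6.18% figure quoted in Table 9 comes from. The short Python sketch below recomputes those means from the table values only; the variable names are illustrative.

```python
from statistics import mean

# (CPI without HMM, CPI with HMM, reported overhead in %) copied from Table 8.
table8 = {
    "OpenECC":   (2.94, 3.07, 9.52),
    "SHA1":      (2.12, 2.23, 5.19),
    "FFT":       (2.32, 2.41, 6.03),
    "CRC16":     (1.68, 1.78, 3.57),
    "BasicMath": (2.41, 2.53, 8.71),
    "AES":       (3.46, 3.59, 6.36),
    "Bitcount":  (1.52, 1.64, 5.92),
    "Blowfish":  (3.54, 3.87, 4.80),
    "Patricia":  (1.52, 1.65, 7.89),
    "QuickSort": (1.86, 1.98, 3.76),
}

# Average each column independently, as the Average row does.
avg_cpi_without = mean(row[0] for row in table8.values())
avg_cpi_with = mean(row[1] for row in table8.values())
avg_overhead = mean(row[2] for row in table8.values())

if __name__ == "__main__":
    print(f"CPI without HMM: {avg_cpi_without:.3f}")
    print(f"CPI with HMM:    {avg_cpi_with:.3f}")
    print(f"Overhead (%):    {avg_overhead:.3f}")
```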
Table 9. The comparison of security and practicability.

| Integrity Protection Methods | Protect the DI | Protect the CFI | Resource Overhead | Performance Overhead |
|---|---|---|---|---|
| Parallel Cryptographic Accelerator [1] | Yes | No | The occupied slices for the proposed module are around 69.2% of the total SoC. | 2.65% |
| Instruction Stream Integrity Checker [10] | Yes | No | The on-chip storage requirement is less than 26.30 KB. | <3.45% |
| Hardware-Enhanced Protection [11] | Yes | No | The occupied slices for the proposed module are around 44.9% of the total SoC. | <2.25% |
| MicroGuard [5] | No | Yes | Compared to the baseline, the increase in the code size is more than 181%. | >19% |
| FPGA-based CFI Solution [23] | No | Yes | The increase in the number of instructions is less than 3.72%. | From 0.01% to 193% |
| HCIC [31] | No | Yes | The increase in the binary size is 0.78%. | 0.95% |
| Security Monitoring Unit (SMU) [12] | Yes | Yes | Compared to the baseline, the area increased by 20.99%. | 9.33% |
| Dynamic Sequence Checker (DSC) [32] | Yes | Yes (except for the indirect jumps) | The storage overhead incurred due to appending the Hamming code is 4.1%. | 4.7% |
| Our proposed HIIMA | Yes | Yes | The on-chip storage requirement is less than 58.77 KB, and the occupied slices for the HMM are 9.3%. | 6.18% |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Hao, Q.; Zhang, Z.; Xu, D.; Wang, J.; Liu, J.; Zhang, J.; Ma, J.; Wang, X. A Hardware Security-Monitoring Architecture Based on Data Integrity and Control Flow Integrity for Embedded Systems. Appl. Sci. 2022, 12, 7750. https://doi.org/10.3390/app12157750
