In this section, we evaluate the performance of CacheHawkeye. The experiments in this phase were run on an MSI laptop with a quad-core, hyper-threaded Intel(R) Core(TM) i7-6700HQ processor clocked at 2.60 GHz, with a 6 MB last-level cache. The operating system is Ubuntu 18.04 LTS, and the perf tool version is 5.4.114.
4.1. Detect Flush+Reload and Flush+Flush Attacks
In this subsection, we evaluate CacheHawkeye on Flush+Reload and Flush+Flush attacks. The Flush+Reload attack program monitored the target addresses 10,000 times, and the Flush+Flush attack program monitored them 5600 times. As a control experiment, we also evaluated CacheHawkeye on legit AES and RSA encryption and decryption programs, because these programs also use shared libraries and may access the sensitive functions (memory addresses) stored in the cache attack library. The behavior of CacheHawkeye under various system loads is analyzed in Section 4.3; in this subsection, we disregard the interference of system load.
The results of CacheHawkeye detecting a Flush+Reload attack on RSA are listed in Table 2. For brevity, we list only the user-mode data. The mem-loads results are listed in columns 1 and 2, and the mem-stores results in columns 3 and 4. The data symbol columns of both mem-loads and mem-stores contain monitored functions (such as mpihelp_mul_karatsuba_case and mpihelp_divrem). Some data symbols appear more than once in the same column because the other fields of those rows differ. These functions are accessed frequently and account for 10% to 12.99% of the overhead. Because these function names are stored in the cache attack library, CacheHawkeye determines that this program is malicious.
Table 3 shows the results of CacheHawkeye detecting a Flush+Flush attack on AES. We list only the rows of interest. For mem-loads, the data symbols in the first column do not include any monitored address from the cache attack library. For mem-stores, the data symbols in the third column are contained in the cache attack library. As a result, CacheHawkeye classifies this program as malicious.
These results can be explained as follows. In Flush+Reload, Flush is a memory store procedure in which the cache line is evicted to memory, and Reload is a memory load procedure in which the memory block is brought into the cache. As a result, both mem-stores and mem-loads contain the names of the sensitive functions. Unlike Flush+Reload, Flush+Flush consists only of the Flush step, so the sensitive memory addresses are found only in mem-stores.
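The load/store reasoning above can be sketched as a small classifier. This is a minimal illustration, not CacheHawkeye's actual implementation: the function name `classify`, the set `ATTACK_LIBRARY`, the returned labels, and the input format (sets of data symbols already extracted from the mem-loads and mem-stores reports) are all assumptions made for this example. The two library entries are the function names mentioned for Table 2.

```python
# Hypothetical excerpt of the cache attack library (sensitive function names).
ATTACK_LIBRARY = frozenset({"mpihelp_mul_karatsuba_case", "mpihelp_divrem"})


def classify(load_symbols, store_symbols, library=ATTACK_LIBRARY):
    """Label a program from the data symbols of its sampled memory events.

    load_symbols / store_symbols: iterables of data symbol names taken from
    the mem-loads and mem-stores reports (assumed to be pre-parsed).
    """
    in_loads = bool(library & set(load_symbols))
    in_stores = bool(library & set(store_symbols))

    if in_loads and in_stores:
        # Flush shows up as stores, Reload as loads.
        return "Flush+Reload-like"
    if in_stores:
        # Only the Flush step touches the sensitive addresses.
        return "Flush+Flush-like"
    if in_loads:
        # Not discussed in the text; flagged here because a sensitive
        # data symbol is still present (an assumption of this sketch).
        return "malicious (loads only)"
    return "benign"


if __name__ == "__main__":
    print(classify({"mpihelp_divrem"}, {"mpihelp_divrem", "other_symbol"}))
    print(classify({"other_symbol"}, {"mpihelp_mul_karatsuba_case"}))
    print(classify({"other_symbol"}, {"another_symbol"}))
```

Set intersection keeps the check independent of how often a symbol appears, which matches the observation that the same data symbol can occur in several rows of a report.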
We ran tests to evaluate the performance of CacheHawkeye on legit cryptographic programs that use shared libraries and monitored functions or memory addresses. We tested four programs: AES encryption, AES decryption, RSA encryption, and RSA decryption.
Table 4 lists all results for a legit AES encryption program. The data symbol column does not contain any sensitive functions or memory addresses, so CacheHawkeye classifies this program as legit. The results for the legit AES decryption program are similar to Table 4 and likewise contain no sensitive functions or memory addresses; they are omitted for brevity.
Note that we split the detection results for the legit RSA decryption program into two tables: Table 5 shows the memory load results, and Table 6 shows the memory store results. There are no sensitive function names in Table 5. However, this does not mean that the legit RSA decryption program never accesses these addresses; rather, the accesses are not frequent enough to be caught by CacheHawkeye. Some sensitive function addresses (such as mpih_sqr_n_basecase and mpihelp_divrem) do appear in the symbol column of Table 5, which indicates that the legit RSA program has accessed these target addresses (functions), but no sensitive function addresses appear in the data symbol column. This indicates that the symbol column reflects the legit program’s memory accesses, whereas the data symbol column reflects the malicious memory accesses of the cache side channel attack. The perf tool places the sensitive function names accessed by the attack program in the data symbol column, and the sensitive function names that the legit program also accesses in the symbol column. We conjecture that this is because the memory accesses of the attack program (through the clflush and movl instructions) differ somewhat from those of the legit program. CacheHawkeye considers only the data symbol column, so it does not misjudge the legit cryptographic program.
For the legit RSA encryption program, the data symbol column likewise contains no sensitive functions or memory addresses; these results are omitted for brevity. Taken together, the results in this subsection show that CacheHawkeye can distinguish benign cryptographic programs from side channel attack programs.
4.2. Sampling Frequency Configuration
In this subsection, we evaluate CacheHawkeye at different sampling frequencies and determine an appropriate one. We tested CacheHawkeye on 4 representative attack programs and 4 legit cryptographic programs that may access sensitive addresses. When configuring the frequency, we chose programs that are extremely difficult to detect, so that the configured frequency is more widely applicable. For CacheHawkeye, the shorter the execution time of an attack program, the fewer sensitive memory accesses it makes, and the harder it is to detect. We therefore chose 4 programs with very short execution times to configure the sampling frequency. As shown in Table 7, the execution time of these programs is only 7–12 ms. Real-world attacks must run much longer than this because the attacker cannot synchronize with the victim. Therefore, a frequency configured for these attack programs more than meets the requirements of detecting real-world attacks.
We monitored the 4 representative attack programs and 4 legit cryptographic programs at sampling frequencies of 2999, 5999, 8999, 11,999, and 14,999. Each attack program was executed 1000 times at each frequency, yielding 4000 attack samples per frequency. CacheHawkeye was also tested on the legit AES encryption/decryption and RSA encryption/decryption programs at each frequency, with each legit program executed 1000 times per frequency. We use accuracy to evaluate CacheHawkeye at each frequency and to determine the appropriate sampling frequency. Accuracy is the percentage of all samples that are judged correctly. The formula for accuracy is as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

In Equation (1), True Positive (TP) denotes a malicious program that is correctly recognized, True Negative (TN) denotes a benign program that is correctly recognized, False Positive (FP) denotes a benign program that is recognized as malicious, and False Negative (FN) denotes a malicious program that is recognized as benign.
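The accuracy definition above (correctly judged samples over all samples) can be written as a short helper. This is only a sketch: the function name `accuracy` and the counts in the usage line are hypothetical and are not taken from the experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of samples judged correctly: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return (tp + tn) / total


if __name__ == "__main__":
    # Hypothetical counts for illustration only: 4000 attack runs and
    # 4000 benign runs, with a single benign run misjudged as malicious.
    print(f"{accuracy(tp=4000, tn=3999, fp=1, fn=0):.4%}")
```

Multiplying the returned fraction by 100 gives the percentage form used when reporting results.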
Figure 4 shows the accuracy of CacheHawkeye on malicious and legit programs at different sampling frequencies. The accuracy is only 82.7% at a sampling frequency of 2999 and improves as the sampling frequency rises, reaching 100% at a sampling frequency of 14,999.
We hypothesized that a higher sampling frequency would lengthen the sampling time, so we measured the sampling time at each frequency. We define sampling time as the time taken to collect memory events and store them in a file. Figure 5 shows the average sampling time of the four malicious programs at each frequency. The sampling time does not change significantly as the frequency increases, so only accuracy needs to be considered when configuring the frequency. We therefore set CacheHawkeye’s sampling frequency to 14,999.
4.3. Performance under Different System Loads
In this subsection, we evaluate CacheHawkeye under different system loads. We used unixbench and sysbench to generate system load. unixbench was run with its default configuration; the configuration settings of sysbench are listed in Table 8. During each execution of sysbench, we randomly picked one of the five routines. The system loads are divided into three categories: no-load, average-load, and full-load. No-load means that there is no system load while CacheHawkeye is running. Average-load consists of two workloads: one runs sysbench and the other runs unixbench. Full-load consists of four workloads: two run sysbench and the other two run unixbench.
We tested CacheHawkeye on 4 representative attack programs and 4 legit cryptographic programs that may access sensitive addresses under each system load. Each program was executed 1000 times, so 4000 benign samples and 4000 malicious samples were generated under each system load. The experimental results are listed in Table 9. CacheHawkeye is 100% accurate under no-load and full-load, and 99.99% accurate under average-load. Because CacheHawkeye has not been pre-trained under different system loads, it can be expected to perform equally well under unknown system loads. We therefore infer that the performance of CacheHawkeye is essentially unaffected by system load. Because memory capacity is substantially larger than the capacity of the microarchitecture components (such as the branch instruction buffer and cache), memory events are affected very little by system load.
Table 10 summarizes some limitations of the existing approaches. The methods of CacheRadar and Alam et al. cannot detect Flush+Flush attacks. Moreover, these two methods do not take system loads into account. We believe that they are highly sensitive to system loads because hardware events such as cache hits and misses are highly susceptible to interference from system load. NIGHTs-WATCH performs well under known system loads and can detect Flush+Flush attacks. However, system loads still cause an accuracy loss of 4.97% [13], and its pre-trained model may perform poorly under unknown system loads. All of the approaches listed above use microarchitecture events as feature vectors for detection, whereas our approach detects cache side channel attacks using memory events. Compared with these methods, our method adapts very well to system loads and achieves close to 100% accuracy.