I. INTRODUCTION
II. BACKGROUND
A. Basic blocks
A basic block is a sequence of consecutive statements with
exactly one entry point and one exit point [6]. Having one
entry point means that no instruction inside the block, other
than the first, is the destination of a jump instruction. Having
one exit point means that only the last instruction can transfer
control to another basic block. Whenever a basic block is
executed, each of its instructions is therefore executed exactly
once, in order. This restricted form makes a basic block
highly amenable to analysis [2].
Compilers usually split a program into its basic blocks as
the first step of the analysis process. The analyzer scans over
the code and marks block boundaries, which are instructions
that either transfer control or accept control from another
point. Cutting the code at each of these points yields the basic
blocks. Basic blocks are the vertices (nodes) of a control flow
graph.
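The boundary-marking step described above can be sketched in Python. This is a minimal illustration, not the tool used in this work: the toy instruction format (an opcode plus an optional target index) and the names find_leaders and split_basic_blocks are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical toy instruction: (opcode, index of the jump target or None).
Instr = Tuple[str, Optional[int]]

CONTROL_TRANSFER = ("jmp", "br", "ret")  # assumed opcodes that end a block


def find_leaders(code: List[Instr]) -> List[int]:
    """Return the indices of instructions that start a basic block."""
    leaders = {0} if code else set()
    for i, (op, target) in enumerate(code):
        if op in ("jmp", "br") and target is not None:
            leaders.add(target)      # an instruction that accepts control
        if op in CONTROL_TRANSFER and i + 1 < len(code):
            leaders.add(i + 1)       # the instruction after a control transfer
    return sorted(leaders)


def split_basic_blocks(code: List[Instr]) -> List[Tuple[int, int]]:
    """Cut the code at every leader; return (first, last) index pairs."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code)]
    return [(bounds[k], bounds[k + 1] - 1) for k in range(len(leaders))]


PROGRAM: List[Instr] = [
    ("mov", None),  # 0
    ("cmp", None),  # 1
    ("br", 5),      # 2: conditional branch to index 5
    ("add", None),  # 3
    ("jmp", 6),     # 4: unconditional jump to index 6
    ("sub", None),  # 5
    ("ret", None),  # 6
]

if __name__ == "__main__":
    print(split_basic_blocks(PROGRAM))
```

Each returned pair is one vertex of the control flow graph; the edges would follow from the jump targets and fall-through paths.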
B. TOCTOU attacks
TOCTOU (time of check to time of use) is a type of attack in
which the attacker exploits a race condition between the time a
resource is checked and the time it is used [7]. The software
checks the state of a resource before using it, but the
resource's state can change between the check and the use in a
way that makes the result of the check invalid. The software
may then perform invalid actions while the resource is in an
unexpected state. This can happen with shared resources such
as files, memory, or even variables in multithreaded programs.
An attacker can exploit such a race condition to modify
application data, files, directories, and memory.
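The check/use window can be made concrete with a small deterministic Python simulation. It is only a sketch: the attacker is modelled as a callback that runs between the check and the use, and the function names and file contents are hypothetical.

```python
import os
import tempfile
from typing import Callable, Optional, Tuple

Hook = Optional[Callable[[], None]]


def check_then_use(path: str, between: Hook = None) -> str:
    """Vulnerable pattern: check the file, then read it again by name."""
    with open(path) as f:              # time of check
        if f.read() != "trusted":
            raise ValueError("check failed")
    if between:
        between()                      # window in which the state can change
    with open(path) as f:              # time of use
        return f.read()


def read_once_then_check(path: str, between: Hook = None) -> str:
    """Safer pattern: use exactly the data that was checked."""
    with open(path) as f:
        data = f.read()
    if between:
        between()                      # a later change no longer matters
    if data != "trusted":
        raise ValueError("check failed")
    return data


def demo() -> Tuple[str, str]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "resource.txt")

        def write(text: str) -> None:
            with open(path, "w") as f:
                f.write(text)

        def attacker() -> None:
            write("malicious")

        write("trusted")
        vulnerable = check_then_use(path, attacker)
        write("trusted")
        safe = read_once_then_check(path, attacker)
        return vulnerable, safe


if __name__ == "__main__":
    print(demo())
```

In the vulnerable variant the check passes on the original contents but the use sees the modified ones, which is the situation a runtime integrity check is meant to catch.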
Programs are usually checked for integrity at system boot
time. At runtime, however, that check no longer guarantees the
integrity of a program, because a TOCTOU attack may have
modified it in the meantime. Runtime integrity checking
should therefore be implemented.
C. Code Reuse Attacks (CRAs)
CRAs are attacks in which an attacker uses code already
present in a compromised program to carry out arbitrary
malicious computation by diverting control flow through that
existing code. In other words, the attacker can launch an
attack without explicitly injecting code into the program.
Return-oriented programming (ROP) [8] [9] and jump-oriented
programming (JOP) [10] are two popular CRA techniques.
JOP is another CRA technique, but it does not need the call
stack. Instead of chaining gadgets through return addresses, a
JOP attack uses indirect branches to chain the gadgets.
D. Hash Function
A hash function is a one-way mathematical function of a
message: its output cannot easily be converted back into the
original message, even with knowledge of the hash algorithm.
Such a one-way function maps a message of arbitrary length
to a fixed-length string. It should also be infeasible to find
two different messages that are mapped to the same hash
value.
In our defense framework, we use AES-GCM to compute the
hash value of the instruction sequence translated from each
basic block of a C program. The hash should be secure, that
is, difficult for an attacker to break.
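The fixed-length and sensitivity properties can be illustrated with a short Python sketch. Note the substitution: SHA-256 from the standard library, truncated to 128 bits, stands in for the AES-GCM tag used in this work, and the instruction words are arbitrary 32-bit values rather than real encodings.

```python
import hashlib
import struct
from typing import List


def block_digest(instructions: List[int]) -> bytes:
    """Hash the 32-bit instruction words of one basic block.

    Assumption: SHA-256 truncated to 16 bytes is a stand-in for the
    128-bit AES-GCM authentication tag described in the text.
    """
    data = b"".join(struct.pack(">I", word) for word in instructions)
    return hashlib.sha256(data).digest()[:16]


ORIGINAL = [0x11111111, 0x22222222, 0x33333333]   # hypothetical block
TAMPERED = [0x11111111, 0x22222223, 0x33333333]   # one bit changed

if __name__ == "__main__":
    print(block_digest(ORIGINAL).hex())
    print(block_digest(TAMPERED).hex())
```

Blocks of any length produce a 16-byte value, and changing a single instruction bit yields a different value, which is what makes the comparison against a stored reference hash meaningful.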
E. GCM algorithm
Galois/Counter Mode (GCM) is a block cipher mode of
operation that provides authentication. It defines authenticated
encryption and authenticated decryption; our implementation
only needs authenticated encryption. As the underlying block
cipher we have chosen the 128-bit Advanced Encryption
Standard (AES).
Figure X shows the GCM authenticated encryption
operation.
The GCM authenticated encryption operation has four inputs:
A secret key K. We assume that it is 128 bits long,
consistent with the underlying AES block cipher.
An initialization vector (IV) of up to 2^64 bits. A 96-bit IV is
recommended for efficiency.
A plaintext P of up to 2^39 - 256 bits.
Additional authenticated data A of up to 2^64 bits. This
additional data is authenticated but not encrypted.
and two outputs:
A ciphertext C whose length is identical to that of the
plaintext P.
An authentication tag T of up to 128 bits.
if len(IV)=96
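The authentication part of GCM rests on GHASH, a polynomial hash over GF(2^128). The Python sketch below follows the multiplication and GHASH definitions in NIST SP 800-38D and is illustrative only. The function names are ours, and the AES steps that derive the hash subkey H and mask the final tag are omitted because the Python standard library has no AES, so h is simply taken as an input. For the 96-bit IV case, the specification forms the pre-counter block by appending a 32-bit counter value of 1 to the IV.

```python
# Reduction constant for the GCM field polynomial x^128 + x^7 + x^2 + x + 1,
# in GCM's bit order (leftmost bit is the coefficient of x^0).
R = 0xE1 << 120


def gf_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in GF(2^128) as defined for GCM."""
    z, v = 0, y
    for i in range(127, -1, -1):       # bits of x, leftmost first
        if (x >> i) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def _blocks(data: bytes):
    """Yield 128-bit blocks as integers, zero-padding the last one."""
    for i in range(0, len(data), 16):
        yield int.from_bytes(data[i:i + 16].ljust(16, b"\x00"), "big")


def ghash(h: int, aad: bytes, ciphertext: bytes) -> int:
    """GHASH over the padded AAD, the padded ciphertext, and their bit lengths."""
    y = 0
    for block in list(_blocks(aad)) + list(_blocks(ciphertext)):
        y = gf_mul(y ^ block, h)
    lengths = ((len(aad) * 8) << 64) | (len(ciphertext) * 8)
    return gf_mul(y ^ lengths, h)


def j0_from_96bit_iv(iv: bytes) -> bytes:
    """Pre-counter block for a 96-bit IV: IV followed by a 32-bit value 1."""
    if len(iv) != 12:
        raise ValueError("this sketch only covers the 96-bit IV case")
    return iv + b"\x00\x00\x00\x01"


ONE = 1 << 127   # the field element 1 in GCM's bit order
```

In full GCM the tag is the GHASH result XORed with the AES encryption of the pre-counter block, truncated to the tag length; a hardware module such as the one described here would pipeline the multiplication instead of looping bit by bit.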
C. DTPM architecture
Figure 5 shows the architecture of our implementation.
DTPM is implemented outside the processor pipeline to keep
the system simple and generic, so that it can be ported to any
processor architecture. For this demonstration, we ported it
to the OpenRISC architecture.
In the OpenRISC architecture, the CPU fetches instructions
from the IMMU through the Wishbone interface.
DTPM samples its inputs on the Wishbone interface, where
they are driven by the Fetch module, and generates a stall
signal based on the basic block hash value. The sampled input
signals are ibus_addr, ibus_addr_req, ibus_data, and ibus_ack.
The output signal is OR-ed with the external stall signal and
driven to the execute stage.
D. DTPM State Machine
Upon PC reset, DTPM stalls the PC and initializes the
AES-GCM module. DTPM then monitors the address on the
Fetch block and looks up the start and end addresses of the
basic blocks in the cache memory. Once DTPM encounters a
basic block start address, it samples the instructions and
loads them into the AES-GCM module. If two start addresses
are encountered consecutively, DTPM resets the AES-GCM
module, treating this as a wrong branch prediction. After
DTPM encounters the end address, it loads the last word into
the AES-GCM module, stalls the PC, and waits for the hash
value from the AES-GCM module. Once the hash value is
available, DTPM compares it to the basic block hash value
stored in the cache memory. If the comparison passes, DTPM
resumes the PC and proceeds to the next basic block;
otherwise DTPM keeps the PC stalled.
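The state machine above can be approximated by a small behavioural model in Python. It is a sketch under stated assumptions, not the HDL implementation: truncated SHA-256 stands in for the AES-GCM tag, the temporary stall while the hash is being computed is not modelled, and the class, method, and table layout are hypothetical.

```python
import hashlib
import struct
from typing import Dict, List, Optional, Tuple


def block_hash(words: List[int]) -> bytes:
    """Stand-in for the 128-bit AES-GCM tag (assumption: truncated SHA-256)."""
    data = b"".join(struct.pack(">I", w) for w in words)
    return hashlib.sha256(data).digest()[:16]


class DtpmModel:
    """Behavioural model: watch fetches, hash each block, stall on mismatch."""

    def __init__(self, table: Dict[int, Tuple[int, bytes]]) -> None:
        # Models the cache memory: start address -> (end address, expected hash).
        self.table = table
        self.words: Optional[List[int]] = None   # None while no block is open
        self.end_addr = 0
        self.expected = b""
        self.stalled = False

    def fetch(self, addr: int, word: int) -> bool:
        """Process one fetched instruction; return True if the PC is stalled."""
        if self.stalled:
            return True
        if addr in self.table:
            # Start address: open a block. If one was already open it is
            # discarded, as for a wrong branch prediction.
            self.end_addr, self.expected = self.table[addr]
            self.words = []
        if self.words is None:
            return False                 # outside any known basic block
        self.words.append(word)
        if addr == self.end_addr:
            # End address: compare the computed hash with the stored one.
            self.stalled = block_hash(self.words) != self.expected
            self.words = None
        return self.stalled


TABLE = {0x100: (0x108, block_hash([0x11, 0x22, 0x33]))}
GOOD_TRACE = [(0x100, 0x11), (0x104, 0x22), (0x108, 0x33)]
BAD_TRACE = [(0x100, 0x11), (0x104, 0x99), (0x108, 0x33), (0x10C, 0x44)]


def run(trace: List[Tuple[int, int]]) -> List[bool]:
    dtpm = DtpmModel(TABLE)
    return [dtpm.fetch(addr, word) for addr, word in trace]
```

Running the unmodified trace never stalls, while the trace with a tampered instruction stalls at the end of the block and stays stalled for every later fetch.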
E. Optimization of GCM module
We use Galois/Counter Mode (GCM) authenticated
encryption to calculate the hash value of each basic block; the
fixed-length 128-bit authentication tag is used as the hash
value. The
IV. SRAS
A. High-level idea
As mentioned before, SRAS is our solution to defend
against ROP attacks. In a ROP attack, the attacker overwrites
the return address of a function on the stack with a different
address. Upon function return, execution is not redirected to
the original calling function but to another instruction
sequence. SRAS detects a ROP attack by using a shadow stack
to store a copy of the return address whenever a function is
called, and then checking each return instruction issued to the
processor against that copy [21]. Figure 7 illustrates how
SRAS works.
SRAS can also defend against any attack that is based on
corrupting a return address, including conventional stack
smashing and all buffer overflow attacks that overwrite
return addresses.
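The shadow stack check can be sketched in a few lines of Python. This is a software model of the idea only; the class and function names and the addresses are hypothetical, and the real SRAS operates on call and return instructions in hardware.

```python
from typing import List, Optional


class ReturnAddressMismatch(Exception):
    """Raised when a return address differs from its shadow copy."""


class ShadowStack:
    def __init__(self) -> None:
        self._stack: List[int] = []

    def on_call(self, return_address: int) -> None:
        # Keep a protected copy of the return address at call time.
        self._stack.append(return_address)

    def on_return(self, return_address: int) -> int:
        # Compare the address about to be used with the protected copy.
        if not self._stack or self._stack.pop() != return_address:
            raise ReturnAddressMismatch(hex(return_address))
        return return_address


def simulate(overwrite_with: Optional[int] = None) -> Optional[int]:
    """Call a function, optionally corrupt its return address, then return."""
    shadow = ShadowStack()
    program_stack: List[int] = []

    program_stack.append(0x2004)         # call: push return address
    shadow.on_call(0x2004)

    if overwrite_with is not None:
        program_stack[-1] = overwrite_with   # e.g. a buffer overflow

    try:
        return shadow.on_return(program_stack.pop())
    except ReturnAddressMismatch:
        return None                      # attack detected, return blocked


if __name__ == "__main__":
    print(simulate(), simulate(overwrite_with=0xDEAD))
```

An unmodified return address passes the check, whereas an overwritten one is detected before control is transferred, regardless of how the overwrite happened.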
V. RESULTS
A. Valgrind issue
As mentioned before, the idea of checking the hash values
of basic blocks comes from the paper by Arun Kanuparthi et
al. [13], and the authors do not mention how to obtain basic
block information from an executable file. When we tried to
implement the DTPM, we used the Valgrind tool [14] to extract
the basic blocks from a C program running on Linux on the
amd64 architecture. We made the mistake of assuming that
Valgrind supports all processor architectures, including
OpenRISC. Unaware of this mistake, we set out to implement
the AES-GCM hash function and the DTPM module. We only
noticed the problem when we simulated the DTPM module. If
Valgrind supported OpenRISC, or if we had had time to write
such a tool for the OpenRISC architecture, we would have
been able to show how DTPM defends against TOCTOU attacks.
B. Simulation setup
Simulation is done with the help of FuseSoC [16], a
program that manages HDL code and simulates and builds
system-on-chip solutions. FuseSoC stores a library of modules
(called cores in FuseSoC) as well as instructions for using
them. Most cores and instructions can be downloaded from
GitHub repositories, and they are already used and tested by
developers in the OpenRISC community. Hence, when one
needs to simulate a core or implement it on an FPGA, FuseSoC
automatically downloads the HDL code and uses the given
instructions to let the FPGA design software (in our case,
Altera Quartus) finish the job. In our solution, we modify the
mor1kx CPU to add the defensive mechanisms. To do that, we
need to download the mor1kx source code and then create a
new core in FuseSoC.
Figure 7: SRAS high-level approach.
SRAS
N/A
N/A
N/A
D. Performance Overhead
Because we want to measure the performance of the
OpenRISC system with and without the defensive mechanisms,
we need to run a benchmark program on the OpenRISC system
implemented on the DE0-Nano board. We decided to use
CoreMark [18] as the benchmark to report the performance
overhead of our defensive modules.
There are two reasons why CoreMark is the appropriate
benchmark. First, it is the only common benchmark used by
the OpenRISC community [19] [20]. This may be because the
OpenRISC architecture is not supported by any architectural
simulator. Also, benchmarking is not crucial for OpenRISC,
an open-source architecture aimed at academic and
non-commercial use. Second, our defensive modules are added
mostly to the CPU of the system, and CoreMark focuses
primarily on the CPU's performance. As a result, CoreMark
can reflect the performance changes caused by modifications
to the CPU.
The CoreMark benchmark consists of C programs that
perform read/write operations, integer operations, and control
operations. These programs run commonly used algorithms,
including matrix manipulation, linked list manipulation, state
machine operation, and Cyclic Redundancy Check. In general,
CoreMark is used to test a processor's pipeline operation,
memory or cache access, and handling of integer operations.
After a run, CoreMark reports a single-number score for easy
comparison between processors.
                 Original Design   With Defense    With Defense
                                   (deactivated)   (activated)
CoreMark Score   86.6961           86.6961         1.734
N/A
N/A
Condition           Value
Development Board   Terasic DE0-Nano
FPGA                Altera Cyclone IV
Processor Clock     50 MHz
Instruction Cache   32 KB
Data Cache          32 KB
MMU                 Yes
Hardware Multiply   Yes
Hardware Divide     Yes
Floating Point      Single Precision
VI. CONCLUSION
REFERENCES