Sie sind auf Seite 1von 45

Deobfuscation

Reverse Engineering Obfuscated Code


by Sharath K. Udupa, Saumya K. Debray and Matias Madou
Presented by Fabrizio Steiner
1
Overview
Introduction
Obfuscation Transformations
Basic Control Flow Flattening
Enhancements
Interprocedural Data Flow
Artificial Blocks and Pointers
Deobfuscation
Cloning
Static Path Feasibility Analysis
Combining Static and Dynamic Analysis
Experiments
Conclusion
2
Introduction: What is obfuscated Code?
Obfuscating is used to prevent reverse engineering.
Code is very hard do read and to understand.
Important for newer environments such as .NET or JAVA
Obfuscator takes a code block creates unreadable code.
Deobfuscator tries to reverse obfuscation. (statistically, dynamic
analysis)
3
Why obfuscate .NET code?
4
Introduction
Goals of obfuscation
improve software security
hard to reverse engineer code
protects the owners intellectual property
Could also be used to hide malicious software (prevent the
detection).
5
Introduction
Raises 2 Questions
What sort of techniques are useful to understand obfuscated
Code?
What are the weaknesses of current code obfuscation techniques
and how can we address them?
Please keep in mind these two questions during the presentation!
We will see for example the weakness of control flow flattening
6
Obfuscating Transformations
2 classes of transformations
1. surface obfuscation (focuses on syntax)
2. deep obfuscation (focuses on program structure)
7
Surface obfuscation
Harder for human to understand
No effect to determine algorithms
Variable renaming: easy undo by applying parser that resolves the
variable references.
8
Deep obfuscation
Changes the programs control flow and data references
Affects efficacy of semantic tools for reverse engineering
Working around deep obfuscation much harder than surface
obfuscation.
We only take a look on the harder one! Deep obfuscation taken from
Chenxi Wangs dissertation.
9
Basic Control Flow Flattening
Aims to obscure the control flow logic
By flattening the control flow graph, all basic blocks will have the
same set of predecessors and successors.
Control flow during execution is guided by a dispatcher variable.
Basic Block assigns the correct value for the next block to take.
Switch block uses the dispatcher, to jump to the block
10
int f(int i, int j)
{
int res = -1;
if (i < j) {
res = i;
} else {
if (i = j) {
res = 0;
} else {
res = j;
}
}
return res;
}
res = -1
i < j?
res = i
i = j ?
res = 0
res = j
return res
Y
Y
N
N
A
B C
E
D
F
Example: Basic Control Flow Flattening
11
int f(int i, int j)
{
int res = -1;
if (i < j) {
res = i;
} else {
if (i = j) {
res = 0;
} else {
res = j;
}
}
return res;
}
res = -1
x = i < j?1 : 2
A
res = i
x = 5
B
x = i = j ? 3 : 4
C
res = j
x=5
E
res = 0
x= 5
D
return res
F
switch (x)
x = 0
0 1 2
3 4 5
Example: Basic Control Flow Flattening
11
Enhancements
We will discuss 2 enhancements of basic block flattening now.
Those make it more difficult to deobfuscate the code.
12
Enhancement 1: Interprocedural Data Flow
In Basic CtrFl Flattening the values assigned to the dispatcher
variable are within the function.
The control flow is not obvious.
Reconstructing by examining the values assigned to dispatcher
variable.
Requires only intraprocedural analysis.
13
Enhancement 1: Interprocedural Data Flow
Improving by using interprocedural informations.
Idea: Use global array to pass dispatch values.
Every call-site writes the values to a random field of the array. Every
call-site has a different randomto this field.
Obfuscated code assigns values from array to the dispatch variable.
Locations accessed and the contents of these locations arent
evident by examining the callee code.
14
res = -1
x = i < j?1 : 2
A
res = i
x = 5
B
x = i = j ? 3 : 4
C
res = j
x=5
E
res = 0
x= 5
D
return res
F
switch (x)
x = 0
0 1 2
3 4 5
Example: Interprocedural Data Flow
15
res = -1
x = i < j?1 : 2
A
res = i
x = 5
B
x = i = j ? 3 : 4
C
res = j
x=5
E
res = 0
x= 5
D
return res
F
switch (x)
x = 0
0 1 2
3 4 5
int A[...]; // global array of indices
int w; // offset into array A
w = random1
A[w] = 3
A[w+1] = 1
A[w+2] = 2
A[w+3] = 4
A[w+4] = 5
call f(i,j)
caller 1:
w = random2
A[w] = 3
A[w+1] = 1
A[w+2] = 2
A[w+3] = 4
A[w+4] = 5
call f(i,j)
caller 2:
Example: Interprocedural Data Flow
15
int A[...]; // global array of indices
int w; // offset into array A
w = random1
A[w] = 3
A[w+1] = 1
A[w+2] = 2
A[w+3] = 4
A[w+4] = 5
call f(i,j)
caller 1:
w = random2
A[w] = 3
A[w+1] = 1
A[w+2] = 2
A[w+3] = 4
A[w+4] = 5
call f(i,j)
caller 2:
res = -1
x = i < j?
A[w+1] :
A[w+2]
res = i
x = A[w+4]
B
x = i = j ?
A[w] :
A[w+3]
C
res = j
x = A[w+4]
E
res = 0
x = A[w+4]
D
return res
F
switch (x)
x = 0
0 1 2
3 4 5
Example: Interprocedural Data Flow
15
Enhancement 2: Artificial Blocks and Pointers
Enhancement 1: Extended by adding artificial Blocks
Artificial Blocks
Some never executed
Difficult to determine with static analysis (caused by dynamically
indirect branch targets)
Adding indirect loads and stores (pointers) to these unreachable
blocks.
Confusing static analysis about taken dispatch values.
16
Enhancement 2: Artificial Blocks and Pointers
How it works!
Add 2 artificial blocks (B, B) for every basic block.
B will be executed, so indirect assignments through pointers set the
dispatch variable.
B also contains indirect assignments, never executed. Only for
confusing static analyzer.
17
Example: Artificial Blocks and Pointers
B
0
1 2
3
a = j
switch (x)
A
a = 1
x = 3
C
i = i1
a = a*i
D
return a
x = i < j ? 1 : 2
x = i > 0 ? 2 : 3
f:
x = 0
S
Init
18
switch (x)
A
p = &b
*p = 4
q = &c
*q = 6
x = 1
p = &b
*p = 3
a = j
x = b
p = &b
*p = 9
p = &b
q = &c
*p = 8
*q = 9
x = 8
*p = 3
q = &c
p = &a
C
i = i1
a = a*i
return a
D
0
1
2
3
4
5 6
7 8
9
x = 0
S
Init
f:
int a, b, c, *p, *q
p = &b
*p = 3
q = &c
x = 1
*q = 4
a = 1
x = i < j ? b : c
*q = 9
x = b
x = i > 0 ? b : c x = 6
x = 5
B
Example: Artificial Blocks and Pointers
B
0
1 2
3
a = j
switch (x)
A
a = 1
x = 3
C
i = i1
a = a*i
D
return a
x = i < j ? 1 : 2
x = i > 0 ? 2 : 3
f:
x = 0
S
Init
18
Deobfuscation
We now consider some methods for reverse engineering obfuscated
code.
Obfuscation inserts spurious execution paths into programs.
To cause bogus information during program analysis.
1 is the original control flow path and 2 is the spurious control flow
path.
2 introduces imprecision to program analysis,
where execution paths join.
2 1
A
B
19
Deobfuscation
Forward analysis (reaching definitions) results are tainted at the
entry of B.
Results of backward analysis (liveness analysis) are affected at the
exit of A.
To address this problem one could clone portion of the program
20
Cloning
Clone some parts of program, in such a way that spurious paths no
longer join original paths.
Applying cloning creates a new block
B
Improves forward dataflow information.
Backward dataflow arent improved. At exit of A we still have the
possibility to take the spurious path.

B
1
A
2
B
21
Cloning (2)
Goal of deobfuscation to identify spurious paths.
But where should we apply cloning? We dont know which paths are
spurious.
=> Simple approach clone every block where multiple paths join.
If obfuscater is known => improve the cloning.
22
S
A B C
Example Cloning
23
S
A B C
S
A B C
S1 S2 S3
A
B
C A B
C
A B
C
Example Cloning
23
Static Path Feasibility Analysis
Constraint-based static analysis to determine if a execution path
(acyclic) is feasible or not.

Given execution path with a set of live variables x at entry.

Construction of C such that (x)C is unsatisfiable if for all


executions of the programthe is never executed.

So is unfeasible.
24
Static Path Feasibility Analysis (2)
Many ways to construct the constrain.
We take into account arithmetic operations.
Propagation of information among a single path, not along all
execution paths.
Each instruction named , value after Instruction
Unknown value
n I
k
x
k
.
y .
25
Static Path Feasibility Analysis Rules
Assignment: =>
j most recent instruction defined y
Arithmetic: =>
expresses the semantic of the operation.
If semantic is not known, or either or is unknown
then
Indirection: Pointers can be modeled at different levels
: I
k
x = y. , C
k
x
k
= y
j
,
: I
k
x =y z C
k
x
k
= f

(y
i
, z
j
).
, f

n C
k
x =.
r y
i
z
j
26
Static Path Feasibility Analysis Rules (2)
Branches: for some boolean expression e
Unconditional branches treated as
Other: Analysis aborted and branch assumed to be feasible.
Constraint constructed as a conjunction of every instructions
A constraint solver could determine if the path is feasible or not.
: I
k
if e goto L
C
k

e if I
k
is a taken branch in ;
e if I
k
is not taken in ;
e e true,
C
k
27
B0
(1)
(2)
(3)
B1 B2
(4)
(5)
B3
(6)
B4 B5
x = 1
if (u > 0) goto B1
y = 2
z = x + y z = x y
if (z > 0) goto B5
Example: Static Path Feasibility Analysis Rules
28
B0
(1)
(2)
(3)
B1 B2
(4)
(5)
B3
(6)
B4 B5
x = 1
if (u > 0) goto B1
y = 2
z = x + y z = x y
if (z > 0) goto B5
= B0 B2 B3 B5
Example: Static Path Feasibility Analysis Rules
28
B0
(1)
(2)
(3)
B1 B2
(4)
(5)
B3
(6)
B4 B5
x = 1
if (u > 0) goto B1
y = 2
z = x + y z = x y
if (z > 0) goto B5
u
o
( )
[x
1
= 1 y
2
= 2 u
0
0
z
5
= x
1
- y
2
z
5
> 0]
= B0 B2 B3 B5
Example: Static Path Feasibility Analysis Rules
28
B0
(1)
(2)
(3)
B1 B2
(4)
(5)
B3
(6)
B4 B5
x = 1
if (u > 0) goto B1
y = 2
z = x + y z = x y
if (z > 0) goto B5
u
o
( )
[x
1
= 1 y
2
= 2 u
0
0
z
5
= x
1
- y
2
z
5
> 0]
= B0 B2 B3 B5
Example: Static Path Feasibility Analysis Rules
unfeasible
28
Combining Static and Dynamic Analysis
Static analysis is inherently conservative.
Set of paths from static deobfuscation is a superset of actual paths.
On the other way dynamic analysis cant consider all possible input
values.
What about combining them?
Start with underapproximated set of control flow paths from
dynamic analysis.
29
Combining Static and Dynamic Analysis (2)
Use static analysis to add paths, that could be taken.
Also possible first use static then dynamic analysis.
But the result set still could contain more or less paths.
Authors took first dynamic then static.
- Suppose we know a way to determine all taken paths during
execution.
- Mark these paths and propagate information only across these
paths.
30
Combining Static and Dynamic Analysis (3)
Conventional static analysis is degeneration where all paths are
marked.
Improving for analyzer: Mark paths that are taken during dynamic
analysis.
Propagate dataflow information along these paths.
If a branch is reached where only one outgoing path is marked, and
the outcome cant be uniquely determined. Add the untaken branch
to the marked (taken) set.
31
Experiments
Evaluated on binary optimizer working on x86 (PLTO). Deobfuscator:
Constraint-based path feasibility and dynamic tracing.
Implemented 3 control flow obfuscations as described earlier.
Obfuscated programs fromthe SPECint-2000 benchmark suite.
Original Obfuscated Effects of Obfuscation
Program Functions Edges Functions Edges
F
obf
/F
orig
E
obf
/E
orig
(F
orig
) (E
orig
) (F
obf
) (E
obf
)
bzip2 42 2,655 30 157,192 0.714 59.21
crafty 104 12,172 89 4,309,502 0.855 352.05
gap 825 43,079 768 1,973,980 0.930 45.82
gcc 1,792 99,516 1,398 8,816,058 0.780 88.59
gzip 73 2,916 59 107,882 0.808 37.00
mcf 19 799 19 16,756 1.000 20.97
parser 180 12,299 174 684,904 0.966 55.69
twolf 165 14,799 157 1,277,410 0.951 86.32
vortex 638 39,229 615 1,969,734 0.963 50.21
vpr 252 8,948 211 310,210 0.837 34.67
GEOM. MEAN: 0.876 59.43
(a) PLTO 32
Key:
Added: Number of edges added due to obfuscation = E
obf
E
orig
(see Table 1).
Removed: Number of edges removed by the deobfuscator.
% Over: Overestimation error relative to number of edges in obfuscated program =
over
/E
obf
.
% Under: Underestimation error relative to number of edges in obfuscated program =
under
/E
obf
.
are dened in Section 4.

over
= |{e | e P
deobf
and e P
orig
}|

under
= |{e | e P
deobf
and e P
orig
}|
PLTO
Program Added Removed % Over % Under
bzip2 154,537 154,537 0.00 0.00
crafty 4,297,330 4,297,330 0.00 0.00
gap 1,930,901 1,930,901 0.00 0.00
gcc 8,716,542 8,716,542 0.00 0.00
gzip 104,996 104,996 0.00 0.00
mcf 15,957 15,957 0.00 0.00
parser 672,605 672,605 0.00 0.00
twolf 1,262,611 1,262,611 0.00 0.00
vortex 1,930,505 1,930,505 0.00 0.00
vpr 301,262 301,262 0.00 0.00
GEOM. MEAN: 0.00 0.00
Experiments (2): Basic Flattening
33
Key:
Added: Number of edges added due to obfuscation = E
obf
E
orig
(see Table 1).
Removed: Number of edges removed by the deobfuscator.
% Over: Overestimation error relative to number of edges in obfuscated program =
over
/E
obf
.
% Under: Underestimation error relative to number of edges in obfuscated program =
under
/E
obf
.
are dened in Section 4.

over
= |{e | e P
deobf
and e P
orig
}|

under
= |{e | e P
deobf
and e P
orig
}|
Program Added Removed %Over %Under
bzip2 154,537 116,896 23.95 0.00
crafty 4,297,330 3,051,105 28.92 0.00
gap 1,930,901 1,177,850 38.15 0.00
gcc 8,716,542 4,936,993 42.87 0.00
gzip 104,996 74,111 28.63 0.00
mcf 15,957 15,198 4.50 0.00
parser 672,605 464,098 30.44 0.00
twolf 1,262,611 820,698 34.59 0.00
vortex 1,930,505 1,351,354 29.40 0.00
vpr 301,262 165,695 43.70 0.00
GEOM. MEAN: 26.89 0.00
Experiments (3): Flattening with
Interprocedural Data Flow
34
Key:
Added: Number of edges added due to obfuscation = E
obf
E
orig
(see Table 1).
Removed: Number of edges removed by the deobfuscator.
% Over: Overestimation error relative to number of edges in obfuscated program =
over
/E
obf
.
% Under: Underestimation error relative to number of edges in obfuscated program =
under
/E
obf
.
are dened in Section 4.

over
= |{e | e P
deobf
and e P
orig
}|

under
= |{e | e P
deobf
and e P
orig
}|
Program Added Removed %Over %Under
bzip2 165,639 130,743 21.76 0.56
crafty 4,403,750 3,169,697 28.21 0.01
gap 2,365,955 1,655,983 31.23 0.03
gcc 9,609,646 5,830,097 39.94 0.01
gzip 125,508 97,539 23.69 0.36
mcf 22,335 22,375 1.60 1.69
parser 786,423 590,215 26.09 0.02
twolf 1,401,063 973,949 31.18 0.03
vortex 2,275,709 1,735,787 25.00 0.02
vpr 386,508 259,889 34.21 0.08
GEOM. MEAN: 21.40 0.06
Experiments (4): Artificial Blocks and Pointers
35
Conclusions
Code obfuscation proposed by a number of researchers.
Rely on theoretical difficulty of reasoning statically kinds of program
properties.
Shown that combination of static and dynamic analysis bypasses
much of the effects of obfuscators.
Control Flow Flattening used in commercial obfuscators, can be
removed in a relatively straightforward way.
36
Questions?
Thank you for your attention
37

Das könnte Ihnen auch gefallen