
Reversing a Simple Virtual Machine - Tutorial

1. Retrieving instructions and registers


Well, tonight I'm tired, I've downloaded a bunch of nice songs that I like a lot, and it's time to
reverse. Having received requests about this tutorial, contrary to my habits I'll write a small
tutorial.
I've heard talk over and over of the HyperUnpackMe2, so in the end I opened it. I fired up my IDA 4.3
(yeah, I don't use the cracked one... toolz are, after all, for those that can't do things without...).
So, I opened the crackme. It starts with a lot of ugly anti-IDA tricks, which require you to un-define
(U key) the jump/call pointers, and then redefine the pointed area as 'code' (C key). It hides the
pointer to LoadLibrary and strings such as "VirtualAlloc" this way. OK, funny but not interesting: we
want to see the virtual machine. Hoping it is not encrypted, otherwise we have to fire up Olly and
unpack the packer until the VM is in the clear...
So, how do we search for a VM in the code, using IDA 4.3? Simple: use the scrollbar and the most
ancient of reversing tools: Zen.
What are we looking for, what could be a 'Zen' point? Well, when I browsed the aspr 1.2 dll I found the
push sequence followed by a ret to be a 'Zen' point (indeed it was the to-do list of that packer). And
for a VM? Well, a VM is formed by instruction emulations, which are usually functions or addresses
to which a common loop of code jumps. In this case, we look for lists of pointers/functions. Yes,
such lists can be many things. They could be objects, for example, which are stored this way. How
can we distinguish them from a VM (or, what if the VM is coded with an object in HL)?
The answer is rather simple. Start examining these procedures, and look for recurrent patterns. For
example, if they refer to the same parameters, and the same parameter seems to be contained/used in a
pattern among more than one of these functions, you might be in the presence of a VM. Personally, I
always try to find references to common attack points, such as the program counter (the EIP equivalent).
This might not always be simple (e.g. bound-flow VMs like =: are fairly complex; btw you can log
them with various techniques).
But let's get back to the crackme. Let's say that by scrolling, looking around and randomly following
jumps and procs we found an interesting list, such as the next one:
Doesn't it seem interesting? A long table of pointers. Let's then explore one of those secondary
links (the first table of links just points to the head of the second ones, mmh!).
IDA gives us this stuff as data, but after pressing C to mark it as code it becomes...
Interesting, no? A XOR operation followed by a jump. Let's press 'C' on all the chunks, to see
what happens:
These are the first places where I originally pressed 'C'. Examine the code. All these snippets jump to
the same address, which means they have a common epilogue.
Notice the first instruction: a repeating mov ecx, esi in all the entries! Doesn't it sound like a
pattern to you (maybe the same logical parameter is passed in esi)? Clearly, it is the shift count used
in the next instruction, a shl. They also use the [edi] memory operand as the target area of the shl
instruction in all the snippets. And all three code blocks present the same structure, changing only
the memory reference of the core (the 'acting') instruction: byte ptr, word ptr, dword ptr. Might this
be a virtual shl instruction in its three referencing possibilities? Yeah!
So we have understood that here the source parameter for SHL is passed in esi, the destination
clearly in edi, and we have a sequence of shl on byte/shl on word/shl on dword.
We have been lucky, however. VMs are often more complex from the structural point of view of the
instruction set. This VM does not implement many of the complexities related to the different kinds
of register/memory/displacement references within the instructions, as it seems to use a fixed
source/destination mark for the instruction: esi is a generic pointer to the source, and edi is a
generic pointer to the destination result (as we can see by reversing more, generic VM registers are
passed to the VM instructions by memory reference, i.e. if the destination of a SHL is the generic VM
register R1, edi would contain the pointer to R1).
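To make the pattern concrete, here is a minimal Python sketch of the three SHL handlers as described above. It is only an illustration under assumptions: MEM, FORMATS and vm_shl are hypothetical names, esi is treated as carrying the shift count (as the mov ecx, esi suggests), edi is an offset into a flat buffer, and flag updates are ignored.

```python
import struct

# Hypothetical flat buffer standing in for the memory that EDI points into.
MEM = bytearray(16)

# Operand size in bytes -> struct format (byte ptr, word ptr, dword ptr).
FORMATS = {1: "<B", 2: "<H", 4: "<I"}

def vm_shl(size, esi, edi):
    """Emulate `mov ecx, esi` followed by `shl <size> ptr [edi], cl`."""
    count = esi & 0x1F                    # x86 masks the CL shift count to 5 bits
    fmt = FORMATS[size]
    (value,) = struct.unpack_from(fmt, MEM, edi)
    mask = (1 << (8 * size)) - 1          # keep the result inside the operand size
    struct.pack_into(fmt, MEM, edi, (value << count) & mask)

struct.pack_into("<I", MEM, 0, 0x80000001)
vm_shl(4, 1, 0)    # dword variant: the top bit is shifted out
MEM[8] = 0x81
vm_shl(1, 1, 8)    # byte variant on a different location
```

The only thing that changes between the three handlers is the operand size, which is exactly the kind of recurring pattern that gives a VM instruction family away.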
A usual and pretty standard attack point in VMs is the NOP instruction equivalent. How can you
discover it? Simple. It does nothing but update the internal status of the VM. So, an instruction
that just updates a register which seems to be used as a program counter is very probably our
NOP in such a VM. This crackme's virtual machine is pretty straightforward, however, so we just
attacked it by recognizing complex instructions directly.
Now, it is time to reverse all these instruction blocks and name them. The result will lead to
something like this:
All these instructions are structured exactly (more or less) like the shl one. One interesting point to
observe is the idiv instruction. As you may notice, it has been divided into IDIV and IDIV_REST. As you
remember, IDIV also returns the remainder of the division. If you examine how the 2 virtual
opcodes are implemented, you'll notice:
the idiv returns a different register in EDI. This should make you think: why? Simple. One is the
result, the other the remainder. Since the VM instructions are structured to work on a binary set
(source/destination), the author needed to duplicate the work of ternary instructions.
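A small Python sketch of that split, under assumptions: the function names are hypothetical, the destination operand is taken as the dividend, and register widths are ignored. Both handlers perform the same division and differ only in which half of the result they hand back.

```python
def _idiv(dividend, divisor):
    """Signed division truncating toward zero, as x86 IDIV does."""
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    return quotient, remainder

def vm_idiv(dst, src):
    """Binary form that keeps only the quotient."""
    return _idiv(dst, src)[0]

def vm_idiv_rest(dst, src):
    """Binary form that keeps only the remainder."""
    return _idiv(dst, src)[1]
```

Note that Python's own // and % round toward negative infinity, so the helper truncates explicitly to stay close to what the real instruction does.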
Notice that, before rebuilding a VM, I usually look at the whole instruction set, trying to figure out
something important we haven't talked about yet. I always look for hints about the structure of the VM
registers. For example, when I found the following instructions, I first thought:
"PUSHF"??? Why does he need a PUSHF instruction here? He is saving the flags after a comparison.
Mmh... and then pops them into a structure related to the EAX register. Is the EAX register used with
displacements in other VM code snippets? Yes, of course.
At this point ask yourself: why should one save the flags after a comparison within a relative
structure? In case you did not understand this yet, the [EAX+0Ch] clearly points to the virtual
EFLAGS register. So we can open the IDA structure page, create a structure and add doublewords
until we create the "field_0Ch", which we'll rename to VM_EFLAGS or such.
As in the sample above.
Now we have identified our first VM register! Let's hunt the others, while reversing opcodes. Among
the instructions, we also find the next one:
When I saw it I noted: it takes a fixed VM register (fixed because the offset from the VM structure
base, eax, is fixed, 10h) and subtracts 4. It takes the operand from edi, masks out the last 2 bytes and
then stores them. What asm operation do you know that decreases a register when writing?
C'mon... maybe it is more clear now...
...I hope I need not comment on it. This is PUSH DWORD.
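For the doubtful, a minimal Python sketch of what such a handler does. The 10h offset comes from the text; the bytearray context, the separate mem buffer and the name vm_push_dword are assumptions made only for the illustration.

```python
import struct

VM_ESP = 0x10   # offset of the virtual ESP inside the VM context (from the text)

def vm_push_dword(ctx, mem, value):
    """Decrease the virtual ESP by 4, then store the operand at the new top of stack."""
    (esp,) = struct.unpack_from("<I", ctx, VM_ESP)
    esp -= 4
    struct.pack_into("<I", ctx, VM_ESP, esp)
    struct.pack_into("<I", mem, esp, value & 0xFFFFFFFF)

ctx = bytearray(0x40)     # hypothetical VM context
mem = bytearray(0x100)    # hypothetical memory holding the virtual stack
struct.pack_into("<I", ctx, VM_ESP, 0x100)
vm_push_dword(ctx, mem, 0xDEADBEEF)
```

A register at a fixed offset that shrinks by the operand size right before a store is the fingerprint to look for.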
And another VM register is uncovered. Let's go on, we still miss the EIP, the generic registers...
Let's find them. Browsing the instructions, we can find:
Now, this instruction has the same layout as the CMP, but it features a JZ instruction. It is a jump,
good. EIP must be used here, as we jump somewhere, so we must alter the EIP register somehow.
We already know what EAX+0Ch is: it is our VM_EFLAGS. So, here the virtual eflags get moved
into the CPU eflags, and JZ is executed. If the jump is NOT taken, the edi parameter is moved into
the context at a fixed offset from eax. We know that eax contains our VM context, so we can bet that
the instruction parameter that gets copied there is... our new EIP after the jump (technically, this
means that the instruction is JNZ, not JZ!).
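The logic can be sketched in Python as follows. The 0Ch offset of VM_EFLAGS comes from the text and the ZF bit position is the standard x86 one; the offset of the virtual EIP is passed in as eip_off because this sketch does not assume its value, and vm_jnz is a hypothetical name.

```python
import struct

VM_EFLAGS = 0x0C   # offset of the virtual EFLAGS in the VM context (from the text)
ZF = 1 << 6        # zero flag bit in x86 EFLAGS

def vm_jnz(ctx, target, eip_off):
    """Write `target` into the virtual EIP when the virtual ZF is clear.

    eip_off is a parameter of this sketch, not a value taken from the crackme.
    Returns True when the virtual jump is taken.
    """
    (flags,) = struct.unpack_from("<I", ctx, VM_EFLAGS)
    if flags & ZF:
        return False                      # native JZ taken: virtual EIP is left alone
    struct.pack_into("<I", ctx, eip_off, target)
    return True

EIP_OFF = 0x20                            # hypothetical offset, for the demo only
ctx = bytearray(0x40)
taken = vm_jnz(ctx, 0x1234, EIP_OFF)      # ZF clear, so the virtual jump is taken
struct.pack_into("<I", ctx, VM_EFLAGS, ZF)
not_taken = vm_jnz(ctx, 0x5678, EIP_OFF)  # ZF set, so the virtual EIP keeps 0x1234
```

The native JZ skips the store, so the virtual jump happens exactly when ZF is clear.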
So...
We found the VM EIP register. Now, try to identify the next instruction yourself:
I won't give you any hint, except that it is clearly an instruction that uses ESP and EIP. Think, please.
One last interesting point. You should always keep in mind that the VM author is not tied to
following any 'rule' when coding a VM. So, instructions need not be 'standard'. They can do
anything their creator wishes. For example, one instruction does this:
You should notice this: it uses the real ESP register! Why? It saves the real ESP, then takes the
virtual stack and sets it as the REAL stack. And it calls a function via EDX. This means that this
virtual machine is capable of making calls in real CPU space, by pushing virtual parameters on the
virtual stack and then calling this instruction, which swaps the stacks (it is a bit reminiscent of the
stack switching with parameter copying between inter-privilege gates, if you know processors well).
Also note that the return value of the function executed on the real CPU is saved within our VM
context, somewhere...
I reversed almost all the VM set and registers in half an hour, and you can do the same, with little
effort. There are only a bunch of instructions that are more complex, but they are not important for
VM reversing (I mean, for understanding the general structure).
Well, it is time for me to go to sleep, very very late! Hope you appreciated the small tute.
Maximus
2. General VM Structure
...next time ;-)
...Well, next time has come, let's fire up the mp3 player with 'Liga' :-)
If we examine the general structure of a VM, we usually find a big cycle that takes care of running
the VM across the virtual assembler, emulating this way the complex stages that processors execute
when fetching, decoding and executing instructions. The HyperCrackme2 uses this generic VM
structure:
1. Set up the VM Context.
2. Enter the VM loop.
3. Read the byte at the VM.EIP address and check the instruction type, supporting various instruction
   types:
   1. Binary Instructions.
   2. Unary Instructions.
   3. Flow Control Instructions.
   4. Special Instructions.
   5. Debug Instructions.
   6. NOP and HLT (alias "Quit VM") instructions, the latter ending the VM loop.
4. Jump to the start of the VM Loop.
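The steps above can be sketched as a toy dispatch loop in Python. Everything here is hypothetical (the opcode values, the single r0 register, the dict-based context); it only shows the shape of the cycle, not the crackme's real encoding.

```python
# Hypothetical opcode values for a toy VM; the crackme's real encoding differs.
OP_NOP, OP_INC, OP_HLT = 0x00, 0x01, 0xFF

def run(program):
    """Run a toy VM program and return the final context."""
    ctx = {"eip": 0, "r0": 0}            # 1. set up the VM context
    while True:                          # 2. enter the VM loop
        opcode = program[ctx["eip"]]     # 3. read the byte at VM.EIP and classify it
        if opcode == OP_HLT:             #    HLT ("Quit VM") ends the loop
            break
        if opcode == OP_NOP:             #    NOP only updates the program counter
            ctx["eip"] += 1
        elif opcode == OP_INC:           #    a toy unary instruction
            ctx["r0"] += 1
            ctx["eip"] += 1
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
        # 4. falling through the while jumps back to the start of the loop
    return ctx

result = run(bytes([OP_NOP, OP_INC, OP_INC, OP_HLT]))
```

Note how the NOP branch touches nothing but eip, which is why NOP equivalents are such a convenient attack point.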
This structure is general enough to be kept in mind. From a generic point of view, each VM
contains the following elements:
- The initialization block/function of the Virtual Machine.
- A loop block/function that scans and executes the instructions of the VM program.
- A generic block/function that decodes the VM instruction's opcode, with its parameters,
  registers, indexing modes and anything else the VM creator wanted to place in it.
- A list of VM instruction code blocks, each of which performs the duty of one instruction. They are
  roughly the equivalent of the micro-code modern CPUs use for decomposing and
  executing common ASM instructions.
- A set of macro-instructions, specific to the VM and not easily mappable to ASM opcodes.
  These instructions might be harder to understand.
An example of the HyperCrackme2 initial structure elements can be seen by examining the
following commented IDA snip:
As you can see, RESTART_VM_PROCESS is point (2) of the above description, whereas the
part under the ja short IS_UNARY_INSTR is equivalent to point (3.1). The code in this snippet,
apart from cleaning the registers, prefetches the first instruction opcode (the byte pointed to by
VM.EIP) and analyses it to choose which 'execution unit' of the VM should be used for the
instruction type being fetched.
Let's now examine one of the 'building blocks' of this VM, the Setup_Binary_Instruction_Params
function, which takes care of processing the binary VM opcodes. For examining the next fragment,
remember that EAX contains our VM_CONTEXT. So, we already know where, at a fixed offset from eax,
our VM.EIP lives.
I think it is important now to understand what we are looking for, or the analysis will be useless. We
are trying to recover the VM instruction structure, together with a more detailed description of the
Virtual Machine structure. The procedure that fills in the parameters for the binary instructions
must know how to decode the binary instructions, so by examining how the bytes that make up an
opcode are used we can rebuild the VM instruction format. What should we expect to find? It depends
heavily on the complexity of the instruction set, which depends entirely on the author's choices. Which
we must reverse. So, we must always examine carefully how the instruction's bytes are used, as
they can change from instruction type to instruction type. And please remember that VM instructions
are not compelled to be always of the same size, just as x86 instructions are not all of the same size...
You won't be able to apply the method used below to other VMs as it is. Each VM uses its own opcode
and VM structure, so you should try to understand what fragments can be used to hint at its
reconstruction.
Let's start by examining this code:
This snippet should be clear: we load the second byte pointed to by our virtual EIP, [eax+1], then we
move it into the dl register. Before commenting on this point in detail, we should take note that
we've just used one of the bytes that make up an instruction. Let's move on.
This snippet is pretty similar (conceptually) to the prior one. EAX still contains our VM.EIP address,
and now the third byte forming the opcode is loaded from memory and tested (technically, only the
high nibble of it is tested, as you can notice from the and/shr pair). And notice the instruction that
follows. EDI contains our VM_CONTEXT pointer here. So, the ECX register contains a dword
index, which is applied to the VM_CONTEXT for retrieving a dword pointer, which is then
offset by 10h. But do you remember? VM_CONTEXT.+10h == VM_ESP. This means that when
ECX is 0h here, we get the ESP register addressed. And when it is 1h, the dword after it is
addressed, and so on until the 15th DWORD after ESP (a nibble ranges 0-15, as you know...). So, we
have just detected a possible usage of the third byte of the binary opcodes (at least of its upper
nibble). The snippet below is the area where we jump if we are successful in the conditional jump
used in the code above.
As you can notice, it takes the value that follows the first dword from EAX (which is our VM.EIP)
and places it in EDI. And we know that EDI will contain in the end the destination parameter of the VM
opcode! This helps us understand that the first dword is used only for opcode purposes, and
after it we have the opcode parameters.
This is what we know of our VM_CONTEXT right now:
Let's continue our analysis of binary opcodes, and try to map the VM_INSTRUCTION format. We
have already encountered the offsets +1/+2 of our VM instruction, so let's examine the last one, the
+3:
This byte is directly loaded in ecx using MOVSX. You should already understand what I'm about to
say: why MOVSX?? This byte is then added to the EDI parameter, which contains our destination
parameter. Why should we need to add something to our parameter? Displacement, of course...
So, we can now rebuild the instruction's structure for Binary instructions:
I agree I haven't commented much on this part. But the reason is that it is very 'VM-dependent'.
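Putting together the pieces recovered above, a decoder for the binary instruction header could be sketched in Python like this. The field names are hypothetical, the role of the byte at +1 is left open because the text only says it ends up in dl, and the sketch decodes every field unconditionally while the real code picks between them with a conditional jump.

```python
import struct
from typing import NamedTuple

class BinaryInstr(NamedTuple):
    opcode: int      # byte +0, the one the VM loop classifies
    byte1: int       # byte +1, loaded into dl (its role is not covered here)
    reg_index: int   # high nibble of byte +2, a dword index in the range 0-15
    disp: int        # byte +3, sign-extended like MOVSX does
    param: int       # dword that follows the 4-byte opcode header

def decode_binary(code, eip):
    """Decode the header at `eip` plus the parameter dword that follows it."""
    opcode, byte1, byte2, disp = struct.unpack_from("<BBBb", code, eip)
    (param,) = struct.unpack_from("<I", code, eip + 4)
    return BinaryInstr(opcode, byte1, byte2 >> 4, disp, param)

def reg_offset(reg_index):
    """Offset inside VM_CONTEXT of the indexed dword: index 0 is VM_ESP at 10h."""
    return 0x10 + 4 * reg_index

sample = bytes([0x12, 0x34, 0x20, 0xFC]) + struct.pack("<I", 0x11223344)
instr = decode_binary(sample, 0)
```

The lowercase b in the struct format is what plays the part of MOVSX here: FCh comes out as -4, a displacement that moves the destination pointer backwards.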
3. Reversing VMs Guidelines
The steps shown in the prior chapters are an important step toward the comprehension of a VM.
- You can initially skip the structure of a VM instruction, as long as it is not
  decrypted/decoded within each instruction.
- At this point, we must examine the instruction set deeply, trying to find something
  recognizable, such as the NOP instruction (which might not be included at all).
- Once the instruction set is starting to become clear, at least in minimal part, special care must
  be taken in looking for possible VM register usage. Their usage may not be clear, as
  they can be 'shifty', remapped upon each VM entry etc., but we don't care. Knowledge is
  incremental, and making errors is human (especially if you abuse Zen to speed up
  your analysis by intuition ;-)
- At this point, we must attack the 'living heart' of the VM, its decoder. It contains all the
  important information about the VM and the structure of the VM instructions, as it is usually
  responsible for scheduling and performing the instruction (pre-)processing. You must
  remember that the decoder often has to analyse the VM instruction to discover things
  like the opcode length, parameters and so on. But it is also possible that part of the
  management is performed in the instructions themselves, e.g. by making instructions of fixed
  size (e.g. 16 bytes).
- And then? Then we must get back to the instruction set, trying to understand specific,
  non-standard opcodes that perform creative duties that are usually not part of a processor (e.g.
  calls to 'real' functions, API functions, calculation blocks etc.).
- At this point we have decoded most of the VM, and we might try to debug an instruction or
  two to see if things are as we expected, and if the VM registers follow our scheme.
- But sooner or later you have to get coding to dump the VM program in a comprehensible
  shape. You might wish to write an IDA plugin (if you don't use 4.3 like me) or a script for
  decoding the VM program. Or, much simpler but slightly less effective, you can code a
  logger, which is simply a hook in the VM instruction table, for each instruction (simply
  make your debugger-loader and use breakpoints which you defer in the breakpoint event, or
  inject a dll which hooks the table). Whenever an instruction is called, your hook dumps the
  opcode name and the parameters. So, you can rebuild the flow of the program. A useful
  add-on to the logger is a VM.EIP dumper, which allows you to assign the right key to each
  VM instruction, and possibly the ability to 'alter' the result of conditional jumps, so as to
  allow the logger to examine the major part of the VM program and possibly 'skip' long
  cycles. Later, you can reassemble most of the VM program using the VM.EIP logged for
  each instruction.
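The logger idea from the last point can be illustrated with a small Python sketch. It works on a toy handler table, not on the real crackme: hook_table, the two handlers and the dict-based context are all hypothetical, and a real logger would be a debugger script or an injected dll as described above.

```python
def hook_table(table, log):
    """Return a copy of the handler table in which every entry logs before running."""
    hooked = {}
    for opcode, handler in table.items():
        # The default argument binds the current handler to this wrapper.
        def logged(ctx, *params, _handler=handler):
            log.append((ctx["eip"], _handler.__name__, params))  # VM.EIP, name, params
            return _handler(ctx, *params)
        hooked[opcode] = logged
    return hooked

def vm_nop(ctx):
    ctx["eip"] += 1

def vm_jmp(ctx, target):
    ctx["eip"] = target

log = []
table = hook_table({0x00: vm_nop, 0x01: vm_jmp}, log)
ctx = {"eip": 0}
table[0x00](ctx)
table[0x01](ctx, 0x40)
```

Since every log entry carries the VM.EIP at which the instruction ran, sorting the entries by that value gives a first rough listing of the VM program.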
Well, I hope this can help you all to understand VMs better. I saw it is common style in tutorials to
place credits, so my thanks to the Community and my friends Zero and HAVOK.
Regards,
Maximus
15-16/7/2006
For the curious, this is my IDA analysis of the binary parameters' setup decoder of the crackme:
