
-----cut here------------------------------------------------------------------Usefull Tools in Cracking + Cracking Chinese Horoscope 1.0 (newest version) by --..__J_o_h_n_n_y__A_U_M__..

-(TNT) --------Motto for my actions:------I'm for peace, love and prosperity and one global nation but without money to divide us and without ego, who keeps men separated! Be a man of good sense - be naturally, be divine! Try to progress on spiritual way! No God, no freedom! I'm against tyranny under any form, against mondial iudeo-masonic occult domination and against infiltrated bad rase of aliens! Out with Satan from this planet! Real happiness, free and freedom for all! -------Hi, guys, the forces of divine are back! I decide myself to write this tutorial about usefull tools in cracking and helping tools, in the ideea that this could be very usefull, in special, for beginners in cracking. Don't worry about this word - beginner - we are beginners in something all our life! Before the begining of descriptions I wish to personally thank to all this wonderfull cracking tools authors who put great efforts, time and intelligence to produce such of state of the art programs! In fact, I need to thank to all who have something to create and to give to others for improving their lives and work (or passion) in better! All this tools or most of them can be found on these sites: http://protools.cjb.net http://protools.hpage.net http://w3.to/protools http://www.suddendischarge.com http://202.103.100.253/hambo/cracktools.html http://www.crackstore.com/tools.htm Or ask any advanced cracker to help you! In this tutorial I would not describe SOFTICE, W32DASM, SmartCheck, ProcDump or HIEW - these tools are detailed in older tuts! Now, Universal Pictures presents: THE TOOLS! 1. One of the most important tool after the tools named above, after me, is a compare tool: RixComp 4.87 - my choice (site: http://www.radsoft.net). Soon you will have a tutorial for cracking and enhancing this fine tool (this with the generosity and help of CIA & tKC, thanks a lot, guys!). I use this tool in

combination with a DOS comparing tool - bfc.com (Binary File Compare). You can find these tools at www.geocities.com/john_aum/john_files/crk_tools.zip . I alternate RixComp with bfc.com (DOS program) because bfc.com give me very fast the exact number of differencies that exist, interesting, no? 2. A program hardly needed, after RixComp, is a program who automatically makes cracks in DOS and/or Windows - here are 3 most used by me: DOS PatchIt 2.2 (very good on many differencies); in Windows, but DOS style Eliminator Patch Compiler 3.01 (by our beloved tKC, I love this man!) and for Windows PatchMaker 0.99a. Of course, these progs makes cracks from comparing your original_prg.exe and your_cracked_copy.exe files. Soon, if posible, (thanks to CIA and tKC) I'll make a tutorial for improving PatchMaker 0.99a, because with the improvings maded by me, the work with this program becomes faster, and we wish to work faster, if posible. And because authors don't appear with a new version, I'll do it, for helping me & you. Beginners must notice that for advanced cracks, like double or triple cracking (3 files in one move) or for modifying Windows Registry must make their own cracks or search the WEB for more complex patchers. 3. WinPatch 1.2.8 (from www.artistryinsoftware.com) is a great patcher (tKc use this, too) for patching a file or a group of files (update them) even if the new files are bigger in dimensions. Atention, when you patch, you must make 2 different subdirectories: one with original file(s) and second with modified or bigger file(s). You need this proggie badly, so grab it now! 4. ConfigSafe 3.06.04 (or older versions)! You absolutely need this extraordinary tool! What it can do? WOW! Ha, ha, ha! Can do this: find any modification maded in structure and number of subdirectories or files, in Windows Registry, almost anything it moves on your hdd, can be detected! This after installing a new soft or after a new entry and exit in and from a program. 
The program will tell you where is counting the days remaining on your trial period and many others about new writings in registry or hdd. For me, even more extraordinary is that I've never seen this program to be

recommended by crackers, maybe they are keeping this secretly! The time of revealings has come - you can find the trial version of this program at site: http://www.configsafe.com/html/demo.html . I'll tell you how to crack it on a coming tutorial (with the help and kindness of CIA and tKC). 5. Registry Crawler 1.21-2.0 (http://4developers.com) - very usefull when you need to find something very fast in Windows Registry or to go on same adresses (bookmarks) on many times. You must have it! 6. File Info 2.30 - this program can identify many files types and can tell you in what language is build an executable or if it is crypted or packed and with what software. Can tell you if a document is in Word, txt, html or enhaced txt & many others; this even on renamed extensions. Is one of the best around!Grab it! 7. DeShrink 1.6 - this program can deshrink or decrypt executables or binary files maded with Shrinker 1.0 - 3.4 for you, so in this way you can have access to real code of program with w32dasm with/or hiew tools for directly modify the bites. 8. UnAspack 1.0.8.3 - this nice software can unpack beatifully for you any program packed with Aspack untill version 2000 & 2.1. You must have this! 9. UPX 0.xx-1.00 - good compressor and decompressor of executable or binary files. All versions can be found on http://upx.tsx.org or newest at sites above. Excellent packer and unpacker for DOS or for Windows GUI progs. It's a must for a cracker! 10. Other important decryptors or unpackers: Bye PE-Crypt v1.02, UNP V4.11, UnArmadillo v1.1.1, PEunCompact v0.01 and many others. If you found an unknown crypter/packer search for it on www.suddendischarge.com, for example. For packing your cracks (to be smaller) use Aspack 2000 or 2.1 or UPX 1.00. 11. An interesting tool is Compare2Crack/486 v0.06b (c2c.com).This fine DOS tool will provide you very fast a list of all modifications (differencies) between 2

executables or binary/dll files: the original and your cracked copy of it. This list will be created as a txt file. Very usefull somethimes! 12. Another fine tools: help2com, com2exe, exe2com, com2txt, Topo 1.2(for adding of a window at the begining of a program), bat2exec, loupe.exe (for viewing details in a bmp or icon), htmstrip.exe (for converting an HTML into a txt file), xdoc.com (for converting a txt file into a DOS exe), PRIVATE EXE 2.2 (put a password on your GUI exe file - see my tutorial from tKC tut no. 72), cracker.exe and/or pcracker.exe (cracking progs that crack with help provided from *.crk files), old SOFTICE 2.x-2.8 (DOS progs) and others. 13. GameTools 3.23 - a program with functions and design similar to SOFTICE;this program can help you a lot in debugging and cracking DOS games and softwares. 14. SuperSnooper for Windows - this nice proggie will show you only the text from an executable; you can download a similar (but DOS) program from my site: www.geocities.com/john_aum/john_files/crk_tools.zip. In the zip are even more usefull progs. Also take a look at my infos from www.geocities.com/john_aum; all there is for you; be well informed and tell to others! 15. Exescope 5.12, Restorator 2.50, Resource Grabber 2.42, Resource Hacker 2.3.0.2, Resource Scrutator 1.21, MultiRipper 2.70: all these are progs that grabs quickly the resources from DOS and/or GUI (windows) softwares like icons, bmps, delphi resources, jpgs, mods... You must have these excellent progs! And Exescope is one of the best! Decompress or decrypt first if neccesary! 16. And of course you could provide yourself with languages of programming for to produce your own cracks in DOS or Windows: Turbo Pascal or Borland Pascal 7.0, Turbo C 2.0, Visual Basic 6.0, Delphi 5.0, MS Visual C++ and others. Search after them with mega-engine www.profusion.com or www.metacrawler.com . Of course, this too: Masm 32 5.0 or another ASM GUI compilers. And Learn ASM well! 17. I almost forgot! 
Here must be reminded also HEX Workshop 3.02: a fine hexeditor of exe & binary files (or the others). Nice functions: can shorten or

prolong files, copy or replace fragments from files (an icon or a bmp for instance, when Exescope can't). I've tested many hexeditors, but this appears (in my opinion) to be the fastest & easiest! Of course, I alternate this program with Hackers's View, the best ASM & hexeditor for crackers! 18. If you use nfo files near your cracks, use NFO Builder 0.9b or newest version to easily build your own nfo files (http://fnw.tsx.org) . At the end: all these (after me - of course i'm subjective, anyone is) fine tools mentioned in this tutorial are one of the most important in every day job of a cracker. -------------------Cracking Chinese Horoscope 1.0 (newest version) WWW: http://www.springsoft.com Cracker: --..__J_o_h_n_n_y__A_U_M__..-Protections to be removed: expiring, some nags and disabled options Tools: W32Dasm, Hacker's View, both backgrounded by Windows Commander 4.03 Altough in tKC tut no. 74 is my tutorial about cracking Chinese Horoscope (first version ever) now you only can download from www.springsoft.com the 1.0 version, so here you have on short, how to crack this version. Due to the lack of space and because almost all steps in cracking this software are similar to tut 74, except the new adresses where to modify bites, I'll not give you details on how to crack these identical protections, only the final modifications (but you must practice if you wish to advance in cracking): - first protection (copy enabled) 812D - 75 -> EB - second protection (print enabled) 8357 - 75 -> EB - third protection (begining nag) B647 - 55 -> C3 The new NAG protection added by the producer in this version, will be explained by me in detail, now. - first, make a copy of chscope.exe -> y.exe and dissasemble it with w32dasm; - press ALT-S-F (search in w32dasm) and look for words that appear in error nag (after imputing dates) -> "You may only view people..."; OK! We found the place where is this nag! Press PageUp, you see this call USER.MESSAGEBEEP, yes? 
Above it is the conditional jump 75 03 (w32dasm adress 2.1347). We could make this 90 90 for never show us the nag but we observe at

2.1331 - 74 03. If we make this instruction 90 90, the program will go faster at the right adress, jmp 13F6 (2.133 & 2.1349), will no longer process other instructions untill 2.1347. So, let's make this je from hiew adress 6E31 ->9090. Bingo! Working just fine! NAG is gone forever! Now, exactly like in tutorial 74, delete jqlreg.ini (from c:\windows) and rewrite with Windows Commander 4.03 the year of prg. subdirectory from 2000 to 2050, for no expiration until year 2050 (put any year you wish - 3000, for ex.). ---------------Greets: tKC (my love too!), CIA, TNT, PC, CORE, all crackers, PRO or newbies, all cracker teams (keep going, we must eliberate from iudeo-masonic tirany, all must become free), we are great guys, and nice too. Love you all (but you must be a good soul!). Romanian Greets: Salutari tuturor crackerilor din Romania! Daca doriti cu adevarat schimbari in bine, luati ca optiune de vot si pe cei de la Romania Mare! Au aratat prin fapte ca sunt oameni iubitori si de omenie! O sa ne astepte si zile mai bune, ginditi optimist, Dumnezeu e aici cu noi! At last, but from all my heart: I love you Heavenly Father, I know you are with me all the time!!! God is love! Try this: www.geocities.com/john_aum Incredible infos for YOUR EYES ONLY!!! E-mail: johnny_aum@yahoo.com ---------------Sorry if my english is not perfect!----------------------------------cut here-------------------------------------------------------------------

END
____________________________________________________________________________
The Registry Torn Apart --- Ankit Fadia <ankit@bol.net.in>
____________________________________________________________________________
http://blacksun.box.sk

The registry is a hierarchical database that contains virtually all information about your computer's configuration. Under previous versions of Windows, those settings were contained in files like config.sys, autoexec.bat, win.ini, system.ini, control.ini and so on. From this you

can understand how important the registry is. The structure of the registry is similar to the structure of ini files, but it goes beyond the concept of ini files because it offers a hierarchical structure, similar to the folders and files on a hard disk. In fact, the procedure to get to the elements of the registry is similar to the way you get to folders and files. In this section I will be examining the Win95\98 registry only, although NT is quite similar.

The Registry Editor

The Registry Editor is a utility with the filename regedit.exe that allows you to see, search, modify and save the registry database of Windows. The Registry Editor doesn't validate the values you are writing: it allows any operation. So you have to pay close attention, because no error message will be shown if you make a wrong operation. To launch the Registry Editor simply run RegEdit.exe (under WinNT run RegEdt32.exe with administrator privileges).

The Registry Editor is divided into two sections: in the left one there is the hierarchical structure of the database (the screen looks like Windows Explorer), in the right one there are the values. The registry is organized into keys and subkeys. Each key contains value entries; each one has a name, a type (or class) and the value itself. The name is a string that identifies the value within the key. The length and the format of the value depend on the data type.

As you can see with the Registry Editor, the registry is divided into six principal keys: there is no way to add or delete keys at this level. Only two of these keys are actually saved on hard disk: HKEY_LOCAL_MACHINE and HKEY_USERS. The others are just branches of the main keys or are dynamically created by Windows.

HKEY_LOCAL_MACHINE

This key contains hardware, application and service information. Some hardware information is updated automatically while the computer is booting. The data stored in this key is shared by all users. This handle has many subkeys:

Config      Contains configuration data for different hardware configurations.
Enum        This is the device data. For each device in your computer, you can find information such as the device type, the hardware manufacturer, device drivers and the configuration.
Hardware    This key contains a list of serial ports, processors and floating point processors.
Network     Contains network information.
Security    Shows you network security information.
Software    This key contains data about installed software.
System      It contains data that determines which device drivers are used by Windows and how they are configured.

HKEY_CLASSES_ROOT

This key is an alias of the branch HKEY_LOCAL_MACHINE\Software\Classes and contains OLE, drag'n'drop, shortcut and file association information.

HKEY_CURRENT_CONFIG

This key is also an alias. It contains a copy of the branch HKEY_LOCAL_MACHINE\Config, with the current computer configuration.

HKEY_DYN_DATA

Some information stored in the registry changes frequently, so Windows maintains part of the registry in memory instead of on the hard disk. For example, it stores PnP information and computer performance data. This key has two subkeys:

Config Manager    This key contains all hardware information and problem codes, with their status. There is also the subkey HKEY_LOCAL_MACHINE\Enum, but written in a different way.
PerfStats         It contains performance data about the system and the network.

HKEY_USERS

This important key contains the subkey .Default and another key for each user that has access to the computer. If there is just one user, only the .Default key exists. Each subkey maintains the preferences of one user, like the desktop colors, the fonts used, and also the settings of many programs. If you open a user subkey you will find these important subkeys:

AppEvent            It contains the paths of the audio files that Windows plays when certain events happen.
Control Panel       Here are the settings defined in the Control Panel. They used to be stored in win.ini and control.ini.
Keyboard Layouts    It contains codes that identify the current keyboard layout as set in the Control Panel.
Network             This key stores subkeys that describe current and recent network shortcuts.
RemoteAccess        The settings of Remote Access are stored here.
Software            Contains all software settings. This data used to be stored in win.ini and private .ini files.

HKEY_CURRENT_USER

It is an alias to the current user's subkey of HKEY_USERS. If your computer is not configured for multi-user usage, it points to the subkey .Default of HKEY_USERS.
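The alias relationships described above can be sketched as a small lookup. This is only an illustration of the idea, not how Windows implements it: the `ALIASES` table and the `resolve` helper are hypothetical names, the targets are taken from the descriptions in this text, and a single-user machine is assumed for HKEY_CURRENT_USER.

```python
# Hypothetical alias table built from the descriptions in this tutorial (Win95/98).
ALIASES = {
    "HKEY_CLASSES_ROOT": r"HKEY_LOCAL_MACHINE\Software\Classes",
    # The text names HKEY_LOCAL_MACHINE\Config; the exact profile subkey is not given.
    "HKEY_CURRENT_CONFIG": r"HKEY_LOCAL_MACHINE\Config",
    # Assumption: single-user machine, so the current user is .Default.
    "HKEY_CURRENT_USER": r"HKEY_USERS\.Default",
}


def resolve(path):
    """Rewrite a key path so that it starts at one of the two stored root keys."""
    root, sep, rest = path.partition("\\")
    target = ALIASES.get(root.upper(), root)
    return target + sep + rest


print(resolve(r"HKEY_CLASSES_ROOT\.txt"))
print(resolve(r"HKEY_CURRENT_USER\Software"))
```

Paths that already start at HKEY_LOCAL_MACHINE or HKEY_USERS pass through unchanged, which mirrors the statement that only those two keys are really stored on disk.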
Description of .reg file

Here I am assuming that you already have a .reg file on your hard disk and want to know more about how it is structured. Now do not double

click the .reg file, or its content will be added to the registry (a warning message will pop up first). To view the contents of the .reg file, open it in Notepad. To do so, first launch Notepad by going to Start>Programs>Accessories>Notepad, then open the .reg file through the Open menu.

The thing that differentiates .reg files from other files is the word REGEDIT4. It is the first word in all .reg files. If this word is not there, the Registry Editor cannot recognize the file as a .reg file. Then follows the key declaration, which has to be written within square brackets and with the full path. If the key does not exist, it will be created. After the key declaration you will see a list of values that have to be set in that particular key in the registry. The values look like this:

"value name"=type:value

The value name is in double quotes. The type can be absent for string values, is dword: for dword values and hex: for binary values. For all other types you have to use the code hex(#):, where # indicates the API code of the type. So:

"My string"="string value"         is a string
"My dword"=dword:12345678          is a dword
"My binary"=hex:AA,BB,CC           is a standard binary
"My other type"=hex(2):AA,BB,00    is an expand string
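The value syntax above can be sketched as a few formatting helpers that only build the text of a REGEDIT4 file; nothing here touches the real registry. The helper names and the key `HKEY_CURRENT_USER\Software\Example` are hypothetical, and string values have their backslashes doubled, as .reg string values require. Quotes inside strings are not handled in this sketch.

```python
def reg_string(text):
    """Quote a string value, doubling every backslash."""
    return '"' + text.replace("\\", "\\\\") + '"'


def reg_dword(number):
    """Render a 32-bit value as eight hexadecimal digits."""
    return "dword:%08x" % (number & 0xFFFFFFFF)


def reg_binary(data, api_code=None):
    """Render bytes as hex: or, for other types, hex(#): with the API code."""
    prefix = "hex:" if api_code is None else "hex(%d):" % api_code
    return prefix + ",".join("%02x" % b for b in data)


def build_reg(key, values):
    """Return the text of a REGEDIT4 file for one key and its (name, value) pairs."""
    lines = ["REGEDIT4", "", "[" + key + "]"]
    for name, rendered in values:
        lines.append(reg_string(name) + "=" + rendered)
    return "\r\n".join(lines) + "\r\n"


text = build_reg(
    r"HKEY_CURRENT_USER\Software\Example",  # hypothetical key, for illustration only
    [
        ("My string", reg_string("c:\\Windows")),
        ("My dword", reg_dword(0x12345678)),
        ("My binary", reg_binary(bytes([0xAA, 0xBB, 0xCC]))),
        ("My other type", reg_binary(bytes([0xAA, 0xBB, 0x00]), api_code=2)),
    ],
)
print(text)
```

Keeping the formatting in pure functions makes it easy to inspect the generated text before anything is merged.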

Important Note: expand string (REG_EXPAND_SZ) has API code = 2 and multi-string (REG_MULTI_SZ) has API code = 7. As you can see, strings are in double quotes, a dword is hexadecimal and a binary is a sequence of hexadecimal byte pairs, with a comma between each. If you want to put a backslash into a string, remember to double it, so the value "c:\Windows" will be "c:\\Windows". Before writing a new .reg file, make sure you do this, or else you will get an error message.

Command Line Registry Arguments

FILENAME.REG               to merge a .reg file with the registry
/L:SYSTEM                  to specify the position of SYSTEM.DAT
/R:USER                    to specify the position of USER.DAT
/e FILENAME.REG [KEY]      to export the registry to a file. If the key is specified, the whole branch will be exported.
/c FILENAME.REG            to substitute the entire registry with a .reg file
/s                         to work silently, without prompts or warnings

That wraps up the Windows Registry.

Ankit Fadia
ankit@bol.net.in

To receive more tutorials on Hacking, Perl, C++ and Viruses/Trojans join my mailing list:

Send an email to programmingforhackers-subscribe@egroups.com to join it. Visit my Site to view all tutorials written by me at: http://www.crosswinds.net/~hackingtruths

http://blacksun.box.sk - Black Sun Research Facility. Best tutorials on earth! END

Bitmanipulation instructions


1998 by Cruehead / MiB

Hello there!
The reason why I'm writing this is that lately I have been getting a lot of emails all asking the same question: "What on earth is XOR, and what does it do?". So, instead of having to say the same thing over and over again, I decided to write this short little essay on the subject of the "Bitmanipulation Instructions". Enjoy...

First of all, let's talk about what a byte really is...for many of you this is nothing new, so you can skip this part if you wish. A byte consists of 8 bits, each of which can hold a value of either 0 or 1. For example, here is what the letter 'X' looks like in binary form:

X - 01011000

How do I know this? You can get this information pretty quickly...First you need either the HEX value of the letter 'X' or the DEC value. A very comfortable way of getting the value is using our beloved debugger Softice. First of all enter Softice (Ctrl-D) and on the command line enter:

?'X'

Now you'll see something like "00000058 0000000088". That means that 58 is the hex value of the letter X and that 88 is the decimal value. You can get

this information in other ways as well; looking it up in an ASCII table is perhaps the best way. Now that you know the dec value of the letter you can load up the calculator that comes with Windows. It's one of the few programs that Microsoft has developed that can actually be useful. So, now that you're in the calculator, make sure that you have chosen the "advanced" setting in the menu and enter the dec value that you previously got - 88 in this case. Now click on the "bin" check box...and voila - you've got the binary form of the letter 'X'...nice, huh?

Ok, now let's move on to the part that you all have been waiting for - the bitmanipulation instructions! There are a couple of these instructions, and you'll very often see them when you're on the "cracking highway". We'll talk about the most common ones, beginning with...

XOR

This instruction is a very important one, and perhaps the biggest reason why this essay was written. What kind of information can we get about it? First of all, let's take a look at what PcHelp has to say about it: "Performs a bitwise exclusive OR of the operands and returns the result in the destination.". Ok, did that brighten things up for you? Well, I didn't think so either, so I'll try to explain it. Let's go back to our example again and use the letter 'X'. What do you think an instruction like "XOR 88,65" would do? As you already know, 88 is the dec value of the letter 'X' and 65 is the dec value of the letter 'A' (you should be able to figure that out by now). Let's take a look at what happens:
Character                  Dec value    Binary form
X                          88           01011000
A                          65           01000001
Result after XOR 88,65:    25           00011001
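If neither Softice nor the Windows calculator is at hand, the decimal, hex and binary columns can be reproduced with a few lines of Python. This is just a convenience sketch; it assumes plain ASCII characters.

```python
# Print the decimal, hex and 8-bit binary form of each character.
for ch in "XA":
    code = ord(ch)  # numeric (ASCII) value of the character
    print(ch, code, format(code, "02X"), format(code, "08b"))
```

`format(code, "08b")` pads the binary form with leading zeros to a full byte, which is what the table shows.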

Ok, let's focus on the binary part. What XOR really does is compare one bit at a time. If the two bits are the same, the result bit is set to 0; if they are different, the result bit is set to 1. We can show it like this instead:


0011
0101
----
0110
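The rule "same bits give 0, different bits give 1" can be written out literally. The sketch below works on bit strings so each position is visible; the function name `xor_bits` is made up for this example.

```python
def xor_bits(a, b):
    """XOR two equal-length bit strings, one position at a time."""
    return "".join("0" if x == y else "1" for x, y in zip(a, b))


print(xor_bits("0011", "0101"))          # the four-bit example above
print(xor_bits("01011000", "01000001"))  # 'X' and 'A'

# The built-in ^ operator applies the same rule to whole numbers at once.
print(int(xor_bits("01011000", "01000001"), 2) == (88 ^ 65))
```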

Ok, now that you (hopefully) understand how it works, your next question will probably be something like "What can it be used for?". As you might know, XOR is used quite a lot when it comes to simple encryption needs. I'll show you why here:
XOR 88,65 = 25    (from our example)
XOR 25,88 = 65
XOR 25,65 = 88
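Because applying the same XOR twice gives back the original value, one routine can serve as both the "encrypt" and the "decrypt" step. The following is a minimal sketch of such a repeating-key XOR; the function name and the key are hypothetical, and this kind of scheme is trivially breakable, so it only illustrates the property.

```python
def xor_crypt(data, key):
    """XOR every byte of data with the key bytes, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


secret = xor_crypt(b"X", b"A")      # 88 XOR 65
print(secret[0])                    # the value from the example
print(xor_crypt(secret, b"A"))      # XOR with the same key restores 'X'

message = b"hello"
key = b"key"                        # hypothetical key, for illustration only
print(xor_crypt(xor_crypt(message, key), key) == message)
```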

You see how easy it is to get the original value? Take a look at this:
X XOR'ed with 57 is 89 (note that X stands for "unknown" here)

And now you want to know what X is...Then you can simply use XOR 57,89 and you'll get the value of X. Another thing that this instruction can be good for is if you want to set anything to zero...Let's say that you want to empty the EAX register. There are a few ways of doing this, including:
SUB EAX,EAX
MOV EAX,0

Sure, both of these instructions work fine, but we can use XOR instead...but how and why?
XOR EAX,EAX

That also sets EAX to zero...the difference is that XOR EAX,EAX is shorter than MOV EAX,0 (2 bytes of code instead of 5) and at least as fast, and that's why it's commonly used. So now when you see this while cracking, you'll know what's going on. All the other bitmanipulation instructions work similarly, let's take a look at...

AND

Now that you know how XOR works, it's easy to understand how AND works...We'll use our example once again:
Character                  Dec value    Binary form
X                          88           01011000
A                          65           01000001
Result after AND 88,65:    64           01000000

Also AND compares all the bits one by one. If both are set to 1, the result bit is also set to 1, otherwise the result bit is set to 0. Ok, let's quickly move on to another instruction.

OR

Once again our example is used:


Character                  Dec value    Binary form
X                          88           01011000
A                          65           01000001
Result after OR 88,65:     89           01011001
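The three result rows from the XOR, AND and OR tables can be checked in one go with Python's bitwise operators. The last lines also show why XOR-ing a value with itself always gives zero; the variable `eax` merely stands in for the register and its starting content is arbitrary.

```python
x, a = 88, 65  # 'X' and 'A'

for name, value in (("XOR", x ^ a), ("AND", x & a), ("OR", x | a)):
    print(name, value, format(value, "08b"))

# Every bit equals itself, so XOR of a value with itself clears all bits.
eax = 0xDEADBEEF  # arbitrary stand-in for the register content
eax ^= eax
print(eax)
```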

Like the others, OR also compares the bits one by one. If both bits are 0, the result bit is set to 0, otherwise it is set to 1. Well, I think that's enough for you right now...hopefully you're now at least somewhat clearer about what these instructions do (otherwise both my time and yours were wasted). Mail me at cruehead_@hotmail.com if you want to ask/complain/send money.

Cruehead / MiB'98

Copyright MiB 1998. All rights reversed.

END
The Great Dead-Listing Excavations
or what we (could) see in disassembled code
SvD Jan'99
------------------------------------------------------
table of contents:
1. Introduction. Assembler/Compiler and Disassembler/Decompiler
2. Dead Data
   2.1 byte, word, dword, ..
   2.2 alignments/packing
   2.3 offsets
   2.4 addresses: near, segments/descriptors, far, normalized/huge
   2.5 zero value
3. Dead Code
   3.1 outside-Function (call) structure: pre-call actions / call / post-call actions
       1. Prepare and pass arguments to function
       2. Call function
       3. Restore stack, if needed
       4. Use the results
   3.2 inside-Function (call) structure: prologue / main part / epilogue
       1. Initialization and stack frame creating
       2. Argument receiving
       3. Exits and result passing
   3.3 Interrupts - structure, arguments, results
   3.4 Crazy instructions (or crazy processors ?)
   3.5 Obvious and non-obvious calculations; Logical value calculations; Arithmetic optimizations
   3.6 Deadloops: JMP self
   3.7 Nasty instructions: JMP [eax]; CALL [eax]
   3.8 Meaningless instructions
4. Disabling code, limiting demo-versions, etc..
5. Some final notes
6. How to learn more on this (kind of magic)

This is a (somewhat long) memory of those times when the only way to debug a program was to dump the disassembled listing on a line-printer (until all the paper in the building ran out ;) and then, chewing the pencil, examine for long hours how that material (the code-thread) goes on and on ...

Needs:
1. some disassembled listing
2. some good knowledge of the processor's features and instructions (architecture and assembler, addressing ways/meaning)
2b. probably, some particular target program's features - you should always know your target very well from outside before slipping your fingers there - it could bite you ;)
3. patience
4. Readiness to start from the very beginning Again

As usual :), I will start with an anecdote. This time it is strictly to the point ;). So, one sunny day the female-ant and the male-elephant got married. They celebrated, then slept together that night. In the morning, suddenly, the elephant died. And the ant said sadly: "oh, no, my God - one night of BIG fun, and now, a lifetime of digging..." The situation for me (and probably for you) is the same. Once long, long ago I cracked some simple program, believed in my power and abilities, and up to now... I am digging. Code, Data, Files, Directories, Pages, ....., or all in one word, Shits. Well, I'll try to show some paths around the labyrinths of that HOLE...

I have tried to make the things below OS and processor independent; but, sorry, the intel i80x86 architecture influenced me a lot; so don't be furious if your JMP SHORT is not coded as 0xEB; try to extract the essence and use the particularities as examples. Also, I do not claim absolute exactness or completeness. For i80x86 assembly combined with some compiler, I have made a macro file for almost all the frequently used occasions and compilers in my experience, like arguments-in-stack-positioning; saving/restoring registers; function/variable naming etc.; you could use/change it if you like: svdmacro.asm.

I am using C language examples and hex-number-notation here; so 0xAC stands for 0ACh (in intel assembler notation), i.e. decimal 172.

1. Introduction. Assembler/Compiler and Disassembler/Decompiler

Every (micro)processor could be considered as a "byte-stream-of-instructions driven language-interpreter" machine. When that interpreter-machine fetches a byte-code which stands for some instruction, then depending on the type of the instruction the interpreter will do some action, or fetch more bytes to refine the action, or get arguments. The variables of that language-interpreter are the registers of the processor and/or the directly accessible computer memory.

Assembler (or assembly language) is the language that is directly associated with those machine byte-codes and instructions, i.e. the specific processor's external language (external - because most processors themselves are also small computers, working with internal microinstructions; sooner or later we will have to deal with them too). Assembler is also the name of the program that translates a textually written assembly language program into machine codes.

Compiler is a translator from some higher-level language (Fortran, Pascal, C, Ada, C++, hundreds of others) to Assembly language or directly to machine codes. Every high-level-language operator or construction (i.e. high-level-instruction) may produce one or Many assembler instructions (note there are compilers and code-generators that could even expand/unroll loops and recursions as much as you like! not to mention inlining memcpy() etc.. simpler stuff).

So Disassembler is the tool that reverses the machine codes into assembly language (e.g. 0xEB xx stands for Jump Short xx bytes forward/backward for intel i80x86 processors), which is an ALMOST straightforward procedure, while Decompiler is something that must re-build the original high-level language constructions like if-then-else; assignments; loops; function/procedure calls etc..., which is not straightforward and is usually not possible because of optimizations made over the machine code, which remove some of the redundancy needed to understand what exactly it was. If there are NO optimizations at all, it could be possible to make a decompiler for anything - just following/translating backwards the code-stream-produced-by-the-specific-compiler and reversing it. But not for 100% sure.

Here we deal with the Assembly language stream produced by a disassembler, trying to understand the logic of the program, i.e. to deCompile it. We are the decompiler. BUT DO NOT expect to receive the original source. You can't learn assembly here - you need it to understand most of the things explained (like registers, memory-addressing-kinds etc). Note also that some processors can replace or emulate their predecessors. So a 32bit processor may behave as a 24, or 16bit one. There are several modes - real, protected, virtual, etc... But I'm not going to explain these features too widely here; just as far as the topics need it. I will not explain self-referencing and self-modifying code here - but one should be aware that such things HAPPEN (for example, look at some executable-packing/unpacking/encoding/decoding techniques).
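To make the "0xEB xx" example concrete, here is a minimal sketch of how a disassembler could decode that single instruction on i80x86. The function name and the addresses are made up for the example; the one real fact it relies on is that the 8-bit displacement is signed and counted from the end of the 2-byte instruction.

```python
def decode_jmp_short(code, address):
    """Decode 0xEB rel8 located at `address`; return the jump target or None."""
    if len(code) < 2 or code[0] != 0xEB:
        return None  # not a JMP SHORT
    rel = code[1] - 0x100 if code[1] >= 0x80 else code[1]  # signed 8-bit displacement
    return address + 2 + rel  # relative to the address of the next instruction


print(hex(decode_jmp_short(bytes([0xEB, 0x03]), 0x1000)))  # jumps forward over 3 bytes
print(hex(decode_jmp_short(bytes([0xEB, 0xFE]), 0x1000)))  # displacement -2: jumps to itself
```

The second call shows the byte pattern EB FE, which jumps back onto its own first byte.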

2. Dead Data

There are some specific rules/dependencies/things around representing data in the computer's memory. They all strongly influence code generation and should be understood well, which is why they are covered here. Some features of code when it is represented as data are also described here (a byte-stream - from the processor's view it IS only another kind of data).

2.1 byte, word, dword, ..

A BYTE is something that can be represented in 8 bits (there were also 7-bit and 9-bit bytes, but they disappeared). Like one ASCII character. Or an unsigned integer number 0..255 (0x00..0xFF). Or a signed integer -128,-127,..,0,..+127 (0x80,0x81,...0,...0x7F) (intel's notation). It is (in most widespread computers) the smallest atomic piece of memory that can be read/written at once. All other numbers/structures/streams/etc..data-pieces consist of whole bytes (in memory representation). Of course, two data structures could use different parts of the same byte (sharing it), but that is unusual.

Word is the processor's widest-bit-size integer that can be processed (Read/Write/Add,And,etc...) at once (here 1/2/4/8/16/32/...bit processors) - do not mix it up with the processor's address space - there are for example 8-bit processors with a 16bit address space, like the good old M6502 in the Apple][. I do not say there are no instructions that access more than a Word (see LDS/LES for example - they could be used for quick AND small loading of more-than-a-Word at once!) Usually, accessing a whole processor Word is easier/smaller/faster than accessing parts of it (or more than a Word). Also, addressing a Word placed on a Word boundary could be easier/faster than if it is not on a boundary; i.e. addressing a Word at address 0x346 could be slower than at 0x344 for a 32bit processor (0x346 is not divisible by 4), but it will not matter on a 16bit processor (see, I already mixed up word-size and addressing-size :). The same is valid for the code: JUMPing to 0x458 could be faster than JUMPing to 0x455.
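Two of the points above, the signed reading of a byte and the Word-boundary check, are easy to verify numerically. The helper names below are hypothetical; the addresses are the ones used in the text.

```python
def as_signed_byte(value):
    """Interpret the low 8 bits of value as a signed (two's complement) byte."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def is_aligned(address, word_size):
    """True if address lies on a boundary of word_size bytes."""
    return address % word_size == 0


for raw in (0x80, 0x81, 0x00, 0x7F):
    print(hex(raw), as_signed_byte(raw))

print(is_aligned(0x344, 4), is_aligned(0x346, 4))  # 32-bit Word (4 bytes)
print(is_aligned(0x346, 2))                        # 16-bit Word (2 bytes)
```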
AND sometimes it could be impossible to create/hook to some special kind of routine if its start is not Word aligned.

When representing bigger-than-1-byte integers, it is important how the bytes are ordered. In Intel's notation, the first byte is the least significant; while in Motorola's (for example) it is the most significant. So checking if an (unsigned) integer is odd (by TEST'ing its bit 0) could be done in i80x86 by testing the first byte ONLY (i.e. the byte at the address of the integer).

Sometimes integers could be represented as smaller-size ones if they fit - e.g. an instruction like MOV eax,5 could have 0x00000005 (i.e. 0x05 0x00 0x00 0x00) in its code, but could also have 0x05 ONLY (it depends on the specific processor).

WORD in i80x86's assembler notation is (by now) not the processor's Word, but a 2byte/16bit integer. Be careful not to mix it up with the processor's Word - they mean the same only for 16-bit processors. DWORD (Double Word) in i80x86's notation represents a 32bit/4byte number (again, be careful not to mix up - it is the Word of a 32bit processor). Why do they still keep naming things wrong - maybe for "retrogressive compatibility". Or out of some nostalgia?

Adding a value to some other value could cause an overflow, and the resulting value will be "wrapped" - i.e. only the low bits that fit into that bit-size remain (the result is taken modulo 2^bit-size). THIS is particularly IMPORTANT with pointers (see below). Example: adding the BYTEs 0x70+0xA0 gives 0x110, of which only 0x10 remains (and the carry flag is raised).

Usually the processor can access memory in all chunks smaller than or equal to its Word - BYTE, WORD, DWORD. But not on all processors and not with all instructions. Example: there are all of MOV al,0; MOV ah,0; MOV ax,0; MOV eax,0, but you can't access the upper 16 bits of eax directly. Or you have ESI and SI, but nothing smaller. Or, in 32bit mode the stack is kept moving up/down by one whole Word: where a 16bit value has to be pushed you will normally find PUSH eax (possibly with the upper 16 bits zeroed) rather than a prefixed PUSH ax, which would move the stack pointer by 2 bytes only.

When processor modes are mixed, one instruction byte-code may mean different things. Example: the byte-code for MOV AX,BX in 16bit mode is the SAME as for MOV EAX,EBX in 32bit mode. To specify a different access mode there are sometimes mode-overriders, i.e. you could do MOV AX,BX in 32bit mode, but it will be coded more strangely (with an extra 0x66 operand-size prefix byte in front).

2.2 alignments/packing

As I said above, the speed of accessing data may depend on how the data is ordered in a structure (Packing) and where it is laid (Alignment). Sometimes less space is preferred. Then structures are represented EXACTLY as they are defined, and/or exactly where they are defined. Sometimes speed is the most wanted feature. Then a structure consisting of different-bit-size numbers like {byte a; WORD b; byte c;} could be intentionally represented with equal-sized numbers like {WORD a; WORD b; WORD c;} or even {int a; int b; int c;} if "int" is the same as the processor's Word. This could be done by hand (not usual), or by a special instruction to the compiler - so-called packing or alignment control (i.e. you describe the first kind of structure, but the compiler represents it as the last one).
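The packing control mentioned above can be watched from C itself. The following is only a sketch: it assumes a compiler that understands #pragma pack (GCC, Clang and MSVC do; the old compilers named in this essay had their own switches), and the struct names and the helper hole_bytes are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PAYLOAD_BYTES = 4 };   /* 1 + 2 + 1 bytes of real data */

/* Laid out exactly as declared: no holes (space saving). */
#pragma pack(push, 1)
struct packed_rec { uint8_t a; uint16_t b; uint8_t c; };
#pragma pack(pop)

/* Same fields, compiler's default alignment (holes are allowed). */
struct natural_rec { uint8_t a; uint16_t b; uint8_t c; };

/* Widened by hand: every field takes a whole 32bit Word (speed). */
struct widened_rec { uint32_t a; uint32_t b; uint32_t c; };

/* Bytes of a layout that carry no data at all (the "holes"). */
size_t hole_bytes(size_t struct_size)
{
    assert(struct_size >= PAYLOAD_BYTES);
    return struct_size - PAYLOAD_BYTES;
}
```

Here sizeof(struct packed_rec) is 4 with b at offset 1, while the widened variant takes 12 bytes, 8 of them holes. The size of natural_rec depends on the compiler's default; on common targets it sits in between. When a disassembly shows field offsets that do not match the declared sizes, this is the first thing to suspect.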
This way, two WORDs (16bit integers), laid one after another, could take 4 bytes (space saving) or 8 bytes (speed - there will be unusable 2-byte "holes" after them). Also, have in mind that sometimes the compilers (and/or the programmers :) optimize the code so much that some small structures are loaded into a register as one Word at once, ignoring the redundant pieces afterwards.

Be careful - different compilers have different Default alignment (structure packing) - e.g. for Watcom it is 1 byte (i.e. space saving), while for Zortech it is sizeof(int) (need4speed). AND most compilers do not make any difference between data-structure alignment and data packing - they change both with the same value.

Alignment of the code is hand-controllable in assembler, but is usually subtle in compilers/code-generators, depending on the optimization bias preferred - space or time. So you can expect NOPs or some bullshit filling some (unused) "holes" in the code - e.g. between the end of some function and the start of the next one (to make it start at an aligned boundary); or after an unconditional JUMP.

2.3 offsets

An offset is the difference between the addresses of two variables or two instructions, i.e. address1 + offset = address2. You do not add addresses, you can only subtract them, producing an offset; you can add/subtract offsets, producing another offset; offsets could be positive or negative; addresses have no sign. Usually offsets have the size of the processor's Word (regardless of the addressing-space limit) - expanded if needed. There could be:
- data-offset variables in data: { char type; int offs_to_next_chunk; }
- data-offsets in the data-addressing part of the instructions: MOV ax,[si]+2345;
- code-offsets in code-address-related instructions: JMP SHORT address is actually coded as JMP offset-from-next-instruction-to-address; sometimes, to distinguish between address and offset, the offsets are shown with a sign, e.g. JMP +05;
- also code-offsets in data, but this is unusual: { MyFuncPtr pf; int offset_in_func_to_put_checksum_there; }
The offsets are represented as plain integers, but could be of smaller size if they fit (see the explanation about byte-ordering above).

2.4 addresses: near, segments/descriptors, far, normalized/huge

A pointer is the address of some data/code. Data and Code could be different or equal notions - it depends on the processor and its current mode - sometimes it is possible to access code as data and vice-versa, sometimes it is not.

In some processors, to expand the reachable addressing space, the idea of segments (or descriptors) is used - there are several segments (and segment registers), each with full addressing-space coverage. All of them could share/overlay the same real memory, or separate ones. In the case of the Intel i8086 and its successors/emulators (an i386 in real mode is Almost an i8086), they chose the worst possible: the segments share the same memory, all of them are read/write anywhere; and the combination segment/offset is not unique: the segment is shifted 4 bits left (not 16!) and then added to the offset, thus resulting in a 16+4=20bit addressing space, but there are 2^12=4096 logical segment/offset combinations pointing to the SAME real address.
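The segment arithmetic just described fits into a few lines of C. This is a sketch for an i8086-style machine with exactly 20 address lines (an assumption: later processors can keep the 21st bit); the names real_mode_linear, next_near_offset and count_aliases are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Physical address of segment:offset on an 8086: the segment is shifted
   left by 4 bits, the offset is added, and only 20 bits survive. */
uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* Incrementing a near (16bit) offset wraps inside the segment. */
uint16_t next_near_offset(uint16_t off)
{
    return (uint16_t)(off + 1u);
}

/* Count how many segment:offset pairs reach one physical address,
   by asking each segment which offset it would need (modulo 2^20). */
unsigned count_aliases(uint32_t phys)
{
    unsigned n = 0;
    assert(phys <= 0xFFFFFu);
    for (uint32_t seg = 0; seg <= 0xFFFFu; ++seg) {
        uint32_t need = (phys - (seg << 4)) & 0xFFFFFu;
        if (need <= 0xFFFFu)          /* must fit a 16bit offset */
            ++n;
    }
    return n;
}
```

For example 0x1234:0x0005 and 0x1000:0x2345 both give 0x12345, count_aliases(0x12345) returns 4096, and next_near_offset(0xFFFF) lands on 0 of the same segment instead of the next byte in memory.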
To cover the whole addressing space (1M) of an i8086 one needs to change the segment register 16 times - offsetting gives only 64K. The (above mentioned) "wrapping" of offsets is the PAIN in 16bit modes, because within the same segment, if the offset is 0xFFFF, incrementing it will not get you the Next value really, but something around the zero of the segment (i.e. far away backward); and adding 8 to 0xFFFF:0x0009 will wrap (as a result) the 1M addressing space, so you will get the physical (20bit) address 0x00001.

Near pointer: an address in one segment only; i.e. an offset only. Therefore the instructions always have a segment in use - a default one (DS,ES,SS,CS - it depends on the type of instruction) or another one, stated by so-called segment overriding (a special prefix instruction that changes the addressing of the next usual instruction).

Far pointer: an address possibly in another segment, i.e. an "absolute" address - it contains the segment and the offset. In standard i80x86, if stored in memory, the offset is first, then the segment/descriptor. This layout is NOT mandatory, but it is used by the processor's LDS and LES instructions. And by most compilers (if the pointer is maintained by the compiler AND NOT by the programmer himself).

Huge/normalized pointer: a Far pointer which is always unique; e.g. all more-significant bits are put in the segment, and the offset is from 0 to 15 (0..0xF) only; or vice-versa.

Pointer arithmetic: one could subtract near pointers if he is sure they use the same segment; subtracting far pointers is very tough work, usually done by a special (internal compiler) function.

In i386 and above processors, in i386 mode, offsets are 32bit (i.e. the whole addressing space) and segments are actually logical descriptors - they point to some physical (again 32bit) base address which is the logical zero of that segment. Thus, a descriptor+offset is simply base_address+offset (no shifts, no redundancy). Of course, the same physical address could be accessed through several different descriptors, but it should be done Intentionally (and could be made impossible, since descriptors have properties like read-only memory, limits, memory-mapping, etc). So logical and physical addressing are very Different things here. Subtracting far pointers (i.e. descriptor:32bit_offset) is almost meaningless, because the descriptors' base addresses are usually unknown.

A near pointer in 16bit i86 mode is a 16bit offset; in 32bit i386+ mode it is a 32bit offset; in both cases the offset is added to some segment/descriptor's zero, but in 32bit mode it is more transparent and obvious (and you never know where you really are; but knowing where you really are doesn't have any meaning in most cases).

In hand-made assembly one can play with segments a lot; but compilers always observe some strictly chosen usage. Some compilers (Metaware HighC) do not even allow far pointers at all (in i386 mode), because they (and the notion of a far pointer itself) are too machine-dependent.

Example: MOV ax,[345] and MOV ax,[857] could address the same WORD in some 16bit compiler-generated program - depending on the value of the DS (data segment) register; while MOV eax,[345] and MOV eax,[567] would usually never be the same in a 32bit compiler-generated program; I do not say it is not possible to make it so (by hand) intentionally.

2.5 zero value

The most strange value around is the Zero.
In modern computers, it is represented as binary zeros. But you should be careful - the programmer or the problem itself could need Another, Shifted zero. (Hey, don't think the character '0' is a Zero; it is a character and has the ASCII value 0x30.)

A zero offset means no offset at all, but be careful what that offset is added to: JMP SHORT +00 is not a never-ending loop, but the same as a NOP (ok, a bit slower and badly cacheable - which is sometimes useful ;). CALL +0000 could be a way to run the function code that follows twice - once in the CALL and once After the RET; the second time the RET will exit the whole function (huh, unless that code is self-modifying, though - see some self-unpacking techniques).

A Zero pointer is something special - especially when the pointer is near - and could be VERY different from the REAL physical Zero address. Its [[using and understanding depends tightly on the context]]. Examples:

1) MOV AX,[0000] will get what is written at 0000:0000 only if the data segment register DS is also 0; otherwise it is just the beginning of the Data segment; and if this i8086 is emulated as a virtual machine, 0000:0000 points to the start of the memory given to the machine, which is far-far-away from the Absolute Zero; but it could be "zero" enough for a 16bit program - it may never know it is emulated ;);

2a) MOV EAX,[00000000] will get you to the beginning of the Data descriptor, which, in 386 protected mode, is usually prohibited by the extender/OS, so you will get a protection fault instead; but under some extenders (DOS4G) it is The Physical zero;

2b) MOV EAX,[0F0000000h] will give you the contents of THE EXACT physical Zero address under the FlashTek-32VM DOS extender.

Sometimes the (above mentioned) "wrapping" of values/registers/addressing-space (which in most cases is a PAIN) could be rather useful. Thus, in FlashTek-32VM, getting the base of the DS descriptor and subtracting it from zero will give us a Near offset to the absolute Zero address - adding that offset to the base (through the DS descriptor) will "wrap" around the addressing space and start from Zero again. This is only an example - how to obtain the physical Zero address under 386+ protected-mode control is a theme for another BIG essay (see svdXmeg0.asm for several extenders and different approaches, including twice-remapping of memory).

So watch very carefully - [[seeming-obvious mistakes are sometimes intentional]].

3. Dead Code

As for general code understanding, learn the assembly language AND the processor's architecture/features well (from a programming point of view, of course; no need to know what each transistor is for ;).

There is one feature (nasty from the reverser's point of view, but VERY good and useful otherwise) in the best compilers (example: HighC), named INSTRUCTIONS_LOW_LEVEL_SCHEDULING. The compiler knows which instructions are inter-dependent and which are not (both as a program, and as processor features). So after generating the code, it re-orders independent instructions in a "strange" (but better operating) way, which makes the best use of the processor's architecture - e.g. pipelines, parallel execution, etc. So sometimes a=b;x=z;g=r*t; may not look exactly this way (i.e. could be very mixed-up) in assembly.
But if there is some "y=func(a,x,f,g)" afterwards, it WILL BE Afterwards - the real sequence of things IS NOT touched.

One could force the compiler to put texts (i.e. strings) into the code segment and not into the data. Thus every function that uses some textual values will have them after or before its code. That's why sometimes a function ends, but the next one could start far away from there.

[[Use processor's or compiler's limitations to guess which is what]]. Every compiler has its own (constant) purpose for the available registers. Examples of some widespread purposes are: SP,BP for accessing arguments and temporary variables; AX (and DX) for returning the result; SI/DI for fast "register" variables or for array addressing; etc... Some of them are always saved before being touched (and restored afterwards) because they contain some value needed for the work afterwards; some of them are never saved/restored, because they are always used as temporary ones.

NOT ALL registers can do everything in any processor - the older processors have a very particular specialization of their registers, while modern ones do not (e.g. in the 8086 no MOV [AX]+5 base addressing exists, only basing by BX,BP,SI,DI; while in the 386+ one can use almost every register as a base). This could make understanding of good-compiler-generated code hard. I mean that the same pieces of source code (like an expanded inline function) will be translated almost the same way for simpler processors or by simpler compilers; while a good compiler for a good target processor could do it in a different way every time, e.g. the 1st time using EAX/ESI/ECX; the next time EDI/EBX/EDX. And, of course, the logic of the source code will be the same, so things (at the logical level) will look similar, if not the same.

3.1 outside-Function (call) structure: pre-call actions / call / post-call actions

The USUAL way of calling a function is:
1. Prepare arguments, then pass them all; OR prepare and pass them one by one;
1a. (possibly) save some state
2. Call function
3. Restore stack (clear arguments, if needed) and possibly some state
4. Use the results (p.4 and p.3 could be mixed up)

1. Prepare and pass arguments to function

This is done in three ways: through the stack, through processor registers, or through some special static/dynamic program-dependent structure.

For the stack method there are also two ways. The usual one is by PUSH-ing the arguments there (which decrements the stack pointer - the stack is filled from higher addresses to lower). This way is simple and takes less space, but is a bit slow and needs the stack pointer restored every time.

The unusual one (Metaware HighC) does not touch the stack pointer every time (only once, at the start of the caller function), and MOVes things to their proper place in the stack just as if they were PUSHed (in i386+ one can address things in the stack by [ESP]+offset addressing). This way is faster, with no need to touch/care about the stack pointer all the time, but it is unusual and sometimes takes more space - PUSH EAX is 1 byte, while MOV [ESP]+4,EAX is 4.

The order of the arguments in the stack depends on the language and compiler - standard "C" notation pushes them backwards (last argument first), but for C++ there's no standard and there could be differences. Also, if the function is a method of a class, i.e. a "this" pointer is used, different compilers place it differently, but only first or last (before/after all other arguments). Example: Zortech pushes C++ parameters like 1st,2nd,...last,this; HighC pushes them like last,...2nd,1st,this.

For the register calling convention (e.g. Watcom) the arguments are put into registers (almost all of them could be) and only then, if more arguments remain, onto the stack. There is a special instruction for the Watcom compiler in describing functions (example: see its bios.h), telling which arguments are put into which registers (i.e. the default convention could be overridden easily).

Passing pointers (and references - they are pointers of a slightly different kind) is not much different from passing other values, but as I said above, one can use the processor's and compiler's limitations to guess which is what. Example: far pointers need both offset and segment to be pushed, and their pointer arithmetic needs addition-with-overflow, which is done only through the AX+DX couple. So the value that (finally) was in AX before pushing was the offset, and the one in DX was the segment.

Also, if the required far pointer is a data pointer, it is usually in the DS of the caller, so the compiler could PUSH DS; PUSH value_offset instead of PUSH value_seg; PUSH value_offset; the same is true if it is a code pointer, but then CS is pushed.

[[Do not expect that all the arguments will be grouped and pushed together just before the call]]. Sometimes it is so, but sometimes the pushes come immediately after the calculations, and there could be several pages of code between the pushing of two neighbouring arguments.

The usual way of saving the current state (of registers, or variables, structures, etc..) to prevent its change by the function is done inside the function (i.e. whoever is going to change something should take care to save the state). But sometimes it could be done outside the function, by the caller, somewhere before the call (and restored afterwards).

2. Call function
________________

Well, the CALL itself could be made in many similar ways, but be aware that sometimes compilers/linkers make optimizations. So, if a function is declared FAR (i.e. needs a far return address because it could be in another segment), and it can currently be reached by a near CALL (i.e. it is in the same segment as the caller), some compilers will PUSH CS; CALL near func_offset instead of CALL FAR func_seg:offset. Also, you could find that JMPs (or PUSH address; RET/RETF, which is the same, but is used, for example, because one cannot make a JMP FAR [AX] - only a near one) can also be used here - if the called function knows where to return, or doesn't return at all...

3. Restore stack, if needed

If the function is standard "C" style, it returns with the same stack pointer as on entry. So here the arguments should be removed from the stack (OF COURSE only IF they were PUSHed there - another reason to use MOV-to-stack instead of PUSH). Usually it is done by ADD SP,nnnn, where nnnn = number_of_arguments * Word_size_in_bytes.
It is done after each call, but it is possible (by hand) to make several calls and only then one ADD SP,sum_of_nnnn after them. In some code-generators, for space-tightening, POP some_trash_register is used one or several times instead of ADD SP,nnnn. But it is only used for fewer than 2-3 arguments. If the function is in Pascal/C++ style, it will remove the arguments itself.

4. Use the results

If the result(s) are returned the standard way (i.e. not through some program-dependent structure), they are in the registers. Every compiler has its own way of handling this, but some things have become almost standard in i80x86 assembly. Usually the return value (integer or pointer) is in AL, AX or EAX; if it is longer than a Word, the higher part is in DX (EDX) - e.g. for far pointers or long integers.

If a function should return a structure (not a reference, but the whole structure, which is a VERY stupid way of passing results, IMHO, especially if it is bigger than one Word), it is done another way. A space for the structure is put aside as a temporary variable in the stack; a pointer to that space is passed to the function as an additional argument; inside or outside the function the constructor of the structure is called (if any). So at return we have a filled temporary structure in the stack of the caller (whose most frequent use afterwards is to be copied somewhere else - why not simply pass a reference to the final recipient to the function instead of all that abracadabra...).

3.2 inside-Function (call) structure: prologue / main part / epilogue

1. stack frame creating
2. saving state (registers) if needed
3. argument receiving
4. function body
5. result passing
6. restoring state (and stack if needed)
7. exit (and clear arguments from stack if needed)

Both the preliminary (1,2,3) and the post-processing (5,6,7) actions could be mixed up AND/OR placed inside the function body.

Usually (!) every function has an initializing part, a main part, and an ending part. The initialization may consist of setting up an internal stack frame, creating (place for) temporary variables, saving some incoming state or registers, and extracting arguments; the ending part may consist of sending the result, destroying/cleaning up the temporary variables/stack frame, and restoring state. Of course, it is possible to have a function without these things. But this is the usual compiler-generated way. Everything else has to be hand-made, OR made by a (recently unusual) good compiler (but there are SUCH - e.g. HighC, or Watcom).

There could be Several entry points with slightly different numbers or lists of arguments and/or behaviour (well, it is like having several different functions which sometimes share the same code), or one base sub-function used for several different purposes. Example: if func5 has 5 arguments, there could be a second entry point which skips the 1st argument, i.e. that func4 will have only 4 arguments (like C++ default argument values). Or, if a function's behaviour is made to depend on some register, e.g. CX, it could have several entry points (each like MOV CX,number; JMP realstart) - this is actually an "inlined" switch(), but the base sub-function is invisible from outside.

1. Initialization and stack frame creating

The stack goes from up to down. To have a new stack frame means to move the stack pointer to a new place, and to use the space between the new and the old one for temporary variables. But one should 1) save the stack pointer before touching it, OR 2) know exactly how to restore it back. Examples: 1) MOV DI,SP; ..body.. MOV SP,DI; 2) SUB SP,8; ..body.. ADD SP,8.

The older i80x86 processors were not able to address anything in the stack using the stack pointer - only PUSH,POP,CALL,RET were available. So another special register called BP (base pointer) was invented (why? ask them, not me!). All the things in the stack, if not PUSHed or POPed, were addressed by BP. For this purpose every function starts with saving BP and getting the SP there: PUSH BP; MOV BP,SP; and ends with restoring SP (if it was touched) and BP: (MOV SP,BP); POP BP; RET... A space in the stack for temporary needs is put aside simply with SUB SP,number_of_bytes_needed (a whole number of Words).

Newer processors (186/286 and up) introduced new instructions which combine all these things into one: ENTER number_of_bytes,0 (instead of PUSH BP; MOV BP,SP; SUB SP,nnn); and LEAVE, which takes no operand (instead of MOV SP,BP; POP BP).
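To see the prologue/epilogue bookkeeping in motion, here is a toy model in C of a 16bit machine with a tiny stack. Everything in it is an assumption made for the illustration: the struct cpu16, the 64-byte stack, the made-up return address 0x0100 and old BP value 0x1234, and a near C-style call with two Word arguments. Because the saved BP and the near return address take 2 bytes each, the first argument ends up at [BP+4].

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_BYTES = 64, OLD_BP = 0x1234, RETURN_ADDR = 0x0100 };

struct cpu16 { uint16_t sp, bp; uint8_t stack[STACK_BYTES]; };
struct call_result { uint16_t ax, sp_before, sp_after, bp_after; };

static void store16(struct cpu16 *c, uint16_t addr, uint16_t v)
{
    assert(addr + 1 < STACK_BYTES);
    c->stack[addr]     = (uint8_t)(v & 0xFF);   /* low byte first */
    c->stack[addr + 1] = (uint8_t)(v >> 8);
}

static uint16_t load16(const struct cpu16 *c, uint16_t addr)
{
    assert(addr + 1 < STACK_BYTES);
    return (uint16_t)(c->stack[addr] | (c->stack[addr + 1] << 8));
}

static void push16(struct cpu16 *c, uint16_t v)
{
    c->sp = (uint16_t)(c->sp - 2);              /* stack grows downward */
    store16(c, c->sp, v);
}

static uint16_t pop16(struct cpu16 *c)
{
    uint16_t v = load16(c, c->sp);
    c->sp = (uint16_t)(c->sp + 2);
    return v;
}

/* Callee: returns a+b through one temporary variable at [BP-2]. */
static uint16_t callee_sum(struct cpu16 *c)
{
    push16(c, c->bp);                           /* PUSH BP       */
    c->bp = c->sp;                              /* MOV  BP,SP    */
    c->sp = (uint16_t)(c->sp - 2);              /* SUB  SP,2     */

    uint16_t a = load16(c, (uint16_t)(c->bp + 4));   /* 1st argument */
    uint16_t b = load16(c, (uint16_t)(c->bp + 6));   /* 2nd argument */
    store16(c, (uint16_t)(c->bp - 2), (uint16_t)(a + b));
    uint16_t ax = load16(c, (uint16_t)(c->bp - 2));  /* result in AX */

    c->sp = c->bp;                              /* MOV  SP,BP    */
    c->bp = pop16(c);                           /* POP  BP       */
    (void)pop16(c);                             /* RET (near)    */
    return ax;
}

/* Caller, "C" style: pushes last argument first, cleans up itself. */
struct call_result call_sum(uint16_t a, uint16_t b)
{
    struct cpu16 c = { STACK_BYTES, OLD_BP, {0} };
    struct call_result r;
    r.sp_before = c.sp;
    push16(&c, b);
    push16(&c, a);
    push16(&c, RETURN_ADDR);                    /* what CALL pushes */
    r.ax = callee_sum(&c);
    c.sp = (uint16_t)(c.sp + 4);                /* ADD SP,4 */
    r.sp_after = c.sp;
    r.bp_after = c.bp;
    return r;
}
```

After call_sum(3, 4) the result is 7, the stack pointer is back where it started, and BP holds its old value again. If any of the three balancing steps (MOV SP,BP; POP BP; ADD SP,4) is left out, one of those checks fails, which is exactly the kind of imbalance to look for when a hand-patched function crashes on return.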

The i386 and its successors are able to address things in the stack based on ESP as well as on EBP. But most of the compilers still generate the same needless instructions (for EBP) even when there's no need (see the examples below) - it is hard to break a habit. Of course, some of them take advantage of that feature, thus making the code better, but a bit harder to comprehend.

It is possible to create a new stack frame at any time, i.e. the function could do some (not stack-related) processing before creating a stack frame or accessing stack arguments, or do without them altogether - if there is no need, OR there are no temporary variables and all the arguments are in registers (example: Watcom-generated code!).

The usual way of saving state to prevent its change by the function is done inside the function (i.e. whoever is going to change something should take care to save the state). But sometimes it could be done outside the function, by the caller, just before the call (and restored afterwards). As not all of the registers are important, some of them are saved (if touched inside the function), some are not.

2. Argument receiving

As I said about argument passing above, there are several ways. I'll cover here only the stack-based arguments, because there are some almost standard layouts there.

If nothing is touched immediately after the call, the stack pointer points to the return address. So the 1st (or last - it depends on the passing order) argument should be at SP+(size_of_address). Therefore, for near i8086 calls the arguments start at [SP]+2; for far calls at [SP]+4; for a near i386+ call at [ESP]+4; for a far i386+ call at [ESP]+8...; the temporary place for variables starts at SP-offset_to_variable. If something is pushed, the offsets above should be increased by the summed size of the pushed things. The same goes for the BP register - but with an additional correction of plus one WordSize, if BP/EBP is pushed immediately after entry (it could be saved in another way - by MOV somewhere, but that is unusual).

Thus the offset to the start of the arguments is, if based on ESP: [ESP]+sizeof_funcAddress+sizeof_things_pushed_before_access; and if based on BP/EBP after PUSH BP is done: [BP]+sizeof_funcAddress+WordSize. The next argument is accessible by Adding the (rounded up to a whole Word) size of the previous one to its offset (also, this way one can see what size some argument is: by subtracting its own offset from the offset of the next one). The first temporary variable is at [BP]-WordSize (or at [ESP], but it is not used this way).

The above calculations are GENERAL - i.e. valid for any compiler/code. But there could be differences in the processor's Word size, in the way of addressing (BP/EBP/ESP), in the order of the arguments, and in the presence or absence of a C++ "this" pointer as an argument. Some of them are stated and expanded for several different compilers in svdmacro.asm.

3. Exits and result passing

As I said above, older processors use MOV SP,BP; POP BP; RET at the end of a routine; newer ones could use LEAVE; RET. Depending on the calling convention, the function may destroy the arguments passed to it in the stack (by RET number_of_bytes) or leave them there (by a simple RET).

While there is usually one entry point, there could be several exit points from a function; or several JMPs to one exit point; or (!) several JMPs from several functions to the same exit point (if all of them have the same size of arguments).

Before restoring the stack, all the things saved at entry should be restored - thus POPing registers or whatever (e.g. direct values to memory). The methods of passing results I have explained above - usually AX, or the AX/DX pair, or EAX is used.

3.3 Interrupts - structure, arguments, results

An interrupt is a function that is called by some hardware event or by an INT number instruction (or by this unusual hand-made sequence: PUSH flags; CALL FAR calculated_address). As it should be accessible from any point in memory, it should always be a FAR function (i.e. it also requires a code segment/descriptor), returning with IRET (a far return that also restores the flags). As it should be callable at any time, and as there is no standard "trans-interrupt store place", all the needed arguments/results should be in registers AND all the registers should be saved at entry and restored at exit (except those containing the result, of course). Therefore, a usual interrupt structure is:

    PUSH any registers touched, including segments/descriptors
    do something
    POP all the above registers
    IRET

eventually with CLI and STI somewhere at the most important points. (Thus, the frequently encountered string "PSQRVW" or other similar patterns in older programs are simply the codes of PUSH AX; PUSH BX; PUSH CX; PUSH DX; etc sequences :-)

In newer processors there are special single instructions for all-at-once - PUSHA and POPA - but they are not frequently used (there was a mistake in POPA in one of the i386 clones). Usually, if not much stack is used, the interrupt uses the caller's stack; but sometimes it saves it, sets its own stack, uses it and restores the old one.

3.4 Crazy instructions (or crazy processors ?)

Could you calculate a*4+b+37 in One instruction? It depends on the processor. The more sophisticated the processor is, the more sophisticated methods of addressing are possible.
And if there's an instruction that gives you the chance to get the result of some addressing-methodcalculation, you are happy (1. you will save space; 2. these calculations are FASTER than any other; 3. but it may stop or stall the processor's pipeline, which could result in slower overall execution, so they are of limited usage). Intel's i80x86 have an instruction called LEA (Load Effective Addressing). It calculates the address through the usual processor's addressing module, but do not use it for memory-access, but stores it into target register. So, if you write LEA AX,[SI]+7, you will have AX=SI+7 afterwards. In one instruction. And in i386, you could have LEA EDI, [EAX*4][EBX]+37. In one instruction! But, if the multiplier is not 1,2,or 4 (i.e. sub-parts of the processor's WOrd) - you can not use it - it is not an addressing mode. [[Always Try to (find and) use special instructions/ functions/ variables/ features, which are intended to do something else, but could well do your job too]]. Example: Your repair code won't fit into

the small-unused-space-in-the-code you have found? There are LDS/LES instructions that could save you space - they load two Words (or actually a far pointer) in once. But [[There is always a trade-off for some extra feature]]. Almost always optimizing space slows down and vice versa - optimizing speed makes thing larger. Or, any optimization makes the code unportable and/or unintelligible. Example: PUSH and POP register are one byte instructions - useful for space-saving - but have sideeffects of touching stack (which is slow and sometimes undesired). PUSH SI; REP MOVS; POP SI is smaller, but uses stack; while MOV DX,SI; REP MOVS; MOV SI,DX is faster, but uses DX. This Razor is nasty from creative's point of view but is sometimes useful for reversing - the programmer's / compilers should take this into consideration, and this puts some limits/standards on the (not-intentionally-hand-made) code. 3.5 Obvious and non-obvious calculations; Logical value calculations; Arithmetic optimizations Okay, you should know that XOR reg,same_reg and SUB reg,same_reg means same - fast (inside processor) zeroing of the reg. That testing if a reg is zero (and other characteristic) is done by TEST reg,reg; or AND reg,reg; or OR reg,reg; (instead of CMP reg,0) and JMP-by-condition afterwards. But what should mean the following three code-pieces? 1): Segment: _TEXT DWORD USE32 00000018 bytes 0000 8b 44 24 04 example1 mov 0004 23 c0 and 0006 0f 94 c1 sete 0009 0f be c9 movsx 000c 0f 95 c0 setne 000f 0f be c0 movsx 0012 03 c1 add 0014 c3 ret 0015 90 nop 0016 90 nop 0017 90 nop 2): Segment: _TEXT DWORD USE32 0000001c bytes 0000 55 _example3 push 0001 8b ec mov 0003 53 push 0004 8b 55 08 mov 0007 f7 da neg 0009 19 d2 sbb 000b 42 inc 000c 8b 5d 08 mov 000f f7 db neg 0011 19 db sbb 0013 f7 db neg 0015 89 d0 mov 0017 03 c3 add 0019 5b pop 001a 5d pop

eax,+4H[esp] eax,eax cl ecx,cl al eax,al eax,ecx

ebp ebp,esp ebx edx,+8H[ebp] edx edx,edx edx ebx,+8H[ebp] ebx ebx,ebx ebx eax,edx eax,ebx ebx ebp

001b

c3

ret

3) Segment: _TEXT DWORD USE32 00000016 bytes 0000 8b 44 24 04 _example3 mov 0004 f7 d8 neg 0006 19 c0 sbb 0008 40 inc 0009 8b 4c 24 04 mov 000d f7 d9 neg 000f 19 c9 sbb 0011 f7 d9 neg 0013 03 c1 add 0015 c3 ret

eax,+4H[esp] eax eax,eax eax ecx,+4H[esp] ecx ecx,ecx ecx eax,ecx

Well, they mean SAME - the following simple function: int example( int g ) { int x,y; x = !g; y = !!g; return x+y; } First code is made by HighC. It IS OPTIMIZED as you piece is by Zortech C. Not so well optimized, but shows NON-obvious calculations: NEG reg; SBB reg,reg; INC reg; means: if (reg==0) reg=1; NEG reg; SBB reg,reg; NEG reg; means: if (reg==0) reg=0; see. Second interesting else reg=0; else reg=1;

And it is WITHOUT any JUMPS or special instructions (like SETE/SETNE from the 1st example)! Only pure logic and arithmetic! Now one could figure out many similar uses of the flags, the sign-bit-place-in-a-register, flag-dependent/influencing instructions etc...

The third example is again by Zortech C, but for the (same, optimized-by-hand) function:

int example( int g ) { return !g + !!g; }

I put it here to show the difference between compilers - HighC just does not care whether you optimize the source yourself or not - it always produces the same most optimized code (it is because the optimization was purely logical; but it will NOT figure out that the function will always return 1, for example ;); while Zortech cannot understand that x,y are not needed, and makes a new stack frame, etc... Of course, it could be optimized even more (but by hand in assembly!): e.g. MOV ECX,EAX (2 bytes) after taking EAX from the stack, instead of taking ECX from the stack again (4 bytes)... but hell, you better replace it with the constant value 1!

Other similar "strange" arithmetic results from the compiler's way of optimizing calculations. Multiplications by numbers near powers of 2 are substituted with combinations of logical shifts and arithmetic. For example:

reg*3 could be (2*reg+reg): MOV eax,reg; SHL eax,1; ADD eax,reg; (instead of IMUL eax,reg,3); but it can even be done in ONE instruction (see above about the LEA instruction): LEA eax,[2*reg+reg]
reg*7 could be (8*reg-reg): MOV eax,reg; SHL eax,3; SUB eax,reg

3.6 Deadloops: JMP self


If you see a dead-loop instruction, like "0x347: JMP 0x347" (i.e. JMP SHORT -2), this could mean two things. 1) You have found an (intentional) error. 2) The code-thread you are following is for initialization only. All the things that happen afterwards are controlled through other points: by interrupts; or events; or other threads, etc... (0x347: CALL 0x347 is not exactly a deadloop - it will fill up the stack and cause a hang or stack-fault).

3.7 Nasty instructions: JMP [eax]; CALL [eax]

I call these instructions nasty, because they are usually impossible to follow without online-debugging (if the code calculating EAX is somewhere around it, you are lucky). These are very fast and useful instructions for function-pointer execution and virtual-method calling; AND ALSO for doing branching (the "C" switch operator), but (fortunately and unfortunately) very rarely.

3.8 Meaningless instructions

Can you give an example of a 2-byte NOP instruction? Well, MOV ax,ax will do. Or any other MOV reg,same_reg. But you will not see such a thing in compiler-generated code (oh, if you do, then send $1 to the author to help him start some other business ;). If you see such a thing, it is coded by hand, and maybe someone will modify it sometime...

Many compilers in easy-mode do not think too much about saving resources and removing obvious redundancies. So, in Borland code, or Zortech-without-optimizations, you could frequently see sequences like MOV [BP-2],AX; MOV AX,[BP-2]; RET - they do not really do anything, but the compiler simply does not remove them. Warning: in one of 10 places this sequence could really be USED (by some JMP to the second MOV). So, be careful.

4. Disabling code, limiting demo-versions, etc..

There are several ways of limiting a ready program to some demo-capabilities (I talk from a programmer's point of view).

1. There is a #defined constant that determines some array sizes. All the things depend on that constant, but the arrays are static (i.e. in the executable or allocated during the executable-loading/initialization). This is the worst case from the reverser's point of view - you just do not know how many arrays there are, and there's no room for expanding. (example: Novell server.exe up to ver3.x - the 20-user version has empty static tables for 20 users only - not even a byte more)
2. The same as above, but the allocation is done dynamically by the program. This is only a bit easier, because again you don't know how many arrays there are, how big they are, and where they are allocated.
3. There is a static/dynamic constant/variable that determines the above sizes. If you change it before the allocation, you are OK.
4. The stupidest kind is just to add an additional check whether some var is more than some little constant, at several places in the program (and all the rest remains the same). This is the easiest one to reverse - find those places, and remove the check.
5. In my programs I usually #ifdef some parts of the program, so they are really missing in the demo-version. This usually cannot be reversed, especially if the missing processing is unique in the program (you will need to reinvent it, which is not always impossible - I've done it several times, long ago, in (partially) encoded programs).
6. Now mix up all the above (and add some nasties... look at howto-protect.htm better)

So finding a limitation could be a pain, if the programmer intends it to be (which is BTW very hard, because it requires several versions of the source to be supported at the same time). Usually there are some more options on the command line (if there is one), or "hidden" shortcuts, menus, etc... I am talking generally, not about dos/windoz/any-other special platform - just the technology.

Now, how to obtain the missing limits. By observing, of course. There are always some traces - the SIZE of some array, memory allocation, clearing, copying (malloc, memset, memcpy), file, block or anything else. There is no general technique - just look inside. If you are very lucky, you may see what the programmer has hidden. But, for bad programs, the hidden code is not exactly hidden, just needless or dead. So you could be lucky if the needed code is there (not dropped by the compiler), but you will have to find it in the heaps of rubbish.

5. Some final notes

Code can look very different if disassembled from a different starting point. If the instruction you are on is long, and you move the disassembly start 1-2-3 bytes below or above, you could get fairly different instructions decoded. There are sometimes special, intentionally developed deceiving-instructions, which are executed normally in one case but from the middle in other cases.

Do not be startled if you see a Jump or Call to a place where nothing seems reasonable. In 99% of the cases There Will be Code there when that Jump/Call gets executed - by preliminarily moving some code, or by unpacking/decoding; in the remaining 1% this is a) intentionally made - to cause a fault OR to make the disassembler crazy and thus to hide something interesting; b) corrupted code - because of some disk reading error, or some other code overwriting that part of the code through a mistake in some pointer; c) a mistake in the jump-code (by the programmer), but such things happen Very Rarely. Be aware: I am not talking about a Jump/Call using some variable as a target address - if that variable gets a wrong value, no one can help you - you should repair the function that sets the variable, or just ignore all that section.

[[Always keep a track of what you have found]] - some strange or special functions (strcmp, memcpy, etc..), strange or special variables, etc. - write down the addresses AND some description you have figured out. You see, the same strcmp() function used to compare an executable's optional command-line argument with the list of possible ones could also be used to check your name, password, etc... One never knows which bush the rabbit will come from.

6. How to learn more on this (kind of magic)

It is easy. No need to reinvent the wheel - it is already invented ;). Just study it. (Here is the, he-he, as one said, learning-curve killer :).

Make a simple C/C++/anything program/function that does something very simple that you know exactly. Compile it with the (chosen-target) compiler. Without optimizations. Now disassemble the object code. Look carefully at what code is associated with your functions. Some object-file (.obj) disassemblers can put source lines as comments around the assembly code, so you can see more easily which is what. Or link it, then disassemble / debug it whole. Now compile it with full optimizations. Look again. Now find/write a more complex program. Repeat the above. After your 100th trial you will know VERY WELL WHAT in assembly means WHAT in C. For that compiler. And you could decompile (i.e. understand) a disassembled executable into C in your mind. Try other compilers (one can link, i.e. mix up into one executable, modules made by different compilers). Etc... after some years you could write such an essay yourself ;).

This way, you may find mistakes in your programs and/or compilers - before they become a fault. [[If you are programming, sometimes check out what kind of (shits) your compiler produces]]. Just in case. In my experience there were many occasions when I found my (and not-my) mistakes, caused by wrong code, compiler settings, errors in compilers (- yes, such things happen! - and it is disgustingly difficult to find them), etc..., just by looking into the resulting object/executable code and not into the source/makefiles/etc. And it's the only way to find a way around the mistake if it is in the compiler.

[[The compiler will not add brains to the programmer's]]. It will try to express the programmer's ideas in the best way, but if they are wrong... (remember the needless example function above - yes, it could be made much longer, more complex, resource-consuming, window-animating, but it will have the same CONST result: 1 ;-).

As final words, I would like to say the following.
If one knows how to rip a win999.9 application with HardBreak3.14159, but does not know what he is ACTUALLY doing, is he a reverser? No. My opinion is that he is only a blind user of a someone-else-made recipe. Or, in other words, a Dull Consumer. I do not imagine that everybody is able to figure out (I also didn't know it until I saw it) that exchanging A and B without temporary variables could be done by the XOR A,B; XOR B,A; XOR A,B (i.e. A=A^B; B=A^B; A=A^B) sequence - and it WILL work on ANY processor/language supporting the XOR operation; but if one sees such a thing and CAN'T understand it, he is at fault. [[Particularities die, the principles remain... but they should be learned BOTH - seeing theory behind the practice and practice behind the theory]].

Now, sharpen your pencil, and... nice digging.

SvD Jan'99
=======================================

END

Calls and how they make use of the stack


by Ignatz of stoicForce

Generals
This is a paper on calls and how a program uses the stack to pass variables to a call. This can be very useful, especially for newbies. I don't think that experienced crackers will get much out of it, but if you don't know how a stack works, what a call does and how parameters are passed to a call, then this could help you a lot. Also I want to state that everyone is responsible for his own actions. I will not be held responsible for any illegal action you might use the knowledge provided here for. So let's get comfortable and enjoy.

Needed tools
All you need is i) some free space in your brain and ii) a drink; a Long Island Iced Tea will do the job: 4 cl rum, 4 cl tequila, 4 cl whiskey, 4 cl vodka, some lemon juice and then just fill it up with coke. It's perfect.

Why calls?
I'm sure you have already asked yourself what a call is. What is it good for? Imagine you implement a program and have to write text to the screen very often, for example. The text won't be the same every time, but the routine, the code, to bring it on screen will be the same every time. So you would have to write the whole code over and over again for every message you have to put on screen. But there is a cheaper way. Just write the code down once and give this piece of code a name, a reference, like "write_to_screen". Then all you have to do, instead of writing all the code again and again, is call the name, let's say the function or call "write_to_screen". But wait, there is still another problem: we have different messages, so there needs to be a variable like "msg" that holds the message we want to print out. And of course we have to pass this variable to the function. This would then look like this in a high level language like C++:

write_to_screen(msg);

But what does it look like in assembler? It would look like this:

push msg
call write_to_screen

This example shows the two topics explained in this tutorial: what a call is and why you need to pass variables to a call. Now you know what a call is good for and what it does. Maybe you're irritated by the push, so the next thing we need to approach is the stack.

The stack

This is a structure that stores values. These are always 4 bytes (= 1 DWORD) long. It works LIFO (Last In First Out). This means the value that was put on the stack most recently will be taken off the stack first. There are only two commands to work with the stack. These are called push and pop. Push puts a value on the stack and pop gets a value from the stack. So let's have a quick example.

push a      Stack = {a}
push b      Stack = {a,b}
push c      Stack = {a,b,c}
pop eax     Stack = {a,b}     eax = c
pop ebx     Stack = {a}       eax = c; ebx = b
push eax    Stack = {a,c}     eax = c; ebx = b
push 5      Stack = {a,c,5}   eax = c; ebx = b

But push and pop also do additional things. To understand that we must first have a look at the register ESP (Stack Pointer). This register always points to the beginning of the stack, which means the last entry made. So if you push a value, ESP is decreased by 4; if you pop a value, ESP is increased by 4. Example:

ESP = 7E0000; EAX = 01020304
:       ...
:7DFFF0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
:7E0000 12 34 56 78 00 00 00 00-00 00 00 00 00 00 00 00

This is the initial state, so the value on top of the stack is 78563412.

If you now push EAX then you will have the following.

push eax
ESP = ESP - 4 = 7DFFFC
                                            ESP points here
:       ...                                 |
:7DFFF0 00 00 00 00 00 00 00 00-00 00 00 00 04 03 02 01
:7E0000 12 34 56 78 00 00 00 00-00 00 00 00 00 00 00 00

As you can see, the stack pointer gets adjusted and the value is written to the correct position. You can now pop the value again.

pop ebx
ESP = ESP + 4 = 7E0000
EBX = 01020304
:7DFFF0 00 00 00 00 00 00 00 00-00 00 00 00 04 03 02 01
:7E0000 12 34 56 78 00 00 00 00-00 00 00 00 00 00 00 00
        |
        ESP now points here.

Notice that the popped value is not changed in memory. Of course you can also add something to or subtract something from ESP. Let's see what happens if we add 1 to ESP.

ESP = ESP + 1 = 7E0001
:7DFFF0 00 00 00 00 00 00 00 00-00 00 00 00 04 03 02 01
:7E0000 12 34 56 78 00 00 00 00-00 00 00 00 00 00 00 00
           |
           ESP now points here.

Let's now pop a value.

pop ebx
ESP = ESP + 4 = 7E0005
:7DFFF0 00 00 00 00 00 00 00 00-00 00 00 00 04 03 02 01
:7E0000 12 34 56 78 00 00 00 00-00 00 00 00 00 00 00 00
                       |
                       ESP now points here, and EBX gets 00785634.

By adding one we see that the stack gets "out of order", so you should only add a multiple of 4 to ESP.
Now you know how a stack works and what it does, so let's take an inside look at calls.

Calls - the inside


As you have seen, the stack is an ideal place to store values. Let's have another look at the little example we had before. With our new knowledge we can understand what happens, and it even shows us a serious bug.

push msg                 ; this won't work because we can only push 4 bytes!
call write_to_screen     ; the call will then read the data off the stack

So we have to pass the address of the msg. We do this easily with the command lea eax, msg. EAX now has the address of msg. Then we push the address with push eax. Let's bloat this up and make it real so you can see what exactly is going on.

data: EAX = 00000000
:00450000 31 32 33 34 35 36 37 38-39 30 00 00 00 00 00 00 1234567890......

stack: ESP = 007E0000
:007DFFF0 00 00 00 00 00 00 00 00-00 00 00 00 04 03 02 01
:007E0000 12 34 56 78 00 00 00 00-00 00 00 00 00 00 00 00

This is all before the push.

:00412100 lea eax, msg          ; nothing changes except that EAX is set to 00450000
:00412104 push eax              ; data stays the same but the stack changes
                                ; stack: ESP = 007DFFFC ............ address
                                ; :007DFFF0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 45 00
                                ; :007E0000 12 34 56 78 00 00 00 00-00 00 00 00 00 00 00 00
:00412105 call write_to_screen  ; put the serial to the screen.
                                ; short break ---
:00412110 next instruction ...

What will the call do now? It will do one very important thing. It will automatically push the address of the next instruction on the stack. Why does it do this? Because it is a call, not a jmp, and after it has done what it had to do, the program should continue after the place where the routine was called - that's what a call is all about. So it stores the return address on the stack so the processor knows where to go afterwards. Let's see what the stack looks like after entering the call:

stack: ESP = 007DFFF8 ........... return address pushed to stack
:007DFFF0 00 00 00 00 00 00 00 00-10 21 41 00 00 00 45 00
:007E0000 12 34 56 78 00 00 00 00-00 00 00 00 00 00 00 00

And the call will continue.
So whenever you enter a call you have the return address on top of the stack. There is only one thing left now: how does the call read the data from the stack?

Reading data from the stack


This is the most interesting part, and there are many ways to achieve this. The first possibility is to pop the return address and store it somewhere, then pop all other data and push the return address back. This of course is very complicated, especially if you have many values and a huge call. It will force you to save the values on the stack again because you can't hold all the data in the registers. It might be ok for a small call with few parameters. So what is the other way round? Easy: you just save EBP (the Base Pointer, used to address variables) by pushing it on the stack and then get the stack address into EBP by mov EBP, ESP. Now you can address all variables EBP-relative, like mov eax, [EBP + 20]. This way you get a better grip on the stack. You can still use the stack normally in the call and just have to make sure that the return address is on top of the stack and that you have restored EBP when you leave the call. Let's take a look at a simple example. I extracted this from MP3 CD Maker with Win32DASM:

beginning of the call
:004301A1 push ebp                     ; ebp is stored
:004301A2 mov ebp, esp                 ; now ebp points at the top of the stack
                                       ; you do not have to care for pushes and pops
                                       ; because you address the parameters EBP
                                       ; relative now
:004301A4 mov eax, dword ptr [ebp+08]  ; this equals pop eax, pop eax but you'd
                                       ; lose the return address. ;(
:004301A7 push esi                     ; store another value
.
.                                      ; call routine unimportant
.
leaving the call
:004301FF pop esi                      ; restore esi
:00430200 pop ebp                      ; restore the original base pointer
:00430201 ret                          ; now the return address is on top of the stack

Hint: you might then see something like mov eax, [ebp+10]. Now you can name the [ebp+10] variable because it will always point at the same value in the call. In a protection call with name and serial, for example, you could say [ebp+10] is the name and [ebp+14] is the serial. This makes programs, and especially deadlistings, a lot easier to read. With all this knowledge we should now have a look at some live code snippets extracted with Win32DASM.

Live snippets
I)
The first example will be the call GetWindowText. To get a grip on it, let's see what the MSDN says about it:

int GetWindowText(
    HWND hWnd,        // handle to window or control with text
    LPTSTR lpString,  // address of buffer for text
    int nMaxCount     // maximum number of characters to copy
);

If the function succeeds, the return value is the length, in characters, of the copied string, not including the terminating null character (else it's zero). The return value will get stored in EAX. And how about the three parameters (hWnd, lpString, nMaxCount)? To see about this, let's have a look at the deadlisting:

:00445273 push 00000020              ; nMaxCount
:00445275 push eax                   ; address for string
:00445276 push edi                   ; hWnd

* Reference To: USER32.GetWindowTextA, Ord:015Eh
                                  |
:00445277 Call dword ptr [0044C430]

See how the values are pushed on the stack? So when a program gets a serial and you want to know where it is stored, all you have to do is set a breakpoint at the second push and read the address in eax - that's it.

II)
The next example is a function from MP3 CD Maker that upcases the name. To see if a call is likely to be associated with the protection, see what the parameters are by identifying what is pushed before the call. Also check the results of the call by seeing which parameters were changed by the call and whether it has a return value.

:00421539 lea eax, dword ptr [ebp-74]  ; eax holds the address to the Name
:0042153C push eax                     ; the address is pushed on the stack
:0042153D call 0043BB82                ; and passed to the call
:00421542 add esp, 00000004            ; only one parameter. stack adjustment

the beginning of the call:

:0042BF40 push ebp                     ; using the EBP-relative method.
:0042BF41 mov ebp, esp                 ; you should know by now
:0042BF43 push edi                     ; edi and esi are pushed to stack
:0042BF44 push esi                     ;

-naming the variables:
:0043BB96 mov eax, dword ptr [ebp+08]  ; you could name ebp+08 regName because it holds
                                       ; the value of the name in order to upcase it

-leave the call
:0042C09C pop edi                      ; but only edi gets popped
:0042C09D leave                        ; so there seems to be something wrong
:0042C09E ret                          ; there needs to be a stack adjustment
                                       ; (the add esp, 00000004 after the call above)

Now we could even name the call here, like "void RegNameToUpCase(char* Name)". Now you should be able to read a lot more assembler, because many of the commands between calls are just preparing the parameters for the next call.

Concluding remarks
Go and find out about calls and stacks yourself. Try it. Don't waste time searching for the serial; set a breakpoint at the parameter pushes before GetWindowTextA. Being able to identify the parameters of a call will also enable you to "name" calls, which makes deadlistings more readable. Of course any comments and criticism are welcome. Just mail me, Ignatz, or visit stoicForce.

yours truly
Ignatz - Make cracking more enjoyable.

END

Reversing.generals
by Ignatz / stoicForce for newbies/intermediate crackers.

Prelude
Greetings! After quite some work, I can now present my personal experience regarding successful reversing, where reversing really means reversing and not cracking. Cracking, of course, also makes use of reversing, but during my work with Opera 3.62 I realized that cracking is just a small part of the world of reversing. Read this, even if you just want to crack, bad boy. It will surely help you.

Waypoints into the light


or just TOC if you prefer
0) Conventions
   0.1) Commenting The Code
   0.2) Translate The Code
1) Functions
   1.1) Calling Conventions
   1.2) Local Variables - Parameters
   1.3) Return Value
2) Loops
   2.1) Counter
   2.2) Exiting Condition
   2.3) Type Of Loop
3) Control Structures
   3.1) If Then Else
   3.2) Case
4) Global Variables
   4.1) Initialization
5) Generating High Level Source
6) Further work
7) Conclusion

Conventions
As you start to reverse, you will soon see that it is essential to stick to some general conventions, like the way you name a variable or comment a loop. This should always look the same, because you stick to a convention. Never delete information, always add information. For me, this is the most important rule. This means, for example, never overwrite an address but add the name to the address.

mov [00534160], eax
mov [00534160_SerialFlag], eax    ; like this you won't lose vital information

i now want to present some more thoughts on this topic.

Commenting The Code


It is very important that you use clear definitions, and don't mind if you write some extra words. You'll love yourself for that afterwards, and you'll hate yourself if you cannot follow your steps just because you didn't comment well enough. It will also make some code parts clearer for you, because you have to think about what this piece of code does if you want to comment it seriously. As a general rule: comment the code as if someone else, who never reversed before, had to read, sorry my bad, -understand- it. Commenting is about understanding the code, not reading the code.

Translate The Code


This is the most important part if you really want to reverse the code - changing the asm commands into pseudo-high-level language source. If you do a good job here, it will be peanuts to generate the reversed code in your preferred language. It's important to realize the difference between translating the asm source and commenting it. If you just comment the code, the resulting source code is not defined at all. For example, there are at least 3 different ways to realize a loop. But if you read through the asm code you will see if it's a while, a repeat or a for loop. Let's look at an example:

          xor ecx, ecx
:00440000 inc ecx          ; comment: counter of loop is increased
          .
          .
          cmp ecx, 05      ; comment: counter is compared
          jb 00440000      ; repeat this while the counter of
                           ; loop is 4 or less continue loop

----
; reversed 1
while (i < 5) { still to come }
----
; reversed 2
for (i=0; i<5; i++) { still to come }
----
; reversed 3
i = 0
repeat
  i++;
  still to come
until (i >= 5)

The comments have nothing to do with the code you finally find suitable. They just make you and others understand what's going on. The reversed style does not make anyone understand the code better, unless he knows how to read C code better than asm, for example. It is just another form of the same thing, like translating from Latin to English. In this case it is not absolutely clear what kind of loop it is. It looks like a repeat until, but it's not really important either, unless you want to make an exact copy of the original source. If you understood the difference between commenting and translating, step to the next section.

Functions
A program consists of many different functions. Thus, it is wise to reverse one function after the other. This allows us to stay focused on one part of the code. Another benefit is that the structure of the code will nearly reveal itself. If you have reversed a couple of functions, you will see a structure coming out of the dark very quickly. This structure will contain the return values, parameters, stack adjustments, other commands and more. Let's see what we need to describe a function.

1. Function Name
2. Parameters
3. Return Value
4. Purpose Of The Function

Finding a name for a function is the most important part. It has to be a name that describes the purpose very well and is easy to understand. This will make it easier for you to interpret the code if you find the function again somewhere else. The parameters should be described by name and type, sometimes with a small description, too. This will become easier once you have revealed the calling convention. Ad 3: you will get the return value of a function in eax. This means a function cannot return a string or object directly. It can only return a pointer to the string or object. Another possibility is that a function manipulates/overwrites one of the input parameters, thus making it an output parameter, too. So you have to take a good look at how the return value is used by the program and how it uses the parameters. At last, when you're done, write a short description of the function.

Maybe you have already thought about a small problem: functions call other functions, and they call functions, and so on. So where would you start? This is up to you. It depends on you at what depth you start reversing, and whether you use a bottom-up or top-down strategy (start at the depth of a messagebox call, for example). We will now take a closer look at the methods.

Calling Conventions
One of the most important things to find out is how the program passes parameters to functions and how the program handles the stack adjustments. There are two major conventions: the C and the Pascal convention. If you don't know how the stack is used during calls, then refer to my tutorial "Our friend stack and his cousin call". I will only repeat the basics. Parameters are passed through the stack. This means that the stack has to be restored after the function is completed. If it's not restored, the caller will have a corrupt stack, which it can't work with. So the calling convention consists of two parts:

1. Put values on the stack
2. Restore stack

Let us now have a look at the two most common conventions. You will see all necessary things when learning about them.

The C calling convention
In this case, the parameters are pushed on the stack in reversed order and the caller has to restore the stack. (The windows API uses the related stdcall variant: the same reversed push order, but the function itself restores the stack.) Example:

procedure test1(Par1, Par2, Par3: integer);
asm:
push par3        ; push in reversed order
push par2
push par1
call test1       ; call function
add esp, 0C      ; restore stack value

The Pascal calling convention
In this case, the parameters are pushed in the same order as they appear in the declaration of the function or procedure. But now, the function has to do the stack adjustments.

procedure test1(Par1, Par2, Par3: integer);
asm:
push par1        ; push in same order
push par2
push par3
call test1       ; call function
...              ; no add esp, 0C
                 ; this was already done within the test1 call

Local Variables - Parameters


Once you have identified how the parameters are passed to the function, you can easily use this knowledge to name the parameters in the function. Two lines of code can be found at the beginning of nearly every function.

(1.2.I)
push ebp             ; save the original base pointer
mov ebp, esp         ; set basepointer to the stack (parameters!)

This is a very efficient way to access the parameters without having to worry about the stack anymore. Every parameter can now be accessed relative to ebp. Examples for the C convention: par1 = ebp+8, par2 = ebp+0C and so on. Don't forget that the return address and ebp are also put on the stack when the function is called. This is why the first parameter is not at ebp but at ebp+8. You see how easy it is to work with the parameters now. No more annoying push or pop instructions necessary. The next line will often stand right under the two lines above.

(1.2.II)
sub esp, 00000018    ; make room on stack
                     ; for local variables

Here, too, stack space is reserved, but it is not used with push and pop instructions. This part of the memory is directly accessed relative to ebp, just as the parameters are. The only difference is that it's not ebp+8 but ebp-8, for example. Note that ebp-8 is the second local variable (assuming all variables are DWORD values). Now you can imagine where a stack overflow comes from. What we get out of this finding is the following:

(assuming that 1.2.I is used in the call): *every* ebp + X (X of 8 or more) is a parameter passed to the call
(assuming that 1.2.II is used in the call): *every* ebp - X is a local variable

Knowing this, it is not so hard anymore to identify and then name the parameters as well as the local variables of a call.

Return Value
If the call returns a value, it is always returned in eax or in one of the input parameters. You can often recognize whether it is such a parameter by checking if an address or a value is provided to the call. If it's a value, it is definitely not a parameter you need to care about afterwards. If it is an address, to a string for example, then you should take a look at the next lines of code to figure out whether it is used again or not. In most cases you will see that it's used again (that's more efficient than wasting memory). I will now show you what a Pascal programmer has in front of his eyes when he creates a function. This should make the concept even clearer. Here is a normal declaration of a function:

function MyAdd(x, y :integer) : integer;   // declaration
var erg: integer;                          // local var
begin
  erg := x + y;                            // do the things
  MyAdd := erg;                            // return value
end;

This is a normal function. It just adds two integers. There are two parameters and a local variable that is used to temporarily store the result, which is then returned by the function. The assembler has to take care that the two parameters are removed from the stack after the function is done, and it has to reserve enough memory space for the local variable... it might look like this:

; *Pascal*
push ebx              ; par1
push ecx              ; par2
call MyAdd            ; call the function
                      ; in this example, the function restores the stack (pascal)
mov ..., eax          ; returnvalue in eax

; *C*
push ecx              ; par2
push ebx              ; par1
call MyAdd            ; call the function
add esp, 8            ; restore the stack
mov ..., eax          ; returnvalue in eax

;------------------------------
; *MyAdd*
;------------------------------
push ebp              ; save basepointer
mov ebp, esp          ; get a grip on the parameters
sub esp, 4            ; make room for the local var
                      ; adding
mov ebx, [ebp+0C]     ; get first parameter
mov ecx, [ebp+8]      ; get second parameter
add ebx, ecx          ; add
mov [ebp-4], ebx      ; move result to local var
mov eax, [ebp-4]      ; return local var
                      ; returnvalue is eax
leave                 ; this instruction does the following:
                      ;   mov esp, ebp
                      ;   pop ebp
ret 8                 ; ONLY for pascal: return and remove the 8 parameter bytes
                      ; (for C a plain ret, the caller does add esp, 8)

here you should see the differences between the pascal and the c convention. it is also obvious how the compiled code works with parameters and local variables. i will now show you the same program with variable-parameters. to make the differences easier to notice, i made the whole example in pascal. now we declare a procedure, not a function, because it has no explicit return value.
procedure MyAdd2(var res : integer; x, y : integer);   // declaration
begin
  res := x + y;                                        // do the things
end;                                                   // no explicit returnvalue

the point is, that the parameter res will be overwritten by the procedure with the result. this means, you have to give a variable parameter to the procedure which can be overwritten; it cannot be a constant or a number. let us see what this would look like in assembler:
;* i will only do the pascal style since you already
;* saw everything important regarding the
;* c - pascal differences in the last example
lea ebx, res          ; because it's a var you have to pass the address!
push ebx              ; put address as 1st param on stack
push edx              ; edx contains x
push ecx              ; ecx contains y
call MyAdd2           ; call the function
mov ..., res          ; since res was overwritten by MyAdd2
                      ; it now has the result

;------------------------------
; *MyAdd2*
;------------------------------
push ebp              ; save basepointer
mov ebp, esp          ; get a grip on the parameters
                      ; no local variables
; adding
mov eax, [ebp+0C]     ; get first value parameter (x)
mov ecx, [ebp+08]     ; get second value parameter (y)
add eax, ecx          ; add
mov ebx, [ebp+10]     ; get the address of res into ebx
mov [ebx], eax        ; move result to variable-parameter (res)
leave
ret 0C                ; adjust stack: return and remove the three
                      ; parameters (12 bytes)

in this example, you can see the difference between an explicit returnvalue and a variable-parameter. i hope you got the point, so we can continue to the next section.
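For readers who think in C rather than pascal, the same two shapes might look like the sketch below. The names my_add and my_add2 are just C spellings of the pascal examples, and the var parameter is modelled as a pointer, which is what the lea/push pair in the listing passes.

```c
/* Explicit return value: the result comes back in the return slot
   (eax in the listings above). */
int my_add(int x, int y)
{
    int erg = x + y;   /* local variable, lives at ebp-4 in the listing */
    return erg;
}

/* Variable parameter: the caller passes the ADDRESS of res,
   and the callee writes the result through it. */
void my_add2(int *res, int x, int y)
{
    *res = x + y;
}
```

When you see an address being pushed before a call and the same memory being read right after it, you are most likely looking at the second shape.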

Loops
Identifying loops can be a difficult job, especially if they are big and nested. but through my years of reversing i figured out that a jump backwards is very often a jump done by a loop. for a normal "if then else" statement you skip code by jumping forward, but with a loop you have to return to earlier code, which means you have to jump back. so after all it is not so hard to identify loops. once you have identified a loop you have to reverse it, of course. let us now have a closer look at the loop's main parts.
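The backward-jump heuristic is easy to automate once you have a list of jumps from a disassembly. The following is a minimal sketch; the Jump struct and both function names are hypothetical, and a hit is only a loop candidate, not a proof.

```c
#include <stddef.h>

typedef struct {
    unsigned long from;   /* address of the jump instruction */
    unsigned long to;     /* address it jumps to             */
} Jump;

/* A jump whose target is at or before its own address goes backwards. */
int is_backward_jump(const Jump *j)
{
    return j->to <= j->from;
}

/* Counts the backward jumps, i.e. the loop candidates. */
size_t count_loop_candidates(const Jump *jumps, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (is_backward_jump(&jumps[i]))
            hits++;
    return hits;
}
```

The target of each backward jump marks the top of the loop body, and the jump itself marks its end, so you get the loop boundaries for free.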

Counter
if you already made some contact with asm-programming you will have noticed that loops normally use the ecx register as the loop counter. in case ecx is not the counter, just look at the jump statements of the loop. there must be a condition checked (like test eax, eax) before the jump. use your brain to figure that out. the brain is the best tool ever developed.

Exiting Condition
once you have identified the counter you also see where the counter is checked. then all you have to do is look at the statement before the loop-jump and voila: that comparison is the exit condition.

Type Of Loop
This is not as important as you might think. you can transform every type of loop into another one. the only really important thing to notice is whether the loop body is executed at least once or not. as an example, a while loop might not be entered at all because the condition at the beginning is not met. a repeat-until loop will always be executed at least once, because there is no check at the beginning. loops are not so hard to figure out. all you have to do is think a bit.
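The one difference that matters can be shown with two tiny C functions. This is only an illustration with made-up names; do-while is the C counterpart of pascal's repeat-until (with the condition inverted).

```c
/* while: the condition is checked BEFORE the body,
   so the body may never run. */
int run_while(int counter)
{
    int passes = 0;
    while (counter > 0) {
        passes++;
        counter--;
    }
    return passes;
}

/* do-while: the condition is checked AFTER the body,
   so the body runs at least once. */
int run_repeat(int counter)
{
    int passes = 0;
    do {
        passes++;
        counter--;
    } while (counter > 0);
    return passes;
}
```

In a listing, the first form typically shows a check (or a jump to the check) before the loop body, while the second form has only the backward jump at the bottom.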

Control Structures
These are especially important for crackers. This is obvious: the program checks if you are regged or not, and this makes heavy use of control structures. thus understanding these is essential to every serious reverser. generally they can be divided into if-then-elseif-else and case statements. for control structures it is not so important to identify them and recognize that there is something, but it is essential to understand what the statement changes in the program execution. only with that knowledge can you interpret and judge a control statement correctly. i will only point out some examples here. i think this shows best what i mean.

If Then Else
This is a piece of code taken out of Opera 3.62. The purpose of the code is to set the displayed program name in the program bar to either "Opera 3.62 (Unregistered version)" for unregistered users, or to "Opera 3.62" for regged users. i nearly went insane reversing this one. i believed that there was a statement like this:
if (regged)
  progname = "Opera 3.62"
else
  progname = "Opera 3.62 (Unregistered version)"

but what it does is the following:


cmp byte ptr [PrgDisplayName_00531640], bl        ; do the following only if PrgDisplayName=""
jne 0045FE49
mov esi, 00000080                                 ; esi = 80 (maxCount)
push esi                                          ; maxCount (maximum chars to append)
push edi                                          ; append at edi = PrgDisplayName

* String Resource ID=20092: "Opera 3.62"
|
push 00004E7C                                     ; str to append
mov ecx, ebp
call 00460BD8 - AppendStr(ToApp,EndOfOrig,Count)
push edi
call 00501E10 - int strLen(str S)                 ; calculates length of the prgDisplayName string
pop ecx
sub esi, eax                                      ; calculate maxCount
mov dword ptr [005315F4], eax
push esi                                          ; maxCount
lea eax, dword ptr [eax+PrgDisplayName_00531640]  ; eax = end of DisplayName
push eax                                          ; EndOfOrig

* String Resource ID=21428: " (Unregistered version)"
|
push 000053B4                                     ; str ToAppend
mov ecx, ebp
call 00460BD8 - AppendStr(ToApp,EndOfOrig,Count)

* Referenced by (C)onditional Jump at Addresses:
|:0045FDD3(C), :0045FE12(C)
|
cmp dword ptr [ebp+flg_Regged_000004EC], ebx      ; ebx = 0; check regFlag
mov eax, dword ptr [005315F4]                     ; length of Opera 3.62 string
je 0045FE5E                                       ; if regFlag = 0 skip the next step; otherwise
                                                  ; set end of PrgDisplayName right behind "3.62",
                                                  ; this means discard the "(Unregistered version)" part
mov byte ptr [eax+PrgDisplayName_00531640], bl    ; mov 00 after the Opera 3.62!
                                                  ; "that's why the first character of manually
                                                  ; entered Names vanished because the length
                                                  ; function was 0"
jmp 0045FE65

As you can see, the program goes a different way. it does not distinguish between regged and unregged the way i thought. it always puts the not-regged part in place. this really confused me until i saw how it really works. if you are regged, the program touches the string a third time: it sets the byte right after "Opera 3.62" to zero, so that the not-regged part is ignored. look at it in pseudo code.
s = "Opera 3.62\0";
a = " (Unregistered version)\0";
append_this_to_at(a, s, endOf(s));
if regged then
  s[10] = "\0";
end;
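The same append-then-truncate trick can be reproduced in a few lines of C. This is a sketch of the technique, not Opera's actual source: build_display_name and NAME_MAX_LEN are invented names, and the caller is assumed to pass a buffer of at least NAME_MAX_LEN bytes.

```c
#include <string.h>

#define NAME_MAX_LEN 0x80   /* maxCount from the listing */

/* Builds the title the way the listing does: always append both parts,
   then cut the string at the remembered length if the user is regged. */
void build_display_name(char *name, int regged)
{
    size_t base_len;

    name[0] = '\0';
    strncat(name, "Opera 3.62", NAME_MAX_LEN - 1);
    base_len = strlen(name);              /* remembered like [005315F4] */
    strncat(name, " (Unregistered version)", NAME_MAX_LEN - 1 - base_len);
    if (regged)
        name[base_len] = '\0';            /* mov byte ptr [eax+name], bl */
}
```

For a cracker the lesson is that there is no "else" branch to flip here; the interesting spot is the single conditional jump that guards the zero byte.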

Case
This also is some code from Opera 3.62. It decides which help page to show in the browser. I came across this code by accident, but it is a very useful example of a switch/case statement. Before you look at it i want to point out that there are two different ways of compiling a switch statement. one is a series of compare-and-jump sequences, which equals an "if then elsif elsif elsif...end". the other is the type Opera 3.62 uses here, which is faster: the program calculates a jumpmark [4*eax+004932A3] to the corresponding code, instead of going through every comparison. this works because the case values lie close together: every entry in the table at 004932A3 is a 4-byte address, so the target can be fetched with a single indexed read. when the values are scattered, you normally have to deal with a series of "cmp jne" statements, which corresponds to the "if then elsif elsif elsif...end" form. let's look at the code now.
:00493070 mov eax, dword ptr [ebp+10]          ; eax = Par1
:00493073 add eax, FFFFB1DD                    ; Par1 = Par1 - 20003d "look at stringrefs"
:00493078 cmp eax, 0000008A                    ; if eax greater than 138d jump to index.html
:0049307D ja 0049305E
:0049307F movzx eax, byte ptr [eax+00493357]
:00493086 jmp dword ptr [4*eax+004932A3]       ; switch statement, calculate jumpmark

* Data Obj ->"keys.htm"
|
:0049308D push 0051DD58                        ; a jumpmark
:00493092 jmp 00493063

* Data Obj ->"prefmenu.htm#print"
|
:00493094 push 005256B0                        ; another jumpmark
:00493099 jmp 00493063

* Data Obj ->"dialogs.htm#direct"
|
:0049309B push 0052569C                        ; ...
:004930A0 jmp 00493063

* Data Obj ->"prefmenu.htm#sethome"
|
:004930A2 push 00525684
:004930A7 jmp 00493063

* Data Obj ->"dialogs.htm#fileuplf"
|
:004930A9 push 0052566C
:004930AE jmp 00493063

* Data Obj ->"dialogs.htm#hotlist"
|
:004930B0 push 00525658
:004930B5 jmp 00493063

* Data Obj ->"dialogs.htm#locked"
|
:004930B7 push 00525644
:004930BC jmp 00493063
; and so on ...
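In C, the table-driven form might be sketched like this. The constants 20003 and 0x8A and the page names are taken from the listing, but which id maps to which page is NOT known from it, so the three table entries below are purely hypothetical, as is the function name help_page_for.

```c
#include <stddef.h>

#define FIRST_ID   20003u   /* value subtracted in the listing      */
#define LAST_INDEX 0x8Au    /* 138d, upper bound checked by "ja"    */

/* Hypothetical excerpt of the help-page table; the real mapping is not
   visible in the listing. NULL entries fall back to the default page. */
static const char *const help_pages[LAST_INDEX + 1] = {
    [0] = "keys.htm",
    [1] = "prefmenu.htm#print",
    [2] = "dialogs.htm#direct",
};

const char *help_page_for(unsigned id)
{
    unsigned index = id - FIRST_ID;         /* wraps around for small ids  */
    if (index > LAST_INDEX)                 /* cmp eax, 8A / ja default    */
        return "index.html";
    const char *page = help_pages[index];   /* one lookup, no cmp chain    */
    return page ? page : "index.html";
}
```

Note how the unsigned subtraction lets a single "above" check reject both too-small and too-large ids, which is the same trick the add/cmp/ja sequence plays.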

Global Variables
In most programs there are variables and constants that have to be accessible all the time. these cannot be local variables, because those are discarded after the owning function terminates. thus they are only accessible by the function itself. the variables that can be accessed all the time are called global variables (in contrast to local). as a reverser you should keep an eye out for those, because they contain flags like registration flags, demo flags, trial dates, ... and other data like the program name, parameters, ... . identifying global variables can be a hard job. i figured out two ways in which Opera accesses its variables. one way is direct addressing. this means, if a variable is at :00543380 the program accesses it by this number.
direct mode: mov eax, dword ptr [00543380]

other than that, it might use relative addressing mode. you can see an example of this in the next section. with this mode, the program accesses the variable relative to a base address. this base is stored in a register. if you see something like this it's a bit harder to identify the global variable. still it is possible. you just have to search for the offset value. don't get confused, everything will be clear in a second. just look at the example.
mov eax, [esi + 4EC]
          |     |
          base  offset

all you have to do now is search for the offset value. if you can find it several times in the same context, then you have found a global variable. like i found the regflag here:
; first appearance
:004CB0AF mov eax, dword ptr [esi+flg_Regged_000004EC]
:004CB0B5 cmp eax, ebx                                    ; check that
:004CB0B7 lea edi, dword ptr [esi+flg_Regged_000004EC]
; second appearance
:004D9543 mov dword ptr [esi+flg_Regged_000004EC], eax
:004D9549 pop ebx
:004D954A je 004D9556
; third appearance
:004D963C mov eax, dword ptr [ecx+flg_Regged_000004EC]    ; return with regFlag
:004D9642 ret
; there are still many more occurrences of this addressing but i think you got the point
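Searching a dead listing for an offset is a plain text search, so it can be scripted. The helper below is a minimal sketch with an invented name (count_offset_hits); it assumes you have the disassembly as an array of text lines and that the offset is written the same way everywhere, e.g. "+000004EC]" in an unlabeled listing.

```c
#include <stddef.h>
#include <string.h>

/* Counts how many disassembly lines mention the given offset text,
   e.g. "+000004EC]". */
size_t count_offset_hits(const char *const *lines, size_t n, const char *offset)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (strstr(lines[i], offset) != NULL)
            hits++;
    return hits;
}
```

Including the "+" and the closing bracket in the search text keeps immediate values like push 000004EC out of the count, since those are not memory accesses.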

Initialization
These variables have to be initialized. you can see how this looks in this example. when you see a part of code that looks similar, you know what you got.
* Referenced by a (U)nconditional or (C)onditional Jump at Address:   ; /*INIT SECTION */
|:0045BD04(U)
|
:0045BD12 mov dword ptr [esi+regName_00000138], ebx
:0045BD18 mov dword ptr [esi+0000013C], ebx     ; the variables are addressed via esi+X
:0045BD1E mov dword ptr [esi+00000144], ebx     ; this means relative addressing
:0045BD24 mov dword ptr [esi+00000148], ebx
:0045BD2A mov dword ptr [esi+0000014C], ebx
:0045BD30 mov dword ptr [esi+00000150], ebx
:0045BD36 mov dword ptr [esi+00000154], ebx
:0045BD3C mov dword ptr [esi+00000158], ebx
:0045BD42 mov dword ptr [esi+0000015C], ebx
:0045BD48 mov dword ptr [esi+00000160], ebx
:0045BD4E mov dword ptr [esi+00000164], ebx
:0045BD54 mov dword ptr [esi+0000038C], ebx
:0045BD5A mov dword ptr [esi+00000184], ebx
:0045BD60 mov dword ptr [esi+00000188], ebx
:0045BD66 mov word ptr [esi+00000212], 0008
:0045BD6F mov dword ptr [esi+00000218], ebx
:0045BD75 mov dword ptr [esi+0000021C], ebx
:0045BD7B mov dword ptr [esi+00000224], ebx
:0045BD81 mov dword ptr [esi+00000220], ebx
~something deleted~
:0045BE9F mov dword ptr [esi+00000358], ebx
:0045BEA5 mov dword ptr [esi+0000036C], ebx
:0045BEAB mov dword ptr [esi+00000380], edi
:0045BEB1 mov dword ptr [esi+0000037C], edi
:0045BEB7 mov dword ptr [esi+00000378], edi
:0045BEBD mov dword ptr [esi+00000374], edi
:0045BEC3 mov dword ptr [esi+00000370], edi
:0045BEC9 mov dword ptr [esi+00000384], edi
:0045BECF mov dword ptr [esi+00000388], edi
:0045BED5 mov dword ptr [esi+000003C4], edi
:0045BEDB mov word ptr [esi+000003B4], bx
:0045BEE2 mov dword ptr [esi+000003EC], ebx
:0045BEE8 mov dword ptr [esi+000003F4], ebx
:0045BEEE mov dword ptr [esi+000003F8], ebx
:0045BEF4 mov dword ptr [esi+000003F0], edi
:0045BEFA mov word ptr [esi+000004D8], bx
:0045BF01 mov dword ptr [esi+000003DC], ebx
:0045BF07 mov dword ptr [esi+000002A8], ebx
:0045BF0D mov dword ptr [esi+000002AC], ebx
:0045BF13 mov dword ptr [esi+00000250], ebx
:0045BF19 mov dword ptr [esi+00000254], ebx
:0045BF1F mov dword ptr [esi+00000258], ebx
:0045BF25 mov dword ptr [esi+0000025C], ebx
:0045BF2B push 00000720
:0045BF30 mov dword ptr [esi+flg_Regged_000004EC], ebx   ; INIT

there is not much to say here. you can recognize very easily how each variable is set. most of them are flags, but some of them are addresses of strings or other data. finding out which is which is your work.
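A run of stores relative to one base register is what a C compiler produces when code fills in the fields of one big struct (or object) through a pointer. The sketch below models that idea. Everything about the layout is hypothetical except the two offsets 138 and 4EC, which come from the listing; the unknown fields are modelled as padding, unsigned int is assumed to be 4 bytes, and the memset is a simplification, since the real init code also stores edi and the constant 8 into some fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: only the two named offsets come from the listing,
   everything in between is unknown and modelled as padding. */
typedef struct {
    unsigned char unknown_000[0x138];
    unsigned int  regName;                      /* [esi+138] */
    unsigned char unknown_13C[0x4EC - 0x13C];
    unsigned int  flg_Regged;                   /* [esi+4EC] */
} AppState;

/* Simplified stand-in for the long run of
   "mov dword ptr [esi+X], ebx" stores with ebx = 0. */
void init_app_state(AppState *s)
{
    memset(s, 0, sizeof *s);
}
```

Rebuilding such a struct field by field, as you learn what each offset means, is a handy way to keep notes while reversing.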

Generating High Level Source


The last step in reverse engineering is to recreate the high level source code. how to do this, you might ask. there are two general approaches. one is to strictly follow the target program and its instructions, sometimes even without exactly knowing what they do, and to find that out later when reading the recreated source. the other is to understand the code and write your own source, without having reversed everything, since you were able to guess what the code does.

the first approach focuses on reconstructing the original source files, thus you have to write it in the same language it was originally written in. the second approach only wants to create source that does the same as the original (e.g. it doesn't matter if you show a nag with a dialogbox or a messagebox, the user will know he failed to reg anyway, but the reversed program is not exactly the same), hence it can be written in any language. the drawback of the second approach is, of course, that you will never know if your program really does the same, since you only look for functionality; a lot of testing has to be done in this case. but it is much faster than looking through every line of asm source. the plus of the first method is that you can sometimes continue without understanding the meaning of the code. it might become clear later on. as always, it depends on you which approach (you can also mix them) you want. enough theory, let us see an example now.
:004BCF49 push ebp                                   ; save base pointer
:004BCF4A mov ebp, esp                               ; set basepointer for the function
:004BCF4C sub esp, 00000548                          ; make room for stack
:004BCF52 push ebx                                   ; save ebx
:004BCF53 mov ebx, ecx                               ; ebx = ecx
:004BCF55 cmp dword ptr [ebx+00000714], 00000000     ; if [ebx+714] == 0
:004BCF5C je 004BCF66                                ; continue with function
:004BCF5E push 00000001                              ; eax = 1
:004BCF60 pop eax                                    ; .
:004BCF61 jmp 004BCFEF                               ; leave with eax = 1

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004BCF5C(C)                                        /* continue with ?WriteRegFile(?) */
|
:004BCF66 push esi                                   ; push source
:004BCF67 push edi                                   ; push destination
:004BCF68 mov ecx, ebx                               ; ecx = ebx
:004BCF6A call 004BC81A
:004BCF6F mov ecx, 0000012F                          ; ecx = 0x12F (303d); 303*4 = 1212 (0x4BC)
:004BCF74 mov esi, ebx                               ; sourceaddress of RegInfo
:004BCF76 lea edi, dword ptr [ebp+FFFFFAB8]          ; destination address of RegInfo
:004BCF7C lea eax, dword ptr [ebp+FFFFFAB8]          ; .
:004BCF82 repz                                       ; while not finished
:004BCF83 movsd                                      ; move the RegInfo
:004BCF84 push eax                                   ; pointer to RegFileBuffer
:004BCF85 mov ecx, ebx
:004BCF87 call 004BCF14 - Decrypt(chr *ToDecrypt)    ; encrypt the RegInfo

* Reference To: KERNEL32.SetFileAttributesA, Ord:0268h
|
:004BCF8C mov esi, dword ptr [005121A4]              ; put address of SetFileAttributes into esi
:004BCF92 push 00000080                              ; attributes to set
:004BCF97 lea edi, dword ptr [ebx+CRegFileSize_000004BC] ; edi = addr of filename
:004BCF9D push edi                                   ; address of filename
:004BCF9E call esi                                   ; Make File writable
:004BCFA0 push 00001010                              ; fuMode (action and attribs)
:004BCFA5 lea eax, dword ptr [ebp+CRegInfBuf_FFFFFF74] ; eax = address of buffer
:004BCFAB push eax                                   ; address of buffer
:004BCFAC push edi                                   ; address of Filename

* Reference To: KERNEL32.OpenFile, Ord:01E8h
|
:004BCFAD Call dword ptr [00512228]
:004BCFB3 mov ebx, eax                               ; ebx = hfile = OpenFile(fName,[opts]) (FileHandle)
:004BCFB5 cmp ebx, FFFFFFFF                          ; if open succeeds
:004BCFB8 jne 004BCFBE                               ; then continue with ebx = eax
:004BCFBA xor ebx, ebx                               ; else handle = 0
:004BCFBC jmp 004BCFE6                               ; skip writing part.

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004BCFB8(C)
|
:004BCFBE push CRegFileSize_000004BC                 ; number of Bytes to write
:004BCFC3 lea eax, dword ptr [ebp+FFFFFAB8]          ; eax = address of Encrypted RegInfo
:004BCFC9 push eax                                   ; Pointer to buffer holding encrypted reginfo
:004BCFCA push ebx                                   ; filehandle

* Reference To: KERNEL32._lwrite, Ord:02F7h          ; is used to write Regfile !
|
:004BCFCB Call dword ptr [005121CC]
:004BCFD1 xor ecx, ecx                               ; set ecx = 0 because of setne cl later
:004BCFD3 cmp eax, FFFFFFFF                          ; check for errors
:004BCFD6 setne cl                                   ; cl = 1 if no error occurred
:004BCFD9 push ebx
:004BCFDA mov dword ptr [ebp-04], ecx                ; save result of _lwrite

* Reference To: KERNEL32._lclose, Ord:02F2h
|
:004BCFDD Call dword ptr [005121C8]                  ; close file
:004BCFE3 mov ebx, dword ptr [ebp-04]                ; ebx = result of _lwrite

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004BCFBC(U)
|
:004BCFE6 push 00000021                              ; make file WriteProtected
:004BCFE8 push edi                                   ; pointer to filename
:004BCFE9 call esi                                   ; SetFileAttributes
:004BCFEB pop edi                                    ; restore values
:004BCFEC mov eax, ebx
:004BCFEE pop esi

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004BCF61(U)
|
:004BCFEF pop ebx
:004BCFF0 leave
:004BCFF1 ret                                        ; bye
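One detail of the listing that is worth spelling out is the repz/movsd pair, which is simply a block copy counted in dwords. A minimal C equivalent might look like this; copy_dwords and the two macros are names made up for the sketch, only the count 0x12F is from the listing.

```c
#include <stddef.h>
#include <stdint.h>

#define REGINFO_DWORDS 0x12F                 /* ecx in the listing (303d) */
#define REGINFO_BYTES  (REGINFO_DWORDS * 4)  /* 1212 = 0x4BC              */

/* What "rep movsd" does: copy count dwords from src ([esi])
   to dst ([edi]), advancing both pointers each step. */
void copy_dwords(uint32_t *dst, const uint32_t *src, size_t count)
{
    while (count-- > 0)
        *dst++ = *src++;
}
```

The byte count 0x4BC shows up again as the size pushed for _lwrite, which is a good hint that both places deal with the same block of registration data.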

After reading the notes it is clear what the program does. I will now attempt to generate C++ source that does the same. it might look like this:
#define CRegFileSize 1212

// global Var gRegFileName

/* Name:        WriteRegInfoToFile
 * Purpose:     Writes the encrypted Reginfo into the file OUser350.dat
 *              and sets its fileattributes to readonly
 * Returnvalue: Either 1 if the unknown flag is set, or the result of
 *              the _lwrite check which is temporarily stored in res.
 * Remarks:     Have to find out about the first flag and the first
 *              function. Very straightforward implementation.
 */
int WriteRegInfoToFile(void)
{
  // variables
  char     CryptRegInfo[CRegFileSize]; // Holds the encrypted Registrationinformation
  char    *RegInfo;                    // Holds the original Registrationinformation
  OFSTRUCT RegFileBuffer;              // Filebuffer for the RegFile
  HFILE    hFile;                      // Handle to Regfile
  int      res = 0;                    // result of function

  if (unknownflag_714)                 // [ebx+714] != 0, meaning still unknown
    return 1;

  (void) unknownfunction_004BC81A();   // parameters still unknown
  memcpy(CryptRegInfo, RegInfo, CRegFileSize);   // the rep movsd
  (void) encrypt(CryptRegInfo);
  (void) SetFileAttributes(gRegFileName, FILE_ATTRIBUTE_NORMAL);        // 0x80
  hFile = OpenFile(gRegFileName, &RegFileBuffer, 0x1010);
  if (hFile != HFILE_ERROR) {
    res = (_lwrite(hFile, CryptRegInfo, CRegFileSize) != HFILE_ERROR);  // setne cl
    _lclose(hFile);
  }
  (void) SetFileAttributes(gRegFileName,
                           FILE_ATTRIBUTE_READONLY | FILE_ATTRIBUTE_ARCHIVE); // 0x21
  return res;
}

Further work
This should be the biggest section, since there is still loads and loads of work to do. i will only point out some headwords: dll reversing, object oriented reversing, ocx-vxd reversing, language specific reversing, packed and encrypted programs, commercial protection schemes... . what i want to push next is windows reversing: identifying and reversing the WindowProcs, MenuHandlers, Messages, and so on and so on. Maybe you will find another tutorial on this and other topics, but let's see what time will bring. A lot of these things can also be handled by you. i would like to see some tutorials, tools and other stuff. Now you know where to start, so let's get movin.

Conclusion
To me, reversing is like solving a puzzle. you start out at one point and put piece after piece together until you're stuck. then you start another colony somewhere else and slowly the little colonies start growing together. you will be able to see the big picture clearer and clearer with every piece you add to the whole. after a while you can already predict what it's going to be like, and then, in the end, you marvel at your genius work, show it to others and enjoy it, tell stories about how you did it and can't wait to start all over again with another one, because of the fascinating dynamics and the great fun. never forget that: whatever you are doing, enjoy your work, be proud of it and make it something special.

almost forgot to provide you with some links:
+Fravia
all tools you need
a place to get some practice
another place to get some practice
assembler tutorial

Thanks go to seneca and reverser+.

last words
Please send your comments and all to me -thanks.

Thats all for today folks explode, Ignatz


1999-2010 last edited 02/12/2008 00:16:29 by the

stoicForce
