
What is Assembly Language?

Assembly language is essentially the native language of your computer. Strictly speaking, the processor of your machine understands only machine code (consisting of ones and zeroes). To write a machine code program, you first write it in assembly language and then use an assembler to convert it to machine code. Nothing is lost when the assembler does its conversion, since assembly language simply consists of mnemonic codes that are easy to remember (they are similar to words in the English language), each of which stands for one of the machine code instructions that the machine is capable of executing. Here is a short excerpt from an assembly language program:
MOV EAX,1
SHL EAX,5
MOV ECX,17
SUB EAX,ECX
....

An assembler would convert this set of instructions into a series of ones and zeros (i.e. an executable program) that the machine could understand.
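
To see what the excerpt above computes, here it is again with the register contents traced in comments. This is only a sketch: the use32 directive and the constant named expected are additions made for illustration and are not part of the original excerpt.

```asm
use32                           ; assemble as 32 bit code (FASM's default output is a flat binary)

        mov eax,1               ; EAX = 1
        shl eax,5               ; shift left by 5 bits: EAX = 1 * 2^5 = 32
        mov ecx,17              ; ECX = 17
        sub eax,ecx             ; EAX = 32 - 17 = 15

; The same arithmetic, worked out by the assembler itself at assembly time.
expected = (1 shl 5) - 17       ; hypothetical constant, equal to 15
```

The instructions themselves only run when the program is executed, whereas the constant is evaluated while assembling, which makes it a handy way to check arithmetic by hand.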

What is it good for?


Because it is extremely low level, assembly language can be optimized extremely well. It is therefore used where an application demands the utmost performance. Assembly language is also useful for communicating with the machine at a hardware level, and for this reason it is often used for writing device drivers. A third benefit of assembly language is the size of the resulting programs. Because the programmer chooses every instruction directly, rather than relying on a compiler to translate from a higher level language, the resulting programs can be exceedingly small. For this reason, assembly language has been a language of choice for the demo scene, in which coders write extremely small programs that show off their creative and technical abilities to other members of the scene.

In this tutorial you will learn how to write assembly language programs and how to make use of these to do interesting things such as calculations, graphics, writing Windows programs and optimizing programs written in other languages.

Getting an Assembler

There are only two things required to get started with assembly language: an ordinary text editor to enable you to write your assembly language programs, and an assembler. There are numerous good, free assemblers available on the web. Here are some of them:

Lazy Assembler (LZASM)
GoASM
Flat Assembler (FASM)
MASM32
Pass32
Netwide Assembler (NASM)

Throughout these tutorials we will be making use of Flat Assembler (FASM). It is highly recommended, as it takes much of the pain out of writing Win32 applications in assembly language and it is flexible enough to do everything else you would want, even OpenGL.

The Linker
Technically you need a program called a linker as well. However, most decent assemblers come with a linker as part of the package. Basically a linker is used for large projects where there is more than one file to be assembled. Each of the assembly language files might contain references to code elements in the other files. A linker is the program that ties all the loose ends together and makes a single program out of the pieces. Thus, technically, an assembler converts assembly language files to object files (basically machine code with some loose ends), and a linker connects all the object files together and makes a single executable program out of them.

Some powerful standalone linkers are also available free. We won't be needing one, as FASM can produce an executable program directly, but here is a linker which has decent features:

ALINK

Actually, a linker is not specific to assembly language programming. One is used in much the same way regardless of the language in which you are programming.

Setting up FASM
Before starting FASM, there is one slightly pesky problem that needs to be fixed. FASM doesn't know where to find its include files, so we add an environment variable to take care of this. If you are going to use the command line version of FASM, fasm.exe, then simply type
SET INCLUDE=C:\FASM\INCLUDE

at the command line (assuming of course that you unzipped FASM into the directory C:\FASM). If you wish to use the Windows version of FASM, fasmw.exe (recommended), then go to "Start->Control Panel->System->Advanced->Environment Variables", and make a new system variable, called include, with the value c:\fasm\include. Now you are ready to run FASM.

Example Programs
The first programs that you will want to run are the example programs that come with FASM. For example there is a "Hello" program, which simply displays a message box containing some text. Inside FASM, open the examples directory and you will find the programs you can assemble and try out. Open the directory called Hello and then the file called Hello.asm. Now simply go to the Run menu and run the file. The program will assemble and then run. All it does is pop a message box up on the screen.

The Hello World Program


The first program we look at is a simple Windows "Hello World" program:
include '%fasminc%/win32ax.inc'

.code
start:
    invoke MessageBox,HWND_DESKTOP,"Hello World!","Win32 Assembly",MB_OK
    invoke ExitProcess,0
.end start

The first line includes a special macro file, win32ax.inc, which contains all the nasty business that you don't want to see when writing a Windows program. The path is written with %fasminc%, which FASM replaces with the value of an environment variable called fasminc. If you have only set the INCLUDE variable, it is enough to write include 'win32ax.inc', since FASM searches the directories listed in INCLUDE.

The next line, .code, tells the assembler that program code follows. The line start: is simply a label. It just gives a name to one of the lines of our program so we can refer to it elsewhere. We'll use labels often, for example when writing loops or using other jump instructions. The label allows us to refer to that particular location within our code by name, for example if we wanted to jump to that location from somewhere else.

The program is ended with the .end directive. It takes one parameter, the name of a label corresponding to the entry point of the program (which doesn't need to be at the start of the program). When the program is loaded into memory by the operating system, execution of the program will begin at this point.

Windows API Calls


The remaining two lines are calls to the Windows API to tell it to display a message box and then quit. You'll need a decent Win32 API Reference if you want to write Windows programs. You can download one from the documentation section of the FASM website. The other items on these lines of code are parameters sent to the corresponding API functions, MessageBox and ExitProcess. Two of the constants used, HWND_DESKTOP and MB_OK, are given names by the win32ax.inc file that we included. These are also the names given to these constants in the Win32 API reference. We defer discussing Win32 programming until later in the tutorial, but it is useful to see right from the start that FASM makes Win32 programming very easy.

Registers

Assembly language programming is actually quite primitive. You'll soon realise that about all the computer's processor can do is interact with data in the computer's memory and with the data in its own internal data stores, which are called registers. Your job as a computer programmer is to write code that tells the CPU what data to move where.

For the time being, we'll just be doing 32 bit coding, so the internal registers in the CPU, for us, will be 32 bits. Each bit is capable of storing a 1 or a 0. The advantage of registers over computer memory is that they are extremely fast. Time critical programming applications often try to maximize the amount of computation that can be done in the CPU registers, instead of in the computer's memory.

The CPU has four general purpose registers that we will use most often, EAX, EBX, ECX and EDX, and a number of other more specialised registers. Each of these four registers is 32 bits wide, but we can access the lower 16 bits of EAX (called AX) if we want to. Similarly the lower 16 bits of EBX is called BX, etc. Furthermore, we can access both the upper and lower 8 bits of AX (called AH and AL respectively), and similarly for BX, CX and DX. Refer to Diagram 1 below to see how EAX is laid out.

We'll introduce the other specialised CPU registers as we need them. For now the general purpose registers will be sufficient for us to get started with low level programming.
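
As a rough illustration of how the smaller registers overlap EAX, the fragment below tracks the contents of EAX as its parts are overwritten. The values are arbitrary, chosen so that each byte is easy to spot, and the use32 directive is only there so that the fragment assembles on its own as 32 bit code.

```asm
use32                       ; assemble as 32 bit code

        mov eax,12345678h   ; EAX = 12345678h, so AX = 5678h, AH = 56h and AL = 78h
        mov al,0FFh         ; only the lowest 8 bits change:   EAX = 123456FFh
        mov ah,0            ; only bits 8 to 15 change:        EAX = 123400FFh
        mov ax,0ABCDh       ; only the lower 16 bits change:   EAX = 1234ABCDh
```

Note that there is no name for the upper 16 bits of EAX on their own; they can only be reached through the full 32 bit register.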

Moving Data into Registers


The first thing we will want to learn about programming at the level of the CPU, is how to put data into registers. For this we use the MOV instruction. (This is the first real assembly programming instruction we've met.) Here is a code fragment which moves the number 47 into the EDX register:

MOV EDX,47

Or, suppose that we wanted to move the hexadecimal number A4C9 into the 16 bit AX register:
MOV AX,0A4C9h

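(Note the trailing h, which denotes that the number is hexadecimal, and the leading zero, which is needed because a number must begin with one of the digits 0-9 so that the assembler does not mistake it for a name.) Similarly, if we wished to move the binary number 01101110 into the 8 bit BH register we would type the following code: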
MOV BH,01101110b

Binary
Since each bit of a register or memory location can store a one or a zero, we have to learn about binary arithmetic. This is similar to the ordinary decimal (base 10) arithmetic that we are used to, except instead of there being 10 possible digits in each place (0-9), there are now only two possible digits we can use (0-1). This means that the places in binary numbers correspond to powers of 2. The following table illustrates this.
Binary      Decimal
1           2^0 = 1
10          2^1 = 2
100         2^2 = 4
1000        2^3 = 8
10000       2^4 = 16
100000      2^5 = 32
1000000     2^6 = 64
10000000    2^7 = 128

To make other binary numbers, we simply add powers of 2 together. For example:
Binary      Decimal
1101        2^3 + 2^2 + 2^0 = 8 + 4 + 1 = 13
101         2^2 + 2^0 = 4 + 1 = 5
1001        2^3 + 2^0 = 8 + 1 = 9
10100       2^4 + 2^2 = 16 + 4 = 20

Each number can be expressed uniquely as a binary number. To write a binary number in assembly language, we append a b to the number to indicate that it is binary, e.g. 11010110b. This corresponds to the decimal number 214. So now we have two ways to move the number 214 into a register, e.g.
MOV AH,11010110b
MOV BH,214

Hexadecimal
Since it is rather annoying to write out many ones and zeroes, there is a shorthand way of writing binary, called hexadecimal. Each four binary bits are encoded by a single hexadecimal digit. Here is a table showing what each hexadecimal digit stands for.
Hexadecimal   Binary      Hexadecimal   Binary
0             0000        8             1000
1             0001        9             1001
2             0010        A             1010
3             0011        B             1011
4             0100        C             1100
5             0101        D             1101
6             0110        E             1110
7             0111        F             1111

To write hexadecimal numbers in assembly language, we append an h to the number and prepend a 0. The leading 0 is only strictly needed when the number would otherwise begin with one of the letters A-F, but it never does any harm. So now we have a third way of writing the number 214 in assembly language. It is 0D6h. We can also specify numbers to be moved into registers, in hexadecimal.
MOV CH,0D6h

Of course hexadecimal is base 16, where there are 16 different possibilities for each digit (0-F), and where each place corresponds to a power of 16. E.g. 045C3h corresponds to the decimal number 4x16^3 + 5x16^2 + 12x16^1 + 3x16^0 = 16384 + 1280 + 192 + 3 = 17859.
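
The three notations can be put side by side in a small fragment. This is just a sketch for comparison; the choice of registers is arbitrary and use32 is added so the fragment assembles on its own.

```asm
use32                       ; assemble as 32 bit code

        mov ah,214          ; decimal
        mov bh,11010110b    ; binary:       128 + 64 + 16 + 4 + 2 = 214
        mov ch,0D6h         ; hexadecimal:  13x16 + 6 = 214
        mov eax,045C3h      ; EAX = 17859, the worked example from the text
```

All of the first three instructions load the same bit pattern; the notation only affects how the source reads, not the machine code that is produced.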

Memory on a 32 Bit Machine


Each 8 bits of memory is called a byte, and this is the fundamental unit of memory on a 32 bit machine. Of course our computers have many megabytes or even gigabytes of memory. We can access each of the bytes of memory in the machine by specifying an address. This is a 32 bit number which represents the location of the byte of memory we are interested in. Of course, when we are writing an assembly language program, we have no idea which locations of memory our program code will be loaded into, when the program is run. Nor do we have any idea which memory locations will be allocated for data. Fortunately for us, the assembler and the operating system deal with these issues for us. Thus we rarely need to work with explicit addresses as such.

Data
Here we describe how to tell the assembler that we want to allocate small chunks of memory for storing data. This is somewhat like declaring variables in a higher level language, in that we give names to the chunks of space that we set aside for data. Here are some examples of declaring and initialising some variables.
myvar1      DB 3
anothervar  DW 03FAh
someval     DD 721099
repeatvar   DB 7 dup(12,28)
string1     DB 'This is a string'

The first line sets aside a single byte of memory and initialises it to the value 3. This byte of memory can then be referred to in the program by the name myvar1. Essentially the word myvar1 represents the address of the memory location. If we want to refer to the actual value stored at that address (i.e. the value 3 until something changes it), we must write [myvar1]. The second line sets aside a word of data (two consecutive bytes) containing the value corresponding to the given hexadecimal number. The third line declares and initialises a double word (four bytes of data).

The next line makes use of the dup operator to set aside 14 bytes of data and to initialise it to seven copies of the two bytes 12, 28. This operator is quite useful for declaring arrays of bytes or words, etc, that are initialised to zero, e.g:
myarr DD 100 dup 0

The last of the five declarations above sets aside 16 bytes of data and sets their contents to be equal to the ASCII values corresponding to the letters of the given string. This is how we can declare strings in assembly language.
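
One way to check the byte counts quoted above is to let the assembler measure them. In FASM the symbol $ stands for the current address, so subtracting a label from it gives the number of bytes declared since that label. The constants ending in _size and _len below are hypothetical names introduced only for this sketch.

```asm
repeatvar   db 7 dup (12,28)        ; seven copies of the two bytes 12, 28
repeat_size = $ - repeatvar         ; 7 x 2 = 14 bytes

string1     db 'This is a string'
string1_len = $ - string1           ; 16 characters, one byte each

myarr       dd 100 dup (0)          ; 100 double words, all zero
myarr_size  = $ - myarr             ; 100 x 4 = 400 bytes
```

A constant such as string1_len can later be used wherever the length of the string is needed, which avoids counting characters by hand.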

Uninitialised Data
Sometimes we want to set aside some data without actually initialising it to any particular values. Here are some examples of doing this.
myval    DB ?
thisval  DD ?
array1   RB 32
array2   RW 1000

The first two lines declare a byte and a double word respectively, without initialising them to anything. The second last line reserves 32 bytes of space, but doesn't set them to anything. The final line sets aside 1000 words of data without initialising them. Note that for large arrays it is best to reserve space for them and not initialise them in the data definition, but to write a routine in code that actually initialises the values in the array (if necessary). This can make your program smaller since the initialisation is done by a small piece of code instead of a long set of explicit values contained within your program. Since initialised and uninitialised data are treated differently, it is sensible to separate the two into different sections. Of course sometimes you want certain pieces of data to occur in a given order, in which case this can't be done, but otherwise, separating the two different kinds of data out gives the assembler a chance to make use of the distinction. Although not required, one can place .data before the data declarations in a program, just as we put .code before the code.

Moving Data Between Registers

We can move data from one register to another, so long as they are of the same size. For example, to move the contents of DH into CL we write:
MOV CL,DH

Note that the source register goes on the right and the destination register goes on the left.
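
A few more register to register moves, as a sketch of the size rule. The registers are chosen arbitrarily, and the last line is left as a comment because the assembler would refuse it.

```asm
use32                   ; assemble as 32 bit code

        mov cl,dh       ; 8 bit source,  8 bit destination
        mov bx,ax       ; 16 bit source, 16 bit destination
        mov edx,eax     ; 32 bit source, 32 bit destination
        ; mov eax,bl    ; not accepted: an 8 bit source cannot be moved into a 32 bit destination
```

In every case the source register is left unchanged; MOV copies the value rather than moving it away.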

Movement to and From Memory


We can use the same instruction to move data from memory to registers and vice versa. Note that we cannot move data from memory to memory with the MOV instruction. If we want to move the data at a byte memory location called myvar into AH say, we would write
MOV AH,[myvar]

Note that the square brackets tell the machine to move the actual data into AH, not the address of the data. The assembler requires that the source and destination are of matching sizes. For example, you can't move data from a variable which was declared as a byte of data, into a 16 or 32 bit register. However, this check can easily be overridden. Suppose we have a byte variable called myvar1 and we wish to move the word of data starting at that location, into the AX register. We put the size override word directly in front of the memory operand and type
MOV AX,word [myvar1]

Of course this will take both the byte of data stored in the variable myvar1 and the byte that just happens to follow it in memory, and place the two bytes as a single word into AX. The similar overrides for moving a byte and a double word of data are denoted byte and dword respectively.

Obviously if we want to move data from a register into memory, we just put the operands in the reverse order, e.g.
MOV [myvar1],CH
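
The following complete program is a minimal sketch of these memory moves. It assumes the INCLUDE environment variable has been set as described under Setting up FASM, and the variable names lowbyte and highbyte are made up for the example. FASM expects a size override such as word to sit directly in front of the bracketed memory operand.

```asm
include 'win32ax.inc'           ; found through the INCLUDE environment variable

.data
  lowbyte   db 34h              ; hypothetical byte variable
  highbyte  db 12h              ; the byte that happens to follow it in memory

.code
start:
        mov al,[lowbyte]        ; sizes match (byte to byte): AL = 34h
        mov ax,word [lowbyte]   ; override: read two bytes starting at lowbyte, AX = 1234h
        mov ch,56h
        mov [lowbyte],ch        ; register to memory: the byte at lowbyte is now 56h
        invoke ExitProcess,0
.end start
```

AX ends up as 1234h rather than 3412h because x86 processors store the lower byte of a word first in memory. The program has no visible output; it is best followed in a debugger.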

Moving Addresses into Registers


Sometimes we actually want to move the address of a variable into a register. Since addresses are 32 bits, we can only move addresses into the 32 bit registers. For example, suppose that we wished to move the address of the variable myvar2 into the EAX register. We simply type
MOV EAX,myvar2

The EAX register is now a pointer to myvar2. It does not contain the contents of myvar2 (which may not even be a double word), but it contains the address of myvar2.

Variables as Pointers
Once we have moved an address into a 32 bit register, we are then free to move it into a double word variable for storage. For example, suppose that EAX has been loaded with the address of some memory location storing a byte of data and suppose that we have a double word variable mypoint, say, that we want to store this address in. We simply write
MOV [mypoint],EAX

Now here comes the tricky part. Let's suppose we want to load the contents of the memory location now pointed to by mypoint, into the CH register. Firstly we have to retrieve the address from storage:
MOV EBX,[mypoint]

Now EBX points to the location in question. Now to retrieve the byte of data at that location, we write

MOV CH,[EBX]

Here the square brackets do not denote the contents of EBX itself (for which we would just write EBX), but rather, they denote the contents of the location pointed to by EBX. Although this usage of the square brackets may seem different to the earlier usage, it is in reality the same thing, since the thing inside the brackets is basically a pointer in both cases.
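
Putting the last two sections together, a complete program might look like the sketch below. The byte variable mybyte and its value 42 are invented for the example, and the include line assumes the INCLUDE environment variable is set as described under Setting up FASM.

```asm
include 'win32ax.inc'           ; found through the INCLUDE environment variable

.data
  mybyte   db 42                ; hypothetical byte of data to point at
  mypoint  dd ?                 ; double word that will hold an address

.code
start:
        mov eax,mybyte          ; no brackets: EAX = the address of mybyte
        mov [mypoint],eax       ; store that address in the variable mypoint
        mov ebx,[mypoint]       ; brackets: EBX = the contents of mypoint, i.e. the address again
        mov ch,[ebx]            ; brackets around a register: CH = the byte at that address = 42
        invoke ExitProcess,0
.end start
```

The size of the final read is taken from the destination: because CH is an 8 bit register, exactly one byte is fetched from the location EBX points to.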

The Console
Assembly language programs would be fairly boring if there were no way to get data in and out of the program from the user. Whilst we could write an entire Windows program for this purpose, making use of the various Graphical User Interface (GUI) elements that Windows provides, it turns out there is an easier way to interact with the user. This is by means of a console. A console looks somewhat like a DOS box, and allows for capturing input from the keyboard and mouse and for outputting text on the screen. The operating system provides this service to us, and fortunately it is easy to make use of with very little knowledge of the Win32 API. Of course calling Win32 API functions is not true assembly language programming, in that we are not calling hardware level functions. However, the latter are quite complicated and so we settle for telling the operating system what we want to do.

Opening a Console
Opening a console is easy. We simply invoke the Win32 API function called AllocConsole. It takes no parameters. The simplest program for doing this is as follows.
include '%fasminc%/win32ax.inc'

.code
start:
    invoke AllocConsole
    invoke ExitProcess,0
.end start

Needless to say, this program doesn't do very much. It merely opens a console and immediately exits, without even waiting for a keypress. In the following sections we describe how to use the console to interact with the user.

Getting Input from the Console


In order to get input from the console, we first need to get a handle for the standard input device (basically the keyboard). To do this we invoke the GetStdHandle function (another Win32 API function) with STD_INPUT_HANDLE as the only parameter. When the function returns, the EAX register contains the handle that we are after. Our first task is to store it away in a variable so that it can be used later, whenever we need it.

Now we are ready to obtain input from the console. To do so, we call the Win32 ReadConsole function. It takes five parameters. The first parameter is the handle of the standard input device, which we just obtained using GetStdHandle. The second parameter is a pointer to a memory buffer (an array of bytes) large enough to hold the characters that we want to read from the input device. The third parameter is the number of characters that we would like to read from the input device (this should be less than or equal to the number of bytes the buffer can hold). The fourth parameter is a pointer to a double word that will hold the number of characters actually read from the input device (perhaps the user doesn't enter as many characters as we would like them to). Finally, the fifth parameter is reserved and we can just set it to zero.

By default, ReadConsole will wait until the user enters something at the keyboard, and will not return until the user has pressed the enter key. Here is a simple program which makes use of this functionality, to wait until the user has pressed the enter key before exiting. This is an extension of the example from an earlier section, but now we actually get to see the console that is opened, because it will stay there until the enter key is pressed.

include '%fasminc%/win32ax.inc'

.data
inchar    DB ?
numread   DD ?
inhandle  DD ?

.code
start:
    invoke AllocConsole
    invoke GetStdHandle,STD_INPUT_HANDLE
    mov [inhandle],eax
    invoke ReadConsole,[inhandle],inchar,1,numread,0
    invoke ExitProcess,0
.end start

Standard Output
In order to print text to the console, we first need to get a handle for the standard output device. Again we use GetStdHandle, but this time with the parameter STD_OUTPUT_HANDLE. To write to the screen, we first declare and initialise a string of characters that we want to print onscreen, then we call the WriteConsole function. This function is similar to the ReadConsole function. We pass it five parameters: the handle we just obtained, a pointer to the string that we wish to output, the number of characters of the string that we want to output, a pointer to a double word which can hold the number of characters actually written and finally a zero for a reserved parameter which does nothing. Here is the program that writes to the screen and then waits for the user to press the enter key.
include '%fasminc%/win32ax.inc'

.data
inchar      DB ?
numwritten  DD ?
numread     DD ?
outhandle   DD ?
inhandle    DD ?
string1     DB "Hello World!"

.code
start:
    invoke AllocConsole
    invoke GetStdHandle,STD_OUTPUT_HANDLE
    mov [outhandle],eax
    invoke GetStdHandle,STD_INPUT_HANDLE
    mov [inhandle],eax
    invoke WriteConsole,[outhandle],string1,12,numwritten,0
    invoke ReadConsole,[inhandle],inchar,1,numread,0
    invoke ExitProcess,0
.end start
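
Input and output can be combined. The sketch below reads a line of up to 32 characters and writes back exactly as many characters as were read, by passing the count stored by ReadConsole on to WriteConsole. The buffer name and its size of 32 are arbitrary choices for this example, and the include line assumes the INCLUDE environment variable is set as described under Setting up FASM.

```asm
include 'win32ax.inc'           ; found through the INCLUDE environment variable

.data
  inhandle    dd ?
  outhandle   dd ?
  numread     dd ?
  numwritten  dd ?
  buffer      rb 32             ; hypothetical input buffer of 32 bytes

.code
start:
        invoke AllocConsole
        invoke GetStdHandle,STD_INPUT_HANDLE
        mov [inhandle],eax
        invoke GetStdHandle,STD_OUTPUT_HANDLE
        mov [outhandle],eax

        ; Read at most 32 characters; numread receives how many actually arrived.
        invoke ReadConsole,[inhandle],buffer,32,numread,0

        ; Echo them: [numread] passes the stored count, not the address of numread.
        invoke WriteConsole,[outhandle],buffer,[numread],numwritten,0

        ; Wait for one more press of the enter key so the echo stays visible.
        invoke ReadConsole,[inhandle],buffer,1,numread,0
        invoke ExitProcess,0
.end start
```

The characters produced by the enter key are counted in numread and are echoed too. If the user types a line longer than the buffer, the characters left over are picked up by the final ReadConsole, so the program may then exit without waiting.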

Screen Output Part II

A Useful String Macro


FASM provides a useful macro that saves us having to explicitly declare and initialise a string. Instead of providing a pointer to the variable which contains our string, we can just type in the string itself. Thus the line that writes to the console becomes
invoke WriteConsole,[outhandle],"Hello World!",12,numwritten,0

Of course the problem with defining a string this way is that we can only use it once. If this is not a problem, however, it is a useful time saving device. Internally, the macro just replaces the string with a pointer to a string which is then declared elsewhere without us seeing it.

Newline Character

Each call to WriteConsole simply writes to the screen immediately following the text just written. To go to a new line, we need to send a newline to the console. On Windows this is a pair of characters: a carriage return (ASCII code 13), which moves the cursor back to the start of the line, followed by a line feed (ASCII code 10), which moves it down one line. We declare the pair as follows:
endline DB 13,10

To output these characters to the console, we just output them as a two byte string, just as we would any other string:
invoke WriteConsole,[outhandle],endline,2,numwritten,0
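
As a sketch of line endings in a complete program, the example below writes two lines, each ending in a carriage return (13) followed by a line feed (10). Carriage return alone only returns the cursor to the start of the current line, so the line feed is what moves it down. The names line1, line2 and the _len constants are invented for this example, and the include line assumes the INCLUDE environment variable is set as described under Setting up FASM.

```asm
include 'win32ax.inc'           ; found through the INCLUDE environment variable

.data
  line1       db 'First line',13,10
  line1_len   = $ - line1       ; 10 characters + 2 = 12 bytes
  line2       db 'Second line',13,10
  line2_len   = $ - line2       ; 11 characters + 2 = 13 bytes
  inchar      db ?
  outhandle   dd ?
  inhandle    dd ?
  numwritten  dd ?
  numread     dd ?

.code
start:
        invoke AllocConsole
        invoke GetStdHandle,STD_OUTPUT_HANDLE
        mov [outhandle],eax
        invoke GetStdHandle,STD_INPUT_HANDLE
        mov [inhandle],eax
        invoke WriteConsole,[outhandle],line1,line1_len,numwritten,0
        invoke WriteConsole,[outhandle],line2,line2_len,numwritten,0
        invoke ReadConsole,[inhandle],inchar,1,numread,0    ; wait for the enter key
        invoke ExitProcess,0
.end start
```

Appending the two codes to the string itself saves a separate call to WriteConsole for every line ending.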

Loops
This web page describes how to program loops in assembly language. You won't be able to get far with programming if you are not able to create loops in your code. There are multiple ways of creating loops. This page will describe the simplest way, which resembles a simple for loop that might be used in a higher level language.

Firstly, you need to specify how many times your code is going to loop. This is done by loading the loop count into the ECX register. Next, you start your code block with a label. This label indicates the point that your loop will return to after it has finished each iteration of the loop. Then comes the main code which you want to execute in your loop. Finally you finish your code with the LOOP instruction, giving the label name that you specified, as a parameter. Each time it is reached, LOOP subtracts one from ECX and jumps back to the label unless ECX has reached zero.
MOV ECX,100
mylabel:
;Main code block goes here.
LOOP mylabel
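
As a worked example of such a loop, the sketch below adds up the numbers from 100 down to 1. It uses the ADD instruction, which has not been introduced in this tutorial so far: ADD EAX,ECX adds the contents of ECX to EAX. The label sumloop and the constant expected_total are names made up for the example.

```asm
use32                           ; 32 bit code, so that LOOP counts with ECX

        mov eax,0               ; running total
        mov ecx,100             ; loop count
sumloop:
        add eax,ecx             ; add the current count: 100, then 99, and so on down to 1
        loop sumloop            ; ECX = ECX - 1, then jump back to sumloop unless ECX is now 0
                                ; here EAX = 5050 and ECX = 0

expected_total = 100*101/2      ; hypothetical constant: the same sum worked out at assembly time
```

Since LOOP subtracts one before testing for zero, entering the loop with ECX equal to 0 does not skip it: ECX wraps around and the body runs 2^32 times. The main code block must also leave ECX alone, or save and restore it, for the count to come out right.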