Introduction to

Assembly Language
2nd Semester SY 2009-2010
Benjie A. Pabroa
What is Assembly Language
• "High"-level languages such as BASIC,
FORTRAN, Pascal, Lisp, APL, etc. are
designed to ease the strain of
programming by providing the user with a
set of somewhat sophisticated operations
that are easily accessed.
Assembly as Low-level Language
• The lesson we derive is this: a very low-level
language might be very flexible and
efficient (in terms of speed and memory
use), but might be very difficult to program
in, since no sophisticated operations are
provided and since the programmer must
understand in detail the operation of the
computer.
• Assembly language is essentially the lowest
possible level of language.
Built-in Features
• the ability to read the values stored at
various "memory locations",
• the ability to write a new value into a
memory location,
• the ability to do integer arithmetic of limited
precision (add, subtract, multiply, divide),
• the ability to do logical operations (or, and,
not, xor),
• and the ability to "jump" to programs stored
at various locations in the computer's
memory.
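As a rough illustration of these five abilities, here is a minimal sketch in the TASM syntax used later in these slides. The variable names first_val and result and the label done are made up for this example; the program skeleton (.model, .stack, and the exit call) is explained further on.

```asm
.model small
.stack
.data
first_val dw 5          ; hypothetical word variable
result    dw ?          ; hypothetical result variable

.code
main proc
    mov ax,seg first_val
    mov ds,ax

    mov ax,first_val    ; read the value stored at a memory location
    add ax,3            ; integer arithmetic: AX = 5 + 3 = 8
    and ax,0Ch          ; logical operation: 8 AND 0Ch = 8
    mov result,ax       ; write a new value into a memory location
    jmp done            ; jump to code stored at another location
    mov result,0        ; never executed because of the jump
done:
    mov ax,4c00h        ; exit to DOS
    int 21h
main endp
end main
```

Everything else a program needs (printing, file access, and so on) has to be built from operations like these or requested from the operating system.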
Features not included
• the ability to perform graphics,
• the ability to access files,
• and the ability to directly perform floating-point
arithmetic.
Assembly vs High Level Lang
• FORTRAN code to average together the N numbers stored
in the array X(I):

      INTEGER*2 I,X(N)
      INTEGER*4 AVG
      .
      .
      .
C     AVERAGE THE ARRAY X, STORING THE RESULT AS AVG:
      AVG=0
      DO 10 I=1,N
10    AVG=AVG+X(I)
      AVG=AVG/N
      .
      .
      .
Assembly vs High Level Lang..
• The same calculation in 8086 assembly language:

        mov cx,n              ; cx is used as the loop
                              ; counter. It starts at N and
                              ; counts down to zero.
        mov dx,0              ; the dx register stores the
                              ; two most significant bytes of
                              ; the running sum
        mov ax,0              ; use ax to store the least
                              ; significant bytes
        mov si,offset x       ; use the si register to point
                              ; to the currently accessed
                              ; element X(I), starting with
                              ; the first element
addloop:
        add ax,word ptr [si]  ; add X(I) to the two least
                              ; significant bytes of AVG
        adc dx,0              ; add the "carry" into the two
                              ; most significant bytes of AVG
        add si,2              ; move si to point to X(I+1)
        loop addloop          ; decrement cx and loop again
                              ; if not zero
        div n                 ; divide the 32-bit sum in dx:ax
                              ; by N; the quotient is left in ax
        mov avg,ax            ; save the result as AVG
Assembly vs High Level Lang..
• Writing it required intimate knowledge of
how the variables x, n, and avg were
stored in memory.

PC System Architecture
• Microprocessor
◦ Reads instructions from the memory and
executes them
  – Accesses memory
  – Does arithmetic and logical operations
  – Performs other services as well
PC System Architecture..
• 1971:
◦ Intel's 4004 was the first microprocessor: a 4-bit CPU (like the one
from CS231) that fit all on one chip.
• 1978:
◦ The 8086 was one of the earliest 16-bit processors.
• 1981:
◦ IBM uses the 8088 in their little PC project.
• 1989:
◦ The 80486 includes a floating-point unit in the same chip as the main
processor, and uses RISC-based implementation ideas like pipelining
for greatly increased performance.
• 1997:
◦ The Pentium II is superscalar, supports multiprocessing, and includes
special instructions for multimedia applications.
• 2002:
◦ The Pentium 4 runs at insane clock rates (3.06 GHz), implements
extended multimedia instructions and has a large on-chip cache.
PC System Architecture..
• Memory
◦ Stores instructions (the program) or data
◦ It appears as a sequence of locations (or
addresses)
  – Each address stores one byte
◦ Types:
  – ROM
    The stored bytes may only be read by the CPU
    They cannot be changed
  – RAM
    The stored bytes may be both read and
    written (changed)
    Volatile: all data is lost after shutdown
  – Both types are random access
The Process of Assembly
• Assembly language is a compiled language
◦ The source code must first be created with a
text-editor program
◦ Then the source code is compiled
◦ Assembly language compilers are called assemblers
• Auxiliary Programs
◦ First: text editor (source code editor)
◦ Second: assembler
  – Assembles the source code and generates object
    code in the process.
◦ Third: linker
  – Combines the object code modules created by the
    assembler.
The Process of Assembly..
◦ Fourth: loader
  – Built into the operating system and never
    explicitly executed.
  – Takes the "relocatable" code created by the
    linker, "loads" it into memory at the lowest
    available location, then runs it.
◦ Fifth: debugger
  – An environment for running and testing assembly
    language programs.
The Process of Assembly..
[Diagram: Source Code -> Assembler -> Object Code; Object Code, Other Object Code 1, and Other Object Code 2 -> Linker -> Relocatable Code -> Loader -> RAM]
DOS and Simple File Operation
• DOS
◦ Provides the environment in which programs
run.
◦ Provides a set of helpful utility functions
  – These must be understood in order to create
    programs in DOS
Making an Assembly Source Code
• You can use the edit command in DOS or
just use Notepad.

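[Diagram: 8086 CPU block diagram showing the general registers AX (AH, AL), BX (BH, BL), CX (CH, CL), DX (DH, DL); SP, BP, SI, DI; the segment registers CS, DS, SS, ES; the Bus Control Unit, ALU, CU, Flag Register, and Instruction Pointer]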
CPU Registers
• Assembly language
◦ Much thought goes into the use of the computer
memory and the CPU registers
• Register
◦ Like a memory location in that it can store a
byte (or word) value.
◦ It has no address in memory; it is not part of the
computer memory (it is built into the CPU)
CPU Registers..
• Importance of Registers in Assembly Prog.
◦ Instructions using registers are preferred over
instructions operating on values stored at memory locations.
◦ Such instructions tend to be shorter (less room to
store in memory)
◦ Register-oriented instructions operate faster than
memory-oriented instructions
  – The computer hardware can access a
    register much faster than a memory location.
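To make the size difference concrete, here is a small sketch comparing a register-to-register move with a memory-to-register move. The variable name bar is only an example, and the byte counts in the comments are for the standard 8086 encodings of these two instructions.

```asm
.model small
.stack
.data
bar dw 3e1h             ; example word variable

.code
main proc
    mov ax,seg bar
    mov ds,ax

    mov bx,ax           ; register to register: 2 bytes of machine code
    mov bx,bar          ; memory to register: 4 bytes (opcode, mode byte,
                        ; and the 2-byte address of bar)

    mov ax,4c00h        ; exit to DOS
    int 21h
main endp
end main
```

Unassembling the program with the u command of debug shows the machine code bytes next to each instruction, so the two lengths can be checked directly.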

CPU Registers (8086 family)
AX    The Accumulator
BX    The Pointer Register
CX    The Loop Counter
DX    Used for multiplication and division
SI    The "Source" string index register
DI    The "Destination" string index register
BP    Used for passing arguments on the stack
SP    The stack pointer
IP    The instruction pointer
CS    The "code segment" register
DS    The "data segment" register
SS    The "stack segment" register
ES    The "extra segment" register
FLAG  The flag register
Segment Registers
CS  Code Segment   16-bit number that points to the active code segment
DS  Data Segment   16-bit number that points to the active data segment
SS  Stack Segment  16-bit number that points to the active stack segment
ES  Extra Segment  16-bit number that points to the active extra segment
Pointer Registers
IP  Instruction Pointer  16-bit number that points to the offset of the next instruction
SP  Stack Pointer        16-bit number that points to the offset that the stack is using
BP  Base Pointer         used to pass data to and from the stack
General Purpose Registers
AX  Accumulator Register  mostly used for calculations and for input/output
BX  Base Register         the only one of these four registers that can be used as a pointer (index) into memory
CX  Count Register        register used for the loop instruction
DX  Data Register         input/output and used by multiply and divide
Index Registers
SI  Source Index       used by string operations as source
DI  Destination Index  used by string operations as destination
CPU registers
◦ AX, BX, CX, & DX are more flexible than the others
  – They can be used as word registers (16-bit values)
  – Or as pairs of byte registers (8-bit values)
◦ A general purpose register can be "split" into a
high byte and a low byte
  – AX = AH (high byte) and AL (low byte)
  – BX = BH and BL
  – CX = CH and CL
  – DX = DH and DL
◦ Ex: DX = 1234h, then DH = 12h and DL = 34h
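The splitting can be tried out with a few MOV instructions. This is only a sketch; the values are arbitrary, and stepping through it with the t command in debug shows DX changing after each instruction.

```asm
.model small
.stack
.code
main proc
    mov dx,1234h        ; DX = 1234h, so DH = 12h and DL = 34h
    mov dl,0FFh         ; changes only the low byte:  DX = 12FFh
    mov dh,0            ; changes only the high byte: DX = 00FFh

    mov ax,4c00h        ; exit to DOS
    int 21h
main endp
end main
```

Writing to DL or DH never disturbs the other half, which is why the two halves can be treated as independent byte registers.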
Flag Register
• Consists of 9 status bits (flags)
• They are called flags because each can be either
◦ SET (1)
◦ NOT SET (0)
Flag Register..
Abbr.  Name             Bit no.  Description
OF     Overflow Flag    11       if set, the last signed calculation overflowed
DF     Direction Flag   10       controls the direction of string operations (clear = upwards, set = downwards)
IF     Interrupt Flag   9        if set, interrupts are enabled, else disabled
TF     Trap Flag        8        if set, the CPU works in single-step mode
SF     Sign Flag        7        if set, the result of the calculation is negative
ZF     Zero Flag        6        if set, the result of the calculation is zero
AF     Auxiliary Carry  4        carry out of the low four bits (bit 3 into bit 4), used for BCD arithmetic
PF     Parity Flag      2        set if the low byte of the result has an even number of 1 bits
CF     Carry Flag       0        holds the carry out of (or borrow into) the left-most bit after a calculation
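A short sketch of how two byte additions affect the status flags. The values are chosen only for illustration; the flag states in the comments follow from the definitions of the flags and can be checked by stepping through the program in debug.

```asm
.model small
.stack
.code
main proc
    mov al,0FFh
    add al,1            ; AL = 00h
                        ; CF=1 (carry out of bit 7), ZF=1 (result is zero),
                        ; AF=1 (carry out of bit 3), PF=1 (even parity),
                        ; SF=0, OF=0 (as signed numbers: -1 + 1 = 0)

    mov al,7Fh
    add al,1            ; AL = 80h
                        ; OF=1 (as signed numbers: 127 + 1 does not fit),
                        ; SF=1 (bit 7 is set), AF=1, CF=0, ZF=0,
                        ; PF=0 (80h has a single 1 bit)

    mov ax,4c00h        ; exit to DOS
    int 21h
main endp
end main
```

Note that MOV does not change any flags; only the ADD instructions do here.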
Test it
• Do you want to see all these registers and flags?
◦ Go to DOS
◦ Type debug
◦ Type "r"
◦ Then you'll see all the registers and some
abbreviations for the flags.
◦ Type "q" to quit again.
Memory Segmentation
• How DOS uses memory
◦ The data bus is 16 bits wide
  – It can move and store 16 bits (1 word = 2 bytes)
    at a time.
◦ When the processor stores 1 word (16 bits), it stores
the bytes in reverse order in memory (low byte first).
  – 1234h (word) ---> memory: 34h (byte) 12h (byte)
  – Memory value: 78h 56h
    Derived value: 5678h
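The byte order can be observed by defining a word and then reading it back one byte at a time. This is a sketch only; the variable name wordval is made up for the example.

```asm
.model small
.stack
.data
wordval dw 1234h            ; stored in memory as 34h, 12h

.code
main proc
    mov ax,seg wordval
    mov ds,ax

    mov al,byte ptr [wordval]    ; AL = 34h (low byte, stored first)
    mov ah,byte ptr [wordval+1]  ; AH = 12h (high byte, stored second)
                                 ; AX is now 1234h again

    mov ax,4c00h            ; exit to DOS
    int 21h
main endp
end main
```

The byte ptr override tells the assembler to read a single byte even though wordval was defined as a word.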
Memory Segmentation..
• The computer divides its memory into segments
◦ This is standard in DOS
◦ Segments are 64KB big and have a number
◦ These numbers are stored in the segment
registers (see above).
◦ The three main segments are the code, data and
stack segments
  – In a small program they overlap each other almost completely
  – Try typing d in debug
    4576:0100 -> memory address
    where 4576 is the segment number and 0100 is the offset
Memory Segmentation..
• Segments overlap
◦ The address 0000:0010 = 0001:0000
◦ This is because segments start at paragraph
boundaries
  – A paragraph = 16 bytes
  – So a segment starts at an address divisible by 16
◦ 0000:0010 => 0h:10h => 0:16
  – Memory location: (0*16)+16 = 0+16 = 16 (linear
    address)
◦ 0001:0000 => 1h:0h => 1:0
  – Memory location: (1*16)+0 = 16+0 = 16 (linear
    address)
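The same calculation (segment number times 16, plus offset) can be done by the processor itself. The sketch below works out the linear address of 4576:0100, the example address used above, and leaves the 32-bit result in DX:AX. The register choices are arbitrary.

```asm
.model small
.stack
.code
main proc
    mov ax,4576h        ; segment number
    mov bx,16           ; one paragraph = 16 bytes
    mul bx              ; DX:AX = 4576h * 16 = 0004:5760h
    add ax,0100h        ; add the offset
    adc dx,0            ; pass any carry on to the high word
                        ; DX:AX = 0004:5860h, linear address 45860h

    mov ax,4c00h        ; exit to DOS
    int 21h
main endp
end main
```

The result needs more than 16 bits, which is why two registers are used to hold it.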
My First Program
.model small
.stack

.data
message db "Hello world, I'm learning Assembly !!!", "$"

.code

main proc
    mov ax,seg message
    mov ds,ax

    mov ah,09
    lea dx,message
    int 21h

    mov ax,4c00h
    int 21h
main endp

end main
Names
• Identifiers
◦ An identifier is a name you apply to items in
your program. The two types of identifiers are
"name", which refers to the address of a data
item, and "label", which refers to the address
of an instruction. The same rules apply to
names and labels.

• Statements
◦ A program is made of a set of statements. There
are two types of statements: "instructions"
such as MOV and LEA, and "directives", which
tell the assembler to perform a specific action,
like ".model small" or ".code".
Statements
• Here's the general format of a statement:

identifier - operation - operand(s) - comment

◦ The identifier is the name as explained above.
◦ The operation is an instruction like MOV.
◦ The operands provide information for the
operation to act on.
◦ For example:
MOV (operation) AX,BX (operands).
◦ The comment is a line of text you can add as a
remark; everything the assembler sees after
a ";" is ignored.

Statements..
• Example
◦ MOV AX,BX ;this is a MOV instruction

How to Assemble
• The source code must be assembled by
an assembler and then linked by a linker.
◦ A86
◦ MASM
◦ TASM – we will use this one
• Install TASM
• Then use tasm.exe and tlink.exe
How to Assemble..
• To assemble
– Type the following at the
command prompt:
• cd c:\tasm\bin
• tasm <filename/path of the source code>
– tasm c:\first.asm
• tlink <filename/path of the object code>
– tlink c:\tasm\bin\first.obj or
– tlink first.obj
– To run, call the .exe at the command
prompt.
• Example: our program (First.asm)
.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"

.code

main proc
mov ax,seg message
mov ds,ax

mov ah,09
lea dx,message
int 21h

mov ax,4c00h
int 21h
main endp
end main
Dissecting Code
• .model small
◦ Lines that start with a "." are used to provide the assembler
with information.
◦ The word(s) after the "." say what kind of information.
  – In this case it just tells the assembler that the program is small
    and doesn't need a lot of memory. I'll get back to this later.
• .stack
◦ This one tells the assembler that the "stack" segment starts
here.
  – The stack is used to store temporary data.

• .data
◦ indicates that the data segment starts here and that the stack
segment ends there.
Dissecting Code..
• .code
◦ indicates that the code segment starts there and the data
segment ends there.

• main proc
◦ Code must be in procedures, just like in C or any other language.
◦ This indicates that a procedure called main starts here.
◦ endp states that the procedure is finished.
◦ end main tells the assembler that the program is finished.
◦ It also tells the assembler where to start in the program:
at the procedure called main in this case.

• message db "xxxx"
◦ DB means Define Byte, and that is what it does.
◦ In the data segment it defines a couple of bytes.
◦ These bytes contain the information between the quotes.
◦ "message" is a name to identify this byte-string.
◦ It's called an "identifier".

• Memory space for variables
◦ DB (Byte – 8 bit)
◦ DW (Word – 16 bit)
◦ DD (Doubleword – 32 bit)
◦ Example:
    foo db 27            ; by default all numbers are decimal
    bar dw 3e1h          ; appending an "h" means hexadecimal
    real_fat_rat dd ?    ; "?" means "don't care about the value"
◦ Variable name
  – Its address can't be changed
  – Its value can be changed
Dissecting Code..
• mov ax, seg message
◦ AX is a register.
  – You use registers all the time, so that's why you had to know
    about them before.
◦ MOV is an instruction that moves data.
  – It takes two "operands".
  – Here the operands are AX and seg message.
◦ seg message can be seen as a number.
  – It's the number of the segment that "message" is in (the data segment).
  – We have to know this number, so we can load the DS register
    with it.
  – Otherwise we can't get to the byte-string in memory.
  – We need to know WHERE the byte-string is located in memory.
◦ The number is loaded into the AX register.
  – MOV always moves data to the operand left of the comma and
    from the operand right of the comma.
The MOV Instruction
• Syntax:
◦ MOV destination, source
• Allows you to move data into and out of the
registers
◦ Destination
  – can be either a register or a memory location
◦ Source
  – can be either a register, a memory location, or a
    numeric value
• Memory-to-memory transfer is NOT ALLOWED
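Because a single MOV cannot have two memory operands, copying one variable to another takes two instructions with a register in between. A minimal sketch, with the made-up variable names src_val and dst_val:

```asm
.model small
.stack
.data
src_val dw 3e1h         ; hypothetical source variable
dst_val dw ?            ; hypothetical destination variable

.code
main proc
    mov ax,seg src_val
    mov ds,ax

    ; mov dst_val,src_val   ; rejected by the assembler: memory to memory
    mov ax,src_val      ; step 1: memory -> register
    mov dst_val,ax      ; step 2: register -> memory

    mov ax,4c00h        ; exit to DOS
    int 21h
main endp
end main
```

Any free word register would do in place of AX; it is used here only as a temporary holding place.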
The MOV Instruction..
• The definitions we made earlier:
    foo db 27            ; by default all numbers are decimal
    bar dw 3e1h          ; appending an "h" means hexadecimal
    real_fat_rat dd ?    ; "?" means "don't care about the value"

• Notice the size of the source and destination
(they must match in reg-reg, mem-reg, and reg-mem transfers):
    mov ax,bar    ; load the word-size register ax with
                  ; the word value stored at location bar.
    mov dl,foo    ; load the byte-size register dl with
                  ; the byte value stored at location foo.
    mov bx,ax     ; load the word-size register bx with
                  ; the word value in ax.
    mov bl,ch     ; load the byte-size register bl with
                  ; the byte value in ch.
    mov bar,si    ; store the value in the word-size
                  ; register si at the memory location
                  ; labelled "bar".
    mov foo,dh    ; store the byte value in the register
                  ; dh at memory location foo.

• A constant must be consistent with the destination:
    mov ax,5      ; store the word 5 in the ax register.
    mov al,5      ; store the byte 5 in the al register.
    mov bar,5     ; store the word 5 at location bar.
    mov foo,5     ; store the byte 5 at location foo.

Illegal Move Statement
◦ MOV AL, 3172
◦ MOV foo, 3172

• Why are the statements above illegal?



Dissecting Code..

• mov ds,ax
◦ Here it moves the number in the AX register (the number of
the data segment) into the DS register.
◦ We have to load the DS register this way (with two
instructions).
◦ Just typing "mov ds,seg message" isn't possible.

• mov ah, 09
◦ MOV again. This time it loads the AH register with the constant
value nine.

• lea dx, message
◦ LEA - Load Effective Address.
  – This instruction stores the offset within the data segment of the
    byte-string message into the DX register.
  – This offset is the second thing we need to know when we want to
    know where "message" is in memory.
  – So now we have DS:DX.

Dissecting Code..
• int 21h
◦ This instruction causes an interrupt.
◦ The processor calls a routine somewhere in memory.
◦ 21h tells the processor what kind of routine, in this case a DOS
routine.
◦ For now assume that INT just calls a procedure from DOS.
◦ The procedure looks at the AH register to find out what it has to do.
◦ In this example the value 9 in the AH register indicates that the
procedure should write the "$"-terminated string at DS:DX to the screen.

• mov ax, 4c00h
◦ Loads the AX register with the constant value 4c00h.

• int 21h
◦ This time the AH register contains the value 4ch (AX=4c00h), and to
the DOS procedure that means "exit program".
◦ The value of AL is used as an "exit code"; 00h means "no error".
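Other values in AH select other DOS services. As one more illustration, the sketch below uses DOS function 02h, which writes the single character held in DL to the screen. The choice of the letter 'A' is arbitrary.

```asm
.model small
.stack
.code
main proc
    mov ah,02h          ; DOS function 02h: write the character in DL
    mov dl,'A'          ; the character to print
    int 21h             ; call DOS

    mov ax,4c00h        ; DOS function 4ch: exit, with exit code 00h in AL
    int 21h
main endp
end main
```

No data segment is needed here because the character is passed in a register rather than read from memory.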

 After running:
◦ Go to DOS and type "debug first.exe" to load the program into debug.
◦ Type d -> displays some memory
◦ Type u -> you will see something like
    0F77:0000 B8790F MOV AX,0F79
    0F77:0003 8ED8   MOV DS,AX
    0F77:0005 B409   MOV AH,09
◦ Each line shows the segment number and offset (0F77:0000),
the machine code (B8790F), and the instruction (MOV AX,0F79).
◦ 0F77:0000 B8790F MOV AX,0F79
  – originally: mov ax, seg message
  – B8 -> mov ax
  – 790F -> the number 0F79h, stored low byte first
  – It means that the data is stored in the segment with number 0F79.

• The other instruction, lea dx,message,
turned into mov dx,0.
◦ So that means that the offset of the byte-string is
0 --> 0F79:0000.
◦ Try typing d 0F79:0000

◦ Calculating another address for the same location
  – Subtract 2 from the segment number: 0F79 - 2 = 0F77
  – 2 segment numbers = 2 paragraphs = 32 bytes = 20h (0002:0000 = 0000:0020)
  – The other address is 0F77:0020


The Stack
• The stack is a place where data is
temporarily stored.
• The SS and SP registers point to that place
like this: SS:SP
◦ So the SS register is the segment and the SP
register contains the offset.
• There are a few instructions that make use
of the stack:
◦ PUSH - push a value on the stack
◦ POP - retrieve that value from the stack
The Stack..
    MOV AX,1234H
    PUSH AX
    MOV AH,09
    INT 21H
    POP AX
◦ The final value of AX will be 1234h.
  – First we load 1234h into AX,
  – then we push that value to the stack.
  – We now store 9 in AH, so AX will be 0934h,
  – and execute an INT.
  – Then we pop the AX register:
  – we retrieve the pushed value from the stack.
  – So AX contains 1234h again.
The Stack..
    MOV AX, 1234H
    MOV BX, 5678H
    PUSH AX
    POP BX
◦ We pushed AX to the stack
◦ and we popped that value into BX.

◦ What are the final values of AX and BX?
The Stack..
• A stack is easily set up with the .stack directive, which
by default creates a stack of 1024 bytes.
• The stack uses a LIFO system (Last In First
Out).
The Stack..
    MOV AX,1234H
    MOV BX,5678H
    PUSH AX
    PUSH BX
    POP AX
    POP BX
◦ First the value 1234h was pushed; after that the
value 5678h was pushed to the stack.
◦ According to LIFO, 5678h comes off first, so AX will
pop that value and BX will pop the next.
◦ What are the values of AX and BX?
How does the stack look in memory?
• It "grows" downwards in memory.
• When you push a word (2 bytes), for
example, SP is first decreased by two and
the word is then stored at SS:SP.
• So in the beginning SP points to the top of
the stack and (if you don't pay attention) the
stack can grow so far downwards in memory
that it overwrites the program's code or data.
• A major system crash is the result.
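The downward growth can be made visible by comparing SP before and after a PUSH. This is a sketch under the assumption that nothing else touches the stack in between; the register choices are arbitrary.

```asm
.model small
.stack
.code
main proc
    mov bx,sp           ; remember SP before the push
    mov ax,1234h
    push ax             ; SP is decreased by 2, then 1234h is stored at SS:SP
    mov cx,bx
    sub cx,sp           ; CX = old SP - new SP = 2
    pop ax              ; SP is increased by 2 again; AX = 1234h

    mov ax,4c00h        ; exit to DOS
    int 21h
main endp
end main
```

Every PUSH needs a matching POP; otherwise SP keeps moving down with each pass through the code.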
Congratulations!!
• If you fully understand this stuff (registers,
flags, segments, stack, names, etc.) you
may, from now on, call yourself a

• "Level 0 Assembly Coder"
