
Registers

In 16-bit mode, such as provided by the Pentium processor when operating as a
Virtual 8086 (this is the mode used when Windows 95 displays a DOS prompt), the
processor provides the programmer with 14 internal registers, each 16 bits wide.
They are grouped into several categories as follows:

• Four general-purpose registers, AX, BX, CX, and DX. Each of these is a
combination of two 8-bit registers which are separately accessible as AL, BL,
CL, DL (the "low" bytes) and AH, BH, CH, and DH (the "high" bytes). For
example, if AX contains the 16-bit number 1234h, then AL contains 34h and
AH contains 12h.
• Four special-purpose registers, SP, BP, SI, and DI.
• Four segment registers, CS, DS, ES, and SS.
• The instruction pointer, IP (sometimes referred to as the program counter).
• The status flag register, FLAGS.
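The AX example in the list above can be checked with a few lines of C. This is only a sketch of the arithmetic; the helper names (high_byte, low_byte, join_bytes) are made up for illustration and do not come from any library.

```c
#include <stdint.h>

/* Hypothetical helpers: split a 16-bit register value into its two halves,
   the way AX is visible as AH (high byte) and AL (low byte). */
uint8_t high_byte(uint16_t reg) { return (uint8_t)(reg >> 8); }
uint8_t low_byte(uint16_t reg)  { return (uint8_t)(reg & 0xFFu); }

/* Rebuild the 16-bit value from its halves (AH:AL -> AX). */
uint16_t join_bytes(uint8_t high, uint8_t low) {
    return (uint16_t)(((uint16_t)high << 8) | low);
}
```

With AX = 1234h, high_byte gives 12h and low_byte gives 34h, matching the example in the list.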

Although I refer to the first four registers as "general-purpose", each of them is designed to play a
particular role in common use:

• AX is the "accumulator"; some of the operations, such as MUL and DIV,
require that one of the operands be in the accumulator. Some other
operations, such as ADD and SUB, may be applied to any of the registers (that
is, any of the eight general- and special-purpose registers) but are more
efficient when working with the accumulator.
• BX is the "base" register; it is the only general-purpose register which may
be used for indirect addressing. For example, the instruction MOV [BX], AX
causes the contents of AX to be stored in the memory location whose address
is given in BX.
• CX is the "count" register. The looping instructions (LOOP, LOOPE, and
LOOPNE) and the string instructions (with the prefixes REP, REPE, and REPNE)
use CX to determine how many times they will repeat; the shift and rotate
instructions (RCL, RCR, ROL, ROR, SHL, SHR, and SAR) take their count from its
low byte, CL.
• DX is the "data" register; it is used together with AX for the word-size MUL
and DIV operations, and it can also hold the port number for the IN and OUT
instructions, but it is mostly available as a convenient place to store data, as
are all of the other general-purpose registers.

Here are brief descriptions of the four special-purpose registers:

• SP is the stack pointer, indicating the current position of the top of the stack.
You should generally never modify this directly, since the subroutine and
interrupt call-and-return mechanisms depend on the contents of the stack.
• BP is the base pointer, which can be used for indirect addressing similar to
BX.
• SI is the source index, used as a pointer to the current character being read
in a string instruction (LODS, MOVS, or CMPS). It is also available as an offset to
add to BX or BP when doing indirect addressing; for example, the instruction
MOV [BX+SI], AX copies the contents of AX into the memory location whose
address is the sum of the contents of BX and SI.
• DI is the destination index, used as a pointer to the current character being
written or compared in a string instruction (MOVS, STOS, CMPS, or SCAS). It is
also available as an offset, just like SI.

Since all of these registers are 16 bits wide, they can only contain addresses for memory within a range of
64K (=2^16) bytes. To support machines with more than 64K of physical memory, Intel implemented the
concept of segmented memory. At any given time, a 16-bit address will be interpreted as an offset within
a 64K segment determined by one of the four segment registers (CS, DS, ES, and SS).

As an example, in the instruction MOV [BX], AX mentioned above, the BX register really provides the
offset of a location in the current data segment; to find the true physical address into which the contents of
the accumulator will be stored, you have to add the value in BX to the address of the start of the data
segment. This segment start address is determined by taking the 16-bit number in DS and multiplying by
16. Therefore, if DS contains 1234h and BX contains 0017h, then the physical address will be 1234h
× 16 + 0017h = 12340h + 0017h = 12357h. (This computation illustrates one reason why hexadecimal is
so useful; multiplication by 16 corresponds to shifting the hex digits left one place and appending a zero.)
We refer to this combined address as 1234:0017 or, more generally, as DS:BX.
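The segment-and-offset computation above can be sketched in C as follows. The function name physical_address is an assumption of this sketch; the masking to 20 bits models the 8086's 20-bit address range.

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
   The sum is masked to 20 bits, so addresses past 1M wrap around to zero. */
uint32_t physical_address(uint16_t segment, uint16_t offset) {
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

Here physical_address(0x1234, 0x0017) yields 12357h, as in the worked example. Note that 1235:0007 names the same byte: many different segment:offset pairs map to one physical address.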

Since segment starts are computed by multiplying a 16-bit number by 16=2^4, the effect is that physical
addresses have a 20-bit range, so that a total of 1M (=2^20) of memory may be used. Intel considered that
this would be enough for applications of the 8086 over its projected lifetime of about five years from its
introduction in 1978; by the time microcomputers needed more than a meg of main memory, the
next Intel processor (the iAPX432) was due to be available, with a 32-bit address space (able to address
4G, or over 4 billion memory locations). However, the IBM PC's debut in 1981 and subsequent popularity
have forced Intel to continue the 80x86 family of backward-compatible processors to the present, including
support for a mode in which only 1M of memory is accessible. Processors since the 80286 have also
provided the "protected" mode of operation, which in the Pentium gives each process a flat 32-bit address
space of up to 4G.

You might think that a segment register would only need to provide the uppermost 4 bits to extend an
address out to 20 bits, but consider one of the implications of only having 16 different, non-overlapping
segments: every segment would have to occupy a full 64K of memory, even if only a small fraction of
this space were needed. By allowing a segment to start at any address divisible by 16, the memory may be
allocated much more efficiently: if one program only needs 4K for its code segment, then theoretically
the operating system could load another program into a segment starting just 4K above the start of the
first. Of course, MS-DOS is not really this sophisticated, but the Intel designers wanted it to be possible.

Each segment register has its own special uses:

• CS determines the "code" segment; this is where the executable code of a
program is located. It is not directly modifiable by the programmer, except by
executing one of the branching instructions. One of the reasons for
separating the code segment from other segments is that well-behaved
programs never modify their code while executing; therefore, the code
segment can be identified as "read-only". This simplifies the work of a cache,
since no effort is required to maintain consistency between the cache and
main memory. It also permits several instances of a single program to run at
once (in a multitasking operating system), all sharing the same code segment
in memory; each instance has its own data and stack segments where the
information specific to the instance is kept. Picture multiple windows, each
running Word on a different document; each one needs its own data segment
to store its document, but they can all execute the same loaded copy of
Word.
• DS determines the "data" segment; it is the default segment for most
memory accesses.
• ES determines the "extra" segment; it can be used instead of DS when data
from two segments need to be accessed at once. In particular, the DI register
gives an offset relative to ES when used in the string instructions; for
example, the MOVSB instruction copies a byte from DS:SI to ES:DI (and also
causes SI and DI to be incremented or decremented, ready to copy the next
byte).
• SS determines the "stack" segment; the stack pointer SP gives the offset of
the current top-of-stack within the stack segment. The BP register also gives
an offset relative to the stack segment by default, for convenient access to
data further down in the stack without having to modify SP. Just as with SP,
you should not modify SS unless you know exactly what you are doing.

The instruction pointer, IP, gives the address of the next instruction to be executed, relative to the code
segment. The only way to modify this is with a branch instruction.

The status register, FLAGS, is a collection of 1-bit values which reflect the current state of the processor
and the results of recent operations. Nine of the sixteen bits are used in the 8086:

• Carry (bit 0): set if the last arithmetic operation ended with a leftover carry
bit coming off the left end of the result. This signals an overflow on unsigned
numbers.
• Parity (bit 2): set if the low-order byte of the last data operation contained an
even number of 1 bits (that is, it signals an even parity condition).
• Auxiliary Carry (bit 4): used when working with binary coded decimal (BCD)
numbers.
• Zero (bit 6): set if the last computation had a zero result. After a comparison
(CMP, CMPS, or SCAS), this indicates that the values compared were equal
(since their difference was zero).
• Sign (bit 7): set if the last computation had a negative result (a 1 in the
leftmost bit).
• Trace (bit 8): when set, this puts the CPU into single-step mode, as used by
debuggers.
• Interrupt (bit 9): when set, interrupts are enabled. This bit should be cleared
while the processor is executing a critical section of code that should not be
interrupted (for example, when processing another interrupt).
• Direction (bit 10): when clear, the string operations move from low addresses
to high (the SI and DI registers are incremented after each character). When
set, the direction is reversed (SI and DI are decremented).
• Overflow (bit 11): set if the last arithmetic operation caused a signed
overflow (for example, after adding 0001h to 7FFFh, resulting in 8000h; read
as two's complement numbers, this corresponds to adding 1 to 32767 and
ending up with -32768).
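To make the flag definitions above concrete, here is a minimal C sketch of how a 16-bit ADD could derive the Carry, Parity, Zero, Sign, and Overflow flags. The struct and function names are invented for this illustration, and the Auxiliary Carry, Trace, Interrupt, and Direction flags are deliberately left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative result type; the field names are assumptions of this sketch. */
typedef struct {
    uint16_t result;
    bool carry, parity, zero, sign, overflow;
} Add16;

/* Models the result and arithmetic flags of a 16-bit ADD. */
Add16 add16(uint16_t a, uint16_t b) {
    Add16 out;
    uint32_t wide = (uint32_t)a + b;            /* keep the 17th bit */
    out.result = (uint16_t)wide;
    out.carry = wide > 0xFFFFu;                 /* unsigned overflow */
    out.zero = (out.result == 0);
    out.sign = (out.result & 0x8000u) != 0;     /* leftmost bit */
    /* Signed overflow: both operands share a sign that the result lacks. */
    out.overflow = ((~(a ^ b)) & (a ^ out.result) & 0x8000u) != 0;
    /* Parity looks only at the low-order byte: set when it has an even
       number of 1 bits. */
    int ones = 0;
    for (int bit = 0; bit < 8; bit++) {
        ones += (out.result >> bit) & 1;
    }
    out.parity = (ones % 2 == 0);
    return out;
}
```

For the example in the list, add16(0x7FFF, 0x0001) produces 8000h with Overflow and Sign set and Carry clear, whereas add16(0xFFFF, 0x0001) produces 0000h with Carry and Zero set and Overflow clear.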

There are numerous operations that will test and manipulate various of these flags,
but to get the contents of the entire FLAGS register one has to push the flags onto
the stack (with PUSHF or by calling an appropriate interrupt handler with INT) and
then pop them off into another register. To set the entire FLAGS register, the
sequence is reversed (with POPF or IRET). For example, one way to set the carry
flag (there are much better ways, including the STC instruction) is the following:

PUSHF
POP AX
OR AX, 1
PUSH AX
POPF
Most of the time you will not have to deal with the FLAGS register explicitly; instead,
you will execute one of the conditional branch instructions, Jcc, where cc is one of
the following mnemonic condition codes:

• O, Overflow
• NO, Not Overflow
• B, Below; C, Carry; NAE, Not Above or Equal
• NB, Not Below; NC, Not Carry; AE, Above or Equal
• E, Equal; Z, Zero
• NE, Not Equal; NZ, Not Zero
• BE, Below or Equal; NA, Not Above (true if either Carry or Zero is set)
• NBE, Not Below or Equal; A, Above
• S, Sign
• NS, Not Sign
• P, Parity; PE, Parity Even
• NP, Not Parity; PO, Parity Odd
• L, Less; NGE, Not Greater or Equal (true if Sign and Overflow are different)
• NL, Not Less; GE, Greater or Equal
• LE, Less or Equal; NG, Not Greater (true if Sign and Overflow are different, or
Zero is set)
• NLE, Not Less or Equal; G, Greater

All of the conditions on the same line are synonyms. The Above and Below
conditions refer to comparisons of unsigned numbers, and the Less and Greater
conditions refer to comparisons of signed (two's complement) numbers.
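The difference between the unsigned (Below/Above) and signed (Less/Greater) conditions can be sketched in C. The code below models the flags left by a 16-bit CMP and evaluates four of the condition codes from them; all names are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* The four flags the condition codes above depend on. */
typedef struct { bool carry, zero, sign, overflow; } CmpFlags;

/* Flags as left by CMP dest, src: computes dest - src, discards the result. */
CmpFlags cmp16(uint16_t dest, uint16_t src) {
    uint16_t diff = (uint16_t)(dest - src);
    CmpFlags f;
    f.carry = dest < src;                        /* a borrow was needed */
    f.zero = (diff == 0);
    f.sign = (diff & 0x8000u) != 0;
    /* Signed overflow: operands differ in sign and the result's sign
       differs from dest's. */
    f.overflow = ((dest ^ src) & (dest ^ diff) & 0x8000u) != 0;
    return f;
}

bool cond_below(CmpFlags f)          { return f.carry; }                      /* B  */
bool cond_below_or_equal(CmpFlags f) { return f.carry || f.zero; }            /* BE */
bool cond_less(CmpFlags f)           { return f.sign != f.overflow; }         /* L  */
bool cond_less_or_equal(CmpFlags f)  { return f.zero || f.sign != f.overflow; } /* LE */
```

Comparing FFFFh with 0001h shows the distinction: as unsigned numbers 65535 is not below 1, but as two's complement numbers -1 is less than 1, so cond_below is false while cond_less is true.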

Addressing Modes

Operands may be specified in one of three basic forms: immediate, register and
memory.
An immediate operand is just a number (or a label, which the assembler converts to the corresponding
address). An immediate operand is used to specify a constant for one of the arithmetic or logical
operations, or to give the jump address for a branching instruction. Most assemblers, including NASM,
allow simple arithmetic expressions when computing immediate operands. For example, all of the
following are equivalent:

MOV AL, 13
MOV AL, 0xD
MOV AL, 0Ah + 3 ;Note leading 0 to distinguish from register AH
MOV AL, George * 2 - 1
assuming that the label George is associated with the address 7.

A register operand is one of the eight general- and special-purpose 16-bit registers listed above, or one of
the eight general-purpose 8-bit registers (AL, AH, ...), or one of the four segment registers. The contents
of the register are used and/or modified by the operation. In the example above, the destination operand of
the MOV instruction is the low byte of the accumulator, AL; the effect of the instruction is to store the
binary number 00001101 into the bottom eight bits of AX (leaving the other bits unchanged).

A memory operand gives the address of a location in main memory to use in the operation. The NASM
syntax for this is very simple: put the address in square brackets. The address can be given as an
arithmetic expression involving constants and labels (the displacement), plus an optional base or index
register. Here are some examples:

MOV DX, [1234h]
ADD DX, [BX + 8]
MOV [BP + SI], DL
INC BYTE [CS:DI + 0x100]
A few comments are needed on these examples. In the third example, the
address is given by the sum of the contents of BP and SI; you can imagine that
there is a default displacement of zero here, so the address is 0 + BP + SI. In the
first example, the destination is 16 bits wide, so a 16-bit quantity will be fetched
from two adjacent memory locations: DL will be loaded with the byte from 1234h,
and DH will be loaded from 1235h. The same will happen in the second example: DL
will have the contents of address BX + 8 added to it, and DH will have the contents
of address BX + 9 added (plus any carry from the low byte). In the third example,
the source is only 8 bits wide, so only the byte at address BP + SI will be changed.
Finally, in the fourth example, the INC operation by itself is ambiguous about
whether it is incrementing a single byte or a full 16-bit word; the keyword BYTE in
front of the operand determines that only the byte at the address 100h + DI will be
affected (the alternative would be to use the keyword WORD, to add 1 to the
combination of bytes at 100h + DI and 101h + DI).

All of these addresses are really offsets into a particular segment. In the fourth example, the code segment
is explicitly called for by the segment override CS:. The default segment is the data segment, except when
the base register BP is involved, in which case the stack segment is used (as in the third example).

As one more example of a memory operand, consider the following sequence of instructions:
MOV BX, 100h
MOV SI, 20h
MOV AL, [BX + SI + 3]
The effective address of the third move instruction is computed by adding the
contents of the BX and SI registers, plus the constant 3; therefore, the byte that is
moved into AL comes from address 0123h (interpreted as an offset within the data
segment).
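The effective-address calculation in this example is plain 16-bit addition, which a short C sketch can mirror. The function name is an assumption of this sketch; the cast models the fact that the offset wraps around within the 64K segment.

```c
#include <stdint.h>

/* Offset for a memory operand of the form [base + index + displacement].
   The sum is truncated to 16 bits, so it wraps modulo 64K. */
uint16_t effective_address(uint16_t base, uint16_t index, uint16_t displacement) {
    return (uint16_t)(base + index + displacement);
}
```

With BX = 100h, SI = 20h, and a displacement of 3, the result is 0123h, as stated above.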

Instructions

Here are the most important instructions (in my opinion) that have been available
on all Intel processors since the 8086. Different assemblers may have minor
variations in how these instructions are represented in assembly code; I give the
NASM form here. Throughout this section, when specifying the valid forms of
operands, I will write reg8 to stand for any 8-bit register, reg16 for any of the eight
general- and special-purpose 16-bit registers, mem8 for a memory reference to a
single byte, mem16 for a memory reference to a word (with the low-order byte at the
given address), imm8 for an 8-bit immediate value, and imm16 for a 16-bit immediate
value. If an operand may be either a register or memory reference, I will write r/m8
or r/m16; if it may also be an immediate value, then I will write r/m/i8 or r/m/i16.
A segment register as an operand will be written segreg.

Data Movement Instructions

The fundamental data movement operation is MOV dest, source, which copies a byte or a word from the
source location to the destination. In general, either the source or the destination must be a register (you
can't copy directly from one memory location to another with MOV); the only exception is that an
immediate value may be moved straight to memory (however, there is no way to put an immediate value
into a segment register in one operation). Here are the accepted forms:

MOV reg8, r/m/i8
MOV mem8, reg8
MOV mem8, BYTE imm8

MOV reg16, r/m/i16
MOV mem16, reg16
MOV mem16, WORD imm16

MOV r/m16, segreg
MOV segreg, r/m16
The CS segment register may not be used as a destination (you wouldn't want to do
this anyway, since it would change where the next instruction comes from; to get
this effect, you need to use a proper flow control instruction such as JMP).
To perform a swap of two locations instead of a one-way copy, there is also an exchange operation:

XCHG reg8, r/m8
XCHG reg16, r/m16
As a special case of this that does nothing except occupy space and take up
processor time, the instruction to exchange the accumulator with itself (XCHG AX,
AX) is given the special "no-operation" mnemonic:

NOP
For the special purpose of copying a far pointer (that is, a pointer that includes a
segment address, so that it can refer to a location outside the current segment)
from memory into registers, there are the LDS and LES instructions. Here are the
accepted forms:

LDS reg16, mem32
LES reg16, mem32
For example, the instruction LDS SI, [200h] is equivalent to the pair of
instructions MOV SI, [200h] and MOV DS, [202h]. The 8086 only supports loading
the pointer into the DS or ES segment register.

An operation that is frequently useful when setting up pointers is to load the "effective address" of a
memory reference. That is, this instruction does the displacement plus base plus index calculation, but
just stores the resulting address in the destination register, rather than actually fetching the data from the
address. Here is the only form allowed on the 8086:

LEA reg16, mem


To push and pop data from the stack, the 8086 provides the following instructions.
The top of stack is located at offset SP within the stack segment, so PUSH AX, for
example, is equivalent to SUB SP, 2 (recall that the stack grows downward)
followed by MOV [SS:SP], AX (except that [SS:SP] isn't a valid form of memory
reference).

PUSH r/m16
PUSH segreg

POP r/m16
POP segreg
As with MOV, you are not allowed to POP into the CS register (although you may PUSH
CS).

The instructions to push and pop the FLAGS register (as mentioned earlier) take no operands; both have
been part of the instruction set since the original 8086:

PUSHF
POPF
Here are the other ways of reading or modifying the FLAGS register (apart from
setting flags as the result of an arithmetic operation, or testing them with a
conditional branch, of course). The Carry, Direction, and Interrupt Enable flags may
be cleared and set:

CLC
CLD
CLI
STC
STD
STI
The Carry flag may also be complemented, or "toggled" between 0 and 1:

CMC
Finally, the bottom eight bits of the FLAGS register (containing the Carry, Parity,
Auxiliary Carry, Zero, and Sign flags, as described above) may be transferred to and
from the AH register:

LAHF
SAHF

Arithmetic and Logical Instructions

All of the two-operand arithmetic and logical instructions offer the same range of
addressing modes. For example, here are the valid forms of the ADD operation:

ADD reg8, r/m/i8


ADD mem8, reg8
ADD mem8, BYTE imm8

ADD reg16, r/m/i16


ADD mem16, reg16
ADD mem16, WORD imm16
Just as with the MOV instruction, the first operand is the destination and the second
is the source; the result of performing the operation on the two operands is stored
in the destination (if it gets stored anywhere). Unlike MOV, most of these instructions
also set or clear the appropriate status flags to reflect the result of the operation
(for some of the instructions, this is their only effect).

To add two numbers, use the ADD instruction. To continue adding further bytes or words of a multi-part
number, use the ADC instruction to also add one if the Carry flag is set (indicating a carry-over from the
previous byte or word). For example, to add the 32-bit immediate value 12345678h to the 32-bit double
word stored at location 500h, do ADD [500h], WORD 5678h followed by ADC [502h], WORD 1234h.
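The ADD/ADC pairing can be sketched in C by splitting a 32-bit value into two 16-bit words and passing the carry along by hand. The function name is hypothetical, and the code models only the arithmetic, not the memory access.

```c
#include <stdint.h>

/* Adds two 32-bit values one 16-bit word at a time, the way ADD on the
   low word followed by ADC on the high word would. */
uint32_t add32_by_words(uint32_t value, uint32_t addend) {
    uint16_t low = (uint16_t)value;
    uint16_t high = (uint16_t)(value >> 16);

    uint32_t low_sum = (uint32_t)low + (uint16_t)addend;   /* ADD low word */
    unsigned carry = low_sum > 0xFFFFu;                    /* Carry flag */
    uint16_t new_high =
        (uint16_t)(high + (uint16_t)(addend >> 16) + carry); /* ADC high word */

    return ((uint32_t)new_high << 16) | (uint16_t)low_sum;
}
```

For instance, if the double word at 500h held 0001F000h, adding 12345678h this way carries out of the low word and gives 12364678h, the same as an ordinary 32-bit addition.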

Subtraction is analogous: use the SUB instruction to subtract a single pair of bytes or words, and then use
the SBB ("Subtract with Borrow") instruction to take the Carry into account for further bytes or words.
An important use of subtraction is in comparing two numbers; in this case, we are not interested in the
exact value of their difference, only in whether it is zero or negative, or whether there was a carry or
overflow. The CMP ("Compare") instruction performs this task; it subtracts the source from the destination
and adjusts the status flags accordingly, but throws away the result. This is exactly what is needed to get
conditions such as LE to work; after doing CMP AX, 10, for example, the status flags will be set in such a
way that the LE condition is true precisely when the value in AX (treated as a signed integer) is less than
or equal to 10.

The two-operand logical instructions are AND, OR, XOR, and TEST. The first three perform the expected
bitwise operations; for example, the nth bit of the destination after the AND operation will be 1 (set, true) if
the nth bit of both the source and the destination were 1 before the operation, otherwise it will be 0 (clear,
false). The TEST instruction is to AND as CMP is to SUB; it performs a bitwise and operation, but the result is
only reflected in the flags. For example, after the instruction TEST [321h], BYTE 12h, the Zero flag will
be set if neither bit 1 nor bit 4 (12h is 00010010 in binary, indicating that bits 1 and 4 are to be tested) of
the byte at address 321h were 1, otherwise it will be clear.

Multiplication and division are also binary operations, but the corresponding instructions on the 8086
only allow one of the operands to be specified (and it can only be a register or memory reference, not an
immediate value). The other operand is implicitly contained in the accumulator (and sometimes also the
DX register). The MUL and DIV instructions operate on unsigned numbers, while IMUL and IDIV operate on
two's-complement signed numbers. Here are the valid forms for MUL; the others are analogous:

MUL reg8
MUL BYTE mem8

MUL reg16
MUL WORD mem16
For 8-bit multiplication, the quantity in AL is multiplied by the given operand and
the 16-bit result is placed in AX. For 16-bit multiplication, the 32-bit product of AX
and the operand is split, with the low word in AX and the high word in DX. In both
cases, if the result spills into the high-order byte/word, then the Carry and Overflow
flags will be set, otherwise they will be clear. The other flags will have garbage in
them; in particular, you will not get correct information from the Zero or Sign flags
(if you want that information, follow the multiplication with CMP AX, 0, for
example).

For division, the process is reversed. An 8-bit operand will be divided into the number in AX, with the
quotient stored in AL and the remainder left in AH. A 16-bit operand will be divided into the 32-bit
quantity whose high word is in DX and whose low word is in AX; the quotient will be in AX and the
remainder will be in DX after the operation. None of the status flags are defined after a division. Also, if
the division results in an error (division by zero, or a quotient that is too large), the processor will trigger
interrupt zero (as if it had executed INT 0).
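The register conventions for word-size MUL and DIV can be modeled in C as below. This is a sketch under stated assumptions: the type and function names are invented, and the ok field merely stands in for the cases where the processor would trigger interrupt zero.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint16_t ax, dx; bool carry_overflow; } MulResult;
typedef struct { uint16_t quotient, remainder; bool ok; } DivResult;

/* MUL reg16: DX:AX = AX * operand (unsigned). */
MulResult mul16(uint16_t ax, uint16_t operand) {
    uint32_t product = (uint32_t)ax * operand;
    MulResult r;
    r.ax = (uint16_t)product;              /* low word */
    r.dx = (uint16_t)(product >> 16);      /* high word */
    r.carry_overflow = (r.dx != 0);        /* result spilled into DX */
    return r;
}

/* DIV reg16: divides DX:AX by operand; quotient -> AX, remainder -> DX.
   ok is false where the CPU would raise the divide error instead. */
DivResult div16(uint16_t dx, uint16_t ax, uint16_t operand) {
    DivResult r = { 0, 0, false };
    if (operand == 0) return r;            /* division by zero */
    uint32_t dividend = ((uint32_t)dx << 16) | ax;
    uint32_t quotient = dividend / operand;
    if (quotient > 0xFFFFu) return r;      /* quotient too large for AX */
    r.quotient = (uint16_t)quotient;
    r.remainder = (uint16_t)(dividend % operand);
    r.ok = true;
    return r;
}
```

Note the "quotient too large" case: dividing DX:AX = 0001:0000h by 1 fails, because the quotient 10000h does not fit in AX.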

The CBW and CWD instructions, which take no operands, will sign-extend AL into AX or AX into DX,
respectively, just as needed before performing a signed division. For example, if AL contains 11010110,
then after CBW the AH register will contain 11111111 (and AL will be unchanged).
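Sign extension as performed by CBW and CWD amounts to copying the top bit of the narrower value into every bit of the upper half. A small C sketch, with hypothetical function names:

```c
#include <stdint.h>

/* CBW: sign-extend AL into AX; returns the new value of AX. */
uint16_t cbw(uint8_t al) {
    return (al & 0x80u) ? (uint16_t)(0xFF00u | al) : (uint16_t)al;
}

/* CWD: sign-extend AX into DX:AX; returns the value placed in DX. */
uint16_t cwd_dx(uint16_t ax) {
    return (ax & 0x8000u) ? 0xFFFFu : 0x0000u;
}
```

With AL = 11010110 (D6h), cbw returns FFD6h: AH becomes 11111111 and AL is unchanged, as in the example above.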
Multiplication and division by powers of two are frequently performed by shifting the bits to the left or
right. There are several varieties of shift and rotate instructions, all of which allow the following forms:

RCL reg8, 1
RCL reg8, CL
RCL BYTE mem8, 1
RCL BYTE mem8, CL

RCL reg16, 1
RCL reg16, CL
RCL WORD mem16, 1
RCL WORD mem16, CL
The second operand specifies how many bit positions the result should be shifted
by: either one or the number in the CL register. For example, the accumulator may
be multiplied by 2 with SHL AX, 1; if CL contains the number 4, the accumulator
may be multiplied by 16 with SHL AX, CL.

There are three shift instructions: SAR, SHR, and SHL. The "shift-left" instruction, SHL, shifts the highest
bit of the operand into the Carry flag and fills in the lowest bit with zero. The "shift-right" instruction,
SHR, does the opposite, moving zero in from the top and shifting the lowest bit out into the Carry; this is
appropriate for an unsigned division, with the Carry flag giving a 1-bit remainder. On the other hand, the
"shift-arithmetic-right" instruction, SAR, leaves a copy of the highest bit in place as it shifts; this is
appropriate for a signed division, since it preserves the sign bit.

For example, -53 is represented in 8-bit two's-complement by the binary number 11001011. After a SHL
by one position, it will be 10010110, which represents -106. After a SAR, it will be 11100101, which
represents -27. After a SHR, it will be 01100101, which represents +101 in decimal; this corresponds to the
interpretation of the original bits as the unsigned number 203 (which yields 101 when divided by 2).
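The three one-position shifts on an 8-bit operand can be written out in C to check the numbers in this example. The Shift8 type and the function names are assumptions of this sketch.

```c
#include <stdint.h>

/* Shifted value plus the bit that was moved into the Carry flag. */
typedef struct { uint8_t value; unsigned carry; } Shift8;

/* SHL: highest bit goes to Carry, zero enters at the bottom. */
Shift8 shl8(uint8_t v) {
    Shift8 r = { (uint8_t)(v << 1), (v >> 7) & 1u };
    return r;
}

/* SHR: lowest bit goes to Carry, zero enters at the top. */
Shift8 shr8(uint8_t v) {
    Shift8 r = { (uint8_t)(v >> 1), v & 1u };
    return r;
}

/* SAR: lowest bit goes to Carry, the sign bit is kept in place. */
Shift8 sar8(uint8_t v) {
    Shift8 r = { (uint8_t)((v >> 1) | (v & 0x80u)), v & 1u };
    return r;
}
```

Starting from 11001011 (CBh), shl8 gives 96h, sar8 gives E5h, and shr8 gives 65h, each with a Carry of 1.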

When shifting multiple words by one bit, the Carry can serve as the bridge from one word to the next. For
example, suppose we want to multiply the double word (4 bytes) starting at address 1230h by 2; the
instruction SHL WORD [1230h], 1 will shift the low-order word, putting its highest bit into the Carry flag.
Now we need an instruction that will shift the Carry into the lowest bit of the word at 1232h; if we wanted
to continue the process, we would also need it to shift the highest bit of that word back out into the Carry.
The effect here is that the bits in the operand plus the Carry have been rotated one position to the left. The
desired instruction is RCL WORD [1232h], 1 ("rotate-carry-left"). There is a corresponding
"rotate-carry-right" instruction, RCR; there are also two rotate instructions which directly shift the highest
bit down to the lowest and vice versa, called ROL and ROR.
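The way the Carry flag bridges the two words in the SHL/RCL sequence can be sketched in C. The function name is hypothetical; the carry variable plays the role of the Carry flag between the two steps.

```c
#include <stdint.h>

/* Doubles a 32-bit value held as two 16-bit words, mirroring SHL on the
   low word followed by RCL on the high word. */
uint32_t shl32_by_words(uint32_t value) {
    uint16_t low = (uint16_t)value;
    uint16_t high = (uint16_t)(value >> 16);

    unsigned carry = (low >> 15) & 1u;       /* SHL: top bit of low word -> Carry */
    low = (uint16_t)(low << 1);

    unsigned next_carry = (high >> 15) & 1u; /* RCL: top bit of high word -> Carry */
    high = (uint16_t)((high << 1) | carry);  /* ... and the old Carry -> bit 0 */
    (void)next_carry;                        /* would feed a third word, if any */

    return ((uint32_t)high << 16) | low;
}
```

For example, 00018000h becomes 00030000h: the 1 shifted out of the low word reappears as the lowest bit of the high word.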

There are four unary arithmetic and logical instructions. The increment and decrement operations, INC
and DEC, add or subtract one from their operand; they do not affect the Carry bit. The negation instruction,
NEG, takes the two's-complement of its operand, while the NOT instruction takes the one's-complement
(flip each bit from 1 to 0 or 0 to 1). NEG affects all the usual flags, but NOT does not affect any of them.
The valid forms of operand are the same for all of these instructions; here are the forms for INC:

INC reg8
INC BYTE mem8

INC reg16
INC WORD mem16

String Instructions

The string instructions facilitate operations on sequences of bytes or words. None of them take an explicit
operand; instead, they all work implicitly on the source and/or destination strings. The current element
(byte or word) of the source string is at DS:SI, and the current element of the destination string is at
ES:DI. Each instruction works on one element and then automatically adjusts SI and/or DI; if the
Direction flag is clear, then the index is incremented, otherwise it is decremented (when working with
overlapping strings it is sometimes necessary to work from back to front, but usually you should leave the
Direction flag clear and work on strings from front to back).

To work on an entire string at a time, each string instruction can be accompanied by a repeat prefix, either
REP or one of REPE and REPNE (or their synonyms REPZ and REPNZ). These cause the instruction to be
repeated the number of times in the count register, CX; for REPE and REPNE, the Zero flag is tested at the
end of each operation and the loop is stopped if the condition (Equal or Not Equal to zero) fails.

The MOVSB and MOVSW instructions have the following forms:

MOVSB
REP MOVSB

MOVSW
REP MOVSW
The first form copies a single byte from the source string, at address DS:SI, to the
destination string, at address ES:DI, then increments (or decrements, if the
Direction flag is set) both SI and DI. The second form repeats this operation CX
times: as long as CX is not zero, it copies one byte and decrements CX (so nothing
is copied if CX starts at zero). With the Direction flag clear, the effect is
equivalent to the following pseudo-C code:

while (CX != 0) {
    *(ES*16 + DI) = *(DS*16 + SI);
    SI++;
    DI++;
    CX--;
}
(recall that ES*16 + DI is the physical address corresponding to the segment and
offset ES:DI). The remaining two forms move a word at a time, instead of a single
byte; correspondingly, SI and DI are incremented or decremented by 2 each time
through the loop.
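For readers who want something they can compile, here is a hedged C version of the same loop that also models the Direction flag. The StringState struct and the flat one-megabyte memory array are assumptions of this sketch, not part of any real emulator API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative register state for the string instructions. */
typedef struct {
    uint16_t ds, es, si, di, cx;
    bool direction;               /* Direction flag: false = count upward */
} StringState;

/* Models REP MOVSB; memory must point to at least 1M (0x100000) bytes. */
void rep_movsb(StringState *s, uint8_t *memory) {
    while (s->cx != 0) {
        uint32_t src = (((uint32_t)s->ds << 4) + s->si) & 0xFFFFFu;
        uint32_t dst = (((uint32_t)s->es << 4) + s->di) & 0xFFFFFu;
        memory[dst] = memory[src];           /* copy one byte */
        if (s->direction) {                  /* STD: work back to front */
            s->si--;
            s->di--;
        } else {                             /* CLD: work front to back */
            s->si++;
            s->di++;
        }
        s->cx--;
    }
}
```

Because SI and DI are 16-bit fields, they wrap around within their segments just as the real registers do.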

The STOSB and STOSW instructions are similar to MOVSB and MOVSW, except the source byte or word comes
from AL or AX instead of the memory address in DS:SI. For example, the following is a very fast way to
initialize the block of memory from ES:1000h to ES:4FFFh with zeroes:

MOV DI, 1000h ;Starting address
MOV CX, 2000h ;Number of words
MOV AX, 0 ;Word to store at each location
CLD ;Make sure direction is increasing
REP STOSW ;Perform the initialization
Correspondingly, the LODSB and LODSW instructions are variations on the move
instructions where the destination is the accumulator (instead of the memory
address in ES:DI). These are not very useful operations with the repeat prefix;
instead, they are used as part of larger loops to perform more complex string
processing. For example, here is a program fragment that will convert the NUL-
terminated string starting at the address in DX to be all lower-case (there is a faster
way to do the conversion of each character, using the XLATB instruction, but that is
not the point here):

MOV SI, DX ;Initialize source
MOV DI, DX ; and destination indices
MOV AX, DS ;Copy DS (source segment)
MOV ES, AX ; into ES (destination segment)
CLD
NextCh LODSB ;Load next character into AL
CMP AL, 'A'
JB NotUC ;Jump if below 'A'
CMP AL, 'Z'
JA NotUC ; or above 'Z'
ADD AL, 'a' - 'A' ;Convert UC to lc
NotUC STOSB ;Store modified character back
CMP AL, 0
JNE NextCh ;Do next character if not at end of string
None of the preceding string operations have any effect on the status flags. By
contrast, the remaining two string operations are executed solely for their effect on
the status flags, just like the CMP operation on numbers. The CMPSB and CMPSW
operations compare the current bytes or words of the source and destination strings
by subtracting the destination from the source and recording the properties of the
result in FLAGS. The SCASB and SCASW operations are the variants of this that use
the accumulator (AL or AX) for the source. Each of these may be preceded by either
of the repeat prefixes REPE or REPNE, which cause the operation to be repeated up
to CX times, as long as the condition holds true after each iteration. Here is the
corresponding pseudo-C for REPE CMPSB:

while (CX != 0) {
    SetFlags(*(DS*16 + SI) - *(ES*16 + DI));
    SI++;
    DI++;
    CX--;
    if (!ZeroFlag) break;
}
A common use of the REPNE SCASB instruction is to find the length of a NUL-
terminated string. Here is an example:

MOV DI, DX ;Starting address in DX (assume ES = DS)
MOV AL, 0 ;Byte to search for (NUL)
MOV CX, -1 ;Start count at FFFFh
CLD ;Increment DI after each character
REPNE SCASB ;Scan string for NUL, decrementing CX for each char
MOV AX, -2 ;CX will be -2 for length 0, -3 for length 1, ...
SUB AX, CX ;Length in AX
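The register arithmetic in this idiom is easy to get wrong by one, so here is a C sketch that follows it step by step. The function name is made up, and the sketch assumes the string is NUL-terminated and shorter than 65535 characters.

```c
#include <stdint.h>

/* Mirrors the REPNE SCASB length idiom: CX starts at FFFFh and is
   decremented once per byte scanned, including the terminating NUL. */
uint16_t scasb_strlen(const char *text) {
    uint16_t cx = 0xFFFFu;                 /* MOV CX, -1 */
    uint16_t di = 0;                       /* offset into the string */
    while (cx != 0) {
        int equal = (text[di] == 0);       /* SCASB: compare AL (0) with [ES:DI] */
        di++;                              /* CLD: DI counts upward */
        cx--;
        if (equal) break;                  /* REPNE stops once Zero is set */
    }
    return (uint16_t)(0xFFFEu - cx);       /* MOV AX, -2 ; SUB AX, CX */
}
```

For an empty string the loop runs once, leaving CX at FFFEh (-2), so the computed length is 0; each further character lowers CX by one more and raises the result by one.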

Program Flow Instructions

All of the previous instructions execute sequentially; that is, when one instruction
finishes, the next instruction is taken from the very next memory location. This is
the default operation for the instruction pointer, IP: after each byte of instruction is
fetched, the IP is incremented in preparation for the next fetch. The program flow
instructions provide the facilities to modify the course of execution, allowing
conditional execution (by jumping over parts of the code if certain conditions are
met) and looping (by jumping backwards in the code).

The unconditional jump instruction, JMP, causes IP (and sometimes CS) to be modified so that the next
instruction is fetched from the location given in the operand (the target). Here are the valid forms:

JMP SHORT imm8
JMP imm16
JMP imm16:imm16
JMP r/m16
JMP FAR mem32
The short version saves space when the target lies within -128 to +127 bytes of
the instruction following the jump (a few dozen instructions in either direction); the
assembler computes the difference between the target address and the address of
that next instruction, and stores just this difference as one (signed) byte. The
second (and most common) version allows a jump to any
location in the current code segment, while the third allows a jump to any location
in memory by also specifying an immediate value to be loaded into CS. The fourth
version will take the target address from a register or memory location; since this
address is only 16 bits, the target has to be within the segment. Finally, the far
version fetches both the offset and the segment from four consecutive bytes in
memory (compare to the LDS and LES instructions; JMP FAR mem32 could have been
called "LCS IP, mem32").
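
The displacement calculation for the short form can be sketched in C as follows. The function name short_jump_disp is my own; the sketch assumes the usual two-byte encoding of JMP SHORT (one opcode byte plus one displacement byte) and ignores wraparound at the ends of the segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and stores the displacement byte if a short jump located at
   offset jmp_addr can reach target; returns 0 if the target is too far.
   The displacement is measured from the instruction after the jump. */
int short_jump_disp(uint16_t jmp_addr, uint16_t target, int8_t *disp)
{
    int32_t next = (int32_t)jmp_addr + 2;   /* offset of the next instruction */
    int32_t diff = (int32_t)target - next;  /* signed distance to the target */
    assert(disp != NULL);
    if (diff < -128 || diff > 127)
        return 0;                           /* needs the 16-bit form instead */
    *disp = (int8_t)diff;
    return 1;
}
```

A jump at offset 0100h to target 0110h gets the displacement 0Eh, and a jump to its own address gets -2, since the displacement is counted from the end of the two-byte instruction.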

The conditional jump instructions, Jcc, where cc is one of the condition codes listed earlier (E, NE, ...),
perform a short jump if the condition is true, based on the current contents of the status flags. For
example, the code sample that was given in the discussion of LODSB, to convert a string to lower-case,
used the JA and JB instructions; these made their jump if the result of the previous comparison found that
the current character was above 'Z' or below 'A'. Since a conditional jump can only be to a nearby target,
it is sometimes necessary to combine conditional and unconditional jumps as follows:

JNLE NoJLE
JMP target
NoJLE:
This will have the same effect as JLE target, except there is no restriction on how
far away the target may be (within the code segment).

There are two specialized versions of conditional jump that are particularly useful when executing a loop
a fixed number of times. The looping statements

LOOP imm8
LOOPE imm8
LOOPNE imm8
(as usual, the synonyms LOOPZ and LOOPNZ are also available) are very similar to
the REP, REPE, and REPNE prefixes from the string instructions. The LOOP instruction
decrements CX and makes a short jump if the count has not reached zero. The
LOOPE instruction adds the condition that it will only take the jump if the Zero flag is
set (usually indicating that the last comparison had equal operands); the LOOPNE will
only take the jump if the Zero flag is clear. The string operation REP MOVSB, for
example, could have been performed with

Repeat MOVSB
LOOP Repeat
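(except this would have been considerably slower, since it requires repeatedly
fetching and decoding the two instructions instead of just fetching and decoding the
single REP MOVSB instruction once, and the two differ when CX is zero on entry:
REP MOVSB then copies nothing, while the loop executes MOVSB once, wraps CX
around to FFFFh, and copies 65,536 bytes in all).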
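
The difference in where the count is tested can be made concrete with a short C sketch that only counts how many times MOVSB would execute. Both function names are hypothetical; REP is modeled as testing CX before each iteration, and the MOVSB / LOOP pair as running the body first and then decrementing and testing CX.

```c
#include <assert.h>
#include <stdint.h>

/* Number of MOVSB executions under REP MOVSB:
   the count is tested before each iteration. */
uint32_t rep_iterations(uint16_t cx)
{
    uint32_t n = 0;
    while (cx != 0) {
        n++;     /* one MOVSB */
        cx--;
    }
    return n;
}

/* Number of MOVSB executions under the MOVSB / LOOP pair:
   the body runs first, then LOOP decrements CX and jumps if it is nonzero. */
uint32_t loop_iterations(uint16_t cx)
{
    uint32_t n = 0;
    do {
        n++;     /* one MOVSB */
        cx--;    /* 16-bit decrement, so 0 wraps around to FFFFh */
    } while (cx != 0);
    assert(n >= 1);  /* the body always runs at least once */
    return n;
}
```

For any nonzero count the two agree. With a count of zero the first returns 0 and the second returns 65536, which is the case that JCXZ placed ahead of a loop is meant to guard against.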

After looping or repetitive string operations, it is occasionally necessary to test whether the count register
reached zero (to check whether the loop ran for the full count or whether it exited early because the Zero
flag changed). The instruction

JCXZ imm8
serves exactly this purpose; it takes a short jump if the CX register contains zero. It
behaves like CMP CX, 0 followed by JZ imm8, except that it leaves the status flags
untouched.

All of the above branching instructions are variations on the infamous GOTO statement; they cause a
permanent change in the course of execution. To perform an operation more like a function or subroutine
call, where the flow of control will eventually return to pick up with the next instruction, the 8086
provides two mechanisms: CALL/RET and INT/IRET.

The CALL instruction offers a similar range of addressing modes to the JMP instruction, except there is no
"short" call:

CALL imm16
CALL imm16:imm16
CALL r/m16
CALL FAR mem32
A call is the same as a jump, except that the instruction pointer, which by then holds
the address of the instruction following the CALL, is first pushed onto the stack (in
the second and fourth versions, which include a new segment, the current CS register
is pushed ahead of it).
To reverse the effect of a CALL, when the subroutine is done it should execute a RET or RETF instruction;
this pops the return address off of the stack and back into IP (and RETF also pops the saved value of CS, to
return from a far call). After the return, the next instruction that will be fetched will be from the next
location after the CALL. There is an optional 16-bit immediate operand that may be specified with a return
instruction; this value is added to the stack pointer after popping off the return address, to recover
however many bytes had been pushed onto the stack with parameters before the call. For example, here is
one way to implement a subroutine to print a character, where the calling code first pushes the character
(as the low byte of a word, since there is no option to push a single byte) before making the call:

PutChar PUSH BP ;Save current values of registers that we'll modify
PUSH AX
PUSH DX
MOV BP, SP ;Copy stack pointer to BP
MOV AH, 2 ;DOS function code for printing a character
MOV DL, [BP + 8] ;Fetch character parameter from stack
;Stack contains (from tos) DX, AX, BP, return address, and parameter
INT 21h ;Call DOS function
POP DX ;Restore modified registers
POP AX
POP BP
RET 2 ;Return and pop 2 byte parameter
For completeness, here is what a typical call might look like (in fact, this is a
complete routine to print a NUL-terminated string, assuming that the string starts at
DS:SI):

NextCh LODSB ;Load next character into AL
CMP AL, 0
JE Done ;Quit if NUL
PUSH AX ;Set up parameter for call
CALL PutChar
JMP NextCh ;Continue with next character
Done:
This is just one of several common conventions for passing parameters to
subroutines; even more common is to just specify that, for example, the character
will be passed directly in the DL register.
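
To check the [BP + 8] offset and the effect of RET 2 in the stack-based convention, here is a small C sketch that replays the pushes and pops on a simulated stack. Everything in it is an illustration of my own: the Stack type, the helper names, the 64-byte stack size, and the made-up values standing in for the return address and the old BP and DX.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A tiny stand-in for the stack segment (hypothetical, for illustration). */
typedef struct {
    uint8_t  mem[64];
    uint16_t sp;
} Stack;

static void push16(Stack *s, uint16_t v)
{
    assert(s->sp >= 2);
    s->sp -= 2;                               /* the stack grows downward */
    s->mem[s->sp]     = (uint8_t)(v & 0xFF);  /* low byte at the lower address */
    s->mem[s->sp + 1] = (uint8_t)(v >> 8);
}

static uint16_t pop16(Stack *s)
{
    uint16_t v;
    assert((size_t)s->sp + 2 <= sizeof s->mem);
    v = (uint16_t)(s->mem[s->sp] | (s->mem[s->sp + 1] << 8));
    s->sp += 2;
    return v;
}

/* Replays PUSH AX / CALL PutChar / ... / RET 2 and returns the byte that
   MOV DL, [BP + 8] would load. *sp_after receives SP after the return. */
uint8_t putchar_frame_demo(uint16_t ax, uint16_t *sp_after)
{
    Stack s;
    uint16_t bp;
    uint8_t dl;
    memset(s.mem, 0, sizeof s.mem);
    s.sp = (uint16_t)sizeof s.mem;

    push16(&s, ax);       /* caller: PUSH AX (the parameter) */
    push16(&s, 0x1234);   /* CALL pushes the return address (made-up value) */
    push16(&s, 0xBBBB);   /* PUSH BP (made-up old value) */
    push16(&s, ax);       /* PUSH AX */
    push16(&s, 0xDDDD);   /* PUSH DX (made-up old value) */
    bp = s.sp;            /* MOV BP, SP */
    dl = s.mem[bp + 8];   /* MOV DL, [BP + 8] */

    pop16(&s);            /* POP DX */
    pop16(&s);            /* POP AX */
    pop16(&s);            /* POP BP */
    pop16(&s);            /* RET pops the return address ... */
    s.sp += 2;            /* ... and the 2 in RET 2 discards the parameter */
    *sp_after = s.sp;
    return dl;
}
```

With AX holding 0741h, the byte fetched is 41h (the low byte of the pushed word, 'A'), and the stack pointer ends up where it was before the caller pushed the parameter.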

Interrupt instruction

The other function-call-like mechanism is the interrupt. We have been using this all
along to call the standard DOS services, such as printing a character or a
'$'-terminated string. The INT instruction behaves much like the CALL FAR instruction
except for two things. First, it pushes the FLAGS register before pushing CS and IP.
The idea is that an interrupt handler should be able to completely restore the state
of the processor when it is finished, since this is also the mechanism used for
handling hardware interrupts from the rest of the system; they can happen at any
time, independent of what the processor might be working on, and they should occur
as transparently to the current process as possible. Second, it gets the target address
from a standard table of interrupt handler vectors kept at the bottom of memory.
When the processor executes INT n, where n is an 8-bit immediate value, it fetches
a far pointer (that is, a 4-byte combination of segment and offset) from the memory
address 0000:4n, that is, offset 4 times n in segment 0; this is the target address for
the interrupt call. For example, the address of the DOS interrupt handler, the routine
called when INT 21h is executed, is stored at locations 0000:0084 through 0000:0087
(4 times 21h is 84h); the first two bytes give the offset, to load into IP, and the
second two bytes give the segment, to load into CS.
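
The vector lookup can be sketched in C as follows. The FarPointer type, the function names, and the low_mem buffer (standing in for a copy of the first kilobyte of memory) are assumptions of mine for illustration; the layout follows the description above, with the offset word first and the segment word second, each stored low byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t offset;   /* loaded into IP */
    uint16_t segment;  /* loaded into CS */
} FarPointer;

/* Address of the vector table entry for INT n, i.e. 0000:4n. */
uint32_t vector_address(uint8_t n)
{
    return 4u * n;
}

/* Reads the far pointer for INT n from low_mem, a hypothetical buffer
   holding a copy of the bottom 1 KB of memory. */
FarPointer read_vector(const uint8_t *low_mem, uint8_t n)
{
    uint32_t a = vector_address(n);
    FarPointer p;
    assert(low_mem != NULL);
    p.offset  = (uint16_t)(low_mem[a]     | (low_mem[a + 1] << 8));
    p.segment = (uint16_t)(low_mem[a + 2] | (low_mem[a + 3] << 8));
    return p;
}

/* Physical address of segment:offset, computed as segment*16 + offset. */
uint32_t linear_address(FarPointer p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}
```

For n = 21h, vector_address gives 84h, matching the locations 0000:0084 through 0000:0087 mentioned above; the handler address actually stored there depends on the system and is not something this sketch can supply.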

To return from an interrupt handler, the IRET instruction is used. It pops the IP, CS, and FLAGS registers,
which causes the state of the machine to return to where it left off when the interrupt occurred.