
DEPARTMENT OF INFORMATION TECHNOLOGY

ODD SEMESTER-2014
QUESTION BANK
SUBJECT CODE/ NAME: CS 6303/COMPUTER ARCHITECTURE
STAFF NAME: Mrs.S.Muthumariammal

YEAR/SEMESTER: II/III

PART-A-TWO MARK QUESTIONS AND ANSWERS


UNIT-I OVERVIEW & INSTRUCTIONS
1.Define Computer Architecture.

It is concerned with the structure and behavior of the computer.

It includes the information formats, the instruction set and techniques for addressing memory.

2.Define Computer Organization.

It describes the function and design of the various units of digital computer that store and process
information.

It refers to the operational units and their interconnections that realize the architectural
specifications.

3. What are the components of a computer?

Input unit

Memory unit

Arithmetic and Logic Unit

Output unit

Control unit


4.Draw the block diagram of computer.

5. What is Execution time/Response time?


Response time, also called execution time, is the total time required for the computer to complete a
task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU
execution time, and so on.
6. What is CPU execution time, user CPU time and system CPU time?
CPU time: The actual time the CPU spends computing for a specific task.
User CPU time: The CPU time spent in the program itself.
System CPU time: The CPU time spent in the operating system performing tasks on behalf of
the program.
7.What is clock cycle and clock period?
clock cycle :The time for one clock period, usually of the processor clock, which runs at a
constant rate.
clock period :The length of each clock cycle.
6. Define CPI
Clock cycles per instruction (CPI) is the average number of clock cycles each instruction takes to execute.


7.State and explain the performance equation?
Let N denote the number of machine instructions executed, and suppose that the average number of basic
steps needed to execute one machine instruction is S, where each basic step is completed in one clock
cycle. If the clock rate is R cycles per second, the processor time is given by
T = (N x S) / R

This is often referred to as the basic performance equation.
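As a quick illustration (a minimal Python sketch with made-up values, not figures from the syllabus), the basic performance equation can be evaluated directly:

# Basic performance equation: T = (N * S) / R
# N, S and R below are illustrative values, not measured data.
def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Return processor time T in seconds from the basic performance equation."""
    return (n_instructions * steps_per_instruction) / clock_rate_hz

N = 500_000_000    # machine instructions executed
S = 1.5            # average basic steps (clock cycles) per instruction, i.e. CPI
R = 2_000_000_000  # clock rate in cycles per second (2 GHz)

T = execution_time(N, S, R)
print(f"T = {T:.3f} s")               # 0.375 s
print(f"MIPS = {N / (T * 1e6):.0f}")  # native MIPS for the same run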

8. Define MIPS.
One alternative to time as the metric is MIPS (Million Instructions Per Second):
MIPS = Instruction count / (Execution time x 10^6)
This MIPS measurement is also called native MIPS to distinguish it from some alternative definitions of MIPS.
9. Define MIPS rate.
The MIPS rate is the rate at which instructions are executed at a given time, expressed in millions of instructions per second.
10.Define Throughput and Throughput rate.

Throughput -The total amount of work done in a given time.

Throughput rate - the rate at which work is completed per unit time.

11.What are the various types of operations required for instructions?


Data transfers between the main memory and the CPU registers
Arithmetic and logic operation on data
Program sequencing and control
I/O transfers
12. What is a Program?
A program is a set of instructions that specify the operations, operands and the sequence by
which processing has to occur.
13. What is a Computer Instruction?
A Computer instruction is a binary code that specifies a sequence of micro operations for
the computer.

14. What is an Instruction Code?


An instruction code is a group of bits that instruct the computer to perform a specific
operation.
15. What is an Operation Code (Opcode)?
The operation code of an instruction is a group of bits that defines operations such as add,
subtract, multiply, shift and complement.

16. Define Instruction Format.


Instructions are represented as numbers. Therefore, entire programs can be stored in
memory to be read or written just like numbers (data). This simplifies the software and hardware of computer
systems. Each instruction is encoded in binary, called machine code.

17.What are the Most Common Fields Of An Instruction Format?

An operation code field that specifies the operation to be performed.

An address field that designates a memory address or a register.

A mode field that specifies the way the operand or the effective address is
determined.

18. Explain three-address, two-address and one-address instructions.


Three-address instruction - it can be represented as
ADD A, B, C
Operands A and B are the source operands and C is the destination operand.
Two-address instruction - it can be represented as
ADD A, B
One-address instruction - it can be represented as
LOAD A
ADD B
STORE C
19. What is straight-line sequencing?


The CPU control circuitry automatically proceeds to fetch and execute instructions, one at a time, in
order of increasing addresses. This is called straight-line sequencing.
20. Write down the MIPS assembly language notation for arithmetic operations.

Category: Arithmetic

Instruction      Example            Meaning           Comments
add              add $s1,$s2,$s3    $s1 = $s2 + $s3   Three register operands
subtract         sub $s1,$s2,$s3    $s1 = $s2 - $s3   Three register operands
add immediate    addi $s1,$s2,20    $s1 = $s2 + 20    Used to add constants

21. Write down the MIPS assembly language notation for data transfer operations.

Category: Data transfer

Instruction               Example           Meaning                               Comments
load word                 lw $s1,20($s2)    $s1 = Memory[$s2 + 20]                Word from memory to register
store word                sw $s1,20($s2)    Memory[$s2 + 20] = $s1                Word from register to memory
load half                 lh $s1,20($s2)    $s1 = Memory[$s2 + 20]                Halfword from memory to register
load half unsigned        lhu $s1,20($s2)   $s1 = Memory[$s2 + 20]                Halfword from memory to register
store half                sh $s1,20($s2)    Memory[$s2 + 20] = $s1                Halfword from register to memory
load byte                 lb $s1,20($s2)    $s1 = Memory[$s2 + 20]                Byte from memory to register
load byte unsigned        lbu $s1,20($s2)   $s1 = Memory[$s2 + 20]                Byte from memory to register
store byte                sb $s1,20($s2)    Memory[$s2 + 20] = $s1                Byte from register to memory
load linked word          ll $s1,20($s2)    $s1 = Memory[$s2 + 20]                Load word as 1st half of atomic swap
store conditional word    sc $s1,20($s2)    Memory[$s2 + 20] = $s1; $s1 = 0 or 1  Store word as 2nd half of atomic swap
load upper immediate      lui $s1,20        $s1 = 20 * 2^16                       Loads constant in upper
16 bits

22. Write down the MIPS assembly language notation for logical operations.

Category: Logical

Instruction           Example            Meaning              Comments
and                   and $s1,$s2,$s3    $s1 = $s2 & $s3      Three reg. operands; bit-by-bit AND
or                    or $s1,$s2,$s3     $s1 = $s2 | $s3      Three reg. operands; bit-by-bit OR
nor                   nor $s1,$s2,$s3    $s1 = ~($s2 | $s3)   Three reg. operands; bit-by-bit NOR
and immediate         andi $s1,$s2,20    $s1 = $s2 & 20       Bit-by-bit AND reg with constant
or immediate          ori $s1,$s2,20     $s1 = $s2 | 20       Bit-by-bit OR reg with constant
shift left logical    sll $s1,$s2,10     $s1 = $s2 << 10      Shift left by constant
shift right logical   srl $s1,$s2,10     $s1 = $s2 >> 10      Shift right by constant
23. Write down the MIPS assembly language notation for conditional branch operations.

Category: Conditional branch

Instruction                        Example            Meaning                                Comments
branch on equal                    beq $s1,$s2,25     if ($s1 == $s2) go to PC + 4 + 100     Equal test; PC-relative branch
branch on not equal                bne $s1,$s2,25     if ($s1 != $s2) go to PC + 4 + 100     Not equal test; PC-relative
set on less than                   slt $s1,$s2,$s3    if ($s2 < $s3) $s1 = 1; else $s1 = 0   Compare less than; for beq, bne
set on less than unsigned          sltu $s1,$s2,$s3   if ($s2 < $s3) $s1 = 1; else $s1 = 0   Compare less than unsigned
set less than immediate            slti $s1,$s2,20    if ($s2 < 20) $s1 = 1; else $s1 = 0    Compare less than constant
set less than immediate unsigned   sltiu $s1,$s2,20   if ($s2 < 20) $s1 = 1; else $s1 = 0    Compare less than constant unsigned

24. Write down the MIPS assembly language notation for unconditional jump operations.

Category: Unconditional jump

Instruction     Example    Meaning                      Comments
jump            j 2500     go to 10000                  Jump to target address
jump register   jr $ra     go to $ra                    For switch, procedure return
jump and link   jal 2500   $ra = PC + 4; go to 10000    For procedure call

25. What is Addressing Modes?


The different ways in which the location of an operand is specified in an instruction are called
addressing modes.
26.What are the different types of addressing Modes?

Immediate mode
Register mode
Absolute mode
Indirect mode
Index mode
Base with index
Base with index and offset
Relative mode
Auto-increment mode
Auto-decrement mode

27.Define Register mode and Absolute Mode with examples.


Register mode:
The operand is the contents of a processor register.
The name (address) of the register is given in the instruction.
Absolute mode (Direct mode):
The operand is in a memory location.
The address of this location is given explicitly in the instruction.
Eg: MOVE LOC, R2
The above instruction uses both the register and absolute modes.
The processor register is the temporary storage where the data in the register are accessed using register
mode.
The absolute mode can represent global variables in the program.

Mode             Assembler Syntax    Addressing Function
Register mode    Ri                  EA = Ri
Absolute mode    LOC                 EA = LOC

Where EA = Effective Address


28.What is a Immediate addressing Mode?
The operand is given explicitly in the instruction.
Eg: Move 200immediate, R0
It places the value 200 in the register R0. The immediate mode is used to specify the value of a source
operand.
In assembly language, the immediate subscript is not convenient, so the # symbol is used. It can be rewritten as
Move #200, R0

Mode        Assembler Syntax    Addressing Function
Immediate   #value              Operand = value

29.Define Indirect addressing Mode.


The effective address of the operand is the contents of a register or memory location whose address
appears in the instruction.
We denote the indirection by placing the name of the register or the memory address in parentheses.
Fig: Indirect Mode
The address of an operand (B) is stored in register R1. If we want this operand, we can get it through
register R1 (indirection).
The register or memory location that contains the address of an operand is called a pointer.

Mode       Assembler Syntax    Addressing Function
Indirect   (Ri), (LOC)         EA = [Ri] or EA = [LOC]

30.Define Index addressing Mode.


The effective address of the operand is generated by adding a constant value to the contents of a
register.
The register used may be either a special-purpose or a general-purpose register. We
indicate the index mode symbolically as
X(Ri)
where X denotes the constant value contained in the instruction and Ri is the name of the register involved.
The effective address of the operand is
EA = X + [Ri]
The index register R1 contains the address of a new location, and the value of X defines an
offset (also called a displacement).
To find the operand:
First go to register R1 (using the address) and read its contents, e.g. 1000.
Add the contents 1000 to the offset 20 to get the result: 1000 + 20 = 1020.
Alternatively, the constant X may refer to a memory address while the contents of the index register define the
offset to the operand.
In either case, one value is given explicitly in the instruction and the other is stored in a register.
Eg: Add 20(R1), R2    (EA => 1000 + 20 = 1020)

Index Mode                   Assembler Syntax    Addressing Function
Index                        X(Ri)               EA = [Ri] + X
Base with index              (Ri,Rj)             EA = [Ri] + [Rj]
Base with index and offset   X(Ri,Rj)            EA = [Ri] + [Rj] + X

31.What is a Relative Addressing mode?


It is the same as the index mode, except that the program counter (PC) is used in place of a
general-purpose register.
Relative mode:
The effective address is determined by the index mode using the PC in place of the general-purpose
register (GPR).
This mode can be used to access data operands, but its most common use is to specify the
target address in a branch instruction. Eg: Branch>0 LOOP
This causes program execution to go to the branch target location, identified by the
name LOOP, if the branch condition is satisfied.

Mode       Assembler Syntax    Addressing Function
Relative   X(PC)               EA = [PC] + X

32.Define Auto-increment addressing mode.


The effective address of the operand is the contents of a register specified in the instruction.
After accessing the operand, the contents of this register are automatically incremented
to point to the next item in a list.

Mode             Assembler Syntax    Addressing Function
Auto-increment   (Ri)+               EA = [Ri]; Increment Ri

33.Define Auto-decrement addressing mode.


The contents of a register specified in the instruction are first automatically decremented and are then
used as the effective address of the operand, so that the register points to the previous item in a list.

Mode             Assembler Syntax    Addressing Function
Auto-decrement   -(Ri)               Decrement Ri; EA = [Ri]
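To tie the addressing-mode tables above together, here is a minimal Python sketch (illustrative only; the register names and contents are made-up example values) that computes the effective address for a few of the modes:

# Illustrative effective-address calculation for common addressing modes.
# The register contents below are made-up example values.
registers = {"R1": 1000, "R2": 52, "PC": 2000}

def ea_index(x, ri):             # Index mode:      EA = X + [Ri]
    return x + registers[ri]

def ea_base_index(ri, rj):       # Base with index: EA = [Ri] + [Rj]
    return registers[ri] + registers[rj]

def ea_relative(x):              # Relative mode:   EA = [PC] + X
    return x + registers["PC"]

def ea_autoincrement(ri):        # Auto-increment:  EA = [Ri]; then increment Ri
    ea = registers[ri]
    registers[ri] += 1
    return ea

print(ea_index(20, "R1"))        # 1020, as in the worked example above
print(ea_base_index("R1", "R2")) # 1052
print(ea_relative(30))           # 2030
print(ea_autoincrement("R1"))    # 1000 (R1 becomes 1001)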

34. Write the formula for CPU execution time for a program.

CPU execution time for a program = CPU clock cycles for the program x Clock cycle time
                                 = CPU clock cycles for the program / Clock rate

35. Write the formula for CPU clock cycles required for a program.

CPU clock cycles for a program = Instructions for the program x Average clock cycles per instruction (CPI)

36. Define MIPS


Million Instructions Per Second (MIPS) is a measurement of program execution speed based on
the number of millions of instructions executed per second.
MIPS is computed as:
MIPS = Instruction count / (Execution time x 10^6)

37. What are the eight great ideas in computer architecture?



The eight great ideas in computer architecture are:


1. Design for Moore's Law
2. Use Abstraction to Simplify Design
3. Make the Common Case Fast
4. Performance via Parallelism
5. Performance via Pipelining
6. Performance via Prediction
7. Hierarchy of Memories
8. Dependability via Redundancy
38. What are the five classic components of a computer?
The five classic components of a computer are input, output, memory, datapath, and control, with the last
two sometimes combined and called the processor.
39. Define ISA
The instruction set architecture, or simply architecture of a computer is the interface between the
hardware and the lowest-level software. It includes anything programmers need to know to make a binary
machine language program work correctly, including instructions, I/O devices, and so on.
40. Define ABI
Typically, the operating system will encapsulate the details of doing I/O, allocating memory, and other
low-level system functions so that application programmers do not need to worry about such details. The
combination of the basic instruction set and the operating system interface provided for application
programmers is called the application binary interface (ABI).
41. Define Moore's Law
Moore's Law is the observation that integrated circuit resources (transistor counts per chip) double roughly
every 18 to 24 months. Moore's Law has provided so much more in resources that hardware designers can now
build much faster multiplication and division hardware; for example, whether the multiplicand is to be added
or not can be decided at the beginning of the multiplication by looking at all of the 32 multiplier bits.


UNIT-2 ARITHMETIC OPERATIONS

1. State the principle of operation of a carry look-ahead adder.


The input carry needed by a stage is directly computed from carry signals obtained from all the
preceding stages i-1, i-2, ..., 0, rather than waiting for normal carries to ripple slowly from stage to
stage. An adder that uses this principle is called a carry look-ahead adder.
2. What are the main features of Booth's algorithm?
1) It handles both positive and negative multipliers uniformly.
2) It achieves some efficiency in the number of additions required when the multiplier has a few
large blocks of 1s.
3. How can we speed up the multiplication process?(CSE Nov/Dec 2003)
There are two techniques to speed up the multiplication process:
1) The first technique guarantees that the maximum number of summands that must be added is n/2
for n-bit operands.
2) The second technique reduces the time needed to add the summands.
4. What is bit pair recoding? Give an example.
Bit-pair recoding halves the maximum number of summands. Group the Booth-recoded
multiplier bits in pairs and observe the following: the pair (+1, -1) is equivalent to the pair (0, +1).
That is, instead of adding -1 times the multiplicand M at shift position i and +1 x M at position i+1, the
same result is obtained by adding +1 x M at position i.
Eg: 11010 - the bit-pair recoded value is 0 -1 -2.
5. What is the advantage of using Booth algorithm?
1) It handles both positive and negative multiplier uniformly.
2) It achieves efficiency in the number of additions required when the multiplier has
a few large blocks of 1s.
3) The speed gained by skipping 1s depends on the data.


6. Write the algorithm for restoring division.


Do the following n times:
1) Shift A and Q left one binary position.

2) Subtract M from A and place the answer back in A.


3) If the sign of A is 1, set q0 to 0 and add M back to A; otherwise, set q0 to 1.
Where A = accumulator, M = divisor, Q = dividend.
7. Write the algorithm for non restoring division.
Step 1: Do the following n times:
1) If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift
A and Q left and add M to A.
2) Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.
8. When can you say that a number is normalized?
When the decimal point is placed to the right of the first (nonzero) significant digit, the number is
said to be normalized.
9. Explain about the special values in floating point numbers.
The end values 0 and 255 of the excess-127 exponent E' are used to represent special values such
as:
When E' = 0 and the mantissa fraction M is zero, the value exact 0 is represented.
When E' = 255 and M = 0, the value infinity is represented.
When E' = 0 and M != 0, denormal values are represented.
When E' = 255 and M != 0, the value represented is called Not a Number (NaN).
10. Write the Add/subtract rule for floating point numbers.
1) Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to
the difference in exponents.
2) Set the exponent of the result equal to the larger exponent.

3) Perform addition/subtraction on the mantissa and determine the sign of the result
4) Normalize the resulting value, if necessary.
11. Write the multiply rule for floating point numbers.
1) Add the exponents and subtract 127.
2) Multiply the mantissas and determine the sign of the result.
3) Normalize the resulting value, if necessary.

12. What is the purpose of guard bits used in floating point arithmetic
Although the mantissas of the initial operands are limited to 24 bits, it is important to retain extra bits,
called guard bits, during intermediate steps to preserve accuracy.
13. What are the ways to truncate the guard bits?
There are several ways to truncate the guard bits:
1) Chopping
2) Von Neumann rounding
3) Rounding
14. Define carry save addition(CSA) process.
Instead of letting the carries ripple along the rows, they can be saved and introduced into the next
row at the correct weighted position. The delay in a CSA is less than the delay through a ripple-carry adder.
15. What are generate and propagate function?
The generate function is given by Gi = xi yi and the propagate function is given by Pi = xi + yi.
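As a small illustration (a hedged Python sketch with arbitrary 4-bit inputs, not any particular textbook circuit), the carries can be obtained from the generate/propagate signals with the recurrence c(i+1) = Gi + Pi.c(i); a real carry look-ahead adder expands this recurrence so that every carry is produced in parallel from the inputs:

# Carry look-ahead sketch: carries from generate/propagate signals.
def carry_lookahead_add(x, y, c0=0, width=4):
    xs = [(x >> i) & 1 for i in range(width)]
    ys = [(y >> i) & 1 for i in range(width)]
    g = [xs[i] & ys[i] for i in range(width)]   # generate  Gi = xi.yi
    p = [xs[i] | ys[i] for i in range(width)]   # propagate Pi = xi + yi
    c = [c0]
    for i in range(width):
        c.append(g[i] | (p[i] & c[i]))          # c(i+1) = Gi + Pi.c(i)
    s = [xs[i] ^ ys[i] ^ c[i] for i in range(width)]
    value = sum(bit << i for i, bit in enumerate(s))
    return value, c[width]                      # (4-bit sum, carry out)

print(carry_lookahead_add(0b1011, 0b0110))      # (1, 1): 11 + 6 = 17 = 1_0001 in binary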
16. What are floating point numbers?
In some cases, the binary point is variable and is automatically adjusted as computation proceeds.
In such case, the binary point is said to float and the numbers are called floating point numbers.
17. In floating point numbers when so you say that an underflow or overflow has occurred?


In single-precision numbers, when an exponent is less than -126 we say that an underflow has
occurred. In single-precision numbers, when an exponent is greater than +127 we say that an
overflow has occurred.
18. What are the difficulties faced when we use floating point arithmetic?
Mantissa overflow: The addition of two mantissas of the same sign may result in a carryout of the
most significant bit
Mantissa underflow: In the process of aligning mantissas ,digits may flow off the right end of the
mantissa.
Exponent overflow: Exponent overflow occurs when a positive exponent exceeds the maximum
possible value.
Exponent underflow: It occurs when a negative exponent is less than the minimum possible
exponent value (i.e., its magnitude is too large to be represented).

19.In conforming to the IEEE standard mention any four situations under which a processor sets
exception flag.
Underflow: If the number requires an exponent less than -126 in single precision, or less than -1022 in
double precision, to represent its normalized form, an underflow occurs.
Overflow: If the number requires an exponent greater than +127 in single precision, or greater than +1023 in
double precision, to represent its normalized form, an overflow occurs.
Divide by zero: It occurs when any finite number is divided by zero.
Invalid: It occurs if operations such as 0/0 are attempted.
20. Why floating point number is more difficult to represent and process than integer?(CSE
May/June 2007)
An integer value requires only half the memory space of an equivalent IEEE double-precision
floating-point value. Applications that use only integer-based arithmetic will therefore also have
significantly smaller memory requirements.
A floating-point operation usually runs hundreds of times slower than an equivalent integer-based
arithmetic operation.
21. Give the Booth recoding and bit-pair recoding of the multiplier
1000111101000101. (CSE May/June 2006)

Booth recoding (one digit per bit position, MSB to LSB):
-1  0  0 +1  0  0  0 -1 +1 -1  0  0 +1 -1 +1 -1

Bit-pair recoding (one digit per pair of bit positions, MSB to LSB):
-2 +1  0 -1 +1  0 +1 +1
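The recodings above can be reproduced with a small Python sketch (illustrative only; the helper names are made up for this example):

# Booth and bit-pair recoding of a two's-complement multiplier.
def booth_recode(bits):
    """bits: multiplier as a string, MSB first. Returns Booth digits, MSB first."""
    b = [int(c) for c in bits] + [0]            # implied 0 to the right of the LSB
    # Booth digit at bit position i is b(i-1) - b(i)
    return [b[j + 1] - b[j] for j in range(len(bits))]

def bit_pair_recode(booth_digits):
    """Combine Booth digits two at a time: pair value = 2*d(high) + d(low)."""
    d = booth_digits
    return [2 * d[i] + d[i + 1] for i in range(0, len(d), 2)]

booth = booth_recode("1000111101000101")
print(booth)                    # [-1, 0, 0, 1, 0, 0, 0, -1, 1, -1, 0, 0, 1, -1, 1, -1]
print(bit_pair_recode(booth))   # [-2, 1, 0, -1, 1, 0, 1, 1]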

22.Draw the full adder circuit and give the truth table (CSE May/June 2007)
(The full adder takes inputs A, B and carry-in C and produces Carry and Sum outputs.)

Inputs        Outputs
A  B  C       Carry  Sum
0  0  0         0     0
0  0  1         0     1
0  1  0         0     1
0  1  1         1     0
1  0  0         0     1
1  0  1         1     0
1  1  0         1     0
1  1  1         1     1
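The truth table can be checked with a short Python sketch (illustrative only) using the standard full-adder equations Sum = A xor B xor C and Carry = AB + BC + CA:

# Full adder: verify the truth table above.
def full_adder(a, b, c):
    s = a ^ b ^ c                         # Sum   = A xor B xor Cin
    carry = (a & b) | (b & c) | (a & c)   # Carry = AB + BC + CA
    return carry, s

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            carry, s = full_adder(a, b, c)
            print(a, b, c, "->", carry, s)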

23. Add 6ten to 7ten in binary, and subtract 6ten from 7ten in binary.
Addition:
0111 (7ten) + 0110 (6ten) = 1101 (13ten)

Subtraction directly:
0111 (7ten) - 0110 (6ten) = 0001 (1ten)

Or via the two's complement of -6:
0111 + 1010 = (1)0001; discarding the carry-out gives 0001 (1ten).

24. Write the overflow conditions for addition and subtraction.


Operation    Operand A    Operand B    Result indicating overflow
A + B          >= 0         >= 0              < 0
A + B          < 0          < 0               >= 0
A - B          >= 0         < 0               < 0
A - B          < 0          >= 0              >= 0

25. Write the IEEE 754 floating point format.


The IEEE 754 single-precision format has a 1-bit sign S, an 8-bit biased exponent and a 23-bit fraction
(double precision uses an 11-bit exponent and a 52-bit fraction). The value represented is
(-1)^S x (1 + Fraction) x 2^(Exponent - Bias), with Bias = 127 (single) or 1023 (double).
The IEEE 754 standard floating point representation is almost always an approximation of the real
number.
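A hedged Python sketch (using only the standard struct module; the sample value is arbitrary) shows how the sign, exponent and fraction fields of a single-precision number can be pulled apart:

import struct

# Decompose an IEEE 754 single-precision value into its three fields.
def decompose_float32(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF       # biased exponent (bias = 127)
    fraction = bits & 0x7FFFFF           # 23 fraction bits
    return sign, exponent, fraction

sign, exp, frac = decompose_float32(-0.75)
print(sign, exp, frac)                   # 1 126 4194304
# value = (-1)**1 * (1 + 4194304/2**23) * 2**(126 - 127) = -0.75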

26. What is meant by sub-word parallelism?


Given that the parallelism occurs within a wide word, the extensions are classified as sub-word
parallelism. It is also classified under the more general name of data level parallelism. They have been
also called vector or SIMD, for single instruction, multiple data. The rising popularity of multimedia
applications led to arithmetic instructions that support narrower operations that can easily operate in
parallel.
For example, ARM added more than 100 instructions in the NEON multimedia instruction extension to
support sub-word parallelism, which can be used either with ARMv7 or ARMv8.
27. Multiply 1000ten x 1001ten.
In decimal, 1000 x 1001 = 1,001,000; using the same digit pattern in binary, 1000two x 1001two = 1001000two,
since the multiplicand is added into the running product wherever a multiplier bit is 1 (bit positions 0 and 3).
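A minimal shift-and-add multiplication sketch in Python (illustrative; it mirrors the idea of the sequential multiply hardware rather than any particular textbook figure) reproduces the result of question 27:

# Sequential shift-and-add multiplication of two unsigned binary numbers.
def shift_add_multiply(multiplicand, multiplier, width=4):
    product = 0
    for _ in range(width):
        if multiplier & 1:               # if the low multiplier bit is 1,
            product += multiplicand      # add the multiplicand into the product
        multiplicand <<= 1               # shift the multiplicand left
        multiplier >>= 1                 # shift the multiplier right
    return product

print(bin(shift_add_multiply(0b1000, 0b1001)))   # 0b1001000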

28. Divide 1,001,010ten by 1000ten.
The quotient is 1001 and the remainder is 10 (in binary: 1001010two / 1000two = 1001two remainder 10two).



UNIT-3
PROCESSOR AND CONTROL UNIT
1.Define pipelining.
Pipelining is a technique of decomposing a sequential process into sub operations with each sub
process being executed in a special dedicated segment that operates concurrently with all other segments.
2.Define parallel processing.
Parallel processing is a term used to denote a large class of techniques that are used to provide
simultaneous data-processing tasks for the purpose of increasing the computational speed of a computer
system. Instead of processing each instruction sequentially as in a conventional computer, a parallel
processing system is able to perform concurrent data

processing to achieve faster execution time.


3.Define instruction pipeline.
The transfer of instructions through the various stages of the CPU instruction cycle, including fetching the
opcode, decoding the opcode, computing operand addresses, fetching operands, executing the instruction and storing
results. This amounts to realizing most (or all) of the CPU in the form of a multifunction pipeline called an
instruction pipeline.
4. What are the steps required for a pipelined processor to process the instruction?
F Fetch: read the instruction from the memory
D Decode: decode the instruction and fetch the source operand(s).
E Execute: perform the operation specified by the instruction.
W Write: store the result in the destination location
5. What are Hazards?
A hazard, also called a hurdle, is a situation that prevents the next instruction in the instruction
stream from executing during its designated clock cycle. A hazard introduces a stall (an idle cycle) into the pipeline.
6. State different types of hazards that can occur in pipeline.
The types of hazards that can occur in the pipelining were,
1. Data hazards.
2. Instruction hazards.
3. Structural hazards.

7. Define Data hazards


A data hazard is any condition in which either the source or the destination operands of
an instruction are not available at the time expected in pipeline. As a result some operation has
to be delayed, and the pipeline stalls.
8. Define Instruction hazards
The pipeline may be stalled because of a delay in the availability of an instruction. For

example, this may be a result of miss in cache, requiring the instruction to be fetched from the
main memory. Such hazards are called as Instruction hazards or Control hazards.
9.Define Structural hazards?
The structural hazards is the situation when two instructions require the use of a given
hardware resource at the same time. The most common case in which this hazard may arise is
access to memory.
10. What are the classification of data hazards?
Classification of data hazards: A pair of instructions can produce a data hazard by reading or
writing the same memory location. Assume that instruction I is executed before instruction J. The hazards
can be classified as:
1. RAW hazard
2. WAW hazard
3. WAR hazard
11. Define RAW hazard (read after write).
Instruction J tries to read a source operand before instruction I writes it.
12. Define WAW hazard (write after write).
Instruction J tries to write an operand before instruction I writes it.
13. Define WAR hazard (write after read).
Instruction J tries to write a destination operand before instruction I reads it.
14. How data hazard can be prevented in pipelining?
Data hazards in the instruction pipelining can prevented by the following techniques.
a)Operand Forwarding
b)Software Approach

15.How Compiler is used in Pipelining?


A compiler translates a high level language program into a sequence of machine instructions. To
reduce N, we need to have suitable machine instruction set and a compiler that makes good use of it. An
optimizing compiler takes advantages of various features of the target processor to reduce the product
N*S, which is the total number of clock cycles needed to execute a program. The number of cycles is
dependent not only on the choice of instruction, but also on the order in which they appear in the program.
The compiler may rearrange program instructions to achieve better performance; of course, such changes
must not affect the result of the computation.
16. How addressing modes affect the instruction pipelining?
Degradation of performance in an instruction pipeline may be due to address dependency, where an
operand address cannot be calculated without the information needed by the addressing mode. For example,
an instruction with register indirect mode cannot proceed to fetch the operand if the previous
instruction is loading the address into the register. Hence operand access is delayed, degrading the
performance of the pipeline.
17. What is locality of reference?
Many instructions in localized areas of the program are executed repeatedly during some time
period, and the remainder of the program is accessed relatively infrequently. This is referred to as locality of
reference.
18. What is the need for a reduced instruction set computer (RISC) chip?
Relatively few instruction types and addressing modes.
Fixed and easily decoded instruction formats.
Fast single-cycle instruction execution.
Hardwired rather than micro programmed control
19. Define memory access time?
The time that elapses between the initiation of an operation and the completion of that operation, for
example the time between the READ and MFC signals, is referred to as the memory access time.
20. Define memory cycle time.
The minimum time delay required between the initiations of two successive memory operations,
for example, the time between two successive READ operations.
21.Define Static Memories.
Memories that consist of circuits capable of retaining the state as long as power is applied are
known as static memories.
22. List out Various branching technique used in micro program control unit?
a) Bit-ORing
b) Using Conditional Variable
c) Wide Branch Addressing


23. How is an interrupt handled during an exception?


* The CPU identifies the source of the interrupt
* The CPU obtains the memory address of the interrupt handler
* The PC and other CPU status information are saved
* The PC is loaded with the address of the interrupt handler, and the handler program runs to handle it.
24. List out the methods used to improve system performance.
The methods used to improve system performance are
1. Processor clock
2.Basic Performance Equation
3.Pipelining
4.Clock rate
5.Instruction set
6.Compiler
25. What is meant by data path element?
A data path element is a unit used to operate on or hold data within a processor. In the MIPS
implementation, the data path elements include the instruction and data memories, the register file, the
ALU, and adders.
26. What is the use of PC register?
Program Counter (PC) is the register containing the address of the instruction in the program being
executed.
27. What is meant by register file?
The processor's 32 general-purpose registers are stored in a structure called a register file. A
register file is a collection of registers in which any register can be read or written by specifying the
number of the register in the file. The register file contains the register state of the computer.
28. What is meant by pipelining?
Pipelining is an implementation technique in which multiple instructions are overlapped in execution.
Pipelining improves performance by increasing instruction throughput, as opposed to decreasing the
execution time of an individual instruction.
29. What are the five steps in MIPS instruction execution?

1. Fetch instruction from memory.


2. Read registers while decoding the instruction. The regular format of MIPS

instructions allows reading and decoding to occur simultaneously.


3. Execute the operation or calculate an address.
4. Access an operand in data memory.
5. Write the result into a register.
30. Write the formula for calculating time between instructions in a pipelined processor.

Under ideal conditions (perfectly balanced stages),
Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of pipe stages
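For instance (a hedged numeric sketch; the stage count and latency are assumed example values):

# Ideal pipelining: time between instructions shrinks by the number of stages.
nonpipelined_time_ps = 800        # assumed: 800 ps per instruction without pipelining
stages = 5                        # classic five-stage MIPS pipeline
pipelined_time_ps = nonpipelined_time_ps / stages
print(pipelined_time_ps)                          # 160.0 ps between instructions, ideally
print(nonpipelined_time_ps / pipelined_time_ps)   # ideal speedup = 5.0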

31. What is pipeline stall?


A pipeline stall, also called a bubble, is a stall initiated in order to resolve a hazard; the bubble can be
seen moving through the later stages of the pipeline.
32. What is meant by branch prediction?
Branch prediction is a method of resolving a branch hazard that assumes a given outcome for the branch
and proceeds from that assumption rather than waiting to ascertain the actual outcome.
33. What are exceptions and interrupts?
An exception, also called an interrupt, is an unscheduled event that disrupts program execution; it is used, for
example, to detect overflow. Eg: arithmetic overflow, use of an undefined instruction.
Interrupt is an exception that comes from outside of the processor.
Eg. I/O device request
34. Define Vectored Interrupts
A vectored interrupt is an interrupt for which the address to which control is transferred is determined by
the cause of the exception.


UNIT-4
PARALLELISM
1. What is Instruction Level Parallelism? (NOV/DEC 2011)
Pipelining is used to overlap the execution of instructions and improve performance. This potential
overlap among instructions is called instruction level parallelism (ILP).
2. Explain various types of Dependences in ILP.
Data Dependences
Name Dependences
Control Dependences
3. What is Multithreading?
Multithreading allows multiple threads to share the functional units of a single processor in an
overlapping fashion. To permit this sharing, the processor must duplicate the independent state of
each thread.
4. What are multiprocessors? Mention the categories of multiprocessors?
Multiprocessors are used to increase performance and improve availability. The different categories are
SISD, SIMD, MIMD.
5. What are two main approaches to multithreading?
Fine-grained multithreading
Coarse-grained multithreading

6. What is the need to use multiprocessors?


1. Microprocessors are now the fastest CPUs; collecting several of them is much easier than redesigning one.
2. Complexity of current microprocessors:
Do we have enough ideas to sustain 1.5x growth per year?
Can we deliver such complexity on schedule?
3. Slow (but steady) improvement in parallel software (scientific apps, databases, OS).
4. Emergence of embedded and server markets driving microprocessors in addition to desktops:
embedded functional parallelism, producer/consumer model.
5. The server figure of merit is tasks per hour vs. latency.
7. Write the advantages of Multithreading.

If a thread gets a lot of cache misses, the other thread(s) can continue, taking advantage of the
unused computing resources, which thus can lead to faster overall execution, as these resources would
have been idle if only a single thread was executed. If a thread cannot use all the computing resources of
the CPU (because instructions depend on each other's results), running another thread permits these
resources not to be left idle.
If several threads work on the same set of data, they can actually share their cache, leading to
better cache usage or synchronization on its values.
8. Write the disadvantages of Multithreading.
Multiple threads can interfere with each other when sharing hardware resources such as caches
or translation lookaside buffers (TLBs). Execution times of a single-thread are not improved but can be
degraded, even when only one thread is executing. This is due to slower frequencies and/or additional
pipeline stages that are necessary to accommodate thread-switching hardware. Hardware support for
multithreading is more visible to software, thus requiring more changes to both application programs and
operating systems than multiprocessing.
9. What is CMP (chip multiprocessing)?
Chip multiprocessors - also called multi-core microprocessors or CMPs for short - are now the
only way to build high-performance microprocessors, for a variety of reasons. Large uniprocessors are no
longer scaling in performance, because it is only possible to extract a limited amount of parallelism from
a typical instruction stream using conventional superscalar instruction issue techniques. In addition, one
cannot simply ratchet up the clock speed on today's processors, or the power dissipation will become
prohibitive in all but water-cooled systems.
10. What is SMT?
Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall
efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads
of execution to better utilize the resources provided by modern processor architectures.
11. Write the advantages of CMP?
CMPs have several advantages over single-processor solutions in energy and silicon-area efficiency:
i. By incorporating smaller, less complex cores onto a single chip
ii. Dynamically switching between cores and powering down unused cores
iii. Increased throughput performance by exploiting parallelism
iv. Multiple computing resources can take better advantage of instruction-, thread- and process-level parallelism
12. What are the Disadvantages of SMT?
Simultaneous multithreading cannot improve performance if any of the shared resources are
limiting bottlenecks for the performance. In fact, some applications run slower when simultaneous
multithreading is enabled. Critics argue that it is a considerable burden to put on software developers that
they have to test whether simultaneous multithreading is good or bad for their application in various
situations and insert extra logic to turn it off if it decreases performance.
13. What are the types of Multithreading?
Block multi-threading

Interleaved multi-threading

14. What is thread-level parallelism (TLP)?


Explicit parallel programs already have TLP (inherent)
Sequential programs that are hard to parallelize or ILP-limited can be speculatively parallelized in
hardware.
15. List the major MIMD Styles
Centralized shared memory ("Uniform Memory Access" time or "Shared Memory Processor")
Decentralized memory (memory module with CPU): gives more memory bandwidth and lower memory latency.
Drawback: longer communication latency
Drawback: software model more complex

16. Distinguish between shared memory multiprocessor and message-passing multiprocessor.


In a shared-memory multiprocessor, all processors share a single address space, and that address space
can be used to communicate data implicitly via load and store operations.
In a message-passing multiprocessor, the processors have multiple (private) address spaces, and data is
communicated by explicitly passing messages among the processors.

17. Draw the basic structure of a Symmetric Shared Memory Multiprocessor.

18. What is multicore'?


At its simplest, multi-core is a design in which a single physical processor contains the core logic
of more than one processor. It's as if an Intel Xeon processor were opened up and inside were packaged
all the circuitry and logic for two (or more) Intel Xeon processors. The multi-core design takes several
such processor "cores" and packages them as a single physical processor. The goal of this design is to
enable a system to run more tasks simultaneously and thereby achieve greater overall system
performance.

19. Write the software implications of a multicore processor?


Multi-core systems will deliver benefits to all software, but especially multi-threaded programs. All code
that supports HT Technology or multiple processors, for example, will benefit automatically from multi-core
processors, without need for modification. Most server-side enterprise packages and many desktop
productivity tools fall into this category.
20. What is coarse grained multithreading?
It switches threads only on costly stalls. Thus it is much less likely to slow down the execution of
an individual thread.
21. What is multiple issue? Write any two approaches.
Multiple issue is a scheme whereby multiple instructions are launched in one clock cycle. It is a method
for increasing the potential amount of instruction-level parallelism. It is done by replicating the internal
components of the computer so that it can launch multiple instructions in every pipeline stage. The two
approaches are:
1. Static multiple issue (at compile time)
2. Dynamic multiple issue (at run time)

22. What is meant by speculation?


One of the most important methods for finding and exploiting more ILP is speculation. It is an approach
whereby the compiler or processor guesses the outcome of an instruction to remove it as dependence in
executing other instructions.
For example, we might speculate on the outcome of a branch, so that instructions after the branch could
be executed earlier.
23. Define Static Multiple Issue
Static multiple issue is an approach to implement a multiple-issue processor where many decisions are
made by the compiler before execution.
24. Define Issue Slots and Issue Packet
Issue slots are the positions from which instructions could be issued in a given clock cycle. By analogy,
these correspond to positions at the starting blocks for a sprint.
Issue packet is the set of instructions that issues together in one clock cycle; the packet may be
determined statically by the compiler or dynamically by the processor.
25. Define VLIW
Very Long Instruction Word (VLIW) is a style of instruction set architecture that launches many
operations that are defined to be independent in a single wide instruction, typically with many separate
opcode fields.
26. Define Superscalar Processor

Superscalar is an advanced pipelining technique that enables the processor to execute more than one
instruction per clock cycle by selecting them during execution. Dynamic multiple-issue processors are
also known as superscalar processors, or simply superscalars.
27. What is meant by loop unrolling?
An important compiler technique to get more performance from loops is loop unrolling, where multiple
copies of the loop body are made. After unrolling, there is more ILP available by overlapping instructions
from different iterations.
28. What is meant by anti-dependence? How is it removed?
Anti-dependence is an ordering forced by the reuse of a name, typically a register, rather than by a true
dependence that carries a value between two instructions. It is also called as name dependence.
Register renaming is the technique used to remove anti-dependence in which the registers are renamed by
the compiler or hardware.
29. What is the use of reservation station and reorder buffer?
Reservation station is a buffer within a functional unit that holds the operands and the operation.
Reorder buffer is the buffer that holds results in a dynamically scheduled processor until it is safe to store
the results to memory or a register.
30. Differentiate in-order execution from out-of-order execution.
Out-of-order execution is a situation in pipelined execution in which an instruction that is blocked from
executing does not cause the following instructions to wait; the processor still preserves the data flow order of the program.
In-order execution requires the instruction fetch and decode unit to issue instructions in order, which
allows dependences to be tracked, and requires the commit unit to write results to registers and memory in
program fetch order. This conservative mode is called in-order commit.

31. What is meant by hardware multithreading?


Hardware multithreading allows multiple threads to share the functional units of a single processor in an
overlapping fashion to try to utilize the hardware resources efficiently. To permit this sharing, the
processor must duplicate the independent state of each thread. It Increases the utilization of a processor.
32. What are the two main approaches to hardware multithreading?
There are two main approaches to hardware multithreading. Fine-grained multithreading switches
between threads on each instruction, resulting in interleaved execution of multiple threads. This
interleaving is often done in a round-robin fashion, skipping any threads that are stalled at that clock
cycle.
Coarse-grained multithreading is an alternative to fine-grained multithreading. It switches threads only on
costly stalls, such as last-level cache misses.
33. What is SMT?
Simultaneous Multithreading (SMT) is a variation on hardware multithreading that uses the resources of a
multiple-issue, dynamically scheduled pipelined processor to exploit thread-level parallelism. It also
exploits instruction level parallelism.
34. Differentiate SMT from hardware multithreading.

Since SMT relies on the existing dynamic mechanisms, it does not switch resources every cycle. Instead,
SMT is always executing instructions from multiple threads, leaving it up to the hardware to associate
instruction slots and renamed registers with their proper threads.
35. What are the three multithreading options?
The three multithreading options are:
1. A superscalar with coarse-grained multithreading
2. A superscalar with fine-grained multithreading
3. A superscalar with simultaneous multithreading
36. Define SMP
Shared memory multiprocessor (SMP) is one that offers the programmer a single physical address space
across all processors - which is nearly always the case for multicore chips. Processors communicate
through shared variables in memory, with all processors capable of accessing any memory location via
loads and stores.

UNIT-5
MEMORY AND I/O SYSTEMS
1. What are the multimedia applications which use caches?
Some Multimedia application areas where cache is extensively used are
*Multimedia Entertainment
*Education
*Office Systems
*Audio and video Mail



2. Explain virtual memory technique.
Techniques that automatically move program and data blocks into physical memory
when they are required for execution are called virtual memory techniques.
3. What are virtual and logical addresses?
The binary addresses that the processor issues for either instruction or data are called
virtual or logical addresses.
4. Define translation buffer.
Most commercial virtual memory systems incorporate a mechanism that can avoid the bulk of the
main memory accesses called for by virtual-to-physical address translation. This may be done
with a cache memory called a translation buffer.
5. What is branch delay slot?
The location containing an instruction that may be fetched and then discarded because of
the branch is called branch delay slot.
6. What is optical memory?
Optical memories use optical (light-based) techniques for data storage. Such memories usually employ optical
disks, which resemble magnetic disks in that they store binary information in concentric tracks on
an electromechanically rotated disk. The information is read or written optically, however,
with a laser replacing the read/write arm of a magnetic disk drive. Optical memories offer high
storage capacities, but their access rates are generally lower than those of magnetic disks.

7. What are static and dynamic memories?


Static memories are memories that require no periodic refreshing. Dynamic memories
are memories that require periodic refreshing.
8. What are the components of memory management unit?
A facility for dynamic storage relocation that maps logical memory references into
physical memory addresses.

A provision for sharing common programs stored in memory by different users .


9. What is the role of MAR and MDR?
The MAR (memory address register) is used to hold the address of the location to or from
which data are to be transferred and the MDR(memory data register) contains the data to be
written into or read out of the addressed location.
10. Distinguish Between Static RAM and Dynamic RAM?
Static RAM are fast, but they come at high cost because their cells require several
transistors. Less expensive RAM can be implemented if simpler cells are used. However such
cells do not retain their state indefinitely; Hence they are called Dynamic RAM.
11. Distinguish between asynchronous DRAM and synchronous DRAM.
In asynchronous DRAMs, a specialized memory controller circuit provides the necessary control signals, RAS
and CAS, that govern the timing, and the processor must take into account the delay in the response
of the memory. A DRAM whose operation is directly synchronized with a clock signal is known as a
synchronous DRAM.
12. What do you mean by the associative mapping technique?
The tag of an address received from the CPU is compared to the tag bits of each block of the cache
to see if the desired block is present. This is called associative mapping technique.
13. What is SCSI?
Small computer system interface can be used for all kinds of devices including RAID
storage subsystems and optical disks for large- volume storage applications.

14. What are the two types of latencies associated with storage?
The latency associated with storage is divided into 2 categories
1. Seek Latencies which can be classified into Overlapped seek,Mid transfer seek and Elevator seek

2. Rotational Latencies which can be reduced either by Zero latency read or Write and Interleave factor.
15. What do you mean by Disk Spanning?
Disk spanning is a method of attaching drives to a single host adapter. All drives appear as a
single contiguous logical unit. Data is written to the first drive first, and when that drive is full, the
controller switches to the second drive and writes to it until it is full, and so on.
16. What is SCSI?
Small computer system interface can be used for all kinds of devices including RAID storage
subsystems and optical disks for large- volume storage applications.
17. Define the term RELIABILITY.
Reliability means features that help to avoid and detect faults. A reliable system does not silently
continue and deliver results that include uncorrected and corrupted data; instead, it corrects the corruption
when possible or else stops.
18. Define the term AVAILABILITY.
Availability means features that allow the system to stay operational even after faults do occur. A highly
available system could disable the malfunctioning portion and continue operating at reduced
capacity.
19. How is an interrupt handled during an exception?
* The CPU identifies the source of the interrupt
* The CPU obtains the memory address of the interrupt handler
* The PC and other CPU status information are saved
* The PC is loaded with the address of the interrupt handler, and the handler program runs to handle it
20. What is IO mapped input output?
A memory-reference instruction activates the READ M (or WRITE M) control line and does not
affect the I/O devices. Separate I/O instructions are required to activate the READ IO and WRITE IO lines,
which cause a word to be transferred between the addressed I/O port and the CPU. The memory and I/O
address spaces are kept separate.
21.Specify the three types of the DMA transfer techniques?
-- Single transfer mode (cycle stealing mode)
-- Block transfer mode (burst mode)
-- Demand transfer mode
-- Cascade mode


22. What is an interrupt?


An interrupt is an event that causes the execution of one program to be suspended and another
program to be executed.
23.What are the uses of interrupts?
*Recovery from errors
*Debugging
*Communication between programs
*Use of interrupts in operating system
24.Define vectored interrupts.
In order to reduce the overhead involved in the polling process, a device requesting an interrupt
may identify itself directly to the CPU. Then, the CPU can immediately start executing the corresponding
interrupt-service routine. The term vectored interrupts refers to all interrupt-handling schemes based on this
approach.
25. Name any three of the standard I/O interface.
*SCSI (small computer system interface),bus standards
*Back plane bus standards
*IEEE 796 bus (multibus signals)
*NUBUS & IEEE 488 bus standard
26. What is an I/O channel?
An I/O channel is actually a special-purpose processor, also called a peripheral processor. The main
processor initiates a transfer by passing the required information to the I/O channel; the channel
then takes over and controls the actual transfer of data.

27.What is a bus?
A collection of wires that connects several devices is called a bus.
28.Define word length?
Each group of n bits is referred to as a word of information and n is called the word length.
29. Why is program-controlled I/O unsuitable for high-speed data transfer?
In program-controlled I/O, considerable overhead is incurred because several program instructions
have to be executed for each data word transferred between the external device and main memory. Many
high-speed peripheral devices have a synchronous mode of operation; that is, data transfers are controlled by a
clock of fixed frequency, independent of the CPU.

30. What is the function of an I/O interface?


The function is to coordinate the transfer of data between the CPU and external devices.
31. What is NUBUS?
A NUBUS is a processor-independent, synchronous bus standard intended for use in 32-bit
microprocessor systems. It defines a backplane into which up to 16 devices may be plugged, each in the form of a
circuit board of standard dimensions.
32. Name some of the IO devices.
*Video terminals
*Video displays
*Alphanumeric displays
*Graphics displays
* Flat panel displays
*Printers
*Plotters
33. What are the steps taken when an interrupt occurs?
* Identify the source of the interrupt
* Obtain the memory address of the required interrupt service routine (ISR)
* Save the program counter and other CPU status information
* Transfer control back to the interrupted program after the ISR completes
34.Define interface.
The word interface refers to the boundary between two circuits or devices
35.What is programmed I/O?
Data transfer to and from peripherals may be handled using this mode. Programmed I/O
operations are the result of I/O instructions written in the computer program.
36.Types of buses.
-Synchronous bus
-Asynchronous bus

37.Define Synchronous bus.


- A synchronous bus contains a synchronous clock that is used to validate each and every
signal.
- Synchronous buses are affected by noise only when the clock signal occurs.
- Synchronous bus designers must contend with metastability when attempting to bridge different clock
frequencies.
- Metastability arises in any flip-flop when its timing requirements are violated.
38. Define Asynchronous bus.
- Asynchronous buses can mistake noise pulses at any time for valid handshake signals.
- Asynchronous bus designers must deal with events that occur asynchronously.
- They must contend with metastability in the events that drive bus transactions.
- When a flip-flop experiences metastability, its effects can propagate into downstream circuitry unless
proper design techniques are used.
39. What are the temporal and spatial localities of references?
Temporal locality (locality in time): if an item is referenced, it will tend to be
referenced again soon.
Spatial locality (locality in space): if an item is referenced, items whose
addresses are close by will tend to be referenced soon.
40. Write the structure of memory hierarchy.
Moving outwards from the processor, the memory hierarchy consists of registers, cache (SRAM), main
memory (DRAM), and secondary storage (flash or magnetic disk); speed and cost per bit decrease while
capacity increases at each level further from the processor.

41. What are the various memory technologies?


The various memory technologies are:
1. SRAM semiconductor memory
2. DRAM semiconductor memory
3. Flash semiconductor memory
4. Magnetic disk

42. Differentiate SRAM from DRAM.


SRAMs are simply integrated circuits that are memory arrays with a single access port that can provide
either a read or a write. SRAMs have a fixed access time to any datum. SRAMs don't need to be refreshed and
so the access time is very close to the cycle time. SRAMs typically use six to eight transistors per bit to
prevent the information from being disturbed when read. SRAM needs only minimal power to retain the
charge in standby mode.
In a dynamic RAM (DRAM), the value kept in a cell is stored as a charge in a capacitor. A single
transistor is then used to access this stored charge, either to read the value or to overwrite the charge
stored there. Because DRAMs use only a single transistor per bit of storage, they are much denser and
cheaper per bit than SRAM.
43. What is flash memory?
Flash memory is a type of electrically erasable programmable read-only memory (EEPROM). Unlike
disks and DRAM, but like other EEPROM technologies, writes can wear out flash memory bits. To cope with such limits, most
flash products include a controller to spread the writes by remapping blocks that have been written many
times to less trodden blocks. This technique is called wear levelling.
44. Define Rotational Latency
Rotational latency, also called rotational delay, is the time required for the desired sector of a disk to
rotate under the read/write head, usually assumed to be half the rotation time.
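As a quick worked example (a hedged sketch; the 5400 RPM figure is just an assumed disk speed), the average rotational latency is half a rotation divided by the rotation rate:

# Average rotational latency = 0.5 rotation / (rotations per second)
rpm = 5400                                # assumed disk rotation speed
avg_latency_s = 0.5 / (rpm / 60)          # half a rotation, in seconds
print(f"{avg_latency_s * 1000:.2f} ms")   # about 5.56 ms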
45. What is direct-mapped cache?
Direct-mapped cache is a cache structure in which each memory location is mapped to exactly one
location in the cache. For example, almost all direct-mapped caches use this mapping to find a block,
(Block address) modulo (Number of blocks in the cache)
46. Consider a cache with 64 blocks and a block size of 16 bytes. To what block number does byte
address 1200 map?

The block is given by
(Block address) modulo (Number of blocks in the cache)
where Block address = Byte address / Bytes per block = 1200 / 16 = 75.
So the block number is 75 modulo 64 = 11. This block contains all byte addresses between 1200 and 1215.
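The same computation in a minimal Python sketch (the parameters are those of question 46; the function name is just for illustration):

# Direct-mapped cache: which cache block does a byte address map to?
def cache_block(byte_address, block_size_bytes, num_blocks):
    block_address = byte_address // block_size_bytes   # which memory block
    return block_address % num_blocks                  # direct-mapped index

print(cache_block(1200, 16, 64))   # 11, matching question 46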

47. How many total bits are required for a direct-mapped cache with 16 KiB of
data and 4-word blocks, assuming a 32-bit address?

16 KiB of data is 4096 (2^12) words, and with 4-word blocks there are 1024 (2^10) blocks. Each block holds
4 x 32 = 128 bits of data, plus a tag of 32 - 10 - 2 - 2 = 18 bits, plus a valid bit. The total cache size is therefore
2^10 x (128 + 18 + 1) = 2^10 x 147 = 147 Kibibits (about 18.4 KiB), roughly 1.15 times as many bits as the data alone.
48. What are the writing strategies in cache memory?


Write-through is a scheme in which writes always update both the cache and the next lower level of the
memory hierarchy, ensuring that data is always consistent between the two.
Write-back is a scheme that handles writes by updating values only to the block in the cache, then writing
the modified block to the lower level of the hierarchy when the block is replaced.
49. What are the steps to be taken in an instruction cache miss?
The steps to be taken on an instruction cache miss are:
1. Send the original PC value (current PC - 4) to the memory.
2. Instruct main memory to perform a read and wait for the memory to
complete its access.
3. Write the cache entry, putting the data from memory in the data portion of the entry, writing the upper
bits of the address (from the ALU) into the tag field, and turning the valid bit on.
4. Restart the instruction execution at the first step, which will refetch the
instruction, this time finding it in the cache.
50. Define TLB
Translation-Lookaside Buffer (TLB) is a cache that keeps track of recently used address mappings to try
to avoid an access to the page table.
51. What is meant by virtual memory?
Virtual memory is a technique that uses main memory as a cache for secondary storage. Two major
motivations for virtual memory: to allow efficient and safe sharing of memory among multiple programs,
and to remove the programming burdens of a small, limited amount of main memory.
52. Differentiate physical address from logical address.
Physical address is an address in main memory.
Logical address (or) virtual address is the CPU generated addresses that corresponds to a location in
virtual space and is translated by address mapping to a physical address when memory is accessed.
53. Define Page Fault
Page fault is an event that occurs when an accessed page is not present in main memory.

54. What is meant by address mapping?


Address translation also called address mapping is the process by which a virtual address is mapped to an
address used to access memory.

PART-B
16 MARK QUESTIONS
UNIT-I
OVERVIEW & INSTRUCTIONS
1. Explain the eight great ideas in computer architecture in detail.(8)
The eight great ideas in computer architecture are:
1. Design for Moore's Law
2. Use Abstraction to Simplify Design
3. Make the Common Case Fast
4. Performance via Parallelism
5. Performance via Pipelining
6. Performance via Prediction
7. Hierarchy of Memories
8. Dependability via Redundancy
2. Explain the components of a computer with the block diagram in detail.(16)

3. Explain the technologies for building computer over time with a neat graph.(6)
Electronics technology continues to evolve
Increased capacity and performance
Reduced cost

4. Explain the chip manufacturing process with a neat diagram in detail.(10)


Silicon: semiconductor
Add materials to transform properties:
Conductors
Insulators
Switch

Cost per die = Cost per wafer / (Dies per wafer × Yield)
Dies per wafer ≈ Wafer area / Die area
Yield = 1 / (1 + (Defects per area × Die area / 2))²
4.Explain the techniques used to measure the performance of a computer.(8)


Response time
How long it takes to do a task
Throughput
Total work done per unit time
e.g., tasks/transactions/ per hour
How are response time and throughput affected by
Replacing the processor with a faster version?
Adding more processors?
Define Performance = 1/Execution Time

X is n times faster than Y means:
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n

Example: time taken to run a program


10s on A, 15s on B
Execution TimeB / Execution TimeA
= 15s / 10s = 1.5
So A is 1.5 times faster than B
Elapsed time
Total response time, including all aspects
Processing, I/O, OS overhead, idle time
Determines system performance
CPU time
Time spent processing a given job
Discounts I/O time, other jobs' shares
Comprises user CPU time and system CPU time
Different programs are affected differently by CPU and system performance
CPU Clocking
CPU Time
Instruction Count and CPI
CPU Time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc
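As a rough illustration of these relationships, the sketch below plugs assumed values into the classic performance equation (the instruction count, CPI and clock rate are hypothetical):

```python
instruction_count = 2_000_000       # assumed number of executed instructions
cpi = 1.5                           # assumed average clock cycles per instruction
clock_rate_hz = 2.0e9               # assumed 2 GHz clock

clock_cycles = instruction_count * cpi
cpu_time = clock_cycles / clock_rate_hz
print(f"CPU time = {cpu_time * 1e3:.2f} ms")    # 1.50 ms
```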

5.(i)Prove that performance and execution time are inverses of each other.(2)

(ii) If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds,
how much faster is A than B?(2)

Performance_A / Performance_B = Execution time_B / Execution time_A = 15 / 10 = 1.5, so A is 1.5 times faster than B.
(iii)Write the formula to calculate the CPU execution time for a program.(2)
CPU Time = CPU Clock Cycles × Clock Cycle Time = CPU Clock Cycles / Clock Rate

(iv) Write the formula to calculate the CPU clock cycles.(2)


Clock Cycles = Instruction Count × Cycles per Instruction (CPI)
CPU Time = Instruction Count × CPI × Clock Cycle Time = (Instruction Count × CPI) / Clock Rate

(v) Write the formula to calculate the classic CPU Performance equation.(2)

CPU Time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)

6. Explain how clock rate and power are related to each other in microprocessor over years with a
neat graph.(6)

Power = Capacitive load × Voltage² × Frequency

Suppose a new CPU has


85% of capacitive load of old CPU


15% voltage and 15% frequency reduction

P_new / P_old = (C_old × 0.85 × (V_old × 0.85)² × F_old × 0.85) / (C_old × V_old² × F_old) = 0.85⁴ ≈ 0.52
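The ratio can be checked numerically (the 0.85 factors are the ones from the example above):

```python
cap_ratio = 0.85      # new capacitive load relative to old
volt_ratio = 0.85     # 15% voltage reduction
freq_ratio = 0.85     # 15% frequency reduction

power_ratio = cap_ratio * volt_ratio ** 2 * freq_ratio
print(round(power_ratio, 3))     # 0.522 -> roughly half the old power
```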
7. Explain the need to switch from uniprocessors to multiprocessors and draw the performance
chart for processors over years. (6)

Multicore microprocessors
More than one processor per chip
Requires explicitly parallel programming
Compare with instruction level parallelism
Hardware executes multiple instructions at once
Hidden from the programmer
Hard to do
Programming for performance
Load balancing
Optimizing communication and synchronization
8. Explain the basic instruction types with examples.(6)
Instructions fall into several broad categories that you should be familiar with:
Data movement.
Arithmetic.
Boolean.
Bit manipulation.
I/O.
Control transfer.
Special purpose.

9. (i)Explain the different types of instruction set architecture in detail(6)


Instruction set architectures are measured according to:
Main memory space occupied by a program.
Instruction complexity.
Instruction length (in bits).
Total number of instructions in the instruction set.
In designing an instruction set, consideration is given to:
Instruction length.
Whether short, long, or variable.
Number of operands.
Number of addressable registers.
Memory organization.
Whether byte- or word addressable.
Addressing modes.
Choose any or all: direct, indirect or indexed.
Byte ordering, or endianness, is another major architectural consideration.
If we have a two-byte integer, the integer may be stored so that the least significant byte is
followed by the most significant byte or vice versa.
In little endian machines, the least significant byte is followed by the most significant byte.
Big endian machines store the most significant byte first (at the lower address).
Stack architectures require us to think about arithmetic expressions a little differently.
We are accustomed to writing expressions using infix notation, such as: Z = X + Y.
Stack arithmetic requires that we use postfix notation: Z = XY+.
This is also called reverse Polish notation, (somewhat) in honor of its Polish inventor, Jan
Lukasiewicz (1878 - 1956).
10. What do you mean by addressing modes? Explain various addressing modes with the help of
examples.(16)

Immediate addressing is where the data is part of the instruction.

Direct addressing is where the address of the data is given in the instruction.

Register addressing is where the data is located in a register.

Indirect addressing gives the address of the address of the data in the instruction.

Register indirect addressing uses a register to hold the address of the data.
Indexed addressing uses a register (implicitly or explicitly) as an offset, which is added to the
address in the operand to determine the effective address of the data.

Based addressing is similar except that a base register is used instead of an index register.

The difference between these two is that an index register holds an offset relative to the address
given in the instruction, a base register holds a base address where the address field represents a
displacement from this base.

In stack addressing the operand is assumed to be on top of the stack.

There are many variations to these addressing modes including:

Indirect indexed.

Base/offset.

Self-relative

Auto increment - decrement.

UNIT-2
ARITHMETIC OPERATIONS
1.Explain the design of ALU in detail.(16)

An arithmetic logic unit (ALU)


Performs arithmetic and logic operations
A fundamental building block of the Central Processing Unit (CPU) of a computer
Even the simplest microprocessors contain one for purposes such as maintaining timers

A combinational logic circuit

2.Explain with an example how to multiply two unsigned binary numbers.(8)

Paper and pencil example (unsigned):

Multiplicand      1000
Multiplier      x 1001
                  1000
                 0000
                0000
               1000
Product       01001000
m bits x n bits = m+n bit product

Binary makes it easy:
0 => place 0 (0 x multiplicand)
1 => place a copy (1 x multiplicand)

4 versions of multiply hardware & algorithm:

successive refinement
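A minimal Python sketch of the shift-and-add idea above (it models the paper-and-pencil method, not any particular hardware version):

```python
def multiply_unsigned(multiplicand: int, multiplier: int, bits: int = 4) -> int:
    """Shift-and-add multiplication of two unsigned 'bits'-wide numbers."""
    product = 0
    for i in range(bits):
        if (multiplier >> i) & 1:            # multiplier bit is 1: add a shifted copy
            product += multiplicand << i     # a 0 bit contributes nothing
    return product

print(bin(multiply_unsigned(0b1000, 0b1001)))    # 0b1001000 = 72, i.e. 8 x 9
```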

3.Describe in detail Booth's multiplication algorithm and its hardware implementation.


Current Bit   Bit to the Right   Explanation            Example       Op
1             0                  Begins run of 1s       0001111000    sub
1             1                  Middle of run of 1s    0001111000    none
0             1                  End of run of 1s       0001111000    add
0             0                  Middle of run of 0s    0001111000    none

Originally for Speed (when shift was faster than add)

Replace a string of 1s in multiplier with an initial subtract when we first see a one and then
later add for the bit after the last one

Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary addition) one
of two predetermined values A and S to a product P, then performing a rightward arithmetic shift on P.
Let m and r be the multiplicand and multiplier, respectively; and let x and y represent the number of bits
in m and r.
1. Determine the values of A and S, and the initial value of P. All of these numbers should have a
length equal to (x + y + 1).
1. A: Fill the most significant (leftmost) x bits with the value of m. Fill the remaining (y + 1) bits with zeros.
2. S: Fill the most significant x bits with the value of (−m) in two's complement notation. Fill the remaining (y + 1) bits with zeros.
3. P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill
the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
1. If they are 01, find the value of P + A. Ignore any overflow.
2. If they are 10, find the value of P + S. Ignore any overflow.
3. If they are 00, do nothing. Use P directly in the next step.
4. If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in the 2nd step by a single place to the right. Let P now
equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
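A behavioural sketch of these A/S/P steps in Python (for illustration only, not a hardware-accurate model):

```python
def booth_multiply(m: int, r: int, x: int, y: int) -> int:
    """Booth's algorithm following the A/S/P steps above.

    m and r are the signed multiplicand and multiplier; x and y are their bit widths.
    """
    total = x + y + 1
    mask = (1 << total) - 1

    A = (m & ((1 << x) - 1)) << (y + 1)       # m in the top x bits
    S = ((-m) & ((1 << x) - 1)) << (y + 1)    # -m (two's complement) in the top x bits
    P = (r & ((1 << y) - 1)) << 1             # r in the middle, 0 in the least significant bit

    for _ in range(y):
        last_two = P & 0b11
        if last_two == 0b01:
            P = (P + A) & mask                # end of a run of 1s: add
        elif last_two == 0b10:
            P = (P + S) & mask                # start of a run of 1s: subtract
        # arithmetic shift right by one place, preserving the sign bit
        sign = P >> (total - 1)
        P = (P >> 1) | (sign << (total - 1))

    P >>= 1                                   # drop the extra least significant bit
    if P >= 1 << (x + y - 1):                 # interpret the x+y-bit result as signed
        P -= 1 << (x + y)
    return P

print(booth_multiply(3, -4, 4, 4))    # -12
```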

4.Explain the Working of a Carry-Look Ahead adder. (16)

5.Derive and explain an algorithm for adding and subtracting two floating point binary
numbers.(8)
ADDITION
example on decimal value given in scientific notation:
  3.25     x 10 ** 3
+ 2.63     x 10 ** -1
first step:  align decimal points
second step: add
  3.25     x 10 ** 3
+ 0.000263 x 10 ** 3
  --------------------
  3.250263 x 10 ** 3
(presumes use of infinite precision, without regard for accuracy)
third step: normalize the result (already normalized!)

SUBTRACTION
like addition as far as alignment of radix points
then the algorithm for subtraction of sign mag. numbers takes over.

before subtracting,
compare magnitudes (don't forget the hidden bit!)
change sign bit if order of operands is changed.
don't forget to normalize number afterward.
Step 1: Calculate difference d of the two exponents - d=|E1 - E2|
Step 2: Shift significand of smaller number by d base- positions to the right
Step 3: Add aligned significands and set exponent of result to exponent of larger operand
Step 4: Normalize resultant significand and adjust exponent if necessary
Step 5: Round resultant significand and adjust exponent if necessary
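The five steps can be sketched on base-10 (significand, exponent) pairs as follows; the function and its parameters are illustrative, not part of any standard library:

```python
def fp_add(sig1, exp1, sig2, exp2, digits=6):
    """Align, add, normalize and round two (significand, exponent) pairs."""
    # Step 1: exponent difference
    d = abs(exp1 - exp2)
    # Step 2: shift the significand of the smaller number right by d positions
    if exp1 >= exp2:
        big_sig, small_sig, exp = sig1, sig2 / (10 ** d), exp1
    else:
        big_sig, small_sig, exp = sig2, sig1 / (10 ** d), exp2
    # Step 3: add the aligned significands
    sig = big_sig + small_sig
    # Step 4: normalize so that 1 <= |sig| < 10, adjusting the exponent
    while abs(sig) >= 10:
        sig /= 10
        exp += 1
    while 0 < abs(sig) < 1:
        sig *= 10
        exp -= 1
    # Step 5: round the resultant significand
    return round(sig, digits), exp

print(fp_add(3.25, 3, 2.63, -1))    # (3.250263, 3), matching the worked example
```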

6.Describe the algorithm for integer division with suitable examples.(16)

7.Perform the integer division 9 / 4 using restoring and non-restoring division.(10)
The restoring-division algorithm:
S1: DO n times
Shift A and Q left one binary position.
Subtract M from A, placing the answer back in A.
If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set q0 to 1.
The non-restoring division algorithm:
S1: Do n times
If the sign of A is 0, shift A and Q left one binary position and subtract M from A; otherwise, shift A
and Q left and add M to A.
S2: If the sign of A is 1, add M to A.
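A behavioural sketch of the restoring algorithm above for non-negative operands (register widths and names are assumed for illustration):

```python
def restoring_divide(dividend: int, divisor: int, n: int = 4):
    """Restoring division (the S1 loop above) for non-negative n-bit integers.

    Returns (quotient, remainder).
    """
    A, Q, M = 0, dividend, divisor
    for _ in range(n):
        # shift A and Q left one binary position
        A = (A << 1) | ((Q >> (n - 1)) & 1)
        Q = (Q << 1) & ((1 << n) - 1)
        A -= M                       # subtract M from A
        if A < 0:                    # sign of A is 1: restore A and set q0 = 0
            A += M
        else:                        # otherwise set q0 = 1
            Q |= 1
    return Q, A

print(restoring_divide(9, 4))    # (2, 1): 9 / 4 = 2 remainder 1
```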

[Figure 8.2: A 4-stage pipeline. (a) Instruction execution divided into four steps (F: Fetch instruction, D: Decode instruction and fetch operands, E: Execute operation, W: Write results); instructions I1-I4 advance one stage per clock cycle. (b) Hardware organization, with interstage buffers B1, B2 and B3 between the stages.]

UNIT-3
PROCESSOR AND CONTROL UNIT

1.Discuss the basic concepts of pipelining.

2.State and explain the different types of hazards that can occur in a pipeline.

Data Hazards

Instruction Hazards

3. Explain the basic MIPS implementation of instruction set


Our implementation of the MIPS is simplified

memory-reference instructions: lw, sw

arithmetic-logical instructions: add, sub, and, or, slt

control flow instructions: beq, j

Generic implementation

use the program counter (PC) to supply the instruction address and fetch the
instruction from memory (and update the PC)

decode the instruction (and read registers)

execute the instruction

All instructions (except j) use the ALU after reading the registers
Clocking Methodologies
The clocking methodology defines when signals can be read and when they are written

An edge-triggered methodology

Fetching instructions involves



reading the instruction from the Instruction Memory

updating the PC to hold the address of the next instruction

Decoding instructions involves



sending the fetched instruction's opcode and function field bits to the control unit

Executing R Format Operations


R format operations (add, sub, slt, and, or)
Executing Load and Store Operations
Load and store operations involves
compute memory address by adding the base register (read from the Register File during
decode) to the 16-bit signed-extended offset field in the instruction

store value (read from the Register File during decode) written to the Data Memory

load value, read from the Data Memory, written to the Register File

4.What is data hazard?Explain the methods for dealing with the data hazards

Data hazards occur when data is used before it is ready

The use of the result of the SUB instruction in the next three instructions causes a data hazard, since
the register $2 is not written until after those instructions read it

Solutions for Data Hazards


Stalling
Forwarding:
connect new value directly to next stage
Reordering
5.Describe the data and control path techniques in pipelining.(10)

Data path: the part of the CPU through which data signals flow.

Control unit: guides data signals through the data path.

Pipelining: a way of achieving greater performance.

Data Path:

The whole point of pipelining is to allow multiple instructions to execute at the same time.

We may need to perform several operations in the same cycle.


Increment the PC and add registers at the same time.
Fetch one instruction while another one reads or writes data.

Clock cycle:            1    2    3    4    5    6    7    8    9
lw   $t0, 4($sp)        IF   ID   EX   MEM  WB
sub  $v0, $a0, $a1           IF   ID   EX   MEM  WB
and  $t1, $t2, $t3                IF   ID   EX   MEM  WB
or   $s0, $s1, $s2                     IF   ID   EX   MEM  WB
add  $t5, $t6, $0                           IF   ID   EX   MEM  WB
Thus, like the single-cycle datapath, a pipelined processor will need to duplicate hardware
elements that are needed several times in the same clock cycle.
Control Path:

Control unit guides data signals through data path

6.What is instruction hazard? Explain in detail how to handle the instruction hazards in pipelining with relevant examples.(10)
Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the
pipeline stalls.
Cache miss
Branch
Scoreboards are designed to control the flow of data between registers and multiple arithmetic units in
the presence of conflicts caused by hardware resource limitations (structural hazards) and by
dependencies between instructions (data hazards). Data hazards can be classified as flow
dependencies (Read-After-Write), output
dependencies (Write-After-Write)
and antidependencies (Write-After-Read).
Read-After-Write (RAW) Hazards
A Read-After-Write hazard occurs when an instruction requires the result of a previously
issued, but as yet uncompleted instruction. In the RAW example shown in the figure, the second
instruction requires the value in R6 which has not yet been produced by the first instruction.

Write-After-Write (WAW) Hazards


A Write-After-Write hazard occurs when an instruction tries to write its result to the same register
as a previously issued, but as yet uncompleted instruction. In the WAW example shown in the
figure, both instructions write their results to R6. Although this latter example is unlikely to arise
in normal programming practice, it must nevertheless give the correct result. Without proper
interlocks the add operation would complete first and the result in R6 would then be overwritten
by that of the multiplication.
Write-After-Read (WAR) Hazards
A Write-After-Read hazard occurs when an instruction tries to write to a register which has not yet
been read by a previously issued, but as yet uncompleted instruction. This hazard cannot occur in
most systems, but could occur in the CDC 6600 because of the way instructions were issued to the
arithmetic units. The WAR example shown in the figure is based on the CDC 6600, in which
floating-point values were held in X registers.
The WAR hazard here is on register X4 in the third instruction. It arises because instructions
which are held up by a RAW hazard are nevertheless issued to their arithmetic unit, where they
wait for their operands. Thus the second instruction can be issued immediately after the first, but it
is held up in the add unit waiting for its operands because of the WAR hazard on X3 which cannot
be read until the divide unit completes its operation. The third instruction can likewise be issued
immediately after the second and it can start its operation. The floating-point add operation
completes in very much less time than division, however, and the add unit is therefore ready to
store its result in X4 before the multiply unit has read the current value in X4. Thus there has to be
an interlock between the multiply and add instructions to prevent the add instruction from writing
to X4 before the multiply instruction has read its current value

7.Describe the techniques for handling control hazards in pipelining.(10)


Control hazards - attempt to make decision before condition is evaluated
Solution to control hazard:
Stall
stop loading instructions until result is available
Predict
assume an outcome and continue fetching (undo if prediction is wrong)
lose cycles only on mis-prediction
Delayed branch
specify in architecture that the instruction immediately following branch is always
executed

8.Write a note on exception handling.(6)

Exceptions definition: unexpected change in control flow

Another form of control hazard.

For example:
add $1, $2, $1;

causing an arithmetic overflow

sw $3, 400($1);
add $5, $1, $2;
Invalid $1 contaminates other registers or memory locations!


Two Types of Exceptions: Interrupts and Traps

Interrupts
Caused by external events:
Network, Keyboard, Disk I/O, Timer
Page fault - virtual memory
System call - user request for OS action
Asynchronous to program execution
May be handled between instructions
Simply suspend and resume user program

Traps
Caused by internal events
Exceptional conditions (overflow)
Undefined Instruction
Hardware malfunction
Usually Synchronous to program execution
Condition must be remedied by the handler
Instruction may be retried or simulated and program continued or program may be
aborted

UNIT 4
Instruction Level Parallelism
1. Explain Instruction level parallelism
Architectural technique that allows the overlap of individual machine operations ( add, mul,
load, store )
Multiple operations will execute in parallel (simultaneously)


Goal: Speed Up the execution
Example:
load R1 R2

add R3 R3, 1

add R3 R3, 1

add R4 R3, R2

add R4 R4, R2

store [R4] R0

ILP
Overlap individual machine operations (add, mul, load) so that they execute in parallel
Transparent to the user
Goal: speed up execution
ILP Challenges
In order to achieve parallelism we should not have dependences among instructions which are
executing in parallel:
H/W terminology Data Hazards ( RAW, WAR, WAW)
S/W terminology Data Dependencies
Types of Dependencies
Name dependencies
Output dependence
Anti-dependence
Data True dependence
Control Dependence
Resource Dependence

2. Explain the difficulties faced by parallel processing programs


Parallel Processing
Having separate processors getting separate chunks of the program ( processors programmed to
do so)
Nontransparent to the user
Goal: speed up and quality up

3. Explain shared memory multiprocessor


In a shared memory multiprocessor
Multiple processors can read and write shared memory
Shared memory might be cached in more than one processor
Cache coherence ensures same view by all processors

Centralized shared-memory multiprocessor, or symmetric shared-memory multiprocessor (SMP):
Multiple processors connected to a single centralized memory; since all processors see the same memory organization, this is called uniform memory access (UMA).
Shared-memory because all processors can access the entire memory address space.

4. Explain in detail Flynn's classification of parallel hardware


Flynn uses the stream concept for describing a machine's structure
A stream simply means a sequence of items (data or instructions).
The classification of computer architectures based on the number of instruction steams and data
streams (Flynn's Taxonomy).
Flynn's Taxonomy
SISD: Single instruction single data
Classical von Neumann architecture
SIMD: Single instruction multiple data
MISD: Multiple instructions single data

Non existent, just listed for completeness


MIMD: Multiple instructions multiple data
Most common and general parallel machine
5. Explain in detail hardware Multithreading
General idea: Have multiple thread contexts in a single processor
When the hardware executes from those hardware contexts determines the granularity of
multithreading
Why?
To tolerate latency (initial motivation)
Latency of memory operations, dependent instructions, branch resolution
By utilizing processing resources more efficiently
To improve system throughput
By exploiting thread-level parallelism
By improving superscalar/OoO processor utilization
To reduce context switch penalty
Benefit
Latency tolerance
Better hardware utilization (when?)
Reduced context switch penalty
Cost
Requires multiple thread contexts to be implemented in hardware (area, power, latency cost)
Usually reduced single-thread performance
Resource sharing, contention
Switching penalty (can be reduced with additional hardware)

6. Explain SISD and MIMD


SISD (Single-Instruction stream, Single-Data stream)
SISD corresponds to the traditional mono-processor ( von Neumann computer). A single data
stream is being processed by one instruction stream
OR
A single-processor computer (uni-processor) in which a single stream of instructions is generated from
the program.

where CU = Control Unit, PE = Processing Element, M = Memory.
MIMD (Multiple-Instruction streams, Multiple-Data streams)
Each processor has a separate program.
An instruction stream is generated from each program.
Each instruction operates on different data.
This last machine type builds the group for the traditional multi-processors. Several processing
units operate on multiple-data streams.

7. Explain SIMD and MISD


SIMD (Single-Instruction stream, Multiple-Data streams)
Each instruction is executed on a different set of data by different processors i.e multiple
processing units of the same type process on multiple-data streams.
This group is dedicated to array processing machines.
Sometimes, vector processors can also be seen as a part of this group.


MISD (Multiple-Instruction streams, Single-Data stream)
Each processor executes a different sequence of instructions.
In case of MISD computers, multiple processing units operate on one single-data stream .
In practice, this kind of organization has never been used

8. Explain Multicore processors


A multi-core processor is a single computing component with two or more independent actual central
processing units (called "cores"), which are the units that read and execute program instructions. The
instructions are ordinary CPU instructions such as add, move data, and branch, but the multiple cores can
run multiple instructions at the same time, increasing overall speed for programs amenable to parallel
computing. Manufacturers typically integrate the cores onto a single integrated circuit die (known as a
chip multiprocessor or CMP), or onto multiple dies in a single chip package.
9. Explain the different types of multithreading
Fine-grained
Cycle by cycle
Coarse-grained
Switch on event (e.g., cache miss)
Switch on quantum/timeout
Simultaneous
Instructions from multiple threads executed concurrently in the same cycle
UNIT 5
Memory and I/O System

1. Explain in detail about memory Technologies

Random Access Memory (vs. Serial Access Memory)

Different flavors at different levels

Physical Makeup (CMOS, DRAM)

Low Level Architectures (FPM,EDO,BEDO,SDRAM)

Cache uses SRAM: Static Random Access Memory

No refresh (6 transistors/bit vs. 1 transistor for DRAM)
Size ratio (DRAM/SRAM): about 4-8
Cost and cycle-time ratio (SRAM/DRAM): about 8-16

Main Memory is DRAM: Dynamic Random Access Memory

Dynamic since needs to be refreshed periodically

Addresses divided into 2 halves (Memory as a 2D matrix):

RAS or Row Access Strobe

CAS or Column Access Strobe

Synchronous DRAM (SDRAM): Ability to transfer a burst of data given a starting address
and a burst length suitable for transferring a block of data from main memory to cache.

Page Mode DRAM: All bits on the same ROW (Spatial Locality)

Don't need to wait for wordline to recharge

Toggle CAS with new column address

Extended Data Out (EDO)

Overlap Data output w/ CAS toggle

A later variant: Burst EDO (CAS toggle used to get next addr)

Rambus DRAM (RDRAM)


- Pipelined control
2. Explain in detail about memory Hierarchy with neat diagram


The memory unit is an essential component in any digital computer since it is needed for storing
programs and data
Not all accumulated information is needed by the CPU at the same time
Therefore, it is more economical to use low-cost storage devices to serve as a backup for storing
the information that is not currently used by CPU

3. Describe the basic operations of cache in detail with diagram

A cache hit occurs if the cache contains the data that we're looking for. Hits are good, because the
cache can return the data much faster than main memory.

A cache miss occurs if the cache does not contain the requested data. This is bad, since the CPU
must then wait for the slower main memory.

There are two basic measurements of cache performance.


The hit rate is the percentage of memory accesses that are handled by the cache.
The miss rate (1 - hit rate) is the percentage of accesses that must be handled by the slower
main RAM.
Reducing hit time
1. Giving Reads Priority over Writes
E.g., Read complete before earlier writes in write buffer


2. Avoiding Address Translation during Cache Indexing
Reducing Miss Penalty
3. Multilevel Caches
Reducing Miss Rate
4. Larger Block size (Compulsory misses)

5. Larger Cache size (Capacity misses)


6. Higher Associativity (Conflict misses)

4. Discuss the various mapping schemes used in cache design(10)


A direct-mapped cache is the simplest approach: each main memory address maps to exactly one cache
block.

For example, consider a 16-byte main memory and a 4-byte cache (four 1-byte blocks).
Memory locations 0, 4, 8 and 12 all map to cache block 0.
Addresses 1, 5, 9 and 13 map to cache block 1, etc.
How can we compute this mapping? Cache block = (memory address) modulo (number of cache blocks).
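A short sketch that prints the mapping just described (16 one-byte memory locations, four one-byte cache blocks):

```python
num_cache_blocks = 4
for address in range(16):
    print(f"memory address {address:2d} -> cache block {address % num_cache_blocks}")
# addresses 0, 4, 8, 12 -> block 0; addresses 1, 5, 9, 13 -> block 1; and so on.
```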

Set-Associative Cache

5. Discuss the methods used to measure and improve the performance of the cache.
Performance is always a key issue for caches.
We consider improving cache performance by:
(1) reducing the miss rate, and
(2) reducing the miss penalty.
For (1) we can reduce the probability that different memory blocks will contend for the
same cache location.
For (2), we can add additional levels to the hierarchy, which is called multilevel caching.
We can determine the CPU time as
CPU Time = (CPU execution clock cycles + Memory-stall clock cycles) × Clock cycle time
The memory-stall clock cycles come from cache misses.
It can be defined as the sum of the stall cycles coming from writes + those coming from
reads:
Memory-Stall CC = Read-stall cycles + Write-stall cycles, where

Read-stall cycles = (Reads / Program) × Read miss rate × Read miss penalty
Write-stall cycles = (Writes / Program) × Write miss rate × Write miss penalty + Write buffer stalls
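As an illustration, the sketch below plugs assumed miss rates and penalties into these formulas (all numbers are hypothetical):

```python
instructions = 1_000_000
base_cpi = 1.0                                      # CPI with a perfect cache (assumed)
reads_per_instr, writes_per_instr = 0.25, 0.10      # assumed memory-access mix
read_miss_rate, write_miss_rate = 0.04, 0.04        # assumed miss rates
miss_penalty = 100                                  # assumed penalty in cycles

read_stalls = instructions * reads_per_instr * read_miss_rate * miss_penalty
write_stalls = instructions * writes_per_instr * write_miss_rate * miss_penalty
memory_stall_cycles = read_stalls + write_stalls    # ignoring write-buffer stalls

cpu_cycles = instructions * base_cpi + memory_stall_cycles
effective_cpi = cpu_cycles / instructions
print(effective_cpi)    # 1.0 + 1.4 = 2.4 cycles per instruction
```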

6. Explain the virtual memory address translation and TLB with necessary diagram.
Virtual memory address translation
In a virtual memory system, the program memory is divided into fixed sized pages and allocated in fixed
sized physical memory frames. The pages do not have to be contiguous in memory. A page table keeps
track of where each page is located in physical memory. This allows the operating system to load a
program of any size into any available frames. Only the currently used pages need to be loaded. Unused
pages can remain on disk until they are referenced. This allows many large programs to be executed on a
relatively small memory system. A resident flag in the page table indicates whether or not the page is in
memory. The page table also includes several other flags to keep track of memory usage. A use flag is set
whenever the page is referenced. A dirty bit is set whenever the page is changed to inform the operating
system that the page in memory is different than the page on disk.
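A minimal sketch of page-table translation as described above (the page size, page-table contents and fault handling are all assumed for illustration):

```python
PAGE_SIZE = 4096    # assumed 4 KiB pages

# virtual page number -> (resident flag, physical frame number)
page_table = {0: (True, 7), 1: (True, 3), 2: (False, None)}

def translate(virtual_address: int) -> int:
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    resident, frame = page_table.get(vpn, (False, None))
    if not resident:
        raise RuntimeError(f"page fault on virtual page {vpn}")   # OS would load the page
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1234)))    # page 1, offset 0x234 -> frame 3 -> 0x3234
```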

TLB
Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped
TLB access time is typically smaller than cache access time (because TLBs are much smaller than
caches)
TLBs are typically not more than 128 to 256 entries even on high end machines

On a TLB miss, is it a page fault or merely a TLB miss?



If the page is loaded into main memory, then the TLB miss can be handled (in hardware or
software) by loading the translation information from the page table into the TLB
Takes 10s of cycles to find and load the translation info into the TLB

If the page is not in main memory, then it's a true page fault
Takes 1,000,000s of cycles to service a page fault

TLB misses are much more frequent than true page faults
7. Draw the typical block diagram of a DMA controller and explain how it is
used for direct data transfer between memory and peripherals.
Concentrates on data transfer between the processor and I/O devices. Data are transferred by
executing instructions such as
Move DATAIN R0
An instruction to transfer input or output data is executed only after the processor determines that
the I/O device is ready. To do this the processor either polls a status flag in the device interface or waits
for the device to send an interrupt request. In either case considerable over head is incurred, because
several program instructions must be executed for each data word transferred. In addition to polling the
status register of the device, instructions are needed for incrementing the memory address and keeping
track of the word count. When interrupts are used there is the additional overhead associated with saving
and restoring the program counter and other state information.

DMA Controller

Although a DMA controller can transfer data without intervention by the processor, its operation
must be under the control of a program executed by the processor. To initiate the transfer of a block of
words, the processor sends the starting address. The number of words in the block, and the direction of
the transfer. On receiving this information, the DMA controller proceeds to perform the requested
operation. When the entire block has been transferred the controller informs the processor by raising an
interrupt signal.

While a DMA transfer is taking place the program that requested the transfer cannot continue, and
the processor can be used to execute another program. After the DMA transfer is completed the processor
can return to the program that requested the transfer.
Figure: Registers in a DMA interface

DMA data transfer


To start a DMA transfer of a block of data from the main memory to one of the disks, a program
writes the address and word count information into the register of the corresponding channel of the disk
controller. It also provides the disk controller with information to
identify the data for future retrieval. The DMA controller proceeds independently to implement the
specified operation. When the DMA transfer is completed, this fact is recorded in the status and control
register of the DMA channel by setting the Done bit. At the same time if the IE bit is set the controller
sends an interrupt request to the processor and sets the IRQ bit. The status register can also be used to
record other information, such as whether the transfer took place correctly or errors occurred.

8. Explain in detail about interrupts with diagram


Interrupts are a mechanism by which other modules (e.g.
I/O) may interrupt normal sequence of processing
Four general classes of interrupts
Program - e.g. overflow, division by zero
Timer
Generated by internal processor timer


Used in pre-emptive multi-tasking
I/O - from I/O controller
Hardware failure
e.g. memory parity error
Particularly useful when one module is much slower than
another, e.g. disk access (milliseconds) vs. CPU
(microseconds or faster)
9.Explain in detail about I/O processor.
A simple arrangement to connect I/O devices to a computer is to use a single bus arrangement as
shown in Figure. The bus enables all the devices connected to it to exchange information. Typically it
consists of three sets for lines used to carry address, data and control signals. Each I/O device is assigned
a unique set of addresses. When the processor places particular address on the address lines, the device
that recognizes this address responds to the commands issued on the control lines. The processor requests
either a read or a write operation and the requested data are transferred over the data lines. As mentioned
in Section 2.7 when I/O devices and the memory share the same address space, the arrangement is called
memory mapped I/O.
With memory mapped I/O any machine instruction that can access memory can be used to
transferred data to or from an I/O device. For example if DATAIN is the address
