ODD SEMESTER-2014
QUESTION BANK
SUBJECT CODE/ NAME: CS 6303/COMPUTER ARCHITECTURE
STAFF NAME: Mrs.S.Muthumariammal
YEAR/SEMESTER: II/III
It includes the information formats, the instruction set and techniques for addressing memory.
It describes the function and design of the various units of digital computer that store and process
information.
It refers to the operational units and their interconnections that realize the architectural
specifications.
Input unit
Memory unit
Output unit
Control unit
http://www.francisxavier.ac.in
8. Define MIPS.
MIPS: One alternative to time as the metric is MIPS (Million Instructions Per Second).
MIPS = Instruction count / (Execution time x 10^6). This MIPS measurement is also called native MIPS
to distinguish it from some alternative definitions of MIPS.
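The native MIPS formula above can be checked with a few lines of Python. This is only a sketch: the function name `native_mips` and the sample numbers are illustrative assumptions, not part of the syllabus.

```python
def native_mips(instruction_count: int, execution_time_s: float) -> float:
    """Native MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1_000_000)


# Hypothetical program: 50 million instructions executed in 2 seconds.
rating = native_mips(50_000_000, 2.0)
print(rating)
```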
9. Define MIPS rate.
The rate at which instructions are executed in a given time.
10. Define throughput and throughput rate.
Throughput: the total amount of work done in a given time.
Throughput rate: the rate at which that work is done in a given time.
A mode field that specifies the way the operand or the effective address is
determined
The CPU control circuitry automatically proceeds to fetch and execute instructions, one at a time, in
order of increasing addresses. This is called straight-line sequencing.
20. Write down the MIPS assembly language notation for arithmetic operations.
Category     Instruction      Example              Meaning             Comments
Arithmetic   add              add  $s1,$s2,$s3     $s1 = $s2 + $s3     Three register operands
             subtract         sub  $s1,$s2,$s3     $s1 = $s2 - $s3     Three register operands
             add immediate    addi $s1,$s2,20      $s1 = $s2 + 20      Used to add constants
21. Write down the MIPS assembly language notation for data transfer operations.
Category        Instruction              Example            Meaning                                    Comments
Data transfer   load word                lw  $s1,20($s2)    $s1 = Memory[$s2 + 20]                     Word from memory to register
                store word               sw  $s1,20($s2)    Memory[$s2 + 20] = $s1                     Word from register to memory
                load half                lh  $s1,20($s2)    $s1 = Memory[$s2 + 20]                     Halfword memory to register
                load half unsigned       lhu $s1,20($s2)    $s1 = Memory[$s2 + 20]                     Halfword memory to register
                store half               sh  $s1,20($s2)    Memory[$s2 + 20] = $s1                     Halfword register to memory
                load byte                lb  $s1,20($s2)    $s1 = Memory[$s2 + 20]                     Byte from memory to register
                load byte unsigned       lbu $s1,20($s2)    $s1 = Memory[$s2 + 20]                     Byte from memory to register
                store byte               sb  $s1,20($s2)    Memory[$s2 + 20] = $s1                     Byte from register to memory
                load linked word         ll  $s1,20($s2)    $s1 = Memory[$s2 + 20]                     Load word as 1st half of atomic swap
                store condition. word    sc  $s1,20($s2)    Memory[$s2 + 20] = $s1; $s1 = 0 or 1       Store word as 2nd half of atomic swap
                load upper immed.        lui $s1,20         $s1 = 20 * 2^16                            Loads constant in upper 16 bits
22. Write down the MIPS assembly language notation for logical operations.
Category   Instruction      Example              Meaning                 Comments
Logical    and              and  $s1,$s2,$s3     $s1 = $s2 & $s3         Three reg. operands; bit-by-bit AND
           or               or   $s1,$s2,$s3     $s1 = $s2 | $s3         Three reg. operands; bit-by-bit OR
           nor              nor  $s1,$s2,$s3     $s1 = ~($s2 | $s3)      Three reg. operands; bit-by-bit NOR
           and immediate    andi $s1,$s2,20      $s1 = $s2 & 20          Bit-by-bit AND reg with constant
Meaning                                    Comments
if ($s1 == $s2) go to PC + 4 + 100         Equal test; PC-relative branch
if ($s1 != $s2) go to PC + 4 + 100         Not equal test; PC-relative
if ($s2 < $s3) $s1 = 1; else $s1 = 0       Compare less than; for beq, bne
if ($s2 < $s3) $s1 = 1; else $s1 = 0       Compare less than unsigned
if ($s2 < 20) $s1 = 1; else $s1 = 0        Compare less than constant
if ($s2 < 20) $s1 = 1; else $s1 = 0        Compare less than constant unsigned
24. Write down the MIPS assembly language notation for unconditional branch operations.
Category               Instruction    Example    Meaning         Comments
Unconditional jump     jump           j 2500     go to 10000     Jump to target address
Immediate mode
Register mode
Absolute mode
Indirect mode
Index mode
Base with index
Base with index and offset
Relative mode
Auto-increment mode
Auto-decrement mode
Mode             Assembler Syntax    Addressing Function
Register mode    Ri                  EA = Ri
Absolute mode    LOC                 EA = LOC
It places the value 200 in the register R0. The immediate mode is used to specify the value of a source
operand.
In assembly language a subscript to mark the immediate value is not practical, so the # symbol is used. The instruction is written as
Move #200,R0
Mode         Assembler Syntax    Addressing Function
Immediate    #value              Operand = value
Mode        Assembler Syntax    Addressing Function
Indirect    (Ri), (LOC)         EA = [Ri] or EA = [LOC]
X(Ri)
where X denotes the constant value contained in the instruction and
Ri is the name of the register involved.
The effective address of the operand is
EA = X + [Ri]
The index register R1 contains the address of a memory location and the value of X defines an
offset (also called a displacement) from that address.
To find the operand:
Read the contents of register R1, here 1000.
Add the offset 20 to those contents:
1000 + 20 = 1020
Alternatively, the constant X may give the memory address while the contents of the index register define the
offset to the operand.
In either case, one of the two values is given explicitly in the instruction and the other is stored in a register.
Eg: Add 20(R1), R2 gives EA = 1000 + 20 = 1020
Mode                          Assembler Syntax    Addressing Function
Index                         X(Ri)               EA = [Ri] + X
Base with Index               (Ri,Rj)             EA = [Ri] + [Rj]
Base with Index and offset    X(Ri,Rj)            EA = [Ri] + [Rj] + X
Relative                      X(PC)               EA = [PC] + X
Auto-increment                (Ri)+               EA = [Ri]; Increment Ri
Auto-decrement                -(Ri)               Decrement Ri; EA = [Ri]
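The addressing functions for the index, relative, auto-increment and auto-decrement modes can be sketched in Python as follows. The function name, the mode strings, the register dictionary and the 4-byte step used by the auto modes are illustrative assumptions; the auto-decrement branch decrements before use, which is the usual convention.

```python
def effective_address(mode, regs, ri=None, rj=None, x=0, pc=0, step=4):
    """Return the effective address (EA) for a few addressing modes.

    regs maps register names to their contents; step is an assumed
    operand size used by the auto-increment/decrement modes.
    """
    if mode == "index":                 # X(Ri):    EA = [Ri] + X
        return regs[ri] + x
    if mode == "base_index":            # (Ri,Rj):  EA = [Ri] + [Rj]
        return regs[ri] + regs[rj]
    if mode == "base_index_offset":     # X(Ri,Rj): EA = [Ri] + [Rj] + X
        return regs[ri] + regs[rj] + x
    if mode == "relative":              # X(PC):    EA = [PC] + X
        return pc + x
    if mode == "autoincrement":         # (Ri)+:    EA = [Ri], then increment Ri
        ea = regs[ri]
        regs[ri] += step
        return ea
    if mode == "autodecrement":         # -(Ri):    decrement Ri, then EA = [Ri]
        regs[ri] -= step
        return regs[ri]
    raise ValueError(f"unknown mode: {mode}")


regs = {"R1": 1000, "R2": 8}
print(effective_address("index", regs, ri="R1", x=20))  # Add 20(R1),R2 example
```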
34. Write the formula for CPU execution time for a program.
35. Write the formula for CPU clock cycles required for a program.
3) Perform addition/subtraction on the mantissa and determine the sign of the result
4) Normalize the resulting value, if necessary.
11. Write the multiply rule for floating point numbers.
1) Add the exponent and subtract 127.
2) Multiply the mantissa and determine the sign of the result .
3) Normalize the resulting value , if necessary.
12. What is the purpose of guard bits used in floating point arithmetic
Although the mantissa of initial operands are limited to 24 bits, it is important to retain extra bits,
called as guard bits.
13. What are the ways to truncate the guard bits?
There are several ways to truncate the guard bits:
1) Chopping
2) Von Neumann rounding
3) Rounding
14. Define carry save addition(CSA) process.
Instead of letting the carries ripple along the rows, they can be saved and introduced into the next
row at the correct weighted position. The delay in a CSA is less than the delay through a ripple-carry adder.
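A minimal Python sketch of one carry-save step, assuming unsigned integer operands: three operands are reduced to a sum word and a carry word without propagating carries, and a single ordinary addition at the end gives the total. The function name is illustrative.

```python
def carry_save_add(a: int, b: int, c: int):
    """Reduce three operands to (sum word, carry word) with no carry ripple."""
    sum_bits = a ^ b ^ c                          # bitwise sum, carries ignored
    carry_bits = ((a & b) | (b & c) | (a & c)) << 1  # saved carries, next weight
    return sum_bits, carry_bits


s, c = carry_save_add(0b1011, 0b0110, 0b1101)   # 11 + 6 + 13
print(s + c)                                     # final carry-propagate addition
```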
15. What are generate and propagate function?
The generate function is given by
Gi = xi yi
and the propagate function is given by
Pi = xi + yi
In single-precision numbers, when an exponent is less than -126 we say that an underflow has
occurred. In single-precision numbers, when an exponent is greater than +127 we say that an
overflow has occurred.
18. What are the difficulties faced when we use floating point arithmetic?
Mantissa overflow: The addition of two mantissas of the same sign may result in a carryout of the
most significant bit
Mantissa underflow: In the process of aligning mantissas ,digits may flow off the right end of the
mantissa.
Exponent overflow: Exponent overflow occurs when a positive exponent exceeds the maximum
possible value.
Exponent underflow: It occurs when a negative exponent falls below the minimum possible
exponent value.
19.In conforming to the IEEE standard mention any four situations under which a processor sets
exception flag.
Underflow: In single precision, if the number requires an exponent less than -126, or in double precision, if the
number requires an exponent less than -1022, to represent its normalized form, underflow occurs.
Overflow: In single precision, if the number requires an exponent greater than +127, or in
double precision, if the number requires an exponent greater than +1023, to represent its normalized form,
overflow occurs.
Divide by zero: It occurs when any number is divided by zero.
Invalid: It occurs if operations such as 0/0 are attempted.
20. Why is a floating-point number more difficult to represent and process than an integer? (CSE
May/June 2007)
An integer value requires only half the memory space of an equivalent IEEE double-precision
floating-point value. Applications that use only integer-based arithmetic will therefore also have
significantly smaller memory requirements.
A floating-point operation usually runs hundreds of times slower than an equivalent integer-based
arithmetic operation.
21. Give the Booth's recoding and bit-pair recoding of the number
1000111101000101 (CSE May/June 2006)
Booth's recoding:
-1 0 0 +1 0 0 0 -1 +1 -1 0 0 +1 -1 +1 -1
Bit-pair recoding:
-2 +1 0 -1 +1 0 +1 +1
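The two recodings of the multiplier 1000111101000101 can be reproduced with a short Python sketch. The function names are illustrative; the input is assumed to be a two's-complement bit string of even length, written most significant bit first.

```python
def booth_recode(bits: str):
    """Booth digit for each position = (bit to its right) - (bit itself)."""
    b = [int(ch) for ch in bits] + [0]          # implied 0 right of the LSB
    return [b[i + 1] - b[i] for i in range(len(bits))]


def bit_pair_recode(bits: str):
    """Combine Booth digits in pairs: 2 * (left digit) + (right digit)."""
    d = booth_recode(bits)
    return [2 * d[i] + d[i + 1] for i in range(0, len(d), 2)]


multiplier = "1000111101000101"
print(booth_recode(multiplier))
print(bit_pair_recode(multiplier))
```

As a sanity check, weighting the bit-pair digits by powers of 4 gives back the signed value of the multiplier.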
22. Draw the full adder circuit and give the truth table (CSE May/June 2007)
Inputs          Outputs
A   B   C       Carry   Sum
0   0   0       0       0
0   0   1       0       1
0   1   0       0       1
0   1   1       1       0
1   0   0       0       1
1   0   1       1       0
1   1   0       1       0
1   1   1       1       1
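The full adder's behaviour can be expressed with the standard sum and carry equations, Sum = A xor B xor C and Carry = AB + BC + AC. The Python function below is a small sketch (the name `full_adder` is an assumption) that prints all eight input combinations.

```python
def full_adder(a: int, b: int, c: int):
    """Return (carry, sum) for three one-bit inputs."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return carry, s


# Enumerate every input combination in A, B, C order.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, *full_adder(a, b, c))
```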
23. Add 6 (base 10) to 7 (base 10) in binary and subtract 6 (base 10) from 7 (base 10) in binary.
Addition,
Subtraction directly,
UNIT-3
PROCESSOR AND CONTROL UNIT
1.Define pipelining.
Pipelining is a technique of decomposing a sequential process into sub operations with each sub
process being executed in a special dedicated segment that operates concurrently with all other segments.
2.Define parallel processing.
Parallel processing is a term used to denote a large class of techniques that are used to provide
simultaneous data-processing tasks for the purpose of increasing the computational speed of a computer
system. Instead of processing each instruction sequentially as in a conventional computer, a parallel
processing system is able to perform concurrent data
example, this may be a result of miss in cache, requiring the instruction to be fetched from the
main memory. Such hazards are called as Instruction hazards or Control hazards.
9. Define structural hazards.
A structural hazard is the situation in which two instructions require the use of a given
hardware resource at the same time. The most common case in which this hazard may arise is
access to memory.
10. What is the classification of data hazards?
Classification of data hazards: A pair of instructions can produce a data hazard by
reading or writing the same memory location. Assume that instruction i is executed before instruction j. The hazards
can then be classified as:
1. RAW hazard
2. WAW hazard
3. WAR hazard
11. Define RAW hazard (read after write).
Instruction j tries to read a source operand before instruction i writes it.
12. Define WAW hazard (write after write).
Instruction j tries to write an operand before instruction i writes it.
13. Define WAR hazard (write after read).
Instruction j tries to write a destination before instruction i reads it.
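The three definitions can be turned into a small checker. The sketch below assumes each instruction is described by the sets of locations it reads and writes (the function name and this representation are illustrative); it reports which hazards are possible when j follows i, not whether a particular pipeline would actually stall.

```python
def classify_hazards(i_reads, i_writes, j_reads, j_writes):
    """Return the set of potential data hazards for instruction j after i."""
    hazards = set()
    if set(i_writes) & set(j_reads):
        hazards.add("RAW")   # j reads what i writes
    if set(i_writes) & set(j_writes):
        hazards.add("WAW")   # j writes what i writes
    if set(i_reads) & set(j_writes):
        hazards.add("WAR")   # j writes what i reads
    return hazards


# i: add $1, $2, $1   (reads $2, $1; writes $1)
# j: sw  $3, 400($1)  (reads $3, $1; writes no register)
print(classify_hazards({"$2", "$1"}, {"$1"}, {"$3", "$1"}, set()))
```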
14. How data hazard can be prevented in pipelining?
Data hazards in an instruction pipeline can be prevented by the following techniques.
a)Operand Forwarding
b)Software Approach
dependent not only on the choice of instruction, but also on the order in which they appear in the program.
The compiler may rearrange program instructions to achieve better performance. Of course, such changes
must not affect the result of the computation.
16. How addressing modes affect the instruction pipelining?
Degradation of performance in an instruction pipeline may be due to address dependency, where an
operand address cannot be calculated because information needed by the addressing mode is not yet available. For example,
an instruction with register indirect mode cannot proceed to fetch the operand if the previous
instruction is loading the address into the register. Hence operand access is delayed, degrading the
performance of the pipeline.
17. What is locality of reference?
Many instructions in localized areas of the program are executed repeatedly during some time
period, and the remainder of the program is accessed relatively infrequently. This is referred to as locality of
reference.
18. What is the need for reduced instruction chip?
Relatively few instruction types and addressing modes.
Fixed and easily decoded instruction formats.
Fast single-cycle instruction execution.
Hardwired rather than micro programmed control
19. Define memory access time?
The time that elapses between the initiation of an operation and the completion of that operation, for
example, the time between the READ and the MFC signals. This is referred to as memory access time.
20. Define memory cycle time.
The minimum time delay required between the initiations of two successive memory operations,
for example, the time between two successive READ operations.
21.Define Static Memories.
Memories that consist of circuits capable of retaining the state as long as power is applied are
known as static memories.
22. List out Various branching technique used in micro program control unit?
a) Bit-Oring
b) Using Conditional Variable
c) Wide Branch Addressing
UNIT-4
PARALLELISM
1. What is Instruction Level Parallelism? (NOV/DEC 2011)
Pipelining is used to overlap the execution of instructions and improve performance. This potential
overlap among instructions is called instruction level parallelism (ILP).
2. Explain various types of Dependences in ILP.
Data Dependences
Name Dependences
Control Dependences
3. What is Multithreading?
Multithreading allows multiple threads to share the functional units of a single processor in an
overlapping fashion. To permit this sharing, the processor must duplicate the independent state of
each thread.
4. What are multiprocessors? Mention the categories of multiprocessors?
Multiprocessors are used to increase performance and improve availability. The different categories are
SISD, SIMD, MIMD.
5. What are two main approaches to multithreading?
Fine-grained multithreading
Coarse-grained multithreading
If a thread gets a lot of cache misses, the other thread(s) can continue, taking advantage of the
unused computing resources. This can lead to faster overall execution, as these resources would
have been idle if only a single thread were executed. If a thread cannot use all the computing resources of
the CPU (because instructions depend on each other's results), running another thread keeps those resources
from sitting idle.
If several threads work on the same set of data, they can share their cache, leading to
better cache usage or synchronization on its values.
8. Write the disadvantages of Multithreading.
Multiple threads can interfere with each other when sharing hardware resources such as caches
or translation lookaside buffers (TLBs). Execution times of a single thread are not improved but can be
degraded, even when only one thread is executing. This is due to slower frequencies and/or additional
pipeline stages that are necessary to accommodate thread-switching hardware. Hardware support for
multithreading is more visible to software, thus requiring more changes to both application programs and
operating systems than multiprocessing.
9. What is CMP?
Chip multiprocessors - also called multi-core microprocessors or CMPs for short - are now the
only way to build high-performance microprocessors, for a variety of reasons. Large uniprocessors are no
longer scaling in performance, because it is only possible to extract a limited amount of parallelism from
a typical instruction stream using conventional superscalar instruction issue techniques. In addition, one
cannot simply ratchet up the clock speed on today's processors, or the power dissipation will become
prohibitive in all but water-cooled systems.
10. What is SMT?
Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall
efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads
of execution to better utilize the resources provided by modern processor architectures.
11. Write the advantages of CMP?
CMPs have several advantages over single-processor solutions in energy and silicon area efficiency:
i. Incorporating smaller, less complex cores onto a single chip
ii. Dynamically switching between cores and powering down unused cores
iii. Increased throughput performance by exploiting parallelism
iv. Multiple computing resources can take better advantage of instruction, thread, and process level parallelism
12. What are the Disadvantages of SMT?
Simultaneous multithreading cannot improve performance if any of the shared resources are
limiting bottlenecks for the performance. In fact, some applications run slower when simultaneous
multithreading is enabled. Critics argue that it is a considerable burden to put on software developers that
they have to test whether simultaneous multithreading is good or bad for their application in various
situations and insert extra logic to turn it off if it decreases performance.
13. What are the types of Multithreading?
Block multi-threading
Interleaved multi-threading
17. Draw the basic structure of a Symmetric Shared Memory Multiprocessor
Superscalar is an advanced pipelining technique that enables the processor to execute more than one
instruction per clock cycle by selecting them during execution. Dynamic multiple-issue processors are
also known as superscalar processors, or simply superscalars.
27. What is meant by loop unrolling?
An important compiler technique to get more performance from loops is loop unrolling, where multiple
copies of the loop body are made. After unrolling, there is more ILP available by overlapping instructions
from different iterations.
28. What is meant by anti-dependence? How is it removed?
Anti-dependence is an ordering forced by the reuse of a name, typically a register, rather than by a true
dependence that carries a value between two instructions. It is also called as name dependence.
Register renaming is the technique used to remove anti-dependence in which the registers are renamed by
the compiler or hardware.
29. What is the use of reservation station and reorder buffer?
Reservation station is a buffer within a functional unit that holds the operands and the operation.
Reorder buffer is the buffer that holds results in a dynamically scheduled processor until it is safe to store
the results to memory or a register.
30. Differentiate in-order execution from out-of-order execution.
Out-of-order execution is a situation in pipelined execution in which an instruction that is blocked from executing
does not cause the following instructions to wait. It preserves the data flow order of the program.
In-order execution requires the instruction fetch and decode unit to issue instructions in order, which
allows dependences to be tracked, and requires the commit unit to write results to registers and memory in
program fetch order. This conservative mode is called in-order commit.
Since SMT relies on the existing dynamic mechanisms, it does not switch resources every cycle. Instead,
SMT is always executing instructions from multiple threads, leaving it up to the hardware to associate
instruction slots and renamed registers with their proper threads.
35. What are the three multithreading options?
The three multithreading options are:
1. A superscalar with coarse-grained multithreading
2. A superscalar with fine-grained multithreading
3. A superscalar with simultaneous multithreading
36. Define SMP
Shared memory multiprocessor (SMP) is one that offers the programmer a single physical address space
across all processors - which is nearly always the case for multicore chips. Processors communicate
through shared variables in memory, with all processors capable of accessing any memory location via
loads and stores.
UNIT-5
MEMORY AND I/O SYSTEMS
1. What are the multimedia applications which use caches?
Some Multimedia application areas where cache is extensively used are
*Multimedia Entertainment
*Education
*Office Systems
*Audio and video Mail
14. What are the two types of latencies associated with storage?
The latency associated with storage is divided into 2 categories
1. Seek Latencies which can be classified into Overlapped seek,Mid transfer seek and Elevator seek
2. Rotational Latencies which can be reduced either by Zero latency read or Write and Interleave factor.
15. What do you mean by Disk Spanning?
Disk spanning is a method of attaching drives to a single host adapter. All drives appear as a
single contiguous logical unit. Data is written to the first drive first, and when that drive is full the
controller switches to the second drive, which is then written until it is full.
16. What is SCSI?
Small computer system interface can be used for all kinds of devices including RAID storage
subsystems and optical disks for large- volume storage applications.
17. Define the term RELIABILITY
Reliability refers to features that help to avoid and detect faults. A reliable system does not silently
continue and deliver results that include uncorrected, corrupted data; instead it corrects the corruption
when possible or else stops.
18. Define the term AVAILABILITY
Availability refers to features that allow the system to stay operational even after faults do occur. A highly
available system could disable the malfunctioning portion and continue operating at reduced
capacity.
19. How the interrupt is handled during exception?
* CPU identifies the source of the interrupt
* CPU obtains the memory address of the interrupt handler
* PC and other CPU status information are saved
* PC is loaded with the address of the interrupt handler, and the handling program runs to service it
20. What is IO mapped input output?
A memory reference instruction activates the READ M or WRITE M control line and does not
affect the I/O device. Separate I/O instructions are required to activate the READ IO and WRITE IO lines,
which cause a word to be transferred between the addressed I/O port and the CPU. The memory and I/O
address spaces are kept separate.
21. Specify the three types of DMA transfer techniques.
--Single Transfer Mode (cycle-stealing mode)
--Block Transfer Mode (Burst Mode)
--Demand Transfer Mode
--Cascade Mode
27.What is a bus?
A collection of wires that connects several devices is called a bus.
28.Define word length?
Each group of n bits is referred to as a word of information and n is called the word length.
29. Why program controlled I/O is unsuitable for high-speed data transfer?
In program-controlled I/O, considerable overhead is incurred because several program instructions
have to be executed for each data word transferred between the external device and main memory (MM). Many high-speed
peripheral devices have a synchronous mode of operation; that is, data transfers are controlled by a
clock of fixed frequency, independent of the CPU.
47. How many total bits are required for a direct-mapped cache with 16 KiB of
data and 4-word blocks, assuming a 32-bit address?
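One way to work this out is sketched below in Python, assuming 4-byte words, byte addressing and one valid bit per block (the function name and these defaults are assumptions). Each block stores its data bits plus a tag and a valid bit; the tag width is the address width minus the index and block-offset bits.

```python
def direct_mapped_cache_bits(data_bytes, words_per_block,
                             address_bits=32, bytes_per_word=4):
    """Total storage bits (data + tag + valid) of a direct-mapped cache."""
    block_bytes = words_per_block * bytes_per_word
    num_blocks = data_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1      # log2, power-of-two sizes assumed
    offset_bits = block_bytes.bit_length() - 1    # word offset + byte offset
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_block = block_bytes * 8 + tag_bits + 1   # +1 for the valid bit
    return num_blocks * bits_per_block


total = direct_mapped_cache_bits(16 * 1024, 4)
print(total, total / 1024)   # total bits, and the same figure in Kibibits
```

For 16 KiB of data and 4-word blocks this gives 1024 blocks, a 10-bit index, a 4-bit block offset and an 18-bit tag, so each block needs 128 + 18 + 1 = 147 bits and the cache needs 147 Kibibits in total.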
PART-B
16 MARK QUESTIONS
UNIT-I
OVERVIEW & INSTRUCTIONS
1. Explain the Eight ideas of the Computer architects in detail.(8)
The eight great ideas in computer architecture are:
1. Design for Moore's Law
2. Use Abstraction to Simplify Design
3. Make the Common Case Fast
4. Performance via Parallelism
5. Performance via Pipelining
6. Performance via Prediction
7. Hierarchy of Memories
8. Dependability via Redundancy
2. Explain the components of a computer with the block diagram in detail.(16)
3. Explain the technologies for building computer over time with a neat graph.(6)
Electronics technology continues to evolve
Increased capacity and performance
Reduced cost
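$\text{Yield} = \dfrac{1}{\left(1 + (\text{Defects per area} \times \text{Die area}/2)\right)^2}$
$\dfrac{\text{Performance}_X}{\text{Performance}_Y} = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = n$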
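$\text{CPU Time} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Clock cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Clock cycle}}$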
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc
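5. (i) Prove that performance and execution time are inverse to each other. (2)
(ii) If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds,
how much faster is A than B? (2)
$\dfrac{\text{Performance}_X}{\text{Performance}_Y} = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = n$
With execution times of 10 s for A and 15 s for B, n = 15/10 = 1.5, so A is 1.5 times faster than B.
(iii) Write the formula to calculate the CPU execution time for a program. (2)
$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time}$
(v) Write the classic CPU performance equation. (2)
$\text{CPU Time} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Clock cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Clock cycle}}$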
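The performance ratio and the classic CPU performance equation can be tried out numerically. The Python sketch below uses illustrative function names and a made-up instruction count, CPI and clock rate; only the 10 s and 15 s execution times come from part (ii).

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """n = Performance_X / Performance_Y = Execution time_Y / Execution time_X."""
    return exec_time_y / exec_time_x


def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time (1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


print(relative_performance(10, 15))          # computer A versus computer B
print(cpu_time(2_000_000, 2, 4_000_000))     # hypothetical program, in seconds
```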
6. Explain how clock rate and power are related to each other in microprocessor over years with a
neat graph.(6)
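$\dfrac{P_{new}}{P_{old}} = \dfrac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$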
7. Explain the need to switch from uniprocessors to multiprocessors and draw the performance
chart for processors over years. (6)
Multicore microprocessors
More than one processor per chip
Requires explicitly parallel programming
Compare with instruction level parallelism
Hardware executes multiple instructions at once
Hidden from the programmer
Hard to do
Programming for performance
Load balancing
Optimizing communication and synchronization
8. Explain the basic instruction types with examples.(6)
Instructions fall into several broad categories that you should be familiar with:
Data movement.
Arithmetic.
Boolean.
Bit manipulation.
I/O.
Control transfer.
Special purpose.
Direct addressing is where the address of the data is given in the instruction.
Indirect addressing gives the address of the address of the data in the instruction.
Register indirect addressing uses a register to store the address of the data.
Indexed addressing uses a register (implicitly or explicitly) as an offset, which is added to the
address in the operand to determine the effective address of the data.
Based addressing is similar except that a base register is used instead of an index register.
The difference between these two is that an index register holds an offset relative to the address
given in the instruction, a base register holds a base address where the address field represents a
displacement from this base.
Indirect indexed.
Base/offset.
Self-relative
UNIT-2
ARITHMETIC OPERATIONS
1.Explain the design of ALU in detail.(16)
Multiplicand         1000
Multiplier         x 1001
                     1000
                    0000
                   0000
                  1000
Product          01001000
0 => place 0 (0 x multiplicand)
1 => place a copy (1 x multiplicand)
successive refinement
Explanation            Example       Op
Begins run of 1s       0001111000    sub
Middle of run of 1s    0001111000    none
End of run of 1s       0001111000    add
Middle of run of 0s    0001111000    none
Replace a string of 1s in multiplier with an initial subtract when we first see a one and then
later add for the bit after the last one
Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary addition) one
of two predetermined values A and S to a product P, then performing a rightward arithmetic shift on P.
Let m and r be the multiplicand and multiplier, respectively; and let x and y represent the number of bits
in m and r.
1. Determine the values of A and S, and the initial value of P. All of these numbers should have a
length equal to (x + y + 1).
1. A: Fill the most significant (leftmost) bits with the value of m. Fill the remaining (y + 1)
bits with zeros.
2. S: Fill the most significant bits with the value of (-m) in two's complement notation. Fill
the remaining (y + 1) bits with zeros.
3. P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill
the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
1. If they are 01, find the value of P + A. Ignore any overflow.
2. If they are 10, find the value of P + S. Ignore any overflow.
3. If they are 00, do nothing. Use P directly in the next step.
4. If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in the 2nd step by a single place to the right. Let P now
equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
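The A, S, P procedure above can be followed step by step in Python. This is a sketch under stated assumptions: the function name is illustrative, m and r are assumed to fit in x and y bits of two's complement, and the most negative value of m (for example -8 with x = 4) is not handled because -m does not fit in x bits.

```python
def booth_multiply(m: int, r: int, x: int, y: int) -> int:
    """Multiply m (x bits) by r (y bits) using the A, S, P formulation."""
    total = x + y + 1
    mask = (1 << total) - 1
    A = ((m & ((1 << x) - 1)) << (y + 1)) & mask      # m in the top x bits
    S = (((-m) & ((1 << x) - 1)) << (y + 1)) & mask   # -m in the top x bits
    P = ((r & ((1 << y) - 1)) << 1) & mask            # r followed by a 0 bit
    for _ in range(y):
        low2 = P & 0b11
        if low2 == 0b01:
            P = (P + A) & mask                        # ignore overflow
        elif low2 == 0b10:
            P = (P + S) & mask                        # ignore overflow
        sign = P >> (total - 1)
        P = (P >> 1) | (sign << (total - 1))          # arithmetic right shift
    P >>= 1                                           # drop the rightmost bit
    width = x + y
    if P >> (width - 1):                              # interpret as signed
        P -= 1 << width
    return P


print(booth_multiply(3, -4, 4, 4))
```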
5.Derive and explain an algorithm for adding and subtracting two floating point binary
numbers.(8)
ADDITION
example on decimal value given in scientific notation:
  3.25 x 10 ** 3
+ 2.63 x 10 ** -1
-----------------
first step: align decimal points
second step: add
  3.25     x 10 ** 3
+ 0.000263 x 10 ** 3
--------------------
  3.250263 x 10 ** 3
(presumes use of infinite precision, without regard for accuracy)
third step: normalize the result (already normalized!)
SUBTRACTION
like addition as far as alignment of radix points
then the algorithm for subtraction of sign mag. numbers takes over.
before subtracting,
compare magnitudes (don't forget the hidden bit!)
change sign bit if order of operands is changed.
don't forget to normalize number afterward.
Step 1: Calculate difference d of the two exponents - d=|E1 - E2|
Step 2: Shift significand of smaller number by d base- positions to the right
Step 3: Add aligned significands and set exponent of result to exponent of larger operand
Step 4: Normalize resultant significand and adjust exponent if necessary
Step 5: Round resultant significand and adjust exponent if necessary
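Steps 1 to 5 can be illustrated with a toy Python model. The assumptions here are loud ones: both operands are positive, each value is stored as an integer significand of P = 24 bits (hidden bit included) with value sig x 2^(exp - 23), and rounding is done by chopping. The names `fp_add` and `to_float` are illustrative.

```python
P = 24  # assumed significand width, including the hidden bit


def fp_add(sig1: int, exp1: int, sig2: int, exp2: int):
    """Add two positive normalized values sig * 2**(exp - (P - 1))."""
    d = abs(exp1 - exp2)                  # Step 1: exponent difference
    if exp1 >= exp2:                      # Step 2: shift the smaller operand right
        sig2 >>= d
        exp = exp1
    else:
        sig1 >>= d
        exp = exp2
    sig = sig1 + sig2                     # Step 3: add aligned significands
    if sig >= (1 << P):                   # Step 4: normalize on significand overflow
        sig >>= 1
        exp += 1
    return sig, exp                       # Step 5: chopping (shifted-out bits dropped)


def to_float(sig: int, exp: int) -> float:
    return sig * 2.0 ** (exp - (P - 1))


print(to_float(*fp_add(3 << 22, 0, 1 << 23, -2)))   # 1.5 + 0.25
```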
(10)
9/4
The restoring-division algorithm:
S1: DO n times
Shift A and Q left one binary position.
Subtract M from A, placing the answer back in A.
If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set q0 to 1.
The non-restoring division algorithm:
S1: Do n times
If the sign of A is 0, shift A and Q left one binary position and subtract M from A; otherwise, shift A
and Q left and add M to A.
Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
S2: If the sign of A is 1, add M to A.
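Both division algorithms can be sketched in Python for unsigned n-bit operands. The function names are illustrative, and A is kept as an ordinary signed Python integer, so "the sign of A is 1" is written as A < 0. The non-restoring version sets q0 after each add or subtract, as in the usual statement of the algorithm.

```python
def restoring_divide(dividend: int, divisor: int, n: int):
    """Return (quotient, remainder) using restoring division."""
    A, Q, M = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        A = (A << 1) | ((Q >> (n - 1)) & 1)   # shift A and Q left together
        Q = (Q << 1) & mask
        A -= M
        if A < 0:
            A += M                            # restore A, q0 stays 0
        else:
            Q |= 1                            # set q0 to 1
    return Q, A


def nonrestoring_divide(dividend: int, divisor: int, n: int):
    """Return (quotient, remainder) using non-restoring division."""
    A, Q, M = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        msb = (Q >> (n - 1)) & 1
        A = 2 * A + msb - M if A >= 0 else 2 * A + msb + M
        Q = (Q << 1) & mask
        if A >= 0:
            Q |= 1                            # q0 = 1 when A is non-negative
    if A < 0:
        A += M                                # final correction step (S2)
    return Q, A


print(restoring_divide(9, 4, 4), nonrestoring_divide(9, 4, 4))
```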
UNIT-3
[Figure: four-stage instruction pipeline. Time axis in clock cycles; instructions I1 to I4 each pass through the stages F (fetch instruction), D (decode instruction and fetch operands), E (execute operation) and W (write results), with interstage buffers B1, B2 and B3.]
2.State and explain the different types of hazards that can occur in a pipeline.
Data Hazards
Instruction Hazards
Generic implementation
use the program counter (PC) to supply the instruction address and fetch the
instruction from memory (and update the PC)
All instructions (except j) use the ALU after reading the registers
Clocking Methodologies
The clocking methodology defines when signals can be read and when they are written
An edge-triggered methodology
sending the fetched instruction's opcode and function field bits to the control unit
compute memory address by adding the base register (read from the Register File during
decode) to the 16-bit sign-extended offset field in the instruction
store value (read from the Register File during decode) written to the Data Memory
load value, read from the Data Memory, written to the Register File
4.What is data hazard?Explain the methods for dealing with the data hazards
The use of the result of the SUB instruction in the next three instructions causes a data hazard, since
the register $2 is not written until after those instructions read it
Stalling
Forwarding:
connect new value directly to next stage
Reordering
5.Describe the data and control path techniques in pipelining.(10)
Data Path:
The whole point of pipelining is to allow multiple instructions to execute at the same time.
                      Clock cycle
                      1    2    3    4    5    6    7    8    9
lw   $t0, 4($sp)      IF   ID   EX   MEM  WB
sub  $v0, $a0, $a1         IF   ID   EX   MEM  WB
and  $t1, $t2, $t3              IF   ID   EX   MEM  WB
or   $s0, $s1, $s2                   IF   ID   EX   MEM  WB
add  $t5, $t6, $0                         IF   ID   EX   MEM  WB
Thus, like the single-cycle datapath, a pipelined processor will need to duplicate hardware
elements that are needed several times in the same clock cycle.
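To make the overlap concrete, the sketch below computes which stage each instruction occupies in each clock cycle of an ideal five-stage pipeline with no stalls, using the five instructions from the timing diagram. The function and variable names are hypothetical, and hazards are deliberately ignored.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(instructions, stages=STAGES):
    """Return {instruction: {cycle: stage}} for an ideal pipeline with no stalls."""
    schedule = {}
    for i, instr in enumerate(instructions):
        # Instruction i enters the first stage in cycle i + 1 and then
        # advances by one stage per clock cycle.
        schedule[instr] = {i + 1 + s: stage for s, stage in enumerate(stages)}
    return schedule


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles to finish all instructions: fill the pipe, then one per extra instruction."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


program = [
    "lw  $t0, 4($sp)",
    "sub $v0, $a0, $a1",
    "and $t1, $t2, $t3",
    "or  $s0, $s1, $s2",
    "add $t5, $t6, $0",
]
schedule = pipeline_schedule(program)

# Stages that are busy in clock cycle 5, one per instruction in flight.
busy_in_cycle_5 = [stages[5] for stages in schedule.values() if 5 in stages]
```

In cycle 5 all five stages are busy at once, each with a different instruction, which is why hardware that a single instruction uses in more than one stage has to be duplicated. The five instructions finish in 9 cycles instead of the 25 they would take one after another.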
Control Path:
The WAR hazard here is on register X4 in the third instruction. It arises because instructions
that are held up by a RAW hazard are nevertheless issued to their arithmetic unit, where they
wait for their operands. Thus the second instruction can be issued immediately after the first, but it
is held up in the multiply unit waiting for its operands because of the RAW hazard on X3, which
cannot be read until the divide unit completes its operation. The third instruction can likewise be
issued immediately after the second, and it can start its operation. The floating-point add operation
completes in very much less time than division, however, and the add unit is therefore ready to
store its result in X4 before the multiply unit has read the current value in X4. Thus there has to be
an interlock between the multiply and add instructions to prevent the add instruction from writing
to X4 before the multiply instruction has read its current value.
For example:
add $1, $2, $1;
sw $3, 400($1);
add $5, $1, $2;
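A minimal sketch of how RAW, WAR and WAW hazards between two instructions can be classified from the registers they read and write. The tuple layout (destination register or None, list of source registers) and the function name are assumptions made for illustration; the three instructions are those of the example above.

```python
def classify_hazards(earlier, later):
    """Return the set of hazard types between an earlier and a later instruction.

    Each instruction is (dest, sources): dest is a register name or None,
    sources is a list of register names that the instruction reads.
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    hazards = set()
    if e_dest is not None and e_dest in l_srcs:
        hazards.add("RAW")   # later reads what earlier writes
    if l_dest is not None and l_dest in e_srcs:
        hazards.add("WAR")   # later writes what earlier still has to read
    if e_dest is not None and e_dest == l_dest:
        hazards.add("WAW")   # both write the same register
    return hazards


i1 = ("$1", ["$2", "$1"])    # add $1, $2, $1
i2 = (None, ["$3", "$1"])    # sw  $3, 400($1)  (reads $3 and $1, writes no register)
i3 = ("$5", ["$1", "$2"])    # add $5, $1, $2

hazards_1_2 = classify_hazards(i1, i2)
hazards_1_3 = classify_hazards(i1, i3)
hazards_2_3 = classify_hazards(i2, i3)
```

Both later instructions read $1 after the first add writes it, so each has a RAW hazard on the first instruction; the sw and the second add do not conflict with each other.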
Interrupts
Caused by external events:
Network, Keyboard, Disk I/O, Timer
Page fault - virtual memory
System call - user request for OS action
Asynchronous to program execution
May be handled between instructions
Simply suspend and resume user program
Traps
Caused by internal events
Exceptional conditions (overflow)
Undefined Instruction
Hardware malfunction
Usually synchronous to program execution
Condition must be remedied by the handler
Instruction may be retried or simulated and the program continued, or the program may be aborted
UNIT 4
Instruction Level Parallelism
1. Explain instruction-level parallelism.
An architectural technique that allows the overlap of individual machine operations (add, mul,
load, store).
add R3 R3, 1
add R3 R3, 1
add R4 R3, R2
add R4 R4, R2
store [R4] R0
ILP
Overlap individual machine operations (add, mul, load) so that they execute in parallel
Transparent to the user
Goal: speed up execution
ILP Challenges
In order to achieve parallelism we should not have dependences among instructions which are
executing in parallel:
H/W terminology: data hazards (RAW, WAR, WAW)
S/W terminology: data dependencies
Types of Dependencies
Name dependencies
Output dependence
Anti-dependence
True data dependence
Control Dependence
Resource Dependence
Having separate processors getting separate chunks of the program (processors programmed to
do so)
Nontransparent to the user
Goal: speed up and quality up
where M = Memory
MIMD (Multiple-Instruction streams, Multiple-Data streams)
Each processor has a separate program.
An instruction stream is generated from each program.
Each instruction operates on different data.
This last machine type forms the group of traditional multiprocessors: several processing
units operate on multiple data streams.
MISD (Multiple-Instruction streams, Single-Data stream)
Each processor executes a different sequence of instructions.
In MISD computers, multiple processing units operate on one single data stream.
In practice, this kind of organization has never been used.
UNIT 5
Memory and I/O System
Synchronous DRAM (SDRAM): able to transfer a burst of data given a starting address
and a burst length, which suits transferring a block of data from main memory to cache.
Page Mode DRAM: all bits are on the same row (spatial locality).
Later variant: Burst EDO (CAS toggle used to get the next address).
A cache hit occurs if the cache contains the data that we're looking for. Hits are good, because the
cache can return the data much faster than main memory.
A cache miss occurs if the cache does not contain the requested data. This is bad, since the CPU
must then wait for the slower main memory.
Memory locations 0, 4, 8 and 12 all map to cache block 0.
Addresses 1, 5, 9 and 13 map to cache block 1, etc.
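This mapping is what a direct-mapped cache with four blocks produces when the block index is the address modulo the number of blocks. The four-block size is inferred from the addresses listed, and the function name is hypothetical.

```python
NUM_BLOCKS = 4  # assumed cache size, inferred from the mapping in the text


def cache_block(address, num_blocks=NUM_BLOCKS):
    """Return the direct-mapped cache block index for a memory address."""
    return address % num_blocks


# Group the first 16 memory addresses by the cache block they map to.
mapping = {}
for address in range(16):
    mapping.setdefault(cache_block(address), []).append(address)
```

Here `mapping[0]` is `[0, 4, 8, 12]` and `mapping[1]` is `[1, 5, 9, 13]`. When the number of blocks is a power of two, the modulo amounts to taking the low-order address bits as the index.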
Set-Associative Cache
5. Discuss the methods used to measure and improve the performance of the cache.
Performance is always a key issue for caches.
We consider improving cache performance by:
(1) reducing the miss rate, and
(2) reducing the miss penalty.
For (1) we can reduce the probability that different memory blocks will contend for the
same cache location.
For (2), we can add additional levels to the hierarchy, which is called multilevel caching.
We can determine the CPU time as
$$\text{CPU time} = (CC_{\text{CPU execution}} + CC_{\text{memory stalls}}) \times t_{CC}$$
The memory-stall clock cycles come from cache misses.
It can be defined as the sum of the stall cycles coming from writes + those coming from
reads:
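$$\text{Memory-stall } CC = \text{Read-stall cycles} + \text{Write-stall cycles}$$
where
$$\text{Read-stall cycles} = \frac{\text{Reads}}{\text{Program}} \times \text{Read miss rate} \times \text{Read miss penalty}$$
Write-stall cycles are obtained analogously, starting from the number of writes per program.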
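As a worked sketch of these relations, the functions below add read-stall and write-stall cycles to the execution cycles and multiply by the clock cycle time. The write-stall term is assumed to have the same form as the read-stall term, and every number in the usage example is hypothetical.

```python
def memory_stall_cycles(reads, read_miss_rate, read_miss_penalty,
                        writes, write_miss_rate, write_miss_penalty):
    """Memory-stall cycles = read-stall cycles + write-stall cycles."""
    read_stalls = reads * read_miss_rate * read_miss_penalty
    # Assumption: writes stall in the same way as reads.
    write_stalls = writes * write_miss_rate * write_miss_penalty
    return read_stalls + write_stalls


def cpu_time(execution_cycles, stall_cycles, cycle_time):
    """CPU time = (CPU execution cycles + memory-stall cycles) * clock cycle time."""
    return (execution_cycles + stall_cycles) * cycle_time


# Hypothetical program: 200,000 reads, 100,000 writes, 100-cycle miss penalty.
stalls = memory_stall_cycles(reads=200_000, read_miss_rate=0.04, read_miss_penalty=100,
                             writes=100_000, write_miss_rate=0.02, write_miss_penalty=100)
time_seconds = cpu_time(execution_cycles=2_000_000, stall_cycles=stalls, cycle_time=1e-9)
```

With these made-up figures the stalls come to about one million cycles, half as many as the execution cycles themselves, which is why both the miss rate and the miss penalty are worth reducing.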
6. Explain the virtual memory address translation and TLB with necessary diagram.
Virtual memory address translation
In a virtual memory system, the program memory is divided into fixed-size pages and allocated in fixed-size
physical memory frames. The pages do not have to be contiguous in memory. A page table keeps
track of where each page is located in physical memory. This allows the operating system to load a
program of any size into any available frames. Only the currently used pages need to be loaded. Unused
pages can remain on disk until they are referenced. This allows many large programs to be executed on a
relatively small memory system. A resident flag in the page table indicates whether or not the page is in
memory. The page table also includes several other flags to keep track of memory usage. A use flag is set
whenever the page is referenced. A dirty bit is set whenever the page is changed to inform the operating
system that the page in memory is different than the page on disk.
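A minimal sketch of page-table translation as described above. The 4 KiB page size, the dictionary-based page table and the field names (`frame`, `resident`, `use`, `dirty`) are assumptions made for illustration; a real page table has a hardware-defined layout and is managed by the operating system.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when the referenced page is not resident in physical memory."""


def translate(page_table, virtual_address, write=False):
    """Translate a virtual address to a physical address and update the flags."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    entry = page_table.get(page_number)
    if entry is None or not entry["resident"]:
        raise PageFault(page_number)    # the OS would load the page from disk
    entry["use"] = True                 # page has been referenced
    if write:
        entry["dirty"] = True           # memory copy now differs from disk copy
    return entry["frame"] * PAGE_SIZE + offset


page_table = {
    0: {"frame": 5, "resident": True, "use": False, "dirty": False},
    1: {"frame": 2, "resident": True, "use": False, "dirty": False},
    2: {"frame": None, "resident": False, "use": False, "dirty": False},  # on disk
}

physical = translate(page_table, 1 * PAGE_SIZE + 12, write=True)
```

Virtual page 1 sits in frame 2, so the access lands at offset 12 of that frame and the entry's use and dirty flags are set. Pages 0 and 1 occupy frames 5 and 2, which shows that consecutive pages need not be contiguous in physical memory.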
TLB
Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped
TLB access time is typically smaller than cache access time (because TLBs are much smaller than
caches)
TLBs typically have no more than 128 to 256 entries, even on high-end machines
If the page is loaded into main memory, then the TLB miss can be handled (in hardware or
software) by loading the translation information from the page table into the TLB
Takes tens of cycles to find and load the translation info into the TLB
If the page is not in main memory, then it's a true page fault
Takes millions of cycles to service a page fault
TLB misses are much more frequent than true page faults
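The three outcomes above (TLB hit, TLB miss with the page in memory, true page fault) can be sketched with a tiny fully associative TLB. The class name, the two-entry capacity, the least-recently-used replacement policy and the dictionary page table (page number to frame number, `None` for a page that is not in main memory) are all assumptions made for illustration.

```python
from collections import OrderedDict


class TLB:
    """Small fully associative TLB with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()    # page number -> frame number

    def lookup(self, page_number, page_table):
        """Return (frame, outcome); outcome is 'tlb_hit', 'tlb_miss' or 'page_fault'."""
        if page_number in self.entries:
            self.entries.move_to_end(page_number)     # mark as most recently used
            return self.entries[page_number], "tlb_hit"
        frame = page_table.get(page_number)
        if frame is None:
            return None, "page_fault"                 # page is not in main memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)          # evict least recently used entry
        self.entries[page_number] = frame             # load translation into the TLB
        return frame, "tlb_miss"


page_table = {0: 7, 1: 3, 2: None, 3: 9}   # page 2 is still on disk
tlb = TLB(capacity=2)
outcomes = [tlb.lookup(page, page_table)[1] for page in (0, 1, 0, 3, 1, 2)]
```

In this reference string only the access to page 2 is a true page fault. The other misses are resolved from the page table alone, which is the cheap case described above.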
7. Draw the typical block diagram of a DMA controller and explain how it is
used for direct data transfer between memory and peripherals.
Programmed I/O concentrates on data transfer between the processor and I/O devices. Data are transferred by
executing instructions such as
Move DATAIN, R0
An instruction to transfer input or output data is executed only after the processor determines that
the I/O device is ready. To do this the processor either polls a status flag in the device interface or waits
for the device to send an interrupt request. In either case considerable overhead is incurred, because
several program instructions must be executed for each data word transferred. In addition to polling the
status register of the device, instructions are needed for incrementing the memory address and keeping
track of the word count. When interrupts are used there is the additional overhead associated with saving
and restoring the program counter and other state information.
DMA Controller
Although a DMA controller can transfer data without intervention by the processor, its operation
must be under the control of a program executed by the processor. To initiate the transfer of a block of
words, the processor sends the starting address, the number of words in the block, and the direction of
the transfer. On receiving this information, the DMA controller proceeds to perform the requested
operation. When the entire block has been transferred the controller informs the processor by raising an
interrupt signal.
While a DMA transfer is taking place the program that requested the transfer cannot continue, and
the processor can be used to execute another program. After the DMA transfer is completed the processor
can return to the program that requested the transfer.
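The sequence described here can be sketched as a small simulation. The class, its register names (`start_address`, `word_count`, `direction`) and the callback standing in for the interrupt signal are hypothetical; they mirror the three items the processor sends and do not model any particular controller.

```python
class DMAController:
    """Toy model of a DMA controller; all names here are illustrative."""

    def __init__(self, memory, on_interrupt):
        self.memory = memory                # main memory, modeled as a list of words
        self.on_interrupt = on_interrupt    # stands in for the interrupt line
        self.start_address = 0
        self.word_count = 0
        self.direction = "to_memory"

    def program(self, start_address, word_count, direction):
        """Processor side: send starting address, block length and direction."""
        if direction not in ("to_memory", "from_memory"):
            raise ValueError("direction must be 'to_memory' or 'from_memory'")
        self.start_address = start_address
        self.word_count = word_count
        self.direction = direction

    def run(self, device_words):
        """Move the whole block without processor involvement, then interrupt."""
        for i in range(self.word_count):
            address = self.start_address + i
            if self.direction == "to_memory":
                self.memory[address] = device_words[i]       # device -> memory
            else:
                device_words.append(self.memory[address])    # memory -> device
        self.on_interrupt()                 # inform the processor of completion


memory = [0] * 16
events = []
dma = DMAController(memory, on_interrupt=lambda: events.append("transfer complete"))
dma.program(start_address=4, word_count=3, direction="to_memory")
dma.run(device_words=[10, 20, 30])
```

After the run, memory locations 4 to 6 hold the three device words and one completion event has been recorded. The processor executed no per-word instructions, in contrast to the polling loop of programmed I/O.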