
CS8491-COMPUTER ARCHITECTURE

QUESTION BANK (2 Marks & 13 Marks)


CS8491 COMPUTER ARCHITECTURE L T P C

3 0 0 3
OBJECTIVES:

• To learn the basic structure and operations of a computer.


• To learn the arithmetic and logic unit and implementation of fixed-point and floating point
arithmetic unit.
• To learn the basics of pipelined execution.
• To understand parallelism and multi-core processors.
• To understand the memory hierarchies, cache memories and virtual memories.
• To learn the different ways of communication with I/O devices.

UNIT I BASIC STRUCTURE OF A COMPUTER SYSTEM 9


Functional Units – Basic Operational Concepts – Performance – Instructions: Language of the Computer
– Operations, Operands – Instruction representation – Logical operations – decision making – MIPS
Addressing.

UNIT II ARITHMETIC FOR COMPUTERS 9


Addition and Subtraction – Multiplication – Division – Floating Point Representation – Floating Point
Operations – Subword Parallelism

UNIT III PROCESSOR AND CONTROL UNIT 9


A Basic MIPS implementation – Building a Datapath – Control Implementation Scheme – Pipelining –
Pipelined datapath and control – Handling Data Hazards & Control Hazards – Exceptions.

UNIT IV PARALLELISM 9

Parallel processing challenges – Flynn's classification – SISD, MIMD, SIMD, SPMD, and Vector
Architectures - Hardware multithreading – Multi-core processors and other Shared Memory
Multiprocessors - Introduction to Graphics Processing Units, Clusters, Warehouse Scale Computers and
other Message-Passing Multiprocessors.

UNIT V MEMORY & I/O SYSTEMS 9

Memory Hierarchy - memory technologies – cache memory – measuring and improving cache
performance – virtual memory, TLBs – Accessing I/O Devices – Interrupts – Direct Memory Access –
Bus structure – Bus operation – Arbitration – Interface circuits - USB.

TOTAL : 45 PERIODS

OUTCOMES:
On Completion of the course, the students should be able to:
 Understand the basic structure of computers, their operations and instructions.
 Design arithmetic and logic unit.
 Understand pipelined execution and design control unit.
 Understand parallel processing architectures.
 Understand the various memory systems and I/O communication.

TEXT BOOKS:
1. David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software
Interface, Fifth Edition, Morgan Kaufmann / Elsevier, 2014.

2. Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig Manjikian, Computer Organization and Embedded
Systems, Sixth Edition, Tata McGraw Hill, 2012.

PART-A-TWO MARK QUESTIONS AND ANSWERS


UNIT-I BASIC STRUCTURE OF A COMPUTER SYSTEM
1. Define Computer Architecture.
 It is concerned with the structure and behavior of the computer.
 It includes the information formats, the instruction set and techniques for addressing memory.
2. Define Computer Organization.
 It describes the function and design of the various units of digital computer that store and process
information.
 It refers to the operational units and their interconnections that realize the architectural
specifications.
3. What are the components of a computer?
 Input unit
 Memory unit
 Arithmetic and Logic Unit
 Output unit
 Control unit

4. Draw the block diagram of computer.


5. What is Execution time/Response time?
Response time, also called execution time, is the total time required for the computer to complete a
task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU
execution time, and so on.

6. What is CPU execution time, user CPU time and system CPU time?
CPU time: The actual time the CPU spends computing for a specific task.
User CPU time: The CPU time spent in the program itself.
System CPU time: The CPU time spent in the operating system performing tasks on behalf of
the program.

7.What is clock cycle and clock period?


Clock cycle: The time for one clock period, usually of the processor clock, which runs at a
constant rate.
Clock period: The length of each clock cycle.
6. Define CPI
Clock cycles per instruction, the average number of clock cycles each instruction takes to
execute, is often abbreviated as CPI.
7. State and explain the performance equation?
Let N denote the number of machine instructions, and suppose that the average number of basic
steps needed to execute one machine instruction is S, where each basic step is completed in one clock
cycle. If the clock rate is R cycles per second, the processor time is given by
T = (N x S) / R. This is often referred to as the basic performance equation.
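As a quick illustration of the basic performance equation, the figures below are hypothetical (not from the syllabus):

```python
# Basic performance equation: T = (N x S) / R
N = 2_000_000      # number of machine instructions executed (hypothetical)
S = 4              # average basic steps (clock cycles) per instruction
R = 500_000_000    # clock rate in cycles per second (500 MHz)

T = (N * S) / R    # processor time in seconds
print(T)           # 0.016
```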
8. Define MIPS .
MIPS (Million Instructions Per Second) is one alternative to time as the metric:
MIPS = Instruction count / (Execution time x 10^6). This MIPS measurement is also called native MIPS
to distinguish it from some alternative definitions of MIPS.
9. Define MIPS Rate:
The rate at which instructions are executed at a given time.
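The native MIPS rating follows directly from its definition; the numbers below are illustrative only:

```python
# Native MIPS = instruction count / (execution time x 10^6)
instruction_count = 80_000_000   # hypothetical program
execution_time = 2.0             # seconds

mips = instruction_count / (execution_time * 1_000_000)
print(mips)                      # 40.0
```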
10. Define Throughput and Throughput rate.
 Throughput - The total amount of work done in a given time.
 Throughput rate - The rate at which work is completed per unit time.
11. What are the various types of operations required for instructions?
• Data transfers between the main memory and the CPU registers
• Arithmetic and logic operation on data
• Program sequencing and control
• I/O transfers
12. What is a Program?
A program is a set of instructions that specify the operations, operands and the sequence by
which processing has to occur.
13. What is a Computer Instruction?
A Computer instruction is a binary code that specifies a sequence of micro operations for
the computer.
14. What is an Instruction Code?
An instruction code is a group of bits that instruct the computer to perform a specific
operation.
15. What is an Operation Code (Opcode)?
The operation code of an instruction is a group of bits that define operations such as add,
subtract, multiply, shift and complement.
16. Define Instruction Format.
Instructions are represented as numbers. Therefore, entire programs can be stored in
memory to be read or written just like numbers (data). This simplifies the software/hardware of computer
systems. Each instruction is encoded in binary, called machine code.
17. What are the Most Common Fields Of An Instruction Format?
 An operation code field that specifies the operation to be performed.
 An address field that designates a memory address or register.
 A mode field that specifies the way the operand or the effective address is
determined.
18. Explain the three types of address instructions.
• Three-address instruction - it can be represented as
ADD A, B, C
Operands A and B are called source operands and C is called the destination operand.
• Two-address instruction - it can be represented as
ADD A, B
• One-address instruction - it can be represented as
LOAD A
ADD B
STORE C
19. What is straight-line sequencing?
The CPU control circuitry automatically proceeds to fetch and execute instructions, one at a time, in
the order of increasing addresses. This is called straight-line sequencing.
20. Write down the MIPS Assembly language notation for arithmetic operations.
Category: Arithmetic
Instruction      Example            Meaning           Comments
add              add $s1,$s2,$s3    $s1 = $s2 + $s3   Three register operands
subtract         sub $s1,$s2,$s3    $s1 = $s2 - $s3   Three register operands
add immediate    addi $s1,$s2,20    $s1 = $s2 + 20    Used to add constants

21. Write down the MIPS Assembly language notation for data transfer operations.
Category: Data transfer
Instruction             Example          Meaning                         Comments
load word               lw $s1,20($s2)   $s1 = Memory[$s2 + 20]          Word from memory to register
store word              sw $s1,20($s2)   Memory[$s2 + 20] = $s1          Word from register to memory
load half               lh $s1,20($s2)   $s1 = Memory[$s2 + 20]          Halfword memory to register
load half unsigned      lhu $s1,20($s2)  $s1 = Memory[$s2 + 20]          Halfword memory to register
store half              sh $s1,20($s2)   Memory[$s2 + 20] = $s1          Halfword register to memory
load byte               lb $s1,20($s2)   $s1 = Memory[$s2 + 20]          Byte from memory to register
load byte unsigned      lbu $s1,20($s2)  $s1 = Memory[$s2 + 20]          Byte from memory to register
store byte              sb $s1,20($s2)   Memory[$s2 + 20] = $s1          Byte from register to memory
load linked word        ll $s1,20($s2)   $s1 = Memory[$s2 + 20]          Load word as 1st half of atomic swap
store conditional word  sc $s1,20($s2)   Memory[$s2+20]=$s1; $s1=0 or 1  Store word as 2nd half of atomic swap
load upper immediate    lui $s1,20       $s1 = 20 * 2^16                 Loads constant in upper 16 bits

22. Write down the MIPS Assembly language notation for Logical operations.
Category: Logical
Instruction          Example            Meaning              Comments
and                  and $s1,$s2,$s3    $s1 = $s2 & $s3      Three reg. operands; bit-by-bit AND
or                   or $s1,$s2,$s3     $s1 = $s2 | $s3      Three reg. operands; bit-by-bit OR
nor                  nor $s1,$s2,$s3    $s1 = ~($s2 | $s3)   Three reg. operands; bit-by-bit NOR
and immediate        andi $s1,$s2,20    $s1 = $s2 & 20       Bit-by-bit AND reg with constant
or immediate         ori $s1,$s2,20     $s1 = $s2 | 20       Bit-by-bit OR reg with constant
shift left logical   sll $s1,$s2,10     $s1 = $s2 << 10      Shift left by constant
shift right logical  srl $s1,$s2,10     $s1 = $s2 >> 10      Shift right by constant

23. Write down the MIPS Assembly language notation for conditional branch operations.
Category: Conditional branch
Instruction                       Example           Meaning                               Comments
branch on equal                   beq $s1,$s2,25    if ($s1 == $s2) go to PC + 4 + 100    Equal test; PC-relative branch
branch on not equal               bne $s1,$s2,25    if ($s1 != $s2) go to PC + 4 + 100    Not equal test; PC-relative
set on less than                  slt $s1,$s2,$s3   if ($s2 < $s3) $s1 = 1; else $s1 = 0  Compare less than; for beq, bne
set on less than unsigned         sltu $s1,$s2,$s3  if ($s2 < $s3) $s1 = 1; else $s1 = 0  Compare less than unsigned
set less than immediate           slti $s1,$s2,20   if ($s2 < 20) $s1 = 1; else $s1 = 0   Compare less than constant
set less than immediate unsigned  sltiu $s1,$s2,20  if ($s2 < 20) $s1 = 1; else $s1 = 0   Compare less than constant unsigned

24. Write down the MIPS Assembly language notation for unconditional branch operations.
Category: Unconditional jump
Instruction    Example   Meaning                    Comments
jump           j 2500    go to 10000                Jump to target address
jump register  jr $ra    go to $ra                  For switch, procedure return
jump and link  jal 2500  $ra = PC + 4; go to 10000  For procedure call

25. What are Addressing Modes?

The different ways in which the location of an operand is specified in an instruction are called
addressing modes.

26. What are the different types of addressing Modes?


 Immediate addressing mode
 Register addressing mode
 Base or displacement addressing mode
 PC-relative addressing mode
 Pseudo-direct addressing mode

27.Define Register mode and Absolute Mode with examples.


Register mode
The operand is the contents of the processor register.
The name (address) of the register is given in the instruction.
Absolute Mode(Direct Mode):
 The operand is in a memory location.
 The address of this location is given explicitly in the instruction.
Eg: MOVE LOC,R2

The above instruction uses the register and absolute mode.


The processor register is the temporary storage where the data in the register are accessed using register
mode.
The absolute mode can represent global variables in the program.

Mode Assembler Syntax Addressing Function

Register mode Ri EA=Ri


Absolute mode LOC EA=LOC

Where EA-Effective Address


28. What is an Immediate addressing Mode?

The operand is given explicitly in the instruction.


Eg: Move 200immediate, R0

It places the value 200 in the register R0. The immediate mode is used to specify the value of a source
operand.

In assembly language, writing "immediate" as a subscript is not practical, so the # symbol is used
instead. The instruction can be rewritten as
Move #200, R0
Assembly Syntax: Addressing Function
Immediate #value Operand =value

UNIT-2 ARITHMETIC FOR COMPUTERS

1. State the principle of operation of a carry look-ahead adder.


The input carry needed by a stage is computed directly from carry signals obtained from all the
preceding stages i-1, i-2, ..., 0, rather than waiting for normal carries to ripple slowly from stage to
stage. An adder that uses this principle is called a carry look-ahead adder.
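A minimal Python sketch of the look-ahead principle, computing each carry directly from the generate (Gi = xi yi) and propagate (Pi = xi + yi) signals of all preceding stages; the 4-bit operands below are arbitrary examples:

```python
# Carry look-ahead sketch: each carry c[i+1] is computed directly from the
# generate (g) and propagate (p) signals of all lower stages, not by rippling.
# g[i] = x[i] AND y[i], p[i] = x[i] OR y[i]; bit lists are LSB first.
def cla_carries(x_bits, y_bits, c0=0):
    g = [a & b for a, b in zip(x_bits, y_bits)]
    p = [a | b for a, b in zip(x_bits, y_bits)]
    carries = [c0]
    for i in range(len(x_bits)):
        # expanded form: c[i+1] = g[i] + p[i]g[i-1] + ... + p[i]...p[0]c0
        c = g[i]
        prod = p[i]
        for j in range(i - 1, -1, -1):
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & c0
        carries.append(c)
    return carries

# 4-bit example, LSB first: x = 1011 (11), y = 0110 (6); 11 + 6 = 17
print(cla_carries([1, 1, 0, 1], [0, 1, 1, 0]))   # [0, 0, 1, 1, 1]
```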
2. What are the main features of Booth’s algorithm?
1) It handles both positive and negative multipliers uniformly.
2) It achieves some efficiency in the number of additions required when the multiplier has a few
large blocks of 1s.
3. How can we speed up the multiplication process?(CSE Nov/Dec 2003)
There are two techniques to speed up the multiplication process:
1) The first technique guarantees that the maximum number of summands that must be added is n/2
for n-bit operands.
2) The second technique reduces the time needed to add the summands.

4. What is bit pair recoding? Give an example.


Bit pair recoding halves the maximum number of summands. Group the Booth-recoded
multiplier bits in pairs and observe the following: the pair (+1 -1) is equivalent to the pair (0 +1).
That is, instead of adding -1 times the multiplicand M at shift position i and +1 x M at position i+1,
the same result is obtained by adding +1 x M at position i.
Eg: 11010 - Bit pair recoding value is 0 -1 -2
5. What is the advantage of using Booth algorithm?
1) It handles both positive and negative multipliers uniformly.
2) It achieves efficiency in the number of additions required when the multiplier has
a few large blocks of 1s.
3) The speed gained by skipping 1s depends on the data.
6. Write the algorithm for restoring division.
Do the following for n times:
1) Shift A and Q left one binary position.
2) Subtract M from A and place the answer back in A.
3) If the sign of A is 1, set q0 to 0 and add M back to A.
Where A- Accumulator, M- Divisor, Q- Dividend.
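The steps above can be sketched in Python for unsigned n-bit operands (the function name and test values are illustrative only):

```python
# Restoring division sketch: A = remainder accumulator, M = divisor,
# Q = dividend, following the three steps of the algorithm above.
def restoring_divide(dividend, divisor, n):
    A, Q, M = 0, dividend, divisor
    for _ in range(n):
        # 1) shift A and Q left one binary position
        A = (A << 1) | ((Q >> (n - 1)) & 1)
        Q = (Q << 1) & ((1 << n) - 1)
        # 2) subtract M from A
        A = A - M
        # 3) if the sign of A is 1, restore A and set q0 = 0; else q0 = 1
        if A < 0:
            A = A + M
        else:
            Q = Q | 1
    return Q, A   # quotient, remainder

print(restoring_divide(8, 3, 4))   # (2, 2): 8 / 3 = 2 remainder 2
```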
7. Write the algorithm for non restoring division.
Step 1: Do the following n times:
1) If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift
A and Q left and add M to A.
2) Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.
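A Python sketch of the non-restoring algorithm under the same conventions (A is the accumulator, M the divisor, Q the dividend; names and test values are illustrative):

```python
# Non-restoring division sketch: add or subtract M depending on the sign
# of A, with a final correcting add if A ends up negative.
def nonrestoring_divide(dividend, divisor, n):
    A, Q, M = 0, dividend, divisor
    for _ in range(n):
        msb_q = (Q >> (n - 1)) & 1
        Q = (Q << 1) & ((1 << n) - 1)
        if A >= 0:                        # sign of A is 0
            A = ((A << 1) | msb_q) - M    # shift left, subtract M
        else:
            A = ((A << 1) | msb_q) + M    # shift left, add M
        if A >= 0:
            Q = Q | 1                     # set q0 to 1
    if A < 0:                             # Step 2: final correcting add
        A = A + M
    return Q, A

print(nonrestoring_divide(8, 3, 4))   # (2, 2)
```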
8. When can you say that a number is normalized?
When the decimal point is placed to the right of the first (nonzero) significant digit, the number is
said to be normalized.
9. Explain about the special values in floating point numbers.
The end values 0 and 255 of the excess-127 exponent E' are used to represent special values such
as:
When E' = 0 and the mantissa fraction M is zero, the value exact 0 is represented.
When E' = 255 and M = 0, the value infinity is represented.
When E' = 0 and M != 0, denormal values are represented.
When E' = 255 and M != 0, the value represented is called Not a Number (NaN).
10. Write the Add/subtract rule for floating point numbers.
1) Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to
the difference in exponents.
2) Set the exponent of the result equal to the larger exponent.
3) Perform addition/subtraction on the mantissa and determine the sign of the result
4) Normalize the resulting value, if necessary.
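The four steps above can be sketched on toy (mantissa, exponent) pairs with integer mantissas; this is a simplification that ignores signs and rounding, and the 8-bit mantissa width is an arbitrary assumption:

```python
# Floating point add sketch: align to the larger exponent, add, normalize.
def fp_add(m1, e1, m2, e2):
    # 1) shift the mantissa of the smaller-exponent operand right
    if e1 < e2:
        m1 >>= (e2 - e1)
        e1 = e2
    elif e2 < e1:
        m2 >>= (e1 - e2)
        e2 = e1
    # 2) result exponent is the larger exponent; 3) add the mantissas
    m, e = m1 + m2, e1
    # 4) normalize (here: keep the mantissa within 8 bits)
    while m >= (1 << 8):
        m >>= 1
        e += 1
    return m, e

print(fp_add(0b11000000, 0, 0b10000000, 2))   # (176, 2)
```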
11. Write the multiply rule for floating point numbers.
1) Add the exponent and subtract 127.
2) Multiply the mantissa and determine the sign of the result .
3) Normalize the resulting value , if necessary.
12. What is the purpose of guard bits used in floating point arithmetic?
Although the mantissas of the initial operands are limited to 24 bits, it is important to retain extra
bits, called guard bits, during intermediate calculations.
13. What are the ways to truncate the guard bits?
There are several ways to truncate the guard bits:
1) Chopping
2) Von Neumann rounding
3) Rounding
14. Define the carry save addition (CSA) process.
Instead of letting the carries ripple along the rows, they can be saved and introduced into the next
row at the correct weighted position. The delay in CSA is less than the delay through a ripple carry adder.
15. What are the generate and propagate functions?
The generate function is given by
Gi = xi yi and
the propagate function is given by
Pi = xi + yi.
16. What are floating point numbers?
In some cases, the binary point is variable and is automatically adjusted as computation proceeds.
In such case, the binary point is said to float and the numbers are called floating point numbers.
17. In floating point numbers, when do you say that an underflow or overflow has occurred?
In single precision numbers, when an exponent is less than -126 we say that an underflow has
occurred. When an exponent is greater than +127 we say that an
overflow has occurred.
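A hedged sketch of this range check, using Python's math.frexp to obtain the base-2 exponent a value would need (the helper name is made up for illustration):

```python
# Classify a value's required base-2 exponent against the single-precision
# normalized range, -126 to +127.
import math

def sp_range_check(x):
    e = math.frexp(abs(x))[1] - 1   # exponent with mantissa in [1, 2)
    if e < -126:
        return "underflow"
    if e > 127:
        return "overflow"
    return "ok"

print(sp_range_check(1.0e-45), sp_range_check(1.0e39), sp_range_check(1.0))
# underflow overflow ok
```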
18. What are the difficulties faced when we use floating point arithmetic?
Mantissa overflow: The addition of two mantissas of the same sign may result in a carry out of the
most significant bit.
Mantissa underflow: In the process of aligning mantissas, digits may flow off the right end of the
mantissa.
Exponent overflow: Exponent overflow occurs when a positive exponent exceeds the maximum
possible value.
Exponent underflow: It occurs when a negative exponent exceeds the maximum possible
exponent value.
19.In conforming to the IEEE standard mention any four situations under which a processor sets
exception flag.
Underflow: If the number requires an exponent less than -126 (single precision) or less than -1022
(double precision) to represent its normalized form, an underflow occurs.
Overflow: If the number requires an exponent greater than +127 (single precision) or greater than
+1023 (double precision) to represent its normalized form, an overflow occurs.
Divide by zero: It occurs when any number is divided by zero.
Invalid: It occurs if operations such as 0/0 are attempted.
20. Why floating point number is more difficult to represent and process than integer?(CSE
May/June 2007)
An integer value requires only half the memory space of an equivalent IEEE double-precision
floating-point value. Applications that use only integer-based arithmetic therefore also have
significantly smaller memory requirements.
A floating-point operation usually runs hundreds of times slower than an equivalent integer-based
arithmetic operation.
21.Give the booth’s recoding and bit-pair recoding of the computer.
1000111101000101(CSE May/June 2006)
Booth's recoding
1 0 0 0 1 1 1 1 0 1 0 0 0 1 0 1 0

-1 0 0 +1 0 0 0 -1 +1 -1 0 0 +1 -1 +1 -1

Bit-Pair recoding:

1 0 0 0 1 1 1 1 0 1 0 0 0 1 0 1 0

-2 +1 0 -1 +1 0 +1 +1
22.Draw the full adder circuit and give the truth table (CSE May/June 2007)
Inputs Outputs
A B C Carry Sum
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
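The truth table above can be reproduced with the standard full-adder equations (sum = a XOR b XOR c, carry = majority of the three inputs):

```python
# Full adder: sum = a ^ b ^ c, carry = majority(a, b, c)
def full_adder(a, b, c):
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return carry, s

# print all eight rows of the truth table: a b c carry sum
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, *full_adder(a, b, c))
```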

23.What is a guard bit and what are the ways to truncate the guard bits?
The first of the two extra bits kept on the right during intermediate calculations of floating point
numbers is called the guard bit.
There are several ways to truncate the guard bits
 Chopping
 Von-Neumann rounding
 Rounding
24. Write the Add/subtract rule for floating point numbers.
 Choose the number with the smaller exponent and shift its mantissa right a number of steps
equal to the difference in exponents.
 Set the exponent of the result equal to the larger exponent.
 Perform addition/subtraction on the mantissa and determine the sign of the result
 Normalize the resulting value, if necessary.

25.Why floating point number is more difficult to represent and process than integer?
An integer value requires only half the memory space of an equivalent IEEE double-precision
floating-point value. Applications that use only integer-based arithmetic therefore also have
significantly smaller memory requirements.
A floating-point operation usually runs hundreds of times slower than an equivalent integer-based
arithmetic operation.

26. Define sub-word parallelism.


A sub-word is a lower precision unit of data contained within a word. In sub-word parallelism,
multiple sub-words are packed into a word and the processor then operates on whole words. With the
appropriate sub-word boundaries this technique results in parallel processing of sub-words. Since the
same instruction is applied to all sub-words within the word, this is a form of SIMD (Single
Instruction Multiple Data) processing.
27.Define ALU.
The arithmetic and logic unit is the brain of the computer, the device that performs arithmetic
operations like addition, subtraction and multiplication, or logical operations like OR, AND, NOT
and EX-OR.
28.What is arithmetic overflow?
In a digital computer, the condition that occurs when a calculated value is greater than what a given
register or storage location can store or represent is called arithmetic overflow.
29. What are floating point numbers? When do you say that an underflow or overflow has
occurred?
In some cases, the binary point is variable and is automatically adjusted as computation proceeds.
In such cases, the binary point is said to float and the numbers are called floating point numbers.
In single precision numbers, when an exponent is less than -126 we say that an underflow has
occurred. When an exponent is greater than +127 we say that an overflow has occurred.
30. How does overflow occur in subtraction?
Overflow occurs because computer arithmetic is not closed with respect to subtraction. Overflow
cannot occur in subtraction if the operands have the same sign. To detect and compensate
for overflow, one needs n+1 bits if an n-bit number representation is employed.
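A Python sketch of this point: the true difference may need n+1 bits, and wrapping it back into n bits is what produces the overflowed result (the function name and the 8-bit width are illustrative):

```python
# n-bit two's-complement subtraction with overflow detection.
def sub_with_overflow(x, y, n=8):
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    r = x - y
    overflow = r < lo or r > hi        # true result would need n+1 bits
    # wrap the result back into n bits, as the hardware would
    r_wrapped = (r + (1 << n)) % (1 << n)
    if r_wrapped > hi:
        r_wrapped -= (1 << n)
    return r_wrapped, overflow

print(sub_with_overflow(100, -100))   # (-56, True): 200 exceeds 8-bit range
print(sub_with_overflow(50, 20))      # (30, False): same-sign operands
```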
UNIT-3 PROCESSOR AND CONTROL UNIT

1. Define pipelining. What are the steps required for a pipelined processor to process the
instruction? What are the advantages of Pipeline?
Pipelining is a technique in which multiple instructions are overlapped in execution.
It is a technique of decomposing a sequential process into sub operations with each sub process being
executed in a special dedicated segment that operates concurrently with all other segments.
 IF – Instruction Fetch
 ID – Instruction Decode
 EX – Execution or address calculation
 MEM – Data Memory Access
 WB – Write Back
Advantages
Main advantage of pipelining is that the cycle time of the processor is reduced and it can
increase the instruction throughput.
2. What are Hazards? State different types of hazards that can occur in pipeline.
A hazard, also called a hurdle, is a situation that prevents the next instruction in the instruction stream
from executing during its designated clock cycle. A hazard introduces a stall (an idle stage) into the pipeline.
The types of hazards that can occur in the pipelining were,
1. Data hazards.
2. Instruction hazards.
3. Structural hazards.
3. What is meant by branch prediction?
Branch prediction is a method of resolving a branch hazard. It assumes a given outcome for the
branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.
Dynamic branch prediction:
Dynamic branch prediction is a prediction of branches at runtime using runtime information
4. What is Branch prediction buffer?
 Branch prediction buffer also called branch history table.
 It is a small memory that is indexed by lower portion of the address of the branch instruction.
 It contains one or more bits indicating whether the branch was recently taken or not.
5. What is exception?
An exception is an unscheduled event that disrupts program execution; it is used to detect overflow.
Exceptions are created to handle unexpected events from within the processor.
There are two kinds of exceptions in the MIPS architecture:
 Execution of an undefined instruction
 An arithmetic overflow exception
6. Define Sign extend.
In load, store or branch operations, the immediate is added to a 32-bit operand (register or PC
contents) to create an operand or perhaps a new address, for instance in a branch instruction. Since
the immediate may be positive or negative, to add it to a 32-bit argument we must sign extend it to 32
bits. The sign extension unit performs this.
7. Define Data hazards
A data hazard is any condition in which either the source or the destination operands of
an instruction are not available at the time expected in pipeline. As a result some operation has
to be delayed, and the pipeline stalls.
8. Define Instruction hazards
The pipeline may be stalled because of a delay in the availability of an instruction. For
example, this may be the result of a miss in the cache, requiring the instruction to be fetched from
main memory. Such hazards are called instruction hazards or control hazards.
9. Define Structural hazards?
A structural hazard is the situation when two instructions require the use of a given
hardware resource at the same time. The most common case in which this hazard may arise is
access to memory.
10. What are the classification of data hazards?
Classification of data hazards: A pair of instructions can produce a data hazard by reading
or writing the same register or memory location. Assume that instruction i is executed before
instruction j. The hazards can be classified as,
1. RAW hazard
2. WAW hazard
3. WAR hazard
11. Define RAW hazard (read after write)
Instruction j tries to read a source operand before instruction i writes it.
12. Define WAW hazard (write after write)
Instruction j tries to write an operand before instruction i writes it.
13. Define WAR hazard (write after read)
Instruction j tries to write an operand before instruction i reads it.
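As a toy illustration of the RAW case (the register names and the tuple encoding are made up for this sketch):

```python
# Detect a RAW hazard between two adjacent instructions, each given as
# (destination register, [source registers]).
def raw_hazard(instr_i, instr_j):
    dest_i, _ = instr_i
    _, srcs_j = instr_j
    return dest_i in srcs_j   # j reads a register that i has not yet written

i = ("$s2", ["$s0", "$s1"])   # add $s2, $s0, $s1
j = ("$s3", ["$s2", "$s4"])   # sub $s3, $s2, $s4  -- reads $s2 written by i
print(raw_hazard(i, j))       # True
```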
14. How data hazard can be prevented in pipelining?
Data hazards in instruction pipelining can be prevented by the following techniques:
a) Operand Forwarding
b) Software Approach
15.How Compiler is used in Pipelining?
A compiler translates a high level language program into a sequence of machine instructions. To
reduce N, we need a suitable machine instruction set and a compiler that makes good use of it. An
optimizing compiler takes advantage of various features of the target processor to reduce the product
N*S, which is the total number of clock cycles needed to execute a program. The number of cycles
depends not only on the choice of instructions, but also on the order in which they appear in the program.
The compiler may rearrange program instructions to achieve better performance; of course, such changes
must not affect the result of the computation.

16. How addressing modes affect the instruction pipelining?


Degradation of performance in an instruction pipeline may be due to address dependency, where an
operand address cannot be calculated without information needed by the addressing mode. For example,
an instruction with register indirect mode cannot proceed to fetch the operand if the previous
instruction is loading the address into the register. Hence operand access is delayed, degrading the
performance of the pipeline.
17. What is locality of reference?
Many instructions in localized areas of the program are executed repeatedly during some time period,
while the remainder of the program is accessed relatively infrequently. This is referred to as locality of
reference.
18. What is the need for a reduced instruction set computer (RISC)?
• Relatively few instruction types and addressing modes.
• Fixed and easily decoded instruction formats.
• Fast single-cycle instruction execution.
• Hardwired rather than micro programmed control
19. Define memory access time.
The time that elapses between the initiation of an operation and the completion of that operation, for
example, the time between the READ and MFC signals. This is referred to as memory access time.
20. Define memory cycle time.
The minimum time delay required between the initiations of two successive memory operations, for
example, the time between two successive READ operations.
21.Define Static Memories.
Memories that consist of circuits capable of retaining the state as long as power is applied are known
as static memories.
22. List the various branching techniques used in a microprogrammed control unit.
a) Bit-ORing
b) Using Conditional Variables
c) Wide Branch Addressing
23. How the interrupt is handled during exception?
* The CPU identifies the source of the interrupt.
* The CPU obtains the memory address of the interrupt handler.
* The PC and other CPU status information are saved.
* The PC is loaded with the address of the interrupt handler, which then handles the interrupt.
24. List out the methods used to improve system performance.
The methods used to improve system performance are
1. Processor clock
2. Basic performance equation
3. Pipelining
4. Clock rate
5. Instruction set
6. Compiler
UNIT-4 PARALLELISM
1. What is meant by ILP?
Pipelining exploits the potential parallelism among instructions. This parallelism
is called instruction-level parallelism (ILP). There are two primary methods for
increasing the potential amount of instruction-level parallelism.
1. Increasing the depth of the pipeline to overlap more instructions.
2. Multiple issue.
2. What is Multithreading?
Multithreading allows multiple threads to share the functional units of a single processor in an
overlapping fashion. To permit this sharing, the processor must duplicate the independent state of
each thread.
The two main approaches to multithreading are
 Fine-grained multithreading
Fine grained multithreading is a version of hardware multithreading that implies switching between
threads after every instruction.
 Coarse-grained multithreading
It switches threads only on costly stalls. Thus it is much less likely to slow down the execution
an individual thread.
3. What is meant by speculation?
One of the most important methods for finding and exploiting more ILP is
speculation. It is an approach whereby the compiler or processor guesses the outcome of
an instruction to remove it as dependence in executing other instructions.
For example, we might speculate on the outcome of a branch, so that instructions
after the branch could be executed earlier.
4. Define Superscalar processor.
Superscalar is an advanced pipelining technique that enables the processor to execute
more than one instruction per clock cycle by selecting them during execution. Dynamic
multiple-issue processors are also known as superscalar processors, or simply
superscalars.
5. Differentiate UMA from NUMA.
Uniform memory access (UMA) is a multiprocessor in which latency to any word
in main memory is about the same no matter which processor requests the access.
Non uniform memory access (NUMA) is a type of single address space
multiprocessor in which some memory accesses are much faster than others depending on
which processor asks for which word.
6. What is strong and weak Scaling?
There are two basic ways to measure the parallel performance of a given application, depending on
whether it is CPU-bound or memory-bound. These are referred to as strong and weak scaling,
respectively.
Strong scaling: the problem size stays fixed but the number of processing elements is increased. This
is used as justification for programs that take a long time to run.
Weak scaling: the problem size assigned to each processing element stays constant and additional
elements are used to solve a larger total problem. This type of measurement is justification for
programs that take a lot of memory or other system resources.
7. Explain various types of Dependences in ILP.
 Data Dependences
 Name Dependences
 Control Dependences
8. What are multiprocessors? Mention the categories of multiprocessors?
Multiprocessor are used to increase performance and improve availability. The different categories are
SISD, SIMD, MIMD.
9. What are two main approaches to multithreading?
 Fine-grained multithreading
 Coarse-grained multithreading
9. a. What is the need to use multiprocessors?
1. Microprocessors are the fastest CPUs; collecting several is much easier than redesigning one.
2. Complexity of current microprocessors: do we have enough ideas to sustain 1.5x/yr?
Can we deliver such complexity on schedule?
3. Slow (but steady) improvement in parallel software (scientific apps, databases, OS).
4. Emergence of embedded and server markets driving microprocessors in addition to desktops;
embedded functional parallelism, producer/consumer model.
5. The server figure of merit is tasks per hour rather than latency.
10. Write the advantages of Multithreading.
If a thread gets a lot of cache misses, the other thread(s) can continue, taking advantage of the
unused computing resources, which can lead to faster overall execution; these resources would have
been idle if only a single thread were executed. If a thread cannot use all the computing resources of
the CPU (because its instructions depend on each other's results), running another thread keeps those
resources from sitting idle.
If several threads work on the same set of data, they can actually share their cache, leading to
better cache usage or synchronization on its values.
11. Write the disadvantages of Multithreading.
Multiple threads can interfere with each other when sharing hardware resources such as caches
or translation lookaside buffers (TLBs). Execution times of a single-thread are not improved but can be
degraded, even when only one thread is executing. This is due to slower frequencies and/or additional
pipeline stages that are necessary to accommodate thread-switching hardware. Hardware support for
Multithreading is more visible to software, thus requiring more changes to both application programs and
operating systems than Multi processing.
12. What is CMP?
Chip multiprocessors - also called multi-core microprocessors or CMPs for short - are now the
only way to build high-performance microprocessors, for a variety of reasons. Large uniprocessors are no
longer scaling in performance, because it is only possible to extract a limited amount of parallelism from
a typical instruction stream using conventional superscalar instruction issue techniques. In addition, one
cannot simply ratchet up the clock speed on today's processors, or the power dissipation will become
prohibitive in all but water-cooled systems.
13. What is SMT?
Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall
efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads
of execution to better utilize the resources provided by modern processor architectures.
14. Write the advantages of CMP?
CMPs have several advantages over single-processor solutions in energy and silicon area efficiency:
i. By incorporating smaller, less complex cores onto a single chip
ii. By dynamically switching between cores and powering down unused cores
iii. Increased throughput performance by exploiting parallelism
iv. Multiple computing resources can take better advantage of instruction-, thread-, and process-level
parallelism
15. What are the Disadvantages of SMT?
Simultaneous multithreading cannot improve performance if any of the shared resources are
limiting bottlenecks for the performance. In fact, some applications run slower when simultaneous
multithreading is enabled. Critics argue that it is a considerable burden to put on software developers that
they have to test whether simultaneous multithreading is good or bad for their application in various
situations and insert extra logic to turn it off if it decreases performance.
16. What are the types of Multithreading?
 Block multi-threading
 Interleaved multi-threading
17. What is Thread-level parallelism (TLP)?
 Explicit parallel programs already have TLP (inherent)
 Sequential programs that are hard to parallelize or ILP-limited can be speculatively parallelized in
hardware.
18. List the major MIMD Styles
Centralized shared memory ("Uniform Memory Access" time, or "Shared Memory Processor").
Decentralized memory (a memory module with each CPU): gives more memory bandwidth and lower
memory latency.
Drawback: longer communication latency.
Drawback: a more complex software model.
19. Distinguish between shared memory multiprocessor and message-passing multiprocessor.
 A shared memory multiprocessor has a single address space; that address space can be used to
communicate data implicitly via load and store operations.
 A message-passing multiprocessor has multiple address spaces; communication of data is done by
explicitly passing messages among processors.
20. Draw the basic structure of Basic Structure of a Symmetric Shared Memory Multiprocessor
21. What is multicore'?
At its simplest, multi-core is a design in which a single physical processor contains the core logic
of more than one processor. It's as if an Intel Xeon processor were opened up and inside were packaged
all the circuitry and logic for two (or more) Intel Xeon processors. The multi-core design takes several
such processor "cores" and packages them as a single physical processor. The goal of this design is to
enable a system to run more tasks simultaneously and thereby achieve greater overall system
performance.
22. Write the software implications of a multicore processor?
Multi-core systems will deliver benefits to all software, but especially multi-threaded programs. All
code that supports HT Technology or multiple processors, for example, will benefit automatically from
multi-core processors, without need for modification. Most server-side enterprise packages and many
desktop productivity tools fall into this category.
23. What is coarse grained multithreading?
It switches threads only on costly stalls. Thus it is much less likely to slow down the execution
of an individual thread.
UNIT-5 MEMORY AND I/O SYSTEMS
1. What are the multimedia applications which use caches?
Some Multimedia application areas where cache is extensively used are
*Multimedia Entertainment
*Education
*Office Systems
*Audio and video Mail
2. Explain virtual memory technique.
Techniques that automatically move program and data blocks into the physical memory
when they are required for execution are called virtual memory technique
3. What are virtual and logical addresses?
The binary addresses that the processor issues for either instruction or data are called
virtual or logical addresses.
4. Define translation buffer.
Most commercial virtual memory systems incorporate a mechanism that can avoid the bulk of the
main memory accesses called for by virtual-to-physical address translation. This is done
with a cache memory called a translation buffer.
5. What is branch delay slot?
The location containing an instruction that may be fetched and then discarded because of
the branch is called branch delay slot.
6. What is optical memory?
Optical memories use light-based techniques for data storage. Such memories usually employ optical
disks, which resemble magnetic disks in that they store binary information in concentric tracks on
an electromechanically rotated disk. The information is read or written optically, however,
with a laser replacing the read/write arm of a magnetic disk drive. Optical memories offer high
storage capacities, but their access rates are generally lower than those of magnetic disks.
7. What are static and dynamic memories?
Static memories are memories which require no periodic refreshing. Dynamic memories
are memories which require periodic refreshing.
8. What are the components of memory management unit?
A facility for dynamic storage relocation that maps logical memory references into
physical memory addresses.
A provision for sharing common programs stored in memory by different users .
9. What is the role of MAR and MDR?
The MAR (memory address register) is used to hold the address of the location to or from
which data are to be transferred and the MDR(memory data register) contains the data to be
written into or read out of the addressed location.
10. Distinguish Between Static RAM and Dynamic RAM?
Static RAM are fast, but they come at high cost because their cells require several
transistors. Less expensive RAM can be implemented if simpler cells are used. However such
cells do not retain their state indefinitely; Hence they are called Dynamic RAM.
11. Distinguish between asynchronous DRAM and synchronous DRAM.
In asynchronous DRAMs, a specialized memory controller circuit provides the necessary control
signals, RAS and CAS, that govern the timing, and the processor must take into account the delay in
the response of the memory. A DRAM whose operation is directly synchronized with a clock signal is
known as a synchronous DRAM.
12. What do you mean associative mapping technique?
The tag of an address received from the CPU is compared to the tag bits of each block of the cache
to see if the desired block is present. This is called associative mapping technique.
13. What is SCSI?
Small computer system interface can be used for all kinds of devices including RAID
storage subsystems and optical disks for large- volume storage applications.
14. What are the two types of latencies associated with storage?
The latency associated with storage is divided into 2 categories
1. Seek Latencies which can be classified into Overlapped seek,Mid transfer seek and Elevator seek
2. Rotational Latencies which can be reduced either by Zero latency read or Write and Interleave factor.
15. What do you mean by Disk Spanning?
Disk spanning is a method of attaching drives to a single host adapter. All drives appear as a
single contiguous logical unit. Data is written to the first drive first; when that drive is full, the
controller switches to the second drive, which is then written until it is full, and so on.
17. Define the term RELIABILITY
Reliability means features that help to avoid and detect faults. A reliable system does not silently
continue and deliver results that include incorrect and corrupted data; instead, it corrects the
corruption when possible or else stops.
18. Define the term AVAILABILITY.
Availability means features that allow the system to stay operational even when faults do occur. A
highly available system could disable the malfunctioning portion and continue operating at reduced
capacity.
19. How the interrupt is handled during exception?
* CPU identifies the source of the interrupt
* CPU obtains the memory address of the interrupt handler
* PC and other CPU status information are saved
* PC is loaded with the address of the interrupt handler, which then runs to handle the interrupt
20. What is IO mapped input output?
A memory reference instruction activates the READ M (or WRITE M) control line and does not
affect the I/O devices. Separate I/O instructions are required to activate the READ IO and WRITE IO
lines, which cause a word to be transferred between the addressed I/O port and the CPU. The memory
and I/O address spaces are kept separate.
21. Specify the types of DMA transfer techniques.
-- Single transfer mode (cycle-stealing mode)
-- Block transfer mode (burst mode)
-- Demand transfer mode
-- Cascade mode
22. What is an interrupt?
An interrupt is an event that causes the execution of one program to be suspended and another
program to be executed.
23.What are the uses of interrupts?
*Recovery from errors
*Debugging
*Communication between programs
*Use of interrupts in operating system
24. Define vectored interrupts.
In order to reduce the overhead involved in the polling process, a device requesting an interrupt
may identify itself directly to the CPU. Then, the CPU can immediately start executing the corresponding
interrupt-service routine. The term vectored interrupts refers to all interrupt-handling schemes based on this
approach.
25. Name any three of the standard I/O interface.
*SCSI (small computer system interface),bus standards
*Back plane bus standards
*IEEE 796 bus (multibus signals)
*NUBUS & IEEE 488 bus standard
26. What is an I/O channel?
An I/O channel is actually a special-purpose processor, also called a peripheral processor. The main
processor initiates a transfer by passing the required information to the input/output channel; the
channel then takes over and controls the actual transfer of data.
27. What is a bus?
A collection of wires that connects several devices is called a bus.
28. Define word length?
Each group of n bits is referred to as a word of information and n is called the word length.
29. Why program controlled I/O is unsuitable for high-speed data transfer?
In program-controlled I/O, considerable overhead is incurred because several program instructions
have to be executed for each data word transferred between the external devices and main memory.
Many high-speed peripheral devices have a synchronous mode of operation; that is, data transfers are
controlled by a clock of fixed frequency, independent of the CPU.
30. What is the function of i/o interface?
The function of an I/O interface is to coordinate the transfer of data between the CPU and external devices.
31. What is NUBUS?
A NUBUS is a processor-independent, synchronous bus standard intended for use in 32-bit
microprocessor systems. It defines a backplane into which up to 16 devices may be plugged, each in
the form of a circuit board of standard dimensions.
32. Name some of the IO devices.
*Video terminals
*Video displays
*Alphanumeric displays
*Graphics displays
* Flat panel displays
*Printers
*Plotters
33. What are the steps taken when an interrupt occurs?
* Identify the source of the interrupt
* Obtain the memory address of the required ISP (interrupt service procedure)
* Save the program counter and other CPU status information
* Transfer control back to the interrupted program when the ISP completes
34. Define interface.
The word interface refers to the boundary between two circuits or devices
35. What is programmed I/O?
Data transfer to and from peripherals may be handled using this mode. Programmed I/O
operations are the result of I/O instructions written in the computer program.
36. Define Synchronous bus.
- A synchronous bus contains a clock that is used to validate each and every signal.
- Synchronous buses are affected by noise only when a clock edge occurs.
- Synchronous bus designers must contend with metastability when bridging different clock
frequencies.
- Metastability can arise in any flip-flop when its timing requirements are violated.
37. Define Asynchronous bus.
- Asynchronous buses can mistake noise pulses at any time for valid handshake signals.
- Asynchronous bus designers must deal with events that arrive asynchronously.
- They must contend with metastability in the events that drive bus transactions.
- When a flip-flop goes metastable, its effects can propagate into downstream circuitry unless proper
design techniques are used.
38. Differentiate Programmed I/O and Interrupt I/O.
Programmed I/O is the process of executing I/O instructions written in the computer program.
In the programmed I/O technique, transferring data requires constant monitoring of the peripheral by
the CPU; once a data transfer is initiated, the CPU has to wait for the next transfer.
Interrupt I/O is done by using interrupts and some special commands.
In interrupt-initiated I/O, once a data transfer is initiated, the CPU executes the next program without
wasting time while the interface keeps monitoring the device. When the interface determines that the
device is ready to transfer data, it generates an interrupt request; the CPU stops the current task,
executes the transfer process, and then returns to the previously running task.
39. Define Memory Interleaving.
In computing, interleaved memory is a design made to compensate for the relatively slow speed of dynamic
random-access memory (DRAM) or core memory, by spreading memory addresses evenly across memory
banks.
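Low-order interleaving can be sketched in a few lines (the bank count of 4 is an assumed example value):

```python
# Minimal sketch of low-order interleaving (bank count of 4 assumed):
# consecutive word addresses are spread round-robin across the banks.

def bank_of(address, n_banks=4):
    return address % n_banks  # low-order address bits select the bank

# Addresses 0..7 cycle through banks 0,1,2,3,0,1,2,3, so sequential
# accesses hit a different bank each time and can overlap.
print([bank_of(a) for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```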
40. What is the purpose of Dirty/Modified bit in cache memory?
A set dirty bit indicates that the associated cache line has been changed since it was read from
memory ("dirty"), meaning that the processor has written data to that line and the new value has not
yet propagated all the way to main memory.
41. Define Memory Hierarchy. What is the need to implement memory as a hierarchy?
In computer architecture, the memory hierarchy is a concept used for storing data and discussing
performance issues in computer architectural design, algorithm prediction, and lower-level
programming constructs involving locality of reference. The memory hierarchy in computer
storage distinguishes each level by response time. Since response time, complexity, and
capacity are related, the levels may also be distinguished by their performance and controlling
technologies.
Computer pioneers correctly predicted that programmers would want unlimited amounts of fast
memory. An economical solution to that desire is a memory hierarchy, which takes advantage of
locality to provide the illusion of a large, fast memory at low cost.
42. What are the various memory technologies?
There are four primary technologies used in memory hierarchies
 SRAM technology (caches)
 DRAM technology (main memory)
 Flash semiconductor technology
 Magnetic disks
43. Define Hit ratio.
Hit ratio is the fraction of memory accesses found in the upper level. It is often used as a measure of the
performance of the memory hierarchy.
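A quick Python sketch (all numbers assumed) showing how the hit ratio feeds into the standard average memory access time (AMAT) formula:

```python
# Illustrative sketch with assumed numbers: the hit ratio plugged into the
# standard average-memory-access-time (AMAT) formula.

def hit_ratio(hits, accesses):
    return hits / accesses

def amat(hit_time, miss_rate, miss_penalty):
    # AMAT = hit time + miss rate * miss penalty
    return hit_time + miss_rate * miss_penalty

h = hit_ratio(950, 1000)              # 950 of 1000 accesses hit: 0.95
print(round(amat(1, 1 - h, 100), 2))  # 1-cycle hit, 100-cycle penalty
```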
44. Define Virtual memory. State the advantages of Virtual memory.
Techniques that automatically move program and data blocks into physical memory when they are
required for execution are called virtual memory techniques.
Advantages
 It can eliminate external fragmentation
 Pages are mapped appropriately any way
 It has more efficient swapping
 Allocating memory is easy and cheap.
45. Define cache and cache miss.
Cache was the name chosen to represent the level of the memory hierarchy between the processor and main
memory.
Cache miss is a request for data from the cache that cannot be filled because the data is not present in the
cache.
PART-B
1. Explain various Memory Technologies
There are four primary technologies used in the memory hierarchy.
1. Main memory is implemented by DRAM (Dynamic Random Access Memory)
2. Upper level cache memory (Closer to processor) is implemented by SRAM (Static Random Access
Memory)
3. Flash memory, which is nonvolatile
4. The lowest level in the memory hierarchy, with slow and large storage, is the magnetic disk
Static Memories
Memory circuit which is capable of retaining their state as long as power is applied knows as static
memory.
 Two transistor inverters are cross connected to implement a basic flip-flop.
 The cell is connected to one word line and two bits lines by transistors T1 and T2.
 When word line is at ground level, the transistors are turned off and the latch retains its state.
 Read operation: in order to read the state of the SRAM cell, the word line is activated to close
switches T1 and T2. Sense/Write circuits at the bottom monitor the states of b and b'.
 Write operation: the state of the cell is set by placing the appropriate bits on b and b' and then
activating the word line.
Static RAMs (SRAMs):
 Consist of circuits that are capable of retaining their state as long as the power is applied.
 Volatile memories, because their contents are lost when power is interrupted.
 Access times of static RAMs are in the range of few nanoseconds.
 However, the cost is usually high.
Advantage of SRAM
 SRAM can be accessed quickly
 Access times of just few nanoseconds.
Disadvantage of SRAM
 High cost
 Volatile memory
Dynamic memory system
Large dynamic memory systems can be implemented using DRAM chips in a similar way to static
memory systems.
Placing large memory systems directly on the motherboard will occupy a large amount of space.
Also, this arrangement is inflexible since the memory system cannot be expanded easily. Packaging
considerations have led to the development of larger memory units known as SIMMs (Single In-line
Memory Modules) and DIMMs (Dual In-line Memory Modules).
Memory modules are an assembly of memory chips on a small board that plugs vertically onto a
single socket on the motherboard.
 Occupy less space on the motherboard.
 Allows for easy expansion by replacement.
Dynamic RAMs (DRAMs):
 Do not retain their state indefinitely.
 Contents must be periodically refreshed.
 Contents may be refreshed while accessing them for reading.
 Less expensive RAMS can be implemented if simpler cells are used.
Memory Technology            | Typical Access Time       | Price per Bit in 2012
SRAM semiconductor memory    | 0.5 – 2.5 ns              | $500 – $1000
DRAM semiconductor memory    | 50 – 70 ns                | $10 – $20
Flash semiconductor memory   | 5,000 – 50,000 ns         | $0.75 – $1.00
Magnetic disk                | 5,000,000 – 20,000,000 ns | $0.05 – $0.10
During a Read operation, the sense/write circuits read the information stored in the cells selected by
the word line and transmit this information to the output data lines.
Example for 16 X 8 Organization i.e. 16 wordline of 8 bit each
During a Write operation, the sense/write circuits receive input information and store it in the cells of
the selected word line.
There are two control signals (R/W and CS)
R/W – Read/Write
CS – Chip Select
The circuit can store 128 bits and requires 14 external connections for address, data, and control
lines, and also needs two lines for power supply and ground.
The table above shows how the density, cost, and access time of memory technologies compare and
how they have changed over the years.
Synchronous DRAMs
 Operation is directly synchronized with processor clock signal.
 The outputs of the sense circuits are connected to a latch.
 During a Read operation, the contents of the cells in a row are loaded onto the latches.
 During a refresh operation, the contents of the cells are refreshed without changing the contents of
the latches.
 Data held in the latches correspond to the selected columns are transferred to the output.
 For a burst mode of operation, successive columns are selected using column address counter and
clock.
 The CAS signal need not be generated externally; new data is placed on the data lines during the
rising edge of the clock.
Timing diagram
 The row address is latched under the control of the RAS signal, and the memory takes 2 or 3 cycles
to activate the selected row.
 The column address is latched under the control of the CAS signal and, after a delay of one clock
cycle, the first set of data bits is placed on the data lines.
 The SDRAM automatically increments the column address to access the next three sets of bits in
the selected row, which are placed on the data lines in the next 3 clock cycles.
 SDRAMs have built-in refresh circuitry. This circuitry behaves as a refresh counter, which provides
the addresses of the rows that are selected for refreshing. Each row must be refreshed at least every
64 ms.
 SDRAMs can be used with clock speeds above 100 MHz.
Latency, Bandwidth, and DDRSDRAMs
Memory latency is the time it takes to transfer a word of data to or from memory. Memory
bandwidth is the number of bits or bytes that can be transferred in one second.
DDR SDRAMs (Double Data Rate Synchronous Dynamic RAM)
A standard SDRAM performs all actions on the rising edge of the clock signal. The faster version,
called DDR SDRAM, performs data transfers on both edges of the clock, i.e. the rising and falling
edges. Its bandwidth is essentially doubled for long burst transfers, but the latency of these devices is
similar to that of standard SDRAM. The cell array is organized in two banks. Each bank can be
accessed separately.
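The doubled bandwidth can be illustrated with a back-of-the-envelope calculation; the 64-bit bus width and 100 MHz clock below are assumed example figures, not values from the text:

```python
# Back-of-the-envelope sketch (bus width and clock are assumed values):
# DDR transfers data on both clock edges, doubling the peak rate.

def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, edges_per_cycle):
    return bus_width_bits / 8 * clock_mhz * edges_per_cycle  # MB/s

sdr = peak_bandwidth_mb_s(64, 100, 1)  # single data rate
ddr = peak_bandwidth_mb_s(64, 100, 2)  # double data rate
print(sdr, ddr)  # 800.0 1600.0
```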
Rambus memory
A very wide bus is expensive and requires a lot of space on a motherboard. An alternative
approach is to implement a narrow bus that is much faster. This approach (was used by Rambus Inc.) is
called Rambus.
Key features
Fast signaling method used to transfer information between chips.
Advantage
 Rambus channel – provides a complete specification for the design of such a communication link.
 Rambus allows a clock frequency of 400 MHz.
 With data transferred on both edges of the clock, the effective data rate is 800 MHz.
 Rambus uses multiple banks of cell arrays to access more than one word at a time; such a memory
chip is called an RDRAM (Rambus DRAM).
 RDRAM chips can be assembled into larger modules called RIMMs (Rambus In-line Memory
Modules).
 It uses a master-slave relationship: the master is the processor and the slaves are the RDRAMs.
Flash memory:
 Has similar approach to EEPROM.
 One difference
 EEPROM – It is possible to read and write the content of a single cell
 Flash memory - It is possible to read the content of a single cell but only possible to write
an entire block of cells.
 Flash devices have greater density.
 Higher capacity and low storage cost per bit.
 Power consumption of flash memory is very low, making it attractive for use in equipment that
is battery-driven.
 Single flash chips are not sufficiently large, so larger memory modules are implemented using
flash cards and flash drives.
Flash card – a larger module is made by mounting flash chips on a small card. The card simply
plugs into an accessible slot. These cards also vary in memory size.
Flash drives – similar to hard disks; the difference is that hard disks can store many gigabytes (GB),
while flash drives store less than one GB. Hard disks provide an extremely low cost per bit, but flash
drives have a higher cost per bit.
Magnetic Disk
The magnetic hard disk consists of a collection of platters, which rotate on a spindle at 5400 to
15,000 revolutions per minute. The metal platters are covered with magnetic recording material on both
sides, similar to the material found on a cassette or videotape. To read and write information on a hard
disk, a movable arm containing a small electromagnetic coil called a read-write head is located just above
each surface. The entire drive is permanently sealed to control the environment inside the drive, which, in
turn, allows the disk heads to be much closer to the drive surface.
Each disk surface is divided into concentric circles, called tracks. There are typically tens of
thousands of tracks per surface. Each track is in turn divided into sectors that contain the information;
each track may have thousands of sectors. Sectors are typically 512 to 4096 bytes in size. The sequence
recorded on the magnetic media is a sector number, a gap, the information for that sector including error
correction code, a gap, the sector number of the next sector, and so on.
The disk heads for each surface are connected together and move in conjunction, so that every head is
over the same track of every surface. The term cylinder is used to refer to all the tracks under the heads at
a given point on all surfaces.
Seek time – time required to move the read/write head to the proper track.
Rotational delay/Rotational Latency time – the amount of time it takes for the desired sector of a
disk (i.e., the sector from which data is to be read or written) to rotate under the read-write heads of
the disk drive. This is the time for half a rotation of the disk.
Disk access time - The sum of seek time and latency time is called disk access time.
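These definitions can be checked with a short Python sketch; the 5400 RPM spindle speed and 6 ms average seek time below are assumed example parameters:

```python
# Worked sketch with assumed drive parameters (5400 RPM, 6 ms average seek):
# average rotational latency is the time for half a revolution.

def rotational_latency_ms(rpm):
    return 0.5 * 60_000 / rpm  # half a rotation, in milliseconds

def access_time_ms(seek_ms, rpm):
    # disk access time = seek time + rotational latency
    return seek_ms + rotational_latency_ms(rpm)

print(round(rotational_latency_ms(5400), 2))  # 5.56 ms
print(round(access_time_ms(6, 5400), 2))      # 11.56 ms
```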
2. Discuss in detail about Cache Memory. Explain mapping functions in cache memory to
determine how memory blocks are placed in cache.
The processor is much faster than the main memory.
 As a result, the processor has to spend much of its time waiting while instructions and data
are being fetched from the main memory.
The speed of the main memory cannot be increased beyond a certain point, so to achieve good
performance another approach is needed: cache memory is an architectural arrangement which makes
the main memory appear faster to the processor than it really is.
Cache memory is based on the property of computer programs known as “locality of reference”.
Analysis of programs indicates that many instructions in localized areas of a program are executed
repeatedly during some period of time, while the others are accessed relatively less frequently. These
instructions may be the ones in a loop, nested loop or few procedures calling each other repeatedly. This
is called “locality of reference”.
Temporal locality of reference:
 Recently executed instruction is likely to be executed again very soon.
Spatial locality of reference:
 Instructions with addresses close to a recently instruction are likely to be executed soon.
 Processor issues a Read request; a block of words is transferred from the main memory to the
cache, one word at a time.
 Subsequent references to the data in this block of words are found in the cache.
 At any given time, only some blocks in the main memory are held in the cache. Which blocks in
the main memory are in the cache is determined by a “mapping function”.
 When the cache is full, and a block of words needs to be transferred from the main memory, some
block of words in the cache must be replaced. This is determined by a “replacement algorithm”.
Cache Hit
Existence of a cache is transparent to the processor. The processor issues Read and
Write requests in the same manner.
If the data is in the cache it is called a Read or Write hit.
 Read hit:
 The data is obtained from the cache.
 Write hit:
 Cache has a replica of the contents of the main memory.
 Contents of the cache and the main memory may be updated simultaneously. This is the
write-through protocol.
 Update the contents of the cache, and mark it as updated by setting a bit known as the dirty
bit or modified bit. The contents of the main memory are updated when this block is
replaced. This is write-back or copy-back protocol.
Cache Miss
If the data is not present in the cache, then a Read miss or Write miss occurs.
Read miss:
 Block of words containing this requested word is transferred from the memory.
 After the block is transferred, the desired word is forwarded to the processor.
 The desired word may also be forwarded to the processor as soon as it is transferred without
waiting for the entire block to be transferred. This is called load-through or early-restart.
Write-miss:
 Write-through protocol is used, and then the contents of the main memory are updated
directly.
 If write-back protocol is used, the block containing the addressed word is first brought into
the cache. The desired word is overwritten with new information.
Cache Coherence Problem
 The copies of the data in the cache and the main memory are different. This is called the cache
coherence problem.
 One option is to force a write-back before the main memory is updated from the disk.
Mapping functions
Mapping functions determine how memory blocks are placed in the cache.
 A simple processor example:
 Cache consisting of 128 blocks of 16 words each.
 Total size of cache is 2048 (2K) words.
 Main memory is addressable by a 16-bit address.
 Main memory has 64K words.
 Main memory has 4K blocks of 16 words each.
 Three mapping functions:
 Direct mapping
 Associative mapping
 Set-associative mapping.
Direct mapping
Block j of the main memory maps to block j modulo 128 of the cache; i.e., block 0 maps to cache
block 0, block 129 maps to cache block 1, and so on. More than one memory block is mapped onto the
same position in the cache. This may lead to contention for cache blocks even if the cache is not full.
The contention is resolved by allowing the new block to replace the old block, leading to a trivial
replacement algorithm.
Memory address is divided into three fields:
 Low order 4 bits determine one of the 16 words in a block.
 When a new block is brought into the cache, the next 7 bits determine which cache block
this new block is placed in.
 High order 5 bits determine which of the possible 32 blocks is currently present in the
cache. These are tag bits.
This mapping is simple to implement, but not very flexible.
Direct Mapping
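The address split used in the example above (16-bit address = 5 tag bits | 7 block bits | 4 word bits) can be sketched as:

```python
# Sketch of the direct-mapping address split from the example above
# (16-bit address = 5 tag bits | 7 block bits | 4 word bits).

def direct_map_fields(address):
    word  = address & 0xF           # low 4 bits: word within the block
    block = (address >> 4) & 0x7F   # next 7 bits: cache block index
    tag   = (address >> 11) & 0x1F  # high 5 bits: tag
    return tag, block, word

# Memory blocks 0 and 128 both map to cache block 0 (j mod 128),
# and are told apart only by their tags.
print(direct_map_fields(0 << 4))    # (0, 0, 0)
print(direct_map_fields(128 << 4))  # (1, 0, 0)
```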
Associative mapping
Main memory block can be placed into any cache position.
Associative mapping
Memory address is divided into two fields:
 Low order 4 bits identify the word within a block.
 High order 12 bits or tag bits identify a memory block when it is resident in the cache.
 Flexible, and uses cache space efficiently. Replacement algorithms can be used to replace an
existing block in the cache when the cache is full. The cost is higher than a direct-mapped cache
because of the need to search all 128 tag patterns to determine whether a given block is in the cache.
Set-Associative mapping
Blocks of the cache are grouped into sets. The mapping function allows a block of the main memory
to reside in any block of a specific set. Divide the cache into 64 sets, with two blocks per set. Memory
blocks 0, 64, 128, etc. map to set 0, and each can occupy either of the two positions within that set.
Set Associative mapping
Memory address is divided into three fields:
 Low order 4 bits identify the word within a block.
 The next 6-bit field determines the set number.
 High order 6 bits are compared to the tag fields of the two blocks in the selected set.
Set-associative mapping is a combination of direct and associative mapping.
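For this 64-set, two-block-per-set configuration, the address split can be sketched as follows (illustrative Python, under the same 16-bit address and 16-word-block assumptions):

```python
def set_assoc_fields(addr):
    """16-bit address, 64 sets of 2 blocks: 6-bit tag | 6-bit set | 4-bit word."""
    word = addr & 0xF              # word within the block
    set_no = (addr >> 4) & 0x3F    # 6-bit set number (block number mod 64)
    tag = (addr >> 10) & 0x3F      # high-order 6 bits
    return tag, set_no, word

# Memory blocks 0, 64 and 128 all map to set 0:
for b in (0, 64, 128):
    print(set_assoc_fields(b * 16)[1])  # set 0 each time
```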
 Various replacement algorithms are used:
 First-in-first-out replacement algorithm
 Random replacement algorithm
 Least-recently used replacement algorithm
 Most-recently used replacement algorithm
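As an illustration, the least-recently used policy can be sketched with an ordered dictionary (a software model only; real caches implement this in hardware with use bits or counters):

```python
from collections import OrderedDict

def simulate_lru(capacity, refs):
    """Count hits for an associative cache using LRU replacement (sketch)."""
    cache = OrderedDict()   # keys = block numbers, order = recency of use
    hits = 0
    for block in refs:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark most-recently used
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict least-recently used
            cache[block] = True
    return hits

print(simulate_lru(2, [0, 64, 0, 128, 0, 64]))  # 2 hits
```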
3. Explain in detail about Virtual memories.
When a program does not completely fit into the main memory, the parts of it not currently being
executed are stored in secondary memory and transferred to main memory when required. The virtual
memory technique is used to extend the apparent size of the physical memory: it uses secondary
storage such as disks and automatically transfers program and data blocks into the physical memory
when they are required for execution.
Virtual memory organization
The segments which are currently being executed are kept in main memory and remaining
segments are stored in secondary storage such as magnetic disk. If the required segments which are not in
main memory, the executed segments is replaced by the required segments. In modern computer OS
perform these task. Processor generates virtual or logical address.

Logical address Translate by using Physical address


hardware
Virtual addresses are translated or software
into physical addresses by a combination of hardware and software
subsystems. (MMU)
 If virtual address refers to a part of the program that is currently in the main memory, it is
accessed immediately.
 If the address refers to a part of the program that is not currently in the main memory, it is first
transferred to the main memory before it can be used.
Memory management unit (MMU) translates virtual addresses into physical addresses. If the
desired data or instructions are in the main memory they are fetched as described previously.
If the desired data or instructions are not in the main memory, they must be transferred from
secondary storage to the main memory. MMU causes the operating system to bring the data from the
secondary storage into the main memory.
Address translation
All programs and data are composed of fixed-length units called pages. Pages commonly range from
2K to 16K bytes in length. Whenever the address translation mechanism determines that a required
page is not in the main memory, pages are moved between the main memory and the disk, i.e.,
swapping takes place.
Cache bridges the speed gap between the processor and main memory and implemented in
hardware. Virtual memory mechanism bridges the size and speed gaps between main memory and
secondary memory.
Each virtual address generated by the processor, whether it is for an instruction fetch or an
operand fetch/store operations, is interpreted as a virtual page number (high order) followed by offset
(low-order).
Page table Information about the main memory location of each page.
Information Main memory addresses where the page is stored and the status of the page.
Page frame An area in the main memory that can hold one page.
Page table base register The starting address of the page table is kept in a page table in base register.
Address of the corresponding entry in the page table By adding the virtual page number and the content
of the page table base register
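The translation described above can be sketched as a small software model (the 4K page size and page table contents below are assumptions chosen only for illustration):

```python
PAGE_SIZE = 4096  # assume 4K-byte pages for illustration

# virtual page number -> page frame (None = page not in main memory)
page_table = {0: 7, 1: 3, 2: None}

def translate(virtual_addr):
    vpn = virtual_addr // PAGE_SIZE     # high-order bits: virtual page number
    offset = virtual_addr % PAGE_SIZE   # low-order bits: offset within the page
    frame = page_table.get(vpn)
    if frame is None:
        # the MMU would raise an exception and the OS would load the page
        raise RuntimeError("page fault")
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1010)))  # VPN 1 -> frame 3, so 0x3010
```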
Control Bits:
It gives the status of the page.
Valid Bit: indicates whether the page is actually loaded in main memory.
Dirty Bit: indicates whether the page has been modified.
Access Bits: give the permissions for that page, i.e., read/write permissions.
Translation Look-aside Buffer (TLB)
A small cache called as Translation Look-aside Buffer (TLB) is included in the MMU.
 TLB holds page table entries of the most recently accessed pages.
 In addition, the TLB must hold the virtual page number for each of these entries.
High-order bits of the virtual address generated by the processor select the virtual page. These bits
are compared to the virtual page numbers in the TLB. If there is a match, a hit occurs and the
corresponding address of the page frame is read. If there is no match, a miss occurs and the page
table within the main memory must be consulted. Set-associative mapped TLBs are found in
commercial processors.
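The TLB lookup sequence can be sketched as follows (a software model with made-up entries; a real TLB performs the comparison in hardware, often set-associatively):

```python
tlb = {5: 9}                    # virtual page number -> page frame (recent entries)
page_table = {5: 9, 6: 2}       # full mapping, held in main memory

def lookup(vpn):
    if vpn in tlb:              # TLB hit: no extra memory access for translation
        return tlb[vpn], "hit"
    frame = page_table[vpn]     # TLB miss: consult the page table in main memory
    tlb[vpn] = frame            # install the entry for future accesses
    return frame, "miss"

print(lookup(5))  # (9, 'hit')
print(lookup(6))  # (2, 'miss')
print(lookup(6))  # (2, 'hit') after the miss installed the entry
```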
 When the MMU detects a page fault, the following actions occur:
 MMU asks the operating system to intervene by raising an exception.
 Processing of the active task which caused the page fault is interrupted.
 Control is transferred to the operating system.
 Operating system copies the requested page from secondary storage to the main memory.
 Once the page is copied, control is returned to the task which was interrupted.
 Servicing of a page fault requires transferring the requested page from secondary storage to the
main memory.
 This transfer may incur a long delay.
 While the page is being transferred the operating system may:
 Suspend the execution of the task that caused the page fault.
 Begin execution of another task whose pages are in the main memory.
 Enables efficient use of the processor.
 While the page transfer is in progress, interrupts will still be accepted.
4. Discuss DMA controller with block diagram.
Direct Memory Access (DMA)
 A special control unit may be provided to transfer a block of data directly between an I/O device
and the main memory, without continuous intervention by the processor.
 Control unit which performs these transfers is a part of the I/O device‟s interface circuit. This
control unit is called as a DMA controller.
 DMA controller performs functions that would be normally carried out by the processor:
 For each word, it provides the memory address and all the control signals.
 To transfer a block of data, it increments the memory addresses and keeps track of the
number of transfers.
 DMA controller can transfer a block of data from an external device to the processor, without any
intervention from the processor.
 However, the operation of the DMA controller must be under the control of a program
executed by the processor. That is, the processor must initiate the DMA transfer.
 To initiate the DMA transfer, the processor informs the DMA controller of:
 Starting address,
 Number of words in the block.
 Direction of transfer (I/O device to the memory, or memory to the I/O device).
 Once the DMA controller completes the DMA transfer, it informs the processor by raising an
interrupt signal.
 Two registers are used to store the starting address and the word count.
Status and control flag register:
 R/W – direction of transfer.
 Read – transfers data from memory to the I/O device.
 Done flag set to 1 – the DMA controller has completed the data transfer.
 This causes the controller to raise an interrupt when IE (interrupt enable) is 1, by setting
IRQ to 1 to request an interrupt.
DMA controller connects a high-speed network to the computer bus. Disk controller, which
controls two disks also, has DMA capability. It provides two DMA channels. It can perform two
independent DMA operations, as if each disk has its own DMA controller. The registers to store the
memory address, word count and status and control information are duplicated.
 Processor and DMA controllers have to use the bus in an interwoven fashion to access the
memory.
 DMA devices are given higher priority than the processor to access the bus.
 Among different DMA devices, high priority is given to high-speed peripherals such as a
disk or a graphics display device.
 Processor originates most memory access cycles on the bus.
 DMA controller can be said to “steal” memory access cycles from the bus. This
interweaving technique is called as “cycle stealing”.
 An alternate approach is to provide a DMA controller an exclusive capability to initiate transfers
on the bus, and hence exclusive access to the main memory. This is known as the block or burst
mode.
5. Explain in detail about the bus arbitration techniques in detail.
Bus arbitration
Processor and DMA controllers both need to initiate data transfers on the bus and access main
memory. The device that is allowed to initiate transfers on the bus at any given time is called the bus
master. When the current bus master relinquishes its status as the bus master, another device can
acquire this status. The process by which the next device to become the bus master is selected and bus
mastership is transferred to it is called bus arbitration.
 Centralized arbitration:
 A single bus arbiter performs the arbitration.
 Distributed arbitration:
 All devices participate in the selection of the next bus master.
Centralized Bus Arbitration
Bus arbiter may be the processor or a separate unit connected to the bus. Normally, the processor
is the bus master, unless it grants bus membership to one of the DMA controllers. DMA controller
requests the control of the bus by asserting the Bus Request (BR) line. In response, the processor
activates the Bus-Grant1 (BG1) line, indicating that the controller may use the bus when it is free.
BG1 is connected to all DMA controllers in a daisy-chain fashion. When the BBSY (bus-busy)
signal is 0, the bus is busy; when BBSY becomes 1, the DMA controller which asserted BR can
acquire control of the bus.
Distributed arbitration
 All devices waiting to use the bus share the responsibility of carrying out the arbitration
process.
 Arbitration process does not depend on a central arbiter and hence distributed arbitration
has higher reliability.
 Each device is assigned a 4-bit ID number.
 All the devices are connected using 5 lines: 4 arbitration lines to transmit the ID, and one line
for the Start-Arbitration signal.
 To request the bus, a device:
 Asserts the Start-Arbitration signal.
 Places its 4-bit ID number on the arbitration lines.
 The pattern that appears on the arbitration lines is the logical-OR of all the 4-bit device IDs
placed on the arbitration lines.
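The net effect is that the device with the highest ID wins: a device drops out as soon as it sees a 1 on a line where its own ID bit is 0. A sketch of the scheme (a simulation of the wired-OR comparison, not hardware or driver code):

```python
def arbitrate(ids):
    """Distributed arbitration sketch: contenders place 4-bit IDs on
    wired-OR lines; a device drops out when it sees a 1 on a line where
    its own ID bit is 0. The highest ID wins."""
    contenders = set(ids)
    for bit in (3, 2, 1, 0):                            # examine lines MSB first
        line = any((d >> bit) & 1 for d in contenders)  # wired-OR of this line
        if line:
            contenders = {d for d in contenders if (d >> bit) & 1}
    return contenders.pop()

print(arbitrate([5, 6, 3]))  # device 6 wins (0110 beats 0101 and 0011)
```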
6. Explain in detail about hardware multithreading.
Hardware-level support for storing state for multiple threads, permitting fast context switches.
There are two types of hardware multithreading
 Block or Coarse-grained multi-threading
 Fine-grained or Interleaved multi-threading
COARSE-GRAIN MULTITHREADING:
The simplest type of multi-threading occurs when one thread runs until it is blocked by an event
that normally would create a long latency stall. Such a stall might be a cache-miss that has to access
off-chip memory, which might take hundreds of CPU cycles for the data to return. Instead of waiting
for the stall to resolve, a threaded processor would switch execution to another thread that was ready
to run. Only when the data for the previous thread had arrived, would the previous thread be placed
back on the list of ready-to-run threads.
For example:
 Cycle i : instruction j from thread A is issued
 Cycle i+1: instruction j+1 from thread A is issued
 Cycle i+2: instruction j+2 from thread A is issued, load instruction which misses in all caches
 Cycle i+3: thread scheduler invoked, switches to thread B
 Cycle i+4: instruction k from thread B is issued
 Cycle i+5: instruction k+1 from thread B is issued
Conceptually, it is similar to cooperative multi-tasking used in real-time operating systems in
which tasks voluntarily give up execution time when they need to wait upon some type of the event.
HARDWARE COSTS:
The goal of multi-threading hardware support is to allow quick switching between a blocked
thread and another thread ready to run. To achieve this goal, the hardware cost is to replicate the
program visible registers as well as some processor control registers (such as the program counter).
Switching from one thread to another thread means the hardware switches from using one register set
to another.
Many families of microcontrollers and embedded processors have multiple register banks to allow
quick context switching for interrupts. Such schemes can be considered a type of block multithreading
among the user program thread and the interrupt threads. Such additional hardware has these benefits:
The thread switch can be done in one CPU cycle. It appears to each thread that it is executing alone
and not sharing any hardware resources with any other threads. This minimizes the amount of
software changes needed within the application as well as the operating system to support
multithreading. In order to switch efficiently between active threads, each active thread needs to have
its own register set. For example, to quickly switch between two threads, the register hardware needs
to be instantiated twice.
FINE-GRAINED MULTITHREADING (or) INTERLEAVED MULTI-THREADING:
 Cycle i : an instruction from thread A is issued
 Cycle i+1: an instruction from thread B is issued
 Cycle i+2: an instruction from thread C is issued
 The purpose of this type of multithreading is to remove all data dependency stalls from the
execution pipeline.
 Since one thread is relatively independent from other threads, there's less chance of one
instruction in one pipe stage needing an output from an older instruction in the pipeline.
 Conceptually, it is similar to pre-emptive multi-tasking used in operating systems.
 One can make the analogy that the time-slice given to each active thread is one CPU cycle.
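The round-robin issue pattern above can be sketched as a simplified simulation (thread and instruction names are made up for illustration):

```python
from itertools import cycle

# Fine-grained multithreading: the processor issues one instruction from a
# different ready thread each cycle, round-robin (one time-slice = one cycle).
threads = {"A": ["a0", "a1", "a2"], "B": ["b0", "b1"], "C": ["c0"]}
order = cycle(threads)   # cycles through thread names A, B, C, A, ...

issued = []
while any(threads.values()):
    t = next(order)
    if threads[t]:                       # skip threads with nothing left to issue
        issued.append(threads[t].pop(0))

print(issued)  # ['a0', 'b0', 'c0', 'a1', 'b1', 'a2']
```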
HARDWARE COSTS:
 In addition to the hardware costs discussed in the Block type of multithreading, interleaved
multithreading has an additional cost of each pipeline stage tracking the thread ID of the
instruction it is processing.
 Also, since there are more threads being executed concurrently in the pipeline, shared
resources such as caches and TLBs need to be larger to avoid thrashing between the different
threads.
MULTITHREADED CATEGORIES:
Multicore Processor
Unlike a single-core computer, multi-core architectures place several cores on a single multi-core
CPU chip:
 The cores fit on a single processor socket.
 Also called CMP (Chip Multi-Processor).
6. Explain in detail about Flynn’s Classification of parallel hardware.
 It was proposed by researcher Michael J. Flynn in 1966.
 It is the most commonly accepted taxonomy of computer organization.
 In this classification, computers are classified by whether it processes a single instruction at a time
or multiple instructions simultaneously, and whether it operates on one or multiple data sets.
Flynn defined four categories of multiprocessor systems by their instruction and data streams.
Classification
SISD – Single Instruction Stream Single Data Stream (Uniprocessor)
SIMD – Single Instruction Stream Multiple Data Stream (Vector Processor)
MISD – Multiple Instruction Stream Single Data Stream (No examples today)
MIMD – Multiple Instruction Stream Multiple Data Stream (Multiple Processors)
SPMD – Single Program Multiple Data (MIMD Programming Model) i.e. single program that runs on all
processors of an MIMD computer.
Single Instruction, Single Data stream (SISD)
A sequential computer which exploits no parallelism in either the instruction or data streams.
A single control unit (CU) fetches a single Instruction Stream (IS) from memory. The CU then
generates appropriate control signals to direct a single processing element (PE) to operate on a single
Data Stream (DS), i.e., one operation at a time.
Examples of SISD architecture are the traditional uniprocessor machines like a PC (currently
manufactured PCs have multiple processors) or old mainframes.
In computing, SISD (single instruction, single data) is a term referring to a computer architecture
in which a single processor, a uniprocessor, executes a single instruction stream, to operate on data stored
in a single memory. This corresponds to the von Neumann architecture.
SISD is one of the four main classifications as defined in Flynn's taxonomy. In this system classifications
are based upon the number of concurrent instructions and data streams present in the computer
architecture. According to Michael J. Flynn, SISD can have concurrent processing characteristics.
Instruction fetching and pipelined execution of instructions are common examples found in most modern
SISD computers.
Single Instruction Stream Multiple Data Streams (SIMD)
A computer which exploits multiple data streams against a single instruction stream to perform
operations which may be naturally parallelized. The example for SIMD is an array processor or GPU.
SIMD is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple
processing elements that perform the same operation on multiple data points simultaneously. Thus, SIMD
works best with arrays in for loops to exploit data-level parallelism, but not concurrency: there are
simultaneous (parallel) computations, but only a single process (instruction) at a given moment. SIMD is
particularly applicable to common tasks like adjusting the contrast in a digital image or adjusting the
volume of digital audio. Most modern CPU designs include SIMD instructions in order to improve the
performance of multimedia use.
The vector architectures easily capture the flexibility in data widths, so it is easy to make a vector
operation work on 32 64-bit data elements or 64 32-bit data elements or 128 16-bit data elements or 256
8-bit data elements. The parallel semantics of a vector instruction allows an implementation to execute
these operations using a deeply pipelined functional unit, an array of parallel functional units, or a
combination of parallel and pipelined functional units.
(a) A single add pipeline can complete one addition per clock cycle. (b) Four add pipelines, or lanes,
can complete four additions per clock cycle.
Vector arithmetic instructions usually only allow element N of one vector register to take part in
operations with element N from other vector registers. By construction of a highly parallel vector unit,
this can be structured as multiple parallel vector lanes. It can increase the peak throughput of a vector unit
by adding more lanes.
Structure of a four-lane vector unit
Four lanes can reduce the number of clocks per vector instruction by roughly a factor of four. The
vector architectures are a very efficient way to execute data parallel processing programs and they are
easier to evolve over time than the multimedia extensions to the x86 architecture.
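A purely illustrative Python sketch of how four lanes partition a vector add (in hardware the per-lane operations within a step happen in the same clock cycle, so an N-element add takes roughly N/4 steps):

```python
def vector_add(a, b, lanes=4):
    """Lane-parallel vector addition: element i is handled by lane i mod 4."""
    result = [0] * len(a)
    for start in range(0, len(a), lanes):   # one "step" per group of lanes
        for lane in range(lanes):           # these run in parallel in hardware
            i = start + lane
            if i < len(a):
                result[i] = a[i] + b[i]
    return result

print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # [11, 22, 33, 44, 55]
```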
Multiple Instruction Stream Single Data stream (MISD)
Multiple instructions operate on a single data stream. Uncommon architecture, which is generally
used for fault tolerance. Heterogeneous systems operate on the same data stream and must agree on the
result. Examples include the Space Shuttle flight control computer.
In computing, MISD (multiple instruction, single data) is a type of parallel computing architecture
where many functional units perform different operations on the same data. Pipeline architectures belong
to this type, though a purist might say that the data is different after processing by each stage in the
pipeline. Fault-tolerant computers executing the same instructions redundantly in order to detect and mask
errors, in a manner known as task replication, may be considered to belong to this type. Not many
instances of this architecture exist, as MIMD and SIMD are often more appropriate for common data
parallel techniques. Specifically, they allow better scaling and use of computational resources than MISD
does. However, one prominent example of MISD in computing is the Space Shuttle flight control
computers.
Multiple Instruction Stream Multiple Data streams (MIMD)
Multiple autonomous processors are simultaneously executing different instructions on different
data. Distributed systems are generally recognized to be MIMD architectures; either exploiting a single
shared memory space or a distributed memory space. A multi-core superscalar processor is an MIMD
processor.
In computing, MIMD (multiple instructions, multiple data) is a technique employed to achieve
parallelism. Machines using MIMD have a number of processors that function asynchronously and
independently. At any time, different processors may be executing different instructions on different
pieces of data. MIMD architectures may be used in a number of application areas such as computer-aided
design/computer-aided manufacturing, simulation, modeling, and as communication switches. MIMD
machines can be of either shared memory or distributed memory categories. These classifications are based
on how MIMD processors access memory. Shared memory machines may be of the bus-based, extended,
or hierarchical type. Distributed memory machines may have hypercube or mesh interconnection schemes.
7. Explain in detail about Basic MIPS implementation.
Draw the block diagram for the MIPS implementation with all necessary control lines and
multiplexers. (13 Marks)
8. Discuss in detail about to building a data path. (13)
Program Counter
Instruction Memory
Adder
Register File
ALU
Data Memory
Sign Extend
Multiplexers
Data Path Elements for Branch (8 Marks)
Using all these functional units, build the datapath block diagram. (5 Marks)
9. What is pipelining? Discuss about pipelined data path and control.
Definition (2 Marks)
Block Diagram with all control lines and all stages (11 Marks)
10. Explain in detail about various categories of hazards with examples.
Structural Hazards (2 Marks)
Data Hazards with avoiding Techniques (5 Marks)
Control Hazards with Branching Techniques (6 Marks)
11. Explain how floating point addition is carried out in a computer system. Give an example for a
binary floating point addition. Refer:2.7 Page No:2.50
12. Explain in detail about the multiplication algorithm with suitable example and diagram.
Refer:2.4 Page No: 2.18
13. Discuss in detail about division algorithm in detail with diagram and examples.
Refer:2.5 Page No: 2.32
2. Briefly explain about measuring and improving the performance of a computer?
Running a program on two different desktop computers, you’d say that the faster one is
the desktop computer that gets the job done first. If you were running a datacenter that had
several servers running jobs submitted by many users, you’d say that the faster computer was
the one that completed the most jobs during a day. As an individual computer user, you are
interested in reducing response time—the time between the start and completion of a task—
also referred to as execution time.
Datacenter managers are often interested in increasing throughput or bandwidth—the
total amount of work done in a given time.
To maximize performance, we want to minimize response time or execution time for
some task. Thus, we can relate performance and execution time for a computer X:
Performance_X = 1 / Execution time_X
This means that for two computers X and Y, if the performance of X is greater than the
performance of Y, we have
Performance_X > Performance_Y
1 / Execution time_X > 1 / Execution time_Y
Execution time_Y > Execution time_X
That is, the execution time on Y is longer than that on X, if X is faster than Y.
In discussing a computer design, we often want to relate the performance of two different
computers quantitatively. We will use the phrase “X is n times faster than Y”—or equivalently
“X is n times as fast as Y”—to mean
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
If X is n times as fast as Y, then the execution time on Y is n times as long as it is on X:
Execution time_Y = n × Execution time_X
Measuring Performance:
The computer that performs the same amount of work in the least time is the fastest.
Program execution time is measured in seconds per program. CPU execution time or simply
CPU time, which recognizes this distinction, is the time the CPU spends computing for this task
and does not include time spent waiting for I/O or running other programs. CPU time can be
further divided into the CPU time spent in the program, called user CPU time, and the CPU
time spent in the operating system performing tasks onbehalf of the program, called system
CPU time.
The term system performance to refer to elapsed time on an unloaded system and CPU
performance to refer to user CPU time.
CPU Performance and Its Factors
CPU execution time for a program = CPU clock cycles for a program × Clock cycle time
Alternatively, because clock rate and clock cycle time are inverses,
CPU execution time for a program = CPU clock cycles for a program / Clock rate
This formula makes it clear that the hardware designer can improve performance by reducing
the number of clock cycles required for a program or the length of the clock cycle. As we will
see in later chapters, the designer often faces a trade-off between the number of clock cycles
needed for a program and the length of each cycle. Many techniques that decrease the number
of clock cycles may also increase the clock cycle time.
Instruction Performance
However, since the compiler clearly generated instructions to execute, and the
computer had to execute the instructions to run the program, the execution time must depend
on the number of instructions in a program.
CPU clock cycles = Instructions for a program × Average clock cycles per instruction
The term clock cycles per instruction, which is the average number of clock cycles each
instruction takes to execute, is often abbreviated as CPI. Since different instructions may take
different amounts of time depending on what they do, CPI is an average of all the instructions
executed in the program.
CPI = CPU clock cycles / Instruction count
The Classic CPU Performance Equation
The basic performance equation in terms of instruction count (the number of
instructions executed by the program), CPI, and clock cycle time:
CPU time = Instruction count × CPI × Clock cycle time
or, since the clock rate is the inverse of clock cycle time:
CPU time = Instruction count × CPI / Clock rate
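As a quick worked example of the classic performance equation (the numbers below are illustrative, not from the source):

```python
# CPU time = Instruction count x CPI / Clock rate
instruction_count = 2_000_000   # instructions executed by the program
cpi = 1.5                       # average clock cycles per instruction
clock_rate = 2e9                # 2 GHz

cpu_time = instruction_count * cpi / clock_rate
print(cpu_time)  # 0.0015 seconds
```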
3. Briefly explain about manufacturing process of integrated chips with neat diagram?
The manufacture of a chip begins with silicon, a substance found in sand. Because
silicon does not conduct electricity well, it is called a semiconductor. With a special chemical
process, it is possible to add materials to silicon that allow tiny areas to transform into one of
three devices:
Excellent conductors of electricity (using either microscopic copper or aluminum wire)
Excellent insulators from electricity (like plastic sheathing or glass)
Areas that can conduct or insulate under special conditions (as a switch)
Transistors fall in the last category. A VLSI circuit, then, is just billions of combinations of
conductors, insulators, and switches manufactured in a single small package.
Figure shows process for Integrated chip manufacturing. The process starts with a silicon
crystal ingot, which looks like a giant sausage. Today, ingots are 8–12 inches in diameter and
about 12–24 inches long. An ingot is finely sliced into wafers no more than 0.1 inches thick.
These wafers then go through a series of processing steps, during which patterns of chemicals
are placed on each wafer, creating the transistors, conductors, and insulators.
The simplest way to cope with imperfection is to place many independent components on a
single wafer.
The patterned wafer is then chopped up, or diced, into these components, called dies and
more informally known as chips. To reduce the cost, the next-generation process shrinks a
large die by using smaller sizes for both transistors and wires. This improves the yield and the
die count per wafer.
Once you've found good dies, they are connected to the input/output pins of a package,
using a process called bonding. These packaged parts are tested a final time, since mistakes can
occur in packaging, and then they are shipped to customers.
The cost of an integrated circuit can be expressed in three simple equations:
Cost per die = Cost per wafer / (Dies per wafer × Yield)
Dies per wafer ≈ Wafer area / Die area
Yield = 1 / (1 + (Defects per area × Die area / 2))^2
4. Write short notes on operations and operands in computer hardware?
The words of a computer’s language are called instructions, and its vocabulary is
called an instruction set.
Operations in MIPS:
Every computer must be able to perform arithmetic. The MIPS assembly language
notation add a, b, c instructs a computer to add the two variables b and c and to put their sum
in a.
This notation is rigid in that each MIPS arithmetic instruction performs only one
operation and must always have exactly three variables.
EXAMPLE, To add 4 variables, b,c,d,e and store it in a.
add a, b, c # The sum of b and c is placed in a
add a, a, d # The sum of b, c, and d is now in a
add a, a, e # The sum of b, c, d, and e is now in a
Thus, it takes three instructions to sum the four variables.
Name | Example | Comments
32 registers | $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra | Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. $gp (28) is the global pointer, $sp (29) is the stack pointer, $fp (30) is the frame pointer, and $ra (31) is the return address.
2^30 memory words | Memory[0], Memory[4], ..., Memory[4294967292] | Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.
Operands in MIPS:
The operands of arithmetic instructions are restricted; they must be from a limited
number of special locations built directly in hardware called registers. The size of a register in
the MIPS architecture is 32 bits; groups of 32 bits occur so frequently that they are given the
name word in the MIPS architecture.
Design Principle 2: Smaller is faster.
A very large number of registers may increase the clock cycle time simply because it
takes electronic signals longer when they must travel farther. So, 32 registers were used in MIPS
architecture. The MIPS convention is to use two-character names following a dollar sign to
represent a register, e.g., $s0, $s1.
Example: f = (g + h) – (i + j); instructions using registers.
add $t0,$s1,$s2 # register $t0 contains g + h
add $t1,$s3,$s4 # register $t1 contains i + j
sub $s0,$t0,$t1 # f gets $t0 – $t1, which is (g + h)–(i + j)

Memory Operands:
Programming languages have simple variables that contain single data elements, as in
these examples, but they also have more complex data structures—arrays and structures. These
complex data structures can contain many more data elements than there are registers in a
computer. The processor can keep only a small amount of data in registers, but computer
memory contains billions of data elements. So, MIPS must include instructions that transfer
data between memory and registers.
Such instructions are called data transfer instructions. To access a word in memory, the
instruction must supply the memory address.
In MIPS, words must start at addresses that are multiples of 4. This requirement is called an
alignment restriction, and many architectures have it (since in MIPS each word is 32 bits, word
addresses increase in multiples of 4).

Byte addressing also affects the array index. To get the proper byte address in the code
above, the offset to be added to the base register $s3 must be 4 × 8, or 32 (as in the previous
example).
EXAMPLE: g = h + A[8]; (implemented using byte addressing)
To get A[8] from memory, use lw with offset 8 × 4 = 32:
lw $t0,32($s3) # Temporary reg $t0 gets A[8]
Then use the result of A[8] stored in $t0:
add $s1,$s2,$t0 # g = h + A[8]
The instruction complementary to load is traditionally called store; it copies data from a
register to memory.
The format of a store is similar to that of a load: the name of the operation, followed by
the register to be stored, then the offset to select the array element, and finally the base register.
Once again, the MIPS address is specified in part by a constant and in part by the contents of a
register. The actual MIPS name is sw, standing for store word.
EXAMPLE: A[12] = h + A[8];
lw $t0, 32($s3) # Temporary reg $t0 gets A[8]; offset 8 × 4 = 32
add $t0, $s2,$t0 # Temporary reg $t0 gets h + A[8]
sw $t0, 48($s3) # Stores h + A[8] back into A[12]; offset 12 × 4 = 48

Constant or Immediate Operands:
For example, to add the constant 4 to register $s3, we could use the code
lw $t0, AddrConstant4($s1) # $t0 = constant 4
add $s3,$s3,$t0 # $s3 = $s3 + $t0 ($t0 == 4)
Alternative, that avoids the load instruction is to offer versions of the arithmetic instructions in
which one operand is a constant.
example: add immediate or addi instructions.
addi $s3, $s3, 4 # $s3 = $s3 + 4
Constant operands occur frequently, and by including constants inside arithmetic instructions,
operations are much faster and use less energy than if constants were loaded from memory.

5. Write short notes on Instructions and its types that are used in MIPS?
or
List the three MIPS instruction formats used to represent the instructions?
R-type Instructions
J-type Instructions
I-type Instructions

6. Write short notes on logical operators/operations.


7. Write short notes on Decision making and branching instructions?
Branch and Conditional branches: Decision making is commonly represented in
programming languages using the if statement, sometimes combined with go to statements
and labels. MIPS assembly language includes two decision-making instructions, similar to an if
statement with a go to.
8. Write short notes on addressing and addressing modes. OR
What are addressing modes? Explain the various addressing modes with suitable examples.
Multiple forms of addressing are generically called addressing modes; an addressing mode defines
how the operands of an instruction are identified. The MIPS addressing modes are the
following:

1. Immediate addressing, where the operand is a constant within the instruction itself
Example : Constant data specified in an instruction
addi $s3, $s3, 4
2. Register addressing, where the operand is a register
Example : add $t0, $s1, $s2
All the operands of the instruction are registers. It adds the values of $s1 and $s2
and stores the result in $t0.

3. Base or displacement addressing, where the operand is at the memory location whose
address is the sum of a register and a constant in the instruction
Example : lw $t0, 32($s1)
The address of the operand is the sum of the offset value (32) and the contents of the register ($s1).

4. PC-relative addressing, where the branch address is the sum of the PC (Program Counter
) and a constant in the instruction
Example : beq $s1, $s2, Label

5. Pseudodirect addressing, where the jump address is the 26 bits of the instruction
concatenated with the upper bits of the PC
Direct addressing means specifying a complete 32-bit address in the instruction itself. Since MIPS
instructions are themselves only 32 bits, this is impossible. In theory only 30 bits are needed to specify the
word address of an instruction, but MIPS uses 6 bits for the opcode, so there are still not enough
bits for true direct addressing.

Instead, we can do pseudo-direct addressing. This occurs in j instructions.


Opcode     Target
B31-26     B25-0
oooooo     tt tttt tttt tttt tttt tttt tttt

26 bits are used for the target. The address for pseudo-direct addressing is computed as:
PC <- PC[31:28] :: IR[25:0] :: 00
Take the top 4 bits of the PC, concatenate them with the 26 bits that make up the target, and concatenate
that with 00. This produces a 32-bit address, which becomes the new value of the PC.

This allows you to jump to 1/16 of all possible MIPS addresses (since you don't control the top 4 bits of
the PC).
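The address computation above can be modeled in Python as an illustrative check; the PC and target-field values below are hypothetical examples, not from the text:

```python
def pseudodirect_target(pc, target26):
    """New PC for a MIPS j instruction: the top 4 bits of the
    current PC, then the 26-bit target field, then two zero bits."""
    assert 0 <= target26 < (1 << 26)
    return (pc & 0xF0000000) | (target26 << 2)

# Hypothetical values: PC in the 0x4... region, target field 0x0010000
print(hex(pseudodirect_target(0x40000004, 0x0010000)))  # 0x40040000
```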

Draw the Diagram.


UNIT II
2. Explain the multiplication algorithm along with an example.
Multiplication of decimal numbers in longhand can be used to show the steps of multiplication and the names of the operands.
B.E/ B.TECH. DEGREE EXAMINATION, APRIL/MAY 2015
Third Semester
Computer Science and Engineering
CS 6303- COMPUTER ARCHITECTURE
(Regulation 2013)

Part - A (10 × 2 = 20 marks)

1. List the eight great ideas invented by computer architects.


(i) Design for Moore’s Law
(ii) Use Abstraction to Simplify Design
(iii) Make the Common Case Fast
(iv) Performance via Parallelism
(v) Performance via Pipelining
(vi) Performance via Prediction
(vii) Hierarchy of Memories
(viii) Dependability via Redundancy

2. Distinguish Pipelining from Parallelism.


 Performance can be improved by doing operations in parallel. Parallel execution is easier when
tasks are independent, but often they need to co-operate with each other.
 Pipelining is a technique in which multiple instructions are overlapped in execution. It exploits
parallelism among the instructions in a sequential instruction stream.

3. How does overflow occur in subtraction?


Overflow occurs in subtraction when we subtract a negative number from a positive number and get a
negative result, or when we subtract a positive number from a negative number and get a positive result.
It means a borrow occurred from the sign bit.

4. What do you mean by sub word Parallelism?


By partitioning the carry chains within a 128-bit adder, a processor could use parallelism to perform
simultaneous operations on short vectors of sixteen 8-bit operands, eight 16-bit operands, four 32-bit
operands, or two 64-bit operands. The cost of such partitioned adders was small. This concept is called
as subword parallelism.

5. What are R-Type instructions?


R-Type instructions follow the format below. Two registers are used as source registers
and one register is used as the destination register.
OP rs rt rd shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits

The meaning of each name of the fields is as follows:


op - Basic operation of the instruction, traditionally called the opcode.
rs - The first register source operand.
rt - The second register source operand.
rd - The register destination operand. It gets the result of the operation.
shamt - Shift amount.
funct - Function. This field, often called the function code, selects the
specific variant of the operation in the op field.

6. What is a branch prediction buffer?


A branch prediction buffer or branch history table is a small memory indexed by the lower portion of the
address of the branch instruction. The memory contains a bit that says whether the branch was recently
taken or not.

7. Differentiate between Strong scaling and weak scaling.


Strong scaling
Speedup achieved on a multiprocessor without increasing the size of the problem is called strong scaling.

Weak scaling
Speedup achieved on a multiprocessor while increasing the size of the problem proportionally to the
increase in the number of processors is called weak scaling.

8. Compare UMA and NUMA multiprocessors.


Uniform Memory Access(UMA)
Latency to a word in memory does not depend on which processor asks for it. That is, the time taken
to access a word is uniform for all processors.
Non-Uniform Memory Access (NUMA)
 Some memory accesses are much faster than others, depending on which processor asks for which
word.
 Here, the main memory is divided and attached to different microprocessors or to different
memory controllers on the same chip.
 Programming challenges are harder for NUMA than UMA.
 NUMA machines can scale to larger size.
 NUMAs can have lower latency to nearby memory

9. What is the need to implement memory as a hierarchy?


Computers use different types of memory units for different types of purposes. Each memory unit has its
own advantages and disadvantages. A structure that uses multiple levels of memories is called memory
hierarchy. A memory hierarchy consists of multiple levels of memory with different speeds and sizes.
The faster memories are more expensive per bit than the slower memories and thus are smaller. As the
distance from the processor increases, the size of the memories and the access time both increases.

10. Point out how DMA can improve I/O speed.


The CPU is responsible only for initiating each block transfer; the DMA interface controller then takes
control and transfers the data without intervention of the CPU. The CPU and the I/O controller interact
with each other only when control of the bus is requested.

PART - B (5 × 16 = 80 Marks)

11. (a) Discuss about the various techniques to represent instructions in a computer system. (Refer Section
2.4) (16)
Or
(b) What is the need for addressing in a computer system? Explain the different addressing modes with
suitable examples. (16)
(Refer Section 2.7)
12. (a) Explain the sequential version of multiplication algorithm and its hardware. (16)
(Refer Section 3.3.1)
Or
(b) Explain how floating point addition is carried out in a computer system. Give an example for a
binary floating point addition. (16)
(Refer Section 3.5.4)
13. (a) Explain the different types of pipeline hazards with suitable examples (16)
(Refer Section 4.5.1)
Or
(b) Explain in detail how exceptions are handled in MIPS architecture? (16)
(Refer Section 4.9)
14. (a) Discuss about SISD, MIMD, SIMD, SPMD and VECTOR systems. (16)
(Refer Section 5.4)
Or
(b) What is hardware multithreading? Compare and contrast Fine grained Multi-
Threading and coarse grained Multi- Threading. (16)
(Refer Section 5.5)
15. (a) Elaborate on the various memory technologies and its relevance. (16)
(Refer Section 6.3)
Or
(b) What is virtual memory?
Explain the steps involved in virtual memory address translation (16)
(Refer Section 6.5)
B.E/B.Tech. DEGREE EXAMINATION, MAY/JUNE 2016
Computer Science and Engineering
CS 6303 – COMPUTER ARCHITECTURE
(Regulation 2013)
Time : Three Hours Maximum : 100 Marks
Answer ALL questions
PART – A(10*2=20 Marks)
1. How to represent Instruction in a Computer System?
Instructions are kept in the computer as a series of high and low electronic signals and may be represented as numbers.
Since registers are referred to in instructions, there must be a convention to map register names into numbers. Three types
of instruction formats are used in MIPS. They are:

 R-Type or R-Format (Register)


 I-Type or I - Format (Immediate)
 J- Type (Addressing in Branches and Jumps)

2. Distinguish between auto increment and auto decrement addressing mode.


Both are special cases of register indirect mode: the register whose number is included in the instruction code contains the
address of the operand.
In autoincrement mode, the contents of the register are incremented after the operand is addressed.
Example: (R1)+
In autodecrement mode, the contents of the register are decremented before the operand is addressed.
Example: -(R1)

3. Define ALU.
An arithmetic logic unit (ALU) is a digital circuit used to perform arithmetic and logic operations. It represents the
fundamental building block of the central processing unit (CPU) of a computer.
4. What is Subword Parallelism?
By partitioning the carry chains within a 128-bit adder, a processor could use parallelism to perform simultaneous
operations on short vectors of sixteen 8-bit operands, eight 16-bit operands, four 32-bit operands, or two 64-bit
operands. The cost of such partitioned adders was small. Given that the parallelism occurs within a wide word, the
extensions are classified as subword parallelism.

5. What are the advantages of pipelining?


 The cycle time of the processor is reduced;
 Pipelining increases the instruction throughput.

6. What is Exception?
An exception (also called an interrupt) is an unscheduled event that disrupts program execution; it is used, for
example, to detect an overflow condition. Events other than branches or jumps that change the normal flow of
instruction execution come under exceptions.

7. State the need for Instruction Level parallelism?


Pipelining improves performance by increasing the rate at which instructions can be executed. However, there are
limits to how much pipelining can improve performance. As more and more pipeline stages are added to a processor,
the delay of the pipeline register is also increased and thereby reducing the benefit of increasing the pipeline depth.
Instruction Level Parallelism enables more number of instructions that can be performed simultaneously during a
single clock cycle.

8. What is Fine grained Multithreading?


Fine-grained multithreading switches between threads after every instruction. It makes interleaved execution of
multiple threads at the same time. This interleaving is done in round robin fashion on every clock cycle.

9. Define Memory hierarchy.


Computers use different types of memory units for different types of purposes. Each memory unit has its own
advantages and disadvantages. A structure that uses multiple levels of memories is called memory hierarchy. A
memory hierarchy consists of multiple levels of memory with different speeds and sizes. The faster memories are
more expensive per bit than the slower memories and thus are smaller. As the distance from the processor increases,
the size of the memories and the access time both increases.

10. State the advantages of virtual memory.


The primary advantage or objective of Virtual Memory systems is the ability to load and execute a process that
requires a larger amount of memory than what is available by loading the process in parts and then executing them.
It eliminates external fragmentation.

PART - B (5*16=80 Marks)


11. (a) Discuss about the various components of a computer system. (16)
Refer section 1.3
(b) Elaborate the different types of addressing modes with a suitable example (16)
Refer section 2.7

12. (a) Explain briefly about floating point addition and Subtraction algorithms. (16)
Refer section 3.5.4
(b) Define Booth Multiplication algorithm with suitable example. (16)
(Out of syllabus)
Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary addition) one of two
predetermined values A and S to a product P, then performing a rightward arithmetic shift on P. Let m and r be the
multiplicand and multiplier, respectively, and let x and y represent the number of bits in m and r.
1. Determine the values of A and S, and the initial value of P. All of these numbers should have a length equal to
(x + y + 1).
 A: Fill the most significant (leftmost) bits with the value of m. Fill the remaining (y + 1) bits with zeros.
 S: Fill the most significant bits with the value of (−m) in two's complement notation. Fill the remaining
(y + 1) bits with zeros.
 P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill the least
significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
 If they are 01, find the value of P + A. Ignore any overflow.
 If they are 10, find the value of P + S. Ignore any overflow.
 If they are 00, do nothing. Use P directly in the next step.
 If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in the 2nd step by a single place to the right. Let P now equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
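The five steps above can be turned into a small Python model, useful for checking hand-worked examples. This is an illustrative sketch (the bit widths x and y are passed in explicitly):

```python
def booth_multiply(m, r, x, y):
    """Booth's algorithm, following steps 1-5 above.
    m, r: signed multiplicand and multiplier; x, y: their bit widths."""
    total = x + y + 1                        # length of A, S and P
    mask = (1 << total) - 1
    A = (m & ((1 << x) - 1)) << (y + 1)      # m in the most significant x bits
    S = ((-m) & ((1 << x) - 1)) << (y + 1)   # -m (two's complement) on top
    P = (r & ((1 << y) - 1)) << 1            # r in the middle, 0 on the right
    for _ in range(y):
        pair = P & 0b11                      # two least significant bits of P
        if pair == 0b01:
            P = (P + A) & mask               # ignore overflow
        elif pair == 0b10:
            P = (P + S) & mask
        sign = P & (1 << (total - 1))        # arithmetic right shift by one
        P = (P >> 1) | sign
    P >>= 1                                  # drop the least significant bit
    if P & (1 << (x + y - 1)):               # reinterpret as a signed value
        P -= 1 << (x + y)
    return P

print(booth_multiply(3, -4, 4, 4))  # -12
```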
13. (a) What is pipelining? Discuss about pipelined data path and control. (16)
Refer section 4.6
(b) Briefly explain about various categories of hazards with examples. (16)
Refer section 4.7 and 4.8
14. (a) Explain in detail about Flynn’s classification. (16)
Refer section 5.4
(b) Write short notes on : (16)
(i) Hardware multithreading Refer section 5.5
(ii) Multicore processors. Refer section 5.6

15. (a) Define Cache Memory ? Explain the various mapping techniques associated with cache memories (16)
Refer section 6.4.3
(b) Explain about DMA controller, with the help of a block diagram. (16)
Refer section 7.3
B.E/B.TECH. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2014.
Third Semester
Computer Science and Engineering
CS 6303 – COMPUTER ARCHITECTURE
(Regulation 2013)
Time : Three hours Maximum : 100 marks
Answer ALL questions

PART A – (10 * 2 = 20 marks)

1. State Amdahl’s Law.


Amdahl’s Law is used to find the execution time of a program after making an improvement. It can
be represented in an equation as follows:

Execution time after improvement =
    (Execution time affected by improvement / Amount of improvement) + Execution time unaffected

Hence, Amdahl’s Law can be used to estimate performance improvements.
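As a worked illustration of the equation (with made-up numbers): if 80 s of a 100 s program is affected by an improvement of factor 4, the new time is 80/4 + 20 = 40 s, an overall speedup of 2.5. In Python:

```python
def time_after_improvement(affected, unaffected, factor):
    """Amdahl's Law: only the affected part of execution time shrinks."""
    return affected / factor + unaffected

old_time = 100.0
new_time = time_after_improvement(affected=80.0, unaffected=20.0, factor=4.0)
print(new_time)             # 40.0 seconds
print(old_time / new_time)  # 2.5x overall speedup
```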

2. Brief about Relative addressing mode with an example.


In this addressing mode, the branch address is the sum of the PC and a constant in the instruction.
Figure depicts the format of PC-relative addressing.

op       rs       rt       Address
6 bits   5 bits   5 bits   16 bits

The 16-bit address field, scaled to words, is added to the PC to form the branch target (PC-relative addressing).
Example:
bne $s0,$s1,Exit # go to Exit if $s0 ≠ $s1

In the above example branch address is calculated by adding the PC value with the constant in the
instruction.

3. Define Little Endian arrangement.


In the Little-Endian arrangement, the lower byte addresses are used for the less significant bytes (the
rightmost bytes) of the word.

4. What is DMA?
The CPU is responsible for only initiating each block transfer. Then, the interface controller can
take the control and responsibility of transferring data. So that data can be transferred without the
intervention of CPU. The CPU and IO controller interacts with each other only when the control
of bus is requested. This level of IO control is called Direct Memory Access (DMA).

5. What is the need for Speculation?


Speculation is an approach that allows the compiler or processor to guess the outcome of an
instruction, so that execution can begin for other instructions that depend on the speculated
instruction. Speculation, performed by the compiler or the hardware, can increase the amount of
ILP that can be exploited via prediction. For example, if a branch is predicted early, the
instructions following the branch can be executed earlier.

6. What is Exception?
An exception (also called an interrupt) is an unscheduled event that disrupts program execution;
it is used, for example, to detect an overflow condition. Events other than branches or jumps that change
the normal flow of instruction execution come under exceptions.

7. What is Flynn’s Classification? (Refer Section 5.4.2)


Flynn’s classification is based on multiplicity of instruction streams and data streams observed by
the CPU during program execution. One categorization of parallel hardware is based on the
number of instruction streams and the number of data streams. Figure shows the various
categories of hardware based on number of instruction streams and data streams.

                              Data Streams
                              Single                   Multiple
Instruction   Single          SISD: Intel Pentium 4    SIMD: SSE instructions of x86
Streams       Multiple        MISD: No examples        MIMD: Intel Core i7

8. Brief about Multithreading. (Refer Section 5.5)


Multithreading was first introduced to exploit thread-level parallelism within a processor.
Hardware multithreading allows multiple threads to share the functional units of a single
processor in an overlapping fashion.

9. Difference Programmed I/O and Interrupt I/O. (Refer Section 7.2)


Program-driven (programmed) I/O means the program polls or checks some hardware item, e.g. a mouse,
within a loop. It is easy to program and understand, but it is slow and inefficient.
With interrupt-driven I/O, the same mouse triggers a signal to the program to process the mouse
event. It is fast and efficient, but more difficult to implement.

10. What is the purpose of Dirty/Modified bit in Cache memory?


It indicates whether the block (or page) has been modified while resident in the cache (or memory).
This information is needed to determine whether the block or page should be written back to the next
level before it is removed.

PART B –(5 * 16 = 80 marks)


11. (a) (i) Assume a two address format specified as source, destination. Examine the following
sequence of instructions and explain the addressing modes used and the operation done in every
instruction. (10)
(1) Move (r5)+, R0
(2) Add (R5)+, R0
(3) Move R0, R(5)
(4) Move 16(R5), R3
(5) Add #40,R5 (Out of Syllabus)
(ii) Consider the computer with three instruction classes and CPI measurements as given
below and Instruction counts for each instruction class for same program from two
different compilers are given. Assume that the computer’s clock rate is 4GHZ. Which
Code sequence will execute faster according to execution time? (6)
             CPI for this instruction class
             A      B      C
CPI          1      2      3

             Instruction count for each class
             A      B      C
Compiler 1   2      1      2
Compiler 2   4      1      1

(Refer Example 1.4 )
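The referenced example can be worked out directly: CPU time = (Σ countᵢ × CPIᵢ) / clock rate, so compiler 1 needs 2×1 + 1×2 + 2×3 = 10 cycles and compiler 2 needs 4×1 + 1×2 + 1×3 = 9 cycles; at 4 GHz the code from compiler 2 executes faster despite having more instructions. A short Python check:

```python
# CPU time = (sum over classes of instruction count * CPI) / clock rate
CPI = {"A": 1, "B": 2, "C": 3}
CLOCK_HZ = 4e9  # 4 GHz

def total_cycles(counts):
    return sum(counts[c] * CPI[c] for c in counts)

c1 = total_cycles({"A": 2, "B": 1, "C": 2})  # code from compiler 1
c2 = total_cycles({"A": 4, "B": 1, "C": 1})  # code from compiler 2
print(c1, c1 / CLOCK_HZ)  # 10 cycles, 2.5e-09 s
print(c2, c2 / CLOCK_HZ)  # 9 cycles, 2.25e-09 s -> compiler 2 is faster
```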


Or
(b) (i) Explain the components of a computer System. (8)
(Refer Section 1.3)
(ii) State the CPU performance equation and discuss the factors that affect
performance. (8)
(Refer Section 1.5)
12. (a) (i) Multiply the following pair of signed nos. using Booth’s bit-pair recording of the
multiplier. A = +13 (Multiplicand) and B = -6 (Multiplier). (10)
(Out of Syllabus)
(ii) Briefly Explain Carry lookahead adder. (6)
(Out of Syllabus)
Or
(b) Divide (12)10 By (3)10 using the Restoring and Non restoring division algorithm with step
by step intermediate results and explain. (16)
(Out of Syllabus)
13. (a) Explain Data path and its control in detail. (16)
(Refer Section 4.3 & 4.4)
Or
(b) What is Hazard? Explain its types with suitable examples. (16)
(Refer Section 4.5.1)
14. (a) Explain Instruction level Parallel Processing. State the challenges of parallel
Processing (16)
(Refer Section 5.2 & 5.3)
Or
(b) Explain the terms:
(i) Multicore Processor. (Refer Section 5.6)
(ii) Hardware Multithreading. (Refer Section 5.5) (16)
15. (a) (i) Explain mapping functions in cache memory to determine how memory blocks
are placed in Cache. (8)
(Refer Section 6.4.3)
(ii) Explain in detail about the Bus Arbitration techniques in DMA (8)
(Refer Section 7.3.4)
Or
(b) (i) Draw different memory address layouts and brief about the technique used to
increase the average rate of fetching words from the main memory.
(8)
(Refer Section 6.3)
(ii) Explain in detail about any two Standard Input and Output Interface required to
connect the I/O device to the Bus. (8)
(Refer Section 7.3)
B.E/ B.TECH. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2015
Third Semester
Computer Science and Engineering
CS 6303- COMPUTER ARCHITECTURE
(Regulation 2013)

Part - A (10 × 2 = 20 marks)

1. What is instruction set architecture?


The words of a computer’s language are called instructions, and its vocabulary is called an instruction set;
this is also called the instruction set architecture.
2. How CPU execution time for a program is calculated?
CPU execution time or simply CPU time, which recognizes this distinction, is the time the CPU spends
computing for this task and does not include time spent waiting for I/O or running other programs. CPU time
can be further divided into the CPU time spent in the program, called user CPU time, and the CPU time spent
in the operating system performing tasks on behalf of the program, called system CPU time.
3. What are the overflow / underflow conditions for addition and subtraction?
The following table shows the combination of operations, operands, and results that indicate an overflow.

Operation    Operand A    Operand B    Result indicating overflow
A + B        ≥ 0          ≥ 0          < 0
A + B        < 0          < 0          ≥ 0
A – B        ≥ 0          < 0          < 0
A – B        < 0          ≥ 0          ≥ 0
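These sign-based rules can be checked with a small Python model of fixed-width two's complement; the 8-bit width here is chosen only for illustration:

```python
BITS = 8
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1  # -128 .. 127

def add_overflows(a, b):
    """True if a + b overflows BITS-bit two's complement, i.e. the
    true sum falls outside the representable range."""
    return not (LO <= a + b <= HI)

print(add_overflows(100, 50))    # True: 150 > 127
print(add_overflows(-100, -50))  # True: -150 < -128
print(add_overflows(100, -50))   # False: operands of opposite sign
```

Subtraction A − B follows the same table, since it is addition of A and −B.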

4. State the representation of double precision floating point number.


A floating-point value represented in two 32-bit words is called double precision. The representation of
double precision floating-point numbers takes two MIPS words as shown below, where s is the sign of the
floating-point number (0 - positive, 1 - negative), exponent is the value of the 11-bit exponent field, and
fraction is the 52-bit number in the fraction field.

31       30 ..... 20    19 ..... 0
s        exponent       fraction
1 bit    11 bits        20 bits

fraction (continued)
32 bits
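Python's struct module can confirm this field layout on a real IEEE 754 double. This is an illustrative check, not part of the syllabus answer:

```python
import struct

def decompose_double(x):
    """Split an IEEE 754 double into its sign bit, 11-bit exponent
    field and 52-bit fraction field."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction

print(decompose_double(-1.0))  # (1, 1023, 0): true exponent 0 is stored biased by 1023
```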
5. What is a hazard? What are its types?
There are situations in pipelining when the next instruction cannot execute in the following clock cycle. These
events are called hazards, and there are three different types:
(i) Structural hazard
(ii) Data hazard
(iii) Control hazard

6. What is meant by branch prediction?


It is a method to resolve control hazard that assumes the outcome of a branch and proceeds from that
assumption rather than waiting to determine the actual outcome. It is more suitable for loops in programming.
At the bottom of loops are branches that jump back to the top of the loop. Since they are likely to be taken and
they branch backward, we could always predict taken for branches that jump to an earlier address.

7. What is ILP?
Instruction Level Parallelism (ILP) is a measure of number of instructions that can be performed
simultaneously during a single clock cycle. The potential overlap among instructions is called as instruction
level parallelism.

8. Define Super scalar processor.


A dynamic multiple issue processor is also called a superscalar processor, or simply superscalar. In the
simplest superscalar processors, instructions are issued in order, and the processor decides whether zero,
one, or more instructions can be issued in a given clock cycle; more than one instruction can thus be
executed per clock cycle. The compiler tries to schedule instructions so as to move dependences apart
and thereby improve the instruction issue rate.
9. What are the various memory technologies?
There are four primary technologies used now in memory hierarchies. They are:
1. Static Random Access Memory (SRAM)
2. Dynamic Random Access Memory (DRAM)
3. ROM and Flash memory
4. Magnetic disk
10. Define Hit ratio.
The fraction of memory accesses found in a level of the memory hierarchy is called the hit rate or hit ratio. It is
used as a measure of the performance of the memory hierarchy.
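The definition can be sketched in Python; the average memory access time (AMAT) formula below is the standard companion metric for memory-hierarchy performance, added here for illustration rather than taken from the answer above:

```python
def hit_ratio(hits, accesses):
    """Fraction of accesses found at this level of the hierarchy."""
    return hits / accesses

def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

print(hit_ratio(950, 1000))  # 0.95
print(amat(1, 0.05, 100))    # 6.0 cycles on average
```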

PART - B (5 × 16 = 80 Marks)

11. a) Explain in details the various components of computer system with neat diagram. (16)

(Refer Section 1.3)


(Page no. 1.6 – 1.9)
(OR)

b) What is an addressing mode? Explain the various addressing modes with suitable examples.
(16)

(Refer Section 2.7)


(page no. 2.33 -2.36)
12.a) Explain in details about the multiplication algorithm with suitable examples and diagram.(16)

(Refer Section 3.3.1)


Page no 3.25 -3.28
(OR)

b) Discuss in details about division algorithm in details with diagram and examples. (16)
(Refer Section 3.4.1)
Page no 3.34 -3.37

13. a) Explain the basics MIPS implementation with necessary multiplexers and control lines.( 16)

Page nos. 4.1 – 4.3 , 4.8,4.16, figure 4.6

(OR)

b) Explain how the instruction pipeline works? What are the various situations where an instruction pipeline can
stall? Illustrate with an example (16)

(Refer Section 4.5)


Page no 4.19 – 4.24
14. a) Explain in details Flynn’s classification of parallel hardware. (16)

(Refer Section 5.4)


Page no. 5.14 – 5.21 till 5.4.6
(OR)

b) Explain in detail about hardware Multithreading. (16)

(Refer Section 5.5) Page no. 5.23 – 5.26


15. a) What is virtual memory? Explain in details about how virtual memory is implemented with neat diagram?
(16)

(Refer Section 6.5)page no 6.43 -6.48


(OR)

b) Draw the typical block diagram of a DMA Controller and explain how it is used for direct data transfer between
memory and peripherals? (16)

(Refer Section 7.3)


Page no. 7.7 – 7.11 till 7.3.3
Question Paper Code – 71675
B.E/ B.TECH. DEGREE EXAMINATION, APRIL/MAY 2017
Third/Fifth/Sixth Semester
Computer Science and Engineering
CS 6303- COMPUTER ARCHITECTURE
(Common to ECE,E&I,I&C,R&AE,IT)
(Regulation 2013)
PART-A
1. List the major components of a computer system.
 input unit
 output unit
 memory unit
 arithmetic and logic unit
 control unit.

2. State the need for indirect addressing mode. Give an example.


The effective address of the operand is the contents of a register or main memory location whose
address appears in the instruction. The register or memory location that contains the address of the
operand is a pointer. When an instruction executes in this mode, it goes to the specified address and,
instead of finding an operand there, finds the address where the operand is located.
Example: Move (R1), R0 copies into R0 the operand whose address is held in R1.
3. Subtract (11010)2-(10000)2 using 1’s complement and 2’s complement method.
1’s complement method
(i) Take the 1’s complement of the subtrahend : 01111
(ii) Add it to the minuend : 11010 + 01111 = 1 01001
(iii) If there is an end-around carry, add it to the result : 01001 + 1 = 01010
(iv) The result is 01010
2’s complement method
(i) Take the 2’s complement of the subtrahend : 10000
(ii) Add it to the minuend : 11010 + 10000 = 1 01010
(iii) If there is a carry, discard it : 01010
(iv) The result is 01010
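The 2's-complement method can be verified with a short Python sketch (bit width 5, matching the example):

```python
def subtract_2s(minuend, subtrahend, bits=5):
    """minuend - subtrahend by adding the two's complement of the
    subtrahend and discarding any carry out of the top bit."""
    mask = (1 << bits) - 1
    twos = (~subtrahend + 1) & mask   # two's complement of the subtrahend
    return (minuend + twos) & mask    # masking discards the carry

print(bin(subtract_2s(0b11010, 0b10000)))  # 0b1010, i.e. 01010
```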

4. Write the rules to perform addition on floating point numbers.


In order to add, the exponents of the two numbers must be the same:
1. Compare the exponents and shift the significand of the number with the smaller exponent right until the two exponents match.
2. Add the significands.
3. Normalize the sum, adjusting the exponent and checking for overflow or underflow.
4. Round the result to the available number of fraction bits (renormalizing if necessary).
5. Name the control signals required to perform arithmetic operations.
1. RegDst
2. RegWrite
3. ALUSrc
4. MemtoReg
6. Define hazard. Give an example for data hazard.
A hazard is a situation in pipelining in which the next instruction cannot execute in the following
clock cycle.
 A data hazard occurs when the pipeline must be stalled because one step must wait for another to
complete.
For example, suppose we have an add instruction followed immediately by a subtract instruction that
uses the sum ($s0):
add $s0, $t0, $t1
sub $t2, $s0, $t3

7. What is instruction level parallelism?


Instruction Level Parallelism (ILP) is a measure of number of instructions that can be performed
simultaneously during a single clock cycle. The potential overlap among instructions is called as
instruction level parallelism.
8. Distinguish implicit multithreading and explicit multithreading.
Explicit multithreading concurrently executes instructions from different explicit threads.
Implicit multithreading is the concurrent execution of multiple threads extracted from a single
sequential program.
9. Define memory interleaving.
Memory interleaving is a category of techniques for increasing memory speed. For example, with
separate memory banks for odd and even addresses, the next byte of memory can be accessed while the
current byte is being refreshed.
10. Summarize the sequence of events involved in handling an interrupt request from a single
device.
When an interrupt occurs, the interrupt handler is triggered. The handler stops the instruction
currently being processed, saves the processor state in registers, and loads the program counter with
the handler address taken from the interrupt vector table. After the processor has serviced the
interrupt, the saved state is restored and the interrupted program resumes where it left off.
PART-B
11 (a) Explain the important measures of the performance of a computer and derive the basic
performance equation.
Refer section 1.5 (Page No.1.17, 1.19 and 1.20)
Or
(b) Explain direct, immediate, relative and indexed addressing modes with examples.
Refer section 2.7 (Page No. 2.33 - 2.35)

12 (a) (i) Demonstrate multiplication of two binary numbers with an example.
Design an arithmetic element to perform this multiplication.
Refer section 3.3.1 (Page No. 3.28)
(ii) Describe non restoring division with an example. (Out of Syllabus)
Or
(b) (i) Design an arithmetic element to perform the basic floating point operations.
Refer section 3.5.4, 3.5.5 (Page No. 3.56, 3.64)

(ii) What is meant by subword parallelism? Explain.


Refer section 3.6 (Page No. 3.79)

13 (a) Discuss the modified data path to accommodate pipelined executions with a diagram.
Refer section 4.6.1 (Page No. 4.26)

Or
(b) (i) Explain the hazards caused by unconditional branching statements.
Refer section 4.8 (Page No. 4.38 – 4.44)
(ii) Describe operand forwarding in a pipeline processor with a diagram.
Refer section 4.7.1 (Page No. 4.32 – 4.34)
14 (a) (i) Discuss the challenges in parallel processing with necessary examples.
Refer section 5.3 (Page No. 5.12 – 5.13)
(ii) Explain Flynn’s classification of parallel processing with necessary diagrams.
Refer section 5.4 (Page No. 5.16 – 5.20)
Or
(b) Explain the four principal approaches to multithreading with necessary diagrams.
Refer section 5.5 (Page No. 5.23 – 5.26)
15 (a) Explain the different mapping functions that can be applied on cache memories in detail.
Refer section 6.4.3 (Page No. 6.23 – 6.30)
Or
(b) (i) Explain virtual memory address translation in detail with necessary diagrams.
Refer section 6.5.2 (Page No. 6.44 – 6.46)
(ii) What is meant by Direct Memory Access? Explain the use of DMA controllers in a computer
system.
Refer section 7.3 (Page No. 7.7 – 7.11)

PART-C
16 (A) (i) Explain mapping functions in cache memory to determine how memory blocks are placed in
cache.
Refer section 6.4.3 (Page No. 6.23 – 6.30)
(ii) Explain in detail about the Bus Arbitration techniques in DMA.
Refer section 7.3.4 (Page No. 7.11 – 7.15)
Or
(b) A pipelined processor uses the delayed branch technique. Recommend one of the following two
possibilities for the design of the processor. In the first possibility, the processor has a 4-stage pipeline and
one delay slot. In the second possibility, it has a 6-stage pipeline and two delay slots. Compare the
performance of these two alternatives, taking only the branch penalty into account. Assume that 20% of the
instructions are branch instructions and that an optimizing compiler has an 80% success rate in filling in
the single delay slot. For the second alternative, the compiler is able to fill the second slot 25% of the time.
ANSWER
 With one delay slot filled 80% of the time, the first alternative wastes on average 0.2 stall
cycles per branch, giving CPI = 1 + 0.2 × 0.2 = 1.04.
 The second alternative wastes one cycle for each unfilled slot: 0.2 + 0.75 = 0.95 stall cycles
per branch, giving CPI = 1 + 0.2 × 0.95 = 1.19.
 Taking only the branch penalty into account, the first alternative is therefore the better choice.
 Diagrams should be drawn for the 4-stage and 6-stage pipelines with one and two delay slots
respectively, using an example code sequence in which 20% of the instructions are branch
instructions; the diagrams confirm that the first option is the better one.
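The comparison can also be quantified in a few lines of Python; this sketch uses the question's figures and assumes each unfilled delay slot costs one stall cycle, ignoring any clock-period difference between the two pipelines (as the "branch penalty only" wording suggests):

```python
branch_frac = 0.20  # 20% of instructions are branches

# Possibility 1: 4-stage pipeline, one delay slot, filled 80% of the time.
stalls1 = 1 - 0.80                 # expected stall cycles per branch
cpi1 = 1 + branch_frac * stalls1   # approx. 1.04

# Possibility 2: 6-stage pipeline, two delay slots (80% and 25% fill rates).
stalls2 = (1 - 0.80) + (1 - 0.25)  # approx. 0.95 stall cycles per branch
cpi2 = 1 + branch_frac * stalls2   # approx. 1.19

print(round(cpi1, 2), round(cpi2, 2))
assert cpi1 < cpi2  # possibility 1 suffers the smaller branch penalty
```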
Question Paper Code : 40902
B.E/B.Tech. DEGREE EXAMINATION, APRIL/MAY 2018
Third/Fifth/Sixth Semester
Computer Science and Engineering
CS 6303 – COMPUTER ARCHITECTURE
(Common to : Electronics and Communication Engineering /
Electronics and Instrumentation Engineering / Instrumentation and
Control Engineering / Robotics and Automation Engineering /
Information Technology)
(Regulation 2013)
Time : Three Hours Maximum : 100 Marks
Answer ALL questions
PART – A (10 × 2=20 Marks)
1. Write the equation for the dynamic power required per transistor.
The power required per transistor is just the product of the energy of a
transition and the frequency of transitions:
Power ∝ 1/2 × Capacitive load × Voltage² × Frequency switched
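As an illustration of the quadratic voltage term, here is a short sketch; the capacitance, voltage and frequency values are assumed for illustration and are not part of the question:

```python
def dynamic_power(cap_load, voltage, freq):
    """Dynamic power per transistor: 1/2 * C * V^2 * f (arbitrary units)."""
    return 0.5 * cap_load * voltage ** 2 * freq

# Because power grows with the square of voltage, a 15% voltage
# reduction alone cuts dynamic power to 0.85^2, about 72% of the original.
baseline = dynamic_power(cap_load=1.0, voltage=1.0, freq=1.0)
reduced = dynamic_power(cap_load=1.0, voltage=0.85, freq=1.0)
print(round(reduced / baseline, 4))  # 0.7225
```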
2. Classify the instructions based on the operations they perform
and give one example to each category.
Classifications of Instructions:
Arithmetic – add $s1,$s2,$s3
Data transfer - lw $s1,20($s2)
Logical – and $s1,$s2,$s3
Conditional branch - beq $s1,$s2, 25
Unconditional branch – j 2500
3. Show the IEEE 754 binary representation of the number (-0.75)10
in single precision.
IEEE 754 representation of (-0.75)10 in single precision format
The number (-0.75)10 is represented in binary form as
-0.11two × 2^0
and in normalized scientific notation, it is
-1.1two × 2^-1
 Solved Anna University Question Papers      SQ.71 
The general representation for a single precision number is
(-1)^S × (1 + Fraction) × 2^(Exponent - 127)
Subtracting the bias 127 from the exponent of -1.1two × 2^-1 yields an exponent field of 126:
(-1)^1 × (1 + .1000 0000 0000 0000 0000 000two) × 2^(126 - 127)
The single precision binary representation of -0.75ten is then:
sign = 1 (1 bit), exponent = 0111 1110 (8 bits), fraction = 1000 0000 0000 0000 0000 000 (23 bits)
i.e. 1 01111110 10000000000000000000000
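The field values can be checked mechanically with Python's struct module; this verification sketch is not part of the original answer:

```python
import struct

# Reinterpret the IEEE 754 single-precision encoding of -0.75 as raw bits.
bits = struct.unpack(">I", struct.pack(">f", -0.75))[0]

sign = bits >> 31                # 1 bit:  1 (negative)
exponent = (bits >> 23) & 0xFF   # 8 bits: 126, i.e. -1 + bias 127
fraction = bits & 0x7FFFFF       # 23 bits: 100...0 (the leading 1 is implicit)

print(sign, exponent, format(fraction, "023b"))
assert (sign, exponent, fraction) == (1, 126, 0b100_0000_0000_0000_0000_0000)
```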

4. Define a datapath in CPU.


Data Path : The component of the processor that performs arithmetic
operations is called datapath.
The datapath performs the arithmetic operations, and control tells the
datapath, memory, and I/O devices what to do according to the
wishes of the instructions of the program.
5. What is the ideal CPI of a pipelined processor?
The ideal CPI of a pipelined processor is 1: once the pipeline is full, one instruction
completes every clock cycle.
6. What is meant by exception? Give one example of a MIPS
exception.
An exception (also called an interrupt) is an unscheduled event that
disrupts program execution; it is used, among other things, to detect overflow.
Example : Arithmetic overflow.
7. Protein String Matching Code has 4 days execution time on
current machine doing integer instructions in 20% of time, doing
I/O in 35% of time and other operations in the remaining time.
Which is the better tradeoff among the following two proposals?
First: Compiler optimization that reduces number of integer
instructions by 25% (assume each integer instruction takes the
same amount of time); Second: Hardware optimization that
reduces the latency of each I/O operation from 6µs to 5µs.
Second proposal: the hardware optimization that reduces the latency of
each I/O operation from 6µs to 5µs is the better tradeoff.
The compiler optimization removes 25% of the integer time, i.e.
0.25 × 20% = 5% of the total execution time (0.2 days saved).
The hardware optimization removes 1/6 of the I/O time, i.e.
(1/6) × 35% ≈ 5.8% of the total execution time (about 0.23 days saved).
Since the second proposal saves more time, it is the better choice.
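The reasoning above can be made concrete with the question's figures (4-day runtime, 20% integer, 35% I/O), assuming I/O time scales linearly with per-operation latency:

```python
total_days = 4.0
int_frac, io_frac = 0.20, 0.35

# First proposal: 25% fewer integer instructions removes 25% of integer time.
t_first = total_days - total_days * int_frac * 0.25      # saves 0.20 days

# Second proposal: I/O latency 6us -> 5us removes 1/6 of the I/O time.
t_second = total_days - total_days * io_frac * (1 / 6)   # saves ~0.23 days

print(round(t_first, 3), round(t_second, 3))
assert t_second < t_first  # the I/O (hardware) optimization is the better tradeoff
```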

8. Give examples for each class in Flynn's Classification.

Flynn's Classification:

                                     Data Streams
                              Single                   Multiple
Instruction     Single        SISD: Intel Pentium 4    SIMD: SSE instructions of x86
Streams         Multiple      MISD: no examples        MIMD: Intel Core i7

9. Distinguish SRAM and DRAM.

Static RAM (SRAM):
 Uses transistors (a flip-flop) to store each bit of data.
 Does not need periodic refreshing to maintain data.
 Cell structure is more complex than DRAM's.
 More expensive than DRAM, but faster.
 Used in cache memory.

Dynamic RAM (DRAM):
 Uses a separate capacitor to store each bit of data.
 Needs periodic refreshing to maintain the charge in the capacitors.
 Cell structure is simpler than SRAM's.
 Less expensive than SRAM, but slower.
 Used in main memory.

10. What is the use of DMA Controller?


Direct memory access (DMA) is a method that allows an input/output
(I/O) device to send or receive data directly to or from the main
memory, bypassing the CPU to speed up memory operations. The
process is managed by a chip known as a DMA controller (DMAC).

PART – B (5 × 13 = 65 Marks)
11. a) i) Consider three different processors P1, P2 and P3 executing
the same instruction set. P1 has a 3 GHz clock rate and a
CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3
has a 4.0 GHz clock rate and has a CPI of 2.2.

(a) Which processor has the highest performance
expressed in instructions per second?
CPU execution time can be calculated using the following
formula:

CPU Time = (Instruction count × CPI) / Clock rate

Let the instruction count be 'n' and assume it is the same for all three
processors. Now calculate the CPU time for each processor.

CPU TimeP1 = (n × 1.5) / (3 × 10^9)   = 0.50n × 10^-9 sec
CPU TimeP2 = (n × 1.0) / (2.5 × 10^9) = 0.40n × 10^-9 sec
CPU TimeP3 = (n × 2.2) / (4 × 10^9)   = 0.55n × 10^-9 sec

CPU time for processor P2 is the smallest of the three. Hence,
P2 produces the highest performance.

(b) If the processors each execute a program in 10 seconds,
find the number of cycles and the number of instructions
in each processor. (5)
(i) Number of clock cycles
CPU time for P1, P2 and P3 = 10 sec
The formula for finding CPU time can be rewritten to find the number
of clock cycles.

CPU time = CPU clock cycles / Clock rate
CPU clock cycles = CPU time × Clock rate

CPU clock cyclesP1 = 10 sec × 3 × 10^9 Hz   = 30 × 10^9 cycles
CPU clock cyclesP2 = 10 sec × 2.5 × 10^9 Hz = 25 × 10^9 cycles
CPU clock cyclesP3 = 10 sec × 4 × 10^9 Hz   = 40 × 10^9 cycles

(ii) Instruction Count
The formula for finding CPU clock cycles can be rewritten to find the
instruction count.

CPU clock cycles = Instruction count × CPI
Instruction count = CPU clock cycles / CPI

Instruction countP1 = 30 × 10^9 / 1.5 = 20 × 10^9 instructions
Instruction countP2 = 25 × 10^9 / 1.0 = 25 × 10^9 instructions
Instruction countP3 = 40 × 10^9 / 2.2 ≈ 18.18 × 10^9 instructions
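The same calculations in executable form; a sketch that reproduces the worked answer:

```python
# CPU time = (instruction count * CPI) / clock rate
procs = {
    "P1": {"clock_hz": 3.0e9, "cpi": 1.5},
    "P2": {"clock_hz": 2.5e9, "cpi": 1.0},
    "P3": {"clock_hz": 4.0e9, "cpi": 2.2},
}

# (a) Time per instruction (instruction count cancels): lowest time wins.
time_per_instr = {p: v["cpi"] / v["clock_hz"] for p, v in procs.items()}
best = min(time_per_instr, key=time_per_instr.get)
print(best)  # P2 has the highest performance

# (b) For a 10-second run: cycles = time * clock rate, instructions = cycles / CPI.
for p, v in procs.items():
    cycles = 10 * v["clock_hz"]
    instructions = cycles / v["cpi"]
    print(p, f"{cycles:.3g} cycles", f"{instructions:.4g} instructions")
```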
(ii) Explain in detail the components of a computer system. (8)
Refer Section 1.3
(or)
(b) (i) Translate the following C code to MIPS assembly code. Use
a minimum number of instructions. Assume that i and k
correspond to registers $s3 and $s5 and the base of the
array save is in $s6. What is the MIPS assembly code
corresponding to this C segment? (5)
while (save[i] == k)
    i += 1;
The first step is to load save[i] into a temporary register. Before we
can load save[i] into a temporary register, we need to have its address.
Before we can add i to the base of array save to form the address, we must
multiply the index i by 4 due to the byte addressing problem. We can use
shift left logical, since shifting left by 2 bits multiplies by 4.
We need to add the label Loop to it so that we can branch back to that
instruction at the end of the loop:
Loop: sll $t1,$s3,2 # Temp reg $t1 = i * 4
To get the address of save[i], we need to add $t1 and the base of
save in $s6:
add $t1,$t1,$s6 # $t1 = address of save[i]
Now we can use that address to load save[i] into a temporary register:
lw $t0,0($t1) # Temp reg $t0 = save[i]

The next instruction performs the loop test, exiting if save[i] ≠ k:


bne $t0,$s5, Exit # go to Exit if save[i] ≠ k
The next instruction adds 1 to i:
addi $s3,$s3,1 #i=i+1
The end of the loop branches back to the while test at the top of the
loop. We just add the Exit label after it.
j Loop # go to Loop
Exit:
ii) What is an addressing mode in a computer ? Classify
MIPS addressing modes and give one examples
instruction to each category. (8)
Refer Section 2.7
12. a) i) Perform X+Y and Y-X using 2's complement arithmetic for the
given two binary numbers X = 0000 1011 1110 1111 and Y
= 1111 0010 1001 1101. (5)
Perform X+Y:
X   = 0000 1011 1110 1111 (+)
Y   = 1111 0010 1001 1101
X+Y = 1111 1110 1000 1100
Perform Y-X:
Step 1: Find the 2's complement of X.
Step 2: Add the result to Y.

(i) Find the 2's complement of X:
X                   = 0000 1011 1110 1111
1's complement of X = 1111 0100 0001 0000
2's complement of X = 1111 0100 0001 0001

(ii) Add to Y:
  1111 0100 0001 0001 (+)
  1111 0010 1001 1101
1 1110 0110 1010 1110  (discard the carry out)

Y-X = 1110 0110 1010 1110
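A quick independent verification in Python, masking to 16 bits to mimic the fixed word size (not part of the original answer):

```python
MASK = 0xFFFF  # 16-bit word

def to_signed(v):
    """Read a 16-bit pattern as a two's complement value."""
    return v - 0x10000 if v & 0x8000 else v

X = 0b0000_1011_1110_1111   # 3055
Y = 0b1111_0010_1001_1101   # -3427 in two's complement

x_plus_y = (X + Y) & MASK
y_minus_x = (Y + ((~X + 1) & MASK)) & MASK  # add the 2's complement of X

print(format(x_plus_y, "016b"))   # X+Y
print(format(y_minus_x, "016b"))  # Y-X
assert to_signed(x_plus_y) == 3055 - 3427    # -372
assert to_signed(y_minus_x) == -3427 - 3055  # -6482
```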

ii) Multiply the following signed 2's complement numbers
using the Booth algorithm: A = 001110 and B = 111001,
where A is the multiplicand and B is the multiplier. (8)
A = 001110 (Multiplicand M)
B = 111001 (Multiplier Q)
-M = 110010

Step   A        Q        Q-1   Action
0      000000   111001   0     Initial values
1      110010   111001   0     Q0Q-1 = 10: A = A - M
       111001   011100   1     Arithmetic shift right A,Q,Q-1
2      000111   011100   1     Q0Q-1 = 01: A = A + M
       000011   101110   0     Arithmetic shift right A,Q,Q-1
3      000011   101110   0     Q0Q-1 = 00: no operation
       000001   110111   0     Arithmetic shift right A,Q,Q-1
4      110011   110111   0     Q0Q-1 = 10: A = A - M
       111001   111011   1     Arithmetic shift right A,Q,Q-1
5      111001   111011   1     Q0Q-1 = 11: no operation
       111100   111101   1     Arithmetic shift right A,Q,Q-1
6      111100   111101   1     Q0Q-1 = 11: no operation
       111110   011110   1     Arithmetic shift right A,Q,Q-1

Product = AQ = 111110 011110 (= -98ten = 14 × -7)
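The hand trace can be cross-checked with a small Booth's-algorithm sketch in Python; the register widths and the combined A,Q,Q-1 arithmetic shift follow the table's conventions:

```python
def booth_multiply(m, q, bits):
    """Booth's algorithm on two `bits`-wide two's complement operands;
    returns the 2*bits-wide product as a bit string."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, q_prev = 0, 0
    for _ in range(bits):
        pair = ((q & 1) << 1) | q_prev       # examine Q0, Q-1
        if pair == 0b10:
            a = (a - m) & mask               # A = A - M
        elif pair == 0b01:
            a = (a + m) & mask               # A = A + M
        # Arithmetic shift right of the combined A,Q,Q-1 register.
        q_prev = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a = (a >> 1) | (a & sign)            # replicate the sign bit
    return format((a << bits) | q, f"0{2 * bits}b")

product = booth_multiply(0b001110, 0b111001, 6)
print(product)                                  # 111110011110
assert int(product, 2) - (1 << 12) == 14 * -7   # = -98
```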
(or)
b) i) Draw the block diagram of integer divider and explain
the division algorithm. (5)
Refer section 3.4.1
ii) Add the numbers (0.75)10 and (-0.275)10 in binary using
the Floating point addition algorithm. (8)

(i) Convert the numbers into binary:
(0.75)10   = 0.11two × 2^0 = 1.1two × 2^-1
(-0.275)10 ≈ -0.010001two × 2^0 = -1.0001two × 2^-2 (truncated to six fraction bits)

(ii) Equalize exponents: the significand of -1.0001two × 2^-2 is shifted right by one place:
= -0.10001two × 2^-1

(iii) Add significands:
 1.10000two (+)
-0.10001two
 0.11111two × 2^-1

(iv) Normalization:
= 1.1111two × 2^-2

Sum = 1.1111two × 2^-2 (= 0.484375ten)
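The four steps can be checked numerically; this sketch assumes, as the answer does, that -0.275 is truncated to -1.0001two × 2^-2:

```python
# Step (i): binary operands (the answer truncates -0.275 to six bits)
x = 0b11 / 4        # 0.11_2      = 0.75
y = -0b10001 / 64   # -0.010001_2 = -0.265625 (truncated -0.275)

# Steps (ii)-(iv) (align exponents, add significands, normalize) are
# exactly what the hardware float unit performs, so the Python sum
# should equal the normalized result 1.1111_2 * 2^-2.
result = x + y
expected = 0b11111 / 64   # 1.1111_2 * 2^-2 = 0.484375
print(result)  # 0.484375
assert result == expected
```

Because all of the values are exact binary fractions, the comparison is exact (no floating point rounding error is involved).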
13. a) Design a simple datapath with the control unit and explain it in
detail. (13)
Refer section 4.4.2
(or)
b) Discuss the limitations of pipelining a processor’s datapath.
Suggest the methods to overcome them. (13)
Pipelining increases the CPU instruction throughput - the number of
instructions completed per unit of time. But it does not reduce the
execution time of an individual instruction. In fact, it usually slightly
increases the execution time of each instruction due to overhead in the
pipeline control.
The increase in instruction throughput means that a program runs
faster and has lower total execution time.
Limitations on practical depth of a pipeline arise from:
 Pipeline latency. The fact that the execution time of each instruction
does not decrease puts limitations on pipeline depth;
 Imbalance among pipeline stages. Imbalance among the pipe
stages reduces performance since the clock can run no faster than
the time needed for the slowest pipeline stage;
 Pipeline overhead. Pipeline overhead arises from the combination of
pipeline register delay (setup time plus propagation delay) and clock
skew.
 Once the clock cycle is as small as the sum of the clock skew and
latch overhead, no further pipelining is useful, since there is no time
left in the cycle for useful work.
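The imbalance and overhead limitations can be illustrated with a toy timing model; the stage delays and overhead below are assumed values for illustration only:

```python
# An imbalanced 5-stage pipeline: the clock can run no faster than the
# slowest stage plus register overhead (setup + propagation delay + skew).
stage_delays_ns = [200, 100, 150, 100, 100]
overhead_ns = 20

unpipelined_ns = sum(stage_delays_ns)          # 650 ns per instruction
cycle_ns = max(stage_delays_ns) + overhead_ns  # 220 ns per cycle

# Ideal speedup would be 5x; imbalance and overhead reduce it.
speedup = unpipelined_ns / cycle_ns
print(round(speedup, 2))  # 2.95
```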

14. a) i) List the limitations of instruction level parallelism. (5)


Refer section 5.2.3
ii) What are the challenges in parallel processing? (8)
Refer section 5.3
(or)
b) i) Compare and contrast fine-grained multithreading,
coarse-grained multithreading and simultaneous
multithreading. (9)
Refer section 5.5
ii) Classify shared memory multiprocessor based on the
memory access latency. (4)
Refer section 5.6.1
15. a) i) What is the need for cache memory? List the three
mapping methods of cache memory and explain any
two. (10)
Refer section 6.4.3
ii) Define virtual memory. What is the advantage of using
virtual memory? (3)
Virtual memory is a memory management capability of an OS that
uses hardware and software to allow a computer to compensate for
physical memory shortages by temporarily transferring data from random
access memory (RAM) to disk storage.
It allows us to run more applications on the system than we have
enough physical memory to support. Virtual memory is simulated memory
that is written to a file on the hard drive. That file is often called page file or
swap file. It's used by operating systems to simulate physical RAM by
using hard disk space.
(or)
b) i) Discuss about Programmed I/Os associated with
computers. (6)
Refer section 7.2
ii) Write the sequence of operations carried out by a
processor when interrupted by a peripheral device
connected to it. (7)
Refer section 7.4.1

PART – C (1 × 15 = 15 Marks)

16. a) i) The following sequence of instructions are executed in


the basic 5-stage pipelined processor : (15)
or r1, r2, r3
or r2,r1,r4
or r1,r1,r2
a) Indicate dependences and their type.
Instructions:
I1 : or r1, r2, r3
I2 : or r2, r1, r4
I3 : or r1, r1, r2
Dependencies:
1. True data dependency:
 Instructions I2 and I1 have a true (read-after-write) data dependency:
I2 needs r1, which is available only after I1 completes.
 Similar data dependencies exist between I1 and I3, and between I2 and
I3.
2. Output dependency:
 Instructions I1 and I3 have an output dependency: both write r1, so
I3's write cannot complete before I1's.
No forwarding:
Data hazard occurs as shown below:
b) Assume there is no forwarding in this pipelined processor.
Indicate hazards and add NOP instructions to eliminate
them.
I1 : or r1, r2, r3
I2 : or r2, r1, r4
I3 : or r1, r1, r2
        CC1   CC2   CC3   CC4   CC5   CC6   CC7
I1:     IF    ID    EX    MEM   WB
I2:           IF    ID    EX    MEM   WB
I3:                 IF    ID    EX    MEM   WB

(I1 writes r1 in WB at CC5; I2 writes r2 in WB at CC6.)

 As the figure shows, r1 is written only at CC5, but instructions I2 and I3 need
it at CC3 and CC4. This is a data hazard.
 NOP instructions can be inserted to avoid the data hazard.
 Two NOPs after instruction I1 delay I2's ID stage until CC5,
when r1 becomes available.
 A similar case exists between I2 and I3, so two more NOPs are needed.
I1: or r1, r2, r3
I2 : NOP
I3 : NOP
I4 : or r2, r1, r4
I5 : NOP
I6 : NOP
I7 : or r1, r1, r2
c) Assume there is full forwarding. Indicate hazards and add
NOP instructions to eliminate them.
Forwarding:
I1 : or r1, r2, r3
I2 : or r2, r1, r4
I3 : or r1, r1, r2
        CC1   CC2   CC3   CC4   CC5   CC6   CC7
I1:     IF    ID    EX    MEM   WB
I2:           IF    ID    EX    MEM   WB
I3:                 IF    ID    EX    MEM   WB

(r1 is forwarded from I1's EX stage to I2 and I3; r2 is forwarded from I2's EX stage to I3.)

 When forwarding is applied, there is no need for NOP instructions.
 r1 is forwarded to instructions I2 and I3, as shown in the figure, at CC3.
 Similarly, r2 is forwarded to I3 at CC4.
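A toy scheduler makes the no-forwarding case concrete. It assumes, as in part (b) above, that a result is usable only two instruction slots after the one that writes it, and pads with NOPs otherwise (an illustrative sketch, not from the original answer):

```python
def insert_nops(program, gap=2):
    """program: (dest, src1, src2) register tuples; None marks a NOP.
    Pads so no instruction reads a register written <= `gap` slots earlier."""
    out = []
    for dest, src1, src2 in program:
        for back, prev in enumerate(reversed(out), start=1):
            if back > gap:
                break
            if prev is not None and prev[0] in (src1, src2):
                out.extend([None] * (gap - back + 1))  # stall with NOPs
                break
        out.append((dest, src1, src2))
    return out

prog = [("r1", "r2", "r3"),   # or r1,r2,r3
        ("r2", "r1", "r4"),   # or r2,r1,r4  (reads r1)
        ("r1", "r1", "r2")]   # or r1,r1,r2  (reads r1 and r2)
scheduled = insert_nops(prog)
print(scheduled.count(None))  # 4
```

The scheduler emits seven slots (three instructions plus four NOPs), matching the NOP-padded listing in part (b).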

(or)

b) Explain the DMA controller in detail with suitable diagrams.
Discuss how it improves the overall performance of the
system. (15)

Refer section 7.3


www.Vidyarthiplus.com

VIDYARTHIPLUS - ANNA UNIVERSITY ONLINE STUDENTS COMMUNITY


Department of Computer Science and Engineering

CS6303 COMPUTER ARCHITECTURE


Question Bank
Year: II year (2014-15), ODD SEMESTER

UNIT-1 OVERVIEW AND INSTRUCTIONS

PART- A

1. What are the eight great ideas in computer architecture?


2. What are the five classic components of a computer?
3. What is the function of data path and control path?
4. What is instruction set Architecture?
5. Define application binary interface
6. Differentiate DRAM and SRAM.
7. Compare Volatile and nonvolatile memory.
8. List the advantages of Network Computer.
9. Define VLSI
10. Differentiate Throughput and Response Time
11. Write the CPU performance equation.
12. If computer A runs a program in 10 seconds, and computer B runs the same program
in 15 seconds, how much faster is A over B.
13. Write the formula for CPU execution time for a program
14. Write the formula for CPU clock cycles required for a program.
15. How will you measure the dynamic power dissipation?
16. Define – Stored Program Concepts
17. What are the fields in an MIPS instruction?
18. List the advantages of multiprocessor over uniprocessor.
19. What are the different types of operands? Give examples.
20. List the different addressing modes

PART-B
1. i)Discuss in detail about Eight great ideas of computer Architecture.(8)
ii) Explain in detail about Technologies for Building Processors and Memory (8)
2. Explain the various components of computer System with neat diagram (16)
3. Discuss in detail the various measures of performance of a computer(16)
4. Define Addressing mode and explain the basic addressing modes with an example for
each.
5. Explain operations and operands of computer Hardware in detail (16)
6. i)Discuss the Logical operations and control operations of computer (12)
ii)Write short notes on Power wall(6)
7. Consider three different processors P1, P2, and P3 executing the same instruction
set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI
of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.
a. Which processor has the highest performance expressed in instructions per second?
b. If the processors each execute a program in 10 seconds, find the number of
cycles and the number of instructions.
c. We are trying to reduce the execution time by 30% but this leads to an increase


of 20% in the CPI. What clock rate should we have to get this time reduction?

8. Assume a program requires the execution of 50 × 10^6 FP instructions, 110 × 10^6 INT
instructions, 80 × 10^6 L/S instructions, and 16 × 10^6 branch instructions. The CPI for each
type of instruction is 1, 1, 4, and 2, respectively. Assume that the processor has a 2 GHz clock
rate.
a. By how much must we improve the CPI of FP instructions if we want the program to run
two times faster?
b. By how much must we improve the CPI of L/S instructions if we want the program to run
two times faster?
c. By how much is the execution time of the program improved if the CPI of INT and FP
instructions is reduced by 40% and the CPI of L/S and Branch is reduced by 30%?
9. Explain branching operations with examples.
10. Explain the following addressing modes in detail with diagrams:
i) Immediate addressing ii) Register addressing iii) Base or displacement addressing
iv) PC-relative addressing v) Pseudodirect addressing

UNIT-II ARITHMETIC OPERATIONS


PART-A

1. Add 6ten to 7ten in binary, and subtract 6ten from 7ten in binary.
2. Write the overflow conditions for addition and subtraction.
3. Draw the Multiplication hardware diagram
4. List the steps of multiplication algorithm
5. What is fast multiplication?
6. List the steps of division algorithm
7. What is scientific notation and normalization? Give an example
8. Give the representation of single precision floating point number
9. Define overflow and under flow with examples
10. Give the representation of double precision floating point number
11. What are the floating point instructions in MIPS?
12. What are the steps of floating point addition?
13. List the steps of floating point multiplication
14. Define – Guard and Round
15. Write the IEEE 754 floating point format.
16. What is meant by sub-word parallelism?
17. Multiply 1000ten by 1001ten.
18. Divide 1,001,010ten by 1000ten.
19.For the following C statement, what is the corresponding MIPS assembly code?
f = g + (h − 5)
20.For the following MIPS assembly instructions above, what is a corresponding
C statement?
add f, g, h
add f, i, f


PART- B

1. Explain the Multiplication algorithm in detail with diagram and examples


2. Discuss in detail about division algorithm in detail with diagram and examples
3. Explain in detail about floating point addition with example
4. Explain in detail about floating point multiplication
5. Give the algorithm for multiplication of signed 2’s complement
numbers and illustrate with an example
6. Multiply the following pair of signed 2’s complement numbers :
A = 010111, B = 101100.
7. Add the numbers 0.5ten and -0.4375ten using the binary floating point addition algorithm.
8. Multiply 1.110ten × 10^10 and 9.200ten × 10^-5 using the binary floating point
multiplication algorithm.
9. Calculate the division of A and B:
A = 3.264 × 10^3, B = 6.52 × 10^2
10. Show the IEEE 754 binary representation of the number -0.75ten in single and
double precision.

UNIT III PROCESSOR AND CONTROL UNIT

PART-A

1. What is meant by data path element?


2. What is the use of PC register?
3. What is meant by register file?
4. What are the two state elements needed to store and access an instruction?
5. Draw the diagram of portion of datapath used for fetching instruction.
6. Define – Sign Extend and Vectored Interrupt
7. What is meant by branch target address?
8. Differentiate branch taken from branch not taken.
9. What is meant by delayed branch?
10. Write the instruction format for the jump instruction.
11. What are hazards? Write its types.
12. What is meant by forwarding?
13. What is pipeline stall?
14. What is meant by branch prediction?
15. What are the 5 pipeline stages?
16. What are exceptions and interrupts?
17. What is meant by pipelining?
18. What are the five steps in MIPS instruction execution?
19. What are the three instruction classes and their instruction formats?
20. Write the formula for calculating time between instructions in a pipelined
processor.

PART B

1. Explain the basic MIPS implementation of instruction set


2. Explain the basic MIPS implementation with necessary multiplexers and control lines
3. What is a control hazard? Explain the methods for dealing with control hazards.
4. Discuss the influence of pipelining in detail


5. Explain how the instruction pipeline works. What are the various situations where an
instruction pipeline can stall? What can be its resolution?
6. What is a data hazard? How do you overcome it? What are its side effects?
7. Discuss the data and control path methods in pipelining
8. Explain dynamic branch prediction
9. How exceptions are handled in MIPS
10. Explain in detail about building a datapath
11. Explain in detail about control implementation scheme

UNIT IV PARALLELISM

PART-A

1. What is meant by ILP?


2. What is multiple issue? Write any two approaches.
3. What is meant by speculation?
4. Define – Static Multiple Issue
5. Define – Issue Slots and Issue Packet
6. Define – VLIW
7. Define – Superscalar Processor
8. What is meant by loop unrolling?
9. What is meant by anti-dependence? How is it removed?
10. Differentiate in-order execution from out-of-order execution.
11. What is meant by hardware multithreading?
12. What are the two main approaches to hardware multithreading?
13. What is SMT?
14. Compare SMT from hardware multithreading.
15. What are the three multithreading options?
16. Define – SMP
17. Differentiate UMA from NUMA.
18. What is a multicore microprocessor?
19. What is a parallel processing program?
20. Define a cluster

PART- B
1. Explain Instruction level parallelism
2. Explain the difficulties faced by parallel processing programs
3. Explain shared memory multiprocessor
4. Explain in detail Flynn’s classification of parallel hardware
5. Explain cluster and other Message passing Multiprocessor
6. Explain in detail hardware Multithreading
7. Explain SISD and MIMD
8. Explain SIMD and SPMD
9. Explain Multicore processors
10. Explain the different types of multithreading


UNIT V MEMORY AND I/O SYSTEM

PART-A

1. What are the temporal and spatial localities of references?


2. Write the structure of memory hierarchy
3. What are the various memory technologies?
4. Differentiate SRAM from DRAM.
5. What is flash memory?
6. Define − Rotational Latency
7. What is direct-mapped cache?
8. Consider a cache with 64 blocks and a block size of 16 bytes. To what block number
does byte address 1200 map?
9. How many total bits are required for a direct-mapped cache with 16 KB of
data and 4-word blocks, assuming a 32-bit address?
10. What are the writing strategies in cache memory?
11. What are the steps to be taken in an instruction cache miss?
12. Define – AMAT
13. What are the various block placement schemes in cache memory?
14. Define – MTTF and AFR
15. Define – Availability
16. What are the three ways to improve MTTF?
17. Define – TLB
18. What is meant by virtual memory?
19. Differentiate physical address from logical address.
20. What is meant by address mapping?

PART- B
1. Explain in detail about memory Technologies
2. Explain in detail about memory hierarchy with a neat diagram.
3. Describe the basic operations of cache in detail with diagram
4. Discuss the various mapping schemes used in cache design. (10)
A byte-addressable computer has a small data cache capable of holding eight 32-bit words.
Each cache block contains one 32-bit word. When a given program is executed, the processor
reads data from the following sequence of hex addresses: 200, 204, 208, 20C, 2F4, 2F0,
200, 204, 218, 21C, 24C, 2F4. The pattern is repeated four times. Assuming that the cache is
initially empty, show the contents of the cache at the end of each pass, and compute the hit
rate for a direct-mapped cache. (6)
5. Discuss the methods used to measure and improve the performance of the cache.
6. Explain the virtual memory address translation and TLB with necessary diagrams.
7. Draw the typical block diagram of a DMA controller and explain how it is
used for direct data transfer between memory and peripherals.
8. Explain in detail about interrupts with diagram
9. Describe in detail about programmed Input/Output with a neat diagram.
10.Explain in detail about I/O processor.

