3 0 0 3
OBJECTIVES:
UNIT IV PARALLELISM 9
Parallel processing challenges – Flynn's classification – SISD, MIMD, SIMD, SPMD, and Vector
Architectures - Hardware multithreading – Multi-core processors and other Shared Memory
Multiprocessors - Introduction to Graphics Processing Units, Clusters, Warehouse Scale Computers and
other Message-Passing Multiprocessors.
UNIT V MEMORY AND I/O SYSTEMS 9
Memory Hierarchy - memory technologies – cache memory – measuring and improving cache
performance – virtual memory, TLBs – Accessing I/O Devices – Interrupts – Direct Memory Access –
Bus structure – Bus operation – Arbitration – Interface circuits - USB.
TOTAL : 45 PERIODS
OUTCOMES:
On completion of the course, the students should be able to:
Understand the basic structure of computers, operations and instructions.
Design arithmetic and logic unit.
Understand pipelined execution and design control unit.
Understand parallel processing architectures.
Understand the various memory systems and I/O communication.
TEXT BOOKS:
1. David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software
Interface, Fifth Edition, Morgan Kaufmann / Elsevier, 2014.
2. Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig Manjikian, Computer Organization and Embedded
Systems, Sixth Edition, Tata McGraw Hill, 2012.
6. What is CPU execution time, user CPU time and system CPU time?
CPU execution time (CPU time): The actual time the CPU spends computing for a specific task.
User CPU time: The CPU time spent in the program itself.
System CPU time: The CPU time spent in the operating system performing tasks on behalf
of the program.
21. Write down the MIPS assembly language notation for data transfer operations.
Category: Data transfer
Instruction | Example | Meaning | Comments
load word | lw $s1,20($s2) | $s1 = Memory[$s2 + 20] | Word from memory to register
store word | sw $s1,20($s2) | Memory[$s2 + 20] = $s1 | Word from register to memory
load half | lh $s1,20($s2) | $s1 = Memory[$s2 + 20] | Halfword memory to register
load half unsigned | lhu $s1,20($s2) | $s1 = Memory[$s2 + 20] | Halfword memory to register
store half | sh $s1,20($s2) | Memory[$s2 + 20] = $s1 | Halfword register to memory
load byte | lb $s1,20($s2) | $s1 = Memory[$s2 + 20] | Byte from memory to register
load byte unsigned | lbu $s1,20($s2) | $s1 = Memory[$s2 + 20] | Byte from memory to register
store byte | sb $s1,20($s2) | Memory[$s2 + 20] = $s1 | Byte from register to memory
load linked word | ll $s1,20($s2) | $s1 = Memory[$s2 + 20] | Load word as 1st half of atomic swap
store conditional word | sc $s1,20($s2) | Memory[$s2 + 20] = $s1; $s1 = 0 or 1 | Store word as 2nd half of atomic swap
load upper immediate | lui $s1,20 | $s1 = 20 * 2^16 | Loads constant in upper 16 bits
22. Write down the MIPS assembly language notation for logical operations.
Category: Logical
Instruction | Example | Meaning | Comments
and | and $s1,$s2,$s3 | $s1 = $s2 & $s3 | Three reg. operands; bit-by-bit AND
or | or $s1,$s2,$s3 | $s1 = $s2 | $s3 | Three reg. operands; bit-by-bit OR
nor | nor $s1,$s2,$s3 | $s1 = ~($s2 | $s3) | Three reg. operands; bit-by-bit NOR
and immediate | andi $s1,$s2,20 | $s1 = $s2 & 20 | Bit-by-bit AND reg with constant
or immediate | ori $s1,$s2,20 | $s1 = $s2 | 20 | Bit-by-bit OR reg with constant
shift left logical | sll $s1,$s2,10 | $s1 = $s2 << 10 | Shift left by constant
shift right logical | srl $s1,$s2,10 | $s1 = $s2 >> 10 | Shift right by constant
23. Write down the MIPS assembly language notation for conditional branch operations.
Category: Conditional branch
Instruction | Example | Meaning | Comments
branch on equal | beq $s1,$s2,25 | if ($s1 == $s2) go to PC + 4 + 100 | Equal test; PC-relative branch
branch on not equal | bne $s1,$s2,25 | if ($s1 != $s2) go to PC + 4 + 100 | Not equal test; PC-relative
set on less than | slt $s1,$s2,$s3 | if ($s2 < $s3) $s1 = 1; else $s1 = 0 | Compare less than; for beq, bne
set on less than unsigned | sltu $s1,$s2,$s3 | if ($s2 < $s3) $s1 = 1; else $s1 = 0 | Compare less than unsigned
set less than immediate | slti $s1,$s2,20 | if ($s2 < 20) $s1 = 1; else $s1 = 0 | Compare less than constant
set less than immediate unsigned | sltiu $s1,$s2,20 | if ($s2 < 20) $s1 = 1; else $s1 = 0 | Compare less than constant unsigned
24. Write down the MIPS assembly language notation for unconditional branch operations.
Category: Unconditional jump
Instruction | Example | Meaning | Comments
jump | j 2500 | go to 10000 | Jump to target address
jump register | jr $ra | go to $ra | For switch, procedure return
jump and link | jal 2500 | $ra = PC + 4; go to 10000 | For procedure call
This instruction places the value 200 in the register R0. Immediate mode is used to specify the value of
a source operand directly within the instruction.
In assembly language, writing the value as a subscript is not convenient, so the # symbol is used. The
instruction can be rewritten as
Move #200,R0
Addressing mode | Assembly syntax | Addressing function
Immediate | #value | Operand = value
Worked example of multiplier recoding:
Multiplier:         1  0  0  0  1  1  1  1  0  1  0  0  0  1  0  1 (implicit 0 appended on the right)
Booth recoding:    -1  0  0 +1  0  0  0 -1 +1 -1  0  0 +1 -1 +1 -1
Bit-pair recoding:   -2    +1     0    -1    +1     0    +1    +1
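The worked example above can be checked with a short Python sketch (illustrative only; the helper names and list-of-bits representation are my own):

```python
# Illustrative sketch: Booth and bit-pair recoding of a multiplier,
# reproducing the worked example above. Bits are listed left to right.

def booth_recode(bits):
    """Booth digit at each position = (bit to the right) - (bit here),
    with an implicit 0 appended on the right."""
    ext = bits + [0]
    return [ext[i + 1] - ext[i] for i in range(len(bits))]

def bit_pair_recode(bits):
    """Pair adjacent Booth digits; each pair's value = 2*left + right."""
    d = booth_recode(bits)
    return [2 * d[i] + d[i + 1] for i in range(0, len(d), 2)]

multiplier = [1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1]
print(booth_recode(multiplier))    # [-1,0,0,1,0,0,0,-1,1,-1,0,0,1,-1,1,-1]
print(bit_pair_recode(multiplier)) # [-2, 1, 0, -1, 1, 0, 1, 1]
```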
22. Draw the full adder circuit and give the truth table. (CSE May/June 2007)
Inputs Outputs
A B C Carry Sum
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
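The truth table above follows the standard full-adder expressions, Sum = A xor B xor C and Carry = AB + BC + AC. A small Python sketch, for illustration:

```python
# Illustrative sketch: the logic behind the full-adder truth table above.
def full_adder(a, b, c):
    s = a ^ b ^ c                          # sum bit: A xor B xor C
    carry = (a & b) | (b & c) | (a & c)    # carry-out: majority of the inputs
    return carry, s

# Print all eight rows of the truth table: A B C Carry Sum
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            carry, s = full_adder(a, b, c)
            print(a, b, c, carry, s)
```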
23. What is a guard bit and what are the ways to truncate the guard bits?
The first of two extra bits kept on the right during intermediate calculations of floating-point numbers
is called the guard bit.
There are several ways to truncate the guard bits
Chopping
Von-Neumann rounding
Rounding
24. Write the Add/subtract rule for floating point numbers.
Choose the number with the smaller exponent and shift its mantissa right a number of steps
equal to the difference in exponents.
Set the exponent of the result equal to the larger exponent.
Perform addition/subtraction on the mantissas and determine the sign of the result.
Normalize the resulting value, if necessary.
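The four steps can be sketched on a toy representation in which a positive value is a pair (mantissa, exponent) with value = mantissa * 2^exponent; the 8-bit mantissa width below is an arbitrary assumption for illustration, not from the notes:

```python
# Illustrative sketch of the add rule for two positive floating-point
# numbers, each given as (mantissa, exponent). Mantissas are assumed
# to fit in 8 bits.
def fp_add(m1, e1, m2, e2):
    # Step 1: shift the mantissa of the smaller-exponent operand right
    # by the difference in exponents.
    if e1 < e2:
        m1 >>= (e2 - e1)
        e1 = e2
    else:
        m2 >>= (e1 - e2)
    # Step 2: the result takes the larger exponent.
    e = e1
    # Step 3: add the mantissas.
    m = m1 + m2
    # Step 4: normalize, keeping the mantissa within 8 bits.
    while m >= (1 << 8):
        m >>= 1
        e += 1
    return m, e

print(fp_add(192, 0, 192, 0))   # 192 + 192 = 384 -> renormalized
print(fp_add(128, 2, 128, 0))   # operands with different exponents
```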
25. Why is a floating-point number more difficult to represent and process than an integer?
An integer value requires only half the memory space of an equivalent IEEE double-precision
floating-point value. Applications that use only integer-based arithmetic will therefore also have
significantly smaller memory requirements.
A floating-point operation usually runs hundreds of times slower than an equivalent integer-based
arithmetic operation.
1. Define pipelining. What are the steps required for a pipelined processor to process the
instruction? What are the advantages of Pipeline?
Pipelining is a technique in which multiple instructions are overlapped in execution.
It is a technique of decomposing a sequential process into sub operations with each sub process being
executed in a special dedicated segment that operates concurrently with all other segments.
IF – Instruction Fetch
ID – Instruction Decode
EX – Execution or address calculation
MEM – Data Memory Access
WB – Write Back
Advantages
The main advantage of pipelining is that it reduces the effective cycle time per instruction and
increases the instruction throughput.
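The throughput gain can be illustrated with a simple cycle count: with k stages and no stalls, n instructions take k + (n - 1) cycles instead of k * n. A sketch (function names are my own):

```python
# Illustrative sketch: cycles needed with and without a 5-stage pipeline
# (IF, ID, EX, MEM, WB), assuming one cycle per stage and no stalls.
def sequential_cycles(n_instructions, stages=5):
    # Without overlap, every instruction takes all k stage-cycles.
    return n_instructions * stages

def pipelined_cycles(n_instructions, stages=5):
    # The first instruction fills the pipeline; each later one
    # completes one cycle after its predecessor.
    return stages + (n_instructions - 1)

print(sequential_cycles(100))  # 500
print(pipelined_cycles(100))   # 104
```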
2. What are Hazards? State the different types of hazards that can occur in a pipeline.
A hazard (also called a hurdle) is a situation that prevents the next instruction in the instruction stream
from executing during its designated clock cycle. A hazard introduces a stall (an idle cycle) into the pipeline.
The types of hazards that can occur in the pipelining were,
1. Data hazards.
2. Instruction hazards.
3. Structural hazards.
3. What is meant by branch prediction?
Branch prediction is a method of resolving a branch hazard. It assumes a given outcome for the
branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.
Dynamic branch prediction:
Dynamic branch prediction is a prediction of branches at runtime using runtime information
4. What is Branch prediction buffer?
The branch prediction buffer is also called the branch history table.
It is a small memory that is indexed by lower portion of the address of the branch instruction.
It contains one or more bits indicating whether the branch was recently taken or not.
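A common form of prediction-buffer entry is a 2-bit saturating counter; the sketch below is illustrative (the class name and the initial "strongly not taken" state are assumptions, not from the notes):

```python
# Illustrative sketch: a 2-bit saturating counter of the kind kept in a
# branch prediction buffer entry. States 0-1 predict not taken, 2-3 taken.
class TwoBitPredictor:
    def __init__(self):
        self.state = 0  # assumed initial state: strongly not taken

    def predict(self):
        return self.state >= 2  # True = predict taken

    def update(self, taken):
        # Saturating increment/decrement toward the actual outcome.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
for outcome in [True, True, False, True]:
    print(p.predict(), outcome)   # prediction vs. actual outcome
    p.update(outcome)
```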
5. What is an exception?
An exception is an unscheduled event that disrupts program execution; one of its uses is to detect overflow.
Exceptions are created to handle unexpected events from within the processor.
Two kinds of exceptions are available in the MIPS architecture:
Execution of an undefined instruction
An arithmetic overflow exception
6. Define sign extend.
In load, store, and branch operations, the immediate is added to a 32-bit operand (a register or the PC
contents) to create an operand or perhaps a new address, for instance in a branch instruction. Since
the immediate may be positive or negative, to add it to a 32-bit argument we must sign-extend it to
32 bits. The sign-extension unit performs this.
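Sign extension of a 16-bit immediate to 32 bits can be sketched as follows (an illustrative helper, not from the notes):

```python
# Illustrative sketch: sign-extending a 16-bit immediate to 32 bits,
# as the sign-extension unit does.
def sign_extend_16_to_32(imm16):
    imm16 &= 0xFFFF
    if imm16 & 0x8000:             # sign bit set: the value is negative
        return imm16 | 0xFFFF0000  # replicate the sign bit into the upper half
    return imm16                   # positive: upper half stays zero

print(hex(sign_extend_16_to_32(0x0014)))  # small positive immediate (20)
print(hex(sign_extend_16_to_32(0xFFEC)))  # -20 as a 16-bit value
```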
7. Define Data hazards
A data hazard is any condition in which either the source or the destination operands of
an instruction are not available at the time expected in pipeline. As a result some operation has
to be delayed, and the pipeline stalls.
8. Define Instruction hazards
The pipeline may be stalled because of a delay in the availability of an instruction. For
example, this may be a result of miss in cache, requiring the instruction to be fetched from the
main memory. Such hazards are called instruction hazards or control hazards.
9. Define Structural hazards?
A structural hazard is the situation in which two instructions require the use of a given
hardware resource at the same time. The most common case in which this hazard may arise is
access to memory.
10. What are the classifications of data hazards?
A pair of instructions can produce a data hazard by reading or writing the same memory
location. Assume that instruction i is executed before instruction j. The hazards can be
classified as:
1. RAW hazard
2. WAW hazard
3. WAR hazard
11. Define RAW hazard (read after write)
Instruction 'j' tries to read a source operand before instruction 'i' writes it.
12. Define WAW hazard (write after write)
Instruction 'j' tries to write an operand before instruction 'i' writes it.
13. Define WAR hazard (write after read)
Instruction 'j' tries to write a destination operand before instruction 'i' reads it.
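The three classes can be illustrated by comparing the register sets read and written by two instructions i and j (i earlier); the (reads, writes) representation below is my own sketch:

```python
# Illustrative sketch: classifying hazards between two instructions,
# each represented as (set_of_registers_read, set_of_registers_written),
# with instruction i issued before instruction j.
def hazards(i, j):
    i_reads, i_writes = i
    j_reads, j_writes = j
    found = set()
    if j_reads & i_writes:
        found.add("RAW")   # j reads what i writes
    if j_writes & i_writes:
        found.add("WAW")   # j writes what i writes
    if j_writes & i_reads:
        found.add("WAR")   # j writes what i reads
    return found

# add $s0,$s1,$s2 followed by sub $s3,$s0,$s4: j reads what i writes.
print(hazards(({"$s1", "$s2"}, {"$s0"}), ({"$s0", "$s4"}, {"$s3"})))
```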
14. How can data hazards be prevented in pipelining?
Data hazards in the instruction pipeline can be prevented by the following techniques:
a) Operand forwarding
b) Software approach
15. How is the compiler used in pipelining?
A compiler translates a high-level language program into a sequence of machine instructions. To
reduce N, we need a suitable machine instruction set and a compiler that makes good use of it. An
optimizing compiler takes advantage of various features of the target processor to reduce the product
N*S, which is the total number of clock cycles needed to execute a program. The number of cycles
depends not only on the choice of instructions, but also on the order in which they appear in the
program. The compiler may rearrange program instructions to achieve better performance; of course,
such changes must not affect the result of the computation.
20. Draw the basic structure of a symmetric shared-memory multiprocessor.
21. What is multicore?
At its simplest, multi-core is a design in which a single physical processor contains the core logic
of more than one processor. It is as if an Intel Xeon processor were opened up and inside were packaged
all the circuitry and logic for two (or more) Intel Xeon processors. The multi-core design takes several
such processor "cores" and packages them as a single physical processor. The goal of this design is to
enable a system to run more tasks simultaneously and thereby achieve greater overall system
performance.
22. Write the software implications of a multicore processor.
Multi-core systems will deliver benefits to all software, but especially multi-threaded programs. All
code that supports HT Technology or multiple processors, for example, will benefit automatically from
multi-core processors, without need for modification. Most server-side enterprise packages and many
desktop productivity tools fall into this category.
23. What is coarse grained multithreading?
It switches threads only on costly stalls. Thus it is much less likely to slow down the execution
of an individual thread.
UNIT-5 MEMORY AND I/O SYSTEMS
Timing diagram
The row address is latched under control of the RAS signal, and the memory takes 2 or 3 cycles
to activate the selected row.
The column address is latched under control of the CAS signal, and after a delay of 1 clock cycle
the first set of data bits is placed on the data lines.
The SDRAM automatically increments the column address to access the next three sets of bits in
the selected row, which are placed on the data lines in the next 3 clock cycles.
SDRAMs have built-in refresh circuitry. This circuit behaves as a refresh counter, which provides
the addresses of the rows that are selected for refreshing. Each row must be refreshed at least every
64 ms.
SDRAMs can be used with clock speeds above 100 MHz.
Latency, Bandwidth, and DDR SDRAMs
Memory latency is the time it takes to transfer a word of data to or from memory. Memory
bandwidth is the number of bits or bytes that can be transferred in one second.
DDR SDRAMs (Double Data Rate Synchronous Dynamic RAM)
SDRAM performs all actions on the rising edge of the clock signal. The faster version, called
DDR SDRAM, performs data transfers on both edges of the clock, i.e. the rising and the falling edge.
Their bandwidth is essentially doubled for long burst transfers, but the latency of these devices is
similar to that of SDRAM. The cell array is organized in two banks, and each bank can be accessed
separately.
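The doubled burst bandwidth can be illustrated numerically; the 8-byte (64-bit) bus width below is an assumed example value, not from the notes:

```python
# Illustrative sketch: peak transfer rate for single-data-rate vs. DDR
# operation at the same clock, assuming an 8-byte-wide data bus.
def peak_bandwidth_mb_s(clock_mhz, bus_bytes=8, transfers_per_cycle=1):
    # MHz * bytes/transfer * transfers/cycle = megabytes per second
    return clock_mhz * bus_bytes * transfers_per_cycle

print(peak_bandwidth_mb_s(100))                         # SDR at 100 MHz: 800
print(peak_bandwidth_mb_s(100, transfers_per_cycle=2))  # DDR, both edges: 1600
```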
Rambus memory
A very wide bus is expensive and requires a lot of space on a motherboard. An alternative
approach is to implement a narrow bus that is much faster. This approach, used by Rambus Inc., is
called Rambus.
Key features
Fast signaling method used to transfer information between chips.
Advantages
The Rambus channel provides a complete specification for the design of such a communication link.
Rambus allows a clock frequency of 400 MHz; with data transferred on both edges of the clock, the
effective data rate is 800 MHz.
Rambus uses multiple banks of cell arrays to access more than one word at a time; such a memory
chip is called an RDRAM (Rambus DRAM).
RDRAM chips can be assembled into larger modules called RIMMs (Rambus In-line Memory
Modules).
Rambus uses a master-slave relationship: the master is the processor and the slaves are the RDRAM
chips.
Flash memory:
Flash memory takes an approach similar to EEPROM, with one difference:
EEPROM – it is possible to read and write the contents of a single cell.
Flash memory – it is possible to read the contents of a single cell, but only possible to write
an entire block of cells.
Flash devices have greater density.
Higher capacity and low storage cost per bit.
Power consumption of flash memory is very low, making it attractive for use in equipment that
is battery-driven.
Single flash chips are not sufficiently large, so larger memory modules are implemented using
flash cards and flash drives.
Flash card – a larger module made by mounting flash chips on a small card. The card simply plugs
into an accessible slot. These cards also vary in memory size.
Flash drive – similar to a hard disk; the difference is that a hard disk can store many gigabytes (GB),
while flash drives store less than one GB. A hard disk provides an extremely low cost per bit, but
flash drives have a higher cost per bit.
Magnetic Disk
The magnetic hard disk consists of a collection of platters, which rotate on a spindle at 5400 to
15,000 revolutions per minute. The metal platters are covered with magnetic recording material on both
sides, similar to the material found on a cassette or videotape. To read and write information on a hard
disk, a movable arm containing a small electromagnetic coil called a read-write head is located just above
each surface. The entire drive is permanently sealed to control the environment inside the drive, which, in
turn, allows the disk heads to be much closer to the drive surface.
Each disk surface is divided into concentric circles, called tracks. There are typically tens of
thousands of tracks per surface. Each track is in turn divided into sectors that contain the information;
each track may have thousands of sectors. Sectors are typically 512 to 4096 bytes in size. The sequence
recorded on the magnetic media is a sector number, a gap, the information for that sector including error
correction code, a gap, the sector number of the next sector, and so on.
The disk heads for each surface are connected together and move in conjunction, so that every head is
over the same track of every surface. The term cylinder is used to refer to all the tracks under the heads at
a given point on all surfaces.
Seek time – time required to move the read/write head to the proper track.
Rotational delay/Rotational Latency time – the amount of time it takes for the desired sector of a
disk (i.e., the sector from which data is to be read or written) to rotate under the read-write heads of
the disk drive. This is the time for half a rotation of the disk.
Disk access time - The sum of seek time and latency time is called disk access time.
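The definitions above give a simple calculation; the seek time and RPM values below are made-up examples:

```python
# Illustrative sketch: average rotational latency and disk access time
# from the definitions above.
def rotational_latency_ms(rpm):
    # Time for half a rotation, in milliseconds:
    # one rotation takes 60_000/rpm ms.
    return 0.5 * 60_000 / rpm

def access_time_ms(seek_ms, rpm):
    # Disk access time = seek time + rotational latency.
    return seek_ms + rotational_latency_ms(rpm)

print(rotational_latency_ms(5400))   # ~5.56 ms at 5400 RPM
print(access_time_ms(4.0, 15000))    # example seek + ~2 ms latency
```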
2. Discuss in detail about Cache Memory. Explain mapping functions in cache memory to
determine how memory blocks are placed in cache.
Processor is much faster than the main memory
As a result, the processor has to spend much of its time waiting while instructions and data
are being fetched from the main memory.
The speed of the main memory cannot be increased beyond a certain point, so to achieve good
performance, cache memory is used: an architectural arrangement which makes the main memory
appear faster to the processor than it really is.
Cache memory is based on the property of computer programs known as “locality of reference”.
Analysis of programs indicates that many instructions in localized areas of a program are executed
repeatedly during some period of time, while the others are accessed relatively less frequently. These
instructions may be the ones in a loop, nested loop or few procedures calling each other repeatedly. This
is called “locality of reference”.
Temporal locality of reference:
Recently executed instruction is likely to be executed again very soon.
Spatial locality of reference:
Instructions with addresses close to a recently executed instruction are likely to be executed soon.
Processor issues a Read request; a block of words is transferred from the main memory to the
cache, one word at a time.
Subsequent references to the data in this block of words are found in the cache.
At any given time, only some blocks in the main memory are held in the cache. Which blocks in
the main memory are in the cache is determined by a “mapping function”.
When the cache is full, and a block of words needs to be transferred from the main memory, some
block of words in the cache must be replaced. This is determined by a “replacement algorithm”.
Cache Hit
Existence of a cache is transparent to the processor. The processor issues Read and
Write requests in the same manner.
If the data is in the cache it is called a Read or Write hit.
Read hit:
The data is obtained from the cache.
Write hit:
Cache has a replica of the contents of the main memory.
Contents of the cache and the main memory may be updated simultaneously. This is the
write-through protocol.
Update the contents of the cache, and mark it as updated by setting a bit known as the dirty
bit or modified bit. The contents of the main memory are updated when this block is
replaced. This is write-back or copy-back protocol.
Cache Miss
If the data is not present in the cache, then a Read miss or Write miss occurs.
Read miss:
Block of words containing this requested word is transferred from the memory.
After the block is transferred, the desired word is forwarded to the processor.
The desired word may also be forwarded to the processor as soon as it is transferred without
waiting for the entire block to be transferred. This is called load-through or early-restart.
Write miss:
If the write-through protocol is used, the contents of the main memory are updated
directly.
If write-back protocol is used, the block containing the addressed word is first brought into
the cache. The desired word is overwritten with new information.
Cache Coherence Problem
The copies of the data in the cache and the main memory are different. This is called the cache
coherence problem.
One option is to force a write-back before the main memory is updated from the disk.
Mapping functions
Mapping functions determine how memory blocks are placed in the cache.
A simple processor example:
Cache consisting of 128 blocks of 16 words each.
Total size of cache is 2048 (2K) words.
Main memory is addressable by a 16-bit address.
Main memory has 64K words.
Main memory has 4K blocks of 16 words each.
Three mapping functions:
Direct mapping
Associative mapping
Set-associative mapping.
Direct mapping
Block j of the main memory maps to block j modulo 128 of the cache, i.e. blocks 0 and 128 both map
to cache block 0, block 129 maps to block 1, and so on. More than one memory block is mapped onto
the same position in the cache. This may lead to contention for cache blocks even if the cache is not
full. The contention is resolved by allowing the new block to replace the old block, leading to a trivial
replacement algorithm.
Memory address is divided into three fields:
Low order 4 bits determine one of the 16 words in a block.
When a new block is brought into the cache, the next 7 bits determine which cache block
this new block is placed in.
High order 5 bits determine which of the possible 32 blocks is currently present in the
cache. These are tag bits.
This mapping is simple to implement, but not very flexible.
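The field split for this example cache can be sketched as follows (the address in the second usage line is a hypothetical value):

```python
# Illustrative sketch: splitting a 16-bit address for the example cache
# above (128 blocks of 16 words): 4-bit word, 7-bit block, 5-bit tag.
def direct_map_fields(address):
    word  = address & 0xF           # low 4 bits: word within the block
    block = (address >> 4) & 0x7F   # next 7 bits: cache block index
    tag   = (address >> 11) & 0x1F  # high 5 bits: tag
    return tag, block, word

# Memory block j (= address >> 4) maps to cache block j mod 128.
print(direct_map_fields(0x0000))
print(direct_map_fields(0x0810))   # hypothetical example address
```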
[Figure: direct-mapped cache]
Associative mapping
Main memory block can be placed into any cache position.
Memory address is divided into two fields:
Low order 4 bits identify the word within a block.
High order 12 bits or tag bits identify a memory block when it is resident in the cache.
Flexible, and uses cache space efficiently. Replacement algorithms can be used to replace an
existing block in the cache when the cache is full. Cost is higher than direct-mapped cache
because of the need to search all 128 tag patterns to determine whether a given block is in the cache.
Set-Associative mapping
Blocks of the cache are grouped into sets, and the mapping function allows a block of the main memory
to reside in any block of a specific set. For example, divide the cache into 64 sets, with two blocks per
set. Memory blocks 0, 64, 128, etc. map to set 0, and each can occupy either of the two positions within
that set.
[Figure: set-associative-mapped cache]
Memory address is divided into three fields:
6 bit field determines the set number.
High order 6 bit fields are compared to the tag fields of the two blocks in a set.
Set-associative mapping is a combination of direct and associative mapping.
Various replacement algorithms are used:
First-in-first-out replacement algorithm
Random replacement algorithm
Least-recently used replacement algorithm
Most-recently used replacement algorithm
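Least-recently-used replacement for one two-block set can be sketched as follows (illustrative; the reference sequence is chosen to show contention between blocks 0, 64, and 128, which map to the same set in the example above):

```python
# Illustrative sketch: LRU replacement for one 2-block set of the
# set-associative cache described above.
from collections import OrderedDict

def simulate_lru(block_refs, ways=2):
    set_contents = OrderedDict()  # least recently used entry first
    misses = 0
    for b in block_refs:
        if b in set_contents:
            set_contents.move_to_end(b)           # hit: now most recently used
        else:
            misses += 1
            if len(set_contents) == ways:
                set_contents.popitem(last=False)  # evict the LRU block
            set_contents[b] = True
    return misses

# Three conflicting blocks in a 2-way set thrash: every reference misses.
print(simulate_lru([0, 64, 128, 0, 64, 128]))
```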
3. Explain in detail about Virtual memories.
When a program does not completely fit into the main memory, the parts of it not currently being
executed are stored in secondary memory and transferred to main memory when required. The virtual
memory technique is used to extend the apparent size of the physical memory: it uses secondary
storage such as disks and automatically transfers program and data blocks into the physical memory
when they are required for execution.
Virtual memory organization
The segments which are currently being executed are kept in main memory, and the remaining
segments are stored in secondary storage such as magnetic disk. If a required segment is not in main
memory, an already-loaded segment is replaced by the required segment. In modern computers the
OS performs this task. The processor generates virtual (logical) addresses.
In addition to the above for each page, TLB must hold the virtual page number for each page.
High-order bits of the virtual address generated by the processor select the virtual page. These bits
are compared to the virtual page numbers in the TLB. If there is a match, a hit occurs and the
corresponding address of the page frame is read. If there is no match, a miss occurs and the page
table within the main memory must be consulted. Set-associative mapped TLBs are found in
commercial processors.
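A TLB lookup falling back to the page table can be sketched as a pair of mappings from virtual page number to page frame number (all numbers are made up; a 4 KB page size is assumed):

```python
# Illustrative sketch: address translation with a TLB hit/miss check,
# modeling both TLB and page table as dictionaries.
PAGE_OFFSET_BITS = 12  # assumed 4 KB pages

def translate(virtual_address, tlb, page_table):
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn in tlb:                      # TLB hit: frame read from the TLB
        frame = tlb[vpn]
    else:                               # TLB miss: consult the page table
        frame = page_table[vpn]
        tlb[vpn] = frame                # install the entry for next time
    return (frame << PAGE_OFFSET_BITS) | offset

tlb = {}
page_table = {0x2: 0x7, 0x3: 0x1}
print(hex(translate(0x2ABC, tlb, page_table)))  # miss; entry then cached
print(hex(translate(0x2DEF, tlb, page_table)))  # now a TLB hit
```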
Upon detecting a page fault by the MMU, following actions occur:
MMU asks the operating system to intervene by raising an exception.
Processing of the active task which caused the page fault is interrupted.
Control is transferred to the operating system.
Operating system copies the requested page from secondary storage to the main memory.
Once the page is copied, control is returned to the task which was interrupted.
Servicing of a page fault requires transferring the requested page from secondary storage to the
main memory.
This transfer may incur a long delay.
While the page is being transferred, the operating system may:
Suspend the execution of the task that caused the page fault.
Begin execution of another task whose pages are in the main memory.
This enables efficient use of the processor.
4. Discuss DMA controller with block diagram.
Direct Memory Access (DMA)
A special control unit may be provided to transfer a block of data directly between an I/O device
and the main memory, without continuous intervention by the processor.
Control unit which performs these transfers is a part of the I/O device's interface circuit. This
control unit is called as a DMA controller.
DMA controller performs functions that would be normally carried out by the processor:
For each word, it provides the memory address and all the control signals.
To transfer a block of data, it increments the memory addresses and keeps track of the
number of transfers.
DMA controller can transfer a block of data from an external device to the processor, without any
intervention from the processor.
However, the operation of the DMA controller must be under the control of a program
executed by the processor. That is, the processor must initiate the DMA transfer.
To initiate the DMA transfer, the processor informs the DMA controller of:
Starting address,
Number of words in the block.
Direction of transfer (I/O device to the memory, or memory to the I/O device).
Once the DMA controller completes the DMA transfer, it informs the processor by raising an
interrupt signal.
Two registers are used to store the starting address and the word count.
Status and control flag register:
R/W – determines the direction of the transfer; when set for a read, the controller transfers data
from the memory to the I/O device.
Done – set to 1 when the DMA controller completes the data transfer.
IE – when set to 1, causes the controller to raise an interrupt after completing the transfer.
IRQ – set to 1 when the controller has requested an interrupt.
DMA controller connects a high-speed network to the computer bus. Disk controller, which
controls two disks also, has DMA capability. It provides two DMA channels. It can perform two
independent DMA operations, as if each disk has its own DMA controller. The registers to store the
memory address, word count and status and control information are duplicated.
Processor and DMA controllers have to use the bus in an interwoven fashion to access the
memory.
DMA devices are given higher priority than the processor to access the bus.
Among different DMA devices, high priority is given to high-speed peripherals such as a
disk or a graphics display device.
Processor originates most memory access cycles on the bus.
DMA controller can be said to “steal” memory access cycles from the bus. This
interweaving technique is called as “cycle stealing”.
An alternate approach is to provide a DMA controller an exclusive capability to initiate transfers
on the bus, and hence exclusive access to the main memory. This is known as the block or burst
mode.
5. Explain in detail about the bus arbitration techniques in detail.
Bus arbitration
Processor and DMA controllers both need to initiate data transfers on the bus and access main
memory. The device that is allowed to initiate transfers on the bus at any given time is called the bus
master. When the current bus master relinquishes its status as the bus master, another device can
acquire this status. The process by which the next device to become the bus master is selected and bus
mastership is transferred to it is called bus arbitration.
Centralized arbitration:
A single bus arbiter performs the arbitration.
Distributed arbitration:
All devices participate in the selection of the next bus master.
Centralized Bus Arbitration
Bus arbiter may be the processor or a separate unit connected to the bus. Normally, the processor
is the bus master, unless it grants bus membership to one of the DMA controllers. DMA controller
requests the control of the bus by asserting the Bus Request (BR) line. In response, the processor
activates the Bus-Grant1 (BG1) line, indicating that the controller may use the bus when it is free.
BG1 signal is connected to all DMA controllers in a daisy chain fashion. BBSY signal is 0; it
indicates that the bus is busy. When BBSY becomes 1, the DMA controller which asserted BR can
acquire control of the bus.
Distributed arbitration
All devices waiting to use the bus share the responsibility of carrying out the arbitration
process.
Arbitration process does not depend on a central arbiter and hence distributed arbitration
has higher reliability.
Multicore Processor
[Figure: single-core computer vs. multi-core architectures]
The vector architectures easily capture the flexibility in data widths, so it is easy to make a vector
operation work on 32 64-bit data elements or 64 32-bit data elements or 128 16-bit data elements or 256
8-bit data elements. The parallel semantics of a vector instruction allows an implementation to execute
these operations using a deeply pipelined functional unit, an array of parallel functional units, or a
combination of parallel and pipelined functional units.
[Figure: (a) a single add pipeline completes one addition per clock cycle; (b) four add pipelines,
or lanes, complete four additions per clock cycle]
Vector arithmetic instructions usually only allow element N of one vector register to take part in
operations with element N from other vector registers. By construction of a highly parallel vector unit,
this can be structured as multiple parallel vector lanes. It can increase the peak throughput of a vector unit
by adding more lanes.
Multiple Instruction stream Single Data stream (MISD)
In computing, MISD (multiple instruction, single data) is a type of parallel computing architecture
where many functional units perform different operations on the same data. Pipeline architectures belong
to this type, though a purist might say that the data is different after processing by each stage in the
pipeline. Fault-tolerant computers executing the same instructions redundantly in order to detect and mask
errors, in a manner known as task replication, may be considered to belong to this type. Not many
instances of this architecture exist, as MIMD and SIMD are often more appropriate for common data
parallel techniques. Specifically, they allow better scaling and use of computational resources than MISD
does. However, one prominent example of MISD in computing is the Space Shuttle flight control
computers
Multiple Instruction Stream Multiple Data streams (MIMD)
Multiple autonomous processors are simultaneously executing different instructions on different
data. Distributed systems are generally recognized to be MIMD architectures; either exploiting a single
shared memory space or a distributed memory space. A multi-core superscalar processor is an MIMD
processor.
Running a program on two different desktop computers, you'd say that the faster one is the desktop computer that gets the job done first. If you were running a datacenter that had several servers running jobs submitted by many users, you'd say that the faster computer was the one that completed the most jobs during a day. As an individual computer user, you are interested in reducing response time, the time between the start and completion of a task, also referred to as execution time.
Datacenter managers are often interested in increasing throughput or bandwidth, the total amount of work done in a given time.
To maximize performance, we want to minimize response time or execution time for some task. Thus, we can relate performance and execution time for a computer X:

Performance_X = 1 / Execution time_X

This means that for two computers X and Y, if the performance of X is greater than the performance of Y, we have

Performance_X > Performance_Y
1 / Execution time_X > 1 / Execution time_Y
Execution time_Y > Execution time_X

That is, the execution time on Y is longer than that on X, if X is faster than Y.
In discussing a computer design, we often want to relate the performance of two different computers quantitatively. We will use the phrase "X is n times faster than Y", or equivalently "X is n times as fast as Y", to mean

Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
Measuring Performance:
The computer that performs the same amount of work in the least time is the fastest. Program execution time is measured in seconds per program. CPU execution time, or simply CPU time, recognizes this distinction: it is the time the CPU spends computing for this task, and it does not include time spent waiting for I/O or running other programs. CPU time can be further divided into the CPU time spent in the program, called user CPU time, and the CPU time spent in the operating system performing tasks on behalf of the program, called system CPU time.
We use the term system performance to refer to elapsed time on an unloaded system, and CPU performance to refer to user CPU time.
CPU Performance and Its Factors
A simple formula relates the most basic metrics, clock cycles and clock cycle time, to CPU time:

CPU execution time for a program = CPU clock cycles for the program × Clock cycle time

Alternatively, because clock rate and clock cycle time are inverses,

CPU execution time for a program = CPU clock cycles for the program / Clock rate

This formula makes it clear that the hardware designer can improve performance by reducing the number of clock cycles required for a program or the length of the clock cycle. As we will see in later chapters, the designer often faces a trade-off between the number of clock cycles needed for a program and the length of each cycle. Many techniques that decrease the number of clock cycles may also increase the clock cycle time.
Instruction Performance
Since the compiler clearly generated instructions to execute, and the computer had to execute the instructions to run the program, the execution time must depend on the number of instructions in a program.
The term clock cycles per instruction, which is the average number of clock cycles each instruction takes to execute, is often abbreviated as CPI. Since different instructions may take different amounts of time depending on what they do, CPI is an average over all the instructions executed in the program. Thus,

CPU clock cycles = Instructions for a program × Average clock cycles per instruction

and the basic performance equation becomes

CPU time = Instruction count × CPI × Clock cycle time

or, since the clock rate is the inverse of clock cycle time:

CPU time = (Instruction count × CPI) / Clock rate
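The basic performance equation can be sketched directly; the instruction count, CPI and clock rate below are assumed example values, chosen only to illustrate the arithmetic:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = (instruction count x CPI) / clock rate, as above."""
    return instruction_count * cpi / clock_rate_hz

# Assumed example: 10 billion instructions at CPI 2.0 on a 4 GHz clock.
t = cpu_time(10e9, 2.0, 4e9)   # 5.0 seconds
```

The same function makes the trade-off visible: halving CPI halves CPU time only if the clock rate is unchanged.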
3. Briefly explain about manufacturing process of integrated chips with neat diagram?
The manufacture of a chip begins with silicon, a substance found in sand. Because
silicon does not conduct electricity well, it is called a semiconductor. With a special chemical
process, it is possible to add materials to silicon that allow tiny areas to transform into one of
three devices:
Excellent conductors of electricity (using either microscopic copper or aluminum wire)
Excellent insulators from electricity (like plastic sheathing or glass)
Areas that can conduct or insulate under special conditions (as a switch)
Transistors fall in the last category. A VLSI circuit, then, is just billions of combinations of conductors, insulators, and switches manufactured in a single small package.
The figure shows the process for integrated chip manufacturing. The process starts with a silicon
crystal ingot, which looks like a giant sausage. Today, ingots are 8–12 inches in diameter and
about 12–24 inches long. An ingot is finely sliced into wafers no more than 0.1 inches thick.
These wafers then go through a series of processing steps, during which patterns of chemicals
are placed on each wafer, creating the transistors, conductors, and insulators.
The simplest way to cope with imperfection is to place many independent components on a
single wafer.
The patterned wafer is then chopped up, or diced, into these components, called dies and more informally known as chips. To reduce cost, the next-generation process shrinks a large die by using smaller sizes for both transistors and wires; this improves the yield and the die count per wafer.
Once you’ve found good dies, they are connected to the input/output pins of a package, using a process called bonding. These packaged parts are tested a final time, since mistakes can occur in packaging, and then they are shipped to customers.
The cost of an integrated circuit can be expressed in three simple equations:

Cost per die = Cost per wafer / (Dies per wafer × Yield)
Dies per wafer ≈ Wafer area / Die area
Yield = 1 / (1 + (Defects per area × Die area / 2))²
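A numeric sketch of the three cost equations (cost per die from wafer cost, dies per wafer, and yield); every parameter value below is assumed purely for illustration:

```python
def die_cost(wafer_cost, wafer_area, die_area, defects_per_area):
    """Cost per die from the three equations above (areas in cm^2)."""
    dies_per_wafer = wafer_area / die_area                        # approximation
    yield_ = 1.0 / (1.0 + defects_per_area * die_area / 2) ** 2
    return wafer_cost / (dies_per_wafer * yield_)

# Assumed values: a 1000-dollar, 100 cm^2 wafer of 1 cm^2 dies.
# With zero defects, yield is 1 and each die costs 10 dollars.
cost = die_cost(1000.0, 100.0, 1.0, 0.0)
```

Raising the defect density or the die area lowers the yield quadratically, which is why shrinking the die reduces cost per die so sharply.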
4. Write short notes on operations and operands in computer hardware?
The words of a computer’s language are called instructions, and its vocabulary is
called an instruction set.
Operations in MIPS:
Every computer must be able to perform arithmetic. The MIPS assembly language
notation add a, b, c instructs a computer to add the two variables b and c and to put their sum
in a.
This notation is rigid in that each MIPS arithmetic instruction performs only one
operation and must always have exactly three variables.
EXAMPLE: To add the four variables b, c, d, and e and store the sum in a:
add a, b, c # The sum of b and c is placed in a
add a, a, d # The sum of b, c, and d is now in a
add a, a, e # The sum of b, c, d, and e is now in a
Thus, it takes three instructions to sum the four variables.
Operands in MIPS:
The operands of arithmetic instructions are restricted; they must be from a limited
number of special locations built directly in hardware called registers. The size of a register in
the MIPS architecture is 32 bits; groups of 32 bits occur so frequently that they are given the
name word in the MIPS architecture.
Design Principle 2: Smaller is faster.
A very large number of registers may increase the clock cycle time simply because it
takes electronic signals longer when they must travel farther. So, 32 registers were used in MIPS
architecture. The MIPS convention is to use two-character names following a dollar sign to
represent a register. eg: $s0, $s1
Example: f = (g + h) – (i + j); instructions using registers.
add $t0,$s1,$s2 # register $t0 contains g + h
add $t1,$s3,$s4 # register $t1 contains i + j
sub $s0,$t0,$t1 # f gets $t0 – $t1, which is (g + h)–(i + j)
Memory Operands:
Programming languages have simple variables that contain single data elements, as in
these examples, but they also have more complex data structures—arrays and structures. These
complex data structures can contain many more data elements than there are registers in a
computer. The processor can keep only a small amount of data in registers, but computer
memory contains billions of data elements. So, MIPS must include instructions that transfer
data between memory and registers.
Such instructions are called data transfer instructions. To access a word in memory, the instruction must supply the memory address.
In MIPS, words must start at addresses that are multiples of 4. This requirement is called an alignment restriction, and many architectures have it (since in MIPS each word in memory is 32 bits, the addresses of successive words differ by 4).
Byte addressing also affects the array index. To get the proper byte address for A[8], the offset added to the base register $s3 must be 4 × 8, or 32.
EXAMPLE: g = h + A[8]; (implemented using byte addresses)
To get A[8] from memory, use lw with offset 8 × 4 = 32:
lw $t0,32($s3) # Temporary reg $t0 gets A[8]
Then use the result of A[8], now in $t0:
add $s1,$s2,$t0 # g = h + A[8]
The instruction complementary to load is traditionally called store; it copies data from a
register to memory.
The format of a store is similar to that of a load: the name of the operation, followed by
the register to be stored, then off set to select the array element, and finally the base register.
Once again, the MIPS address is specified in part by a constant and in part by the contents of a
register. The actual MIPS name is sw, standing for store word.
EXAMPLE: A[12] = h + A[8];
lw $t0, 32($s3) # Temporary reg $t0 gets A[8], note (8 x4) used.
add $t0, $s2,$t0 # Temporary reg $t0 gets h + A[8]
sw $t0, 48($s3) # Stores h + A[8] back into A[12], note (12 x 4) used.
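The byte-offset arithmetic in the lw/sw examples above can be sketched as a minimal simulation (this is illustrative Python, not MIPS; the base-register value of 0 is an assumption):

```python
WORD = 4  # bytes per MIPS word

def lw(memory, base, offset):
    """Load the word at byte address base + offset (word aligned)."""
    return memory[(base + offset) // WORD]

def sw(memory, base, offset, value):
    """Store value at byte address base + offset (word aligned)."""
    memory[(base + offset) // WORD] = value

memory = [0] * 32                # 32 words of simulated memory
base = 0                         # assume the base register $s3 holds 0
sw(memory, base, 8 * WORD, 5)    # pretend A[8] = 5 is already in memory
h = 7                            # pretend $s2 holds h
t0 = lw(memory, base, 32)        # lw $t0,32($s3)  -> A[8]
sw(memory, base, 48, h + t0)     # sw $t0,48($s3)  -> A[12] = h + A[8]
```

Note how the array indices 8 and 12 become the byte offsets 32 and 48, exactly as in the assembly comments.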
5. Write short notes on Instructions and its types that are used in MIPS?
or
List the three MIPS instruction formats used to represent the instructions?
R-type Instructions
J-type Instructions
I-type Instructions
1. Immediate addressing, where the operand is a constant within the instruction itself
Example : Constant data specified in an instruction
addi $s3, $s3, 4
2. Register addressing, where the operand is a register
Example : add $t0, $s1, $s2
All the operands of the instruction are registers. It adds the values of $s1 and $s2
and stores the result in $t0.
3. Base or displacement addressing, where the operand is at the memory location whose
address is the sum of a register and a constant in the instruction
Example : lw $t0, 32($s1)
The address of the operand is the sum of the offset value (32) and the contents of the register ($s1).
4. PC-relative addressing, where the branch address is the sum of the PC (Program Counter
) and a constant in the instruction
Example : beq $s1, $s2, Label
5. Pseudodirect addressing, where the jump address is the 26 bits of the instruction
concatenated with the upper bits of the PC
Example: Direct addressing would mean specifying a complete 32-bit address in the instruction itself. However, since MIPS instructions are themselves 32 bits, we cannot do that. In theory, only 30 bits are needed to specify the address of an instruction in memory (the low two bits are always 00). However, MIPS uses 6 bits for the opcode, so there still are not enough bits for true direct addressing.
The J-format instruction holds the opcode in bits 31-26 (6 bits) and a 26-bit target in bits 25-0. The pseudodirect address is computed as
PC ← PC[31:28] :: IR[25:0] :: 00
Take the top 4 bits of the PC, concatenate them with the 26-bit target, and concatenate that with 00. This produces a 32-bit address, which becomes the new PC.
This allows you to jump to 1/16 of all possible MIPS addresses (since you don't control the top 4 bits of
the PC).
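The concatenation rule can be sketched in Python; the instruction word in the example is an assumed illustrative encoding (opcode 000010 with a target field of 0x0100000):

```python
def jump_target(pc, instruction):
    """PC <- PC[31:28] :: IR[25:0] :: 00, as described above."""
    target = instruction & 0x03FFFFFF         # low 26 bits of the instruction
    return (pc & 0xF0000000) | (target << 2)  # top 4 PC bits :: target :: 00

# Assumed example: a jump whose 26-bit target field is 0x0100000,
# executed with the PC in the 0x4xxxxxxx region.
addr = jump_target(0x40001000, 0x08000000 | 0x0100000)
```

Because the top 4 bits come from the PC, the jump can only reach 1/16 of the address space, matching the limitation noted above.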
Weak scaling
Speedup achieved on a multiprocessor while increasing the size of the problem proportionally to the
increase in the number of processors is called weak scaling.
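Weak scaling is commonly quantified with Gustafson's law (named here as an aside; the text above only defines the term): with serial fraction s, the scaled speedup on N processors is N − s(N − 1).

```python
def scaled_speedup(n_procs, serial_fraction):
    """Gustafson's law: speedup when the problem grows with the processors."""
    return n_procs - serial_fraction * (n_procs - 1)

# 64 processors with a 5% serial fraction still achieve about 60.9x.
s = scaled_speedup(64, 0.05)
```

Contrast this with strong scaling, where the problem size is fixed and the serial fraction limits speedup much more severely.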
PART - B (5 × 16 = 80 Marks)
11. (a) Discuss about the various techniques to represent instructions in a computer system. (Refer Section
2.4) (16)
Or
(b) What is the need for addressing in a computer system? Explain the different addressing modes with
suitable examples. (16)
(Refer Section 2.7)
12. (a) Explain the sequential version of multiplication algorithm and its hardware. (16)
(Refer Section 3.3.1)
Or
(b) Explain how floating point addition is carried out in a computer system. Give an example for a
binary floating point addition. (16)
(Refer Section 3.5.4)
13. (a) Explain the different types of pipeline hazards with suitable examples (16)
(Refer Section 4.5.1)
Or
(b) Explain in detail how exceptions are handled in MIPS architecture? (16)
(Refer Section 4.9)
14. (a) Discuss about SISD, MIMD, SIMD, SPMD and VECTOR systems. (16)
(Refer Section 5.4)
Or
(b) What is hardware multithreading? Compare and contrast fine-grained multithreading
and coarse-grained multithreading. (16)
(Refer Section 5.5)
15. (a) Elaborate on the various memory technologies and its relevance. (16)
(Refer Section 6.3)
Or
(b) What is virtual memory?
Explain the steps involved in virtual memory address translation (16)
(Refer Section 6.5)
B.E/B.Tech. DEGREE EXAMINATION, MAY/JUNE 2016
Computer Science and Engineering
CS 6303 – COMPUTER ARCHITECTURE
(Regulation 2013)
Time : Three Hours Maximum : 100 Marks
Answer ALL questions
PART – A(10*2=20 Marks)
1. How are instructions represented in a computer system?
Instructions are kept in the computer as a series of high and low electronic signals and may be represented as numbers.
Since registers are referred to in instructions, there must be a convention to map register names into numbers. Three types
of instruction formats are used in MIPS: R-type, I-type, and J-type.
3. Define ALU.
An arithmetic logic unit (ALU) is a digital circuit used to perform arithmetic and logic operations. It represents the
fundamental building block of the central processing unit (CPU) of a computer.
4. What is Subword Parallelism?
By partitioning the carry chains within a 128-bit adder, a processor could use parallelism to perform simultaneous
operations on short vectors of sixteen 8-bit operands, eight 16-bit operands, four 32-bit operands, or two 64-bit
operands. The cost of such partitioned adders was small. Given that the parallelism occurs within a wide word, the
extensions are classified as subword parallelism.
6. What is Exception?
Exceptions are also called interrupts. An exception is an unscheduled event that disrupts program execution; it is also used to
detect an overflow condition. Events other than branches or jumps that change the normal flow of instruction
execution come under exceptions.
12. (a) Explain briefly about floating point addition and Subtraction algorithms. (16)
Refer section 3.5.4
(b) Define Booth Multiplication algorithm with suitable example. (16)
(Out of syllabus)
Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary addition) one of two
predetermined values A and S to a product P, then performing a rightward arithmetic shift on P. Let m and r be the
multiplicand and multiplier, respectively, and let x and y represent the number of bits in m and r.
1. Determine the values of A and S, and the initial value of P. All of these numbers should have a length equal to
(x + y + 1).
A: Fill the most significant (leftmost) bits with the value of m. Fill the remaining (y + 1) bits with zeros.
S: Fill the most significant bits with the value of (−m) in two's complement notation. Fill the remaining
(y + 1) bits with zeros.
P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill the least
significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
If they are 01, find the value of P + A. Ignore any overflow.
If they are 10, find the value of P + S. Ignore any overflow.
If they are 00, do nothing. Use P directly in the next step.
If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in the 2nd step by a single place to the right. Let P now equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
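The five steps can be transcribed directly into Python; this is a sketch of the procedure above (bit widths x and y passed explicitly), not production multiplier code:

```python
def booth_multiply(m, r, x, y):
    """Booth's algorithm, following the steps above.

    m: multiplicand, r: multiplier; x, y: their bit widths.
    A, S and P are (x + y + 1)-bit values.
    """
    n = x + y + 1
    mask = (1 << n) - 1
    A = (m & ((1 << x) - 1)) << (y + 1)     # m in the most significant x bits
    S = ((-m) & ((1 << x) - 1)) << (y + 1)  # -m (two's complement) likewise
    P = (r & ((1 << y) - 1)) << 1           # r in the middle, 0 in the LSB
    for _ in range(y):
        pair = P & 0b11                     # two least significant bits of P
        if pair == 0b01:
            P = (P + A) & mask              # 01: add A, ignore overflow
        elif pair == 0b10:
            P = (P + S) & mask              # 10: add S, ignore overflow
        # 00 or 11: leave P unchanged
        sign = P & (1 << (n - 1))           # arithmetic shift right by one,
        P = (P >> 1) | sign                 # preserving the sign bit
    P >>= 1                                 # drop the least significant bit
    if P & (1 << (x + y - 1)):              # reinterpret as a signed value
        P -= 1 << (x + y)
    return P
```

For example, booth_multiply(14, -7, 6, 6) reproduces the 001110 × 111001 worked example that follows in this document.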
13. (a) What is pipelining? Discuss about pipelined data path and control. (16)
Refer section 4.6
(b) Briefly explain about various categories of hazards with examples. (16)
Refer section 4.7 and 4.8
14. (a) Explain in detail about Flynn’s classification. (16)
Refer section 5.4
(b) Write short notes on: (16)
(i) Hardware multithreading Refer section 5.5
(ii) Multicore processors. Refer section 5.6
15. (a) Define Cache Memory ? Explain the various mapping techniques associated with cache memories (16)
Refer section 6.4.3
(b) Explain about DMA controller, with the help of a block diagram. (16)
Refer section 7.3
B.E/B.TECH. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2014.
Third Semester
Computer Science and Engineering
CS 6303 – COMPUTER ARCHITECTURE
(Regulation 2013)
Time : Three hours Maximum : 100 marks
Answer ALL questions
PC-relative Addressing
Branch instructions use the I-format: op (6 bits) | rs (5 bits) | rt (5 bits) | Address (16 bits). The 16-bit address field holds a word offset that is added to the PC to form the branch target.
Example:
bne $s0,$s1,Exit # go to Exit if $s0 ≠ $s1
In the above example the branch address is calculated by adding the constant in the instruction to the PC value.
4. What is DMA?
The CPU is responsible for only initiating each block transfer. Then, the interface controller can
take the control and responsibility of transferring data. So that data can be transferred without the
intervention of CPU. The CPU and IO controller interacts with each other only when the control
of bus is requested. This level of IO control is called Direct Memory Access (DMA).
6. What is Exception?
Exceptions are also called interrupts. An exception is an unscheduled event that disrupts program execution;
it is also used to detect an overflow condition. Events other than branches or jumps that change
the normal flow of instruction execution come under exceptions.
Flynn's classification by instruction and data streams:

                              Data Streams
                              Single                    Multiple
Instruction   Single          SISD: Intel Pentium 4     SIMD: SSE instructions of x86
Streams       Multiple        MISD: No examples         MIMD: Intel Core i7
IEEE 754 double precision format (64 bits, stored as two 32-bit words):
First word: S (bit 31, 1 bit) | Exponent (bits 30-20, 11 bits) | Fraction (bits 19-0, 20 bits)
Second word: Fraction, continued (32 bits)
5. What is a hazard? What are its types?
There are situations in pipelining when the next instruction cannot execute in the following clock cycle. These
events are called hazards, and there are three different types:
(i) Structural hazard
(ii) Data hazard
(iii) Control hazard
7. What is ILP?
Instruction Level Parallelism (ILP) is a measure of number of instructions that can be performed
simultaneously during a single clock cycle. The potential overlap among instructions is called as instruction
level parallelism.
PART - B (5 × 16 = 80 Marks)
11. a) Explain in detail the various components of a computer system with a neat diagram. (16)
b) What is an addressing mode? Explain the various addressing modes with suitable examples.
(16)
b) Discuss in detail the division algorithm with diagram and examples. (16)
(Refer Section 3.4.1)
Page no 3.34 -3.37
13. a) Explain the basics MIPS implementation with necessary multiplexers and control lines.( 16)
(OR)
b) Explain how the instruction pipeline works. What are the various situations where an instruction pipeline can
stall? Illustrate with an example. (16)
b) Draw the typical block diagram of a DMA Controller and explain how it is used for direct data transfer between
memory and peripherals? (16)
13 (a) Discuss the modified data path to accommodate pipelined executions with a diagram.
Refer section 4.6.1 (Page No. 4.26)
Or
(b) (i) Explain the hazards caused by unconditional branching statements.
Refer section 4.8 (Page No. 4.38 – 4.44)
(ii) Describe operand forwarding in a pipeline processor with a diagram.
Refer section 4.7.1 (Page No. 4.32 – 4.34)
14 (a) (i) Discuss the challenges in parallel processing with necessary examples.
Refer section 5.3 (Page No. 5.12 – 5.13)
(ii) Explain Flynn’s classification of parallel processing with necessary diagrams.
Refer section 5.4 (Page No. 5.16 – 5.20)
Or
(b) Explain the four principal approaches to multithreading with necessary diagrams.
Refer section 5.5 (Page No. 5.23 – 5.26)
15 (a) Explain the different mapping functions that can be applied on cache memories in detail.
Refer section 6.4.3 (Page No. 6.23 – 6.30)
Or
(b) (i) Explain virtual memory address translation in detail with necessary diagrams.
Refer section 6.5.2 (Page No. 6.44 – 6.46)
(ii) What is meant by Direct Memory Access?Explain the use of DMA controllers in a computer
system.
Refer section 7.3 (Page No. 7.7 – 7.11)
PART-C
16 (A) (i) Explain mapping functions in cache memory to determine how memory blocks are placed in
cache.
Refer section 6.4.3 (Page No. 6.23 – 6.30)
(ii) Explain in detail about the Bus Arbitration techniques in DMA.
Refer section 7.3.4 (Page No. 7.11 – 7.15)
Or
(b) A pipelined processor uses the delayed branch technique. Recommend one of the following two
possibilities for the design of the processor. In the first possibility, the processor has a 4-stage pipeline and
one delay slot. In the second possibility, it has a 6-stage pipeline and two delay slots. Compare the
performance of these two alternatives, taking only the branch penalty into account. Assume that 20% of the
instructions are branch instructions and that an optimizing compiler has an 80% success rate in filling
the single delay slot. For the second alternative, the compiler is able to fill the second slot 25% of the time.
ANSWER
In the first alternative, each branch has one delay slot, filled 80% of the time, so the average
penalty per branch is 0.2 × 1 = 0.2 cycles; with 20% branch instructions, the penalty per
instruction is 0.2 × 0.2 = 0.04 cycles.
In the second alternative, assume the first slot is filled with the same 80% success rate and the
second slot 25% of the time. The average penalty per branch is 0.2 × 1 + 0.75 × 1 = 0.95 cycles,
and the penalty per instruction is 0.2 × 0.95 = 0.19 cycles.
Since 0.04 < 0.19, the first alternative (4-stage pipeline with one delay slot) incurs the smaller
branch penalty and is the better choice.
Question Paper Code : 40902
B.E/B.Tech. DEGREE EXAMINATION, APRIL/MAY 2018
Third/Fifth/Sixth Semester
Computer Science and Engineering
CS 6303 – COMPUTER ARCHITECTURE
(Common to : Electronics and Communication Engineering /
Electronics and Instrumentation Engineering / Instrumentation and
Control Engineering / Robotics and Automation Engineering /
Information Technology)
(Regulation 2013)
Time : Three Hours Maximum : 100 Marks
Answer ALL questions
PART – A (10 × 2=20 Marks)
1. Write the equation for the dynamic power required per transistor.
The power required per transistor is just the product of energy of a
transition and the frequency of transitions:
Power ∝ 1/2 × Capacitive load × Voltage² × Frequency switched
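The proportionality can be sketched numerically; the capacitance, voltage, and frequency values below are assumed, purely for illustration:

```python
def dynamic_power(cap_load, voltage, freq):
    """Dynamic power per transistor: 1/2 x C x V^2 x f (relative units)."""
    return 0.5 * cap_load * voltage ** 2 * freq

# Dropping the supply voltage from 5.0 V to 3.3 V at the same capacitance
# and frequency scales power by (3.3/5.0)^2, about 44% of the original.
ratio = dynamic_power(1.0, 3.3, 1.0) / dynamic_power(1.0, 5.0, 1.0)
```

The quadratic voltage term is why historical voltage scaling was the dominant lever for reducing power.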
2. Classify the instructions based on the operations they perform
and give one example to each category.
Classifications of Instructions:
Arithmetic – add $s1,$s2,$s3
Data transfer - lw $s1,20($s2)
Logical – and $s1,$s2,$s3
Conditional branch - beq $s1,$s2, 25
Unconditional branch – j 2500
3. Show the IEEE 754 binary representation of the number (-0.75)10 in single precision.
The number (-0.75)10 is represented in binary as -0.11two × 2^0, and in normalized
scientific notation it is -1.1two × 2^-1.
The general representation for a single precision number is,
Solved Anna University Question Papers SQ.71
(-1)^S × (1 + Fraction) × 2^(Exponent - 127)
Adding the bias 127 to the exponent of -1.1two × 2^-1 gives a stored exponent field of 126, yielding
(-1)^1 × (1 + .1000 0000 0000 0000 0000 000two) × 2^(126 - 127)
The single precision binary representation of -0.75ten is then
1 | 0111 1110 | 1000 0000 0000 0000 0000 000
(1 sign bit, 8 exponent bits, 23 fraction bits)
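The worked bit pattern can be checked mechanically with Python's struct module:

```python
import struct

# Pack -0.75 as a big-endian IEEE 754 single and inspect the 32 bits.
bits = struct.unpack(">I", struct.pack(">f", -0.75))[0]
sign = bits >> 31                  # 1
exponent = (bits >> 23) & 0xFF     # 126 = 0b01111110
fraction = bits & 0x7FFFFF         # only the top fraction bit is set
```

This confirms the hand derivation: sign 1, exponent field 126, fraction 100...0.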
PART – B (5 × 13 = 65 Marks)
11. a) i) Consider three different processors P1, P2 and P3 executing
the same instruction set. P1 has a 3 GHz clock rate and a
CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0.
P3 has a 4.0 GHz clock rate and a CPI of 2.2.
ii) Multiply the following signed 2's complement numbers using the Booth
algorithm: A = 001110 and B = 111001, where A is the multiplicand and
B is the multiplier. (8)
A = 001110 (Multiplicand – m)
B = 111001 (Multiplier Q)
-M = 110010
Step  M       A       Q       Q-1   Action
0     001110  000000  111001  0     Initial values (-M = 110010)
1     001110  110010  111001  0     Q0Q-1 = 10: A = A - M
      001110  111001  011100  1     Arithmetic shift right A,Q,Q-1
2     001110  000111  011100  1     Q0Q-1 = 01: A = A + M
      001110  000011  101110  0     Arithmetic shift right A,Q,Q-1
3     001110  000011  101110  0     Q0Q-1 = 00: no operation
      001110  000001  110111  0     Arithmetic shift right A,Q,Q-1
4     001110  110011  110111  0     Q0Q-1 = 10: A = A - M
      001110  111001  111011  1     Arithmetic shift right A,Q,Q-1
5     001110  111001  111011  1     Q0Q-1 = 11: no operation
      001110  111100  111101  1     Arithmetic shift right A,Q,Q-1
6     001110  111100  111101  1     Q0Q-1 = 11: no operation
      001110  111110  011110  1     Arithmetic shift right A,Q,Q-1
Product = AQ
Product = 111110 011110 = (-98)10, which agrees with 14 × (-7) = -98.
(or)
b) i) Draw the block diagram of integer divider and explain
the division algorithm. (5)
Refer section 3.4.1
ii) Add the numbers (0.75)10 and (-0.275)10 in binary using
the Floating point addition algorithm. (8)
PART – C (1 × 15 = 15 Marks)
[Pipeline stage diagrams (IF, ID, EX, MEM, WB) for instructions involving registers r1 and r2; the figure is not recoverable from the source.]
(or)
PART- A
PART-B
1. i)Discuss in detail about Eight great ideas of computer Architecture.(8)
ii) Explain in detail about Technologies for Building Processors and Memory (8)
2. Explain the various components of computer System with neat diagram (16)
3. Discuss in detail the various measures of performance of a computer(16)
4. Define Addressing mode and explain the basic addressing modes with an example for
each.
5. Explain operations and operands of computer Hardware in detail (16)
6. i)Discuss the Logical operations and control operations of computer (12)
ii)Write short notes on Power wall(6)
7. Consider three different processors P1, P2, and P3 executing the same instruction
set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI
of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.
a. Which processor has the highest performance expressed in instructions per second?
b. If the processors each execute a program in 10 seconds, find the number of
cycles and the number of instructions.
c. We are trying to reduce the execution time by 30% but this leads to an increase
of 20% in the CPI. What clock rate should we have to get this time reduction?
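The computation for parts (a)-(c) of question 7 can be sketched in plain Python, using the clock rates and CPIs from the question:

```python
# Clock rate (Hz) and CPI for each processor, from the question above.
procs = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}

# (a) Performance in instructions per second = clock rate / CPI.
ips = {p: rate / cpi for p, (rate, cpi) in procs.items()}
fastest = max(ips, key=ips.get)       # P2, at 2.5e9 instructions/second

# (b) Running for 10 s: cycles = rate x 10, instructions = cycles / CPI.
cycles = {p: rate * 10 for p, (rate, _) in procs.items()}
instructions = {p: cycles[p] / cpi for p, (_, cpi) in procs.items()}

# (c) Target time is 70% of 10 s = 7 s, with CPI increased by 20%:
# required clock rate = instructions x (1.2 x CPI) / 7.
new_rate = {p: instructions[p] * 1.2 * cpi / 7 for p, (_, cpi) in procs.items()}
```

For P1, for instance, this gives 2 × 10^10 instructions in 10 s and a required clock rate of about 5.14 GHz for part (c).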
1. Add (6)10 to (7)10 in binary and subtract (6)10 from (7)10 in binary
2. Write the overflow conditions for addition and subtraction.
3. Draw the Multiplication hardware diagram
4. List the steps of multiplication algorithm
5. What is fast multiplication?
6. List the steps of division algorithm
7. What is scientific notation and normalization? Give an example
8. Give the representation of single precision floating point number
9. Define overflow and under flow with examples
10. Give the representation of double precision floating point number
11. What are the floating point instructions in MIPS?
12. What are the steps of floating point addition?
13. List the steps of floating point multiplication
14. Define – Guard and Round
15. Write the IEEE 754 floating point format.
16. What is meant by sub-word parallelism?
17. Multiply 1000ten × 1001ten.
18. Divide 1,001,010ten by 1000ten.
19.For the following C statement, what is the corresponding MIPS assembly code?
f = g + (h − 5)
20.For the following MIPS assembly instructions above, what is a corresponding
C statement?
add f, g, h
add f, i, f
PART- B
PART-A
PART B
5. Explain how the instruction pipeline works. What are the various situations where an
instruction pipeline can stall? What can be its resolution?
6. What is a data hazard? How do you overcome it? What are its side effects?
7. Discuss the data and control path methods in pipelining
8. Explain dynamic branch prediction
9. How are exceptions handled in MIPS?
10. Explain in detail about building a datapath
11. Explain in detail about control implementation scheme
UNIT IV PARALLELISM
PART-A
PART- B
1. Explain Instruction level parallelism
2. Explain the difficulties faced by parallel processing programs
3. Explain shared memory multiprocessor
4. Explain in detail Flynn’s classification of parallel hardware
5. Explain cluster and other Message passing Multiprocessor
6. Explain in detail hardware Multithreading
7. Explain SISD and MIMD
8. Explain SIMD and SPMD
9. Explain Multicore processors
10. Explain the different types of multithreading
PART-A
PART- B
1. Explain in detail about memory Technologies
2. Explain in detail about memory hierarchy with neat diagram
3. Describe the basic operations of cache in detail with diagram
4. Discuss the various mapping schemes used in cache design(10)
A byte-addressable computer has a small data cache capable of holding eight 32-bit words.
Each cache block contains one 32-bit word. When a given program is executed, the processor
reads data from the following sequence of hex addresses: 200, 204, 208, 20C, 2F4, 2F0,
200, 204, 218, 21C, 24C, 2F4. The pattern is repeated four times. Assuming that the cache is
initially empty, show the contents of the cache at the end of each pass, and compute the hit
rate for a direct-mapped cache. (6)
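A minimal simulator for this exercise, assuming one-word (4-byte) blocks and direct mapping as stated (a sketch, not a full cache model with valid bits or write handling):

```python
def direct_mapped_hits(addresses, num_blocks=8):
    """Count hits for a direct-mapped cache of one-word (4-byte) blocks."""
    tags = [None] * num_blocks
    hits = 0
    for addr in addresses:
        word = addr >> 2              # byte address -> word address
        index = word % num_blocks     # block index
        tag = word // num_blocks
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag         # miss: load the block
    return hits

pattern = [0x200, 0x204, 0x208, 0x20C, 0x2F4, 0x2F0,
           0x200, 0x204, 0x218, 0x21C, 0x24C, 0x2F4]
hits = direct_mapped_hits(pattern * 4)   # the pattern repeated four times
hit_rate = hits / 48                     # 33/48 = 0.6875
```

Addresses 20C and 24C map to the same index (3) with different tags, so they evict each other every pass, which is what keeps the hit rate below 100% after the first pass.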
5. Discuss the methods used to measure and improve the performance of the cache.
6. Explain the virtual memory address translation and TLB with necessary diagram.
7. Draw the typical block diagram of a DMA controller and explain how it is
used for direct data transfer between memory and peripherals.
8. Explain in detail about interrupts with diagram
9. Describe in detail about programmed Input/Output with neat diagram
10.Explain in detail about I/O processor.