
CAO

Lecture-5&6

Division of Signed Numbers


Let,
M = n-bit divisor
AQ = 2n-bit dividend

When the algorithm terminates, the remainder is found in A and the quotient in Q.
If the signs of the divisor and dividend disagree, the 2's complement of the number in Q must be taken to obtain the actual quotient.

Division of Signed Numbers

Example 1
-5 ÷ 2
Represent both divisor and dividend in 2's complement notation.
M = 2 = 0010
Q = -5 = 1011, so AQ = 11111011


Example 2
+7 ÷ (-4)
M = -4 = 1100
Q = +7 = 0111
AQ = 00000111
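The two examples above can be checked with a short Python sketch. This is a simplified software model of the sign-correction rule the slides describe (divide the magnitudes, then 2's-complement the quotient when the signs disagree), not the bit-serial hardware algorithm itself; the helper name `twos` is chosen here for illustration.

```python
def signed_divide(dividend, divisor):
    """Divide magnitudes, then apply the slide's sign rule:
    if the signs of dividend and divisor disagree, take the
    2's complement (negate) of the quotient. The remainder
    keeps the sign of the dividend (truncating division)."""
    q = abs(dividend) // abs(divisor)       # quotient magnitude
    r = abs(dividend) % abs(divisor)        # remainder magnitude
    if (dividend < 0) != (divisor < 0):     # signs disagree
        q = -q                              # "2's complement of Q"
    if dividend < 0:
        r = -r
    return q, r

def twos(x, bits):
    """Bit pattern of x in `bits`-bit 2's complement notation."""
    return format(x & ((1 << bits) - 1), f"0{bits}b")

# Example 1: -5 / 2   -> quotient -2, remainder -1
# Example 2: +7 / -4  -> quotient -1, remainder  3
# twos(-5, 4) -> "1011"; twos(-5, 8) -> "11111011" (the AQ pattern)
```

Note that the quotient of +7 ÷ (-4) is -1 with remainder +3, consistent with truncating (sign-and-magnitude) division rather than Python's floor division.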

Parallel Processing
Parallel processing is the use of a collection of integrated and tightly coupled processing elements or processors that cooperate and communicate on a single task to speed up its solution.

Motivation
Higher Speed, or Solving Problems Faster
This is important when applications have hard or soft deadlines (real-time systems).
For example, we have at most a few hours to do a 24-hour weather forecast or to produce a timely tornado warning.
Higher Throughput, or Solving More Instances of Given Problems
E.g. transaction processing for banks and airlines.
Higher Computational Power, or Solving Larger Problems
Generate more detailed, accurate, and longer simulations, e.g. 5-day weather forecasting.

Limitations to Uniprocessor Improvement
Speed of Light
For each instruction executed, the processor must fetch the instruction and move it to the IR inside the processor. The time to complete this operation is bounded by the speed of propagation of electromagnetic signals: 3 × 10^8 meters per second. (Actually, that is the speed of light in vacuum; the speed of signals through silicon is lower.) If the distance between the processor and the memory unit is 30 cm, it takes about one billionth of a second (1 ns) for an instruction to travel from memory to the processor.
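The back-of-the-envelope figure in this slide can be checked directly; a minimal sketch using the slide's own numbers:

```python
# Propagation-delay estimate from the slide's figures.
c = 3e8            # speed of light in vacuum, m/s (upper bound on signal speed)
distance = 0.30    # processor-to-memory distance, metres (30 cm)

delay = distance / c   # ~1e-9 s, i.e. about one nanosecond per trip
```

At a 1 GHz clock (1 ns cycle time), a single memory round trip over this distance would already consume on the order of a clock cycle, which is the point of the slide.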

Limitations to Uniprocessor Improvement
Limits on Miniaturization
Though miniaturization leads to faster processing, as it increases the switching speed of the transistors, it cannot continue indefinitely: a transistor needs some physical space on the chip and cannot shrink away entirely.

Power dissipation issues due to increased clock rate
Increased clock rate results in staggering power-consumption density on the chip.
Clock rates are now stagnating to counter the increased power consumption.
Stagnating clock rates are being compensated for by multiple processor cores on the same chip, i.e. multicore architectures.
Consequently, the uniprocessor is now disappearing even from desktops, making it imperative for programmers to learn parallel programming techniques to exploit the hardware parallelism available in state-of-the-art machines.

Von Neumann Bottleneck
The speed disparity between processor and memory is growing with the passage of time, causing huge performance bottlenecks.
A parallel system (e.g. a cluster) overcomes this shortcoming by providing more aggregate memory and cache capacity, as well as boosting the memory bandwidth required by HPC applications.
Some of the fastest-growing applications of parallel computing exploit not their raw computational speed but rather their ability to pump data to memory and disk faster.

Parallelism in a Uniprocessor System


Multiprogramming & Timesharing
1. In multiprogramming, several processes reside in main memory and the CPU switches from one process (say P1) to another (say P2) when the currently running process (P1) blocks for an I/O operation. The I/O operation for P1 is handled by a DMA unit while the CPU runs P2.
2. In timesharing, processes are assigned slices of the CPU's time. The CPU executes the processes in round-robin fashion.
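The round-robin idea can be illustrated with a toy scheduler. This is an illustrative sketch, not an OS implementation; the process names and burst times are made up for the example.

```python
from collections import deque

def round_robin(bursts, quantum):
    """Toy round-robin scheduler. `bursts` maps each process name
    to its remaining CPU time; each turn runs at most `quantum`
    time units, and unfinished processes rejoin the ready queue."""
    ready = deque(bursts.items())
    schedule = []                        # (process, time run) per turn
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)    # one time slice
        schedule.append((name, run))
        if remaining > run:              # not finished: back of the queue
            ready.append((name, remaining - run))
    return schedule

# round_robin({"P1": 3, "P2": 5}, quantum=2)
# -> [("P1", 2), ("P2", 2), ("P1", 1), ("P2", 2), ("P2", 1)]
```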

Parallelism in a Uniprocessor System


Multiplicity of Functional Units
The use of multiple functional units, such as multiple adders, multipliers, or even multiple ALUs, to provide concurrency is not a new idea in the uniprocessor environment; it has been around for decades.
Harvard Architecture
This provides separate memory units for instructions and data, which effectively doubles the memory bandwidth, saving CPU time. An example is a split cache, in which instructions are kept in the I-cache and data in the D-cache.
In contrast, when instructions and data are kept in the same memory, the architecture is called the Princeton Architecture. Examples are a unified cache, main memory, etc.

Parallelism in a Uniprocessor System


Memory Hierarchy
A parallel-processing mechanism supported by the memory hierarchy is the simultaneous transfer of instructions/data between (CPU, cache) and (main memory, secondary memory).

Instruction Pipelining
The EX (execute) stage is marked by ALU usage, be it for adding two register operands, calculating an operand address, testing the condition for a branch instruction, etc.
In the M (memory) stage, an instruction either reads or writes a data element from/to memory.
The result produced by an instruction is written back to a register in the WB (write-back) stage.
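The slides that follow (Non-Pipelined Execution, Speedup, Instruction Throughput, CPI) appear to have lost their figures in extraction. As a hedged sketch of the standard textbook analysis they likely covered: an ideal k-stage pipeline executes n instructions in k + (n − 1) cycles (k cycles to fill the pipeline, then one instruction completes per cycle), versus n·k cycles without pipelining.

```python
def pipeline_cycles(n, k):
    """Cycles for n instructions on an ideal k-stage pipeline:
    k cycles to fill, then one instruction completes per cycle."""
    return k + (n - 1)

def speedup(n, k):
    """Ratio of non-pipelined time (n * k cycles, each instruction
    passing through all k stages alone) to pipelined time."""
    return (n * k) / pipeline_cycles(n, k)

# For n = 100 instructions on a k = 5 stage pipeline:
# non-pipelined: 500 cycles; pipelined: 104 cycles; speedup ≈ 4.8
```

As n grows, the speedup approaches k, which is why the ideal pipeline speedup is usually quoted as the number of stages.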

An Introductory Analysis of Pipelines

Non-Pipelined Execution

Speedup


Instruction Throughput

Cycles Per Instruction (CPI)
