
# Datapath Design

## Fixed-point arithmetic

Basic adders:
- Full adder – also called a 1-bit …
- Serial adder – the least expensive circuit in terms of hardware cost for adding two n-bit numbers. It is slow, but its circuit size is very small.

Methods of handling carry signals in the two main combinational adder designs:
1. Ripple-carry propagation – … and can easily … the internal signal needed by the flags.
2. Carry lookahead – fast, but expensive and impractical because of the complexity of its carry-generation logic.
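
The two carry-handling methods can be compared in a short Python sketch. It is a minimal illustration, not a hardware description: it assumes unsigned n-bit operands held in Python integers and the usual carry-like signals, generate g_i = x_i AND y_i and propagate p_i = x_i OR y_i. These definitions and all function names are assumptions of the sketch. The ripple version computes each carry from the one below it, while the lookahead version forms every carry c_{i+1} from the g and p signals and c0 alone.

```python
def bits(value: int, n: int) -> list[int]:
    """Return the n low-order bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def ripple_carries(x: int, y: int, n: int, c0: int = 0) -> list[int]:
    """Carries c0..cn of an n-bit ripple-carry adder.

    Each full adder must wait for the carry produced by the stage below it.
    """
    xb, yb = bits(x, n), bits(y, n)
    c = [c0]
    for i in range(n):
        # Full-adder carry-out: 1 when at least two of x_i, y_i, c_i are 1.
        c.append((xb[i] & yb[i]) | (xb[i] & c[i]) | (yb[i] & c[i]))
    return c


def lookahead_carries(x: int, y: int, n: int, c0: int = 0) -> list[int]:
    """Carries c0..cn computed directly from generate/propagate signals.

    c_{i+1} = g_i + p_i g_{i-1} + ... + p_i ... p_1 g_0 + p_i ... p_0 c0.
    No term uses c_i, so in hardware all carries can be formed at once.
    """
    xb, yb = bits(x, n), bits(y, n)
    g = [xb[i] & yb[i] for i in range(n)]   # generate: stage creates a carry
    p = [xb[i] | yb[i] for i in range(n)]   # propagate: stage passes a carry on
    c = [c0]
    for i in range(n):
        carry = c0
        for k in range(i + 1):              # p_i ... p_0 c0
            carry &= p[k]
        for j in range(i + 1):              # p_i ... p_{j+1} g_j
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            carry |= term
        c.append(carry)
    return c


def add(x: int, y: int, n: int, c0: int = 0) -> tuple[int, int]:
    """Return (n-bit sum, carry-out); sum bit s_i = x_i XOR y_i XOR c_i."""
    xb, yb = bits(x, n), bits(y, n)
    c = lookahead_carries(x, y, n, c0)
    s = sum((xb[i] ^ yb[i] ^ c[i]) << i for i in range(n))
    return s, c[n]


# 11 + 6 = 17: in 4 bits, sum 0001 with carry-out 1.
assert ripple_carries(0b1011, 0b0110, 4) == lookahead_carries(0b1011, 0b0110, 4)
print(add(0b1011, 0b0110, 4))  # (1, 1)
```

Both functions produce identical carries. The lookahead form trades the chain of dependent full adders for wider AND/OR terms, which is where its extra carry-generation logic comes from.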

Figure: high-level view of a serial adder that uses a D flip-flop as the carry store; one sum bit and one carry are generated per clock cycle.

Parallel adders – add all bits of two n-bit numbers, as well as an external carry-in signal, in one clock cycle.

Multiplication – usually implemented by some …

Two multiplication algorithms for two's-complement numbers:
- Robertson's algorithm – performs multiplication differently depending on which case occurs (positive or negative operands).
- Booth's algorithm – treats positive and negative operands uniformly; no special actions are required for negative numbers.

Combinational array multiplier – can multiply large numbers.
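
Booth's algorithm can be illustrated with a minimal Python sketch of its radix-2 form, assuming n-bit two's-complement operands. The register names A, Q and q_1 are assumptions of this sketch, as is the one extra bit given to A so that the multiplicand -2^(n-1) cannot overflow it. At each step the pair (Q0, q_1) decides whether to subtract the multiplicand (10), add it (01) or do nothing (00 or 11); then A, Q and q_1 are shifted right arithmetically as one register.

```python
def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply two n-bit two's-complement integers using radix-2 Booth recoding.

    A is the accumulator, Q holds the multiplier and q_1 is the extra bit to
    the right of Q. A has n + 1 bits so that A - M cannot overflow even when
    the multiplicand M is -2**(n - 1).
    """
    w = n + 1
    a_mask = (1 << w) - 1
    m = multiplicand & a_mask            # M sign-extended to w bits
    a, q, q_1 = 0, multiplier & ((1 << n) - 1), 0
    for _ in range(n):
        q0 = q & 1
        if (q0, q_1) == (1, 0):          # start of a run of 1s: A <- A - M
            a = (a - m) & a_mask
        elif (q0, q_1) == (0, 1):        # end of a run of 1s: A <- A + M
            a = (a + m) & a_mask
        # Arithmetic right shift of the combined register A:Q:q_1.
        q_1 = q0
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (w - 1)))   # copy A's sign bit
    product = (a << n) | q                # w + n = 2n + 1 bits
    total = w + n
    if product >> (total - 1):            # sign bit set: negative result
        product -= 1 << total
    return product


print(booth_multiply(3, -2, 4))   # -6
print(booth_multiply(-8, 7, 4))   # -56
```

Because the recoding looks only at bit pairs, a negative multiplier needs no separate correction step, which is the uniform treatment of operands described above.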

Ripple-carry adder – a type of parallel adder; … adders to generate 1 on its carry signal.

Subtracters – implemented using two's complementation: x - y is computed as x + (two's complement of y).

Overflow – occurs when the result of an arithmetic operation exceeds the standard word size n.

Carry lookahead – computes the input carry needed by each stage directly from carry-like signals:
1. generate
2. propagate

Several division difficulties:
- Quotient overflow – the quotient is too large to be represented.
- Divide-by-zero error – occurs when a number is divided by zero.

Division by repeated multiplication – division is performed efficiently and at low cost.

## Floating-point arithmetic

### Basic operations

### Difficulties in implementing floating-point arithmetic

#### 1. Exponent biasing

If biased exponents are added or subtracted using fixed-point arithmetic in the course of a floating-point calculation, the resulting exponent is doubly biased and must be corrected by subtracting the bias. For example, with a bias of 127, exponents 3 and 4 are stored as 130 and 131; their fixed-point sum 261 equals 7 + 2·127, so subtracting one bias gives 134, the correctly biased form of 7.

#### 3. Overflow and underflow

A floating-point operation causes overflow if the result is too large to be represented, and underflow if it is too small. When the exponent overflows or underflows, an error signal indicating floating-point overflow or underflow is generated.

#### 4. Guard bits

To preserve accuracy during floating-point calculations, one or more extra bits called guard bits are temporarily attached to the right end of the mantissa.

## Pipeline processing

### Introduction

Pipeline processing is a general technique for increasing processor throughput without requiring a large amount of extra hardware.
- It is applied to the design of complex datapath units such as multipliers and floating-point adders.
- It is also used to improve the overall throughput of an instruction-set processor.

Stages or segments – a pipeline processor consists of a sequence of m data-processing circuits, called stages or segments, which collectively perform a single operation on a stream of data operands passing through them.

- Each stage Si contains a multiword input register or latch Ri and a datapath circuit Ci that is usually combinational.
- Ri holds partially processed results as they move through the pipeline; the registers also serve as buffers that prevent neighboring stages from interfering with one another.
- A common clock causes the registers Ri to change state synchronously.
- Each Ri receives a new set of input data D(i-1) from the preceding stage S(i-1), except for R1, whose data is supplied from an external source.
- D(i-1) represents the results computed by C(i-1) during the preceding clock period.
- Once D(i-1) has been loaded into Ri, Ci proceeds to use D(i-1) to compute the new data set Di.
- Thus, in each clock period, every stage transfers its previous results to the next stage and computes a new set of results.

Example, a four-stage floating-point adder pipeline:
- S1 identifies the smaller of the exponents, say Xe, whose mantissa Xm can then be modified by shifting in the second stage S2 of the pipeline to form a new mantissa X'm that makes (X'm, Ye) = (Xm, Xe); that is, Xm is shifted right by Ye - Xe digit positions.
- In the third stage, the mantissas X'm and Ym are added. This can produce an unnormalized result.
- Hence, the fourth stage normalizes the result.

Figure: Operation of the four-stage floating-point adder pipeline.

Advantage: an m-stage pipeline can simultaneously process up to m independent sets of data operands.

- T – the pipeline's clock period
- mT – the delay, or latency, of the pipeline
- 1/T – the pipeline's throughput
- CPI = 1 (ideally, once the pipeline is full)
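
The clocked register transfers described above can be simulated in a few lines of Python. This sketch assumes each circuit Ci is a pure function and that an empty register holds `None`; `run_pipeline` and the three example stages are hypothetical.

```python
from typing import Callable, Optional


def run_pipeline(stages: list[Callable[[int], int]],
                 inputs: list[int]) -> list[tuple[int, int]]:
    """Simulate an m-stage synchronous pipeline.

    regs[i] models the input register of stage i + 1 (R1..Rm); None marks an
    empty register. Returns (clock period, result) for every result that
    leaves the last stage.
    """
    m = len(stages)
    pending = list(inputs)
    # First clock edge: R1 loads the first operand from the external source.
    regs: list[Optional[int]] = [pending.pop(0) if pending else None] + [None] * (m - 1)
    outputs: list[tuple[int, int]] = []
    period = 0
    while any(r is not None for r in regs):
        period += 1
        # During the period, each circuit Ci works on the data latched in Ri.
        computed = [None if r is None else f(r) for f, r in zip(stages, regs)]
        if computed[-1] is not None:
            outputs.append((period, computed[-1]))
        # At the common clock edge, each register loads its predecessor's result.
        regs = [pending.pop(0) if pending else None] + computed[:-1]
    return outputs


# Three hypothetical stages and four operand sets.
stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
print(run_pipeline(stages, [1, 2, 3, 4]))
# [(3, 1), (4, 3), (5, 5), (6, 7)]
```

The first result leaves at the end of period 3 = m, matching the delay mT; after that one result emerges per period, matching the throughput 1/T, and the N = 4 results need m + (N - 1) = 6 periods in total.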

Latency (total time to process N tasks):
- Non-pipelined processor: $NmT$
- Pipelined processor: $[m + (N-1)]T$

where N = number of tasks, m = number of stages, and T = the pipeline's clock period.
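
The two latency formulas, and the speedup they imply, can be tabulated with a short Python sketch. It assumes, as the formulas do, that the non-pipelined processor spends mT on each task; the function names are hypothetical.

```python
def nonpipelined_time(n_tasks: int, m: int, t: float) -> float:
    """NmT: each task needs all m stage delays before the next task starts."""
    return n_tasks * m * t


def pipelined_time(n_tasks: int, m: int, t: float) -> float:
    """[m + (N - 1)]T: mT until the first result, then one result per period."""
    return (m + (n_tasks - 1)) * t


def speedup(n_tasks: int, m: int) -> float:
    """NmT / [m + (N - 1)]T; the clock period T cancels."""
    return nonpipelined_time(n_tasks, m, 1.0) / pipelined_time(n_tasks, m, 1.0)


for n in (1, 6, 100, 10_000):
    print(f"N={n}: {nonpipelined_time(n, 4, 1.0)}T vs {pipelined_time(n, 4, 1.0)}T, "
          f"speedup {speedup(n, 4):.3f}")
```

For m = 4 the ratio is 4N/(N + 3): it is 1 for a single task, about 2.667 for N = 6 (24T versus 9T), and it approaches 4 as N grows.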

### Floating-point adder pipeline

Addition of two normalized floating-point numbers x and y can be implemented using a four-step sequence:
1. Compare the exponents.
2. Align the mantissas.
3. Add the mantissas.
4. Normalize the result.

Normalization is done by counting the number k of leading zero digits of the mantissa (or leading ones in the negative case), shifting the mantissa k digit positions to normalize it, and making a corresponding adjustment in the exponent.

The figure illustrates the behavior of the adder pipeline when performing a sequence of N floating-point additions of the form xi + yi, for the case N = 6:
- At any time, any of the four stages can contain a pair of partially processed scalar operands (xi, yi).
- The buffering of the stages ensures that Si receives as input only the results computed by stage S(i-1) during the preceding clock period.
- If T is the pipeline's clock period, it takes 4T to compute a single sum xi + yi; in other words, the pipeline's delay is 4T.
- 4T is the time required to do one floating-point addition using a nonpipelined processor, plus the delay due to the buffer registers.
- Once all four stages of the pipeline have been filled with data, a new sum emerges from the last stage S4 every T seconds.
- Consequently, N consecutive additions can be done in time (N+3)T, implying that the four-stage pipeline's speedup is

$$S(4) = \frac{4NT}{(N+3)T} = \frac{4N}{N+3},$$

which approaches 4 as N grows.
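
The four-step sequence can be sketched in Python for base B = 2 (K = 1). Several choices here are assumptions of the sketch, not details from the notes: a pair (m, e) stands for m·2^(e-P) with a P-bit sign-magnitude integer mantissa, G guard bits are attached during alignment, extra bits are truncated rather than rounded, and zero is returned as (0, 0). Because magnitudes are used, normalization only needs to count leading zeros. The names `fp_add`, `shift_right`, `P` and `G` are hypothetical.

```python
P = 8  # mantissa width in bits (assumed for this sketch)
G = 2  # guard bits attached to the right end of the mantissa during the add


def shift_right(m: int, n: int) -> int:
    """Shift the magnitude of m right by n bits, keeping its sign (truncation)."""
    return -((-m) >> n) if m < 0 else m >> n


def fp_add(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """Add x = (Xm, Xe) and y = (Ym, Ye) in base B = 2.

    A pair (m, e) stands for m * 2**(e - P); m is a sign-magnitude integer,
    normalized when 2**(P - 1) <= |m| < 2**P.
    """
    (xm, xe), (ym, ye) = x, y
    # Step 1 (S1): compare the exponents by subtracting them; make y the
    # operand with the smaller exponent.
    if xe < ye:
        (xm, xe), (ym, ye) = (ym, ye), (xm, xe)
    d = xe - ye
    # Step 2 (S2): align the mantissas; y's mantissa is shifted right by d so
    # that (y'm, xe) represents (ym, ye), apart from bits shifted out.
    xg = xm << G
    yg = shift_right(ym << G, d)
    # Step 3 (S3): add the mantissas; the sum may be unnormalized.
    sm, se = xg + yg, xe
    if sm == 0:
        return (0, 0)  # zero has no normalized form; (0, 0) is used here
    # Step 4 (S4): normalize. k > 0 counts leading zeros to remove;
    # k = -1 means the addition carried into one extra bit.
    k = (P + G) - abs(sm).bit_length()
    sm = sm << k if k >= 0 else shift_right(sm, -k)
    se -= k
    return (shift_right(sm, G), se)  # drop the guard bits (truncate)


# 1.5 = (192, 1) and 0.375 = (192, -1), since 192 / 2**8 = 0.75.
print(fp_add((192, 1), (192, -1)))  # (240, 1), i.e. 0.9375 * 2 = 1.875
```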

Suppose that x has a normalized floating-point representation (Xm, Xe), where Xm is the mantissa and Xe is the exponent with respect to some base $B = 2^K$. In the first step of adding x = (Xm, Xe) to y = (Ym, Ye), which is executed by S1 of the pipeline, Xe and Ye are compared by subtracting the exponents, which requires a …

### Pipeline design

- Find a suitable multistage sequential algorithm to compute the given function.
- The algorithm's steps, which are implemented by the pipeline stages, should be balanced; that is, they should have roughly the same execution time.
- Fast buffer registers are placed between the stages to allow all necessary data items (partial or complete results) to be transferred from stage to stage without interfering with one another.
- The buffers are designed to be clocked at the maximum rate that allows data to be transferred reliably between stages.

Figure: register-level design of a floating-point adder pipeline based on the nonpipelined design and employing a four-stage organization.
- The main change is the inclusion of buffer registers to define and isolate the four stages.
- Thus the circuit is an example of a multifunction pipeline that can be configured either as a floating-point adder or as one- …

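The minimum clock period of an m-stage pipeline is

$$T_c = \max\{T_i\} + T_R, \quad i = 1, 2, \ldots, m$$

where:
- $T_c$ = minimum clock period, i.e., the delay between the emergence of successive results from the pipeline
- $\max\{T_i\}$ = the largest stage delay, $T_i$ being the delay of stage $S_i$
- $T_R$ = the delay of the buffer registers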
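
The effect of stage balance on Tc can be seen with a small Python calculation. The stage and register delays below are hypothetical numbers chosen only for illustration, and `min_clock_period` is a hypothetical helper.

```python
def min_clock_period(stage_delays: list[float], register_delay: float) -> float:
    """Tc = max{Ti} + TR: the slowest stage plus the buffer-register delay."""
    return max(stage_delays) + register_delay


T_R = 2.0  # hypothetical buffer-register delay in ns
designs = {
    "balanced": [10.0, 10.0, 10.0, 10.0],   # hypothetical stage delays Ti (ns)
    "unbalanced": [4.0, 10.0, 16.0, 10.0],  # same 40 ns total, poorly balanced
}
for name, delays in designs.items():
    tc = min_clock_period(delays, T_R)
    print(f"{name}: Tc = {tc} ns, about {1000 / tc:.1f} results per microsecond")
```

Both designs have the same 40 ns of total stage delay, but the unbalanced one is limited by its 16 ns stage, giving Tc = 18 ns instead of 12 ns; this is why the stages should have roughly the same execution time.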
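
Throughput (results per unit time for N tasks):
- Pipelined processor: $\dfrac{N}{[m + (N-1)]T}$, which approaches $1/T$ as N grows
- Non-pipelined processor: $\dfrac{N}{NmT} = \dfrac{1}{mT}$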

Feedback:
- The usefulness of a pipeline processor can sometimes be enhanced by including feedback paths from a stage output to the primary inputs of the pipeline.
- Feedback enables the results computed by certain stages to be used in subsequent calculations by the pipeline.
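
One common use of such a feedback path, given here as an illustration rather than something described in these notes, is summing a long sequence of numbers on a pipelined adder: the sum leaving the last stage is routed back to the input and added to the next operand. The Python sketch below is a simplified model under that assumption; `pipelined_sum` is a hypothetical name, and only the position of each partial sum in the m stages is tracked, not the adder's internal timing.

```python
from typing import Optional


def pipelined_sum(values: list[float], m: int = 4) -> float:
    """Sum values on a hypothetical m-stage adder pipeline with feedback.

    Each clock period one new operand enters, and the sum leaving the last
    stage is fed back to the pipeline input and added to it. Each operand is
    therefore combined with the partial sum that entered m periods earlier,
    producing m interleaved partial sums. (The adder's internal latency is
    not modeled; only the position of each partial sum is tracked.)
    """
    in_flight: list[Optional[float]] = [None] * m  # partial sums in S1..Sm
    for v in values:
        fed_back = in_flight[-1]                   # result leaving stage Sm
        entering = v if fed_back is None else fed_back + v
        in_flight = [entering] + in_flight[:-1]    # every stage advances
    # Final step: combine the m partial sums still in the pipeline.
    return sum(s for s in in_flight if s is not None)


print(pipelined_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

With m = 4, the operands 1 to 8 are accumulated as four partial sums (1 + 5, 2 + 6, 3 + 7 and 4 + 8), which are combined at the end.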