
Datapath Design

Fixed-point arithmetic

Basic adders
 Half-adder
 Full-adder – also called a 1-bit adder
 Serial adder – the least expensive circuit in terms of hardware cost for adding two n-bit binary numbers
	- Slow, but the circuit size is very small

High-level view of a serial adder that has a D flip-flop as the carry store. One sum bit and one carry bit are generated per clock cycle.

 Parallel adders – add all bits of two n-bit numbers, as well as an external carry-in signal, in one clock cycle.
	Ripple-carry adder – a type of parallel adder formed by connecting full adders in cascade, with each full adder's carry-out feeding the next full adder's carry-in.

Subtracters
- implemented using two's-complementation.

Carry-lookahead adders
- high-speed adders
- compute the input carry needed by each stage directly from carry-like signals.
2 auxiliary signals for the carry-lookahead adder:
1.) generate
2.) propagate

Methods of handling carry signals in the two main combinational adder designs
1. Ripple carry propagation
	i. Lowest-cost adders, which can easily provide access to the internal signals needed by the flags.
2. Carry-lookahead
	i. Fast, but expensive and impractical for large word sizes because of the complexity of its carry logic.

Multiplication – usually implemented by some form of addition.
Two multiplication algorithms for two's-complement numbers
 Robertson's algorithm – performs the multiplication differently depending on which case occurs.
 Booth's algorithm – treats positive and negative operands uniformly; no special actions are required for negative numbers.
Combinational array multiplier – can multiply large numbers.

Several division difficulties
 Quotient overflow – the quotient is too large to be placed in the result.
 Divide-by-zero error – when a number is divided by zero.
Division by repeated multiplication – division is performed efficiently and at low cost.

Floating-point Arithmetic
- Arithmetic operations exceed the standard word size n.
Guard bits are temporarily attached to the right end of the mantissa.
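A minimal Python sketch (my own illustration, not from the notes) of how these adder circuits compose: a half-adder, a full-adder built from two half-adders, and an n-bit ripple-carry adder built by cascading full adders, with each stage's carry-out rippling into the next stage's carry-in.

```python
def half_adder(a, b):
    """Sum and carry of two 1-bit inputs."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """1-bit adder: two operand bits plus a carry-in, built from two half-adders."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, cin)
    return s2, c1 | c2

def ripple_carry_add(x, y, n=8, cin=0):
    """Add two n-bit numbers by letting the carry ripple through n full adders."""
    result, carry = 0, cin
    for i in range(n):                      # stage i handles bit i
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry                    # carry is the final carry-out

print(ripple_carry_add(200, 100))   # (44, 1): 300 overflows 8 bits
```

The loop makes the cost/speed trade-off visible: hardware grows linearly with n, but so does the worst-case carry-propagation delay, which is exactly what carry-lookahead logic removes.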

Basic Operations

Difficulties in implementing floating-point arithmetic
1. Exponent biasing
	If biased exponents are added or subtracted using fixed-point arithmetic in the course of a floating-point calculation, the resulting exponent is doubly biased and must be corrected by subtracting the bias.

Pipeline Processing
- A general technique for increasing processor throughput without requiring a large amount of extra hardware.
- Applied to the design of complex datapath units such as multipliers and floating-point adders.
- Also used to improve the overall throughput of an instruction-set processor.

Introduction
Stages or segments – a pipeline processor consists of a sequence of m data-processing circuits, which collectively perform a single operation on a stream of data operands passing through them.
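The double-bias problem can be shown with a few lines of Python (my sketch; the bias value 127 is an assumption, as in IEEE 754 single precision, since the notes do not fix a format):

```python
BIAS = 127  # assumed bias, as in IEEE 754 single precision

def to_biased(e):
    """True exponent -> biased exponent code."""
    return e + BIAS

# Multiplying two floating-point numbers adds their true exponents.
# Adding the *biased* codes with a fixed-point adder yields
# (e1 + BIAS) + (e2 + BIAS) = (e1 + e2) + 2*BIAS  -- doubly biased.
e1, e2 = 3, 5
raw_sum = to_biased(e1) + to_biased(e2)   # 130 + 132 = 262
corrected = raw_sum - BIAS                # subtract the bias once
assert corrected == to_biased(e1 + e2)    # 135, the properly biased code
print(raw_sum, corrected)                 # 262 135
```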

Structure of Pipeline Processor:

3. Overflow and Underflow

A floating-point operation causes overflow if the result is too large to be represented, and underflow if it is too small. When the exponent overflows or underflows, an error signal indicating floating-point overflow or underflow is generated.

4. Guard Bits

To preserve accuracy during floating-point calculations, one or more extra bits, called guard bits, are temporarily attached to the right end of the mantissa.

 Si contains a multiword input register or latch Ri and a datapath circuit Ci that is usually combinational.
 Ri holds partially processed results as they move through the pipeline; the registers also serve as buffers that prevent neighboring stages from interfering with one another.
 A common clock causes every Ri to change state synchronously.
 Each Ri receives a new set of input data D(i-1) from the preceding stage S(i-1), except for R1, whose data is supplied from an external source.
 D(i-1) represents the results computed by C(i-1) during the preceding clock period.
 Once D(i-1) has been loaded into Ri, Ci proceeds to use D(i-1) to compute a new data set Di.
 Thus in each clock period, every stage transfers its previous results to the next stage and computes a new set of results.

Operation of the stages of the floating-point adder pipeline:
 In the first stage S1, which begins the addition of x = (Xm, Xe) and y = (Ym, Ye), the smaller of the two exponents, say Xe, is identified; its mantissa Xm can then be modified by shifting in the second stage S2 of the pipeline to form a new mantissa X'm such that (X'm, Ye) = (Xm, Xe).
 In the third stage S3, the mantissas X'm and Ym are added. This can produce an unnormalized result.
 Hence, in the fourth stage, the result is normalized.
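The register/stage discipline above can be simulated in a few lines of Python (my own sketch): each stage Si is a function Ci, each Ri a slot in a list, and every loop iteration is one clock period in which all registers load synchronously, so stage i always consumes what stage i-1 produced in the previous period.

```python
def run_pipeline(stages, inputs):
    """Clock an m-stage pipeline; None marks an empty register (a bubble)."""
    m = len(stages)
    regs = [None] * m                    # buffer registers R1..Rm
    out = []
    for d in list(inputs) + [None] * m:  # extra ticks drain the pipeline
        # One clock period: every Ci processes the operand held in its Ri...
        results = [None if r is None else f(r) for f, r in zip(stages, regs)]
        if results[-1] is not None:      # a finished result leaves stage Sm
            out.append(results[-1])
        # ...then all registers clock together: R1 <- new input, Ri <- D(i-1)
        regs = [d] + results[:-1]
    return out

# Four toy stages: each datum needs m = 4 ticks (latency mT), but once the
# pipe is full a new result emerges every tick (throughput 1/T).
stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
print(run_pipeline(stages, [1, 2, 3]))   # [1, 9, 25]
```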
Advantage: An m-stage pipeline can simultaneously process up to m independent sets of data operands.

Operation of the four-stage floating-point adder:

T – pipeline's clock period
mT – delay (latency) of the pipeline
1/T – pipeline's throughput
CPI = 1

• For a non-pipelined processor: total time = NmT
• For a pipelined processor: total time = (m + N − 1)T
Where: N = number of tasks
m = number of stages
T = pipeline's clock period
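These timing formulas can be sanity-checked with a short script (a sketch; the numeric values are made up, with T in arbitrary time units):

```python
def nonpipelined_time(N, m, T):
    """Each task passes through all m stage delays before the next starts."""
    return N * m * T

def pipelined_time(N, m, T):
    """m*T to fill the pipeline, then one result emerges every T."""
    return (m + N - 1) * T

N, m, T = 100, 4, 1
print(nonpipelined_time(N, m, T))   # 400
print(pipelined_time(N, m, T))      # 103 -- the (N + 3)T figure for m = 4
print(nonpipelined_time(N, m, T) / pipelined_time(N, m, T))  # speedup ~ 3.88
```

For m = 4 the pipelined total matches the (N + 3)T result derived later for the floating-point adder, and the speedup approaches m as N grows.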
Addition of two normalized floating-point numbers x and y can be implemented using a four-step sequence:
1) compare the exponents
2) align the mantissas
3) add the mantissas
4) normalize the result

Normalization is done by counting the number k of leading zero digits of the mantissa (or leading ones in the negative case), shifting the mantissa k digit positions to normalize it, and making a corresponding adjustment in the exponent.

Four-stage floating-point adder pipeline:

 Illustrates the behavior of the adder pipeline when performing a sequence of N floating-point additions of the form xi + yi.
 At any time, each of the four stages can contain a pair of partially processed scalar operands (xi, yi).
 The buffering of the stages ensures that Si receives as input the results computed by stage S(i-1) during the preceding clock period only.
 If T is the pipeline's clock period, then it takes 4T to compute the single sum xi + yi; in other words, the pipeline's delay is 4T.
 4T is the time required to do one floating-point addition using a nonpipelined processor, plus the delay due to the buffers.
 Once all four stages of the pipeline have been filled with data, a new sum emerges from the last stage S4 every T seconds.
 Consequently, N consecutive additions can be done in time (N + 3)T, implying that the four-stage pipeline's speedup is 4NT / (N + 3)T = 4N / (N + 3), which approaches 4 as N becomes large.
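The four-step sequence can be sketched behaviorally in Python (my illustration: integer mantissas of assumed width W = 8 with base-2 exponents, value = m · 2^e; the representation details are not fixed by the notes):

```python
W = 8  # assumed mantissa width in bits

def normalize(m, e):
    """Step 4: shift out leading zeros and adjust the exponent to match."""
    if m == 0:
        return 0, 0
    while m >> (W - 1) == 0:   # leading zero -> shift left, decrement exponent
        m <<= 1
        e -= 1
    while m >> W:              # the addition overflowed a bit -> shift right
        m >>= 1
        e += 1
    return m, e

def fp_add(x, y):
    (xm, xe), (ym, ye) = x, y
    # 1) compare the exponents (a fixed-point subtraction, stage S1)
    if xe < ye:
        (xm, xe), (ym, ye) = (ym, ye), (xm, xe)
    # 2) align: right-shift the smaller operand's mantissa by the exponent
    #    difference (stage S2); the shifted-out bits are what guard bits
    #    would temporarily preserve
    ym >>= (xe - ye)
    # 3) add the mantissas (stage S3) -- possibly an unnormalized result
    sm = xm + ym
    # 4) normalize the result (stage S4)
    return normalize(sm, xe)

print(fp_add((0b11000000, 2), (0b10000000, 0)))   # (224, 2): 768 + 128 = 896
```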
Pipeline Design:
 Find a suitable multistage sequential algorithm to compute the given function.
 The algorithm's steps, which are implemented by the pipeline stages, should be balanced so that they all have roughly the same execution time.

Fast buffer registers
- placed between the stages to allow all necessary data items (partial or complete results) to be transferred from stage to stage without interfering with one another
- the buffers are designed to be clocked at the maximum rate that allows data to be transferred reliably between stages.

 Suppose that x has a normalized floating-point representation (Xm, Xe), where Xm is the mantissa and Xe is the exponent with respect to some base B = 2^k.
 In the first step of adding x = (Xm, Xe) to y = (Ym, Ye), which is executed by stage S1 of the pipeline, Xe and Ye are compared by subtracting the exponents, which requires a fixed-point adder.

Pipelined version of the floating-point adder:

 Shows a register-level design of a floating-point adder pipeline based on the nonpipelined design and employing a four-stage organization.
 The main change is the inclusion of the buffer registers to define and isolate the four stages.
 Thus the circuit is an example of a multifunction pipeline that can be configured either as a floating-point adder or as a one-stage fixed-point adder.

Tc = max{Ti} + TR, for i = 1, 2, …, m

Where: Tc = minimum clock period = the delay between the emergence of successive results from the pipeline
max{Ti} = longest delay of any single stage Si
TR = delay of a buffer register
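A tiny numeric check of the Tc formula (a sketch; the stage and register delays below are invented, in nanoseconds):

```python
stage_delays = [14, 10, 12, 9]   # Ti for m = 4 stages (assumed values, ns)
TR = 2                           # buffer-register delay (assumed, ns)

Tc = max(stage_delays) + TR      # the slowest stage sets the clock period
print(Tc)                        # 16
print(1 / Tc)                    # throughput: results per ns once the pipe is full
```

This is why the design rule above asks for balanced stages: the 9 ns and 10 ns stages sit idle waiting for the 14 ns stage every clock period.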

- The usefulness of a pipeline processor can sometimes be enhanced by including feedback paths from a stage output to the primary inputs of the pipeline.
- A feedback path enables the result computed by certain stages to be used in subsequent calculations by the pipeline.
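A common use of such a feedback path is summing a vector with a pipelined adder: each sum emerging from the last stage is fed back to the first stage, so up to m partial sums are in flight at once. A highly abstracted sketch (my own; it models only the m-way interleaving, not the clocking):

```python
def pipelined_sum(values, m=4):
    """Sum a sequence as an m-stage adder with feedback would: element i is
    accumulated into partial sum i % m (the value fed back into stage S1),
    and the m partial sums are combined at the end."""
    partials = [0] * m
    for i, v in enumerate(values):
        partials[i % m] += v
    total = 0
    for p in partials:           # final passes through the adder
        total += p
    return total

print(pipelined_sum(range(10)))   # 45
```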