VOL. 54,
INTRODUCTION
WITH the
ROY AND BANERJEE: AN ALGORITHM FOR TRADING OFF QUANTIZATION ERROR WITH HARDWARE RESOURCES FOR MATLAB-BASED...
PREVIOUS WORK
3.1 Background
Numeric representation in digital hardware may be either
fixed or floating-point. In fixed-point representation, the
available bit width is divided and allocated to the integer
part and the fractional part, with the extreme left bit
reserved for the sign (2's complement). In contrast, a
floating-point representation allocates one sign bit and a
fixed number of bits to an exponent and a mantissa. In
fixed-point, relatively efficient implementations of arithmetic operations are possible in hardware. In contrast, the
floating-point representation needs to normalize the exponents of the operands for addition and subtraction.
Synthesizing customized hardware for fixed-point arithmetic operations is therefore considerably more efficient than for their floating-point counterparts, both in terms of performance and resource usage.
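To make the fixed-point representation described above concrete, the following Python sketch (our own illustration, not code from the paper) quantizes a value to a signed 2's-complement format with a given integer/fraction bit split and reports the resulting quantization error:

```python
def to_fixed(x, int_bits, frac_bits):
    """Quantize x to signed 2's-complement fixed point with int_bits
    integer bits (including the sign bit) and frac_bits fractional bits."""
    scale = 1 << frac_bits
    # Representable range for a signed value with this bit split.
    lo = -(1 << (int_bits - 1))
    hi = (1 << (int_bits - 1)) - 1 / scale
    q = round(x * scale) / scale          # round to the nearest grid point
    return max(lo, min(hi, q))            # saturate on overflow

val = 3.14159
q = to_fixed(val, int_bits=3, frac_bits=4)   # 7 bits total
print(q, abs(val - q))                       # quantized value and its error
```

Shrinking `frac_bits` reduces hardware cost but grows the error, which is exactly the trade-off the autoquantization algorithm later explores.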
ALGORITHM FOR AUTOQUANTIZATION
. levelization,
. scalarization,
. computation of ranges of variables,
. generation of fixed-point MATLAB code,
. autoquantization.
We use the following small MATLAB code segment to explain each step of our algorithm:
4.1 Levelization
The levelization pass takes a MATLAB assignment statement with a complex expression on the right-hand side and converts it into a set of statements, each in single-operator form. For example, the statement
a(1:100) = b(1:100) .* c(1:2:200) + d(1:100);
in the example code segment will be converted into two
statements
This pass ensures that the resulting synthesized output is bit-true with respect to the original quantized MATLAB statement.
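The effect of levelization can be sketched as a toy source-to-source transformation. The tuple representation and helper names below are our own illustration, assuming a statement such as a = b*c + d with at least one operator on the right-hand side:

```python
# Toy levelization: split a right-hand side containing several operators
# into single-operator statements by introducing temporaries.
def levelize(target, expr):
    """expr is a nested tuple like ('+', ('*', 'b', 'c'), 'd')."""
    stmts, counter = [], [0]

    def emit(e):
        if isinstance(e, str):            # a plain variable: nothing to split
            return e
        op, lhs, rhs = e
        l, r = emit(lhs), emit(rhs)       # levelize the operands first
        counter[0] += 1
        tmp = f"t{counter[0]}"
        stmts.append((tmp, op, l, r))     # one operator per statement
        return tmp

    emit(expr)
    stmts[-1] = (target,) + stmts[-1][1:]  # the outermost result is the target
    return stmts

# a = b*c + d  ->  t1 = b*c;  a = t1 + d
print(levelize('a', ('+', ('*', 'b', 'c'), 'd')))
```

Each emitted statement has exactly one operator, matching the single-operator form the pass produces.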
4.2 Scalarization
The scalarization pass takes a vectorized MATLAB statement and converts it into a scalar form using enclosing FOR
loops. The levelized code segment shown above will be
converted to
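A minimal sketch of what scalarization produces, assuming an elementwise vector statement; Python stands in here for the generated MATLAB FOR loop, and the names are illustrative:

```python
import operator

# Toy scalarization: a vector statement a(1:n) = b(1:n) .* c(1:n)
# becomes an explicit elementwise loop over scalar operations.
def scalarize(op, b, c):
    n = len(b)
    a = [0] * n
    for i in range(n):        # the enclosing FOR loop introduced by the pass
        a[i] = op(b[i], c[i])
    return a

print(scalarize(operator.mul, [1, 2, 3], [4, 5, 6]))  # [4, 10, 18]
```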
AUTOQUANTIZATION
Our algorithm attempts to minimize hardware resources while constraining the quantization error within a limit specified by the user or application. Specifically, we minimize the total bit precision of all quantizers, which we take as a proxy for hardware resource usage.
The autoquantization algorithm consists of the following
passes, which are explained in detail in the next few
sections:
4.5 Autoquantization
We now execute our Autoquantization algorithm using
profiling of the input vectors, as shown in Fig. 2. For each
benchmark and input set, we calculate the error vector e,
which is the difference between the output vectors for the
original floating-point and the fixed-point MATLAB code
obtained in Section 4.4, using actual MATLAB-based
floating-point and fixed-point simulation. We assume that
outdata is the output vector of the MATLAB code (a in our example):

e = outdata_float - outdata_fixed.
We next define an Error Metric (EM) as

EM% = (norm(e) / norm(outdata_float)) × 100,

where the vector norm used is the p-norm

||x||_p = ( Σ_i |x_i|^p )^{1/p}.
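Under the definitions above, the error metric can be computed as in the following sketch (plain Python with p = 2; a sketch, not the paper's implementation):

```python
# Error Metric (EM%) between floating-point and fixed-point outputs,
# using the p-norm defined above.
def pnorm(x, p=2):
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def error_metric(out_float, out_fixed, p=2):
    e = [f - q for f, q in zip(out_float, out_fixed)]   # error vector e
    return pnorm(e, p) / pnorm(out_float, p) * 100.0

out_float = [1.0, 2.0, 2.0]
out_fixed = [1.0, 2.0, 1.0]   # one element quantized badly
print(error_metric(out_float, out_fixed))
```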
p_i = precision of quantizer q_i,
m_i = integral length of the quantizer, including the sign bit,
n = total number of quantizers in M_fixed which are generated for assignment/arithmetic operations.

m_i = floor(log2(max_i)) + 2
}End Fix_Integer_Range
Using the above algorithm, we have the following
integer ranges in our example:
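Assuming m_i = floor(log2(max_i)) + 2 as in Fix_Integer_Range, the integer length for a profiled maximum absolute value can be computed as below (our own helper name). The +2 supplies floor(log2(max_i)) + 1 magnitude bits plus the sign bit:

```python
import math

def integer_bits(max_abs):
    """m_i = floor(log2(max_i)) + 2: integer bits needed to cover the
    profiled maximum absolute value, including the sign bit."""
    return math.floor(math.log2(max_abs)) + 2

# A variable whose profiled maximum is 200 needs 8 magnitude bits + 1 sign bit.
print(integer_bits(200.0))
```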
The Fine_Optimize algorithm first finds the quantizers whose impact on the quantization error is smallest and reduces the precision bits of those quantizers as long as the error remains within the EM constraint. It then finds a quantizer whose impact on the error is largest, adds one bit of precision to it, and simultaneously removes two bits from a quantizer with the smallest impact on the error, for a net reduction of one bit. These bit reductions are repeated iteratively until the EM constraint is exceeded. Thus, we attain our Fine optimal point for the given EM_max.
Fine_Optimize(EMmax ; Mfloat ; Mfixed , list of coarse optimized
qi , EMcoarse )
{
TABLE 1a
Results of Autoquantization Algorithm versus Manual, Infinite
Precision, and Accelchip Default Quantizers on Five MATLAB
Benchmarks for a Training Set of 5 Percent and Testing Set of
95 Percent, Factor of Safety = 10%
q_1, ..., q_j, ..., q_k, ..., q_n
with precisions p_1, ..., p_j + 1, ..., p_k - 2, ..., p_n.
If EM_{j,k} < EM_max, then EM_fine = EM_{j,k}, p_j and p_k are changed to p_j + 1 and p_k - 2, and we iteratively repeat the above two steps. If EM_{j,k} > EM_max, we have found our fine optimal quantizer values.
}End Fine_Optimize
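One way the bit-trading loop described above might look is sketched below; `em_of` stands in for the actual MATLAB fixed-point simulation of the error metric, and all names and the toy error model are our own, not the paper's code:

```python
def fine_optimize(precisions, em_of, em_max):
    """Greedy bit trading: +1 bit to one quantizer, -2 bits to another
    (net -1 bit), repeated while the error stays within em_max.
    em_of(precisions) plays the role of the fixed-point simulation."""
    p = list(precisions)
    while True:
        best = None
        for j in range(len(p)):            # candidate to gain a bit
            for k in range(len(p)):        # candidate to lose two bits
                if j == k or p[k] < 3:     # keep at least 1 bit after -2
                    continue
                trial = list(p)
                trial[j] += 1
                trial[k] -= 2
                em = em_of(trial)
                if em <= em_max and (best is None or em < best[0]):
                    best = (em, trial)
        if best is None:                   # every net -1-bit move violates em_max
            return p
        p = best[1]

# Toy error model: error grows as precision shrinks (illustrative only).
em_of = lambda p: sum(2.0 ** -b for b in p) * 100
print(fine_optimize([8, 8, 8], em_of, em_max=5.0))
```

Each accepted move lowers the total precision by one bit, so the loop terminates once no trade can keep EM within the constraint.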
and EM_fine = 0.0793%, and this gives our fine optimized set of quantizers with a 10 percent factor of safety.
One limitation of the algorithm is that the objective function minimized is the cumulative bit-width of all units; it does not selectively minimize the bit-widths of units with a lower area or cost. Cumulative bit-width is related to area and, hence, to hardware resources, but the impact of bit-width on area also depends on the functional unit. We nevertheless chose the total bit-width as our objective function to reduce the algorithm's complexity.
COMPLEXITY ANALYSIS
TABLE 1b
Table 1 continued
TABLE 2
Results of AccelChip 2003_3.1.72 and Synplify Pro on Xilinx
Virtex II XC2V250 Device for Quantizers Selected in Table 1
with a Requested Frequency of 10 MHz
only for all the quantizers, in the worst case, we can have at most M bit decrements for each of N quantizers. Hence, a single stage of Fine Optimization will execute the MATLAB code O(M × N) times, and the total execution time complexity of the algorithm will be O(E × I × M × N).

We have therefore reduced the complexity of the autoquantization algorithm from O(E × I × M × N) for a naive fine optimization approach, assuming error metric calculations, to an O(E × I × log M × N) complexity assuming a two-stage Coarse and Fine Optimization approach. Further, if
TABLE 3
Comparison of Run Times for Autoquantization Algorithm (Two Stage Coarse + Fine Optimization) versus Only Fine Optimization
TABLE 4
Comparison of Run Times for Autoquantization Algorithm Using 5 Percent Training Inputs versus 100 Percent Inputs
EXPERIMENTAL RESULTS
Fig. 4. Area versus error trade-off: Plot of normalized LUTs versus EM%
for the five benchmarks.
CONCLUSIONS
ACKNOWLEDGMENTS
This research was supported by the US Defense Advanced Research Projects Agency under contract no. F33615-01-C-1631. This work was done while the authors were at the Electrical and Computer Engineering Department of Northwestern University. A preliminary version of this paper was published in the Proceedings of the 41st Design Automation Conference, 2004.