I. Introduction
Reducing development time by employing a high-level language is much needed in the programming of digital
signal processors. In particular, C compilers for floating-point digital signal processors are gaining
acceptance because of the shortened development time and the improved compiler efficiency. However,
C compilers for fixed-point digital signal processors have met with little acceptance, mainly because of the
overhead of executing floating-point operations on a fixed-point data path. The development of fixed-point
programs requires scaling of variables to prevent overflows while maintaining accuracy [1][2]. However,
determining the number of shifts is usually considered one of the most tedious parts of using fixed-point
signal processors.
To solve these problems, a program that automatically converts a floating-point C program into a
fixed-point version has been developed for fixed-point digital signal processors. In the proposed development
environment, programmers develop a C application program using floating-point arithmetic, and the
floating-point program is then automatically converted to a fixed-point version.
First, the ranges of the variables in the floating-point program are estimated by simulating a modified
floating-point program that collects statistical information on the variables. The range of each variable
is determined from the mean and the standard deviation, and is used to assign the location of the binary
point of the fixed-point variables. In the conversion step, all of the floating-point variables are converted
to fixed-point type, and the expressions are modified by inserting scaling code and replacing the integer
multiplications. The integer multiplication of the ANSI standard C language poses a problem for implementing
fixed-point arithmetic because it keeps only the lower half of the product, while fixed-point arithmetic
needs the upper half to preserve accuracy.
II. Fixed-point data format and arithmetic rules
For the representation of fixed-point data, the generalized fixed-point format [1], which allows an arbitrary
binary-point location using the following attributes, is employed:
In this case, the word-length (WL) is the total number of bits for representing a fixed-point variable. The
integer word-length (IWL) is the number of bits to the left of the (hypothetical) binary point, while the
number of bits to the right of the binary point is called the fractional word-length (FWL). Since each signal
can have a different range, a unique integer word-length can be assigned to each variable.
Note that the range (R) and the quantization step (Q) depend on the integer word-length as follows:
$$R = (-2^{\mathrm{IWL}},\ 2^{\mathrm{IWL}}), \qquad Q = 2^{-(\mathrm{WL} - 1 - \mathrm{IWL})}$$
For example, a 16-bit integer (WL = 16) has the binary point just to the right of the least significant bit
(IWL = 15), and thus has a range of $(-2^{15}, 2^{15})$. In contrast, a 16-bit fixed-point number with an
IWL of 4, as shown in Fig. 1-(b), has a range of $(-2^{4}, 2^{4})$ and a quantization step size of $2^{-11}$.
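
The two examples above can be checked numerically. The following is a minimal sketch, assuming one sign bit so that FWL = WL - 1 - IWL (this relation is inferred from the examples, not stated explicitly); the function names `fx_step`, `fx_bound`, and `fx_from_double` are hypothetical and not part of the described software.

```c
#include <assert.h>
#include <stdint.h>

/* Quantization step Q = 2^-(WL - 1 - IWL); one bit is assumed to be the sign bit. */
double fx_step(int wl, int iwl)
{
    int fwl = wl - 1 - iwl;                 /* fractional word-length */
    assert(fwl >= 0 && fwl < 31);
    return 1.0 / (double)(1L << fwl);
}

/* Magnitude bound of the range: representable values lie in (-2^IWL, 2^IWL). */
double fx_bound(int iwl)
{
    assert(iwl >= 0 && iwl < 31);
    return (double)(1L << iwl);
}

/* Convert a real value to a 16-bit fixed-point integer, rounding to nearest. */
int16_t fx_from_double(double v, int iwl)
{
    double scaled = v / fx_step(16, iwl);
    assert(scaled < 32767.5 && scaled > -32768.5);   /* must fit in 16 bits */
    return (int16_t)(scaled + (scaled >= 0 ? 0.5 : -0.5));
}
```

With these definitions, `fx_step(16, 4)` is 2^-11 and `fx_from_double(0.9, 0)` yields 29491, the constant used for 0.9 with an IWL of 0 in the conversion example of Section III.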
The arithmetic rules based on this fixed-point data representation are shown in Fig. 2. First, when an
assignment is performed between two variables that have different IWLs, the binary-point locations must be
aligned. An arithmetic right shift by n bits corresponds to increasing the IWL by n. For example, a
variable x with an IWL of 2 cannot be directly assigned to another variable y with an IWL of 3. In this
case, y = x should actually be performed as y = x >> 1, and x = y as x = y << 1.
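
The assignment rule can be sketched as a small helper. This is an illustration only: `fx_align` is a hypothetical name, and the code assumes that `>>` on a negative signed value is an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Re-align a 16-bit fixed-point value from one IWL to another.
 * Raising the IWL by n is an arithmetic right shift by n; lowering it
 * is a left shift by n, which can overflow if the value is too large. */
int16_t fx_align(int16_t v, int from_iwl, int to_iwl)
{
    int n = to_iwl - from_iwl;
    assert(n > -16 && n < 16);
    if (n >= 0)
        return (int16_t)(v >> n);   /* assumes >> on signed values is arithmetic */
    /* left shift written as a multiplication to avoid shifting a negative value */
    return (int16_t)((int32_t)v * ((int32_t)1 << -n));
}
```

For x = 8192 with an IWL of 2 (the value 1.0), `fx_align(x, 2, 3)` gives 4096, which is again 1.0 at an IWL of 3; this is the y = x >> 1 case from the text.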
[Fig. 2. Fixed-point arithmetic rules: (a) assignment, (b) addition/subtraction, (c) multiplication.]
Second, in the case of addition and subtraction, not only the IWLs of the input operands but also that of the
result should be considered. In the above example, x + y becomes (x >> 1) + y when the IWL of the result
is 3. If the estimated IWL of x + y is larger than the maximum IWL of the two variables, each operand is
shifted by one more bit, as (x >> 2) + (y >> 1), to prevent overflows.
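
The addition rule amounts to aligning both operands to the IWL of the result before adding. A minimal sketch, with the hypothetical helper names `fx_shr` and `fx_add` and the same assumption of an arithmetic right shift:

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift by n >= 0 bits (raises the IWL by n). */
static int16_t fx_shr(int16_t v, int n)
{
    assert(n >= 0 && n < 16);
    return (int16_t)(v >> n);       /* assumes >> on signed values is arithmetic */
}

/* Add x (IWL ix) and y (IWL iy), producing a result with IWL ir.
 * ir must be at least max(ix, iy), so only right shifts are needed. */
int16_t fx_add(int16_t x, int ix, int16_t y, int iy, int ir)
{
    assert(ir >= ix && ir >= iy);
    return (int16_t)(fx_shr(x, ir - ix) + fx_shr(y, ir - iy));
}
```

With x at an IWL of 2 and y at an IWL of 3, a result IWL of 3 computes (x >> 1) + y, and a result IWL of 4 computes (x >> 2) + (y >> 1), matching the two cases in the text.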
Third, in the case of multiplication, the IWL of the result is the sum of the IWLs of the two variables
plus one, as shown in Fig. 2-(c), because the result has two sign bits in two's complement multiplication.
When a w-bit by w-bit multiplication is performed, a 2w-bit product is obtained. According to ANSI C,
the lower half is used in integer multiplication. In a digital signal processing algorithm, however, most
variables are left-aligned to keep precision. Thus, to limit the result to w bits, the upper w bits should
be used.
In recent C compilers for digital signal processors, the upper w bits are accessible through the
single-precision multiplication [3]. We used this feature of the TMS320C2x/5x compiler for an efficient
implementation of the multiplication. In traditional compilers, however, a single- to double-precision integer
conversion is needed before a double-precision multiplication. In some compilers, an upper w/2-bit by upper
w/2-bit multiplication is supported through intrinsics to obtain a w-bit result [4].
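
The difference between the two halves of the product can be illustrated on a host machine for w = 16. This is a portable sketch of the "traditional" approach (widen, multiply, shift), not the TMS320C2x/5x compiler feature itself; `mul_low` and `mul_iwl` are hypothetical names, and `mulh` here is only an assumed definition of the function named in Section III.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 16 bits of the 32-bit product of two 16-bit operands. */
int16_t mulh(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* full 2w-bit product */
    return (int16_t)(product >> 16);             /* assumes arithmetic >> */
}

/* What a plain w-bit integer multiplication keeps: the lower 16 bits. */
uint16_t mul_low(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (uint16_t)((uint32_t)product & 0xFFFFu);
}

/* IWL of the upper-half result: sum of the operand IWLs plus one. */
int mul_iwl(int ia, int ib)
{
    assert(ia >= 0 && ib >= 0);
    return ia + ib + 1;
}
```

For example, 29491 (0.9 at an IWL of 0) times 16384 (8.0 at an IWL of 4) gives `mulh` = 7372, which is about 7.2 at an IWL of 5, whereas the lower half carries no useful magnitude information.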
III. Conversion Programs
The design flow of our conversion software is shown in Fig. 3. The first step is range estimation, in
which the range of each floating-point variable is estimated during simulation. The second step is
fixed-point C code generation, including type conversion and scaling code insertion. All the steps are
implemented using the SUIF (Stanford University Intermediate Format) system [5]. SUIF is used for
parsing the floating-point C programs and for generating the range estimation and fixed-point programs. The
C front end and the SUIF-to-C converter are used without modification. The other programs are developed
with SUIF and the SUIF builder. The ID assignment in the range estimator and the IWL information in
the converter program are implemented using the annotation function of the SUIF system.
[Fig. 3. Design flow of the conversion software. Blocks: floating-point C program, C front-end, range
estimation C program, execution, IWL information (with user input), SUIF-to-C converter, integer C program.]
The first step can be skipped if programmers can estimate the ranges of all the variables by theoretical
analysis such as L1-norm analysis [6][7]. Otherwise, it is performed automatically by our simulation-based
range estimation. The floating-point C program is modified by inserting a subroutine call after every
assignment statement, as shown in Fig. 4-(b). The subroutine insertion method is about twice as fast as the
C++ class-based programs [8][9]. In the subroutine, the maximum absolute value, the average, and the standard
deviation of the variables are tracked. The subroutine `range' has two parameters. The first is the variable
to be tracked, and the second is the unique identification number of the variable, which is automatically
assigned by the conversion program. The ID numbers are annotated to the variables in the symbol
tables. The generated range estimation program allocates static storage for keeping the statistical data of
the variables. The tracked information is reported when the simulation is finished.
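
A range-tracking subroutine of this kind might look as follows. This is a sketch under stated assumptions, not the generated code: only the name `range` and its two parameters come from the text, while `MAX_IDS`, the accessor functions, and `iwl_for_bound` are hypothetical. The exact rule that maps the statistics to an IWL is not given here, so `iwl_for_bound` simply picks the smallest non-negative IWL that covers a given magnitude bound.

```c
#include <assert.h>

#define MAX_IDS 64   /* hypothetical limit on the number of tracked variables */

/* Static storage for the per-variable statistics. */
static struct {
    long   count;
    double sum, sum_sq, max_abs;
} range_stats[MAX_IDS];

/* Called after every assignment: record one observed value of variable `id`. */
void range(double value, int id)
{
    double mag = value < 0 ? -value : value;
    assert(id >= 0 && id < MAX_IDS);
    range_stats[id].count  += 1;
    range_stats[id].sum    += value;
    range_stats[id].sum_sq += value * value;
    if (mag > range_stats[id].max_abs)
        range_stats[id].max_abs = mag;
}

double range_max_abs(int id)
{
    assert(id >= 0 && id < MAX_IDS);
    return range_stats[id].max_abs;
}

double range_mean(int id)
{
    assert(id >= 0 && id < MAX_IDS && range_stats[id].count > 0);
    return range_stats[id].sum / range_stats[id].count;
}

/* Variance of the observed values; the standard deviation is its square root. */
double range_variance(int id)
{
    double m = range_mean(id);
    return range_stats[id].sum_sq / range_stats[id].count - m * m;
}

/* Smallest non-negative IWL whose range (-2^IWL, 2^IWL) contains the bound r. */
int iwl_for_bound(double r)
{
    int iwl = 0;
    double limit = 1.0;
    assert(r >= 0.0);
    while (r >= limit) {
        limit *= 2.0;
        iwl++;
    }
    return iwl;
}
```

A call such as `range(y, 0);` after `y = 0.9 * s + x;` would then accumulate the statistics of y under ID 0, and the accessors would be used for the report at the end of the simulation.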
In the second step, the type and the expression tree conversions are performed with the IWL information.
The conversion program reads the original floating-point C program and the IWL information table
generated by the range estimation procedure. The type conversion is conducted by modifying the types of the
variables in the symbol tables. The types of the float variables are replaced by the integer type. Not only
the float type but also float-based types such as float pointers, float arrays, and float functions are
converted to integer-based types. The expression tree conversion is based on the fixed-point arithmetic rules
described previously. The instructions in the expression tree are traversed bottom-up, and floating-point
instructions are converted to integer type. The instructions to convert are mul, add, sub, cvt, cpy, lod,
str, ldc, cal, array, neg, seq, sne, sl, and sle. When these instructions are found, the operands of the
instructions are examined, and the number of shifts is determined by the IWLs of the operands and the result.
If the instruction is mul, it is replaced by a mulh() function call, which is implemented by a macro or an
intrinsic depending on the features of the target compiler. Figure 4-(c) shows an example of the converted
fixed-point C code. According to the range estimation results, the IWLs of the variables x, y, and s are
0, 4, and 4, respectively. The floating-point constant 0.9 is converted to the integer constant 29491, which
is a 16-bit fixed-point number with an IWL of 0. The multiplication uses the mulh() macro to obtain the upper
16 bits of the result, and the IWL of the multiplication result is set to 5, which is the sum of the IWLs of
the two operands plus one. Since the IWL of x is 0, x must be right-shifted by 5 bits before it is added to
the multiplication result. The sum is then left-shifted by 1 bit to be assigned to y, whose IWL is 4.
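
The conversion just described can be exercised on a host machine. The sketch below follows the shifts given in the text (IWLs of 0, 4, and 4 for x, y, and s); the macro definitions of `mulh` and `sll` and the function name `iir1_fixed` are assumptions made for illustration, since the target compiler would supply its own macro or intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 16 bits of a 16 x 16 -> 32-bit product (assumed definition). */
#define mulh(a, b) ((int16_t)(((int32_t)(a) * (int32_t)(b)) >> 16))
/* Left shift by n bits without overflow checking (assumed definition). */
#define sll(v, n)  ((int16_t)((int32_t)(v) * ((int32_t)1 << (n))))

/* Fixed-point version of y = 0.9 * s + x with IWL(x) = 0, IWL(y) = IWL(s) = 4. */
int16_t iir1_fixed(int16_t x)
{
    static int16_t s = 0;
    int16_t y;
    /* 0.9 -> 29491 (IWL 0); product has IWL 0 + 4 + 1 = 5; x is aligned by >> 5 */
    int32_t sum = mulh(29491, s) + (x >> 5);
    assert(sum >= -16384 && sum <= 16383);   /* the 1-bit left shift must fit */
    y = sll(sum, 1);                         /* IWL 5 -> IWL 4 */
    s = y;
    return y;
}
```

Feeding x = 16384 (0.5 at an IWL of 0) twice returns 1024 and then 1944, that is 0.5 and about 0.949 at an IWL of 4, against 0.5 and 0.95 for the floating-point filter.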
Finally, the converted fixed-point C code is tested to see whether it produces runtime overflows. Overflows
can be produced only in the shift-left operations that are used for aligning the binary point in the
assignment statements. The macro sll() shown in Fig. 4-(c) can check for overflows and report their
locations. If overflows occur, the programmer should modify the IWL information table to eliminate them.
The sll() macro can be replaced by a shift-only macro in the final implementation.
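
One way such a checking macro could be written for 16-bit data is shown below. The actual definition of sll() is not given in the text, so the helper `sll_checked`, the counter `sll_overflow_count`, and the message format are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Number of overflows detected so far (hypothetical bookkeeping). */
static int sll_overflow_count = 0;

/* Left shift with overflow detection; reports the source location on overflow.
 * When an overflow is reported, the returned value is not meaningful. */
static int16_t sll_checked(int32_t v, int n, const char *file, int line)
{
    int64_t shifted;
    assert(n >= 0 && n < 16);
    shifted = (int64_t)v * ((int64_t)1 << n);
    if (shifted > INT16_MAX || shifted < INT16_MIN) {
        sll_overflow_count++;
        fprintf(stderr, "%s:%d: overflow in sll(%ld, %d)\n",
                file, line, (long)v, n);
    }
    return (int16_t)shifted;
}

/* Checking version used while testing the converted program. */
#define sll(v, n) sll_checked((v), (n), __FILE__, __LINE__)
```

Once no overflow is reported for the test inputs, the macro can be redefined as a plain shift so that the check costs nothing in the final build.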
IV. Implementation Examples
A fourth-order IIR filter and an ADPCM codec are implemented for the TMS320C5x using the developed
floating-point to fixed-point C converter. We used the TMS320C2x/5x optimizing C compiler from Texas
Instruments (version 6.60) [3] to compile the floating-point and the fixed-point programs.
Table I shows the implementation results for the fourth-order IIR filter. The signal to quantization noise
ratio (SQNR) of the fixed-point implementation is 49.3 dB, which is acceptable in most communication
applications. The fixed-point implementation based on the proposed translator is 13.8 times faster than the
floating-point implementation.
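
The text does not define how the SQNR is computed. Assuming the usual definition, with the floating-point output $y[n]$ taken as the reference and $\hat{y}[n]$ denoting the fixed-point output rescaled to the same units, it would be:

```latex
\mathrm{SQNR} = 10 \log_{10}
  \frac{\sum_{n} y[n]^{2}}{\sum_{n} \bigl( y[n] - \hat{y}[n] \bigr)^{2}}
  \ \mathrm{dB}
```

Under this definition the floating-point column has no SQNR entry because it is the reference itself.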
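(a) Floating-point C program
    float iir1(float x)
    {
        static float s = 0;
        float y;
        y = 0.9 * s + x;
        s = y;
        return y;
    }
(b) Range estimation program
    float iir1(float x)
    {
        static float s = 0;
        float y;
        y = 0.9 * s + x;
        range(y, 0);
        s = y;
        range(s, 1);
        return y;
    }
(c) Converted fixed-point program
    int iir1(int x)
    {
        static int s = 0;
        int y;
        /* statement reconstructed from the description in Section III;
           the argument order of sll() is assumed */
        y = sll(mulh(29491, s) + (x >> 5), 1);
        s = y;
        return y;
    }
[Fig. 4. Conversion example: (a) floating-point program, (b) range estimation program, (c) fixed-point program.]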
TABLE I
The performance results of a fourth order IIR filter.
                  floating-point   fixed-point
SQNR              -                49.3 dB
machine cycles    2980             215
TABLE II
The performance results of an ADPCM codec.
test signal       floating-point   fixed-point (16-bit)   fixed-point (32-bit)
1                 15.51 dB         12.42 dB               15.29 dB
2                 19.11 dB         14.23 dB               18.92 dB
3                 18.95 dB         14.58 dB               18.53 dB
4                 21.43 dB         16.91 dB               22.56 dB
machine cycles    125249           26718                  61401
The ADPCM implementation is based on the G.721 standard. The ADPCM program consists of three
files with a total of 771 lines. This shows that our conversion software can handle reasonably complex
digital signal processing programs. The performance is measured by the signal-to-noise ratio between the
input and the reconstructed speech signal, as shown in Table II. There are two versions of the fixed-point
program in the table. The first is a 16-bit version whose variables are all 16-bit integers. It shows about
4 dB of performance degradation. It should be possible to reduce this degradation by employing a
signal-dependent scaling scheme, such as a block floating-point implementation. The 32-bit version performs
better than the 16-bit version, but it requires many more cycles because the 32-bit arithmetic
operations are implemented with subroutine calls. The single-precision (16-bit) fixed-point implementation
is 4.7 times faster than the floating-point implementation.
V. Concluding Remarks
In this paper, a floating-point to fixed-point C program translator is developed using the SUIF system. The
translator not only converts floating-point operations into integer operations, but also supports automatic
scaling. The implementation results show that the translator can provide an acceptable compromise to the
users of fixed-point digital signal processors in terms of SQNR, execution speed, and development effort.
References
[1] Seehyun Kim and Wonyong Sung, "A Floating-point to Fixed-point Assembly Program Translator for the TMS320C25,"
IEEE Trans. on Circuits and Systems, vol. 41, no. 11, pp. 730-739, Nov. 1994.
[2] Jiyang Kang and Wonyong Sung, "Fixed-Point C Compiler for TMS320C50 Digital Signal Processor," in Proceedings of the
International Conference on Acoustics, Speech, and Signal Processing '97, Apr. 1997, pp. 707-710.
[3] TMS320C2x/C2xx/C5x Optimizing C Compiler, Texas Instruments Inc., Houston, 1995.
[4] TMS320C6x Optimizing C Compiler, Texas Instruments Inc., Houston, 1997.
[5] The SUIF Library, Stanford Compiler Group, 1994.
[6] Leland B. Jackson, "On the Interaction of Roundoff Noise and Dynamic Range in Digital Filters," The Bell System Technical
Journal, pp. 159-183, Feb. 1970.
[7] Christos Caraiscos and Bede Liu, "A Roundoff Error Analysis of the LMS Adaptive Algorithm," IEEE Trans. on Acoustics,
Speech, and Signal Processing, vol. 32, no. 1, pp. 34-41, Feb. 1984.
[8] Seehyun Kim, Ki-Il Kum, and Wonyong Sung, "Fixed-Point Optimization Utility for C and C++ Based Digital Signal
Processing Programs," in Proceedings of the 1995 IEEE Workshop on VLSI Signal Processing, Oct. 1995, pp. 197-206.
[9] M. Willems, V. Buersgens, H. Keding, and H. Meyr, "FRIDGE: An Interactive Fixed-Point Code Generation Environment
for HW/SW CoDesign," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing '97, Apr.
1997.