etc. Algorithm kernels offer several compelling advantages as benchmarks:

• Relevance. Algorithm kernels can be selected by examining DSP applications and focusing on those portions of the applications that account for the largest share of the processing time. This guarantees their relevance.
• Ease of specification. By virtue of their modest size, algorithm kernels can be well-defined: a specification can state their input and output requirements, include test vectors to verify functional conformance, and indicate which algorithm variants and optimizations are allowable. For example, there are many techniques for implementing an FFT. Without specifying the exact type of FFT, one cannot fairly compare two processors’ FFT execution times.
• Optimization. Because algorithm kernels are of a moderate size, a skilled programmer can write the code in assembly language and be fairly certain that his or her implementation is optimal, or very close to optimal, on a given processor.
• Ease of implementation. Due to their moderate size, algorithm kernels can be implemented in a reasonable amount of time, even with thorough optimization.

The BDTI Benchmarks™, the basic suite of algorithm kernels used in BDTI’s DSP processor benchmarking, are shown in Table 1. BDTI calculates execution time, memory usage, and energy consumption for each benchmark. Most of the benchmarks involve transforming an input data set into an output data set. The exception is the Control benchmark, in which the processor must execute a contrived sequence of operations, such as conditional branching and subroutine calls, that are commonly needed in control code. As DSP applications become more complex and system designers try to achieve higher levels of system integration, DSP processors will increasingly be called upon to perform control functions.

With the exception of the Control benchmark, BDTI optimizes each benchmark for execution time. The Control benchmark is optimized for memory usage since memory usage is usually a greater concern than speed for control code.

TABLE 1. BDTI Benchmarks.

Function | Description | Example Applications
Real Block FIR | Finite impulse response filter that operates on a block of real (not complex) data. | Speech processing (e.g., G.728 speech coding).
Complex Block FIR | FIR filter that operates on a block of complex data. | Modem channel equalization.
Real Single-Sample FIR | FIR filter that operates on a single sample of real data. | Speech processing, general filtering.
LMS Adaptive FIR | Least-mean-square adaptive filter; operates on a single sample of real data. | Channel equalization, servo control, linear predictive coding.
IIR | Infinite impulse response filter that operates on a single sample of real data. | Audio processing, general filtering.
Vector Dot Product | Sum of the pointwise multiplication of two vectors. | Convolution, correlation, matrix multiplication, multi-dimensional signal processing.
Vector Add | Pointwise addition of two vectors, producing a third vector. | Graphics, combining audio signals or images.
Vector Maximum | Find the value and location of the maximum value in a vector. | Error control coding, algorithms using block floating-point.
Viterbi Decoder | Decode a block of bits that has been convolutionally encoded. | Error control coding.
Control | A sequence of control operations (test, branch, push, pop, and bit manipulation). | Virtually all DSP applications include some control code.
256-Point In-Place FFT | Fast Fourier Transform converts a time-domain signal to the frequency domain. | Radar, sonar, MPEG audio compression, spectral analysis.
Bit Unpack | Unpacks variable-length data from a bit stream. | Audio decompression, protocol handling.

Measuring Algorithm Kernel Execution
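To make concrete what kind of code is being measured, here is a minimal Python sketch of two of the Table 1 kernels, the Vector Dot Product and the Real Block FIR filter. It is not BDTI's benchmark code, which is hand-optimized for each processor; the function names, the zero initial filter state, and the sample data are assumptions chosen purely for illustration.

```python
def vector_dot_product(a, b):
    """Sum of the pointwise multiplication of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def real_block_fir(samples, coeffs):
    """Apply an FIR filter to a block of real samples.

    Assumption: zero initial state, so inputs before the block count as 0.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out


block = [1, 2, 3, 4]   # hypothetical input block
taps = [1, 1]          # hypothetical 2-tap filter

print(vector_dot_product([1, 2, 3], [4, 5, 6]))  # 32
print(real_block_fir(block, taps))               # [1, 3, 5, 7]
```

The inner multiply-accumulate loop is the part a benchmark implementer would optimize most heavily on a given processor.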
There are several ways to measure a processor’s performance on an algorithm kernel benchmark. Cycle-accurate software simulators usually provide the most convenient method for determining cycle counts. A cycle-accurate simulator models a processor’s execution of instructions and keeps accurate cycle counts by making appropriate adjustments when factors such as pipeline interlocking or bus contention slow the operation of the processor. Software simulators offer a controlled, flexible, and interactive environment for testing and optimizing code. Some software simulators include support for macros or scripts that can automate performance measurement and functionality verification and allow engineers to quickly see how code changes affect a processor’s performance.

Hardware-based application development tools can also be used to measure execution time and are needed to precisely gauge energy consumption. Hardware tools, such as emulators, allow the user to download code from a PC to the target processor. Using a debugger, most hardware emulators allow the processor to step through the code line by line, or to run the code until a breakpoint is reached.

Code can be run in continuous loops on development boards to measure energy consumption. Energy consumption is measured by isolating the power going to the DSP processor from the power going to other system components, running a benchmark in a repeating loop, and using a current probe to record the time-varying input current under carefully controlled conditions.

Such energy consumption measurements can be time-consuming and difficult. A less accurate but easier alternative is to obtain a credible estimate of typical power consumption and multiply it by the time taken to execute a benchmark. This is the approach taken by BDTI.

Determining benchmark performance for new processors without software or hardware development tools is a tedious and error-prone process. One must manually calculate the time required to execute each instruction in the benchmark and be careful to check that the benchmarks are functionally correct. Because pipeline interlocks or bus conflicts can slow execution time, the processor architecture must be thoroughly understood before instruction execution times are calculated.

Benchmark Results

[Figure: bar chart comparing fixed-point and floating-point processors, vertical axis from 0 to 70. Processors labeled: TMS320C5416, TMS320C6203, TMS320C6701, DSP56311, MSC8101, Pentium III. Rating labels: 160 MIPS, 300 MHz, 300 MHz, 167 MHz, 1 GHz. Caption and units not recoverable.]

yielding benchmark results that are not what might be expected from a simple comparison of MIPS.

Texas Instruments’ TMS320C6203 is a VLIW-based processor that can issue and execute up to eight instructions per instruction cycle. Hence, at the 300 MHz clock rate shown here, it has a MIPS rating of 2400 MIPS. However, its relative speed compared to another Texas Instruments DSP processor, the architecturally conventional TMS320C5416, is not nearly as high as the difference between the two processors’ MIPS ratings suggests. Despite a MIPS ratio of 15:1, the TMS320C6203 executes the FFT benchmark only 7.8 times faster than the TMS320C5416. A major reason for this is that TMS320C6203 instructions are simpler than TMS320C5416 instructions, so the TMS320C6203 requires more instructions to accomplish the same task. In addition, the TMS320C6203 is not always able to make use of all of its available parallelism because of limitations such as data dependencies and pipeline effects.

This example illustrates why using MIPS ratings to compare the performance of different processors may be misleading, and why BDTI believes that algorithm kernel-based benchmarking provides much more meaningful results with which to compare processor performance.

Of course, one must be cautious when interpreting benchmark results. For example, a processor’s data word width affects memory usage as well as numerical accuracy. The benchmark results for a finite impulse response filter implemented on a 24-bit processor might show 50% more data memory usage than the same filter
implemented on a 16-bit processor. This increased memory usage is a result of the extended precision of the 24-bit data. In fact, since the 24-bit processor is calculating the filter result to 50% greater precision, the 24-bit processor is in a sense performing more work, a fact not reflected in the benchmark results. If the application needs additional precision, the 24-bit processor may be an excellent choice. On the other hand, if 16-bit precision is sufficient, then the 24-bit processor may be a poor choice because it consumes more data memory.

[Figure: chart labels listing DSP56652 and DSP1620, with ratings of 120 MIPS, 75 MIPS, and 70 MIPS. Caption and data not recoverable.]

A processor’s performance on an application is estimated by combining the results of the benchmarks with the results of the application profiling. Multiplying the benchmark execution times by the number of occurrences of each benchmark (or a similar algorithm kernel) yields an estimate of the time required to execute the application. Comparing the application execution time estimates of different processors allows an engineer to gauge the relative suitability of each processor for the application.
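The application estimate described above is a weighted sum: each benchmark's execution time multiplied by the number of times that kernel (or a similar one) occurs in the profiled application. The Python sketch below shows only the arithmetic. The processor names, execution times, and occurrence counts are invented placeholders, not BDTI data, and the function name is hypothetical.

```python
def estimate_application_time(benchmark_times_us, occurrences):
    """Sum of (benchmark execution time x occurrence count) over all kernels."""
    return sum(benchmark_times_us[kernel] * count
               for kernel, count in occurrences.items())


# Hypothetical application profile: how often each kernel occurs.
profile = {"real_block_fir": 40, "fft_256": 2, "viterbi": 1}

# Hypothetical benchmark execution times in microseconds per processor.
processors = {
    "processor_a": {"real_block_fir": 5.0, "fft_256": 120.0, "viterbi": 300.0},
    "processor_b": {"real_block_fir": 8.0, "fft_256": 60.0, "viterbi": 150.0},
}

estimates = {name: estimate_application_time(times, profile)
             for name, times in processors.items()}
best = min(estimates, key=estimates.get)

print(estimates)  # {'processor_a': 740.0, 'processor_b': 590.0}
print(best)       # processor_b
```

In this made-up case processor_a wins the FIR benchmark but loses on the application as a whole, which is why the profile weights matter as much as the individual benchmark results.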
Control benchmark can be used to compare processors’ relative efficiency at executing control code.

Other Considerations

Although performance is a leading consideration, many other factors affect the choice of a DSP processor. Application development tools, for instance, cannot be overlooked. Without effective application development tools, writing application software can be difficult no matter how strong the processor’s performance. Likewise, chip vendor and third-party application engineering support can be invaluable when problems arise. Additionally, designers cannot overlook physical size considerations and must choose a processor that is available in an appropriate package.

Cost is another critical concern. There are two ways to view the ratio of cost to performance. In some instances, additional performance beyond the required minimum will remain unused. In this situation, designers typically seek the lowest-cost processor with adequate performance. At other times, the excess performance may allow additional features to be added to the product. Or, the designer may want a line of code-compatible DSP processors with performance levels appropriate for different members of an entire product line. In this situation, a cost-execution time product metric (the execution time of a processor multiplied by the unit cost) may be useful. Figure 3 shows the cost-execution time product of several processors on BDTI’s FFT benchmark.

[Figure 3 (chart not reproduced): cost-execution time product on BDTI’s FFT benchmark for fixed-point and floating-point processors, vertical axis from 0 to 4000. Processors labeled: TMS320C5416, TMS320C6203, TMS320C6701, DSP56311, MSC8101. Rating labels: 160 MIPS, 300 MHz, 300 MHz, 167 MHz.]

Designers must also remember that minimizing system cost may not always mean minimizing DSP processor cost. For example, one processor may use memory more efficiently than a slightly less expensive processor. If the lower memory usage can eliminate one memory chip from the system, the more expensive processor may minimize system cost. Designers must also weigh the cost of engineering time and carefully consider how the quality of application development tools will affect product development schedules.

Lessons Learned

There is no easy way to evaluate DSP processor performance meaningfully. Traditional performance units

We expect that DSP systems will continue to become more sophisticated and demand greater computational performance. At the same time, semiconductor vendors will continue to develop more powerful DSP processors and integrate these processors with other system components such as microcontrollers and peripherals. As systems become more complicated and processor choices grow, designers will need good estimates of a processor’s DSP performance. The methodology outlined above will be an excellent starting place for calculating these estimates.

References

[1] Buyer’s Guide to DSP Processors, Berkeley, California: Berkeley Design Technology, Inc., 1994, 1995, 1997, 1999, 2001. This 846-page technical report discusses DSP benchmarking methodology in detail and contains extensive benchmarking data for popular DSP processors. The report provides execution-time application profiling data from several common DSP applications. Excerpts from this report, as well as a pocket guide to DSP processors, are available at www.BDTI.com.

[2] Inside the StarCore SC140, Berkeley, California: Berkeley Design Technology, Inc., 2000. This report provides a comprehensive qualitative analysis of the StarCore SC140’s architecture and features, along with a quantitative analysis based on results from BDTI’s DSP benchmark suite. The SC140’s performance is compared to that of key competitors, with benchmark results analyzed in terms of underlying
architectural strengths and weaknesses. The report includes coverage of Motorola’s SC140-based MSC8101.

[3] Phil Lapsley, Jeff Bier, Amit Shoham, and Edward A. Lee, DSP Processor Fundamentals: Architectures and Features, Berkeley, California: Berkeley Design Technology, Inc., 1996. An introductory textbook on DSP processor architectures which discusses how processor architecture affects performance.

About Berkeley Design Technology

Berkeley Design Technology, Inc. (BDTI) is a software and technical services company focused on digital signal processing (DSP) technology. The company was founded by U.C. Berkeley faculty and researchers.

BDTI specializes in the analysis, benchmarking, evaluation, and development of technology used to implement DSP applications. Specifically, the company:

• Performs in-depth technical evaluations of microprocessors.
• Develops DSP application software and firmware.
• Publishes technical reports and books on DSP technology, including Buyer’s Guide to DSP Processors, Inside the StarCore SC140, and DSP Processor Fundamentals.
• Analyzes DSP algorithms and applications.
• Evaluates design tools and advises on tool selection and design methodologies.
• Develops specifications and recommendations for new DSP processors, software, and tools.
• Provides DSP-related training classes.

BDTI customers include: 3Com, ARM, AMD, Analog Devices, Cadence, Cisco, Compaq, Conexant, CSF Thomson, Dow Chemical, DSP Group, E.M. Warburg Pincus, Ericsson, Fujitsu, Hewlett-Packard, Hitachi, IBM, Infineon Technologies, IDT, Intel, LSI Logic, Lucent Technologies, Mentor Graphics, Microsoft, MIPS, Motorola, National Semiconductor, NEC, Nokia, Philips, Principal Financial, Raytheon, RealNetworks, Replay Networks, STMicroelectronics, Sony, StarCore, Sun Microsystems, Synopsys, Texas Instruments, VLSI Technology, Wind River Systems, Xilinx, Zoran.

BERKELEY DESIGN TECHNOLOGY, INC.
2107 Dwight Way, Second Floor
Berkeley, CA 94704 U.S.A.
+1 (510) 665-1600
Fax: +1 (510) 665-1680
Email: info@BDTI.com
Web: www.BDTI.com

International Representatives

JAPAN
Shinichi Hosoya
Japan Kyastem Co.
Tokyo, Japan
+81 (425) 23 7176
Fax: +81 (425) 23 7178
bdt-info@kyastem.co.jp