in Digital Hardware
Przemysław M. Szecówka, Piotr Malinowski*
Faculty of Microsystem Electronics and Photonics
Wrocław University of Technology
Wrocław, Poland
przemyslaw.szecowka@pwr.wroc.pl
*) now with University School of Physical Education in Wrocław, Poland
Abstract: Singular Value Decomposition is classified among
the most effective numerical methods of matrix inversion. The
paper presents a study of hardware implementation of the SVD and
CORDIC algorithms. Various digital architectures are
proposed and compared, including low-cost sequential and
high-performance pipelined solutions. Fixed-point and floating-point
arithmetic were considered. The concepts were implemented in
VHDL, verified and synthesized with Xilinx tools. A selected
approach was physically implemented and tested.
Index Terms: CORDIC, SVD, digital, hardware, VHDL,
FPGA
I. INTRODUCTION
Processing of matrices, especially inversion, remains a key
challenge for contemporary computing machines. Very clever
algorithms were proposed many years ago by scientists
who expected rapid development of digital hardware in the
future. Many of those solutions were intended to work on
futuristic parallel devices. CORDIC and Singular Value
Decomposition (SVD) are good examples here [1-3].
Recent years have finally brought the long-expected rapid
development of digital hardware and growth in the complexity
of programmable logic devices. There is growing interest in
the construction of dedicated digital hardware based on more
or less classic concepts [4-7].
This paper describes a study of hardware implementation of
Singular Value Decomposition of a matrix based on replicated
CORDIC modules. The authors focus on comparison of
architecture variants in terms of resource allocation, speed
and accuracy. Similar works may be found in contemporary
literature [8], showing growing interest in practical use of
the achievements of the great mid-20th-century mathematicians.
II. CORDIC AND SVD OVERVIEW
The CORDIC algorithm (Coordinate Rotation Digital
Computer) was proposed by Volder in 1959 [2]. Initially it was
used to convert between polar and rectangular coordinates.
CORDIC was later extended to evaluate hyperbolic and
exponential functions, square roots and other numeric
functions. Nowadays it is extensively used in digital signal
and data processing, e.g. for DFT [7] and SVD [5]. It is thus
a quite universal tool which may be applied in many variants
and configurations. In general, CORDIC consists in iterative
rotations of a vector by a predefined series of constant
angles. The angles decrease in a specific manner,
$\arctan(2^{-i})$ for i = 0, 1, 2, ..., forming the series
45°, 26.6°, 14°, 7.1°, 3.58°, etc. Consecutive rotations are
left or right depending on the target and the current result.
Accuracy increases with the number of rotations n. This generic
scheme may be applied in various modes, depending on needs.
If the target is rotation by a defined angle, a series of
rotations is performed. For 2-dimensional space, where the
vector $[x_0, y_0]^T$ is to be rotated by an angle $z_0$,
after n iterations the new coordinates are:
$$x_n = \frac{1}{K_n}\left[x_0 \cos z_0 - y_0 \sin z_0\right] \qquad (1)$$
$$y_n = \frac{1}{K_n}\left[y_0 \cos z_0 + x_0 \sin z_0\right] \qquad (2)$$
whilst the final rotation angle $z_n = 0$.
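The rotation mode described above can be illustrated with a minimal floating-point sketch. It is not the authors' VHDL design; the function name `cordic_rotate`, the default `n=16` and the use of radians are assumptions made only for this illustration. The returned `gain` is the accumulated vector growth, i.e. it plays the role of the factor 1/K_n in (1) and (2).

```python
import math

def cordic_rotate(x0, y0, z0, n=16):
    """Rotate (x0, y0) by z0 radians using n CORDIC micro-rotations.

    Hypothetical helper for illustration. Returns the unscaled
    coordinates, the residual angle and the accumulated gain.
    Converges for |z0| up to about 1.74 rad.
    """
    x, y, z = x0, y0, z0
    gain = 1.0
    for i in range(n):
        d = 1.0 if z >= 0 else -1.0      # direction: drive z towards 0
        t = 2.0 ** -i                    # tan of the i-th elementary angle
        x, y = x - d * t * y, y + d * t * x
        z -= d * math.atan(t)            # subtract the angle just applied
        gain *= math.sqrt(1.0 + t * t)   # each micro-rotation stretches the vector
    return x, y, z, gain

x, y, z, gain = cordic_rotate(1.0, 0.0, math.pi / 6)
print(x / gain, y / gain, z)  # close to cos(30 deg), sin(30 deg), 0
```

Dividing by `gain` at the end corresponds to the scale correction that a hardware implementation has to perform separately.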
In vector mode CORDIC determines the angle between the vector
$[x_0, y_0]^T$ and the X axis. After a series of dummy iterative
rotations the new coordinates would be
$$x_n = \frac{1}{K_n}\sqrt{x_0^2 + y_0^2} \qquad (3)$$
$$y_n = 0 \qquad (4)$$
and $z_n = \arctan\left(\frac{y_0}{x_0}\right)$. In this case,
however, the product of the algorithm is the numerical value of
$z_n$, determined by the accumulated sum of the angles (+/- for
left/right) applied in consecutive rotations.
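The vector mode can be sketched in the same way. This is a minimal floating-point illustration assuming x0 > 0 and z starting at 0; the name `cordic_vector` and the default `n=16` are hypothetical and not taken from the paper.

```python
import math

def cordic_vector(x0, y0, n=16):
    """Drive y towards 0 and accumulate the applied angles (assumes x0 > 0).

    Hypothetical helper for illustration. Returns the scale-corrected
    magnitude, the residual y and the accumulated angle in radians.
    """
    x, y, z = x0, y0, 0.0
    gain = 1.0
    for i in range(n):
        d = -1.0 if y >= 0 else 1.0      # rotate towards the X axis
        t = 2.0 ** -i
        x, y = x - d * t * y, y + d * t * x
        z -= d * math.atan(t)            # accumulate the signed elementary angle
        gain *= math.sqrt(1.0 + t * t)
    return x / gain, y / gain, z

magnitude, residual, angle = cordic_vector(3.0, 4.0)
print(magnitude, residual, angle)  # close to 5, 0 and atan(4/3)
```

The angle is obtained purely as a sum of table entries, which is why only an angle memory and an accumulator are needed for it in hardware.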
MIXDES 2010, 17th International Conference "Mixed Design of Integrated Circuits and Systems", June 24-26, 2010, Wrocław, Poland. Copyright 2010 by Department of Microelectronics & Computer Science, Technical University of Lodz.
Singular Value Decomposition of a matrix consists in
finding a series of singular values
$\sigma_1, \sigma_2, \ldots, \sigma_l$ which simplify
inversion of the matrix. For each matrix $M \in \mathbb{R}^{m,n}$
there exist orthogonal matrices $U \in \mathbb{R}^{m,m}$ and
$V \in \mathbb{R}^{n,n}$ for which
$$U^T M V = \Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_l) \in \mathbb{R}^{m,n} \qquad (5)$$
where l = min(m,n), and for r = rank(M) the diagonal values
fulfill the conditions
$$\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > 0 \qquad (6)$$
$$\sigma_{r+1} = \sigma_{r+2} = \ldots = \sigma_l = 0 \qquad (7)$$
A pseudo-inverse matrix $M^+$ may be determined by
$$M^+ = V \Sigma^+ U^T \qquad (8)$$
where $\Sigma^+$ is the pseudo-inverse of the diagonal matrix, i.e. the
diagonal matrix formed by the inverted (when non-zero) values of
$\sigma_1, \sigma_2, \ldots, \sigma_l$. SVD is currently classified
among the most efficient numerical methods of matrix inversion. SVD may
be performed by the appropriate rotation of a matrix. For a
basic 2x2 matrix $M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$
the rotation angle is $\arctan\left(\frac{c + b}{d - a}\right)$.
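As an illustration of diagonalizing a 2x2 matrix by rotations and then forming the pseudo-inverse, the sketch below uses the standard two-sided rotation scheme. The text gives only the angle arctan((c+b)/(d-a)); the second angle arctan((c-b)/(d+a)), the split into left and right rotations, and all function names are assumptions of this sketch. Library trigonometric functions stand in for the CORDIC modules.

```python
import math

def rot(theta):
    """Plane rotation matrix [[cos, sin], [-sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(p):
    return [[p[0][0], p[1][0]], [p[0][1], p[1][1]]]

def svd2x2(m):
    """Return (u, diag, v) with u^T m v = diag (diagonal, entries may be negative)."""
    (a, b), (c, d) = m
    theta_sum = math.atan2(c + b, d - a)    # sum of right and left angles
    theta_diff = math.atan2(c - b, d + a)   # assumed companion angle (difference)
    u = rot((theta_sum - theta_diff) / 2)   # left rotation
    v = rot((theta_sum + theta_diff) / 2)   # right rotation
    return u, matmul(matmul(transpose(u), m), v), v

def pinv2x2(m, eps=1e-12):
    """Pseudo-inverse V * diag^+ * U^T, inverting only non-zero diagonal values."""
    u, diag, v = svd2x2(m)
    inv = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        if abs(diag[i][i]) > eps:
            inv[i][i] = 1.0 / diag[i][i]
    return matmul(matmul(v, inv), transpose(u))

u, diag, v = svd2x2([[1.0, 2.0], [3.0, 4.0]])
print(abs(diag[0][0]), abs(diag[1][1]))   # singular values of the matrix
print(pinv2x2([[1.0, 2.0], [3.0, 4.0]]))  # equals the ordinary inverse here
```

The diagonal entries produced this way may be negative; their absolute values are the singular values, and the sign can be absorbed into one of the rotations if a strictly non-negative diagonal is required.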
This operation may be done by using CORDIC twice, in two
modes. First the appropriate angles are determined and then the
rotations are performed. Due to the properties of CORDIC the
iterations may be described by combinations of
additions/subtractions and bit shifts:
$$x_{i+1} = x_i + \sigma_i\,\mathrm{SHIFT}_i(y_i) \qquad (9)$$
$$y_{i+1} = y_i - \sigma_i\,\mathrm{SHIFT}_i(x_i) \qquad (10)$$
where $\sigma_i$ = +/-1 denotes left or right rotation and
$\mathrm{SHIFT}_i$ is a right shift by i bits. Consequently, a hardware
implementation of CORDIC consists of adders, subtractors and
muxes.
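The shift-and-add form of the iteration can be sketched on plain integers. The fixed-point format (`FRAC_BITS = 14`), the iteration count and the function name are assumptions chosen for illustration and do not reflect the 8-bit format used in the synthesized design. The sign convention follows (9) and (10), so `sigma = -1` corresponds to a counter-clockwise step.

```python
import math

FRAC_BITS = 14  # hypothetical format: real value = integer / 2**FRAC_BITS

def cordic_rotate_fixed(x, y, z, n=12):
    """Rotation-mode CORDIC on integers using only add, subtract and shift."""
    # Elementary angles in fixed point, as they would be stored in a ROM.
    angles = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS)) for i in range(n)]
    for i in range(n):
        sigma = -1 if z >= 0 else 1   # choose the direction that reduces |z|
        # Multiplying by sigma only selects between adding and subtracting.
        x, y = x + sigma * (y >> i), y - sigma * (x >> i)
        z += sigma * angles[i]
    return x, y, z

one = 1 << FRAC_BITS
x, y, z = cordic_rotate_fixed(one, 0, round(math.pi / 6 * one))
print(x / one, y / one)  # cos and sin of 30 degrees, still multiplied by the CORDIC gain
```

Python's `>>` on negative integers is an arithmetic shift, which matches the behaviour expected for 2's complement operands.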
Figure 1. CORDIC - sequential architecture
III. CORDIC ARCHITECTURE
Two variants of the CORDIC architecture are presented in
Fig. 1 and 2. Both solutions are fully synchronous with a single
clock. In the first, sequential approach, the arithmetic modules are
shared by all iterations. Intermediate results are fed back via the
registers and the appropriate angles are delivered to the arithmetic
units by the muxes. Control is provided by an iteration counter.
Another concept is the pipelined architecture presented in Fig. 2.
The schematic shows hardware providing 3 consecutive
iterations. Arithmetic blocks are replicated for each iteration,
thus the data flow may form a pipeline. This solution provides
much higher throughput but needs more hardware resources. On
the other hand, the control circuitry is simpler for this
solution, leading to some savings and a much higher available
clock frequency. The two concepts were implemented in VHDL
[9], verified and synthesized with Xilinx ISE [10] tools for
a Virtex-5 programmable device. The arithmetic is fixed point with
8-bit numbers coded in 2's complement. Synthesis results
summarized in Table I clearly show the difference between
the low-cost and high-speed approaches.
TABLE I. SYNTHESIS RESULTS FOR 2 VARIANTS OF CORDIC ARCHITECTURES
                            Sequential         Pipelined
Number of Slice Registers   56                 208
Number of Slice LUTs        151                243
Clock frequency             257 MHz            428 MHz
Levels of Logic             10                 2
Delay                       3.891 ns           2.336 ns
Delay on Logic              1.612 ns (41.4%)   0.659 ns (28.2%)
Delay on Route              2.279 ns (58.6%)   1.677 ns (71.8%)
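The practical meaning of the clock figures in Table I can be illustrated with a short calculation. The iteration count `N_ITER = 8` is an assumption (the table does not state it), as is the simplification that the sequential unit needs exactly one clock cycle per iteration and that the pipeline has one stage per iteration and is kept full.

```python
def max_clock_mhz(delay_ns):
    """Clock frequency implied by a critical-path delay."""
    return 1000.0 / delay_ns

def results_per_second(clock_hz, cycles_per_result):
    return clock_hz / cycles_per_result

N_ITER = 8  # assumed number of CORDIC iterations, not given in Table I

seq_rate = results_per_second(257e6, N_ITER)  # one result every N_ITER cycles
pipe_rate = results_per_second(428e6, 1)      # one result per cycle once the pipeline is full
speedup = pipe_rate / seq_rate

print(round(max_clock_mhz(3.891)), round(max_clock_mhz(2.336)))  # 257 428
print(seq_rate, pipe_rate, speedup)
```

Under these assumptions the throughput gain of the pipelined variant comes mostly from producing a result every cycle, and only partly from the higher clock frequency.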
Figure 2. CORDIC pipelined architecture.
IV. SVD ARCHITECTURE
The general concept of an SVD architecture based on CORDIC
modules is presented in Fig. 3. The input is a basic 2x2 matrix.
The primary outputs are two singular values; the secondary outputs
are rotation angles. This module, either replicated or reused,
may be applied for construction of dedicated devices working
with bigger matrices. A detailed schematic of the vector rotation
block is presented in Fig. 4. It is a synchronous machine based
on a single CORDIC element reused for consecutive iterations.
The CORDIC output is fed back to the input via the register
until the final value is obtained and latched. The rotation angle is
delivered by the module shown in Fig. 5. The arithmetic block is
again reused for consecutive iterations, thus the output is fed
back. The appropriate angles for elementary rotations are
stored in a memory. Control of data flow in these two modules
is provided by a Finite State Machine working together with
an iteration counter. A schematic of the FSM is presented in Fig. 6. The
initial neutral state is wait. Activation of the strobe signal
forces calculation of the angle and then the following steps of
processing.
[Figure 3 diagram: SVD 2x2 block with inputs a, b, c, d, built from two CORDIC blocks and four SHIFT-SUM blocks]
Figure 3. Basic SVD architecture composed of CORDIC blocks
After transition to each state the iteration
counter is activated and counts to a predefined value. When the
appropriate number of iterations is reached the FSM moves to
the next state. The two final stages are used to correct the scale
of the output values, disturbed during the iterative approximations. In
general the machine cycles through all the states, with one
exception: on request, new processing starts immediately with the
wait state skipped.
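The control scheme described above can be modelled as a small state-transition function. Only the wait state is named in the text, so the other state names, the assumption that every state runs for the same number of cycles, and the function name `fsm_step` are hypothetical choices for this sketch.

```python
# Only "wait" is named in the text; the remaining names are placeholders.
STATES = ["wait", "angle", "rotate", "scale_1", "scale_2"]

def fsm_step(state, count, strobe, n_iter):
    """Return (next_state, next_count) for one clock cycle."""
    if state == "wait":
        # Idle until the strobe signal requests a new computation.
        return ("angle", 0) if strobe else ("wait", 0)
    if count < n_iter - 1:
        # Stay in the current state until the iteration counter expires.
        return state, count + 1
    if state == "scale_2":
        # Last stage: restart at once if requested, otherwise go idle.
        return ("angle", 0) if strobe else ("wait", 0)
    return STATES[STATES.index(state) + 1], 0

state, count = "wait", 0
for cycle in range(14):
    state, count = fsm_step(state, count, strobe=(cycle == 0), n_iter=3)
    print(cycle, state, count)
```

Holding `strobe` high during the last cycle of the final stage models the immediate restart in which the wait state is skipped.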
[Figure 4 diagram: CORDIC block with two shiftsum blocks and input/output registers; signals nreset, clk, enable; inputs c, d; outputs Out 1, Out 2; controlled by iteration and FSM state]
Figure 4. SVD architecture - vector rotation block.
[Figure 5 diagram: ROM 24x29 holding rotation_angle 1 ... rotation_angle 24, and an angle register; signals nreset, clk, enable; input z1, output Z2]