Cariow Witczak An Fpga Mam 7 2015

370 Measurement Automation Monitoring, Jul. 2015, vol. 61, no. 07
Aleksandr CARIOW, Galina CARIOWA, Mira WITCZAK

WEST POMERANIAN UNIVERSITY OF TECHNOLOGY, SZCZECIN
49 Żołnierska St., 71-210 Szczecin
An FPGA-oriented fully parallel algorithm for multiplying dual quaternions
Abstract

This paper presents a fully parallel algorithm of low multiplicative
complexity for multiplying two dual quaternions. The “pen-and-paper”
multiplication of two dual quaternions requires 64 real multiplications and
56 real additions. More efficient solutions have not yet been reported. We
show how to compute a product of two dual quaternions with 24 real
multiplications and 64 real additions. In synthesizing the algorithm we use
the fact that the product of two dual quaternions can be represented as
a matrix–vector product. The matrix multiplicand in this product has unique
structural properties that allow an advantageous factorization, and it is
this factorization that significantly reduces the multiplicative complexity
of dual quaternion multiplication. We show that, with this approach, the
computation of a dual quaternion product can be structured so that it
requires only half the number of multipliers needed by the direct
implementation of the matrix-vector multiplication.

Keywords: dual quaternion product, fast algorithms, hardware complexity
reduction, FPGA implementation.

1. Introduction

Today, hypercomplex algebras are increasingly used to solve problems more
effectively in various scientific and technological areas. Dual quaternions,
in particular, are widely used in biomechanics, robotics, skeletal animation
and many other 3D computer graphics applications that require real-time data
processing [1-9].
In numerical algorithms that use a hypercomplex representation of the data,
multiplication is the most time-consuming operation. This is because the
multiplication of two hypercomplex numbers requires many real
multiplications and real additions. It is easy to verify that the complexity
of such a multiplication is proportional to the square of the dimension of
the algebra. In particular, the multiplication of two dual quaternions
requires 64 real multiplications and 56 real additions. Therefore, to speed
up the calculations, it is appropriate to use hardware FPGA accelerators.
Most modern high-end FPGA targets contain a number of embedded dedicated
multipliers. Thus, instead of mapping a multiplier onto several logic gates,
the dedicated multipliers provided on the FPGA fabric can be used, and all
multiplications involved in a fully parallel algorithm can be implemented
efficiently with them. However, their number may simply not be enough for
the fully parallel implementation that is required. The designer maps
multiply operations onto embedded multipliers until all of them are
occupied. If the FPGA target runs out of embedded multipliers, the designer
uses generic logic gates instead, and the multiplication becomes expensive
in terms of FPGA resource usage. In some cases, therefore, available logic
has to be used to implement multipliers, which seriously restricts the
maximum number of real multiplications that can be implemented in parallel
on a target device and leads to significant difficulties in implementing the
computation unit. Reducing the number of multiplications in fully parallel
hardware-oriented algorithms is therefore a critical problem.

2. Statement of the problem

We can represent any dual quaternion $Q$ as an 8-tuple
$(q_0, q_1, q_2, q_3, \tilde{q}_0, \tilde{q}_1, \tilde{q}_2, \tilde{q}_3)$.
The following representation of $Q$ reveals the products of the quaternion
units and the dual units associated with each component of the 8-tuple:

\[ Q = q_0 + i q_1 + j q_2 + k q_3 + \varepsilon \tilde{q}_0 + i\varepsilon \tilde{q}_1 + j\varepsilon \tilde{q}_2 + k\varepsilon \tilde{q}_3 \]

where

\[ i^2 = j^2 = k^2 = -1, \quad ij = k = -ji, \quad jk = i = -kj, \quad ki = j = -ik, \quad \varepsilon^2 = 0 . \]

The results of all possible products of dual quaternion imaginary units can
be summarized in the following table [6]:

Tab. 1. Table of multiplication of dual quaternion imaginary units

  ×     1     i     j     k     ε     iε    jε    kε
  1     1     i     j     k     ε     iε    jε    kε
  i     i    -1     k    -j     iε   -ε     kε   -jε
  j     j    -k    -1     i     jε   -kε   -ε     iε
  k     k     j    -i    -1     kε    jε   -iε   -ε
  ε     ε     iε    jε    kε    0     0     0     0
  iε    iε   -ε     kε   -jε    0     0     0     0
  jε    jε   -kε   -ε     iε    0     0     0     0
  kε    kε    jε   -iε   -ε     0     0     0     0

Suppose we have to compute the product of two dual quaternions

\[ Q_3 = Q_1 Q_2 , \]

where

\[ Q_1 = x_0 + i x_1 + j x_2 + k x_3 + \varepsilon x_4 + i\varepsilon x_5 + j\varepsilon x_6 + k\varepsilon x_7 , \]
\[ Q_2 = b_0 + i b_1 + j b_2 + k b_3 + \varepsilon b_4 + i\varepsilon b_5 + j\varepsilon b_6 + k\varepsilon b_7 , \]
\[ Q_3 = y_0 + i y_1 + j y_2 + k y_3 + \varepsilon y_4 + i\varepsilon y_5 + j\varepsilon y_6 + k\varepsilon y_7 . \]

We can see that the “pen-and-paper” method of multiplying two dual
quaternions requires 64 real multiplications and 56 real additions.
The multiplication of two dual quaternions can be represented by the
following matrix-vector product:

\[ Y_{8\times 1} = B_8 X_{8\times 1} , \tag{1} \]

where

\[ X_{8\times 1} = [x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7]^{\mathrm T} , \]
\[ Y_{8\times 1} = [y_0, y_1, y_2, y_3, y_4, y_5, y_6, y_7]^{\mathrm T} , \]

and $B_8$ is the $8 \times 8$ matrix composed of four $4 \times 4$ blocks:

\[ B_8 = \begin{bmatrix} B_4^{(0,0)} & B_4^{(0,1)} \\ B_4^{(1,0)} & B_4^{(1,1)} \end{bmatrix} . \]
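The unit products of Tab. 1 can be turned directly into a term-by-term multiplication routine. The following Python sketch is only an illustration of the “pen-and-paper” method; the names `UNIT` and `dq_mul_direct` are hypothetical and do not come from the paper. It also tallies the products that are actually needed: of the 64 pairwise products, the 16 that pair two dual components vanish because ε² = 0.

```python
from itertools import product

# Products of the quaternion units (1, i, j, k), indexed 0..3:
# UNIT[a][b] = (sign, c) means e_a * e_b = sign * e_c.
UNIT = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],
    [(1, 1), (-1, 0), (1, 3), (-1, 2)],
    [(1, 2), (-1, 3), (-1, 0), (1, 1)],
    [(1, 3), (1, 2), (-1, 1), (-1, 0)],
]


def dq_mul_direct(x, b):
    """Term-by-term product of two dual quaternions given as 8-tuples.

    Components 0..3 form the real part, components 4..7 the dual part.
    Returns the product and the number of non-vanishing data products.
    """
    y = [0] * 8
    mults = 0
    for m, n in product(range(8), repeat=2):
        if m >= 4 and n >= 4:
            continue  # eps * eps = 0, so this term vanishes
        sign, unit = UNIT[m % 4][n % 4]
        offset = 4 if (m >= 4 or n >= 4) else 0
        # sign is always +1 or -1, so only x[m] * b[n] is a true multiplication
        y[unit + offset] += sign * x[m] * b[n]
        mults += 1
    return y, mults
```

For example, `dq_mul_direct([0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0])` multiplies i by jε and returns `([0, 0, 0, 0, 0, 0, 0, 1], 48)`, that is, kε together with the count of non-vanishing products.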
b0 b1 b2 b3   1 

 
 b2 
 
b b0 b3 1
B (40 ,0)  B (41,1)  1 , 
b2  b3 b0 b1  1   1 
     
b3 b2  b1 b0  1 1
P4   , P  .
 1  4 1 
   
b4 b5 b6 b7   1   1 
   1 
b b4 b7  b6 
B (41,0)  5 , B(40 ,1 )  04 ,  
b6  b7 b4 b5 
   1 
b7 b6  b5 b4 
~ 1
D24  diag(2I 4 , I8 ,2I 4 ) .
and 0 N  M is an M  N matrix of zeros (a matrix where every 4
element is equal to zero).

Fig. 1 shows a data flow diagram of the new algorithm for
Taking into account that the B (40,1)  04 , direct realization of (1) multiplying two dual quaternions and Fig. 2 depicts a data flow
requires only 48 real multiplications and 40 real additions. Despite ~
diagram of the process for calculating the matrix D24 entries. In
the fact that the computational complexity is reduced, the number
this paper, the data flow diagrams are oriented from left to right.
of multiplications is still large. Below we shall present the
The straight lines in the figures denote the operations of data
algorithm, which reduces arithmetical complexity to 24 real
transfer. The circles in these figures show the operation of
multiplications and 64 real additions.
multiplication by a number inscribed inside the circle. The points
where lines converge denote summation. We use the usual lines
3. The algorithm without arrows on purpose, so as not to clutter the picture.
The proposed algorithm can be written with the help of the

following matrix-vector calculating procedure:
s0
Y81  D8 Σ 816 W16 Σ16 24 D 24 P24 20 W20 P 208 X81 (2) s1
where s2
s3
P208  [P48 ,I8 ,I8 ] , P48  (I 4
0 4 ) ,
s4
2 s5
 (i )
W20  I 4  Δ8  I8 , Δ8  H 4  H 4 , P24 20  P ,
 8 20 s6
i 0
s7
P8(020
)
 (I 8 (1)
012 ) , P8 20  (0 4  08 ) ,
I 8
x0 s8 y0
x1 s9 y1
P8(220
)
 (012 
I 8 ) ,
x2 s10 y2
Σ1624  I8  ( 112  I 4 )  ( 112  I 4 ) , x3 s11 y3
Δ8 Δ8
W16  I 4  Δ8  I 4 , Σ816  ( 112  I 4 )  ( 112  I 4 ) , x4 s12 y4
x5 s13 y5
where H 4 - is a Hadamard matrix of order 4, 1M  N - is an x6 s14 y6
M  N matrix of ones (a matrix where every element is equal to
x7 s15 y7
one), I N - is an identity N  N matrix, signs „  ”, “  ”denote
the Kronecker product and direct sum of two matrices, s16
respectively, signs 

,  denote vertical and horizontal s17
concatenation of the two or more matrices, respectively [10].
s18
D8  diag ( 1,1,1,1,1,1,1,1) , D 24  diag( s0 , s1 ,..., s23 ) . s19
s20
If the elements of D 24 are placed vertically without disturbing
the order and written in the form of the vector S 241 , then they s21
can be calculated using the following vector-matrix procedure: s22
~ ~
S 241  D 24 P24 20 W20 P 208 B81 (3) s23
Fig. 1. The data flow diagram of the proposed algorithm
~
B81  [b0 ,b1 ,b2 ,b3 .b4 ,b5 ,b6 ,b7 ] , W20  P4  Δ8  P8 ,
Fig. 2. The data flow diagram for calculating elements of the diagonal matrix $D_{24}$ (inputs $b_0, \ldots, b_7$; outputs $s_0, \ldots, s_{23}$)

4. Conclusions

The paper presents a new FPGA-oriented algorithm for multiplying two dual
quaternions. To reduce the hardware complexity (the number of embedded
multipliers), we exploit the specific properties of the matrix-vector
product representation of dual quaternion multiplication. As a result, the
algorithm requires 24 real multiplications and only 64 real additions
(because multiplication of any vector by the matrix $H_4$ takes only 8 real
additions). A completely parallel implementation of a dual quaternion
multiplier using the schoolbook (direct) method requires three Spartan-3
XC3S1000-4FT256 FPGA chips, while the implementation of the proposed
algorithm occupies only one such chip.

5. References

[1] Pennestrì E., Valentini P. P.: Dual quaternions as a tool for rigid body
motion analysis: a tutorial with an application to biomechanics. Archive of
Mechanical Engineering, vol. LVII, No 2, pp. 187–205, 2010.
[2] Jiang F., Wang H.-N., Huang Ch. S.: Algorithm for Relative Position and
Attitude of Formation Flying Satellites Based on Dual Quaternion. Chinese
Space Science and Technology, v32(3): pp. 20-26, 2012.
[3] Feng X., Wan W.: Real time skeletal animation with dual quaternion.
Journal of Theoretical and Applied Information Technology, v. 49, no. 1,
pp. 356-362, 2013.
[4] Pham H. L., Perdereau V., Adorno B. V., and Fraisse P.: Position and
orientation control of robot manipulators using dual quaternion feedback.
IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei,
Taiwan, China, 18-22 October 2010, pp. 658–663, 2010.
[5] Torsello A.: Point Invariance of the Screw Tension Minimizer. Università
Ca’Foscari Venezia, Dipartimento di Scienze Ambientali Informatica e
Statistica, Technical Report Series, Rapporto di Ricerca DAIS-2011-2,
pp. 1-7, 2011.
[6] Kavan L., Collins S., Žára J. J., and O’Sullivan C.: Geometric skinning
with approximate dual quaternion blending. ACM Trans. Graph., 27(4):105,
pp. 2442-2443, 2008.
[7] Torsello A., Rodolà E., and Albarelli A.: Multiview Registration via
Graph Diffusion of Dual Quaternions. IEEE Conference on Computer Vision and
Pattern Recognition, 20-25 June 2011, pp. 2441-2448, 2011.
[8] Mukundan R.: Advanced Methods in Computer Graphics: With examples in
OpenGL. Springer-Verlag London Limited, 2012.
[9] Kenwright B.: Dual-Quaternions: From Classical Mechanics to Computer
Graphics and Beyond, 1-11, Source: www.xbdev.net
[10] Ţariov A.: Algorytmiczne aspekty racjonalizacji obliczeń w cyfrowym
przetwarzaniu sygnałów, Wydawnictwo Zachodniopomorskiego Uniwersytetu
Technologicznego, 2011.
_____________________________________________________
Received: 02.04.2015    Paper reviewed    Accepted: 02.06.2015

Prof. Aleksandr CARIOW, DSc, PhD
He received the Candidate of Sciences (PhD) and Doctor of Sciences (DSc)
degrees in Computer Sciences from LITMO of St. Petersburg, Russia in 1984
and 2001, respectively. In September 1999, he joined the faculty of Computer
Sciences at the West Pomeranian University of Technology, where he is
currently a professor and chair of the Department of Computer Architectures
and Telecommunications. His research interests include digital signal and
image processing algorithms, VLSI architectures, and data processing
parallelization.
e-mail: acariow@wi.zut.edu.pl

Galina CARIOWA, PhD
She received the MSc degree in mathematics from Moldavian State University,
Chişinău in 1976 and the PhD degree in computer science from West Pomeranian
University of Technology, Szczecin, Poland in 2007. She is currently working
as an assistant professor of the Department of Multimedia Systems. Her
scientific interests include numerical linear algebra and digital signal
processing algorithms, VLSI architectures, and data processing
parallelization.
e-mail: gcariowa@wi.zut.edu.pl

Mira WITCZAK, eng.
Mira is a fourth-year undergraduate student at West Pomeranian University of
Technology, Faculty of Computer Science and Information Technology,
Szczecin. She is a member of the Students' Scientific Circle "Quaternion",
which studies problems of signal and image processing and recognition. Her
research interests lie in the development of the theory and practice of
digital signal and image processing algorithms, computer graphics, and
multimedia programming. She expects to graduate in the spring of 2015.
e-mail: mwitczak@wi.zut.edu.pl
