TERM PAPER
ON
TO BIOINFORMATICS
BY
18/27/MCS015
Table of Contents
1. Title Page
2. Table of Contents
3. Abstract
4. Introduction
8. Conclusion
9. References
ABSTRACT
In this paper, we focus on speeding up the Smith-Waterman algorithm for local
sequence alignment through parallelization, leveraging a Residue Number
System (RNS) implementation of the algorithm. We propose an approach to using
the Smith-Waterman algorithm on GPU platforms. The approach aims to exploit
the computational resources made available by parallelization and to use the
fast arithmetic operations of RNS to further improve the overall performance
of the algorithm.
1. INTRODUCTION
DNA Forensics (proof of identity, crime or catastrophe victims,
establishment of paternity)[6].
Agriculture or Bio-processing (drought and disease resistant crops, bio-
pesticides, edible vaccines to integrate into agricultural products) [6]
2. LITERATURE REVIEW
The aim of this review is to provide an overview of recent GPU-based
sequence analysis methods using the Smith-Waterman algorithm, emphasizing
their advantages (i.e. computational speed-up) as well as their drawbacks
(e.g. the necessity of algorithm redesign and tailored implementation to
fully leverage the GPU architecture and its peak performance).
Yuma et al. [12] implemented the SWA using the Compute Unified Device
Architecture (CUDA). Their method uses shared memory efficiently to reduce
the amount of data transferred between the GPU and off-chip memory. They
report that their implementation is 3 times faster than previous CUDA
implementations.
Yongchao et al. [13] present a CUDA-based implementation of the SW
algorithm, CUDASW++3.0, which improves on CUDASW++2.0. The implementation
uses CPU and GPU SIMD instructions and executes concurrently on the CPUs
and GPUs. It provides a peak performance of 119.0 and 185.6 GCUPS on a
single-GPU and a dual-GPU configuration, respectively.
Manavski et al. [14] also presented a CUDA-based implementation whose
performance peaked at 3.6 GCUPS. They implemented the algorithm partly
using local memory, which is very slow on the GPU card, so the algorithm
can be improved further to make better use of the resources available on
the GPU.
Łukasz et al. [15] proposed an implementation of the Smith-Waterman
algorithm that uses global memory and shared memory together with more
efficient code. The program is processed concurrently on the CPU and GPU.
It has a peak performance of 14.5 GCUPS on a dual-core Nvidia 9800 GX2
card.
In this paper, we propose a parallel implementation of the Smith-Waterman
algorithm on a GPU using the fast arithmetic operations of the Residue
Number System.
The parallel properties of RNS present a new challenge in developing
algorithms that fully utilize the parallel structure of modern computers
with GPUs and FPGAs to accelerate computation.
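The channel-wise nature of RNS arithmetic can be illustrated with a minimal Python sketch. The helper names (`to_rns`, `rns_add`, `rns_mul`) and the small moduli set (7, 8, 9) are assumptions chosen for illustration and do not come from the paper; any pairwise coprime moduli would serve.

```python
def to_rns(x, moduli):
    """Forward conversion: represent integer x by its residues."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli):
    """Add two RNS numbers; each channel is independent (no carries between channels)."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))

def rns_mul(a, b, moduli):
    """Multiply two RNS numbers channel by channel."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))

# Illustrative moduli set (hypothetical choice): pairwise coprime, range 7*8*9 = 504
MODULI = (7, 8, 9)

x_rns = to_rns(25, MODULI)
y_rns = to_rns(13, MODULI)
sum_rns = rns_add(x_rns, y_rns, MODULI)    # residues of 25 + 13
prod_rns = rns_mul(x_rns, y_rns, MODULI)   # residues of 25 * 13
```

Because every channel works only with its own modulus, each channel could in principle be assigned to a separate GPU thread, as long as the true result stays within the dynamic range of the moduli set.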
3.2.2 REVERSE CONVERSION
Reverse conversion is the conversion of a number from its residue
representation back to a weighted number system. The success of reverse
conversion depends directly on the forward conversion [2].
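The Chinese Remainder Theorem (CRT) is one approach to reverse
conversion. The CRT uses the formula
$$X = \left| \sum_{i=1}^{n} \left| x_i M_i^{-1} \right|_{m_i} M_i \right|_M$$
where X is represented by the residues {x_1, x_2, ..., x_n} with respect to
the moduli set {m_1, m_2, ..., m_n}, M = m_1 m_2 ... m_n, M_i = M / m_i, and
M_i^{-1} is the multiplicative inverse of M_i modulo m_i.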
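As a worked illustration of CRT-based reverse conversion, the following Python sketch evaluates the sum of |x_i M_i^{-1}|_{m_i} M_i modulo M. The function name `crt_reverse` and the example moduli (7, 8, 9) are assumptions made for this sketch. It relies on the built-in `pow(a, -1, m)` for modular inverses, which requires Python 3.8 or later.

```python
from math import prod

def crt_reverse(residues, moduli):
    """Reverse conversion via the CRT for pairwise coprime moduli."""
    M = prod(moduli)                       # dynamic range M = m_1 * ... * m_n
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i                     # M_i = M / m_i
        inv = pow(M_i, -1, m_i)            # multiplicative inverse of M_i mod m_i
        total += ((x_i * inv) % m_i) * M_i
    return total % M

# 25 has residues (4, 1, 7) with respect to the moduli (7, 8, 9)
recovered = crt_reverse((4, 1, 7), (7, 8, 9))
```

The terms of the sum are independent of one another, but the final reduction modulo M operates on a large number, which is one reason alternative methods such as mixed radix conversion are also considered.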
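Another method is Mixed Radix Conversion (MRC). Let the moduli set
{m_1, m_2, m_3, ..., m_n} have the corresponding residues
(x_1, x_2, x_3, ..., x_n), and let (a_1, a_2, a_3, ..., a_n) be the mixed
radix digits. The decimal equivalent of the residues is then
$$X = a_1 + a_2 m_1 + a_3 m_1 m_2 + \cdots + a_n m_1 m_2 \cdots m_{n-1}$$
where the mixed radix digits are given by
$$a_1 = x_1$$
$$a_2 = \left| (x_2 - a_1) \left| m_1^{-1} \right|_{m_2} \right|_{m_2}$$
$$a_3 = \left| \left( (x_3 - a_1) \left| m_1^{-1} \right|_{m_3} - a_2 \right) \left| m_2^{-1} \right|_{m_3} \right|_{m_3}$$
$$\vdots$$
$$a_k = \left| \left( \cdots \left( (x_k - a_1) \left| m_1^{-1} \right|_{m_k} - a_2 \right) \left| m_2^{-1} \right|_{m_k} - \cdots - a_{k-1} \right) \left| m_{k-1}^{-1} \right|_{m_k} \right|_{m_k}$$
Here |m_j^{-1}|_{m_k} denotes the multiplicative inverse of m_j modulo m_k.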
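The mixed radix digits can be computed with a short nested loop. The Python sketch below assumes the standard MRC recurrence, in which each digit a_k is obtained by repeatedly subtracting an earlier digit and multiplying by the inverse of the corresponding modulus, all modulo m_k. The function names and the example moduli (7, 8, 9) are hypothetical and chosen only for illustration.

```python
def mrc_digits(residues, moduli):
    """Compute the mixed radix digits a_1..a_n from the residues x_1..x_n."""
    digits = []
    for k, (x_k, m_k) in enumerate(zip(residues, moduli)):
        t = x_k % m_k
        for j in range(k):
            # subtract the earlier digit, then multiply by |m_j^{-1}| mod m_k
            t = ((t - digits[j]) * pow(moduli[j], -1, m_k)) % m_k
        digits.append(t)
    return digits

def mrc_reverse(residues, moduli):
    """Evaluate X = a_1 + a_2*m_1 + a_3*m_1*m_2 + ... from the digits."""
    x, weight = 0, 1
    for a, m in zip(mrc_digits(residues, moduli), moduli):
        x += a * weight
        weight *= m
    return x

digits = mrc_digits((4, 1, 7), (7, 8, 9))   # mixed radix digits of 25
value = mrc_reverse((4, 1, 7), (7, 8, 9))
```

Unlike the CRT, every intermediate operation here is modulo a single small m_k, at the cost of the digits being computed sequentially.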
depend on its neighbors and the similarities between the current symbol of
sequence A and the symbol of sequence B is computed.
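Let
• M(i, j) represent the similarity score of two sequences A and B,
terminating at positions i and j;
• S(a_i, b_j) be the score of comparing symbol a_i of sequence A with
symbol b_j of sequence B;
• d be the gap penalty.
The algorithm is given by:
$$M(i,j) = \max \begin{cases} 0 \\ M(i-1,j-1) + S(a_i, b_j) \\ M(i-1,j) + d \\ M(i,j-1) + d \end{cases}$$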
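The matrix fill can be sketched in Python. The sketch assumes the standard linear-gap recurrence, in which M(i, j) is the maximum of 0, M(i-1, j-1) + S(a_i, b_j), M(i-1, j) + d and M(i, j-1) + d. The scoring values (match = 2, mismatch = -1, d = -1) are illustrative assumptions and are not taken from the paper.

```python
def smith_waterman(a, b, match=2, mismatch=-1, d=-1):
    """Fill the Smith-Waterman similarity matrix and return it with the best score."""
    rows, cols = len(a) + 1, len(b) + 1
    M = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch   # S(a_i, b_j)
            M[i][j] = max(
                0,
                M[i - 1][j - 1] + s,   # align a_i with b_j
                M[i - 1][j] + d,       # gap in sequence B
                M[i][j - 1] + d,       # gap in sequence A
            )
            best = max(best, M[i][j])
    return M, best

matrix, score = smith_waterman("GATTACA", "TTAC")
```

Each cell depends only on its upper, left and upper-left neighbours, so all cells on the same anti-diagonal are independent of one another. GPU implementations commonly exploit this property.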
Using RNS takes advantage of fast RNS arithmetic and improves the speed
of conversion, depending on the moduli set chosen [2]. M. Nobile et al. [8]
highlighted that using a GPU gives a significant speed-up by reducing
running time, and that additional optimization might still be possible.
We adopt the RNS-SWA from [2] with the moduli set {2^n - 1, 2^n, 2^n + 1}.
The architecture is shown in Fig 1 below.
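[Figure: block diagram. The inputs M(i-1,j), M(i,j-1), M(i-1,j-1), d and
S(a,b) enter the Binary to RNS Converter (Converter 1), which feeds RNS
Processor 1 (mod 2^n - 1), RNS Processor 2 (mod 2^n) and RNS Processor 3
(mod 2^n + 1). Their outputs pass through the RNS to Binary Converter
(Converter 2) to produce M.]
Fig 1: Architecture of RNS-SWA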
On a GPU, the realization of this architecture is based on the following:
Converter 1 accepts the inputs M(i-1,j), M(i,j-1), M(i-1,j-1), d and S(a,b),
converts them to residues and sends them to the RNS processors.
The RNS processors work in parallel on the cores of the GPU and send their
results to Converter 2. The absence of carry propagation between residue
channels enables high-speed, low-power arithmetic.
Converter 2 then converts the result back to a binary/decimal number,
M(i,j); this conversion is also done in parallel.
Independent parallelism in the RNS algorithms of Converter 1 and
Converter 2 should be maximized so that the work can be partitioned easily
into threads and blocks.
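The data flow through Converter 1, the three RNS processors and Converter 2 can be sketched sequentially in Python for a single matrix cell. This is a hedged illustration under several assumptions that are not stated in the paper: n = 4 (moduli 15, 16, 17), negative values represented in the upper half of the dynamic range, and the maximum taken after reverse conversion, because magnitude comparison is not a channel-wise RNS operation. All function names are hypothetical.

```python
from math import prod

def moduli_set(n):
    """Moduli set {2^n - 1, 2^n, 2^n + 1}; pairwise coprime for n >= 1."""
    return (2**n - 1, 2**n, 2**n + 1)

def to_rns(x, moduli):
    """Converter 1 (binary to RNS); negatives wrap around modulo each m."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli):
    """One addition per RNS processor (channel), with no carries between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))

def from_rns_signed(residues, moduli):
    """Converter 2 (RNS to binary) via CRT, read as a signed value (assumption)."""
    M = prod(moduli)
    x = sum(r * pow(M // m, -1, m) * (M // m) for r, m in zip(residues, moduli)) % M
    return x - M if x > M // 2 else x

def rns_cell(up, left, diag, d, s, n=4):
    """Compute M(i,j) from M(i-1,j), M(i,j-1), M(i-1,j-1), d and S(a,b)."""
    mods = moduli_set(n)
    d_r, s_r = to_rns(d, mods), to_rns(s, mods)
    candidates = [
        rns_add(to_rns(diag, mods), s_r, mods),   # M(i-1,j-1) + S(a,b)
        rns_add(to_rns(up, mods), d_r, mods),     # M(i-1,j) + d
        rns_add(to_rns(left, mods), d_r, mods),   # M(i,j-1) + d
    ]
    # The comparison is done on converted values, not on residues
    return max(0, *(from_rns_signed(c, mods) for c in candidates))

cell = rns_cell(up=3, left=5, diag=4, d=-1, s=2)
```

In a CUDA realization, each residue channel and each candidate sum could map to its own thread. This sketch only checks the arithmetic and says nothing about performance.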
CONCLUSIONS
There has been a great deal of research on, and many implementations of, the
Smith-Waterman algorithm for sequence analysis on GPUs, but none of them has
explored the fast arithmetic properties of the Residue Number System (RNS).
In future research, we intend to implement the RNS-SWA architecture using
CUDA on a GPU and compare its speed-up with previous implementations of SWA
on a GPU.
REFERENCES
[1] F. H. Humed, R. Jidin, R. Othman, M. G. Goorbandi, and S. Noraima,
“Implementing Smith Waterman’s Similarity Matrix Computations on
Reconfigurable Logic Hardware,” no. November, pp. 1–5, 2008.
[3] D. J. Lipman and W. R. Pearson, “Rapid and Sensitive Protein
Similarities Searches,” Science, vol. 227, no. March, pp. 1435–1441,
1985.
[10] D. Razmyslovich, G. Marcus, M. Gipp, M. Zapatka, and A. Szillus,
“Implementation of Smith-Waterman algorithm in OpenCL for GPUs,”
Proc. 9th Int. Work. Parallel Distrib. Methods Verif. PDMC 2010 - Jt.
with 2nd Int. Work. High Perform. Comput. Syst. Biol. HiBi 2010, pp. 48–
56, 2010.
mixed radix conversion technique,” Proc. - IEEE Int. Symp. Circuits Syst.,
vol. 1, no. 1, pp. 521–524, 2009.