I. INTRODUCTION
De novo DNA assembly has been performed with a variety of
algorithms over time. A recent method, which is the subject
of this article, places DNA fragments in a De Bruijn graph.
A De Bruijn graph is a directed graph that represents overlaps
of length k-1 between words of length k, called k-mers, over a
given alphabet [3]. The number of times each k-mer has been
seen, called its coverage, is stored. It is then possible to search
the graph for paths that represent parts of the original
genomic sequence; these paths are called contigs.
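As an illustration of the definitions above, the following minimal sketch builds k-mer coverage counts and the k-1 overlap edges from a few reads. It is not Ray's or OCLRay's implementation: the function name `build_de_bruijn` is hypothetical, reverse complements are ignored, and edges are inferred between any two observed k-mers that overlap, which is a simplifying assumption.

```python
from collections import Counter


def build_de_bruijn(reads, k):
    """Return (coverage, edges) for the k-mers found in `reads`.

    coverage: k-mer -> number of times it was seen in the reads.
    edges:    k-mer -> list of observed k-mers that overlap it by k-1 symbols.
    """
    coverage = Counter()
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            coverage[read[i:i + k]] += 1

    edges = {kmer: [] for kmer in coverage}
    for kmer in coverage:
        # A successor shares the last k-1 symbols of `kmer` as its prefix.
        for base in "ACGT":
            successor = kmer[1:] + base
            if successor in coverage:
                edges[kmer].append(successor)
    return coverage, edges


coverage, edges = build_de_bruijn(["ACGTT", "CGTTA"], k=3)
for kmer in sorted(coverage):
    print(kmer, coverage[kmer], edges[kmer])
```

A path through `edges`, such as ACG, CGT, GTT, TTA in this toy input, spells out a candidate contig (ACGTTA).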
The goal of this project is to complete DNA assembly
using a De Bruijn graph in a reasonable amount of time on
devices other than supercomputers. Until recently, CPUs
provided most of their computing power, but accelerators
have now taken the lead. OpenCL is therefore used
for parallelization. The algorithm on which this project is
based is called Ray. It was developed by Sebastien
Boisvert from Laval University and uses OpenMPI for
inter-node parallelization. Our version is called OCLRay
because of the new parallelization tool.
Ray is an algorithm for assembling the short reads produced by
different sequencing technologies. It is split into several
steps. First, the graph is filled with the k-mers from the reads.
Then, a purge step removes edges leading to dead-ends. Next
comes a statistical count of coverage, which makes it possible
to determine appropriate vertices for annotating the reads and
for determining the seeds, which is the next step. The spurious
seeds are then annihilated and, finally, the remaining seeds are
extended [2] and the results are written.
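To make the purge step more concrete, here is a deliberately simplified, single-pass sketch. The exact criteria Ray uses are not given in this section, so the rule below is an assumption: a dead end is a vertex with no outgoing edge, and an edge to it is removed only when its source vertex has at least one other branch that continues. The function name `purge_dead_end_edges` is hypothetical.

```python
def purge_dead_end_edges(edges):
    """Return a copy of `edges` without edges that lead to dead ends.

    edges: vertex -> list of successor vertices.
    Assumed rule: at a branching vertex, drop successors that have no
    outgoing edge, unless every branch is a dead end.
    """
    purged = {}
    for vertex, successors in edges.items():
        if len(successors) > 1:
            # Keep only the branches that continue somewhere.
            kept = [s for s in successors if edges.get(s)]
            purged[vertex] = kept if kept else list(successors)
        else:
            purged[vertex] = list(successors)
    return purged


graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
print(purge_dead_end_edges(graph))
```

In this toy graph the edge from A to C is dropped because C goes nowhere while B continues to D; longer dead-end branches would need repeated passes or a depth limit, which this sketch leaves out.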
The OpenCL version considered in this article is 1.2,
published in November 2011. OpenCL is an open standard
This work was supported, in part, by the Natural Sciences and Engineering Research Council of Canada, the Fonds de recherche du Quebec - Nature
et technologies and by the Microsystems Strategic Alliance of Quebec.
The authors are with the Department of Electrical and Computer
Engineering, Laval University, 2325 Rue de l'Universite, Quebec, Qc,
G1V 0A6, Canada. carl.poirier.2@ulaval.ca, benoit.gosselin@gel.ulaval.ca,
paul.fortier@gel.ulaval.ca
Fig. 1.
in the FPGA is at the instruction level and that each step has
slot = hash;        // Starting slot
perturb = hash;     // Initial perturbation
while (slot.isFull() && slot.item != itemToFind) {
    slot = (5 * slot) + 1 + perturb;
    perturb >>= 5;
}
Fig. 2.
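The probing loop shown above can be sketched as runnable code under a few assumptions that are not stated in this section: the table size is a power of two so the slot index can be masked, the hash value is non-negative, the table always has at least one empty slot, and the perturbation is shifted right by 5 bits on each iteration (as CPython's dictionary probing does). The names `find_slot` and `insert` are hypothetical.

```python
def find_slot(table, key, hash_value):
    """Return the index holding `key`, or the first empty slot on its probe path.

    table: list whose length is a power of two; None marks an empty slot.
    Assumes hash_value >= 0 and that the table is never completely full.
    """
    mask = len(table) - 1
    slot = hash_value & mask      # starting slot
    perturb = hash_value          # initial perturbation
    while table[slot] is not None and table[slot] != key:
        # Once perturb reaches 0 this recurrence visits every slot in turn.
        slot = (5 * slot + 1 + perturb) & mask
        perturb >>= 5             # assumed shift, not a plain assignment
    return slot


def insert(table, key, hash_value):
    """Store `key` at the slot selected by the probe sequence."""
    table[find_slot(table, key, hash_value)] = key


table = [None] * 8
insert(table, "ACG", 3)
insert(table, "CGT", 3)           # same hash, so the probe sequence is followed
print(table.index("ACG"), table.index("CGT"))
```

Mixing the high bits of the hash into the first few probes spreads colliding keys across the table, while the fallback recurrence guarantees that an empty slot is eventually found when one exists.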
Fig. 3. FPGA and CPU kernel run times according to the algorithm step.
[Plot: series i7-4770 (CPU) and PCIe-385N (FPGA); x-axis: algorithm step. A second plot with the same series has the y-axis "Energy consumed (Wh)".]
kernels are not, they are very complex, meaning that each
work-item executes many instructions. This is because both
include one main loop with many iterations, which is where
the FPGA pipeline throughput really shines. For the
whole algorithm, the FPGA is 6.89 times as fast as the CPU.
Power consumption has been estimated at 28 W using a
TABLE I
FPGA KERNEL RUN TIMES, NORMALIZED.

Step         Normalized run time
Purge        0.02696
Count        4.25641
Annotate     0.07568
Annihilate   0.33919
Extend       0.08168
Total        0.14524
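The values in Table I can be turned into per-step speedups with a short calculation. This sketch assumes the run times are normalized to the CPU run time of the same step; the section does not say so explicitly, but that reading is consistent with the reported overall factor, since 1 / 0.14524 is approximately 6.89. The variable names are illustrative.

```python
# FPGA kernel run times from Table I, assumed to be relative to the CPU (CPU = 1.0).
normalized_fpga_time = {
    "Purge": 0.02696,
    "Count": 4.25641,
    "Annotate": 0.07568,
    "Annihilate": 0.33919,
    "Extend": 0.08168,
    "Total": 0.14524,
}

# Under that assumption, the speedup is the reciprocal of the normalized time.
speedup = {step: 1.0 / t for step, t in normalized_fpga_time.items()}

for step, factor in speedup.items():
    print(f"{step:<11}{factor:8.2f}x")
```

Under the same assumption, a value above 1 in the table, as for the Count step, means the FPGA kernel is slower than the CPU for that step.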
Fig. 4.
step.
VI. CONCLUSIONS
Overall, FPGAs can be used both to speed up DNA assembly
and to decrease the power used while doing so. This
particular algorithm shows that FPGAs are potent
accelerators that work well across a range of workloads,
as shown by the very different algorithm steps here. Future
work should focus on systems with uniform memory access,
such as SoCs from Altera, for which memory transfers would
not be needed and whose hard CPU cores could take
on the few serial tasks required. This would perform better
than using atomics in the OpenCL kernels.
ACKNOWLEDGMENT
Count        0.2349
Annotate     13.2143
Annihilate   2.9482
Extend       12.2425
Total        13.1468