
Short Paper

Int. J. on Recent Trends in Engineering & Technology, Vol. 05, No. 01, Mar 2011

Implementation of Hybrid Search on IBM CELL Broadband Engine
P.J.Sathish kumar (1), Abhijit Roy (2), B.Sivakumar (2), and T.Selvaprabhu (2)
(1) Vel Tech Multi Tech Dr.Rangarajan Dr.Sakunthala Engineering College/Information Technology, Chennai, India
Email: sathishjraman@yahoo.com
(2) Vel Tech Multi Tech Dr.Rangarajan Dr.Sakunthala Engineering College/Information Technology, Chennai, India
Email: {abhijeet.roy.22, sivablack, rockselva}@gmail.com

Abstract - Hybrid search simplifies the choice of a search strategy.
Many search techniques are in practice, each with its own advantages
and disadvantages, and performance plays a key role in search
algorithms. The Cell Broadband Engine uses one PPU and eight SPUs,
which helps resolve performance issues and speeds up computation. The
hybrid search algorithm performs well on this multicore processor. The
project combines the logic of the breadth-first search and depth-first
search algorithms; breadth-first search is complete. The central
concern is how these algorithms are implemented on the Cell Broadband
Engine.

Index Terms - Cell Processor (CELL BE), Synergistic Processing Element
(SPE), PowerPC Processing Element (PPE), Breadth First Search (BFS),
Depth First Search (DFS)

I. INTRODUCTION
The Cell Broadband Engine (Cell/B.E.) processor is a heterogeneous
multi-core chip that is significantly different from conventional
multiprocessor or multi-core architectures. It consists of a
traditional microprocessor (the PPE) that controls eight SIMD
co-processing units called synergistic processor elements (SPEs), a
high speed memory controller, and a high bandwidth bus interface
(termed the element interconnect bus, or EIB), all integrated on a
single chip, a design that offers multiple levels of parallelism in
on-chip communication. Fig. 1 gives an architectural overview of the
Cell/B.E. processor. The PPE runs the operating system and coordinates
the SPEs. It is a 64-bit PowerPC core with a vector multimedia
extension (VMX) unit, 32 KByte L1 instruction and data caches, and a
512 KByte L2 cache. The PPE is a dual issue, in-order execution design,
with two way simultaneous multithreading. Ideally, all the computation
should be partitioned among the SPEs, and the PPE only handles the
control flow. Each SPE consists of a synergistic processor unit (SPU)
and a memory flow controller (MFC). The MFC includes a DMA controller,
a memory management unit (MMU), a bus interface unit, and an atomic
unit for synchronization with other SPUs and the PPE [8, 11].

Figure 1. Cell Broadband Engine Architecture

II. PROBLEM WITH THE TRADITIONAL SEARCH
If the shallowest goal node is at some finite depth, say d,
breadth-first search (BFS) will eventually find it after expanding all
shallower nodes [1]. However, the time taken to find a solution is
large. Depth-first search (DFS), in contrast, is an uninformed search
that progresses by expanding the first child node of the search tree
that appears, going deeper and deeper until a goal node is found or
until it hits a node that has no children. The search then backtracks,
returning to the most recent node it has not finished exploring. If the
key resides at a very great depth, the DFS algorithm may keep
descending without end, so the search is not guaranteed to complete.

III. PROPOSED SEARCHING TECHNIQUE
The combination of these two algorithms makes searching efficient on
the Cell Broadband Engine. With modern processors moving towards
greater parallelization and multithreading, older compilers can no
longer deliver performance gains as the technology advances [3]. Any
multicore architecture relies more on improving parallelism than on
improving single-core performance. The main advantage

© 2011 ACEEE
DOI: 01.IJRTET.05.01.69

of combining both of these traditional search strategies is that each
one overcomes the disadvantage faced by the other [5].
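
The trade-off between the two strategies can be made concrete with a small sketch. It is not taken from the paper: it assumes a complete binary tree stored as a level-order Python list, and the helper names (`children`, `bfs_visits`, `dfs_visits`) are hypothetical. It counts how many nodes each strategy visits before it reaches a shallow key and a deep key.

```python
from collections import deque


def children(i, n):
    """Indices of the children of node i in a level-order array of n nodes."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]


def bfs_visits(tree, key):
    """Number of nodes BFS visits up to and including key (None if absent)."""
    queue = deque([0]) if tree else deque()
    visited = 0
    while queue:
        i = queue.popleft()          # oldest node first: level by level
        visited += 1
        if tree[i] == key:
            return visited
        queue.extend(children(i, len(tree)))
    return None


def dfs_visits(tree, key):
    """Number of nodes pre-order DFS visits up to and including key."""
    stack = [0] if tree else []
    visited = 0
    while stack:
        i = stack.pop()              # newest node first: go deeper
        visited += 1
        if tree[i] == key:
            return visited
        # Push the right child first so the left child is explored first.
        stack.extend(reversed(children(i, len(tree))))
    return None


tree = list(range(1, 16))            # 15 nodes, values 1..15 in level order
print(bfs_visits(tree, 3), dfs_visits(tree, 3))   # shallow key: prints "3 9"
print(bfs_visits(tree, 8), dfs_visits(tree, 8))   # deep key: prints "8 4"
```

In this toy tree BFS reaches the shallow key after 3 visits where DFS needs 9, while DFS reaches the deep, leftmost key after 4 visits where BFS needs 8, which is the complementary behaviour the hybrid search relies on.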

IV. ADVANTAGES OF PROPOSED SYSTEM


Running these search techniques as a parallel algorithm places them on
different cores, thereby making effective use of the available
resources. In addition, the drawbacks of each algorithm are overcome by
the other, thereby reducing the overall time taken by the search.
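
As a rough illustration of this idea, the sketch below runs one BFS worker and one DFS worker side by side and accepts whichever reports first. It is an assumption-laden stand-in, not the paper's Cell implementation: Python threads replace SPE threads (and do not give true parallel speed-up), a `queue.Queue` replaces the hardware reporting channel, both workers search the same level-order list, and all names (`worker`, `hybrid_search`) are hypothetical.

```python
import queue
import threading
from collections import deque


def kids(i, n):
    """Child indices of node i in a level-order array of n nodes."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]


def worker(tree, key, order, stop, reports, report_miss):
    """Search tree in 'bfs' or 'dfs' order until done or told to stop."""
    frontier = deque([0]) if tree else deque()
    while frontier and not stop.is_set():
        i = frontier.popleft() if order == "bfs" else frontier.pop()
        if tree[i] == key:
            reports.put((order, True))      # report success immediately
            return
        nxt = kids(i, len(tree))
        frontier.extend(nxt if order == "bfs" else reversed(nxt))
    # Only the BFS worker reports a miss; the DFS worker ends silently.
    if report_miss and not stop.is_set():
        reports.put((order, False))


def hybrid_search(tree, key):
    """Return True if key is in tree, using the first report received."""
    stop, reports = threading.Event(), queue.Queue()
    workers = [
        threading.Thread(target=worker,
                         args=(tree, key, "bfs", stop, reports, True)),
        threading.Thread(target=worker,
                         args=(tree, key, "dfs", stop, reports, False)),
    ]
    for w in workers:
        w.start()
    _who, found = reports.get()             # blocks until the first report
    stop.set()                              # ask the other worker to stop
    for w in workers:
        w.join()
    return found


print(hybrid_search(list(range(1, 16)), 8))    # True
print(hybrid_search(list(range(1, 16)), 99))   # False
```

Which worker wins for a given key depends on thread scheduling, so the sketch returns only the found/not-found status rather than the reporter's identity.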

V. SYSTEM ARCHITECTURE
The overall process and control flow of the parallelized and
synchronized hybrid search is shown in Figure 2; it was implemented
using the Cell SDK 3.1 simulator. The PowerPC processor element (PPE)
reads the following inputs:
a) KEY
b) SPE COUNT
c) TREE SIZE
where KEY is the element to be searched for in the tree, SPE COUNT is
the number of SPEs to be utilized in the searching process, and TREE
SIZE is the total number of elements to be searched.

The PPE determines the tree size for the SPEs performing DFS, depending
on two factors, namely the input tree size and the number of SPEs to be
utilized. The following relation gives the tree size for each SPE
performing depth-first search.

    Tree size for each SPE = (Total number of elements to be processed) / (Number of SPEs to be utilized)

The PPE initiates the first SPE, which searches its tree in
breadth-first fashion, by sending it the key element to be searched.
The entire tree is given as input to this SPE, as it will search the
entire set of elements in breadth-first fashion [2]. The PPE then
initiates all other SPEs by sending each one its prescribed tree size
and the key element to be searched. The number of SPEs invoked depends
on the SPE COUNT entered by the user.

The processing of the SPEs starts now. Each SPE creates a binary tree
with its own prescribed set of elements. The nodes of the binary tree
are created in breadth-first fashion. For the last SPE, the tree size
differs slightly, as the total is not always perfectly divisible by the
number of SPEs [4]. So for the last SPE, the quotient obtained from the
above relation is added to the remainder left after the division, so
that no element within the prescribed TREE SIZE is left out.

The nodes are created and added to the tree in level-order fashion. For
example, consider the set of nodes 1, 2, 3, 4, 5, 6. These nodes are
added to the tree with 1 as the root node, followed by 2 as its left
child and 3 as its right child. Then element 4 is inserted as the left
child of 2, 5 as the right child of 2, 6 as the left child of 3, and so
on. This is how the tree is constructed in each SPE's local store [6].

Figure 2. Hybrid search Architecture

As the SPE's local store memory is very limited, this poses a limit on
the number of elements that can be processed by this Cell BE
architecture.

The actual searching process starts after the binary trees are created
by all of the SPEs [7]. The first SPE uses the breadth-first search
strategy to search the elements of its tree. The rest of the SPEs
search their own trees in depth-first fashion.

VI. PARALLELIZED ALGORITHM

PPU side
    No_of_ele := read no of elements
    Spe_count := read no of SPEs to be used
    Key := key to be searched
    Start := spe[0]->thread(no_of_ele, key, spe_id)   // invoke first SPE to search in BFS
    for (i = 1; i < spe_count; i++)
    {
        Start := spe[i]->thread(no_of_ele, key, spe_id)   // invoke other SPEs to search in DFS
    }
    Found := spu_read_out_mbox();   // read status from outbound mailbox of SPEs
    If (Found)
        Print "Key found"
    else
        Print "Key not found"
    Terminate(spe[i]->thread)
    exit(0)
    end

BFS_SPU
    create_Tree(no_of_ele);
    bfs_search_tree(node *root, int key)
    If key found   // if key found, mark found as 1

        Found = 1
        Spu_write_out_mbox(spe_id, found);   // write the result to the PPE
    Else
        Found = 0
        Spu_write_out_mbox(spe_id, found);

DFS_SPU
    create_Tree(no_of_ele / 7)
    dfs_search_tree(node *root, int key)
    if key found
        Found = 1
        Spu_write_out_mbox(spe_id, found);   // write the result to the PPE

Once an SPE finds the key, it immediately writes the key status to its
outbound mailbox. The PPE keeps listening to the outbound mailboxes of
all the SPEs continuously. The PPE stalls as long as the outbound
mailbox channel remains empty [9]. When this channel is filled with the
key status reported by any of the SPEs, the PPE immediately reads it
and displays the result to the user. Once the key is reported, the PPE
immediately terminates all SPE threads, to prevent them from searching
further, and all resources held are freed.

When the key element is not found by any of the SPEs, the SPEs
performing DFS terminate without signalling a response. The
responsibility of reporting that the key is not in the prescribed tree
is allotted to the SPE performing BFS [10]. This is because BFS is
always complete, which guarantees that all the elements are searched,
whereas DFS is not complete: there is a possibility of the search
running into infinite depths.

So if the key to be searched is not found by the SPE performing BFS, it
writes the "KEY NOT FOUND" status to its outbound mailbox, which is
read by the PPE and reported to the user.

The main advantage of combining both of these traditional search
strategies is that each one overcomes the disadvantage faced by the
other. If the key lies very deep in the tree, it is found and reported
by an SPE performing DFS [2]. If, on the other hand, the key lies at a
very shallow depth, the SPE performing BFS reports it first, so the
time taken to search for the key element is greatly reduced.

VII. CONCLUSION
Thus the performance issues encountered with the existing searching
algorithms are overcome by the hybrid search algorithm. The hybrid
algorithm works well with all the cores of the Cell Broadband Engine.
The accuracy of the results is also greatly improved.

VIII. PERFORMANCE ANALYSIS
[Figure: performance analysis; the plot is not recoverable from the source]

IX. FUTURE ENHANCEMENT
With the advent of multi-core processors, it has become indispensable
for programmers to write programs that can run in parallel so as to
make the fullest use of the available resources. This hybrid search can
be considered for implementation on the Cell Broadband Engine. The main
reasons for opting for IBM's multi-core processor are its scalability,
its accuracy and, most of all, the parallel computing capability of its
8 cores. In future we intend to implement various other searching
algorithms in parallelized fashion. The current focus of processor
technology has transitioned from increasing clock frequency to
multicore (multiple processing cores on a single die). Part of this
transition is due to the power and thermal issues of higher clock
frequencies in current generation silicon and circuits. One question is
whether the future of multicore is the scaling of the current two and
four identical core processors. There are indications that the future
is not the scaling of current architectures. The IBM Cell Broadband
Engine Architecture combines the general-purpose PowerPC core with 8
special-purpose cores.
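
The partitioning rule and the level-order tree construction described in Section V can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SPE code: `partition_sizes` and `level_order_links` are hypothetical helper names, the divisor is taken to be the number of SPEs that receive a partition, and the elements are assumed to be distinct so they can serve as dictionary keys.

```python
def partition_sizes(tree_size, spe_count):
    """Give every SPE the quotient; the last SPE also takes the remainder."""
    if spe_count < 1:
        raise ValueError("spe_count must be at least 1")
    quotient, remainder = divmod(tree_size, spe_count)
    sizes = [quotient] * spe_count
    sizes[-1] += remainder           # no element is left out
    return sizes


def level_order_links(elements):
    """Map each element to its (left, right) children when the elements are
    inserted level by level; None marks a missing child."""
    n = len(elements)

    def at(i):
        return elements[i] if i < n else None

    # In a level-order layout, node i has children at 2i+1 and 2i+2.
    return {elements[i]: (at(2 * i + 1), at(2 * i + 2)) for i in range(n)}


print(partition_sizes(100, 8))
# [12, 12, 12, 12, 12, 12, 12, 16]
print(level_order_links([1, 2, 3, 4, 5, 6]))
# {1: (2, 3), 2: (4, 5), 3: (6, None), 4: (None, None), 5: (None, None), 6: (None, None)}
```

The second call reproduces the six-node example from Section V: 2 and 3 are the children of 1, 4 and 5 the children of 2, and 6 the left child of 3.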


REFERENCES

[1] Parallelizing Breadth First Search Using Cell Broadband Engine
    Rahul Kumar Gayatri and Pallav Kumar Baruah
[2] Challenges In Mapping Graph Exploration Algorithms On Advanced Multi-Core Processors
    Oreste Villa, Daniele Paolo Scarpazza, Fabrizio Petrin and Juan Fern
[3] Designing Multithreaded Algorithms For Breadth-First Search And St-Connectivity On The Cray MTA-2
    David A. Bader and Kamesh Madduri
[4] Practical Computing On The Cell Broadband Engine
[5] Peak-Performance DFA-Based String Matching On The Cell Processor
    Oreste Villa, Daniele Paolo Scarpazza and Fabrizio Petrini
[6] Network Processing On An SPE Core In Cell Broadband Engine
    Tatsuya Ishiwata, Yuji Kawamura, Takeshi Yamazaki and Kazuyoshi Horie
[7] Implementing A Parallel Matrix Factorization Library On The Cell Broadband Engine
    Vishwas B. C, Abhishek Gadia and Mainak Chaudhuri
[8] Introduction To The Cell Broadband Engine
    (IBM Corporation)
[9] Fastest Fourier Transform For The IBM Cell Broadband Engine
    David A. Bader and Virat Agarwal
[10] Programming the Cell Broadband Engine™ Architecture: Examples and Best Practices
    Abraham Arevalo, Ricardo M. Matinata, Maharaja Pandian, Eitan Peri, Kurtis Ruby and Francois Thomas
[11] Cell Broadband Engine Architecture Version 1.01
    IBM
