An Open Hardware Platform for Teaching Parallel Programming

PARALLELLA: AN OPEN HARDWARE PLATFORM FOR
TEACHING PARALLEL PROGRAMMING

Parallel computer architecture
Adin Zuhri
Iman Dankovi
Nadin Hasani
Sarajevo
November, 2016
Abstract
Nowadays, performance goals are achieved through parallelism rather than
through frequency scaling, as was the case before 2004. Many companies have
attempted to create platforms that achieve better performance using parallelism,
but not all of them have succeeded. Adapteva is a company that launched its
product, the Parallella board, in September 2012. Its main goal was to provide a
platform for parallel computing that is also accessible and open source. By
creating the Parallella board, a credit-card-sized, high-performance, and quite
energy-efficient computer, the company expected to encourage research and
knowledge sharing in parallel computing. This work explains the basic
components of the Parallella and how they work together once the board is
configured. It discusses the possible uses of the Parallella board and outlines
parallel programming choices, with advice for programmers working on the
board.
CONTENTS
1 INTRODUCTION
1.1 Problem Statement
1.2 Objectives
1.3 Thesis Organisation
2 ABOUT THE PARALLELLA BOARD
2.1 Introduction to Epiphany
2.2 What is Parallella?
2.3 Specifications
2.3.1 Epiphany III
2.4 How the Parallella board works once it is configured
2.5 Why Parallella?
3 PROGRAMMING ON THE PARALLELLA BOARD
3.1 Programming advice
3.2 Amdahl's Law
3.3 Parallel programming choices
4 CONCLUSION
REFERENCES
LIST OF FIGURES
Figure 1: Epiphany chip design
Figure 2: Parallella board architecture
Figure 3: Parallella board, bottom view
Figure 4: Daughter card configurations
Figure 5: eCores
Figure 6: Parallella board, top view

CHAPTER 1
INTRODUCTION
As we go forward, it is clear that performance goals can no longer be met by
scaling frequency; the only way to achieve them is through parallelism. On the
other hand, one always has to keep physical constraints in mind. As components
get smaller while we still want them to run fast, it becomes harder to deal with
heat.
In order to achieve parallelism, systems have to be scaled. The only way to
scale without limitation is a distributed architecture. That entails a many-core
architecture in which memory and networking are integrated in one tile, and the
only form of communication is communication between neighbours. With no
global variables and only point-to-point communication, such a design can in
principle scale indefinitely. This approach to scaling was the key idea behind the
Epiphany chip, the main component of the Parallella board discussed in this work.
1.1 Problem Statement
The release of the Parallella board provides a cheap and energy-efficient
computer that requires minimal configuration and is completely open source.
Our research question is as follows: What is Parallella and why is it
interesting? What were the objectives of its production, and what are its
possibilities?
1.2 Objectives
In order to answer the research question, the proposed objectives of this
work are as follows:
First, answer the question: why Parallella, and why is it significant?
Identify the parts the Parallella is built from and the principles on which its functionality is based.
Investigate parallel programming choices and advice for working with the Parallella board.
1.3 Thesis Organisation
Chapter 2 explains what the Parallella board is, the objectives behind its
creation, and its specifications.
Chapter 3 discusses programming choices and advice for programmers
using the Parallella board.
Chapter 4 gives the conclusions and summarises the research.
CHAPTER 2
ABOUT THE PARALLELLA BOARD
2.1 Introduction to Epiphany
The Epiphany processor consists of a series of C-programmable RISC (reduced
instruction set computer) processors connected by a network, with local memory
but no cache. In older processors, only 1-3% of the energy goes to useful work;
everything else is spent on features that exist for the programmer's convenience.
The idea was to reduce that waste as much as possible. To achieve that, all the
legacy is cut away by putting 4006 CPUs on a single chip [1].
The design philosophy of the Epiphany co-processor, shown in Figure 1
below, was used in creating the key component of the Parallella board.
Figure 1: Epiphany chip design (source: https://www.youtube.com/watch?v=vV9fcqUUe1Y)
2.2 What is Parallella?
The Parallella board is an affordable, energy-efficient, high-performance,
credit-card-sized computer [2] whose purpose is to provide a platform for
developing and implementing high-performance parallel processing.
It was launched in September 2012 at a price of $99 and produced in two
versions. The 66-core version (64 Epiphany cores and two ARM cores) of the
Parallella board achieves over 90 gigaFLOPS (one gigaFLOPS is 10^9
floating-point operations per second). The 18-core version (16 Epiphany cores
and two ARM cores), which uses about 5 watts, can reach 32 gigaFLOPS.
To pass large amounts of information quickly over the network, the
Parallella has a 1 Gbps Ethernet port.
The key component of the Parallella board is the Epiphany co-processor
mentioned earlier.
Programming for the Epiphany chip (the Parallella board's co-processor) is
done in C.
The SDK (Software Development Kit) handles basic primitives such as
memory addressing, barriers, and communication between eCores.
To run programs on the Epiphany chip, a workgroup of cores needs to be
set up. This is done with the provided SDK by giving a starting node and the
number of columns and rows in the matrix of cores [2, 3, 4].
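The idea of a workgroup, a rectangle of cores described by a starting node plus a number of rows and columns, can be illustrated with a short sketch. This is not the Epiphany SDK; the function name `workgroup_cores` and the use of plain (row, column) pairs are assumptions made only for illustration, and Python is used here simply because it is compact.

```python
from typing import List, Tuple


def workgroup_cores(start_row: int, start_col: int,
                    rows: int, cols: int) -> List[Tuple[int, int]]:
    """Return the (row, col) position of every core in a rectangular workgroup.

    Hypothetical helper: it only mirrors the idea of "starting node plus
    rows and columns" and does not call any real SDK function.
    """
    if rows < 1 or cols < 1:
        raise ValueError("a workgroup needs at least one row and one column")
    return [(start_row + r, start_col + c)
            for r in range(rows)
            for c in range(cols)]


if __name__ == "__main__":
    # A 4 x 4 workgroup covers all 16 cores of the smaller chip.
    cores = workgroup_cores(0, 0, 4, 4)
    print(len(cores), cores[0], cores[-1])
```

A smaller workgroup, for example 2 x 2 starting at position (1, 2), would leave the remaining cores free for another program.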
2.3 Specifications
Since the board used later in this work is the Parallella with the 16-core
Epiphany co-processor, the technical specifications of that board are given
below [5].
2 https://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone
Figure 2: Parallella board architecture (picture taken from [3])
Figure 2 shows that the dual-core ARM processor runs the operating system
and the interfaces. User programs also run on the ARM cores. To set up and run
code on the individual Epiphany cores, these programs use the Epiphany libraries
provided with the SDK [2, 3, 5, 6].
The Zynq-Z7010 dual-core ARM A9 CPU is used to run the operating system
and programs not designed to run on the Epiphany chip [2].
Figure 3: Parallella board, bottom view (picture taken from [6])
The Parallella board has four expansion connectors placed on opposite
edges of the bottom side of the board, as shown in Figure 3.
Connector     Function
PEC_POWER     Power and control signal expansion connector
PEC_FPGA      Zynq programmable logic expansion connector
PEC_NORTH     Epiphany north link expansion connector
PEC_SOUTH     Epiphany south link expansion connector
The four connectors are placed symmetrically on the bottom side of the
Parallella board, which allows robust mating of expansion cards with the board
using matching BTH-030-01-FDA connectors.
It is possible to connect a single full-length, credit-card-sized expansion
card or two half-length expansion cards, as shown in Figure 4:
Figure 4: Daughter card configurations (picture taken from [6])
The left side (pink/green transparent) shows two half-length expansion boards
connected to the back side of the Parallella board, while the right side shows a
full-length (blue transparent) expansion board connected to the back side of the
board [6].
A summary of the Parallella components is given below [5]:
Zynq-Z7010 dual-core ARM A9 CPU
16-core Epiphany co-processor
1 GB DDR3 RAM
MicroSD card: allows storage of local files.
USB 2.0
Up to 48 GPIO signals
Gigabit Ethernet: allows rapid transfer of data across a network, so that a cluster of boards can communicate with lower latency.
HDMI port
Linux operating system
54 mm x 87 mm form factor: the small form factor makes the board highly portable, even when multiple boards are used [2].
2.3.1 Epiphany III
The Epiphany processor on this board is the Epiphany III (E16G301), whose
features are listed below [2]:
16 high-performance RISC CPU cores
C/C++ and OpenCL programmable
32-bit IEEE floating-point support
512 KB on-chip distributed shared memory: this can be used to pass messages and share data across the chip.
32 independent DMA channels
Up to 1 GHz operating frequency
32 gigaFLOPS peak performance
512 GB/s local memory bandwidth: the aggregate access speed of the RISC cores' local RAM.
64 GB/s network-on-chip bisection bandwidth: the speed at which message passing takes place on the chip.
8 GB/s off-chip bandwidth: the speed of communication off chip.
1.5 ns network per-hop latency
Less than 2 W maximum chip power consumption: this is the power consumption of the Epiphany processor; combined with the rest of the board, the total is 5 W.
Figure 5: eCores (picture taken from [2])
Figure 5 illustrates the so-called eCores, the 2D array of cores.
Each eCore has a 1 GHz RISC CPU, a network interface, 32 KB of local
memory, and a DMA (direct memory access) engine.
This structure is connected via a router to the rest of the chip.
The router has three connections for communicating with the rest of the
chip: red is the read request network, blue is the on-chip write network, and
green is the off-chip write network.
The eCore CPU is superscalar: it can execute two floating-point operations
and a 64-bit memory load/store operation in every clock cycle.
The local memory can provide up to 32 bytes per clock cycle of
bandwidth [2, 5].
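The chip-level figures in the feature list follow from the per-core figures by simple multiplication. The short sketch below redoes that arithmetic. All input numbers are taken from the text above; the function names are chosen here for illustration, and a gigabyte is assumed to mean 10^9 bytes.

```python
# Per-core figures quoted in the text for the 16-core Epiphany III.
CORES = 16
CLOCK_HZ = 1_000_000_000      # up to 1 GHz
FLOPS_PER_CYCLE = 2           # two floating-point operations per clock cycle
LOCAL_MEMORY_KB = 32          # local memory per eCore
LOCAL_BYTES_PER_CYCLE = 32    # local memory bandwidth per eCore


def peak_gflops(cores: int, clock_hz: int, flops_per_cycle: int) -> float:
    """Peak floating-point rate of the whole chip in gigaFLOPS."""
    return cores * clock_hz * flops_per_cycle / 1e9


def total_local_memory_kb(cores: int, kb_per_core: int) -> int:
    """On-chip memory summed over all cores, in KB."""
    return cores * kb_per_core


def local_bandwidth_gb_per_s(cores: int, clock_hz: int,
                             bytes_per_cycle: int) -> float:
    """Local memory bandwidth summed over all cores, in GB/s (1 GB = 10^9 bytes)."""
    return cores * clock_hz * bytes_per_cycle / 1e9


if __name__ == "__main__":
    print(peak_gflops(CORES, CLOCK_HZ, FLOPS_PER_CYCLE))
    print(total_local_memory_kb(CORES, LOCAL_MEMORY_KB))
    print(local_bandwidth_gb_per_s(CORES, CLOCK_HZ, LOCAL_BYTES_PER_CYCLE))
```

With the values above, the three results are 32 gigaFLOPS, 512 KB, and 512 GB/s, which match the peak performance, shared memory size, and local memory bandwidth in the list. These are peak values that assume every core is busy in every cycle.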
2.4 How the Parallella board works once it is configured
Figure 6: Parallella board, top view (picture taken from [6])
As mentioned earlier, the Parallella board runs the Linux operating system.
The Zynq-Z7010 dual-core ARM A9 CPU runs the operating system, and once it
is loaded the user can use the board as a computer. In addition, the Zynq has
FPGA logic, which allows users to design their own hardware.
The board boots into Linux running on the ARM, and the FPGA logic is
configured to connect the ARM to the co-processor next to it, mapping the
co-processor directly into the ARM's memory map. The co-processor does not
run Linux, but it does have 16 CPUs. It performs the mathematical operations.
The E16 is like a server farm for the client [1].

2.5 Why Parallella?
One of the key goals in creating the Parallella was to make parallel
computing more accessible by creating a platform that is open source, has open
documentation and standards and, above all, is affordable.
Platforms are commonly proprietary, owned by a handful of big universities
or companies, which locks users into one architecture.
The Parallella board is open source. The only thing that is not open source is
the Epiphany chip, but that layer is kept as thin as possible so that most people
should be able to think at a high level of abstraction and move across platforms.
The primary aim of the Adapteva team in creating the Parallella board was
therefore parallel programming research. This includes exploring new things
such as new parallel languages and algorithms.
The second aim was to make the Parallella board a teaching tool, in the hope
that universities would share the knowledge about parallel computing gained by
working with the board.
The third aim was to make it an embedded platform (for robotics and
drones). This was not a goal at the beginning, but there are already 5 to 10 drone
projects using the Parallella board. It can be used for many useful things, such as
search and rescue, agriculture, locating areas of importance in a huge forest,
locating pesticides on big farms, etc.
Finally, it can be used as a fun toy for geeks and hackers [1, 3].
CHAPTER 3
PROGRAMMING ON THE PARALLELLA BOARD
3.1 Programming advice
Critical code must be performance-scalable to thousands of threads
This means that the program should be dynamically scalable.
Somebody (programmer or tool) will have to manage memory in software
Since there is no cache, someone needs to manage memory explicitly. That
should not be the programmer, because performance will drop sharply if every
programmer has to manage his own memory.
Programmer needs to know where the bits are stored
Something in the system, for example a tool or a scheduler, needs to know
where the bits are stored.
The hardware will fail often, so make your software redundant
Software must be redundant: when one component fails, the system must be
configured so that another component takes over its tasks.
The minimum number of languages to be used is 2
Assuming that programs will fail, the only way to handle all the components
is to use more than one programming language. For example, the main strength
of C is performance, while that of Erlang is robustness. A programmer should be
able to combine two or more programming languages.
Don't lock yourself into one architecture
Proprietary platforms do not move forward the way open-source platforms
do; open-source platforms provide more opportunities to improve and to share
knowledge [1, 3, 6].
3.2 Amdahl's Law
Speedup = (Performance for entire task using the enhancement) /
          (Performance for entire task without using the enhancement)
Speedup = 1 / (P/N + S)
P = parallel fraction (1 - S)
N = number of processors
S = serial fraction
The amount of serial code should be as small as possible. But even when
there is no serial code at all, data must still be exchanged within the system. It
also matters how the data is decomposed and where it is stored (locality).
Performance further depends on the algorithm used and on how many
dependencies exist among the data [1].
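To see how strongly the serial fraction limits the speedup, the formula Speedup = 1 / (P/N + S) can be evaluated directly. The sketch below is a minimal illustration in Python; the function name `amdahl_speedup` is chosen here, and the model ignores the communication and data-locality costs mentioned above.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's Law: 1 / (P/N + S), with P = 1 - S."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("at least one processor is required")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (parallel_fraction / processors + serial_fraction)


if __name__ == "__main__":
    # Speedup on the 16 Epiphany cores for a few serial fractions.
    for s in (0.0, 0.05, 0.5):
        print(f"S = {s:.2f}: speedup = {amdahl_speedup(s, 16):.2f}")
```

With no serial code the 16 cores give the full speedup of 16, but with only 5% serial code the predicted speedup already falls to about 9.1, and with 50% serial code it stays below 2.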
3.3 Parallel programming choices
There are a few programming choices that can be used, depending on the
nature of the project the programmer wants to implement on the Parallella board:
Batch scheduling (SLURM, LSF)
N clients send M independent jobs to P servers.
Very powerful and easy (if one can make it work).
The key lies in how fast the network is and how decoupled the data is. If
complete data decomposition is possible, this is the right solution.
SIMD: OpenCL
Very robust.
A lot of manual management; it can be viewed as the assembly language of
parallel programming.
The key lies in the fact that the programmer has to manage his own memory:
create buffers and his own command queues.
Very flexible and scalable, but the scaling is done by the programmer.
OpenMP (fork-join model)
It deals with threads, which everyone knows about.
It does not scale very well beyond small numbers of threads.
One has all the problems of parallelism but not all the rewards of
parallelism.
An effective way of programming: pragmas in the code describe the region
that one wants to parallelize. It is enough to create threads and use a barrier
that joins them together.
MPI
Quite robust.
There are many processes in the system, usually one process per server.
There is a client node that launches the job and a ton of server nodes that
take care of the processing.
Every process has an ID.
There is usually a sophisticated runtime scheduler that tracks these IDs and
where things are.
The key lies in interprocess communication: communication between
processes is explicit, through send-receive calls [1, 6].
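The difference between the fork-join model and explicit message passing can be shown with a small sketch. It uses neither OpenMP nor MPI: Python threads and queues stand in for both, and the function names are chosen here for illustration. Python threads do not give real CPU parallelism, so the sketch shows only the structure of the two styles, not a speedup.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def fork_join_sum(data, workers):
    """Fork-join style: split the data, run the chunks in threads, then join."""
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() blocks until every chunk is done, which acts as the join.
        partials = list(pool.map(sum, chunks))
    return sum(partials)


def message_passing_sum(data, workers):
    """Message-passing style: data moves only through explicit send/receive."""
    inboxes = [queue.Queue() for _ in range(workers)]  # one inbox per rank
    results = queue.Queue()                            # inbox of the client

    def worker(rank):
        chunk = inboxes[rank].get()        # explicit receive from the client
        results.put((rank, sum(chunk)))    # explicit send back, tagged with ID

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(workers)]
    for thread in threads:
        thread.start()
    for rank in range(workers):
        inboxes[rank].put(data[rank::workers])  # client sends each rank its share
    total = sum(results.get()[1] for _ in range(workers))
    for thread in threads:
        thread.join()
    return total


if __name__ == "__main__":
    numbers = list(range(1, 101))
    print(fork_join_sum(numbers, 4))
    print(message_passing_sum(numbers, 4))
```

Both functions compute the same total. In the first, the threads read a shared list directly; in the second, each worker sees only what was sent to its inbox, which is the property that lets the message-passing style extend across separate servers.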
CHAPTER 4
CONCLUSION
The objective of projects such as the Parallella board is to encourage people to
share their knowledge. It is clear that the future of programming lies in parallel
computing. Although staying within one familiar architecture may seem the
better solution in the short term, nothing lasts forever. Without the desire for
advancement and for exploring new things, technology would never have
progressed to its present level. That was exactly the aim of the Parallella board
development team: to make an accessible platform that gives others a chance to
progress.
REFERENCES
[1] Parallella: An open hardware platform for teaching parallel programming.
Available from: https://www.youtube.com/watch?v=vV9fcqUUe1Y
[2] Michael Johan Kruger. Building the Parallella board cluster.
Grahamstown, South Africa. November 23, 2015.
[3] The Parallella Board. Online. Accessed: 2015-03-01. Available
from: https://www.parallella.org/
[4] Adapteva. Epiphany Datasheet. Online. Accessed: 2015-05-05.
Available from: http://adapteva.com/docs/e16g301_datasheet.pdf
[5] Adapteva. Epiphany SDK Reference. Online. Accessed: 2015-11-01.
Available from: http://adapteva.com/docs/epiphany_sdk_ref.pdf
[6] Parallella. Parallella Reference Manual. Online. Accessed: 2015-05-05.
Available from: http://www.parallella.org/docs/parallella_manual.pdf
