
Hadoop and AWS

Log in to your AWS account.

Select the EC2 service.

Click on Launch Instance

Click Select

Click on Quick Launch Wizard

Select Ubuntu Server 12.04.3 LTS (a free tier instance).

Click on Review and Launch...

Click on Launch...

Select Create a new key pair from the top drop-down box...

Give the keypair a name and click on Download Key Pair

Save the keypair where you can find it, and click on Launch Instances.

Your instance is now being launched. Click on View Instances to see it.

Our instance is initialised

Click the instance (it'll have a green light next to it) to display information about it.
This will be important in a minute

Select the Java SSH Client option.

Enter the path to the key pair file you downloaded (right-click on the file to check its
location if you're not sure).

Start PuTTYgen (Start menu, click All Programs > PuTTY > PuTTYgen).

Click on the Load button

Find the folder with your .pem key in.

Select All Files (*.*) and click on your AWS .pem key.

Setting up PuTTY for AWS instance connection

A success message should appear. Now we need to save the key in PuTTY's own format.

Click on Save private key

Confirm you wish to save without a passphrase, and save in the same directory.

Connecting to our instance using PuTTY SSH

Go to Start > All Programs > PuTTY > PuTTY to load up PuTTY SSH

Switch back to the AWS console and copy the address of your instance. It'll look
something like ec2-xx-xxx-xx-xxx.eu-west-1.compute.amazonaws.com

This is the address of the instance that we'll be connecting to.

Paste the address here

Scroll down and click on Auth

Now click on Browse and navigate to the key you just saved (ends with the 'ppk' extension).

Now click on Open

Click on Yes when the security alert appears.

Type ubuntu as the login name and press the Enter key

We don't need a password as our key will be used to authenticate us with the instance.

Success! We're now logged in to our Ubuntu instance

Installing Java:

Whilst in the terminal enter the following commands:

sudo apt-get update
sudo apt-get install openjdk-6-jre

Installing Hadoop:

Get the file from external site:


Unpack it:
tar xzf hadoop-0.22.0.tar.gz

Copy it to somewhere more sensible like our local user directory:

sudo cp -r hadoop-*/ /usr/local
(There's a space between hadoop-*/ and /usr/local.)

Edit the terminal script:

nano ~/.bash

Add these lines at the bottom:

export JAVA_HOME=/usr/
export HADOOP_HOME=/usr/local/hadoop-0.22.0

Save the file (ctrl-x and type 'y'):

Add it to the terminal environment:

source ~/.bash
Now when Hadoop needs Java the terminal will point it in the right direction

Let's move in to the main directory of the application:

cd /usr/local/hadoop-*

Now edit Hadoop's set up script:

sudo nano conf/had

Add this line:

export JAVA_HOME=/usr

Save (ctrl-x, then type 'y')

Add the configuration file to the terminal's scope:

source conf/had

Running an example using Single node mode:

Calculating PI:
sudo bin/hadoop jar hadoop-mapred-examples-*.jar pi 10 10000000
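To give a feel for what the pi job is doing, here is a minimal plain-Java sketch. It assumes the two numbers in the command above are the number of map tasks and the number of samples per task. It is not Hadoop's code: Hadoop's own example uses a different sampling scheme, and the class name, method names, and seeds below are made up for illustration. Each "task" throws random points at a unit square and counts how many land inside the quarter circle; the counts are then added up to estimate pi.

```java
import java.util.Random;

/** Plain-Java sketch of the idea behind the pi example; not Hadoop's own code. */
class PiSketch {

    /** "Map" step: count the random points in the unit square that fall inside the quarter circle. */
    static long countInside(long samples, long seed) {
        Random rng = new Random(seed);
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return inside;
    }

    /** "Reduce" step: add up the per-task counts and turn the total into an estimate of pi. */
    static double estimatePi(int tasks, long samplesPerTask) {
        long inside = 0;
        for (int t = 0; t < tasks; t++) {
            // Using a number derived from the task index as the seed is an assumption of this sketch.
            inside += countInside(samplesPerTask, 1000L + t);
        }
        return 4.0 * inside / ((double) tasks * samplesPerTask);
    }

    public static void main(String[] args) {
        // 10 tasks with 100,000 samples each; more samples should give a closer estimate.
        System.out.println(estimatePi(10, 100_000));
    }
}
```

Splitting the samples across tasks is what lets the real job spread the work over several machines.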

Another example, using some actual data

Create a directory to put our data in:

sudo mkdir input

Copy the very interesting README.txt file to our new input folder:

sudo cp README.txt LICENSE.txt input

Now we count up the total words and what they are (Hadoop will create the output
folder for us):
sudo bin/hadoop jar hadoop-mapred-examples-*.jar wordcount input output

Have a look at the final output:

nano output/part-r-00000

What's happening?

public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Split the line into tokens and emit (word, 1) for each one
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}
The Mapper: splits up the words

public static class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  private IntWritable result = new IntWritable();

  public void reduce(Text key, Iterable<IntWritable> values,
                     Context context)
      throws IOException, InterruptedException {
    // Add up all the counts emitted for this word
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}
The Reducer: takes the input of <word, count> and sums up the counts

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
  if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
  }
  Job job = new Job(conf, "word count");
  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  job.setCombinerClass(IntSumReducer.class);
  job.setReducerClass(IntSumReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
  FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Main runs and configures the MapReduce job
Sets up the job with input and output folders and the map and reduce classes to use
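The flow through the mapper and reducer can be tried out without a Hadoop install. The following is a rough plain-Java sketch of the same idea, not the Hadoop API: the class and method names are made up for illustration, and the grouping of values by word, which Hadoop performs between the map and reduce phases, is imitated here with a sorted in-memory map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** In-memory imitation of the word count job; illustrative names, no Hadoop classes. */
class WordCountSketch {

    /** Map and shuffle: emit a 1 for every token and group the 1s by word. */
    static Map<String, List<Integer>> mapAndShuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                grouped.computeIfAbsent(itr.nextToken(), k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    /** Reduce: sum the values collected for one word. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    /** Run both steps and return word -> count, sorted by word. */
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : mapAndShuffle(lines).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat sat", "the cat ran");
        for (Map.Entry<String, Integer> e : wordCount(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Like the real mapper, this splits on whitespace only, so "Hadoop" and "Hadoop," would be counted as different words.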

One last example, this time using AWS to create the Hadoop cluster for us.
We will use the word count example used previously.

Hadoop in the AWS Cloud

First we need a place to put the data after it has been produced...
Amazon S3 (Simple Storage Service):
An online storage web service providing storage through web
services interfaces (REST, SOAP, and BitTorrent).

Select S3 from the console

Give it a name (not MyBucket, something unique; also NO CAPITAL LETTERS).
Choose Ireland from the region list (it's closer, so less latency).

Your new bucket

Running a Map Reduce program in AWS

Select Elastic MapReduce in AWS console

Select Create New Job Flow

Select Run a sample application

Choose the Word Count example from the drop down menu

Replace <your bucket> with the name of the S3 bucket we just created:

Next, specify how many instances you want. Just leave it at two for
now (the more instances, the more it will cost to run your job).

Input data:
Output data:
This is going to be stored on our S3 bucket...
Today's date

Select your keypair

Once it's done go back to the AWS console

Select S3

Select your S3 bucket.

Select Refresh from the Actions menu.

Finding your output

The results have been written to the output folder in parts (HDFS format).

Double-click to download.

Open in a text editor (notepad, gedit).
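A downloaded part file can also be read by a short program. The sketch below assumes each line of the word count output holds a word, a tab, and its count, and that the file has been saved locally under the hypothetical name part-r-00000. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Reads word count output lines of the form word<TAB>count; illustrative names only. */
class PartFileReader {

    /** Turn "word<TAB>count" lines into a map, skipping lines without a tab. */
    static Map<String, Integer> parseCounts(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue;
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    /** Read a local part file and parse it. */
    static Map<String, Integer> readCounts(Path partFile) throws IOException {
        return parseCounts(Files.readAllLines(partFile, StandardCharsets.UTF_8));
    }

    /** Return the word with the highest count, or null if there are no words. */
    static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real location as the first argument.
        Path partFile = Paths.get(args.length > 0 ? args[0] : "part-r-00000");
        Map<String, Integer> counts = readCounts(partFile);
        System.out.println(counts.size() + " distinct words, most frequent: " + mostFrequent(counts));
    }
}
```

If the job wrote several part files, each one holds a different subset of the words, so they would need to be read in turn.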

You can delete the results by right-clicking on the folder and selecting delete.
Amazon charges for storage so this is worth doing if you no longer need it.

Some notes

Hadoop will fail if it finds a folder with the same name when it writes the output.

The S3 bucket is where you would upload your .jar or .py files representing your code,
as well as any data. It is worth creating a separate folder for each of your runs.

Click on the upload button in the interface to upload them from your local machine.
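Because Hadoop refuses to write into an output folder that already exists, it can help to pick a fresh folder name before each run. The sketch below does this for a local single-node run only; it knows nothing about S3. The class name, method name, and the "-1", "-2" suffix scheme are assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Picks an output folder name that does not exist yet; local filesystem only. */
class OutputFolderCheck {

    /** Returns parent/baseName if it is free, otherwise baseName-1, baseName-2, and so on. */
    static Path firstFreeFolder(Path parent, String baseName) {
        Path candidate = parent.resolve(baseName);
        int suffix = 1;
        while (Files.exists(candidate)) {
            candidate = parent.resolve(baseName + "-" + suffix);
            suffix++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // Prints a folder name that could be passed to the job as its output argument.
        System.out.println(firstFreeFolder(Paths.get("."), "output"));
    }
}
```

The folder is only named here, not created, since the job creates the output folder itself.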

Shutting down your instance

Amazon charges by the hour, so make sure you close your instance after each session.

Select the running instance through the EC2 option in the AWS console

Right-click and select Terminate to kill the instance

Some tips:

Hadoop is not designed to run on Windows. Consider using Cygwin, VirtualBox
(https://www.virtualbox.org), or installing Linux Mint (http://www.linuxmint.com/) alongside your
Windows install (at home).

Stick to earlier versions of Hadoop such as 0.22.0 (they keep moving things around, especially the class
files that you'll need to compile your code to jar).

Most books and tutorials are based on earlier versions of Hadoop.

Single-node mode is fine for testing your map-reduce code before deploying it.

There are example programs in the folder at:


Get in the habit of stopping your instances when you're finished!
Hadoop in Action is your friend (if you're using Java, consider getting a copy).
Chapter 2
Shows you how to set everything up from scratch.
Chapter 3
Provides some good templates to base your code on.
Chapter 4
Discusses issues you may encounter with the different API versions.
Chapter 9
Tells you how to launch your MapReduce programs from the command line and AWS console, as well as
using S3 buckets for data storage and how to access it.

Some useful links
Installing and usage:

Running a job using the AWS Jobflow (Elastic Map Reduce):
http://www.cs.washington.edu/education/courses/cse490h/08au/readings/communications200801-dl.pdf (Page 108)
Accessing AWS and Hadoop through the terminal (for Linux users):