Hadoop and AWS


Log in to your AWS account.

Select the EC2 service.

Click on Launch Instance

Click Select

Click on Quick Launch Wizard

Select Ubuntu Server 12.04.3 LTS (a free tier instance).

Click on Review and Launch...

Click on Launch...

Select Create a new key pair from the top drop-down box...

Give the keypair a name and click on Download Key Pair

Save the keypair where you can find it, and click on Launch Instances.

Your instance is now being launched. Click on View Instances to see it.

Our instance is initialised

Click the instance (it'll have a green light next to it) to display information about it.
This will be important in a minute

Select the Java SSH Client option.

Enter the path to the key pair file you downloaded (right-click on the file if you're
not sure of the path).

Setting up PuTTY for AWS instance connection

Start PuTTYgen (Start menu, click All Programs > PuTTY > PuTTYgen).

Click on the Load button

Find the folder with your .pem key in.

Select All Files (*.*) and click on your AWS pem key.

A success message should appear. Now we need to save the key in PuTTY's own format.

Click on Save private key

Confirm you wish to save without a passphrase, and save in the same directory.

Connecting to our instance using PuTTY SSH

Go to Start > All Programs > PuTTY > PuTTY to load up PuTTY SSH

Switch back to the AWS console, and copy the address of your instance. It'll look
something like ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com

This is the address of the instance that we'll be using to connect to.

Paste the address here

Scroll down and
click on Auth

Now click on Browse and navigate to the key you just saved (ends with 'ppk' extension).

Now click on Open

Click on Yes when the security alert appears.

Type ubuntu as the login name and press Enter key

We don't need a password as our key is used to authenticate us to the instance.

Success! We're now logged in to our Ubuntu instance
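
For readers on Linux or macOS, the same connection can be made without PuTTY. The following is a minimal sketch, assuming the OpenSSH client is installed. The function name connect_to_instance and its two arguments are hypothetical, and the .pem file downloaded from AWS is used directly, so no .ppk conversion is needed.

```shell
# Hypothetical helper: connect to the instance with the OpenSSH client.
# $1 = path to the .pem key downloaded from AWS
# $2 = the instance address copied from the AWS console
connect_to_instance() {
    key_file="$1"
    instance_address="$2"
    # ssh refuses to use a private key that other users can read.
    chmod 400 "$key_file"
    # "ubuntu" is the same login name typed into PuTTY above.
    ssh -i "$key_file" "ubuntu@$instance_address"
}
```

It would be called with the key path and the address from the console as its two arguments.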

Installing Java:

Whilst in the terminal enter the following commands:
sudo apt-get update
sudo apt-get install openjdk-7-jre

Installing Hadoop:

Get the file from external site:
wget http://archive.apache.org/dist/hadoop/core/hadoop-0.22.0/hadoop-0.22.0.tar.gz

Unpack it:
tar xzf hadoop-0.22.0.tar.gz

Copy it to somewhere more sensible like our local user directory:
sudo cp -r hadoop-*/ /usr/local
(There's a space between hadoop-*/ and /usr/local.)

Edit the terminal script:
nano ~/.bash

Add these lines at the bottom:
export JAVA_HOME=/usr/
export HADOOP_HOME=/usr/local/hadoop-0.22.0

Save the file (Ctrl-X and type 'y'):

Add it to the terminal environment:

source ~/.bash
Now when Hadoop needs Java the terminal will point it in the right direction
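
A quick way to confirm that the two exports took effect is to print them back. This is a small sketch; check_env is a hypothetical helper name and not part of Hadoop.

```shell
# Report whether each variable the tutorial exports is visible in this shell.
check_env() {
    for name in JAVA_HOME HADOOP_HOME; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not set"
        fi
    done
}

check_env
```

If either line reports "is not set", the script was probably not sourced in the current terminal.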

Let's move in to the main directory of the application:
cd /usr/local/hadoop-*

Now edit Hadoop's set up script:

sudo nano conf/hadoop-env.sh

Add this line:
export JAVA_HOME=/usr

Save (Ctrl-X, then type 'y')

Add the configuration file to the terminal's scope:
source conf/hadoop-env.sh

Running an example using Single node mode:

Calculating PI:
sudo bin/hadoop jar hadoop-mapred-examples-*.jar pi 10 10000000
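
To my understanding, the two numbers passed to the pi example are the number of map tasks (10) and the number of sample points each map generates (10000000). As a rough illustration of the idea, the sketch below estimates pi on a single machine by plain random sampling with awk. This is a simplified stand-in, not Hadoop's own implementation, and estimate_pi is a hypothetical name.

```shell
# Estimate pi by throwing random points at the unit square and counting
# how many land inside the quarter circle of radius 1.
# $1 = number of sample points
estimate_pi() {
    samples="$1"
    awk -v n="$samples" 'BEGIN {
        srand(1)    # fixed seed so reruns on the same machine agree
        inside = 0
        for (i = 0; i < n; i++) {
            x = rand(); y = rand()
            if (x * x + y * y <= 1) inside++
        }
        # Area ratio (quarter circle / square) is pi/4.
        printf "%.4f\n", 4 * inside / n
    }'
}

estimate_pi 100000
```

More samples give a better estimate, which is why the Hadoop version spreads the sampling across several map tasks.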

Another example, using some actual data

Create a directory to put our data in:
sudo mkdir input

Copy the very interesting README.txt and LICENSE.txt files to our new input folder:

sudo cp README.txt LICENSE.txt input

Now we count up the total words and what they are (Hadoop will create the output
folder for us):
sudo bin/hadoop jar hadoop-mapred-examples-*.jar wordcount input output

Have a look at the final output:
nano output/part-r-00000
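
The output file holds one word per line, followed by a tab and its count. nano is fine for browsing, but sorting by count makes the most frequent words easier to spot. A small sketch, where top_words is a hypothetical helper:

```shell
# Print the N most frequent words from a word count output file.
# $1 = output file (for example output/part-r-00000)
# $2 = how many lines to show
top_words() {
    # Sort on the second field (the count), numerically, largest first.
    sort -k2,2nr "$1" | head -n "$2"
}
```

For example, top_words output/part-r-00000 10 would list the ten most common words.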

What's happening?

The Mapper: splits up the words

public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Break the line into whitespace-separated tokens.
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      // Emit <word, 1> for every token.
      context.write(word, one);
    }
  }
}

The Reducer: takes the input of <word, count> and sums up the counts

public static class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  private IntWritable result = new IntWritable();

  public void reduce(Text key, Iterable<IntWritable> values,
                     Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    // Add up every count emitted for this word.
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}

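Main runs and configures the MapReduce job. It sets up the job with input and
output folders and the map and reduce classes to use.

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
  if (otherArgs.length != 2) {
    System.err.println("Usage: wordcount <in> <out>");
    System.exit(2);
  }
  Job job = new Job(conf, "word count");
  // WordCount is the enclosing class in Hadoop's bundled example.
  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  job.setCombinerClass(IntSumReducer.class);
  job.setReducerClass(IntSumReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
  FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}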
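
The same map, shuffle and reduce steps can be imitated on one machine with standard Unix tools, which can help when checking what the job should produce for a small file. This is only a rough analogue for illustration, not how Hadoop runs the job, and local_wordcount is a hypothetical name.

```shell
# Rough local analogue of the word count job, for small files only.
#   tr      plays the mapper: splits on whitespace, one word per line
#   grep    drops the empty line that leading whitespace would leave
#   sort    plays the shuffle: brings identical words together
#   uniq -c plays the reducer: counts each run of identical words
#   awk     prints word<TAB>count, the layout Hadoop writes
# $1 = input text file
local_wordcount() {
    tr -s '[:space:]' '[\n*]' < "$1" |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'
}
```

Like the Hadoop example, this splits on whitespace only, so punctuation stays attached to words.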

One last example, this time using AWS to create the Hadoop cluster
for us.
We will use the word count example used previously.

Hadoop in the AWS Cloud

First we need a place to put the data after it has been produced...
Amazon S3 (Simple Storage Service):
An online storage web service providing storage through web
services interfaces (REST, SOAP, and BitTorrent).

Select S3 from the console

Give it a name (not MyBucket: something unique, and NO CAPITAL LETTERS).
Choose Ireland from the region list (it's closer, so less latency).

Your new bucket

Running a Map Reduce program in AWS

Select Elastic MapReduce in AWS console

Select Create New Job Flow

Select Run a sample application

Choose the Word Count example from the drop down menu

Replace <your bucket> with the name of the S3 bucket we just created:

Next, specify how many instances you want; just leave it at two for
now (the more instances, the more it will cost to run your job).

[Screenshot labels: Input data; Output data (this is going to be stored on our S3 bucket...); Today's date]

Select your keypair

Once it's done go back to the AWS console

Select S3

Select your S3 bucket.

Select Refresh from the Actions menu.

Finding your output

The results have been written to the output folder in parts (HDFS format).

Double-click to download.

Open in a text editor (notepad, gedit).

You can delete the results by right-clicking
on the folder and selecting delete.

Some notes

Amazon charges for storage, so it is worth
deleting results you no longer need.

Hadoop will fail if it finds a folder with the
same name when it writes the output.

The S3 bucket is where you would upload
your .jar or .py files representing your code,
as well as any data. It is worth creating a
separate folder for each of your runs.

Click on the upload button in the interface
to upload them from your local machine.
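
Because Hadoop stops when the output folder already exists, one way to avoid clashes between runs is to give each run its own folder name. The sketch below applies that idea to the local single-node runs shown earlier; fresh_output_dir is a hypothetical helper, and the timestamp suffix is an assumption rather than anything Hadoop requires.

```shell
# Hypothetical helper: return an output folder name that does not exist yet.
# If the requested name is already taken, a timestamp is appended.
# $1 = preferred folder name (for example "output")
fresh_output_dir() {
    base="$1"
    candidate="$base"
    if [ -e "$candidate" ]; then
        candidate="$base-$(date +%Y%m%d-%H%M%S)"
    fi
    echo "$candidate"
}
```

The result can be passed as the last argument of the wordcount command in place of a fixed name such as output.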

Shutting down your instance

Amazon charges by the hour, so make sure you close your instance after each session.

Select the instance that is running through the EC2 option in the AWS console

Right-click and select Terminate to kill the instance

Some tips:

Hadoop is not designed to run on Windows. Consider using Cygwin, VirtualBox
(https://www.virtualbox.org), or installing Linux Mint (http://www.linuxmint.com/) alongside your
Windows install (at home).

Stick to earlier versions of Hadoop such as 0.22.0 (they keep moving things around, especially the class
files that you'll need to compile your code to jar).

Most books and tutorials are based on earlier versions of Hadoop.

Single-node mode is fine for testing your map-reduce code before deploying it.

There are example programs in the folder at:

Get in the habit of stopping your instances when you're finished!

Hadoop in Action is your friend (if you're using Java, consider getting a copy).
Chapter 2
Shows you how to set everything up from scratch.
Chapter 3
Provides some good templates to base your code on.
Chapter 4
Discusses issues you may encounter with the different API versions.
Chapter 9
Tells you how to launch your MapReduce programs from the command line and AWS console, as well as
using S3 buckets for data storage and how to access it.

Some useful links
Installing and usage:

Running a job using the AWS Jobflow (Elastic Map Reduce):
http://www.cs.washington.edu/education/courses/cse490h/08au/readings/communications200801-dl.pdf (Page 108)
Accessing AWS and Hadoop through the terminal (for Linux users):
