05/03/2016

HADOOP STREAMING
http://www.tutorialspoint.com/hadoop/hadoop_streaming.htm

Copyright © tutorialspoint.com

Hadoop streaming is a utility that comes with the Hadoop distribution. This utility allows you to
create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.

Example Using Python
For Hadoop streaming, we consider the word-count problem. Any job in Hadoop must have two phases:
mapper and reducer. We have written the mapper and the reducer as Python scripts to run under
Hadoop. The same can also be written in Perl or Ruby.

Mapper Phase Code
#!/usr/bin/python
import sys

# Input comes from standard input
for myline in sys.stdin:
    # Remove whitespace on either side
    myline = myline.strip()
    # Break the line into words
    words = myline.split()
    # Iterate over the words list
    for myword in words:
        # Write the results to standard output as "word<TAB>1"
        print('%s\t%s' % (myword, 1))

Make sure this file has execution permission (chmod +x /home/expert/hadoop-1.2.1/mapper.py).

Reducer Phase Code
#!/usr/bin/python
import sys

current_word = ""
current_count = 0
word = ""

# Input comes from standard input
for myline in sys.stdin:
    # Remove whitespace on either side
    myline = myline.strip()
    # Split the input we got from mapper.py
    word, count = myline.split('\t', 1)
    # Convert count variable to integer
    try:
        count = int(count)
    except ValueError:
        # Count was not a number, so silently ignore this line
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # Write result to standard output
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Do not forget to output the last word if needed!
# Testing current_word alone avoids printing an empty record on empty input.
if current_word:
    print('%s\t%s' % (current_word, current_count))

Save the mapper and reducer code in mapper.py and reducer.py in the Hadoop home directory. Make sure
these files have execution permission (chmod +x mapper.py and chmod +x reducer.py). Because Python
is indentation-sensitive, the same code can also be downloaded from the tutorial page.
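Before submitting a job to a cluster, the logic of the two scripts can be checked locally. The following is a minimal sketch, assuming a single Python process is an acceptable stand-in for Hadoop: it applies the mapper logic, sorts the intermediate lines (standing in for the sort that Hadoop performs between the two phases), and then applies the reducer logic. The names map_lines, reduce_lines, and run_local are illustrative only and are not part of Hadoop.

```python
def map_lines(lines):
    """Mapper logic: emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.strip().split():
            yield '%s\t%s' % (word, 1)


def reduce_lines(sorted_lines):
    """Reducer logic: sum the counts of consecutive lines that share a key."""
    current_word = None
    current_count = 0
    for line in sorted_lines:
        word, count = line.split('\t', 1)
        try:
            count = int(count)
        except ValueError:
            continue  # ignore malformed counts, as reducer.py does
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                yield '%s\t%s' % (current_word, current_count)
            current_word = word
            current_count = count
    # Emit the last word, if any input was seen
    if current_word is not None:
        yield '%s\t%s' % (current_word, current_count)


def run_local(lines):
    """Map, sort (stand-in for Hadoop's sort step), then reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == '__main__':
    for out in run_local(['the quick fox', 'the lazy dog the end']):
        print(out)
```

On the command line, a comparable check is to pipe a sample file (here a hypothetical input.txt) through the real scripts: cat input.txt | ./mapper.py | sort | ./reducer.py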

Execution of WordCount Program
$ $HADOOP_HOME/bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar \
    -input input_dirs \
    -output output_dir \
    -mapper <path>/mapper.py \
    -reducer <path>/reducer.py

Here "\" is used for line continuation, for readability.

For example,
./bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar \
    -input myinput \
    -output myoutput \
    -mapper /home/expert/hadoop-1.2.1/mapper.py \
    -reducer /home/expert/hadoop-1.2.1/reducer.py
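When the job is launched from a script rather than typed by hand, it can help to assemble the argument list programmatically. The sketch below only builds and prints the command shown in the example above; build_streaming_command is a hypothetical helper, not part of Hadoop, and nothing is executed because that would require a Hadoop installation.

```python
def build_streaming_command(jar, input_path, output_path, mapper, reducer,
                            hadoop_bin='./bin/hadoop'):
    """Return the streaming command as an argument list (it is not executed)."""
    return [
        hadoop_bin, 'jar', jar,
        '-input', input_path,
        '-output', output_path,
        '-mapper', mapper,
        '-reducer', reducer,
    ]


if __name__ == '__main__':
    cmd = build_streaming_command(
        jar='contrib/streaming/hadoop-streaming-1.2.1.jar',
        input_path='myinput',
        output_path='myoutput',
        mapper='/home/expert/hadoop-1.2.1/mapper.py',
        reducer='/home/expert/hadoop-1.2.1/reducer.py',
    )
    print(' '.join(cmd))
```

On a machine where Hadoop is installed, the resulting list could be passed to subprocess.call(cmd) from the Hadoop home directory.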

How Streaming Works
In the above example, both the mapper and the reducer are Python scripts that read their input from
standard input and emit their output to standard output. The utility creates a Map/Reduce job,
submits the job to an appropriate cluster, and monitors the progress of the job until it completes.
When a script is specified for mappers, each mapper task launches the script as a separate process
when the mapper is initialized. As the mapper task runs, it converts its inputs into lines and feeds
the lines to the standard input (STDIN) of the process. In the meantime, the mapper collects the
line-oriented outputs from the standard output (STDOUT) of the process and converts each line into a
key/value pair, which is collected as the output of the mapper. By default, the prefix of a line up
to the first tab character is the key, and the rest of the line (excluding the tab character) is the
value. If there is no tab character in the line, the entire line is considered the key and the value
is null. However, this can be customized as needed.
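The default key/value rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not Hadoop's own implementation; the function name split_key_value is hypothetical, and None stands in for the null value.

```python
def split_key_value(line, separator='\t'):
    """Split one output line into (key, value) using the default rule.

    The key is everything up to the first separator; the value is the rest
    of the line. A line with no separator becomes (line, None).
    """
    line = line.rstrip('\n')
    if separator in line:
        key, value = line.split(separator, 1)
        return key, value
    return line, None


if __name__ == '__main__':
    for sample in ['hadoop\t1\n', 'a\tb\tc\n', 'no-tab-here\n']:
        print(split_key_value(sample))
```

Only the first tab is significant: any later tab characters remain part of the value.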
When a script is specified for reducers, each reducer task launches the script as a separate process
when the reducer is initialized. As the reducer task runs, it converts its input key/value pairs
into lines and feeds the lines to the standard input (STDIN) of the process. In the meantime, the
reducer collects the line-oriented outputs from the standard output (STDOUT) of the process and
converts each line into a key/value pair, which is collected as the output of the reducer. By
default, the prefix of a line up to the first tab character is the key, and the rest of the line
(excluding the tab character) is the value. However, this can be customized as per specific
requirements.
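Because the reducer process receives its key/value pairs as plain lines, and Hadoop sorts reducer input by key, lines that share a key arrive next to each other. A reducer script can rely on that ordering. The sketch below shows one way to do so with itertools.groupby, as an alternative to tracking the current word by hand; parse and sum_by_key are illustrative names, and the sample lines are made up for the demonstration.

```python
from itertools import groupby


def parse(lines):
    """Yield (key, int_value) pairs from 'key<TAB>value' lines, skipping bad ones."""
    for line in lines:
        key, sep, value = line.rstrip('\n').partition('\t')
        if not sep:
            continue  # no tab: nothing to sum for this line
        try:
            yield key, int(value)
        except ValueError:
            continue  # value was not a number


def sum_by_key(lines):
    """Sum the values of consecutive pairs that share a key."""
    for key, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield key, sum(value for _, value in group)


if __name__ == '__main__':
    # Sample input, already sorted by key as a reducer would receive it
    sample = ['dog\t1\n', 'the\t1\n', 'the\t1\n', 'the\t1\n']
    for key, total in sum_by_key(sample):
        print('%s\t%s' % (key, total))
```

Passing sys.stdin instead of the sample list would turn the same functions into a streaming reducer.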

Important Commands


Parameters                                      Description
----------------------------------------------  --------------------------------------------------------
-input directory/file-name                      Input location for mapper. (Required)
-output directory-name                          Output location for reducer. (Required)
-mapper executable or script or JavaClassName   Mapper executable. (Required)
-reducer executable or script or JavaClassName  Reducer executable. (Required)
-file file-name                                 Makes the mapper, reducer, or combiner executable
                                                available locally on the compute nodes.
-inputformat JavaClassName                      Class you supply should return key/value pairs of Text
                                                class. If not specified, TextInputFormat is used as the
                                                default.
-outputformat JavaClassName                     Class you supply should take key/value pairs of Text
                                                class. If not specified, TextOutputFormat is used as the
                                                default.
-partitioner JavaClassName                      Class that determines which reduce a key is sent to.
-combiner streamingCommand or JavaClassName     Combiner executable for map output.
-cmdenv name=value                              Passes the environment variable to streaming commands.
-inputreader                                    For backwards compatibility: specifies a record reader
                                                class (instead of an input format class).
-verbose                                        Verbose output.
-lazyOutput                                     Creates output lazily. For example, if the output format
                                                is based on FileOutputFormat, the output file is created
                                                only on the first call to output.collect (or
                                                Context.write).
-numReduceTasks                                 Specifies the number of reducers.
-mapdebug                                       Script to call when map task fails.
-reducedebug                                    Script to call when reduce task fails.
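As an illustration of the -cmdenv option, a streaming script can read the variable it was given from its process environment. The sketch below assumes a hypothetical variable MIN_WORD_LENGTH, which would be set by adding -cmdenv MIN_WORD_LENGTH=4 to the streaming command; the variable name and both helper functions are made up for this example.

```python
import os


def min_word_length(environ=os.environ, default=1):
    """Read the hypothetical MIN_WORD_LENGTH variable, falling back to a default."""
    try:
        return int(environ.get('MIN_WORD_LENGTH', default))
    except ValueError:
        return default  # variable was set but is not a number


def map_line(line, min_len):
    """Emit 'word<TAB>1' for each word that is at least min_len characters long."""
    return ['%s\t%s' % (word, 1) for word in line.split() if len(word) >= min_len]


if __name__ == '__main__':
    # A plain dict stands in for the environment a streaming task would see
    limit = min_word_length({'MIN_WORD_LENGTH': '4'})
    for out in map_line('a tiny example line', limit):
        print(out)
```

Reading the setting once at start-up keeps the per-line work in the mapper unchanged.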
