Hadoop Streaming
http://www.tutorialspoint.com/hadoop/hadoop_streaming.htm
Copyright © tutorialspoint.com
Hadoop Streaming is a utility that comes with the Hadoop distribution. This utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.
Example Using Python
To illustrate Hadoop Streaming, we consider the word-count problem. Any job in Hadoop must have two phases: mapper and reducer. We have written the mapper and the reducer as Python scripts to run under Hadoop. The same can also be written in Perl or Ruby.
Mapper Phase Code
#!/usr/bin/python
import sys

# Input is read from standard input
for myline in sys.stdin:
    # Remove whitespace on either side
    myline = myline.strip()
    # Break the line into words
    words = myline.split()
    # Iterate over the list of words
    for myword in words:
        # Write the results to standard output as word<TAB>1
        print('%s\t%s' % (myword, 1))
Reducer Phase Code
#!/usr/bin/python
import sys

current_word = ""
current_count = 0

# Input is read from standard input
for myline in sys.stdin:
    # Remove whitespace on either side
    myline = myline.strip()
    # Split the input we got from mapper.py
    word, count = myline.split('\t', 1)
    # Convert count variable to integer
    try:
        count = int(count)
    except ValueError:
        # Count was not a number, so silently ignore this line
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # Write result to standard output
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# Do not forget to output the last word if needed!
if current_word:
    print('%s\t%s' % (current_word, current_count))
Save the mapper and reducer code in mapper.py and reducer.py in the Hadoop home directory. Make sure these files have execute permission (chmod +x mapper.py and chmod +x reducer.py). Because Python is indentation-sensitive, the same code can also be downloaded from the link below.
Execution of WordCount Program
$ $HADOOP_HOME/bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar \
    -input input_dirs \
    -output output_dir \
    -mapper <path>/mapper.py \
    -reducer <path>/reducer.py
Here "\" is used for line continuation, for readability.
For example,
./bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar -input myinput -output myoutput -mapper /home/expert/hadoop-1.2.1/mapper.py -reducer /home/expert/hadoop-1.2.1/reducer.py
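Before submitting a job to a cluster, it can help to rehearse the same flow locally. The sketch below is an illustration only, not part of Hadoop: the function names (map_words, reduce_counts, run_local) are hypothetical, and a plain in-memory sort stands in for the sorting that Hadoop performs between the map and reduce phases.

```python
def map_words(lines):
    """Mimic mapper.py: emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.strip().split():
            yield '%s\t%s' % (word, 1)


def reduce_counts(sorted_lines):
    """Mimic reducer.py: sum the counts of consecutive lines sharing a word."""
    current_word = None
    current_count = 0
    for line in sorted_lines:
        word, count = line.split('\t', 1)
        try:
            count = int(count)
        except ValueError:
            # Count was not a number, so skip this line
            continue
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                yield '%s\t%s' % (current_word, current_count)
            current_word = word
            current_count = count
    # Emit the final word, if any input was seen
    if current_word is not None:
        yield '%s\t%s' % (current_word, current_count)


def run_local(lines):
    """Map, sort (a stand-in for Hadoop's sort between phases), then reduce."""
    return list(reduce_counts(sorted(map_words(lines))))


# Hypothetical sample input, used only for illustration
sample = ["the quick brown fox", "the lazy dog", "the fox"]
for out in run_local(sample):
    print(out)
```

The sort step matters: the reducer logic only compares each word with the previous one, so it produces correct totals only when identical words arrive next to each other.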
How Streaming Works
In the above example, both the mapper and the reducer are Python scripts that read their input from standard input and emit their output to standard output. The utility will create a Map/Reduce job, submit the job to an appropriate cluster, and monitor the progress of the job until it completes.
When a script is specified for mappers, each mapper task will launch the script as a separate process when the mapper is initialized. As the mapper task runs, it converts its inputs into lines and feeds the lines to the standard input (STDIN) of the process. In the meantime, the mapper collects the line-oriented outputs from the standard output (STDOUT) of the process and converts each line into a key/value pair, which is collected as the output of the mapper. By default, the prefix of a line up to the first tab character is the key and the rest of the line (excluding the tab character) will be the value. If there is no tab character in the line, then the entire line is considered the key and the value is null. However, this can be customized as needed.
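The default key/value rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not Hadoop's own code; the function name split_key_value is hypothetical, and None is used here as a stand-in for the null value.

```python
def split_key_value(line, separator='\t'):
    """Split one output line into (key, value) at the first separator.

    Returns (line, None) when the line contains no separator.
    """
    line = line.rstrip('\n')
    key, sep, value = line.partition(separator)
    if not sep:
        # No tab in the line: the whole line is the key
        return line, None
    return key, value


print(split_key_value('hadoop\t1'))
print(split_key_value('a\tb\tc'))
print(split_key_value('no-tab-here'))
```

Only the first tab separates key from value, so any later tabs remain part of the value.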
When a script is specified for reducers, each reducer task will launch the script as a separate process when the reducer is initialized. As the reducer task runs, it converts its input key/value pairs into lines and feeds the lines to the standard input (STDIN) of the process. In the meantime, the reducer collects the line-oriented outputs from the standard output (STDOUT) of the process and converts each line into a key/value pair, which is collected as the output of the reducer. By default, the prefix of a line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value. However, this can be customized as per specific requirements.
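Because the reducer receives its key/value pairs as lines already sorted by key, lines with the same key sit next to each other. A reducer can exploit this with itertools.groupby, as in the sketch below. It is an alternative formulation of the word-count reducer, not the version shown earlier; the function names parse and reduce_sorted are hypothetical.

```python
from itertools import groupby


def parse(lines):
    """Turn 'word<TAB>count' lines into (word, int) pairs, skipping bad counts."""
    for line in lines:
        word, _, count = line.rstrip('\n').partition('\t')
        try:
            number = int(count)
        except ValueError:
            # Count was missing or not a number, so skip this line
            continue
        yield word, number


def reduce_sorted(lines):
    """Sum counts per word, assuming lines arrive grouped by word."""
    for word, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


# Hypothetical reducer input, already sorted by key
for word, total in reduce_sorted(['a\t1', 'a\t1', 'b\t1', 'c\t2']):
    print('%s\t%s' % (word, total))
```

In a real streaming reducer, sys.stdin would be passed as the lines argument; groupby only merges adjacent items, so the approach relies on the sorted input order.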
Important Commands
| Parameters | Description |
|---|---|
| -input directory/file-name | Input location for mapper. (Required) |
| -output directory-name | Output location for reducer. (Required) |
| -mapper executable or script or JavaClassName | Mapper executable. (Required) |
| -reducer executable or script or JavaClassName | Reducer executable. (Required) |
| -file file-name | Makes the mapper, reducer, or combiner executable available locally on the compute nodes. |
| -inputformat JavaClassName | Class you supply should return key/value pairs of Text class. If not specified, TextInputFormat is used as the default. |
| -outputformat JavaClassName | Class you supply should take key/value pairs of Text class. If not specified, TextOutputFormat is used as the default. |
| -partitioner JavaClassName | Class that determines which reduce a key is sent to. |
| -combiner streamingCommand or JavaClassName | Combiner executable for map output. |
| -cmdenv name=value | Passes the environment variable to streaming commands. |
| -inputreader | For backwards compatibility: specifies a record reader class (instead of an input format class). |
| -verbose | Verbose output. |
| -lazyOutput | Creates output lazily. For example, if the output format is based on FileOutputFormat, the output file is created only on the first call to output.collect (or Context.write). |
| -numReduceTasks | Specifies the number of reducers. |
| -mapdebug | Script to call when map task fails. |
| -reducedebug | Script to call when reduce task fails. |
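When jobs are launched from a script, the options above can be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named build_streaming_command; it only builds the argument list from options listed in the table and does not run Hadoop. The hadoop_bin default and the sample values are taken from the example command earlier in this tutorial.

```python
def build_streaming_command(jar, input_path, output_path, mapper, reducer,
                            files=(), num_reduce_tasks=None, cmdenv=None,
                            hadoop_bin='./bin/hadoop'):
    """Return the streaming command as a list of arguments (hypothetical helper)."""
    cmd = [hadoop_bin, 'jar', jar,
           '-input', input_path,
           '-output', output_path,
           '-mapper', mapper,
           '-reducer', reducer]
    # Ship each listed file to the compute nodes
    for path in files:
        cmd += ['-file', path]
    if num_reduce_tasks is not None:
        cmd += ['-numReduceTasks', str(num_reduce_tasks)]
    # Pass environment variables in a stable (sorted) order
    for name, value in sorted((cmdenv or {}).items()):
        cmd += ['-cmdenv', '%s=%s' % (name, value)]
    return cmd


command = build_streaming_command(
    'contrib/streaming/hadoop-streaming-1.2.1.jar',
    'myinput', 'myoutput',
    '/home/expert/hadoop-1.2.1/mapper.py',
    '/home/expert/hadoop-1.2.1/reducer.py',
    num_reduce_tasks=2)
print(' '.join(command))
```

Returning a list rather than a single string keeps each argument separate, which suits subprocess.run if the command is later executed from Python.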