TECHNO.FUNDA.
Guided by Prof. S. C. Gupta
Assignment 4
MapReduce
Group 11
Abhijit Kawale (2012CS50273)
Prerit Patidar (2012CS50294)
Arvind Bhuria (2012CS50280)
Saurabh Nakra (2012CS50298)
Bhavesh Chauhan (2012CS10221)
Installation Steps:
MapReduce was installed with Hadoop in the previous assignment.
The files that had to be changed were mapred-site.xml and yarn-site.xml.
mapred-site.xml contains the host and port for the MapReduce job tracker, and
yarn-site.xml contains the properties that let the node work as a YARN node.
Part A: Large text file word count
Running MapReduce:
To run the WordCount program, the code was written in WordCount.java and uploaded to
the master VM.
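WordCount.java itself is not reproduced in this report. As a rough illustration of what the map and reduce steps of a word count do, here is a minimal plain-Java sketch that runs without Hadoop. The class name WordCountSketch and its method names are made up for this example and are not taken from the submitted code; the real program would use the Hadoop Mapper and Reducer interfaces instead.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration of the word-count map and reduce steps (no Hadoop dependency).
// All names here are hypothetical; this is not the submitted WordCount.java.
class WordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle and reduce step: group the pairs by word and sum the counts per word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>(); // keys kept in sorted order
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs the map step on every line, then reduces all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("sample file here", "sample file")));
    }
}
```

In the actual job, Hadoop performs the grouping between the two steps and distributes the work across the VMs; the sketch only mirrors the per-record logic.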
Step 1
A directory named units was created in the home folder to store all the .class files.
Command: mkdir units
Step 2
hadoop-core.jar was needed for the word count program to compile and execute. So,
hadoop-core-1.2.1.jar was downloaded to the master VM.
Command: wget http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core/1.2.1
Step 3
The following commands are used for compiling the WordCount.java program and
creating a jar for the program.
javac -classpath hadoop-core-1.2.1.jar -d units WordCount.java
jar -cvf units.jar -C units/ .
Step 4
The following command is used to create an input directory in HDFS.
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hadoop/input_dir
Step 5
The following command is used to copy the input file named sample.txt into the input
directory of HDFS.
$HADOOP_HOME/bin/hadoop fs -put /home/hduser/sample.txt /user/hadoop/input_dir
Step 6
After this, the application was run using:
$HADOOP_HOME/bin/hadoop jar units.jar WordCount /user/hadoop/input_dir /user/hadoop/output_dir
Step 7
The following command is used to see the output in the part-00000 file. This file is
written by the job and stored in HDFS.
$HADOOP_HOME/bin/hadoop fs -cat /user/hadoop/output_dir/part-00000
Problems Encountered:
When running the application we got the following error:
Got exception: java.net.ConnectException: Call From
baadaldesktopvm/127.0.1.1 to
baadalservervm.cse.iitd.ernet.in:54310 failed on connection
exception: java.net.ConnectException: Connection refused
We figured out that the problem was in the /etc/hostname file, which contains the VM
hostname. It was baadaldesktopvm by default, but it should have been the same as the
master or slave name added to /etc/hosts. So, we changed /etc/hostname in each of the
VMs and set the hostname to master for the master VM, slave1 for slave1, slave2 for
slave2 and slave3 for slave3. Finally, the problem was fixed and we ran the word count
application successfully.
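The "Call From" part of the error above shows the local host name paired with the address 127.0.1.1, which suggests the VM's own name was resolving to a loopback address. A small check like the one below can show what name and address the JVM sees on a given VM. This is only a diagnostic sketch; the class name HostnameCheck and the helper resolvesToLoopback are made up for this example and are not part of Hadoop.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Diagnostic sketch: report how the local host name resolves on this machine.
// HostnameCheck and resolvesToLoopback are hypothetical names for illustration.
class HostnameCheck {

    // Returns true when the given host name or IP literal resolves to a
    // loopback address (for IPv4, anything in 127.0.0.0/8).
    static boolean resolvesToLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot resolve " + host, e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("hostname: " + local.getHostName());
        System.out.println("address:  " + local.getHostAddress());
        if (local.isLoopbackAddress()) {
            // Same symptom as in the error message above.
            System.out.println("warning: the local host name maps to a loopback address");
        }
    }
}
```

Running it on each VM before and after editing /etc/hostname and /etc/hosts would show whether the name now matches the master or slave entry.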
Here are the screenshots after running:
Results for WORDCOUNT:
The input file used was sample.txt, present in the submission folder. It was a 2.4 MB file
which contains the line "sample file here are the random words" 62856 times.
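Because every line of the input is identical, the expected output can be worked out by hand: each of the seven words should be counted 62856 times. The sketch below does that arithmetic and also checks the stated file size. It assumes single spaces between the words, ASCII encoding, and one newline character per line; none of these details is stated in the report, and the class name SampleInputCheck is made up for this example.

```java
// Arithmetic check of the expected WordCount result for sample.txt.
// Assumptions (not stated in the report): single spaces between words,
// ASCII encoding, and one '\n' terminating every line.
class SampleInputCheck {
    static final String LINE = "sample file here are the random words";
    static final int REPEATS = 62856;

    // Expected file size in bytes: (line length + newline) times number of lines.
    static long expectedBytes() {
        return (long) (LINE.length() + 1) * REPEATS;
    }

    // Expected count of a word: occurrences in one line times number of lines.
    static long expectedCount(String word) {
        long perLine = 0;
        for (String token : LINE.split(" ")) {
            if (token.equals(word)) {
                perLine++;
            }
        }
        return perLine * REPEATS;
    }

    public static void main(String[] args) {
        System.out.println("expected bytes: " + expectedBytes());
        for (String word : LINE.split(" ")) {
            System.out.println(word + "\t" + expectedCount(word));
        }
    }
}
```

Under these assumptions the size comes to 38 × 62856 = 2,388,528 bytes, which is consistent with the stated 2.4 MB.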
The results obtained after running the word count program were as follows:
As can be seen, the output of the program was accurate.
(b) After shutting down one VM, the results did not change. Here are the
screenshots.
Computing Average Grade of the Courses Using MapReduce
The input file was generated using the Java code Average.java, present in the submission folder.
It contains 10,000 rows, and 1250 students are distributed among 8 courses.
Average.java was created to calculate the average grade of each of the 8 courses.
For the input file grades_our.txt, present in the submission folder, the output was:
Approach Used:
The Mapper returns a key-value pair. The key is the course id, and the value is the
corresponding grade of a student in that course.
The Combiner and Reducer take the input key as Text (course id) and the input values as an
iterator of <FloatWritable>, and output the key as Text (course id) and the value as
FloatWritable (average grade).
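The approach above can be sketched in plain Java without the Hadoop types. The input line format "courseId grade" is an assumption, since the layout of grades_our.txt is not shown in this report, and the class name AverageGradeSketch and its methods are made up for this example rather than taken from Average.java.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the per-course average (no Hadoop dependency).
// Names and the "courseId grade" line format are assumptions for this sketch.
class AverageGradeSketch {

    // Map step: turn one input line into a (courseId, grade) pair.
    static Map.Entry<String, Float> map(String line) {
        String[] fields = line.trim().split("\\s+");
        return new AbstractMap.SimpleEntry<>(fields[0], Float.parseFloat(fields[1]));
    }

    // Reduce step: arithmetic mean of all grades received for one course.
    static float average(List<Float> grades) {
        float sum = 0f;
        for (float grade : grades) {
            sum += grade;
        }
        return sum / grades.size();
    }

    // Groups the mapped pairs by course id, then averages each group.
    static Map<String, Float> averagePerCourse(List<String> lines) {
        Map<String, List<Float>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Float> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Float> result = new TreeMap<>();
        for (Map.Entry<String, List<Float>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), average(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(averagePerCourse(Arrays.asList("C1 10", "C1 20", "C2 50")));
        // Averaging partial averages versus averaging all grades at once:
        float partialA = average(Arrays.asList(10f, 20f, 30f));
        float partialB = average(Arrays.asList(50f));
        System.out.println("average of partial averages: " + average(Arrays.asList(partialA, partialB)));
        System.out.println("average of all grades:       " + average(Arrays.asList(10f, 20f, 30f, 50f)));
    }
}
```

One point worth keeping in mind when the same averaging logic is used as a combiner: the reducer then averages partial averages, which equals the overall average only when every partial group has the same number of grades. The last lines of main show a case where the two differ. A common alternative is to have the combiner pass along a sum and a count and divide only in the reducer.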
Here are the screenshots after running: