
Installation guide

Revision History
Revision No.   Description   Date       Author
1.0            Created       20131025   Shirlin Voon
1.0 NUTCH
1. Download Nutch from the link below and unzip the file.
http://www.apache.org/dyn/closer.cgi/nutch/
2. Install Java on Ubuntu
• javac -version  # check whether a JDK is already installed
• sudo apt-get install openjdk-7-jdk
3. Set JAVA_HOME
• export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
• echo $JAVA_HOME  # check whether JAVA_HOME is set correctly
4. Run "bin/nutch" (with no arguments it prints its usage, which confirms that Nutch runs).
5. Run the following command if you see "Permission denied":
• chmod +x bin/nutch
6. Create a folder named urls and save the starting URLs to be crawled in a seed.txt file inside it, one URL per line.
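7. Edit the file conf/regex-urlfilter.txt. This file stores the URL filter rules as regular expressions: lines starting with "-" exclude matching URLs (the blacklist) and lines starting with "+" include them.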
8. Download Solr from the link below and unzip the file.
http://www.apache.org/dyn/closer.cgi/lucene/solr/
9. Start Solr using the commands below (replace 3.6.x with the Solr version you downloaded):
• cd Applications/apache-solr-3.6.x/example
• java -jar start.jar
10. Solr admin website:
• http://localhost:8983/solr/admin/
• http://localhost:8983/solr/admin/stats.jsp
11. Start Nutch using the commands below:
• cd Applications/apache-nutch-1.7
• bin/nutch inject crawl/crawldb urls
• bin/nutch generate crawl/crawldb crawl/segments
• s1=`ls -d crawl/segments/2* | tail -1`
• echo $s1
• bin/nutch fetch $s1
• bin/nutch parse $s1
• bin/nutch updatedb crawl/crawldb $s1
• bin/nutch invertlinks crawl/linkdb -dir crawl/segments
• bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
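12. The commands in step 11 can be combined into a single command (-depth sets the number of generate/fetch/parse/update rounds and -topN caps the number of pages fetched per round):
• bin/nutch crawl urls -dir crawl -solr http://localhost:8983/solr/ -depth 3 -topN 5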
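Steps 5 and 6 can be scripted when the setup is repeated on several machines. The following is a minimal sketch, not part of Nutch: `make_seed_list` and `ensure_executable` are hypothetical helper names, and it assumes you run it from the Nutch directory.

```shell
#!/bin/sh
# Sketch (hypothetical helpers) for steps 5 and 6; run from the Nutch directory.

# make_seed_list <urls-dir> <url>...
# Creates <urls-dir>/seed.txt with one start URL per line.
make_seed_list() {
  if [ "$#" -lt 2 ]; then
    echo "usage: make_seed_list <urls-dir> <url>..." >&2
    return 1
  fi
  local dir="$1"
  shift
  mkdir -p "$dir" || return 1
  # printf reuses its format for every argument: one URL per line.
  printf '%s\n' "$@" > "$dir/seed.txt"
}

# ensure_executable <file>
# Adds the execute bit only when it is missing (the "Permission denied" fix).
ensure_executable() {
  [ -x "$1" ] || chmod +x "$1"
}
```

For example, `make_seed_list urls http://nutch.apache.org/` creates urls/seed.txt with a single start URL (the URL is only an example), and `ensure_executable bin/nutch` replaces the manual chmod.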
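The commands in step 11 perform one crawl round; the `-depth 3` option in step 12 repeats the generate, fetch, parse and updatedb round three times before inverting links and indexing into Solr. The sketch below loops the step 11 commands the same way. It assumes it runs from the Nutch directory with seeds in urls and data under crawl/, as in step 11, and uses that step's Solr URL as the default. `run` and `nutch_crawl` are hypothetical helpers, and unlike step 12 the sketch does not pass `-topN`. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): the step-by-step crawl from step 11, repeated
# for <depth> rounds like "-depth" in step 12. Assumes it runs from the Nutch
# directory with seeds in ./urls. DRY_RUN=1 prints commands instead of running.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# nutch_crawl <depth> [solr-url]
nutch_crawl() {
  local depth="$1"
  local solr="${2:-http://127.0.0.1:8983/solr/}"
  local i=1 seg
  run bin/nutch inject crawl/crawldb urls || return 1
  while [ "$i" -le "$depth" ]; do
    run bin/nutch generate crawl/crawldb crawl/segments || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
      seg="crawl/segments/<segment-$i>"   # placeholder: nothing was generated
    else
      # newest segment, same as: s1=`ls -d crawl/segments/2* | tail -1`
      seg=$(ls -d crawl/segments/2* | tail -1)
    fi
    run bin/nutch fetch "$seg" || return 1
    run bin/nutch parse "$seg" || return 1
    run bin/nutch updatedb crawl/crawldb "$seg" || return 1
    i=$((i + 1))
  done
  run bin/nutch invertlinks crawl/linkdb -dir crawl/segments || return 1
  # The shell expands crawl/segments/* to every segment directory.
  run bin/nutch solrindex "$solr" crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
}
```

For example, `DRY_RUN=1 nutch_crawl 3` prints the 15 commands of a three-round crawl (inject, four commands per round, invertlinks, solrindex), and `nutch_crawl 3` runs them, stopping at the first command that fails.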
2.0 APACHE, MYSQL, PHP
1. Create a folder named www in your home directory.
2. Download phpMyAdmin, unzip it and put it in the www folder.
http://www.phpmyadmin.net/home_page/downloads.php
3. sudo apt-get update
4. sudo apt-get install phpmyadmin
5. sudo apt-get install apache2
6. sudo apt-get install libapache2-mod-php5
7. Install MySQL (optional)
• sudo apt-get install mysql-server libapache2-mod-auth-mysql php5-mysql
• sudo mysql_install_db
• sudo apt-get install mysql-client-core-5.5
• sudo apt-get install php5-cli
8. Change the default document root
• sudo cp /etc/apache2/sites-available/default /etc/apache2/sites-available/mysite
• gksudo gedit /etc/apache2/sites-available/mysite
• Change DocumentRoot to the new location. PS: make sure there are no spaces in the new location.
• Change the <Directory> path to the new location.
9. Deactivate the old site and activate the new site
• sudo a2dissite default && sudo a2ensite mysite
10. Restart Apache2
• sudo service apache2 restart
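The edits in step 8 can also be made without an editor by using sed. This is a sketch under assumptions: the copied file is expected to still use Ubuntu's stock `/var/www` paths in its `DocumentRoot` line and `<Directory /var/www/>` block (check your file first), `make_site` is a hypothetical helper name, and the target path in the example is only illustrative.

```shell
#!/bin/sh
# Sketch (hypothetical helper): copy an Apache site file while pointing its
# DocumentRoot and <Directory /var/www/> block at a new folder.
# Assumes the source file uses the stock /var/www paths.
# Usage: make_site <source-site-file> <new-site-file> <new-document-root>
make_site() {
  local src="$1" dest="$2" root="${3%/}"   # drop a trailing slash, if any
  case "$root" in
    ""|*" "*)
      echo "make_site: document root must be non-empty and contain no spaces" >&2
      return 1 ;;
  esac
  # "|" is the sed delimiter because the paths contain "/";
  # a root containing "|" or "&" would need extra escaping.
  sed -e "s|^\([[:space:]]*DocumentRoot[[:space:]]\{1,\}\)/var/www/\{0,1\}[[:space:]]*\$|\1$root|" \
      -e "s|<Directory /var/www/\{0,1\}>|<Directory $root/>|" \
      "$src" > "$dest"
}
```

For example, `make_site /etc/apache2/sites-available/default /tmp/mysite /home/prism/www` followed by `sudo cp /tmp/mysite /etc/apache2/sites-available/mysite` produces the same file as the manual edit; the space check mirrors the PS note above. Then continue with step 9.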
3.0 GIT
1. sudo apt-get install git  # install Git
2. git init  # initialize a Git repository
3. git status  # check the status
4. Add files:
• git add *  # add all files
• git add Prism3  # add a folder
• git add test.java  # add one file
5. git rm *  # remove all files (from the repository and the working directory)
6. git commit -m "Description"  # commit before pushing to the server
7. git push http://dev@200.15.12.140:8080/Prism.git
8. git pull http://dev@200.15.12.140:8080/Prism.git
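Typing the server URL on every push and pull (steps 7 and 8) can be avoided by registering it once as a remote. This is a sketch, assuming Git is installed and the server accepts pushes; `publish_repo` is a hypothetical helper, the remote name `origin` is only Git's usual convention, and the example path is illustrative.

```shell
#!/bin/sh
# Sketch (hypothetical helper): commit everything in <work-dir> and push it to
# <remote-url>, registering the URL once as the remote "origin".
# Usage: publish_repo <work-dir> <remote-url> "<commit message>"
# The body runs in a subshell ( ... ) so the "cd" does not leak to the caller.
publish_repo() (
  dir="$1"; url="$2"; msg="$3"
  cd "$dir" || exit 1
  [ -d .git ] || git init -q || exit 1
  # Add the remote only if it is not configured yet.
  git config --get remote.origin.url >/dev/null ||
    git remote add origin "$url" || exit 1
  git add -A || exit 1                # stage new, changed and deleted files
  git commit -q -m "$msg" || exit 1
  # Push the current branch; -u records origin as its upstream, so later
  # updates need only "git pull" and "git push".
  git push -q -u origin HEAD
)
```

For example, `publish_repo ~/Prism <server URL> "Description"` performs steps 2, 4, 6 and 7 once; afterwards plain `git pull` and `git push` inside that folder talk to the same server.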
4.0 SINGLE NODE CLUSTER SETUP
1. You may refer to the link below:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
2. Make sure Java is installed.
3. Create a user to be used on all machines:
• sudo addgroup hadoop
• sudo adduser --ingroup hadoop prism
• su - prism
4. Install SSH
• sudo apt-get install ssh
• ssh localhost
5. Generate an SSH key
• ssh-keygen -t rsa -P ""
6. Enable SSH access to the local machine without typing the password every time
• cat /home/prism/.ssh/id_rsa.pub >> /home/prism/.ssh/authorized_keys
7. Disable IPv6
• sudo nano /etc/sysctl.conf
• Copy the following lines to the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
• Save the file and restart the PC.
• Check whether IPv6 is enabled using the following command:
• cat /proc/sys/net/ipv6/conf/all/disable_ipv6
• A value of 0 means IPv6 is enabled; a value of 1 means it is disabled.
8. Update $HOME/.bashrc
9. Open conf/hadoop-env.sh and set JAVA_HOME
10. Create the directory and set its permissions
• sudo mkdir -p /app/hadoop/tmp
• sudo chown prism /app/hadoop/tmp
• sudo chmod 750 /app/hadoop/tmp
11. Set up the three site XML files
• conf/core-site.xml
• conf/mapred-site.xml
• conf/hdfs-site.xml
12. bin/hadoop namenode -format (if permission is denied, run "chmod +x bin/hadoop")
13. bin/start-all.sh
14. jps (make sure NameNode, DataNode, JobTracker, TaskTracker and SecondaryNameNode are running)
15. Copy files from the local file system to HDFS and run a job
• bin/hadoop dfs -copyFromLocal <localFileDirectory> <HDFS Directory>
• bin/hadoop dfs -ls <HDFS Directory>
• bin/hadoop jar hadoop*examples*.jar wordcount <HDFS input> <HDFS output>
16. Other commands
• bin/hadoop dfs -cat /home/prism/output/part-r-00000
• bin/hadoop dfs -getmerge <HDFS Directory> <localFileDirectory>
• bin/hadoop dfs -rmr /home/prism/test-output  # remove a directory from HDFS
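The three files in step 11 can also be generated. This sketch assumes Hadoop 1.x property names (`hadoop.tmp.dir`, `fs.default.name`, `mapred.job.tracker`, `dfs.replication`) and the example ports 54310 and 54311 used in the tutorial linked in step 1; `hadoop_conf` and `write_site_xml` are hypothetical helpers, and they overwrite existing files, so back up conf/ first.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): write the three Hadoop 1.x site files.
# Property values follow the tutorial linked in step 1; adjust to your setup.
# WARNING: existing files in <conf-dir> are overwritten.

# hadoop_conf <name> <value> [<name> <value>...]  -> XML on stdout
hadoop_conf() {
  printf '<?xml version="1.0"?>\n<configuration>\n'
  while [ "$#" -ge 2 ]; do
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
    shift 2
  done
  printf '</configuration>\n'
}

# write_site_xml <conf-dir> <master-host> <replication>
write_site_xml() {
  local conf="$1" host="$2" repl="$3"
  hadoop_conf hadoop.tmp.dir /app/hadoop/tmp \
              fs.default.name "hdfs://$host:54310" > "$conf/core-site.xml" || return 1
  hadoop_conf mapred.job.tracker "$host:54311" > "$conf/mapred-site.xml" || return 1
  hadoop_conf dfs.replication "$repl" > "$conf/hdfs-site.xml"
}
```

For this single-node setup, `write_site_xml conf localhost 1` fits; for the multi-node setup, running `write_site_xml conf prism1 <number of nodes>` on each machine gives the changes listed in that section's step 6.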
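Step 14 relies on reading the `jps` listing by eye. A small sketch can compare it with the five expected daemons; `check_daemons` is a hypothetical helper that assumes the usual `jps` output of one "<pid> <MainClass>" pair per line.

```shell
#!/bin/sh
# Sketch (hypothetical helper): check_daemons reads `jps` output on stdin
# ("<pid> <MainClass>" per line) and prints each expected Hadoop 1.x daemon
# that is missing. Returns 0 when all five from step 14 are running.
check_daemons() {
  local running missing=0 d
  running=$(awk '{ print $2 }')
  for d in NameNode DataNode JobTracker TaskTracker SecondaryNameNode; do
    # -x: whole-line match, so "NameNode" does not match "SecondaryNameNode"
    if ! printf '%s\n' "$running" | grep -qx "$d"; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `jps | check_daemons`. On a multi-node cluster the expected list differs per machine (a slave that is not the master runs only DataNode and TaskTracker), so the list would need adjusting there.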
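The IPv6 check in step 7 can be wrapped so it prints a word instead of a bare 0 or 1. `ipv6_status` is a hypothetical helper; it reads the same kernel flag file by default and accepts another path, which also makes it easy to test.

```shell
#!/bin/sh
# Sketch (hypothetical helper): report the IPv6 state from the kernel flag in
# step 7 (0 = enabled, 1 = disabled). The file argument is optional.
ipv6_status() {
  local flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}" v
  if [ ! -r "$flag_file" ]; then
    echo "unknown"          # flag file missing or unreadable
    return 1
  fi
  read -r v < "$flag_file"
  case "$v" in
    0) echo "enabled" ;;
    1) echo "disabled" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

After editing /etc/sysctl.conf, `sudo sysctl -p` reloads that file and may make the restart unnecessary; `ipv6_status` should then print "disabled".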
5.0 MULTI NODE CLUSTER SETUP
1. You may refer to the link below:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
2. sudo nano /etc/hosts
200.15.12.141    prism1    # master
200.15.12.242    prism2    # slave
3. Copy the master's SSH key to the slave's authorized_keys
• ssh-copy-id -i /home/prism/.ssh/id_rsa.pub prism@prism2
4. Edit conf/masters (on the master PC only)
• Change "localhost" to "prism1"  # prism1 is the master
5. Add all slave nodes to conf/slaves (on the master PC only)
• prism1  # prism1 is the master and also a slave
• prism2  # slave
• prism3  # slave
6. Set up the three site XML files (on all machines)
• conf/core-site.xml: change "localhost" to "prism1"
• conf/mapred-site.xml: change "localhost" to "prism1"
• conf/hdfs-site.xml: change the dfs.replication value to the number of nodes
7. Format the namenode (on all machines)
• bin/hadoop namenode -format
• If formatting the namenode fails:
i. sudo rm -r /app/hadoop/tmp  # remove the directory /app/hadoop/tmp
ii. sudo mkdir /app/hadoop/tmp  # create the directory again
iii. sudo chown -R prism /app/hadoop/tmp  # give user prism access
8. Start DFS and MapReduce
• bin/start-dfs.sh
• bin/start-mapred.sh
• If the DataNode does not start on a slave, either:
i. Reformat the namenode on the slave (**data will be lost**), or
ii. Update the namespaceID of the problem DataNode: manually copy the NameNode's namespaceID into the DataNode's VERSION file.
NameNode: /app/hadoop/tmp/dfs/name/current/VERSION
DataNode: /app/hadoop/tmp/dfs/data/current/VERSION
9. Copy files from the local file system to HDFS and run a job
• bin/hadoop dfs -copyFromLocal <localFileDirectory> <HDFS Directory>
• bin/hadoop dfs -ls <HDFS Directory>
• bin/hadoop jar hadoop*examples*.jar wordcount <HDFS input> <HDFS output>
10. If the reduce job does not start, check the /etc/hosts file on all machines and make sure every hostname can be resolved.
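Option ii of the DataNode fix can be scripted. `sync_namespace_id` is a hypothetical helper that copies the `namespaceID=` line value from a NameNode VERSION file into a DataNode VERSION file and keeps a .bak copy. It assumes both files are reachable locally, so copy the master's VERSION file to the slave first (for example with scp), and stop the DataNode before running it.

```shell
#!/bin/sh
# Sketch (hypothetical helper): sync_namespace_id <namenode-VERSION> <datanode-VERSION>
# Copies namespaceID from the NameNode's VERSION file into the DataNode's,
# keeping <datanode-VERSION>.bak. Stop the DataNode before running it.
sync_namespace_id() {
  local nn="$1" dn="$2" id
  id=$(sed -n 's/^namespaceID=//p' "$nn")
  if [ -z "$id" ]; then
    echo "sync_namespace_id: no namespaceID in $nn" >&2
    return 1
  fi
  cp "$dn" "$dn.bak" || return 1
  # Rewrite only the namespaceID line; other lines are copied unchanged.
  sed "s/^namespaceID=.*/namespaceID=$id/" "$dn.bak" > "$dn"
}
```

For example, after copying the master's file to /tmp/nn-VERSION, `sync_namespace_id /tmp/nn-VERSION /app/hadoop/tmp/dfs/data/current/VERSION` on the slave, then start the DataNode again.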
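Steps 4 and 5 edit two plain lists of host names. In Hadoop 1.x, conf/masters actually names the host that runs the SecondaryNameNode, while conf/slaves lists every host that runs a DataNode and TaskTracker. The sketch below writes both; `write_cluster_lists` is a hypothetical helper, and the host names in the example are the ones used in this section.

```shell
#!/bin/sh
# Sketch (hypothetical helper): write_cluster_lists <conf-dir> <master> <slave>...
# conf/masters gets the master host; conf/slaves gets every host that should
# run a DataNode/TaskTracker (list the master too if it also acts as a slave).
write_cluster_lists() {
  if [ "$#" -lt 3 ]; then
    echo "usage: write_cluster_lists <conf-dir> <master> <slave>..." >&2
    return 1
  fi
  local conf="$1" master="$2"
  shift 2
  printf '%s\n' "$master" > "$conf/masters" || return 1
  printf '%s\n' "$@" > "$conf/slaves"     # one host name per line
}
```

On the master, `write_cluster_lists conf prism1 prism1 prism2 prism3` produces the same two files as steps 4 and 5.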
6.0 LINUX USEFUL COMMANDS
1. sudo mkdir /app  # create a directory
2. sudo rm -r /app  # delete a directory recursively
3. sudo chown prism /app  # make user "prism" the owner of directory "/app"
4. sudo nano /etc/hosts  # open the file "/etc/hosts" in the terminal
5. sudo gedit /etc/hosts  # open the file "/etc/hosts" in a graphical text editor
6. sudo addgroup hadoop  # add the group "hadoop"
7. sudo adduser --ingroup hadoop prism  # add the user "prism" to the group "hadoop"
8. su - prism  # switch to the user "prism"
7.0 REPORT FORMAT
8.0 BACKUP & RECOVERY
9.0 CHECKLIST