Installation guide
Revision History
Revision No.   Description   Date       Author
1.0            Created       20131025   Shirlin Voon
1.0 NUTCH
1. Download nutch from the link and unzip the file.
http://www.apache.org/dyn/closer.cgi/nutch/
2. Install JAVA in ubuntu
javac -version
sudo apt-get install openjdk-7-jdk
3. Set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
echo $JAVA_HOME // to check whether JAVA_HOME is set correctly
4. Run "bin/nutch"
5. Run the following command if you see "Permission denied"
chmod +x bin/nutch
6. Create folder urls and save the starting URLs that need to be crawled inside the seed.txt file
7. Edit the file conf/regex-urlfilter.txt. This file stores the blacklist of URLs.
8. Download solr from the link and unzip the file.
http://www.apache.org/dyn/closer.cgi/lucene/solr/
9. Start solr using the command below:
cd Applications/apache-solr-3.6.x/example // x is the patch version of the downloaded release
java -jar start.jar
10. SOLR ADMIN WEBSITE
http://localhost:8983/solr/admin/
http://localhost:8983/solr/admin/stats.jsp
11. Start nutch using the commands below:
cd Applications/apache-nutch-1.7
bin/nutch inject crawl/crawldb urls
bin/nutch generate crawl/crawldb crawl/segments
s1=`ls -d crawl/segments/2* | tail -1`
echo $s1
bin/nutch fetch $s1
bin/nutch parse $s1
bin/nutch updatedb crawl/crawldb $s1
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
12. The nutch commands in step 11 can be simplified into 1 command as below:
bin/nutch crawl urls -dir crawl -solr http://localhost:8983/solr/ -depth 3 -topN 5
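The commands of step 11 can also be collected into a small script. The following is only a sketch: the helper names run, latest_segment and crawl_round, the DRY_RUN switch and the SEGMENT placeholder are assumptions made for illustration and are not part of nutch. With DRY_RUN=1 (the default here) the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch of one crawl round (step 11). Paths match the guide; the helper
# names run, latest_segment and crawl_round are hypothetical.
NUTCH="${NUTCH:-bin/nutch}"
SOLR_URL="${SOLR_URL:-http://127.0.0.1:8983/solr/}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Segment directories are named by timestamp, so the last entry in sorted
# order is the newest one (same idea as: ls -d crawl/segments/2* | tail -1).
latest_segment() {
  ls -d "$1"/2* 2>/dev/null | tail -1
}

crawl_round() {
  run "$NUTCH" inject crawl/crawldb urls || return 1
  run "$NUTCH" generate crawl/crawldb crawl/segments || return 1
  s1=$(latest_segment crawl/segments)
  if [ -z "$s1" ]; then
    if [ "$DRY_RUN" = "1" ]; then
      s1="crawl/segments/SEGMENT"   # placeholder, nothing was generated
    else
      echo "no segment found under crawl/segments" >&2
      return 1
    fi
  fi
  run "$NUTCH" fetch "$s1" || return 1
  run "$NUTCH" parse "$s1" || return 1
  run "$NUTCH" updatedb crawl/crawldb "$s1" || return 1
  run "$NUTCH" invertlinks crawl/linkdb -dir crawl/segments || return 1
  run "$NUTCH" solrindex "$SOLR_URL" crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
}
```

Run crawl_round from the nutch folder. Set DRY_RUN=0 only after the printed commands look correct.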
2.0 Apache MySQL PHP
1. Create folder www at home
2. Download phpMyAdmin, unzip it and put it in folder www
http://www.phpmyadmin.net/home_page/downloads.php
3. sudo apt-get update
4. sudo apt-get install phpmyadmin
5. sudo apt-get install apache2
6. sudo apt-get install libapache2-mod-php5
7. Install MySQL (optional)
sudo apt-get install mysql-server libapache2-mod-auth-mysql php5-mysql
sudo mysql_install_db
sudo apt-get install mysql-client-core-5.5
sudo apt-get install php5-cli
8. Change default document root
sudo cp /etc/apache2/sites-available/default /etc/apache2/sites-available/mysite
gksudo gedit /etc/apache2/sites-available/mysite
Change DocumentRoot to the new location. PS: make sure there is no space in the new location
Change <Directory> to the new location
9. Deactivate old site and activate new site
sudo a2dissite default && sudo a2ensite mysite
10. Restart Apache2
sudo service apache2 restart
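The edit in step 8 can be done without opening an editor. This is a minimal sketch, assuming the copied site file still points at /var/www as the old root (an assumption, check the file first). The helper name set_docroot and the example path /home/prism/www are hypothetical.

```shell
#!/bin/sh
# Sketch: point a copied site file at a new document root.
# set_docroot is a hypothetical helper; /var/www as the old root is an assumption.
set_docroot() {
  site_file=$1
  old_root=$2
  new_root=$3
  # The guide warns that the new location must not contain a space.
  case "$new_root" in
    *" "*) echo "new location must not contain spaces: $new_root" >&2; return 1 ;;
  esac
  # Rewrites every occurrence of the old root, which covers both the
  # DocumentRoot line and the <Directory ...> line. The sed delimiter is |,
  # so the paths themselves must not contain |.
  sed -i "s|$old_root|$new_root|g" "$site_file"
}
```

Example call (needs root permission for files under /etc): set_docroot /etc/apache2/sites-available/mysite /var/www /home/prism/www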
3.0 GIT
1. sudo apt-get install git //install GIT
2. git init //initialize git
3. git status //check status
4. git add * //add all files
git add Prism3 //add folder
git add "test.java" //add 1 file
5. git rm * //remove all files
6. git commit -m "Description" //commit before push to server
7. git push http://dev@200.15.12.140:8080/Prism.git
8. git pull http://dev@200.15.12.140:8080/Prism.git
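To try the local steps (init, add, commit) without touching the server, they can be run against a scratch folder. This is only a sketch: demo_commit is a hypothetical helper, and the user name and e-mail passed with -c are placeholders so that the commit works on a machine where git is not configured yet.

```shell
#!/bin/sh
# Sketch: steps 2, 4 and 6 run against a scratch repository.
# demo_commit is a hypothetical helper; the identity passed with -c is a placeholder.
demo_commit() {
  repo_dir=$1
  (
    cd "$repo_dir" || exit 1
    git init -q                          # step 2: initialize git
    echo 'class Test {}' > test.java
    git add "test.java"                  # step 4: add 1 file
    git -c user.name="prism" -c user.email="prism@example.com" \
        commit -q -m "Description"       # step 6: commit before push to server
    git rev-list --count HEAD            # print the number of commits so far
  )
}
```

Example call: demo_commit "$(mktemp -d)"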
4.0 SINGLE NODE CLUSTER SETUP
1. You may refer to the link below:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
2. Make sure JAVA is installed
3. Create a user to be used on all machines
sudo addgroup hadoop
sudo adduser --ingroup hadoop prism
su - prism
4. Install ssh
sudo apt-get install ssh
ssh localhost
5. Generate SSH key
ssh-keygen -t rsa -P ""
6. Enable SSH access to the local machine without keying in the password every time
cat /home/prism/.ssh/id_rsa.pub >> /home/prism/.ssh/authorized_keys
7. Disabling IPv6
sudo nano /etc/sysctl.conf
copy the following lines to the end of the file
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
save and restart the PC
Check whether IPv6 is enabled using the following command
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
Value 0 means enabled and value 1 means disabled
8. Update $HOME/.bashrc
9. Open conf/hadoop-env.sh and set JAVA_HOME
10. Create directory and set permission
sudo mkdir -p /app/hadoop/tmp
sudo chown prism /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
11. Set 3 site.xml files
conf/core-site.xml
conf/mapred-site.xml
conf/hdfs-site.xml
12. bin/hadoop namenode -format ** if permission denied run "chmod +x bin/hadoop"
13. bin/start-all.sh
14. jps (Make sure there is NameNode, DataNode, JobTracker, TaskTracker, SecondaryNameNode)
15. Copy file from local file system to HDFS and run process
bin/hadoop dfs -copyFromLocal <localFileDirectory> <HDFS Directory>
bin/hadoop dfs -ls <HDFS Directory>
bin/hadoop jar hadoop*examples*.jar wordcount <HDFS input> <HDFS output>
16. Other commands
bin/hadoop dfs -cat /home/prism/output/part-r-00000
bin/hadoop dfs -getmerge <HDFS Directory> <localFileDirectory>
bin/hadoop dfs -rmr /home/prism/test-output //remove directory from HDFS
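The check in step 14 can be scripted instead of reading the jps list by eye. This is a minimal sketch: missing_daemons is a hypothetical helper that reads the jps output on standard input and prints the name of every daemon from step 14 that is not running. It assumes jps prints one "pid name" pair per line.

```shell
#!/bin/sh
# Sketch: report which of the five daemons from step 14 are missing.
# missing_daemons is a hypothetical helper; it reads jps output on stdin.
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode DataNode JobTracker TaskTracker SecondaryNameNode; do
    # Compare the second field as a whole word, so that SecondaryNameNode
    # is not counted as NameNode.
    if ! printf '%s\n' "$jps_output" |
         awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}
```

Example call: jps | missing_daemons. No output means all five daemons are running.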
5.0 MULTI NODE CLUSTER SETUP
1. You may refer to the link below:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
2. sudo nano /etc/hosts
200.15.12.141 prism1 //master
200.15.12.242 prism2 //slave
3. Copy master SSH key to slave authorized_keys
ssh-copy-id -i /home/prism/.ssh/id_rsa.pub prism@prism2
4. Edit conf/masters (only for master PC)
Change "localhost" to "prism1" //prism1 is master
5. Add all slave nodes in conf/slaves (only for master PC)
prism1 //prism1 as master and also slave
prism2 //prism2 as slave
prism3 //prism3 as slave
6. Set 3 site.xml files (in all machines)
conf/core-site.xml //change "localhost" to "prism1"
conf/mapred-site.xml //change "localhost" to "prism1"
conf/hdfs-site.xml //change value to number of nodes
7. Format namenode (in all machines)
bin/hadoop namenode -format
If it fails to format the namenode
i. sudo rm /app/hadoop/tmp -r //remove directory /app/hadoop/tmp
ii. sudo mkdir /app/hadoop/tmp //create the directory again
iii. sudo chown -R prism /app/hadoop/tmp //give access permission to prism
8. Start DFS and MapReduce
bin/start-dfs.sh
bin/start-mapred.sh
If datanode does not start at slave
i. Reformat namenode in slave **data will be lost**
OR
ii. Update namespaceID in problem datanode
Manually copy NameNode namespaceID to DataNode namespaceID
NameNode: /app/hadoop/tmp/dfs/name/current/VERSION
DataNode: /app/hadoop/tmp/dfs/data/current/VERSION
9. Copy file from local file system to HDFS and run process
bin/hadoop dfs -copyFromLocal <localFileDirectory> <HDFS Directory>
bin/hadoop dfs -ls <HDFS Directory>
bin/hadoop jar hadoop*examples*.jar wordcount <HDFS input> <HDFS output>
10. If there is a problem when the reduce job starts, please check the /etc/hosts file on all machines. Make sure the hostnames can be resolved.
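The manual namespaceID copy in step 8 (option ii) can be sketched as a small script. It assumes both VERSION files contain a line of the form namespaceID=<number>, and that the NameNode VERSION file has been copied to the slave first, because the two files normally sit on different machines. sync_namespace_id is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: copy namespaceID from the NameNode VERSION file into the DataNode one.
# sync_namespace_id is a hypothetical helper; it assumes both files hold a
# line of the form namespaceID=<number>.
sync_namespace_id() {
  name_version=$1
  data_version=$2
  # Read the value after "namespaceID=" from the NameNode file.
  ns_id=$(sed -n 's/^namespaceID=//p' "$name_version")
  if [ -z "$ns_id" ]; then
    echo "no namespaceID found in $name_version" >&2
    return 1
  fi
  cp "$data_version" "$data_version.bak"   # keep a backup of the old file
  # Replace only the namespaceID line, leave the other lines untouched.
  sed -i "s/^namespaceID=.*/namespaceID=$ns_id/" "$data_version"
}
```

Example call on the slave, with the datanode stopped: sync_namespace_id /tmp/namenode-VERSION /app/hadoop/tmp/dfs/data/current/VERSION (the /tmp path is only an example for the copied file).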
6.0 LINUX useful command
1. sudo mkdir /app //create directory
2. sudo rm /app -r //delete directory recursively
3. sudo chown prism /app //give permission for directory "/app" to user "prism"
4. sudo nano /etc/hosts //open file "/etc/hosts" in command line
5. sudo gedit /etc/hosts //open file "/etc/hosts" in document editor
6. sudo addgroup hadoop //add group "hadoop"
7. sudo adduser --ingroup hadoop prism //add user "prism" into group "hadoop"
8. su - prism //change user to "prism"
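After commands 6 and 7, the group membership can be checked as follows. This is only a sketch. user_in_group is a hypothetical helper built on id -nG, which prints the names of all groups of a user.

```shell
#!/bin/sh
# Sketch: check the result of commands 6 and 7 in the list.
# user_in_group is a hypothetical helper.
user_in_group() {
  # id -nG prints the names of all groups the user belongs to.
  for group_name in $(id -nG "$1" 2>/dev/null); do
    [ "$group_name" = "$2" ] && return 0
  done
  return 1
}
```

Example call: user_in_group prism hadoop && echo "prism is in group hadoop"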
7.0 REPORT FORMAT
8.0 BACKUP & RECOVERY
9.0 CHECKLIST