
Manual installation

Tuesday, August 02, 2016 1:26 PM

1. Add the IP address and hostname entries below to /etc/hosts on every Hadoop server
167.205.64.109 hadoop1.unb.ac.id hadoop1
167.205.64.100 hadoop2.unb.ac.id hadoop2
167.205.64.79 hadoop3.unb.ac.id hadoop3
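The entries above can be staged and sanity-checked with a short script. This sketch writes to a scratch file rather than /etc/hosts so it can be tried safely; on a real node the target path would be /etc/hosts.

```shell
# Stage the cluster's host entries and verify each short name appears once.
HOSTS_FILE=$(mktemp)   # on a node: /etc/hosts
cat >> "$HOSTS_FILE" <<'EOF'
167.205.64.109 hadoop1.unb.ac.id hadoop1
167.205.64.100 hadoop2.unb.ac.id hadoop2
167.205.64.79 hadoop3.unb.ac.id hadoop3
EOF
# Every short hostname should resolve to exactly one line.
for h in hadoop1 hadoop2 hadoop3; do
    grep -c " $h\$" "$HOSTS_FILE"
done
```

Each loop iteration should print 1; any other count means a missing or duplicated entry.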

2. Install Java on every Hadoop server (don't forget to set the proxy first)


root@hadoop1:/home/hadoop# apt-add-repository ppa:webupd8team/java
Oracle Java (JDK) Installer (automatically downloads and installs Oracle JDK7 / JDK8 / JDK9).
There are no actual Java files in this PPA.

More info (and Ubuntu installation instructions):


- for Oracle Java 7: http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html
- for Oracle Java 8: http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html

Debian installation instructions:


- Oracle Java 7: http://www.webupd8.org/2012/06/how-to-install-oracle-java-7-in-debian.html
- Oracle Java 8: http://www.webupd8.org/2014/03/how-to-install-oracle-java-8-in-debian.html

Oracle Java 9 (for both Ubuntu and Debian): http://www.webupd8.org/2015/02/install-oracle-java-9-in-ubuntu-linux.html

For JDK9, the PPA uses standard builds from: https://jdk9.java.net/download/ (and not the
Jigsaw builds!).

Important!!! For now, you should continue to use Java 8 because Oracle Java 9 is available as an
early access release (it should be released in 2016)! You should only use Oracle Java 9 if you
explicitly need it, because it may contain bugs and it might not include the latest security
patches! Also, some Java options were removed in JDK9, so you may encounter issues with
various Java apps. More information and installation instructions (Ubuntu / Linux Mint / Debian):
http://www.webupd8.org/2015/02/install-oracle-java-9-in-ubuntu-linux.html
More info: https://launchpad.net/~webupd8team/+archive/ubuntu/java
Press [ENTER] to continue or ctrl-c to cancel adding it

gpg: keyring `/tmp/tmp508q25fn/secring.gpg' created


gpg: keyring `/tmp/tmp508q25fn/pubring.gpg' created
gpg: requesting key EEA14886 from hkp server keyserver.ubuntu.com
gpg: /tmp/tmp508q25fn/trustdb.gpg: trustdb created
gpg: key EEA14886: public key "Launchpad VLC" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
OK

root@hadoop1:/home/hadoop# apt-get update

Big data Technical implementation Page 1


root@hadoop1:/home/hadoop# apt-get install oracle-java7-installer

root@hadoop1:/home/hadoop# update-java-alternatives -s java-7-oracle

3. Disable IPv6
Add the following lines to /etc/sysctl.conf:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Then verify with:


hadoopuser@hadoop1:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
The result must be 1.
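The three sysctl lines can be staged and counted before being applied; a minimal sketch using a scratch file (on a real node, append to /etc/sysctl.conf and apply with `sysctl -p` as root):

```shell
# Stage the sysctl entries and confirm none went missing.
CONF=$(mktemp)   # on a node: /etc/sysctl.conf
cat >> "$CONF" <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF
grep -c 'disable_ipv6 = 1' "$CONF"   # expect 3
```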

4. Set up the user and group for Hadoop


root@hadoop3:/home/hadoop# addgroup hadoopgroup
root@hadoop3:/home/hadoop# adduser --ingroup hadoopgroup hadoopuser
Log in as the Hadoop user on the hadoop1 server: su hadoopuser
Create an RSA key for SSH, on the hadoop1 server: ssh-keygen -t rsa -P ""
Authorize the RSA key for passwordless SSH access, on the hadoop1 server: cat
/home/hadoopuser/.ssh/id_rsa.pub >> /home/hadoopuser/.ssh/authorized_keys
Fix the permissions on the key file, on the hadoop1 server: chmod 600
/home/hadoopuser/.ssh/authorized_keys
Copy authorized_keys to hadoop2 and hadoop3; we can now SSH without a password from the
hadoop1 server to the hadoop2 and hadoop3 servers:
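The key setup above can be sketched against a scratch directory so it can be tried without touching a real ~/.ssh. The public-key line here is a placeholder; on hadoop1 it comes from the ssh-keygen command above.

```shell
# Sketch of the authorized_keys setup, using a scratch directory.
SSH_DIR=$(mktemp -d)   # on a node: /home/hadoopuser/.ssh
echo "ssh-rsa AAAAB3...placeholder hadoopuser@hadoop1" > "$SSH_DIR/id_rsa.pub"
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
# On the real cluster, distributing the key is one command per slave
# (ssh-copy-id appends to the remote authorized_keys and fixes permissions):
#   ssh-copy-id hadoopuser@hadoop2
#   ssh-copy-id hadoopuser@hadoop3
stat -c '%a' "$SSH_DIR/authorized_keys"   # expect 600
```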

hadoopuser@hadoop1:~/.ssh$ ssh hadoop2


Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)

* Documentation: https://help.ubuntu.com/

System information as of Tue Aug 2 13:49:28 WIB 2016

System load: 0.0 Processes: 92


Usage of /: 4.7% of 36.29GB Users logged in: 2
Memory usage: 4% IP address for eth0: 167.205.64.100
Swap usage: 0%

Graph this data and manage this system at:


https://landscape.canonical.com/

New release '16.04.1 LTS' available.


Run 'do-release-upgrade' to upgrade to it.

Last login: Tue Aug 2 13:49:29 2016 from hadoop1.itb.ac.id



5. Install the Hadoop binaries on all three servers
cd /home/hadoopuser/
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.6.3/hadoop-2.6.3.tar.gz
tar xzvf hadoop-2.6.3.tar.gz
ln -s hadoop-2.6.3 hadoop (so the HADOOP_HOME path below resolves)
Add the following lines to the .bashrc file on each server:

# Set HADOOP_HOME
HADOOP_HOME=/home/hadoopuser/hadoop
export HADOOP_HOME
# Set JAVA_HOME
JAVA_HOME=/usr/lib/jvm/java-7-oracle
export JAVA_HOME
# Add Hadoop bin and sbin directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
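Note that the PATH entries must be joined with colons; a semicolon there would end the assignment and try to run the sbin path as a command instead. A quick check:

```shell
# Verify the sbin directory actually landed on PATH after the export.
HADOOP_HOME=/home/hadoopuser/hadoop
PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
case ":$PATH:" in
    *":$HADOOP_HOME/sbin:"*) echo "sbin on PATH" ;;
    *) echo "sbin missing" ;;
esac
```

This prints "sbin on PATH" when the export is correct.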

6. Edit the configuration
Configure core-site.xml on every node, pointing it at the master server:

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoopuser/tmp</value>
<description>Temporary Directory.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:9000</value>
<description>Use HDFS as file storage engine</description>
</property>
</configuration>
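Since the same core-site.xml goes on every node, it can be stamped out from one template so the master address stays consistent. A hedged sketch; MASTER and the scratch output path are assumptions, and on a node the target would be $HADOOP_HOME/etc/hadoop/core-site.xml:

```shell
# Generate core-site.xml from a template parameterized by the master host.
MASTER=hadoop1
OUT=$(mktemp)   # on a node: $HADOOP_HOME/etc/hadoop/core-site.xml
cat > "$OUT" <<EOF
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoopuser/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$MASTER:9000</value>
  </property>
</configuration>
EOF
grep fs.defaultFS "$OUT"
```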

Configure mapred-site.xml on the master node only:


<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>hadoop1:9000</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The framework for running mapreduce jobs</description>
</property>
</configuration>
Configure hdfs-site.xml on the master and the slaves:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop-data/hadoopuser/hdfs/namenode</value>
<description>Determines where on the local filesystem the DFS name node should store the
name table (fsimage). If this is a comma-delimited list of directories then the name table is
replicated in all of the directories, for redundancy.
</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop-data/hadoopuser/hdfs/datanode</value>
<description>Determines where on the local filesystem a DFS data node should store its
blocks. If this is a comma-delimited list of directories, then data will be stored in all named
directories, typically on different devices. Directories that do not exist are ignored.
</description>
</property>
</configuration>
Add the following configuration to yarn-site.xml, on the master and the slaves:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop1:8033</value>
</property>
</configuration>
Fill the hadoop/etc/hadoop/slaves file on the master with the following lines:
hadoop1
hadoop2
hadoop3



Add the following line to the hadoop-env.sh configuration file: export JAVA_HOME=/usr/lib/jvm/java-7-oracle
Format the namenode on the master only: hdfs namenode -format. This creates the
/hadoop-data/ folder (which must be writable by hadoopuser).
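Before formatting, the HDFS directories from hdfs-site.xml must exist and be owned by hadoopuser. A sketch using a scratch base directory; on a real node the base is /hadoop-data and the chown needs root:

```shell
# Pre-create the namenode and datanode directories from hdfs-site.xml.
BASE=$(mktemp -d)/hadoop-data   # on a node: /hadoop-data
mkdir -p "$BASE/hadoopuser/hdfs/namenode" "$BASE/hadoopuser/hdfs/datanode"
# chown -R hadoopuser:hadoopgroup /hadoop-data   # run as root on a real node
ls "$BASE/hadoopuser/hdfs"
```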

Start DFS on the master: hadoop/sbin/start-dfs.sh. On success, the output looks like this:

hadoopuser@hadoop1:~/hadoop/sbin$ ./start-dfs.sh
Starting namenodes on [hadoop1]
hadoop1: starting namenode, logging to /home/hadoopuser/hadoop/logs/hadoop-hadoopuser-namenode-hadoop1.out
hadoop2: starting datanode, logging to /home/hadoopuser/hadoop/logs/hadoop-hadoopuser-datanode-hadoop2.out
hadoop3: starting datanode, logging to /home/hadoopuser/hadoop/logs/hadoop-hadoopuser-datanode-hadoop3.out
hadoop1: starting datanode, logging to /home/hadoopuser/hadoop/logs/hadoop-hadoopuser-datanode-hadoop1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoopuser/hadoop/logs/hadoop-hadoopuser-secondarynamenode-hadoop1.out

Start YARN on the master node: /home/hadoopuser/hadoop/sbin/start-yarn.sh. The output looks like this:
hadoopuser@hadoop3:~$ /home/hadoopuser/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoopuser/hadoop/logs/yarn-hadoopuser-resourcemanager-hadoop3.out
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is 6d:a0:30:ae:d4:e8:03:c5:5c:d6:c1:fb:53:a4:a0:ef.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
hadoopuser@localhost's password:

Looking at the running processes, you will see something like this:

hadoopuser@hadoop3:~$ ps ax | grep node


6745 ? Sl 0:04 /usr/lib/jvm/java-7-oracle/bin/java -Dproc_nodemanager -Xmx1000m -
Dhadoop.log.dir=/home/hadoopuser/hadoop/logs -
Dyarn.log.dir=/home/hadoopuser/hadoop/logs -Dhadoop.log.file=yarn-hadoopuser-
nodemanager-hadoop3.log -Dyarn.log.file=yarn-hadoopuser-nodemanager-hadoop3.log -
Dyarn.home.dir= -Dyarn.id.str=hadoopuser -Dhadoop.root.logger=INFO,RFA -
Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoopuser/hadoop/lib/native -
Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/home/hadoopuser/hadoop/logs -
Dyarn.log.dir=/home/hadoopuser/hadoop/logs -Dhadoop.log.file=yarn-hadoopuser-
nodemanager-hadoop3.log -Dyarn.log.file=yarn-hadoopuser-nodemanager-hadoop3.log -
Dyarn.home.dir=/home/hadoopuser/hadoop -Dhadoop.home.dir=/home/hadoopuser/hadoop -
Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -
Djava.library.path=/home/hadoopuser/hadoop/lib/native -classpath
/home/hadoopuser/hadoop/etc/hadoop:/home/hadoopuser/hadoop/etc/hadoop:/home/hadoo
puser/hadoop/etc/hadoop:/home/hadoopuser/hadoop/share/hadoop/common/lib/
*:/home/hadoopuser/hadoop/share/hadoop/common/
*:/home/hadoopuser/hadoop/share/hadoop/hdfs:/home/hadoopuser/hadoop/share/hadoop/h

dfs/lib/*:/home/hadoopuser/hadoop/share/hadoop/hdfs/
*:/home/hadoopuser/hadoop/share/hadoop/yarn/lib/
*:/home/hadoopuser/hadoop/share/hadoop/yarn/
*:/home/hadoopuser/hadoop/share/hadoop/mapreduce/lib/
*:/home/hadoopuser/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/
*.jar:/contrib/capacity-scheduler/*.jar:/home/hadoopuser/hadoop/share/hadoop/yarn/
*:/home/hadoopuser/hadoop/share/hadoop/yarn/lib/
*:/home/hadoopuser/hadoop/etc/hadoop/nm-config/log4j.properties
org.apache.hadoop.yarn.server.nodemanager.NodeManager

Type jps on every node to make sure the daemons are running; the results will look like this:
hadoopuser@hadoop2:~/hadoop/sbin$ jps
7433 NodeManager
7573 Jps

hadoopuser@hadoop3:~$ jps
2175 ResourceManager
7347 NodeManager
7479 Jps

hadoopuser@hadoop1:~/hadoop/sbin$ jps
13417 Jps
12800 SecondaryNameNode
13100 NodeManager
12963 ResourceManager
hadoopuser@hadoop1:~/hadoop/sbin$
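The jps listings above can also be checked mechanically. A sketch; the sample output is inlined here (taken from the hadoop1 listing) so it can be tried anywhere, and on a real node you would use JPS_OUT=$(jps) and adjust the expected daemon list per node:

```shell
# Check that the expected daemons appear in the jps output.
JPS_OUT="13417 Jps
12800 SecondaryNameNode
13100 NodeManager
12963 ResourceManager"
for daemon in SecondaryNameNode NodeManager ResourceManager; do
    if echo "$JPS_OUT" | grep -q "$daemon"; then
        echo "$daemon: up"
    else
        echo "$daemon: DOWN"
    fi
done
```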
netstat results:

hadoopuser@hadoop1:~/hadoop/sbin$ netstat -lpten | grep java


(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:59381 0.0.0.0:* LISTEN 1001 42625 5802/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 36990 5143/java
tcp 0 0 127.0.0.1:8088 0.0.0.0:* LISTEN 1001 38694 5656/java
tcp 0 0 0.0.0.0:13562 0.0.0.0:* LISTEN 1001 43419 5802/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 37967 5312/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1001 37974 5312/java
tcp 0 0 127.0.0.1:8030 0.0.0.0:* LISTEN 1001 38491 5656/java
tcp 0 0 127.0.0.1:8031 0.0.0.0:* LISTEN 1001 38485 5656/java
tcp 0 0 127.0.0.1:8032 0.0.0.0:* LISTEN 1001 38495 5656/java
tcp 0 0 127.0.0.1:8033 0.0.0.0:* LISTEN 1001 38696 5656/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1001 37387 5312/java
tcp 0 0 0.0.0.0:8040 0.0.0.0:* LISTEN 1001 42632 5802/java
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 1001 35588 5143/java
tcp 0 0 0.0.0.0:8042 0.0.0.0:* LISTEN 1001 43420 5802/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1001 38308 5507/java

root@hadoop2:/home/hadoopuser# netstat -lpten | grep java


tcp 0 0 0.0.0.0:13562 0.0.0.0:* LISTEN 1001 14691 1324/java
tcp 0 0 0.0.0.0:8040 0.0.0.0:* LISTEN 1001 16033 1324/java
tcp 0 0 0.0.0.0:8042 0.0.0.0:* LISTEN 1001 14692 1324/java

tcp 0 0 0.0.0.0:56010 0.0.0.0:* LISTEN 1001 14684 1324/java

hadoopuser@hadoop3:~$ netstat -lpten | grep java


(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.1.1:8088 0.0.0.0:* LISTEN 1001 15909 1471/java
tcp 0 0 0.0.0.0:13562 0.0.0.0:* LISTEN 1001 14595 1315/java
tcp 0 0 0.0.0.0:59387 0.0.0.0:* LISTEN 1001 15845 1315/java
tcp 0 0 127.0.1.1:8030 0.0.0.0:* LISTEN 1001 15901 1471/java
tcp 0 0 127.0.1.1:8031 0.0.0.0:* LISTEN 1001 14863 1471/java
tcp 0 0 127.0.1.1:8032 0.0.0.0:* LISTEN 1001 15905 1471/java
tcp 0 0 127.0.1.1:8033 0.0.0.0:* LISTEN 1001 14872 1471/java
tcp 0 0 0.0.0.0:8040 0.0.0.0:* LISTEN 1001 15850 1315/java
tcp 0 0 0.0.0.0:8042 0.0.0.0:* LISTEN 1001 14596 1315/java
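For reference, the ports seen in the netstat output map to daemons as follows (Hadoop 2.x defaults plus the yarn-site.xml values set earlier in this guide):

```shell
# Port-to-daemon reference for the listening sockets above.
PORTS='9000        HDFS namenode RPC (fs.defaultFS)
50070       NameNode web UI
50075       DataNode web UI
50010/50020 DataNode data transfer / IPC
50090       SecondaryNameNode web UI
8088        ResourceManager web UI
8030/8031/8032/8033 ResourceManager scheduler/tracker/client/admin RPC
8040/8042   NodeManager localizer / web UI
13562       MapReduce shuffle service'
echo "$PORTS"
```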

Make sure the nodes are visible by opening the URL http://hadoop1:8088/cluster/nodes (the master's ResourceManager web UI).

Try submitting a job:

hadoopuser@hadoop1:~/hadoop/sbin$ hadoop jar /home/hadoopuser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar pi 30 100
Number of Maps = 30
Samples per Map = 100
Number of Maps = 30
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8

Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Wrote input for Map #16
Wrote input for Map #17
Wrote input for Map #18
Wrote input for Map #19
Wrote input for Map #20
Wrote input for Map #21
Wrote input for Map #22
Wrote input for Map #23
Wrote input for Map #24
Wrote input for Map #25
Wrote input for Map #26
Wrote input for Map #27
Wrote input for Map #28
Wrote input for Map #29
Starting Job
16/08/03 14:09:59 INFO client.RMProxy: Connecting to ResourceManager at
hadoop1/127.0.0.1:8032
16/08/03 14:09:59 INFO input.FileInputFormat: Total input paths to process : 30
16/08/03 14:09:59 INFO mapreduce.JobSubmitter: number of splits:30
16/08/03 14:10:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470207995468_0001
16/08/03 14:10:00 INFO impl.YarnClientImpl: Submitted application application_1470207995468_0001
16/08/03 14:10:00 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1470207995468_0001/
16/08/03 14:10:00 INFO mapreduce.Job: Running job: job_1470207995468_0001
16/08/03 14:10:06 INFO mapreduce.Job: Job job_1470207995468_0001 running in uber mode : false
16/08/03 14:10:06 INFO mapreduce.Job: map 0% reduce 0%
16/08/03 14:10:27 INFO mapreduce.Job: map 17% reduce 0%
16/08/03 14:10:28 INFO mapreduce.Job: map 20% reduce 0%
16/08/03 14:10:45 INFO mapreduce.Job: map 37% reduce 0%
16/08/03 14:10:46 INFO mapreduce.Job: map 40% reduce 0%
16/08/03 14:10:59 INFO mapreduce.Job: map 43% reduce 0%
16/08/03 14:11:03 INFO mapreduce.Job: map 57% reduce 0%
16/08/03 14:11:04 INFO mapreduce.Job: map 57% reduce 17%
16/08/03 14:11:08 INFO mapreduce.Job: map 57% reduce 19%
16/08/03 14:11:10 INFO mapreduce.Job: map 60% reduce 19%
16/08/03 14:11:14 INFO mapreduce.Job: map 67% reduce 20%
16/08/03 14:11:15 INFO mapreduce.Job: map 70% reduce 20%
16/08/03 14:11:16 INFO mapreduce.Job: map 73% reduce 20%
16/08/03 14:11:17 INFO mapreduce.Job: map 73% reduce 24%
16/08/03 14:11:22 INFO mapreduce.Job: map 77% reduce 24%
16/08/03 14:11:23 INFO mapreduce.Job: map 77% reduce 26%
16/08/03 14:11:24 INFO mapreduce.Job: map 80% reduce 26%
16/08/03 14:11:25 INFO mapreduce.Job: map 87% reduce 26%

16/08/03 14:11:26 INFO mapreduce.Job: map 90% reduce 29%
16/08/03 14:11:29 INFO mapreduce.Job: map 90% reduce 30%
16/08/03 14:11:30 INFO mapreduce.Job: map 93% reduce 30%
16/08/03 14:11:31 INFO mapreduce.Job: map 97% reduce 30%
16/08/03 14:11:32 INFO mapreduce.Job: map 100% reduce 67%
16/08/03 14:11:33 INFO mapreduce.Job: map 100% reduce 100%
16/08/03 14:11:33 INFO mapreduce.Job: Job job_1470207995468_0001 completed successfully
16/08/03 14:11:33 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=666
FILE: Number of bytes written=3300274
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=8060
HDFS: Number of bytes written=215
HDFS: Number of read operations=123
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=30
Launched reduce tasks=1
Data-local map tasks=30
Total time spent by all maps in occupied slots (ms)=348614
Total time spent by all reduces in occupied slots (ms)=46397
Total time spent by all map tasks (ms)=348614
Total time spent by all reduce tasks (ms)=46397
Total vcore-milliseconds taken by all map tasks=348614
Total vcore-milliseconds taken by all reduce tasks=46397
Total megabyte-milliseconds taken by all map tasks=356980736
Total megabyte-milliseconds taken by all reduce tasks=47510528
Map-Reduce Framework
Map input records=30
Map output records=60
Map output bytes=540
Map output materialized bytes=840
Input split bytes=4520
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=840
Reduce input records=60
Reduce output records=0
Spilled Records=120
Shuffled Maps =30
Failed Shuffles=0
Merged Map outputs=30
GC time elapsed (ms)=2580
CPU time spent (ms)=10630
Physical memory (bytes) snapshot=8148742144
Virtual memory (bytes) snapshot=21373816832
Total committed heap usage (bytes)=5925502976
Shuffle Errors

BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=3540
File Output Format Counters
Bytes Written=97
Job Finished in 94.135 seconds
Estimated value of Pi is 3.14133333333333333333
Access the job management UI: hadoopuser@hadoop1:~/hadoop/sbin$ lynx http://hadoop1:8088/cluster/apps

