
Manual installation

Tuesday, August 02, 2016 1:26 PM

1. Add the IP address and hostname entries below to /etc/hosts on every Hadoop server
167.205.64.109 hadoop1.unb.ac.id hadoop1
167.205.64.100 hadoop2.unb.ac.id hadoop2
167.205.64.79 hadoop3.unb.ac.id hadoop3
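The entries above can be staged and sanity-checked with a short script. This sketch writes to a scratch file rather than /etc/hosts so it can be tried safely; on a real node the target path would be /etc/hosts.

```shell
# Stage the cluster's host entries and verify each short name appears once.
HOSTS_FILE=$(mktemp)   # on a node: /etc/hosts
cat >> "$HOSTS_FILE" <<'EOF'
167.205.64.109 hadoop1.unb.ac.id hadoop1
167.205.64.100 hadoop2.unb.ac.id hadoop2
167.205.64.79 hadoop3.unb.ac.id hadoop3
EOF
# Every short hostname should resolve to exactly one line.
for h in hadoop1 hadoop2 hadoop3; do
    grep -c " $h\$" "$HOSTS_FILE"
done
```

Each loop iteration should print 1; any other count means a missing or duplicated entry.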

2. Install Java on every Hadoop server (don't forget to set the proxy first)


root@hadoop1:/home/hadoop# apt-add-repository ppa:webupd8team/java
Oracle Java (JDK) Installer (automatically downloads and installs Oracle JDK7 / JDK8 / JDK9).
There are no actual Java files in this PPA.

More info (and Ubuntu installation instructions):


- for Oracle Java 7: http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html
- for Oracle Java 8: http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html

Debian installation instructions:


- Oracle Java 7: http://www.webupd8.org/2012/06/how-to-install-oracle-java-7-in-debian.html
- Oracle Java 8: http://www.webupd8.org/2014/03/how-to-install-oracle-java-8-in-debian.html

Oracle Java 9 (for both Ubuntu and Debian): http://www.webupd8.org/2015/02/install-oracle-java-9-in-ubuntu-linux.html

For JDK9, the PPA uses standard builds from: https://jdk9.java.net/download/ (and not the
Jigsaw builds!).

Important!!! For now, you should continue to use Java 8 because Oracle Java 9 is available as an
early access release (it should be released in 2016)! You should only use Oracle Java 9 if you
explicitly need it, because it may contain bugs and it might not include the latest security
patches! Also, some Java options were removed in JDK9, so you may encounter issues with
various Java apps. More information and installation instructions (Ubuntu / Linux Mint / Debian):
http://www.webupd8.org/2015/02/install-oracle-java-9-in-ubuntu-linux.html
More info: https://launchpad.net/~webupd8team/+archive/ubuntu/java
Press [ENTER] to continue or ctrl-c to cancel adding it

gpg: keyring `/tmp/tmp508q25fn/secring.gpg' created


gpg: keyring `/tmp/tmp508q25fn/pubring.gpg' created
gpg: requesting key EEA14886 from hkp server keyserver.ubuntu.com
gpg: /tmp/tmp508q25fn/trustdb.gpg: trustdb created
gpg: key EEA14886: public key "Launchpad VLC" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
OK

root@hadoop1:/home/hadoop# apt-get update

Big data Technical implementation Page 1


root@hadoop1:/home/hadoop# apt-get install oracle-java7-installer

root@hadoop1:/home/hadoop# update-java-alternatives -s java-7-oracle

3. Disable IPv6
Add the following lines to /etc/sysctl.conf:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Then verify with:


hadoopuser@hadoop1:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
The result must be 1.
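The three sysctl lines can be staged and counted before being applied; a minimal sketch using a scratch file (on a real node, append to /etc/sysctl.conf and apply with `sysctl -p` as root):

```shell
# Stage the sysctl entries and confirm none went missing.
CONF=$(mktemp)   # on a node: /etc/sysctl.conf
cat >> "$CONF" <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF
grep -c 'disable_ipv6 = 1' "$CONF"   # expect 3
```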

4. Set up the user and group for Hadoop


root@hadoop3:/home/hadoop# addgroup hadoopgroup
root@hadoop3:/home/hadoop# adduser --ingroup hadoopgroup hadoopuser
Log in as the Hadoop user on the hadoop1 server: su hadoopuser
Create an RSA key for SSH, on the hadoop1 server: ssh-keygen -t rsa -P ""
Authorize the RSA key for passwordless SSH access, on the hadoop1 server: cat
/home/hadoopuser/.ssh/id_rsa.pub >> /home/hadoopuser/.ssh/authorized_keys
Fix the permissions on the key file, on the hadoop1 server: chmod 600
/home/hadoopuser/.ssh/authorized_keys
Copy authorized_keys to hadoop2 and hadoop3; we can now SSH without a password from the
hadoop1 server to the hadoop2 and hadoop3 servers:
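The key setup above can be sketched against a scratch directory so it can be tried without touching a real ~/.ssh. The public-key line here is a placeholder; on hadoop1 it comes from the ssh-keygen command above.

```shell
# Sketch of the authorized_keys setup, using a scratch directory.
SSH_DIR=$(mktemp -d)   # on a node: /home/hadoopuser/.ssh
echo "ssh-rsa AAAAB3...placeholder hadoopuser@hadoop1" > "$SSH_DIR/id_rsa.pub"
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
# On the real cluster, distributing the key is one command per slave
# (ssh-copy-id appends to the remote authorized_keys and fixes permissions):
#   ssh-copy-id hadoopuser@hadoop2
#   ssh-copy-id hadoopuser@hadoop3
stat -c '%a' "$SSH_DIR/authorized_keys"   # expect 600
```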

hadoopuser@hadoop1:~/.ssh$ ssh hadoop2


Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64)

* Documentation: https://help.ubuntu.com/

System information as of Tue Aug 2 13:49:28 WIB 2016

System load: 0.0 Processes: 92


Usage of /: 4.7% of 36.29GB Users logged in: 2
Memory usage: 4% IP address for eth0: 167.205.64.100
Swap usage: 0%

Graph this data and manage this system at:


https://landscape.canonical.com/

New release '16.04.1 LTS' available.


Run 'do-release-upgrade' to upgrade to it.

Last login: Tue Aug 2 13:49:29 2016 from hadoop1.itb.ac.id



5. Install the Hadoop binaries on all three servers
cd /home/hadoopuser/
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.6.3/hadoop-2.6.3.tar.gz
tar xzvf hadoop-2.6.3.tar.gz
ln -s hadoop-2.6.3 hadoop (so the HADOOP_HOME path below resolves)
Add the following lines to the .bashrc file on each server:

# Set HADOOP_HOME
HADOOP_HOME=/home/hadoopuser/hadoop
export HADOOP_HOME
# Set JAVA_HOME
JAVA_HOME=/usr/lib/jvm/java-7-oracle
export JAVA_HOME
# Add Hadoop bin and sbin directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
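Note that the PATH entries must be joined with colons; a semicolon there would end the assignment and try to run the sbin path as a command instead. A quick check:

```shell
# Verify the sbin directory actually landed on PATH after the export.
HADOOP_HOME=/home/hadoopuser/hadoop
PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
case ":$PATH:" in
    *":$HADOOP_HOME/sbin:"*) echo "sbin on PATH" ;;
    *) echo "sbin missing" ;;
esac
```

This prints "sbin on PATH" when the export is correct.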

6. Edit the configuration
Configure core-site.xml on every node, pointing it at the master server:

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoopuser/tmp</value>
<description>Temporary Directory.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:9000</value>
<description>Use HDFS as file storage engine</description>
</property>
</configuration>
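Since the same core-site.xml goes on every node, it can be stamped out from one template so the master address stays consistent. A hedged sketch; MASTER and the scratch output path are assumptions, and on a node the target would be $HADOOP_HOME/etc/hadoop/core-site.xml:

```shell
# Generate core-site.xml from a template parameterized by the master host.
MASTER=hadoop1
OUT=$(mktemp)   # on a node: $HADOOP_HOME/etc/hadoop/core-site.xml
cat > "$OUT" <<EOF
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoopuser/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$MASTER:9000</value>
  </property>
</configuration>
EOF
grep fs.defaultFS "$OUT"
```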

Configure mapred-site.xml on the master node only:


<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>hadoop1:9000</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The framework for running mapreduce jobs</description>
</property>
</configuration>
Configure hdfs-site.xml on the master and the slaves:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop-data/hadoopuser/hdfs/namenode</value>
<description>Determines where on the local filesystem the DFS name node should store the
name table (fsimage). If this is a comma-delimited list of directories then the name table is
replicated in all of the directories, for redundancy.
</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop-data/hadoopuser/hdfs/datanode</value>
<description>Determines where on the local filesystem a DFS data node should store its
blocks. If this is a comma-delimited list of directories, then data will be stored in all named
directories, typically on different devices. Directories that do not exist are ignored.
</description>
</property>
</configuration>
Add the following configuration to yarn-site.xml, on the master and the slaves:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop1:8033</value>
</property>
</configuration>
Fill the hadoop/etc/hadoop/slaves file on the master with the following lines:
hadoop1
hadoop2
hadoop3



Add the following line to the hadoop-env.sh configuration file: export JAVA_HOME=/usr/lib/jvm/java-7-oracle
Format the namenode on the master only: hdfs namenode -format. This creates the
/hadoop-data/ folder (which must be writable by hadoopuser).
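Before formatting, the HDFS directories from hdfs-site.xml must exist and be owned by hadoopuser. A sketch using a scratch base directory; on a real node the base is /hadoop-data and the chown needs root:

```shell
# Pre-create the namenode and datanode directories from hdfs-site.xml.
BASE=$(mktemp -d)/hadoop-data   # on a node: /hadoop-data
mkdir -p "$BASE/hadoopuser/hdfs/namenode" "$BASE/hadoopuser/hdfs/datanode"
# chown -R hadoopuser:hadoopgroup /hadoop-data   # run as root on a real node
ls "$BASE/hadoopuser/hdfs"
```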

Start DFS on the master: hadoop/sbin/start-dfs.sh. On success, the output looks like this:

hadoopuser@hadoop1:~/hadoop/sbin$ ./start-dfs.sh
Starting namenodes on [hadoop1]
hadoop1: starting namenode, logging to /home/hadoopuser/hadoop/logs/hadoop-hadoopuser-namenode-hadoop1.out
hadoop2: starting datanode, logging to /home/hadoopuser/hadoop/logs/hadoop-hadoopuser-datanode-hadoop2.out
hadoop3: starting datanode, logging to /home/hadoopuser/hadoop/logs/hadoop-hadoopuser-datanode-hadoop3.out
hadoop1: starting datanode, logging to /home/hadoopuser/hadoop/logs/hadoop-hadoopuser-datanode-hadoop1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoopuser/hadoop/logs/hadoop-hadoopuser-secondarynamenode-hadoop1.out

Start YARN on the master node: /home/hadoopuser/hadoop/sbin/start-yarn.sh. The output looks like this:
hadoopuser@hadoop3:~$ /home/hadoopuser/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoopuser/hadoop/logs/yarn-hadoopuser-resourcemanager-hadoop3.out
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is 6d:a0:30:ae:d4:e8:03:c5:5c:d6:c1:fb:53:a4:a0:ef.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
hadoopuser@localhost's password:

Looking at the running processes, you will see something like this:

hadoopuser@hadoop3:~$ ps ax | grep node


6745 ? Sl 0:04 /usr/lib/jvm/java-7-oracle/bin/java -Dproc_nodemanager -Xmx1000m -
Dhadoop.log.dir=/home/hadoopuser/hadoop/logs -
Dyarn.log.dir=/home/hadoopuser/hadoop/logs -Dhadoop.log.file=yarn-hadoopuser-
nodemanager-hadoop3.log -Dyarn.log.file=yarn-hadoopuser-nodemanager-hadoop3.log -
Dyarn.home.dir= -Dyarn.id.str=hadoopuser -Dhadoop.root.logger=INFO,RFA -
Dyarn.root.logger=INFO,RFA -Djava.library.path=/home/hadoopuser/hadoop/lib/native -
Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/home/hadoopuser/hadoop/logs -
Dyarn.log.dir=/home/hadoopuser/hadoop/logs -Dhadoop.log.file=yarn-hadoopuser-
nodemanager-hadoop3.log -Dyarn.log.file=yarn-hadoopuser-nodemanager-hadoop3.log -
Dyarn.home.dir=/home/hadoopuser/hadoop -Dhadoop.home.dir=/home/hadoopuser/hadoop -
Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -
Djava.library.path=/home/hadoopuser/hadoop/lib/native -classpath
/home/hadoopuser/hadoop/etc/hadoop:/home/hadoopuser/hadoop/etc/hadoop:/home/hadoo
puser/hadoop/etc/hadoop:/home/hadoopuser/hadoop/share/hadoop/common/lib/
*:/home/hadoopuser/hadoop/share/hadoop/common/
*:/home/hadoopuser/hadoop/share/hadoop/hdfs:/home/hadoopuser/hadoop/share/hadoop/h

dfs/lib/*:/home/hadoopuser/hadoop/share/hadoop/hdfs/
*:/home/hadoopuser/hadoop/share/hadoop/yarn/lib/
*:/home/hadoopuser/hadoop/share/hadoop/yarn/
*:/home/hadoopuser/hadoop/share/hadoop/mapreduce/lib/
*:/home/hadoopuser/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/
*.jar:/contrib/capacity-scheduler/*.jar:/home/hadoopuser/hadoop/share/hadoop/yarn/
*:/home/hadoopuser/hadoop/share/hadoop/yarn/lib/
*:/home/hadoopuser/hadoop/etc/hadoop/nm-config/log4j.properties
org.apache.hadoop.yarn.server.nodemanager.NodeManager

Type jps on every node to make sure the daemons are running; the results will look like this:
hadoopuser@hadoop2:~/hadoop/sbin$ jps
7433 NodeManager
7573 Jps

hadoopuser@hadoop3:~$ jps
2175 ResourceManager
7347 NodeManager
7479 Jps

hadoopuser@hadoop1:~/hadoop/sbin$ jps
13417 Jps
12800 SecondaryNameNode
13100 NodeManager
12963 ResourceManager
hadoopuser@hadoop1:~/hadoop/sbin$
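The jps listings above can also be checked mechanically. A sketch; the sample output is inlined here (taken from the hadoop1 listing) so it can be tried anywhere, and on a real node you would use JPS_OUT=$(jps) and adjust the expected daemon list per node:

```shell
# Check that the expected daemons appear in the jps output.
JPS_OUT="13417 Jps
12800 SecondaryNameNode
13100 NodeManager
12963 ResourceManager"
for daemon in SecondaryNameNode NodeManager ResourceManager; do
    if echo "$JPS_OUT" | grep -q "$daemon"; then
        echo "$daemon: up"
    else
        echo "$daemon: DOWN"
    fi
done
```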
netstat results:

hadoopuser@hadoop1:~/hadoop/sbin$ netstat -lpten | grep java


(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:59381 0.0.0.0:* LISTEN 1001 42625 5802/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 36990 5143/java
tcp 0 0 127.0.0.1:8088 0.0.0.0:* LISTEN 1001 38694 5656/java
tcp 0 0 0.0.0.0:13562 0.0.0.0:* LISTEN 1001 43419 5802/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 37967 5312/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1001 37974 5312/java
tcp 0 0 127.0.0.1:8030 0.0.0.0:* LISTEN 1001 38491 5656/java
tcp 0 0 127.0.0.1:8031 0.0.0.0:* LISTEN 1001 38485 5656/java
tcp 0 0 127.0.0.1:8032 0.0.0.0:* LISTEN 1001 38495 5656/java
tcp 0 0 127.0.0.1:8033 0.0.0.0:* LISTEN 1001 38696 5656/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1001 37387 5312/java
tcp 0 0 0.0.0.0:8040 0.0.0.0:* LISTEN 1001 42632 5802/java
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 1001 35588 5143/java
tcp 0 0 0.0.0.0:8042 0.0.0.0:* LISTEN 1001 43420 5802/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1001 38308 5507/java

root@hadoop2:/home/hadoopuser# netstat -lpten | grep java


tcp 0 0 0.0.0.0:13562 0.0.0.0:* LISTEN 1001 14691 1324/java
tcp 0 0 0.0.0.0:8040 0.0.0.0:* LISTEN 1001 16033 1324/java
tcp 0 0 0.0.0.0:8042 0.0.0.0:* LISTEN 1001 14692 1324/java

tcp 0 0 0.0.0.0:56010 0.0.0.0:* LISTEN 1001 14684 1324/java

hadoopuser@hadoop3:~$ netstat -lpten | grep java


(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.1.1:8088 0.0.0.0:* LISTEN 1001 15909 1471/java
tcp 0 0 0.0.0.0:13562 0.0.0.0:* LISTEN 1001 14595 1315/java
tcp 0 0 0.0.0.0:59387 0.0.0.0:* LISTEN 1001 15845 1315/java
tcp 0 0 127.0.1.1:8030 0.0.0.0:* LISTEN 1001 15901 1471/java
tcp 0 0 127.0.1.1:8031 0.0.0.0:* LISTEN 1001 14863 1471/java
tcp 0 0 127.0.1.1:8032 0.0.0.0:* LISTEN 1001 15905 1471/java
tcp 0 0 127.0.1.1:8033 0.0.0.0:* LISTEN 1001 14872 1471/java
tcp 0 0 0.0.0.0:8040 0.0.0.0:* LISTEN 1001 15850 1315/java
tcp 0 0 0.0.0.0:8042 0.0.0.0:* LISTEN 1001 14596 1315/java
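For reference, the ports seen in the netstat output map to daemons as follows (Hadoop 2.x defaults plus the yarn-site.xml values set earlier in this guide):

```shell
# Port-to-daemon reference for the listening sockets above.
PORTS='9000        HDFS namenode RPC (fs.defaultFS)
50070       NameNode web UI
50075       DataNode web UI
50010/50020 DataNode data transfer / IPC
50090       SecondaryNameNode web UI
8088        ResourceManager web UI
8030/8031/8032/8033 ResourceManager scheduler/tracker/client/admin RPC
8040/8042   NodeManager localizer / web UI
13562       MapReduce shuffle service'
echo "$PORTS"
```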

Make sure the nodes are visible by opening the URL http://hadoop1:8088/cluster/nodes (the master's ResourceManager web UI).

Try submitting a job:

hadoopuser@hadoop1:~/hadoop/sbin$ hadoop jar /home/hadoopuser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar pi 30 100
Number of Maps = 30
Samples per Map = 100
Number of Maps = 30
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8

Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Wrote input for Map #16
Wrote input for Map #17
Wrote input for Map #18
Wrote input for Map #19
Wrote input for Map #20
Wrote input for Map #21
Wrote input for Map #22
Wrote input for Map #23
Wrote input for Map #24
Wrote input for Map #25
Wrote input for Map #26
Wrote input for Map #27
Wrote input for Map #28
Wrote input for Map #29
Starting Job
16/08/03 14:09:59 INFO client.RMProxy: Connecting to ResourceManager at
hadoop1/127.0.0.1:8032
16/08/03 14:09:59 INFO input.FileInputFormat: Total input paths to process : 30
16/08/03 14:09:59 INFO mapreduce.JobSubmitter: number of splits:30
16/08/03 14:10:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470207995468_0001
16/08/03 14:10:00 INFO impl.YarnClientImpl: Submitted application application_1470207995468_0001
16/08/03 14:10:00 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1470207995468_0001/
16/08/03 14:10:00 INFO mapreduce.Job: Running job: job_1470207995468_0001
16/08/03 14:10:06 INFO mapreduce.Job: Job job_1470207995468_0001 running in uber mode : false
16/08/03 14:10:06 INFO mapreduce.Job: map 0% reduce 0%
16/08/03 14:10:27 INFO mapreduce.Job: map 17% reduce 0%
16/08/03 14:10:28 INFO mapreduce.Job: map 20% reduce 0%
16/08/03 14:10:45 INFO mapreduce.Job: map 37% reduce 0%
16/08/03 14:10:46 INFO mapreduce.Job: map 40% reduce 0%
16/08/03 14:10:59 INFO mapreduce.Job: map 43% reduce 0%
16/08/03 14:11:03 INFO mapreduce.Job: map 57% reduce 0%
16/08/03 14:11:04 INFO mapreduce.Job: map 57% reduce 17%
16/08/03 14:11:08 INFO mapreduce.Job: map 57% reduce 19%
16/08/03 14:11:10 INFO mapreduce.Job: map 60% reduce 19%
16/08/03 14:11:14 INFO mapreduce.Job: map 67% reduce 20%
16/08/03 14:11:15 INFO mapreduce.Job: map 70% reduce 20%
16/08/03 14:11:16 INFO mapreduce.Job: map 73% reduce 20%
16/08/03 14:11:17 INFO mapreduce.Job: map 73% reduce 24%
16/08/03 14:11:22 INFO mapreduce.Job: map 77% reduce 24%
16/08/03 14:11:23 INFO mapreduce.Job: map 77% reduce 26%
16/08/03 14:11:24 INFO mapreduce.Job: map 80% reduce 26%
16/08/03 14:11:25 INFO mapreduce.Job: map 87% reduce 26%

16/08/03 14:11:26 INFO mapreduce.Job: map 90% reduce 29%
16/08/03 14:11:29 INFO mapreduce.Job: map 90% reduce 30%
16/08/03 14:11:30 INFO mapreduce.Job: map 93% reduce 30%
16/08/03 14:11:31 INFO mapreduce.Job: map 97% reduce 30%
16/08/03 14:11:32 INFO mapreduce.Job: map 100% reduce 67%
16/08/03 14:11:33 INFO mapreduce.Job: map 100% reduce 100%
16/08/03 14:11:33 INFO mapreduce.Job: Job job_1470207995468_0001 completed successfully
16/08/03 14:11:33 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=666
FILE: Number of bytes written=3300274
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=8060
HDFS: Number of bytes written=215
HDFS: Number of read operations=123
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=30
Launched reduce tasks=1
Data-local map tasks=30
Total time spent by all maps in occupied slots (ms)=348614
Total time spent by all reduces in occupied slots (ms)=46397
Total time spent by all map tasks (ms)=348614
Total time spent by all reduce tasks (ms)=46397
Total vcore-milliseconds taken by all map tasks=348614
Total vcore-milliseconds taken by all reduce tasks=46397
Total megabyte-milliseconds taken by all map tasks=356980736
Total megabyte-milliseconds taken by all reduce tasks=47510528
Map-Reduce Framework
Map input records=30
Map output records=60
Map output bytes=540
Map output materialized bytes=840
Input split bytes=4520
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=840
Reduce input records=60
Reduce output records=0
Spilled Records=120
Shuffled Maps =30
Failed Shuffles=0
Merged Map outputs=30
GC time elapsed (ms)=2580
CPU time spent (ms)=10630
Physical memory (bytes) snapshot=8148742144
Virtual memory (bytes) snapshot=21373816832
Total committed heap usage (bytes)=5925502976
Shuffle Errors

BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=3540
File Output Format Counters
Bytes Written=97
Job Finished in 94.135 seconds
Estimated value of Pi is 3.14133333333333333333
Access the job management UI: hadoopuser@hadoop1:~/hadoop/sbin$ lynx http://hadoop1:8088/cluster/apps

