Running Hadoop On Ubuntu Linux


Agenda
  Introduction
  Single-Node Cluster
    http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
  Multi-Node Cluster
    http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
  Decommission
  Issues
  ITRI Cloud Storage System Architecture
Introduction
HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications with large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data.
Introduction (cont.)
[Figure: HDFS Architecture (source: http://hadoop.apache.org/core/docs/current/hdfs_design.html)]
Introduction (cont.)
[Figure: HDFS multi-node overview (source: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster))]
Introduction (cont.)
[Figure: HDFS multi-node cluster architecture (source: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster))]
Single-Node Cluster
Prerequisites:
  Sun Java 6
  Add a dedicated hadoop system user
  Configure SSH public-key authentication (a single-node cluster needs SSH access to localhost)
  Disable IPv6
Hadoop installation
Configuration:
  <HADOOP_INSTALL>/conf/hadoop-env.sh
  <HADOOP_INSTALL>/conf/core-site.xml
  <HADOOP_INSTALL>/conf/mapred-site.xml
  <HADOOP_INSTALL>/conf/hdfs-site.xml
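The XML configuration files listed above can be filled in roughly as follows for a single-node setup. This is a sketch using the Hadoop 0.20-era property names from the linked tutorial; the ports 54310/54311 are the tutorial's choices, not fixed defaults, and `hadoop-env.sh` additionally needs `JAVA_HOME` pointed at the Java 6 install.

```xml
<!-- conf/core-site.xml: where the HDFS name node listens -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: where the MapReduce job tracker listens -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml: one copy of each block, since there is one data node -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```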
Single-Node Cluster (cont.)
Format the name node
  <HADOOP_INSTALL>/bin/hadoop namenode -format
Start/stop your single-node cluster
  <HADOOP_INSTALL>/bin/start-all.sh
  <HADOOP_INSTALL>/bin/stop-all.sh
Check that the Hadoop processes are running
  jps
Copy local example data to HDFS
  <HADOOP_INSTALL>/bin/hadoop dfs -copyFromLocal /tmp/gutenberg gutenberg
  <HADOOP_INSTALL>/bin/hadoop dfs -ls
  <HADOOP_INSTALL>/bin/hadoop dfs -ls gutenberg
Run the MapReduce job
  <HADOOP_INSTALL>/bin/hadoop jar hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-output
Single-Node Cluster (cont.)
  http://localhost:50030/ - web UI for the MapReduce job tracker
  http://localhost:50060/ - web UI for the task tracker(s)
  http://localhost:50070/ - web UI for the HDFS name node
Multi-Node Cluster
/etc/hosts (all nodes must be able to resolve each other's hostnames)
SSH access (the master must reach every slave via password-less SSH)
Configuration:
  <HADOOP_INSTALL>/conf/masters
    master
  <HADOOP_INSTALL>/conf/slaves
    master
    slave
    anotherslave01
    anotherslave02
    anotherslave03
  <HADOOP_INSTALL>/conf/core-site.xml
    <value>hdfs://master:54310</value>
  <HADOOP_INSTALL>/conf/mapred-site.xml
  <HADOOP_INSTALL>/conf/hdfs-site.xml
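For the multi-node case, the same three XML files point at the master instead of localhost, on every node. A sketch with the 0.20-era property names; the replication factor of 3 is an example that assumes at least three data nodes:

```xml
<!-- conf/core-site.xml (all nodes): name node runs on the master -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
</property>

<!-- conf/mapred-site.xml (all nodes): job tracker runs on the master -->
<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
</property>

<!-- conf/hdfs-site.xml (all nodes): example replication factor -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```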
Decommission
Make a large cluster smaller by taking out a bunch of nodes simultaneously. How can this be done?
Create an excludes file listing the nodes to remove:
  slave97
  slave98
  slave99
Add this configuration to <HADOOP_INSTALL>/conf/hadoop-site.xml:
  <property>
    <name>dfs.hosts.exclude</name>
    <value>excludes</value>
  </property>
Then tell the name node to re-read its node lists:
  <HADOOP_INSTALL>/bin/hadoop dfsadmin -refreshNodes
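Note that on 0.20-era Hadoop this property normally lives in conf/hdfs-site.xml (hadoop-site.xml is the older single-file layout), and the value is read as a file path on the name node's local filesystem, so an absolute path is safer than a bare filename. A sketch; the /usr/local/hadoop install prefix is an assumption:

```xml
<!-- conf/hdfs-site.xml; the install prefix is an assumed example -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/local/hadoop/conf/excludes</value>
</property>
```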
Issues
NameNode backup
NameNode shutdown
DataNode shutdown
Add a DataNode dynamically
Remove a DataNode dynamically (decommission?)
How to tune the file/block size?
Big-data testing
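For the block-size question above, the 0.20-era knob is dfs.block.size in conf/hdfs-site.xml. A sketch; 134217728 (128 MB) is only an example value (the shipped default is 64 MB), and it affects newly written files only:

```xml
<!-- conf/hdfs-site.xml: example block size, applies to new files only -->
<property>
  <name>dfs.block.size</name>
  <value>134217728</value> <!-- 128 MB, example value -->
</property>
```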
Cloud Storage System Architecture
[Figure: block diagram of the components - HDFS Client, HDFS NameNode, HDFS DataNodes, iSCSI Target, iSCSI Initiator, VM Volume, and DMS]
Read Flow
[Figure: numbered read path through DMS, HDFS Client, HDFS NameNode, HDFS DataNode, iSCSI Target, iSCSI Initiator, and VM Volume]
Write Flow
[Figure: numbered write path from the VM (Domain-U) through iSCSI Initiator, iSCSI Target, and HDFS Client to the HDFS NameNode and two HDFS DataNodes, with DMS]
