This document provides an overview of setting up Hadoop on Ubuntu Linux for both single-node and multi-node clusters. It discusses prerequisites, installation, configuration, starting and stopping the clusters, and basic usage of HDFS and MapReduce. It also briefly covers decommissioning nodes, backup processes, and tuning file sizes. Finally, it diagrams the architecture of an ITRI cloud storage system using Hadoop and iSCSI.
Agenda
- Introduction
- Single-Node Cluster
  http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
- Multi-Node Cluster
  http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
- Decommission
- Issues
- ITRI Cloud Storage System Architecture

Introduction
HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data.

Introduction (cont)
[Figure: HDFS Architecture (source: http://hadoop.apache.org/core/docs/current/hdfs_design.html)]

Introduction (cont)
[Figure: HDFS multi-node overview (source: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster))]

Introduction (cont)
[Figure: HDFS multi-node cluster architecture (source: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster))]

Single-Node Cluster
- Prerequisites
  - Sun Java 6
  - Add a hadoop system user
  - Configure SSH public key authentication (a single-node cluster needs SSH access to localhost)
  - Disable IPv6
- Hadoop installation
- Configuration
  - <HADOOP_INSTALL>/conf/hadoop-env.sh
  - <HADOOP_INSTALL>/conf/core-site.xml
  - <HADOOP_INSTALL>/conf/mapred-site.xml
  - <HADOOP_INSTALL>/conf/hdfs-site.xml
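As a rough illustration of what the three site files might contain on a single-node setup, the sketch below writes minimal versions into a scratch directory. The property names (fs.default.name, mapred.job.tracker, dfs.replication) are the Hadoop 0.20 names. The port numbers are assumed values that do not appear in the slides, and CONF_DIR and write_site_file are hypothetical names, so adjust all of them to your installation before copying anything into <HADOOP_INSTALL>/conf.

```shell
#!/bin/sh
# Sketch: write minimal single-node site files into a scratch directory.
# CONF_DIR and write_site_file are hypothetical names; the ports are assumptions.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# write_site_file FILE NAME VALUE: emit a Hadoop site file with one property.
write_site_file() {
  cat > "$CONF_DIR/$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# NameNode address, JobTracker address, and a replication factor of 1
# (a single node cannot hold more than one replica of a block).
write_site_file core-site.xml   fs.default.name    hdfs://localhost:54310
write_site_file mapred-site.xml mapred.job.tracker localhost:54311
write_site_file hdfs-site.xml   dfs.replication    1

echo "wrote site files to $CONF_DIR"
```

Writing to a scratch directory first makes it possible to inspect the generated files before they replace a working configuration.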
Single-Node Cluster (cont)
- Format the name node
    <HADOOP_INSTALL>/bin/hadoop namenode -format
- Start/stop the single-node cluster
    <HADOOP_INSTALL>/bin/start-all.sh
    <HADOOP_INSTALL>/bin/stop-all.sh
- Check that the Hadoop processes are running
    jps
- Copy local example data to HDFS
    <HADOOP_INSTALL>/bin/hadoop dfs -copyFromLocal /tmp/gutenberg gutenberg
    <HADOOP_INSTALL>/bin/hadoop dfs -ls
    <HADOOP_INSTALL>/bin/hadoop dfs -ls gutenberg
- Run the MapReduce job (from within <HADOOP_INSTALL>)
    bin/hadoop jar hadoop-0.20.2-examples.jar wordcount gutenberg gutenberg-output
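To get a feel for what the wordcount example job produces, the following local pipeline computes the same kind of result (one line per whitespace-separated token, with its count) on ordinary files, without Hadoop. It is only an analogy for sanity-checking small inputs: local_wordcount is a hypothetical helper name, and the tab-separated, byte-sorted output format is an assumption about how the example job writes its results.

```shell
#!/bin/sh
# local_wordcount FILE...: count whitespace-separated tokens in local files.
# Hypothetical helper for checking small inputs; it does not use Hadoop.
#
# Pipeline stages, in order:
#   tr      puts one token per line (runs of whitespace collapse to one newline)
#   grep    drops the empty line that leading whitespace can leave behind
#   sort    groups identical tokens together, in byte order
#   uniq    prefixes each distinct token with its number of occurrences
#   awk     reorders the columns to "token<TAB>count"
local_wordcount() {
  cat "$@" |
    tr -s '[:space:]' '\n' |
    grep -v '^$' |
    LC_ALL=C sort |
    uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For example, `local_wordcount /tmp/gutenberg/* | head` prints the first few counts for the local copy of the example data, which can then be compared against the job output for the same tokens.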
Single-Node Cluster (cont)
- http://localhost:50030/ - web UI for the MapReduce job tracker(s)
- http://localhost:50060/ - web UI for the task tracker(s)
- http://localhost:50070/ - web UI for the HDFS name node(s)
Multi-Node Cluster

Decommission
- Goal: make a large cluster smaller by taking out a bunch of nodes simultaneously. How can this be done?
- Create a file named excludes that lists the nodes to remove, one per line:
    slave97
    slave98
    slave99
- Add the configuration in <HADOOP_INSTALL>/conf/hadoop-site.xml (in Hadoop 0.20, where that file is split into core-site.xml, hdfs-site.xml, and mapred-site.xml, the property belongs in hdfs-site.xml). The value should be the full path to the excludes file:
    <property>
      <name>dfs.hosts.exclude</name>
      <value>excludes</value>
    </property>
- Tell the NameNode to re-read the list:
    <HADOOP_INSTALL>/bin/hadoop dfsadmin -refreshNodes

Issues
- NameNode backup
- NameNode shutdown
- DataNode shutdown
- Add DataNode dynamically
- Remove DataNode dynamically (Decommission?)
- How to tune file/block size?
- Big data testing

Cloud Storage System Architecture
[Figure: HDFS Client, HDFS NameNode, and HDFS DataNodes]
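Returning to the decommission procedure, the steps can be sketched as a small script. The host names come from the slides; EXCLUDES_FILE, write_excludes, and the default value of HADOOP_INSTALL are hypothetical, and the script assumes that dfs.hosts.exclude in the NameNode configuration already points at the same file.

```shell
#!/bin/sh
# Sketch of the decommission steps. EXCLUDES_FILE, write_excludes, and the
# HADOOP_INSTALL default are hypothetical; host names are from the slides.
HADOOP_INSTALL="${HADOOP_INSTALL:-/usr/local/hadoop}"
EXCLUDES_FILE="${EXCLUDES_FILE:-$(mktemp)}"

# write_excludes HOST...: list the nodes to decommission, one per line.
write_excludes() {
  printf '%s\n' "$@" > "$EXCLUDES_FILE"
}

write_excludes slave97 slave98 slave99

# Only contact the NameNode when a Hadoop installation is actually present.
if [ -x "$HADOOP_INSTALL/bin/hadoop" ]; then
  "$HADOOP_INSTALL/bin/hadoop" dfsadmin -refreshNodes
else
  echo "hadoop not found under $HADOOP_INSTALL; wrote $EXCLUDES_FILE only"
fi
```

After the refresh, `hadoop dfsadmin -report` can be used to watch the listed nodes while their blocks are re-replicated elsewhere; shutting a node down before that finishes risks losing replicas.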