
EMC World Big Data Lab 101 Guide for Isilon OneFS Clusters



Table of Contents



EMC World Big Data Lab 101 Guide for Isilon OneFS Clusters
Lab Configuration
Lab1: Installing the first node in a cluster (1 node)
Lab2: Configuring Anonymous File Sharing
Study Questions
OneFS: How Big Data is Done!






Lab Configuration

Lab Guide
This Lab Guide contains information and instructions for performing the labs in the EMC World Introduction to Isilon Big Data session, which uses OneFS 6.0.

Virtual Clusters
The Isilon Virtual Clusters are designed to provide you, the user, with a very close approximation of configuring and using a real Isilon IQ cluster running OneFS 6.0. These virtual clusters run on the well-known VMware platform: VMware Player (free), VMware Client, and VMware Server, all of which can be acquired from the VMware website, www.VMware.com. We provide you with a three-node virtual cluster, just as you would purchase a three-node hardware cluster for your enterprise. This lab is designed to show you how to configure the virtual cluster just as you would the actual hardware cluster; it uses the same steps and processes.

Initial Connectivity
Clusters are normally configured through a serial connection to the first node: you connect, set the initial parameters, and start the cluster join process. This simulated lab demonstrates how to configure a single-node cluster, with the normal serial display redirected to your laptop screen.
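For reference, when you connect to a physical node over serial, any terminal emulator pointed at the right port works. The sketch below is an assumption for a Linux or Mac laptop with a USB serial adapter at /dev/ttyUSB0 and the typical Isilon console settings of 115200 bps, 8 data bits, no parity, 1 stop bit; verify the port name and settings against your node's documentation.

    # Open the node's serial console with screen (exit with Ctrl-A then K)
    screen /dev/ttyUSB0 115200

On Windows, the equivalent is a PuTTY session of connection type Serial using the same speed.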

Text Conventions
a. Displayed text
b. Entered text
c. Single keys, such as the Enter key
d. Windows dialog boxes
e. Windows buttons

Lab1: Installing the first node in a cluster (1 node)

Goal:
In this lab, you and your lab partner (participant1 and participant2) will navigate through the steps of configuring a cluster from scratch using the simulated serial port. The team will log in to the cluster and configure the node using the steps outlined below. After the cluster node restarts, you will use both the Administrative Web Interface and the command line to navigate around the one-node cluster. The user ID for the simulated serial, SSH, and administrative web logins will be root, and the password will be a.

Activity:
Step 1: Initial node configuration
Each pair will configure their initial node. If you were at a customer site, you would need to connect serially to the cluster to configure its first node.



1. Click on the VMware Workstation Node 1 tab, then click into the black console area to enter information. To return to the PC interface, press the CTRL + ALT keys together.
2. We're going to create a new cluster. Select Create a new cluster by typing 1 and pressing the Enter key.
(Note: if you make a mistake, you can always type back to return to the previous step in the wizard.)

3. At the prompt Please enter new password for root:, type a (without quotes) and press Enter
4. Enter a again when prompted (Password changed.)
5. At the prompt Please enter new password for admin, type a (without quotes) and press Enter
6. Enter a again when prompted (Password changed.)
7. Would you like to enable SupportIQ? [yes] type yes and press Enter
8. Please enter company name: type EMCWorld and press Enter
9. Please enter contact name : type BigDataUser and press Enter
10. Please enter contact phone: type 12065551212 and press Enter
11. Please enter contact email: type EMCWorldUser@isilon.com and press Enter


12. Enter a new name for the cluster: BigData
13. You need to set the Character Encoding for the cluster. (Note: if you make a mistake, you can always type back to return to the previous step in the wizard.)
a. Select the default UTF-8 by just pressing Enter
14. You are now going to configure the Internal-A interface, which is used for inter-node communication. On a physical cluster, these would be the InfiniBand connections.
a. Select Configure Netmask by typing 1 and pressing Enter
b. Type 255.255.255.0 and press Enter (Example below)
c. Select Configure int-a IP ranges by typing 3 and pressing Enter (Example below)





d. Select Add an IP Range by typing 1 and pressing Enter
e. Use the following values for the Low and High Addresses:
1. Low IP address: Type 192.168.10.1 and press Enter
2. High IP address: Type 192.168.10.254 and press Enter
Example: 192.168.10.1 ~ 192.168.10.254

f. Now that these are set, select Keep current IP ranges: by just pressing Enter



g. Now that you are done creating all the IP ranges needed for this Internal-A interface, select Keep the
current configuration by just pressing Enter
h. After entering information for Internal-A, you will be presented with options to configure Internal-A or
Internal-B. You are only going to be using one internal interface in the lab.
1. Now that you are done with internal networking, select Finished with internal interfaces by just
pressing Enter

Select the internal interfaces to configure

[ 1] int-a - primary internal interface
[ 2] int-b - secondary internal interface (failover)
[Enter] Exit configuring internal interfaces
Configure internal interfaces >>> press Enter

15. You are now going to configure the External Subnet, which is used by all the clients to access data stored on the
cluster.
a. Select Configure ext-1 by typing 1 and pressing Enter
b. Select Set Netmask by typing 1 and pressing Enter
c. Type 255.255.255.0 and press Enter.
d. Leave MTU as 1500; this is the maximum transmission unit of the Ethernet frame
e. Select Set IP Ranges by typing 3 and pressing Enter
f. Select Add an IP Range by typing 1 and pressing Enter
g. Use the following values for the Low and High Address:
1. Low IP address: Type 192.168.1.10 and press Enter
2. High IP address: Type 192.168.1.20 and press Enter
Example: 192.168.1.10 ~ 192.168.1.20



h. Select Keep current configuration by just pressing Enter
i. Now that these are set, select Keep current IP ranges: by just pressing Enter

16. Now you need to specify a gateway (IP network router) for the external network
a. Type 192.168.1.1 and press Enter
17. Now you can optionally configure SmartConnect settings. You are not going to do this now.
a. Select Keep current SmartConnect settings by just pressing Enter
18. Now you need to specify DNS Settings
a. Select Domain Name Servers by typing 1 and pressing Enter
b. Specify one or more DNS server IP addresses separated by commas in the order you would like them used.
1. Type 192.168.1.50 and press Enter
c. Select DNS Search List by typing 2 and pressing Enter
d. Specify one or more DNS domains to search, separated by commas, in the order you would like them searched.
1. Type emcworld.isilon.com and press Enter
e. Now that this is set, select Keep current DNS settings by just pressing Enter
(Example below)





19. After entering information for the External Subnet, you will be presented with options to configure ext-1 again (to change anything you have just set up) or to finish configuring the External Subnet.
a. Select Exit configuring external network by just pressing Enter
20. Now you need to specify the date and time for the cluster
a. Select Configure timezone by typing 1 and pressing Enter



b. Normally you would select your local timezone, but for these labs select [5] Pacific Time Zone (PST), since the Terminal Server is in the PST timezone, by typing the number next to PST and pressing Enter
c. The time will reset to the new timezone on these systems.
d. Now that you are done with time configuration, select Keep current date and time: by just pressing Enter
21. Now you need to specify the cluster add node setting
a. Select Manual join by typing 1 and pressing Enter
22. You will be shown a summary of all the settings you have configured. Review this information. If you need to
go back to fix a setting, type back to be taken to the previous screen. The network information displayed
refers to the internal (inter-node) network.





a. When prompted to Commit Changes?, type yes and press Enter



23. The node will take approximately 60 to 120 seconds to finish the boot process. After that time, a login prompt will be presented, as in the example below. Your system is now ready to use. You can connect to your cluster via SSH to reach the CLI, or via a web browser to reach the Administrative Web Interface; you will do both next.
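Before opening the web interface, a quick reachability check from your laptop confirms that the node picked up its external address. This is only a sanity check and assumes the lab laptop is on the same 192.168.1.0/24 network as the cluster:

    ping 192.168.1.10

If the node answers, proceed to Step 2; if not, re-check the external subnet settings from the wizard summary.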



Step 2: Connecting to the cluster Administrative Web Interface
All participant pairs will login to the Administrative Web Interface at the same time.
1. Open a web browser
2. Enter the IP address of node 1 of the cluster you just built: http://192.168.1.10
3. Accept the SSL security warnings
a. If you see a page with a directory listing, scroll to the bottom and click on the Isilon Web UI: link
b. If you don't get either of these, try to go directly to the web admin interface by opening your web browser and connecting to: https://192.168.1.10:8080
(For future reference, please note any node in the cluster will display the same information)
4. Enter the username root or admin with password a





5. From the login page, type root for Username and a for the password and click the Log in button (the
admin user could be used as well)
6. Once logged in, you will be taken to the Cluster Status screen
7. Verify that the node you configured is healthy and has the correct IP address displayed.

Step 3: Connecting to the cluster command line
All participant pairs will login to the command line at the same time.
1. Open a standard ssh connection
a. If using a Unix / Linux system, from a shell run: # ssh 192.168.1.10
b. If using PuTTY (Windows):
i. Host Name (or IP address): 192.168.1.10
ii. Port: leave as the standard 22
iii. Connection type: SSH
c. Log in using the root account and the password you set earlier in the lab (Step 1, item 3)
2. At this point you should be at a shell prompt
3. Type the command isi status (or isi stat for short), and press Enter.
This will display cluster status information for your single node cluster.



4. Type the command isi config (or isi conf for short), and press Enter.
This is an interactive shell that gives the administrator access to some of the same tasks available via the Administrative Web Interface. To see a list of available commands, type help and press Enter at the prompt. Type quit to exit the isi config shell. Now type isi and press Enter; you will get a list of all the isi commands. These commands allow you to do almost everything that can be done from the Administrative Web Interface, and more. For more detail on these commands, type man isi and press Enter. (A short example session is sketched below.)
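For reference, the whole command-line sequence from your laptop looks roughly like the sketch below; the commands (ssh, isi status, isi config, man isi) are the same ones used in the steps above, and the comments are only explanatory.

    ssh root@192.168.1.10     # accept the host key, then enter the root password (a)
    isi status                # cluster, node, and capacity summary
    isi config                # interactive configuration shell; type help, then quit to exit
    man isi                   # manual page describing the isi sub-commands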




Lab2: Configuring Anonymous File Sharing
The participants will connect anonymously to the cluster and configure CIFS shares (mapped drive letters) and NFS mounts using the lab infrastructure. Each participant will create their own subdirectories and mounts for their data and viewing pleasure. The participants will view data through the Administrative Web Interface while data is being copied onto the cluster, and will view and edit their share properties.

Task1: Connecting clients to the cluster via CIFS

Goal:
In this lab, participants will connect clients via CIFS and create a new Windows share.

Activity:
Step 1: Connect to the default Windows ifs share on two nodes
The Isilon Systems cluster comes with a default Windows share, ifs. Later in this lab, we will show how to view and edit the properties of the share, but for now let's just connect:
1. Connect the Windows client to the first node in the cluster.
a. From the Start menu select My Computer
b. From the Tools menu select Map Network Drive
c. A Map Network Drive dialog box will appear:





d. For the Drive leave as is or select a drive letter
e. For the Folder options enter the following:
\\192.168.1.10\ifs
f. Click the Finish button
You should now see the drive you mapped listed in the My Computer screen.
g. Double click on the drive to open
h. Open the data directory
i. Create a new directory inside of data, called participantX (where X is the participant number assigned by the trainer)
j. Copy a Windows folder (or any content you choose) from the laptop into the participantX directory. This will copy the content into that directory on the cluster
k. While the copy operation is in progress, go to the Administrative Web Interface to view the throughput
statistics from the Status > Cluster Status page
l. Under the Status section for the cluster, note that while only one node is showing throughput, the capacity used for all nodes is incrementing
2. Connect the Windows client to the second node in the cluster
a. Following the same method as above, open My Computer and from the Tools menu select Map Network Drive
b. In the Folder option enter the following: \\192.168.1.10\ifs
c. There should now be two mapped drives in the My Computer screen.
d. Note that the size in TB for each network drive is the same
e. Open the network drive connected to the second node, and go into the data directory
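As a side note, the same drive mappings used in this step can also be created from a Windows command prompt instead of the Map Network Drive dialog. The share path below is the one used above; the drive letter is an arbitrary example.

    net use Y: \\192.168.1.10\ifs        (map the default ifs share to Y:)
    net use                              (list the current drive mappings)
    net use Y: /delete                   (remove the mapping when you are finished)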

Step 2: Viewing the properties of a Windows share, creating a new one and mapping the new share
1. To view the properties of a share in the Administrative Web Interface
a. Login to the Administrative Web Interface as root or admin
b. In the drop down menu select File Sharing > CIFS - Windows File Sharing > Shares
c. Now there should be a listing of all available Windows Shares
i. Click on the Edit link next to ifs
2. To create a new share
a. Login to the Administrative Web Interface as root or admin
b. In the drop down menu select File Sharing > CIFS - Windows File Sharing > Add Share


c. Enter the following information
i. In the Share name: field, type participantX
ii. The Description: field can be left blank for this lab
iii. For the Directory to share:, click the Browse button
iv. In the pop-up window, browse to the participantX directory you just created by clicking the + icon next to data, selecting your directory, and clicking OK
d. Click Submit to create the share
e. The File Sharing > CIFS - Windows File Sharing > Shares page should be displayed showing you all
the shares on the system
3. Map to the new share
a. From the Start menu select My Computer
b. From the Tools menu select Map Network Drive
c. For the Folder entry, enter \\192.168.1.10\participantX
d. Click Finish
e. Open the new share, and view the content directory inside

Study Questions:
1. When you mapped drives to shares from different nodes, was it the same data?
2. Was the end user experience affected by using the cluster vs. a standard Windows Server share?




OneFS: How Big Data is Done!



OneFS is a distributed, clustered file system that runs on all nodes in the cluster. There is no master or controlling node in the cluster; all nodes are peers and share in the workload. As nodes are added, the file system grows dynamically and content is evenly distributed to every node. Because all information is shared among nodes, the entire file system is accessible to clients connecting to any node in the cluster.


OneFS stripes data across nodes and disks. During a write, the system breaks data into smaller logical sections called stripes and then logically places the data in a stripe unit. As the system lays data across the cluster, it fills the stripe units until the maximum width of the cluster is reached. Each OneFS block is 8 KB, and a stripe unit consists of 16 blocks, for a total of 128 KB per stripe unit.
OneFS uses advanced data layout algorithms to determine data layout for maximum efficiency and performance.
Data is evenly distributed across nodes as it is written. The system can continuously reallocate data and make
storage space more usable and efficient. Depending on the file size and the stripe width (determined by the number
of nodes), as the cluster size increases, the system stores large files more efficiently.
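To make the arithmetic concrete (an illustrative example that ignores protection overhead): a 1 MB file is 1,024 KB, or 128 blocks of 8 KB; at 16 blocks (128 KB) per stripe unit, the file occupies eight stripe units, which OneFS lays out across the nodes of the cluster together with the associated protection information.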


OneFS protects stripes with either parity, also known as error correction code (ECC), or mirroring. The process of creating parity starts by breaking a file down into chunks; a value is calculated for each chunk, the chunk values are added together, and the sum of those values is the parity value.
The steps are as follows (a small worked example follows the steps):
Step 1 - Files are broken into smaller sections called stripes.
Step 2 - Stripes are broken into even smaller pieces called chunks, and a parity chunk is calculated.
Step 3 - Each chunk, including the parity chunk, is then written to a separate device (a hard drive for RAID 5; a node for Isilon).
Step 4 - What happens if a hard drive is lost?





Step 5 - The values of the remaining data are gathered.
Step 6 - The missing value is calculated: it is the parity value minus the sum of the remaining data values.
Step 7 - The calculated value is used to recreate the missing stripes.
Step 8 - The stripes are recombined to rebuild the file, without any action by, or even the knowledge of, the end user.
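As a simple worked example of this additive parity model (the numbers are made up): suppose a stripe is broken into three chunks with values 3, 5, and 7, so the parity value is 3 + 5 + 7 = 15. If the device holding the chunk with value 5 is lost, the missing value is recovered as 15 - (3 + 7) = 5, and the stripe can be rebuilt. Real implementations typically use XOR or other error-correction codes rather than simple sums, but the recovery principle is the same.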



The Isilon clustered storage system provides a proprietary system called FlexProtect, which detects and repairs files and directories that are in a degraded state. Isilon FlexProtect protects data in the cluster by rebuilding failed disks, using free storage space across the entire cluster to further prevent data loss, and monitoring and preemptively migrating data off at-risk components.


FlexProtect distributes all data and error-correction information across the entire Isilon cluster and ensures that all data remains intact and accessible even in the event of simultaneous component failures. Protection settings can be changed without taking the cluster or file system offline. Protection is applied at the file level, not the block level, and OneFS allows different protection levels on different directories.
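For example, requested protection can also be changed per directory from the command line. The sketch below uses the isi set command; treat the exact flags as an assumption to verify against the OneFS version on your cluster, and the path is simply the lab directory created earlier.

    isi set -R -p +2:1 /ifs/data/participantX    # request +2:1 protection, recursively, for this directory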



OneFS supports N+1, N+2:1, N+2, N+3:1, N+3, and N+4 data protection schemes, and up to 8x mirroring. For most
nodes, the default protection policy is N+1, which means that one drive, multiple drives within a node, or an entire
node can fail without causing any data loss. Optionally, you can enable N+2, N+3, or N+4 protection, which allows
the cluster to sustain two, three, or four simultaneous failures without causing data loss.
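As a rough worked example (illustrative only): in a five-node cluster protected at N+1, a full-width stripe for a large file consists of four data stripe units and one parity stripe unit, so about 20% of the space used by that file goes to protection; at N+2 on the same five nodes it would be two parity stripe units out of five, or about 40%.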

The default protection setting for the cluster is +2:1. Isilon provides recommendations on when to move to a higher
protection level based on the quantity and type of nodes.