
PRACTICAL GUIDE

OF BUILDING A HPC
CLUSTER
An Introduction


Hafizal Yazid, Megat Harun M.A, Azraf Azman,
Mohd Rizal M., Anwar A.R., Rafhayudi J.





First Edition


Contents
Preface
1. Introduction
1.1 Strategy
1.2 Hardware Requirement
2. LINUX Environment and Commands
2.1 File System Structure and Paths
2.2 Basic Commands
3. Preparation Steps
3.1 Operating System Installation
3.2 Ethernet Connection
3.3 Setting Network Connection
3.4 Network Connection for Master
3.5 Network Connection for Nodes
3.6 Connection Testing
4. Setting up for Master and Nodes
4.1 Master
4.1.1 Edit ntsysv
4.1.2 Edit rc.local file
4.1.3 Edit hosts file
4.1.4 Edit sync.sh file
4.1.5 Edit bashrc file
4.1.6 Edit ssh_config file
4.1.7 Edit selinux file
4.1.8 Edit yum.conf file
4.1.9 Download hpc.tar.gz
4.2 Nodes
4.2.1 Edit ntsysv
4.2.2 Edit rc.local file
4.2.3 Edit hosts file
4.2.4 Edit ssh_config file
4.2.5 Edit selinux file
5. Assembly of Master and Nodes
5.1 Master
5.2 Nodes
5.3 Add User
Glossary














Practical Guide of Building a HPC Cluster
Preface
Documents are often written by gurus who tend to detail every single aspect of a subject in order to give the reader a solid understanding of it. There are always pros and cons: a lengthy description might ruin the reader's interest, while one that is too short might deliver only superficial knowledge. A balance between the two is the aim of this reference, and the best person to strike it is a newbie. The strength of this document is a handful of terminology, strategy, instructions and a list of important commands that have proven useful. Perhaps this document can serve as a foundation that draws readers to delve further into the subject matter.
The intended readers are those who have a problem to solve, as I do in the simulation of radiation transport. For a researcher involved in simulation work, a stand-alone PC can take a long time. Advances in today's PCs enable us to group them together so that they work as one unit, or cluster. There are many references, but they sometimes create ambiguity instead of giving crystal-clear information on building a cluster. It is my hope that this document serves as a quick reference that provides sufficiently clear information. Following the given steps should allow readers to build their own cluster and finally solve their own problems.
The freely available CentOS 6.3 is used as the operating system, and dedicated simulation software is used to test the parallel computational approach.
Hafizal Yazid, PhD.
January 1, 2014.


1. Introduction
1.1 Strategy
The system adopted in this work has one master node, with the remaining machines acting as slave nodes. The system is shown in Figure 1.1.

Figure 1.1 Cluster system configurations
1.2 Hardware requirement
A homogeneous system is used, as it is easier to develop and gives higher efficiency from the cluster set-up. The hardware used is listed in Table 1.1.
Table 1.1 List of hardware
Hardware                          Example
Ethernet switch                   TP-LINK 8-Port Gigabit Desktop Switch (1000 Mbps)
Desktop PC (master and slaves)    Intel Core i5 processor
                                  MSI motherboard
                                  3 GB DDR3 RAM
                                  500 GB Western Digital HDD
                                  DVD drive
KVM switch (optional)             D-LINK 4-Port USB KVM Switch

The next step is to familiarize ourselves with the Linux-based environment and some typical commands. This is important because most readers are assumed to be users of Windows-based environments. These are described in the next chapter.
2. LINUX environment and commands
2.1 File system structure and paths
The file system is arranged hierarchically, so file names are specified by paths. A path that starts from the root directory begins with a forward slash (/); a path without the leading slash starts from the current directory. For example:
/home/group/sally/curriculum_virtae
This path starts from the root directory, as shown by the forward slash (/) at the beginning, whereas
John/curriculum_virtae
starts from the current directory and points into its John subdirectory.
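The difference between the two forms can be illustrated with a small shell function. `show_path` is a hypothetical helper written only for this illustration: it prints where a path points when it is read from a given current directory, without touching the file system.

```shell
# Hypothetical helper: print where PATH_ARG points when read from directory CWD.
# Paths starting with / are absolute and ignore CWD; all others are relative.
show_path() {
    local cwd=$1 path_arg=$2
    case $path_arg in
        /*) printf '%s\n' "$path_arg" ;;               # absolute: starts at /
        *)  printf '%s/%s\n' "${cwd%/}" "$path_arg" ;; # relative: starts at CWD
    esac
}
```

For example, `show_path /home/group John/curriculum_virtae` prints /home/group/John/curriculum_virtae, while `show_path /tmp /home/group/sally/curriculum_virtae` prints the absolute path unchanged.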
File system hierarchy and a brief description are given in Table 2.1.
Table 2.1 File system hierarchy
Directory   Information
/           Root directory.
/bin        Essential binaries and commands.
/boot       Everything required for the boot process.
/dev        Device (special) files.
/etc        Configuration files and directories of the current system.
/home       Home directories of user accounts.
/lib        Libraries, run-time load libraries and kernel modules.
/mnt        Location for temporarily mounting a file system.
/opt        Reserved for the installation of add-on application software.
/proc       Virtual file system showing kernel and process status.
/root       Home directory of the root account.
/sbin       System utilities and root-only commands, in addition to the binaries in /bin.
/tmp        Location where programs place their temporary files.
/usr        Secondary hierarchy for user programs, libraries and documentation.
/var        Variable data files such as logs and spool files.


2.2 Basic commands
Some useful commands are given in Table 2.2 as a quick reference.
Table 2.2 Commands and description
Command                    Description
ls -l                      Display a long listing of file information: permissions, links, owner, group, file size, modification date and file name.
pwd                        Display the working directory.
cd directoryname           Change from the present directory to the specified directory.
cd ..                      Move up to the parent directory.
cd /                       Move to the root directory.
mkdir directoryname        Make a new directory.
rm -rf filename            Remove files and directories recursively, without prompting.
scp -r source destination  Copy securely (also between machines), recursively including directories.
mv source destination      Move a file; the file is not kept at the original location.
nano filename              Edit a file with the nano text editor.
pico filename              Edit a file with the pico text editor.
CTRL-o                     Save a file (in nano or pico).
CTRL-x                     Exit the text editor.
q                          Quit an interactive program such as top.
top -a                     Display the running processes and their process IDs.
kill -9 processid          Force the process with the given process ID to terminate.
cat /proc/cpuinfo          Show CPU information.
cat /proc/meminfo          Show memory information.
su                         Switch user (to root if no user name is given).
ssh nodename               Log in to another node (master or slave).
chmod 770 filename         Give the owner and group read, write and execute permission, and others no access.
chown newowner filename    Change the owner of a file or directory.
3. Preparation steps
3.1 Operating system installation
A Linux-based operating system, CentOS 6.3, is used in this work. At the time of writing it can be downloaded from http://www.centos.org. It is a free, open-source operating system. Step-by-step instructions are given below:
Set the BIOS to boot from the DVD drive.
Insert the CentOS installation disc into the DVD drive and restart the PC.
Select the option to install or upgrade CentOS. Skip the media test. Select Next.
Select English as the installation language and U.S. English for the keyboard.
Select Basic Storage Devices. Select Yes, discard any data.
Name the computer, e.g. master.island1.mint.gov.my. Select Next.
Select the city for the time zone, e.g. Asia/Kuala Lumpur. Select Next.
Enter the root password, then confirm it. Select Next.
Select Use All Space for the installation. Select Next.
Select Write changes to disk (this formats the hard disk).
Select Software Development Workstation. Mark CentOS at the bottom. Select Next.
The installer checks dependencies and installs CentOS 6 (1477 packages).
Reboot. The welcome page is displayed.
Select Forward.
Select Agree to the licence agreement. Select Forward.
Create the first user by filling in Username, Full Name, Password and Confirm Password.
Select Forward.
Select Enable Kdump. Select Yes to reboot. Select OK.
By following these steps, CentOS 6.3 is installed as the operating system. The same steps are followed for the master and the nodes; the only difference is the computer name. Before proceeding with the installation, it is advisable to fix the PC names. For example:
Master: master.island1.mint.gov.my
Nodes: node1.island1.mint.gov.my
node2.island1.mint.gov.my
node3.island1.mint.gov.my
Besides the PC name, each PC is given an internal address, through which communication within the cluster takes place. For example:
Master: 192.168.0.1
Node1: 192.168.0.2
Node2: 192.168.0.3
Node3: 192.168.0.4
3.2 Ethernet connection
For the master, two Ethernet ports are used, eth0 and eth1. eth1 (the built-in port) is connected to the wall socket (outside network) and eth0 (the network card) is connected to the Ethernet switch. For the nodes, the built-in Ethernet port, eth0, is connected to the Ethernet switch. With this arrangement, only the master is visible to the outside network.
3.3 Setting network connection
If your PC has a static IP address assigned by your administrator, use that address in the settings. Otherwise, your PC has a dynamic IP address (the typical case, for safety reasons). Either way, you can find your IP address with the ifconfig command, as shown in Figure 3.1. In this example, the IP address is 10.10.2.137.
In this work we assume that the PC has a dynamic IP address, as the address above may be changed by the administrator. The next step is to set up the network connection for each node and the master. The connection settings are accessed by right-clicking the network icon.


Figure 3.1 Checking IP address using ifconfig command.
3.4 Network connection for master
Step-by-step instructions for System_eth0 are given below.
Select System_eth0.
Ensure the Connect automatically box is checked.
Select IPv4 Settings.
Select Manual.
Enter the Address, Netmask and Gateway as follows (example).
Address        Netmask          Gateway
192.168.0.1    255.255.255.0    0.0.0.0
Ensure the Require IPv4 box is checked.
Ensure Available to all users is unchecked.
Select Apply, enter the root password, then select Authenticate and Close.
Step-by-step instructions for System_eth1 are given below.
Select System_eth1.
Ensure the Connect automatically box is checked.
Select IPv4 Settings.
Select Automatic (DHCP) addresses only.
Enter your DNS servers (example): 10.10.150.39, 8.8.8.8
Ensure the Require IPv4 box is checked.
Ensure Available to all users is unchecked.
Select Apply, enter the root password, then select Authenticate and Close.
3.5 Network connection for nodes
For the nodes, only the internal interface (eth0) is configured. The instructions are given below.
Select System_eth0.
Ensure the Connect automatically box is checked.
Select IPv4 Settings.
Select Manual.
Enter the Address, Netmask and Gateway as follows (example for node1).
Address        Netmask          Gateway
192.168.0.2    255.255.255.0    192.168.0.1
Enter your DNS servers (example): 10.10.150.39, 8.8.8.8
Ensure the Require IPv4 box is checked.
Ensure Available to all users is checked.
Select Apply, enter the root password, then select Authenticate and Close.
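On CentOS 6 the settings entered in this dialog are stored in /etc/sysconfig/network-scripts/ifcfg-eth0, so the same static configuration can also be written as a file, which is handy when several nodes are prepared. The sketch below assumes the example values above (node1, gateway 192.168.0.1, the two example DNS servers); `write_ifcfg` is a hypothetical helper that writes into a directory you choose, so the result can be reviewed before it is copied into place. ONBOOT=yes corresponds to Connect automatically.

```shell
# Hypothetical helper: write a static ifcfg-eth0 file for a cluster node into DIR.
# Arguments: DIR, the node's IP address and its gateway (the master's address).
# The DNS servers are the example values used in this guide.
write_ifcfg() {
    local dir=$1 ip=$2 gateway=$3
    cat > "$dir/ifcfg-eth0" <<EOF
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
IPADDR=$ip
NETMASK=255.255.255.0
GATEWAY=$gateway
DNS1=10.10.150.39
DNS2=8.8.8.8
EOF
}
```

For node1, run `write_ifcfg /tmp 192.168.0.2 192.168.0.1` and compare /tmp/ifcfg-eth0 with /etc/sysconfig/network-scripts/ifcfg-eth0. The installer's file may also contain HWADDR and UUID lines that identify the network card; keep those when merging, then restart the network as in section 3.6.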
3.6 Connection testing
In our system, the master can connect to the internet. First, restart the network on the master PC. Open a terminal and switch to root.
Command: su
As root, restart the network service with its init script.
Command: /etc/rc.d/init.d/network restart
Check whether the network has been established.
Command: ifconfig
If the network is working, your IP address is shown in the output. You can also ping another address, for example:
Command: ping 10.10.6.1
You should get a reply from that address.
4. Setting up for master and nodes
4.1 Master
4.1.1 Edit ntsysv
Command: ntsysv
Disable all abrt services, cups (printing) and iptables (firewall). Enable nfs.

4.1.2 Edit rc.local file
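The file is opened with a text editor; nano is used in this work.
Command: nano /etc/rc.local
You should see the following line.
touch /var/lock/subsys/local
Add the following lines below it.
setenforce 0
/etc/rc.d/init.d/iptables stop
setenforce 0 switches SELinux to permissive mode for the current boot; SELinux is disabled permanently through the selinux file (section 4.1.7).
Save and exit.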
Command: Ctrl O and Ctrl X
4.1.3 Edit hosts file
Command: nano /etc/hosts
You should see lines similar to the following.
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Add the following lines.
192.168.0.1 master master.island1.mint.gov.my
192.168.0.2 node1 node1.island1.mint.gov.my
192.168.0.3 node2 node2.island1.mint.gov.my
192.168.0.4 node3 node3.island1.mint.gov.my
Save and exit.
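Because the same entries must appear on the master and on every node, they can also be generated instead of typed by hand. The sketch below assumes the numbering scheme used in this guide (master at 192.168.0.1, nodeN at 192.168.0.(N+1), domain island1.mint.gov.my); `gen_hosts` is a hypothetical helper, not part of hpc.tar.gz.

```shell
# Hypothetical helper: print /etc/hosts entries for the master and COUNT nodes,
# following this guide's scheme (master = .1, nodeN = .N+1).
gen_hosts() {
    local count=$1                        # number of compute nodes
    local domain=${2:-island1.mint.gov.my}
    local i=1
    printf '192.168.0.1 master master.%s\n' "$domain"
    while [ "$i" -le "$count" ]; do
        printf '192.168.0.%d node%d node%d.%s\n' "$((i + 1))" "$i" "$i" "$domain"
        i=$((i + 1))
    done
}
```

Running `gen_hosts 3` prints the four lines listed above; review the output first, then append it as root with `gen_hosts 3 >> /etc/hosts`. Appending twice would duplicate the entries.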
4.1.4 Edit sync.sh file
Command: cd /etc/cron.daily
In the cron.daily directory, create the sync.sh file.
Command: nano sync.sh
Save the file and exit. Then check the status of sync.sh.
Command: ls -l
-rw-r--r--. 1 root root sync.sh
Make the file executable so that cron can run it.
Command: chmod 755 /etc/cron.daily/sync.sh
Check again.
Command: ls -l
-rwxr-xr-x. 1 root root sync.sh (shown in green)
Now edit the file.
Command: nano /etc/cron.daily/sync.sh
Add the following lines.
#!/bin/bash
# Copy account and host information from the master to every node.
for node in node1 node2 node3
do
scp /etc/passwd $node:/etc/passwd
scp /etc/shadow $node:/etc/shadow
scp /etc/group $node:/etc/group
scp /etc/hosts $node:/etc/hosts
done
Save and exit.
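With the node names written into the loop, sync.sh has to be edited whenever a node is added. A possible variation, sketched below under the assumption that every compute node appears in /etc/hosts with a short name starting with `node` in the second column, reads the list from the hosts file instead. `list_nodes` and `sync_nodes` are hypothetical helpers.

```shell
# Hypothetical helper: print the short names of the compute nodes found in a
# hosts file (second column starting with "node", commented lines skipped).
list_nodes() {
    awk '$1 !~ /^#/ && $2 ~ /^node/ { print $2 }' "${1:-/etc/hosts}"
}

# Copy the account and host files to every node listed in the hosts file.
sync_nodes() {
    local node f
    # Word splitting of the node list is intended: one name per iteration.
    for node in $(list_nodes "$1"); do
        for f in /etc/passwd /etc/shadow /etc/group /etc/hosts; do
            scp "$f" "$node:$f"
        done
    done
}
```

To use it, keep the `#!/bin/bash` line of sync.sh, put the two functions in place of the fixed loop, and end the file with `sync_nodes /etc/hosts`. A new node then only has to be added to /etc/hosts.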
4.1.5 Edit bashrc file
Command: nano /etc/bashrc
Add the following lines.
alias pico=nano
PATH=$PATH:/usr/local/maui/sbin:/usr/local/maui/bin
Save and exit.
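/etc/bashrc is read by every interactive non-login shell, and PATH is inherited, so the plain PATH= line above adds the Maui directories again each time a terminal is opened from another shell. A sketch that appends a directory only when it is missing is shown below; `path_append` is a hypothetical helper.

```shell
# Hypothetical helper for /etc/bashrc: append a directory to PATH only if it
# is not already present, so nested shells do not collect duplicate entries.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH unchanged
        *) PATH=$PATH:$1 ;;
    esac
}
path_append /usr/local/maui/sbin
path_append /usr/local/maui/bin
export PATH
```

These lines can be used in /etc/bashrc in place of the PATH= line; the alias line stays as it is.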
4.1.6 Edit ssh_config file
Command: nano /etc/ssh/ssh_config
Set the following options:
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
Save and exit.
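The two options above switch off host-key checking for every SSH destination, including machines outside the cluster. A narrower alternative limits them to the cluster's own names and addresses. In the sketch below, `add_cluster_ssh_block` is a hypothetical helper and the host patterns assume the naming scheme of this guide.

```shell
# Hypothetical helper: append a Host section that relaxes host-key checking only
# for the cluster's own names and addresses (naming scheme of this guide).
add_cluster_ssh_block() {
    local conf=${1:-/etc/ssh/ssh_config}
    cat >> "$conf" <<'EOF'

Host master* node* 192.168.0.*
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
EOF
}
```

Run it once as root on each machine, instead of setting the options globally. ssh uses the first value it finds for each option, so the two options must not already be set in an earlier Host * section of the file.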


4.1.7 Edit selinux file
Command: nano /etc/sysconfig/selinux
Make sure the file contains the following lines (the change takes effect after a reboot).
SELINUXTYPE=targeted
SELINUX=disabled
Save and exit.
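Because the same change is made on the master and on every node, it can also be applied without an editor. The sketch below uses sed; `disable_selinux_cfg` is a hypothetical helper. It defaults to /etc/selinux/config, the file that /etc/sysconfig/selinux links to on CentOS 6, because sed -i would otherwise replace the symbolic link with a regular file.

```shell
# Hypothetical helper: set SELINUX=disabled in the given config file without
# opening an editor. A copy of the original is kept with a .bak suffix.
disable_selinux_cfg() {
    local cfg=${1:-/etc/selinux/config}
    # ^SELINUX= does not match SELINUXTYPE=, which is left untouched.
    sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
    grep '^SELINUX=' "$cfg"          # show the resulting line
}
```

Run as root with no argument, it prints SELINUX=disabled; the setting takes effect after the next reboot.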
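4.1.8 Edit yum.conf file
Command: nano /etc/yum.conf
If the master reaches the internet through a proxy, add the following line, replacing the address and port with those of your proxy server.
proxy=http://xxx.xx.xx:port
For example: proxy=http://10.10.150.102:8080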
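The proxy line can also be set from the command line, which avoids duplicate entries if the step is repeated. `set_yum_proxy` is a hypothetical helper; it assumes, as in the default CentOS 6 yum.conf, that [main] is the only section in the file, so an appended line lands in [main].

```shell
# Hypothetical helper: set (or replace) the proxy= line in a yum.conf-style file.
# Assumes [main] is the only section, so an appended line belongs to it.
set_yum_proxy() {
    local conf=$1 url=$2
    if grep -q '^proxy=' "$conf"; then
        sed -i "s|^proxy=.*|proxy=$url|" "$conf"   # replace the existing line
    else
        printf 'proxy=%s\n' "$url" >> "$conf"      # append a new line
    fi
}
```

With the example proxy: `set_yum_proxy /etc/yum.conf http://10.10.150.102:8080`.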
4.1.9 Download hpc.tar.gz
Download the file into root's home directory (/root, shown in the prompt as [root@master ~]).
Command: wget http://10.10.7.200/hpc.tar.gz
Alternatively, copy it from another machine with scp.
Command: scp address:/folder/filename address:/folder/
Untar the file in the current directory.
Command: tar xf hpc.tar.gz
The hpc directory is then available. It contains the files required for the cluster setup. Then switch to the nodes.


4.2 Nodes
4.2.1 Edit ntsysv
Command: ntsysv
Disable all abrt services, cups (printing) and iptables (firewall). Enable nfs.
4.2.2 Edit rc.local file
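Mount the master's /home directory on the node, for example with mount 10.10.7.xxx:/home /home or mount master:/home /home.
Command: nano /etc/rc.local
You should see the following line.
touch /var/lock/subsys/local
Add the following lines below it.
setenforce 0
/etc/rc.d/init.d/iptables stop
mount master:/home /home
setenforce 0 switches SELinux to permissive mode for the current boot; SELinux is disabled permanently through the selinux file (section 4.2.5).
Save and exit.
Command: Ctrl O and Ctrl X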
4.2.3 Edit hosts file
Command: nano /etc/hosts
You should see lines similar to the following.
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Add the following lines.
192.168.0.1 master master.island1.mint.gov.my
192.168.0.2 node1 node1.island1.mint.gov.my
192.168.0.3 node2 node2.island1.mint.gov.my
192.168.0.4 node3 node3.island1.mint.gov.my
Save and exit.
(Make sure this file is identical to the one on the master.)
4.2.4 Edit ssh_config file
Command: nano /etc/ssh/ssh_config
Set the following options:
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
Save and exit.
4.2.5 Edit selinux file
Command: nano /etc/sysconfig/selinux
Make sure the file contains the following lines (the change takes effect after a reboot).
SELINUXTYPE=targeted
SELINUX=disabled
Save and exit.
5. Assembly of master and nodes
At this stage the master and nodes can be connected through the Ethernet switch to form a cluster. You can then ping the nodes from the master, and vice versa, to check connectivity. With the ssh command you can log in to a remote computer (from the master to a node or vice versa) and work on it remotely. For example:
Command: ssh node1
Switch back to the master with the same command. For example:
Command: ssh master
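Before continuing, connectivity to every machine can be checked in one pass from the master. `check_hosts` is a hypothetical helper that pings each name once with a 2-second timeout; the names are those defined in /etc/hosts.

```shell
# Hypothetical helper: ping each host once (2-second timeout) and report
# whether it answered.
check_hosts() {
    local host
    for host in "$@"; do
        if ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
            echo "$host up"
        else
            echo "$host DOWN"
        fi
    done
}
```

`check_hosts node1 node2 node3` prints one line per node ending in up or DOWN; a DOWN entry points to a cable, switch port, address or /etc/hosts problem on that node.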
The setup then proceeds on the master, which already holds the hpc files.
5.1 Master
Switch to root, as the installation is carried out as the root user.
Command: su
As the hpc directory is located in root's home directory, change to that directory and list its files.
Command: cd /root
Command: ls
[root@master ~]# ls



Change to the hpc directory, which contains many files, and untar the torque-4.1.2 archive.
Command: cd /root/hpc
Command: tar xf torque-4.1.2
The archive is extracted.
[root@master hpc]# ls

Change to the torque-4.1.2 directory, which also contains many files.
Command: cd torque-4.1.2
List the files in that folder.
Command: ls
[root@master torque-4.1.2]# ls



Now it is time to set up Torque. In this directory, run configure, make and make install.
Command: ./configure
Command: make
Command: make install
Then list the contents of the directory again.
Command: ls
The torque packages are now listed.
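The configure, make and make install sequence is repeated for MPICH2 and again on every node, so it may be convenient to wrap it in a function that stops at the first failing step instead of running make after a failed configuration. `build_install` is a hypothetical helper, not part of hpc.tar.gz. Torque's source tree also has a `make packages` target, which builds the self-extracting torque-package-*.sh installers copied in the next step; if they are not listed after make install, running make packages in the same directory should create them.

```shell
# Hypothetical helper: run ./configure, make and make install in a source
# directory and stop at the first step that fails. Extra arguments are
# passed to ./configure unchanged.
build_install() {
    local src=$1; shift
    (
        cd "$src" || exit 1          # subshell: the caller's directory is kept
        ./configure "$@" && make && make install
    ) || { echo "build failed in $src" >&2; return 1; }
}
```

For example, `build_install /root/hpc/torque-4.1.2`, and later `build_install /home/apps/mpich2-1.3.2p1`.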



Now change to the apps directory under /home.
Command: cd /home/apps
In this directory, create a directory named torque.
Command: mkdir torque
Copy the torque packages into this directory.
Command: scp -r /root/hpc/torque-4.1.2/torque-package* /home/apps/torque
Then copy the mpich2-1.3.2p1 directory into the apps directory.
Command: scp -r /root/hpc/mpich2-1.3.2p1 /home/apps
Proceed with the mpich2-1.3.2p1 installation: go to the apps directory, open the mpich2-1.3.2p1 directory and build it there.


Command: ./configure
Command: make
Command: make install
Then move back to the torque-4.1.2 directory. Three init scripts must be copied from the torque-4.1.2 directory into /etc/init.d.
Command: cp /root/hpc/torque-4.1.2/contrib/init.d/pbs_mom /etc/init.d/
Command: cp /root/hpc/torque-4.1.2/contrib/init.d/pbs_server /etc/init.d/
Command: cp /root/hpc/torque-4.1.2/contrib/init.d/trqauthd /etc/init.d/
Next, register the scripts as services.
Command: chkconfig --add pbs_server
Command: chkconfig --add pbs_mom
Command: chkconfig --add trqauthd
Then declare the nodes on the server. If the nodes file does not exist, create it.
Command: nano /var/spool/torque/server_priv/nodes
Enter the following.
master.island1.mint.gov.my np=2
node1.island1.mint.gov.my np=2
node2.island1.mint.gov.my np=2
node3.island1.mint.gov.my np=2
Save and exit.
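np is the number of processors Torque may use on each machine, so it should match that machine's processor count, which the kernel reports in /proc/cpuinfo (for example `grep -c ^processor /proc/cpuinfo`). When all machines are identical, as in this homogeneous cluster, the file can be generated; `gen_torque_nodes` is a hypothetical helper.

```shell
# Hypothetical helper: print Torque server_priv/nodes entries, one line per
# host, all with the same number of processors (np).
gen_torque_nodes() {
    local np=$1 host; shift
    for host in "$@"; do
        printf '%s np=%s\n' "$host" "$np"
    done
}
```

`gen_torque_nodes 2 master.island1.mint.gov.my node1.island1.mint.gov.my node2.island1.mint.gov.my node3.island1.mint.gov.my` prints the four lines above. Redirecting the output with `> /var/spool/torque/server_priv/nodes` overwrites the file, so check it on screen first.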


Then check the server name.
Command: nano /var/spool/torque/server_name
Make sure that the server name is master.island1.mint.gov.my in this case. If it is not, enter the server name.
Next, set up Maui.
Command: nano /usr/local/maui/maui.cfg
Enter the following.
SERVERHOST master
Save and exit.
Copy the maui.d init script from maui-3.3.1 into /etc/rc.d/init.d.
Command: cp /root/hpc/maui-3.3.1/etc/maui.d /etc/rc.d/init.d/
Then edit the copied script and set the Maui installation prefix.
Command: nano /etc/rc.d/init.d/maui.d
Make sure the following line is present.
MAUI_PREFIX=/usr/local/maui
Save and exit.
5.2 Nodes
From the master, log in to node1.
Command: ssh node1
Copy hpc.tar.gz from the master into root's home directory on node1.
Command: scp master:/root/hpc.tar.gz /root/
Untar the file in /root, as on the master (tar xf hpc.tar.gz).
Change the mode so that you can edit the files if necessary.
Command: chmod 755 /root/hpc
Check again.
Command: ls -l
-rwxr-xr-x. 1 root root
Now you can edit the files.
Move to the hpc/torque-4.1.2 directory.
Command: cd /root/hpc/torque-4.1.2
List the files in that directory.
Command: ls
[root@node1 torque-4.1.2]# ls
Install torque-4.1.2 as previously on the master (./configure, make, make install).
The torque packages are then listed.


Now change to the apps directory under /home.
Command: cd /home/apps
In this directory, create a directory named torque.
Command: mkdir torque
Copy the torque packages into this directory.
Command: scp -r /root/hpc/torque-4.1.2/torque-package* /home/apps/torque
Then copy the mpich2-1.3.2p1 directory into the apps directory.
Command: scp -r /root/hpc/mpich2-1.3.2p1 /home/apps
Proceed with the mpich2-1.3.2p1 installation: go to the apps directory, open the mpich2-1.3.2p1 directory and build it there.


Command: ./configure
Command: make
Command: make install
At this stage torque-4.1.2 and mpich2-1.3.2p1 have been installed on node1; the same steps are repeated on node2 and node3.
Then declare the server on the node in the mom config file.
Command: nano /var/spool/torque/mom_priv/config
Enter the following.
$pbsserver master.island1.mint.gov.my
$usecp *://
$logevent 255
Save and exit.
Then check the server name.
Command: nano /var/spool/torque/server_name
Make sure that the server name is master.island1.mint.gov.my in this case. If it is not, enter the server name.
Move back to the master and reboot the master and nodes for the changes to take effect.
Then start the HPC services on the master and nodes. On the master, run the following:
1. /etc/rc.d/init.d/trqauthd start
2. /etc/rc.d/init.d/pbs_server start
3. /etc/rc.d/init.d/maui.d start
4. /etc/rc.d/init.d/pbs_mom start
Then log in to each node and start pbs_mom:
Command: /etc/rc.d/init.d/pbs_mom start
Or start it directly from the master: ssh nodeX pbs_mom (where nodeX is node1, node2 or node3).
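The four master services are started in the order listed above. A sketch that keeps this order and stops at the first init script that reports an error (non-zero exit status) is shown below; `start_in_order` is a hypothetical helper.

```shell
# Hypothetical helper: run "<initdir>/<service> start" for each service in the
# given order and stop at the first one that exits with a non-zero status.
start_in_order() {
    local initdir=$1 svc; shift
    for svc in "$@"; do
        if ! "$initdir/$svc" start; then
            echo "failed to start $svc" >&2
            return 1
        fi
    done
}
```

On the master: `start_in_order /etc/rc.d/init.d trqauthd pbs_server maui.d pbs_mom`. pbs_mom on all nodes can then be started from the master with a loop such as `for n in node1 node2 node3; do ssh "$n" /etc/rc.d/init.d/pbs_mom start; done`.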
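On the master, run the following:
[root@master ~]# pbs_server -t create
[root@master ~]# qterm -t quick
qterm stops pbs_server, so start it again before testing.
[root@master ~]# /etc/rc.d/init.d/pbs_server start
Then test the server configuration.
[root@master ~]# qmgr -c 'p s'
[root@master ~]# qstat -q
[root@master ~]# pbsnodes -a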
5.3 Add user
A new user is added on the master with the addUser.sh script. Then copy the keyless.sh script from the master into the user's home directory and run it there once only, using sh keyless.sh. For example:
Command: ./keyless.sh master mtec 22
Every time a new user is added, make sure sync.sh is run to synchronize the master and nodes.
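Once a user has been added, a small test job is a convenient way to check that Torque, Maui and MPICH2 work together. The sketch below is an assumption, not part of hpc.tar.gz: `write_test_job` is a hypothetical helper that writes a job script requesting NODES nodes with PPN processors each and running hostname under MPICH2's mpiexec, using the node list Torque provides in $PBS_NODEFILE. Submit it as the new user, since Torque normally refuses jobs submitted by root, and only after keyless.sh has set up password-less ssh, which mpiexec uses to reach the nodes.

```shell
# Hypothetical helper: write a minimal Torque job script to FILE that runs
# hostname as NODES x PPN MPI processes with MPICH2's mpiexec.
write_test_job() {
    local file=$1 nodes=${2:-2} ppn=${3:-2}
    # \$ keeps PBS_O_WORKDIR and PBS_NODEFILE for Torque to fill in at run time.
    cat > "$file" <<EOF
#!/bin/bash
#PBS -N hello
#PBS -l nodes=$nodes:ppn=$ppn
#PBS -j oe
cd "\$PBS_O_WORKDIR"
mpiexec -f "\$PBS_NODEFILE" -n $((nodes * ppn)) hostname
EOF
}
```

As the new user, run `write_test_job hello.pbs 2 2`, then `qsub hello.pbs`, and follow the job with `qstat`. When it finishes, the output file hello.o followed by the job number, in the submission directory, should list the hosts that ran the processes.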
At this stage the cluster is ready for any applications that need to run in parallel. Good luck.



Glossary
homogeneous system A group of computers which have similar hardware
configuration.
KVM switch KVM is an abbreviation for keyboard, video and mouse. It is a hardware device that allows a user to control several computers (for example the master and the nodes) from a single keyboard, monitor and mouse.
ntsysv - The ntsysv utility provides a simple interface for activating or deactivating
services. You can use ntsysv to turn an xinetd-managed service on or off. You can also
use ntsysv to configure runlevels. By default, only the current runlevel is configured. To
configure a different runlevel, specify one or more runlevels with the --level option. For
example, the command ntsysv --level 345 configures runlevels 3, 4, and 5.
rc.local This file contains shell commands that are run after the runlevel-specific init scripts whenever the system boots.
hosts The hosts file is a computer file used by an operating system to map hostnames
to IP addresses. The hosts file is a plain text file, and is conventionally named hosts.
sync.sh In this guide, a script placed in /etc/cron.daily that copies /etc/passwd, /etc/shadow, /etc/group and /etc/hosts from the master to every node, so that user accounts and host names stay identical across the cluster.
bashrc When Bash starts, it executes commands from several different scripts. When Bash is invoked as an interactive login shell, it first reads and executes commands from the file /etc/profile, if that file exists. After reading that file, it looks for ~/.bash_profile, ~/.bash_login and ~/.profile, in that order, and reads and executes commands from the first one that exists and is readable. When a login shell exits, Bash reads and executes commands from the file ~/.bash_logout, if it exists. When an interactive shell that is not a login shell is started, Bash reads and executes commands from ~/.bashrc, if that file exists. This may be inhibited by using the --norc option. The --rcfile file option forces Bash to read and execute commands from file instead of ~/.bashrc. On CentOS, the default ~/.bashrc in turn reads the system-wide /etc/bashrc. Typical uses of these files are creating aliases (for example alias ll='ls -l'), adding directories to PATH and setting environment variables.


ssh ssh (SSH client) is a program for logging into a remote machine and for executing
commands on a remote machine. It is intended to replace rlogin and rsh, and provide
secure encrypted communications between two untrusted hosts over an insecure network.
X11 connections and arbitrary TCP/IP ports can also be forwarded over the secure
channel. ssh connects and logs into the specified hostname. The user must prove his/her
identity to the remote machine using one of several methods depending on the protocol
version used.
selinux Security-Enhanced Linux (SELinux) is a Linux kernel security module that
provides the mechanism for supporting access control security policies, including United
States Department of Defense-style mandatory access controls (MAC). SELinux is a set
of kernel modifications and user-space tools that can be added to various Linux
distributions. Its architecture strives to separate enforcement of security decisions from
the security policy itself and streamlines the volume of software charged with security
policy enforcement. The key concepts underlying SELinux can be traced to several
earlier projects by the United States National Security Agency. It has been integrated into
the Linux kernel mainline since version 2.6, on 8 August 2003.
Yum Yum is the package manager for RPM-based systems such as CentOS. It installs and updates software packages from online repositories and resolves their dependencies automatically; its settings, including the proxy used in section 4.1.8, are kept in /etc/yum.conf.
