Proof of concept
February 2018
Contents

About the writer
Preface
Disclaimer
Credits
Current environment
    My home test environment
Initial installation
    Common server configuration
Initial warnings
Restrict the server (part 1), ssh access
    Repo server
Update the yum config
Update the server
Restrict the server (part 2), Restrict host access
    TCP_WRAPPERS
    Linux jump host
Restrict the server (part 3), Restrict sudo
Preparing the server
    Establish old notation of network interfaces
    Uninstall NetworkManager
    Alter the network interface card configuration files
Modify grub
    Disable IPv6
Time synchronization
    Setting up chrony
    sudo chronyc sourcestats -v
Reboot
Configure firewalld
systemd
ISCSI
    Install iscsi package
    Naming the iSCSI initiator
    Connect to iscsi target
    Modify iscsi service
About the writer

I have worked on most of the common platforms (HP-UX, AIX, Tru64, Linux and Windows) as a system administrator and as an Oracle database administrator since 1991 (Oracle database version 6), and as an SAP Basis administrator since 1998. I have worked in Norwegian governmental organizations and in the Norwegian private sector.
When I am not clonking on a keyboard, chances are that I am out on a bike ride. You can follow me on Strava:
https://www.strava.com/athletes/2233700
Preface
When I write documents that I plan to publish on the Internet, I like to write about something that has not been covered much before.
This time I am writing about Linux clustering, a topic on which a very large number of documents already exists on the Internet. I am writing this document anyway because clustering is something that is constantly evolving. Also, many of the documents you can find online are about two-node clusters, and there are some differences between a two-node cluster and a cluster that consists of more than two nodes.
I am going to show you how to build a four-node cluster, which I will then expand to a six-node cluster, and finally I am going to remove one node from the cluster.
In this document, I have altered some output to make it more readable. Mostly this means that I have removed whitespace so that the output fits on a single line where that has been possible.
Disclaimer
This document was written as a proof of concept to show the possibilities of a certain
configuration. The writer or anyone associated with the writer cannot be made liable for any
damage to your systems nor can they be made liable for any loss of data if you choose to
follow the concept shown in this document.
Trademarked names, logos, and images may appear in this document. Rather than use a trademark symbol with every occurrence of a trademarked name, logo or image, I use the names, logos or images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The use in this document of trade names, trademarks, service marks, and similar terms, even if they
are not identified as such, is not to be taken as an expression of opinion as to whether or not they are
subject to proprietary rights.
While the advice and information in this document are believed to be true and accurate at the date of
writing, neither the writer nor anyone associated with the writer can accept any legal responsibility for
any errors or omissions that may be made.
Neither the writer nor anyone associated with the writer makes any warranty, express or implied, with respect to the material contained within this document.
The comments and statements made in this document are solely those of the writer and do not reflect the views or opinions of the writer's employer or anyone associated with the writer.
Credits
Linux is a registered trademark of Linus Torvalds.
Current environment
My home test environment

[Figure: my home test environment, showing the NTP and DNS servers (amygdala, cerebellum) and the two ESXi hosts (sinister, dextro)]
In my configuration I have chosen to separate the different types of network traffic. If you are running your cluster on Fibre Channel, you would obviously reduce the number of network interfaces. In addition I have separated the traffic so that the cluster intercommunication runs on a separate network. This is not necessary; you can run the cluster private traffic over the same network as your client traffic. In the old days the cluster private interconnect was called a heartbeat, and there was even a process called heartbeat. The cluster private interconnect is now handled by the corosync process.
The cluster interconnect is quite chatty, and it is very time critical for cluster responses. If your system is I/O intensive to such an extent that you risk creating latency on a single network, I advise that you move the cluster private interconnect onto a separate network.
Initial installation
All the servers in this document are VMware guest servers located on my ESXi host servers.
Odd-numbered guests are located on the sinister host, and even-numbered guests are located on the dexter host.
Both the public name and the private name are registered in my DNS.
The IP addresses of the additional NICs follow the last octet of the public IP address. So a server with the IP address 10.47.253.101 will also have the IP addresses 171.20.16.101 and 172.16.20.101.
On each of my servers I have created one additional user apart from root. The user was created during the installation, and the user is an administrator (a member of the wheel group).
During the installation and configuration I will keep to the default settings as much as possible. I have made some exceptions where it has been necessary to implement the configuration I want to demonstrate, and some alterations in order to restrict access to the server.
Initial warnings
I will repeat these warnings at the appropriate location within the document. I just list some of them
here to give you a heads up before you start reading.
History lesson: I once tried to remove all core dump files from a running production system. I had a
word document with the command
characterset difference caused the – before the name parameter to be void. The resulting command
turned out to be
I almost put a company out of business that day. The system was saved because the find started in
the /bin directory making the find command unable to fork and delete the rest of the system.
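The original commands are not reproduced here, but the failure mode can be illustrated with a hypothetical reconstruction (deliberately left as comments; do not run these):

```shell
# Intended command (ASCII hyphen before "name"):
#   find / -name "core*" -type f -exec rm -f {} \;
#
# A word processor's "smart" dash is not an ASCII hyphen, so find no longer
# sees a -name test. The stray words are then treated as start directories,
# and -exec rm runs on everything found beneath them:
#   find / name "core*" -type f -exec rm -f {} \;
```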
Warning:
I am not a big fan of disabling security measures. So because I am going to reduce security by running SELinux in permissive mode, I am going to increase security on my servers by other means.
Repo server
If you are planning an environment where you will be adding new servers into your cluster configuration, it is wise to be able to control the configuration and the versions of the packages installed on your systems, in order to have consistency over time.
If you don’t have a local yum repository server you can skip the configuration changes I am outlining
here.
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

#released updates
[updates]
name=CentOS-$releasever - Updates
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

#additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=centosplus&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/centosplus/$basearch/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
Change each occurrence of the mirrorlist URL to the URL of your local server:
baseurl=http://<reposerver>/centos/$releasever/updates/$basearch/
Since I am in control of my package repository, I now know that the installed packages are the same on all of my installations, and it is time to start the modifications that prepare the server to become a cluster server.
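The mirrorlist-to-baseurl change can also be scripted. Here is a sketch, demonstrated on a copy of the file; on a real system the target would be /etc/yum.repos.d/CentOS-Base.repo, and the repo server name is an assumption:

```shell
# Create a small sample of the repo file to demonstrate on:
cat > /tmp/CentOS-Base.repo <<'EOF'
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
gpgcheck=1
EOF

# Swap every mirrorlist line for a baseurl pointing at the local repo server,
# keeping a .bak backup of the original file:
sed -i.bak 's|^mirrorlist=.*$|baseurl=http://reposerver.example.com/centos/$releasever/os/$basearch/|' /tmp/CentOS-Base.repo

grep '^baseurl=' /tmp/CentOS-Base.repo
```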
TCP_WRAPPERS
As part of the update from the vanilla installation (from the DVD), our system now also has tcp_wrappers installed, which allows me to restrict which hosts are allowed to connect to my server.
With these settings the only ssh connection into my server is from my jump host, and I have disabled direct root access over ssh.
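A minimal sketch of such a restriction (the jump host address is an assumption for this example):

```
# /etc/hosts.allow
sshd : 10.47.253.10

# /etc/hosts.deny
sshd : ALL

# /etc/ssh/sshd_config
PermitRootLogin no
```

Remember to restart sshd (systemctl restart sshd) after changing sshd_config; hosts.allow and hosts.deny are evaluated on every new connection.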
Do a reboot now to start the server on the new kernel (there was a new kernel in the update).
Now I will restrict things so that no user is allowed to become root by using su or sudo -i. I am also restricting things so that no user will be able to modify the /etc/sudoers file. Users must enter their password for sudo commands if more than 3 minutes have passed since the last time they entered it (set the timeout to 0 to enforce a password for every command; make sure you are either good friends with your administrator colleagues, or not friends at all, before you set it to 0). And all sudo commands are logged.
To view the log of activities where the sudo command has been used, run:
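A sketch of /etc/sudoers entries matching the description above (edit with visudo; the timeout value and log file path are assumptions):

```
Defaults    timestamp_timeout=3          # minutes before sudo asks for the password again (0 = always ask)
Defaults    logfile="/var/log/sudo.log"  # log every sudo command to this file
```

With a logfile setting like this, the sudo activity could be viewed with, for example, sudo cat /var/log/sudo.log.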
In RHEL 7 and its clones there was a change in how the interfaces on the server are named. Traditionally the network interfaces were named eth[0123]. With the release of version 7 there are five different ways (schemes) of naming the interfaces, as long as biosdevname is enabled (the default). The naming process cascades through the schemes from 1 to 5:
Scheme 1: BIOS or firmware indexing for onboard devices (eno)
Scheme 2: BIOS or firmware indexing for PCI Express hotplug devices (ens)
Scheme 3: Naming by physical location of the device (enp)
Scheme 4: Naming by MAC address (enx)
Scheme 5: If all else fails, use traditional naming (eth).
From this list you can see that even though I don't have any IPv6 in my network, the default is still to configure link-local IPv6 addresses for each NIC. These link-local addresses can be viewed as localhost IP addresses, there just to ensure that some local processes have an IPv6 address to talk to. I don't want IPv6 on my network, so one of the first things I will do is to disable IPv6 completely. This will be done when modifying the grub configuration file a little later.
To disable the new naming schemes for the interfaces, you need to modify the parameters used during the boot sequence.
NOTE: Prior to release 7.3 there was a compatibility issue between systemd and VMware, causing the interfaces to get ACPI index numbers during the boot sequence (actually, you would get a non-initialized parameter as the result of the ACPI index). If you were lucky your interface would get the same number after a boot, but you could also get a different interface name after a boot. This is fixed in release 7.3.
Source: https://thornelabs.net/2016/07/23/kickstart-centos-7-with-eth0-instead-of-predictable-network-interface-names.html
Uninstall NetworkManager
In order to replace the new notation of the network interfaces, we need to remove the NetworkManager package from the server.
Copy and alter the interface scripts accordingly. I also create a backup of the original file. (I also remove the quotation marks, basically because they look out of place.)
Modify grub
In order to re-establish the old notation of the network interfaces, we need to modify the command line for the boot loader.
The text file for the configuration of the boot loader is
/etc/default/grub
net.ifnames=0 biosdevname=0
Disable IPv6
Since I also want to disable IPv6, I need to append the following to the same line:
ipv6.disable=1
Modify the GRUB_CMDLINE_LINUX line by appending at the end, inside the quotation marks.
When you are done, GRUB_CMDLINE_LINUX should look something like this:
Now you are done with modifying the grub text file. Next you need to create the binary file which is read during the boot sequence.
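For illustration, the result could look like this (the pre-existing options are assumptions; keep whatever your file already has), followed by regenerating the compiled grub configuration:

```shell
# /etc/default/grub
# GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet net.ifnames=0 biosdevname=0 ipv6.disable=1"

# Rebuild the binary configuration that is read at boot (BIOS systems):
grub2-mkconfig -o /boot/grub2/grub.cfg
```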
Time synchronization
When setting up a cluster it is vital that the nodes in the cluster are time synchronized, to avoid performance degradation caused by unnecessary inode updates when the nodes are out of sync.
Red Hat recommends that you use the Precision Time Protocol (PTP) provided in the linuxptp package, but they also state that the cluster nodes must be time synchronized to within a few minutes. I like to think that this must be a fault in the documentation. If I am unable to keep my cluster nodes within a fraction of a second with normal NTP, I would say that I have a network issue that needs to be investigated and solved.
Since I am using VMware for my servers, my network interfaces do not support PTP.
Setting up chrony
In my network I have set up one NTP server (10.47.253.9). This server is synchronized from the outside world using a number of public NTP servers, but internally all my servers are synchronized from just this one server.
The configuration file for chrony is /etc/chrony.conf, and there are just a few lines I need to change for my needs.
Verify that you are now using your local server for time synchronization. To get an explanation of the different fields used in the output, use the -v parameter.
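A sketch of the change and the verification (the server address is mine from above; the pool lines to comment out are assumptions based on the default CentOS file):

```shell
# /etc/chrony.conf -- replace the default pool servers with the local NTP server:
#   #server 0.centos.pool.ntp.org iburst    <- comment out the four default entries
#   server 10.47.253.9 iburst               <- add the local server

systemctl restart chronyd
chronyc sources            # verify the local server is selected (marked ^*)
chronyc sourcestats -v     # -v adds an explanation of each field in the output
```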
Reboot
Configure firewalld
If you are still not getting the correct zones, that is a known issue. Don't worry, you have done the correct steps. We just need to tell the server that we really mean what we have just done.
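The firewalld steps in this section can be sketched like this (the zone and interface names are assumptions; the high-availability service covers the cluster ports):

```shell
firewall-cmd --permanent --zone=internal --change-interface=eth1
firewall-cmd --permanent --zone=internal --add-service=high-availability
firewall-cmd --reload
firewall-cmd --get-active-zones        # verify the zone/interface mapping

# Without NetworkManager the zone binding may not stick across restarts;
# setting it in the interface script as well makes the intent explicit:
echo 'ZONE=internal' >> /etc/sysconfig/network-scripts/ifcfg-eth1
systemctl restart network
```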
Almost finished.
systemd
For some time now, many Linux distributions have replaced the boot sequence based on SysV init with systemd.
As opposed to the System V init boot process, which was serial, the boot process most Linux systems use now is a multithreaded parallel boot process, which boots your system a lot faster. However, when you start a lot of processes in parallel, you risk that some dependencies are violated during the boot process unless you are careful with the configuration. By being careful I mean that the processes must be configured with the necessary dependencies in their startup configuration files.
This is exactly what happened with the iscsid process in the default installation. With my iscsi setup there is a missing dependency, which in some cases makes the service try to log in to the iscsi targets before the network is online.
Let me show you how this creates errors in the system, and how to correct the situation.
ISCSI
In an iSCSI network, each iSCSI element that uses the network has a unique and permanent iSCSI name and is assigned an address for access.
iSCSI Qualified Names (IQNs) are prefixed with iqn, followed by the date the naming authority was established and the reverse domain name of the naming authority, and postfixed with a unique identifier for the device.
In my case I am my own naming authority, and I am using the domain name example.com.
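For example, an initiator name following that structure could look like this (the date and the identifier are assumptions; on the node it is set in /etc/iscsi/initiatorname.iscsi):

```
iqn.2018-02.com.example:privcl01
```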
For these servers I am using my 171 network for storage. So, in order to be able to specify my storage routing, I need to configure a specific iscsi storage interface.
To find the needed information, extract the IP address and the hardware address of your NIC.
Then we need to create a pseudo iscsi interface for the storage network.
If you want a more readable output, try a different print (info) level.
The iser iscsi interface is a backward-compatibility interface name. Don't worry about it, and don't modify its settings. In fact, if you want to modify any iscsi interface you should create a new interface; if you modify the settings of either the default or the iser interface, you will break backward compatibility.
Give the pseudo NIC the hardware address from the device config, and finally give the pseudo NIC the IP address that corresponds to the NIC.
To get a full listing of the interface parameters, use print level 0 (-P0).
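The steps above can be sketched like this (the interface name, MAC and IP address are assumptions for this example):

```shell
ip -4 addr show eth1                      # IP address of the storage NIC
ip link show eth1                         # hardware (MAC) address of the NIC

iscsiadm -m iface -I iface171 --op=new    # create the pseudo iscsi interface
iscsiadm -m iface -I iface171 --op=update -n iface.hwaddress -v 00:50:56:aa:bb:cc
iscsiadm -m iface -I iface171 --op=update -n iface.ipaddress -v 171.20.16.101

iscsiadm -m iface -P1                     # more readable overview of all interfaces
iscsiadm -m iface -I iface171             # parameter listing for the new interface
```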
Now we are ready to connect to the iSCSI portal (the storage unit) to receive a list of the provided iSCSI targets.
Remember that when you run the discovery command against an iSCSI portal, it will give you all the resources of that portal on all the NICs the portal has.
My storage unit has 4 NICs, and even though I have specified that IPv6 is disabled on my storage unit, it still creates a default storage gateway on IPv6. So when I run the discovery command against my portal, I get a list containing multiple targets on all the storage unit's NICs.
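The discovery itself is a one-liner (the portal address is an assumption):

```shell
iscsiadm -m discovery -t sendtargets -p 171.20.16.3 -I iface171
```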
ISCSI targets
On my storage unit I have defined three targets which I will use as fencing devices in my cluster. Each of these targets is connected to a LUN of 1 GB (which is the smallest unit I can create). The LUNs are created in three different disk groups on my storage unit, so that if one of the disk groups or LUNs fails, there will still be two more to ensure the integrity of my cluster.
I have also created one target which is connected to a number of LUNs. This target is going to be my connection to the disks on which I will create my clustered GFS2 filesystems.
These are the targets defined on my iSCSI portal that I will use in my cluster:
In order to disable traffic over unwanted interfaces, I am going to modify these files. In my configuration I have specified that the iSCSI traffic goes over my iface171 iSCSI interface, which has a specified hardware address. But even so, if we let the iSCSI process run free it will try to log in to the portal with an inbound route on the 171 network and request data on the outbound routes of the 10, 172 and 173 networks (and over the IPv6 network, which really makes a mess of things).
So the default configuration is that the iscsi service is going to start each of the nodes on each of the IP addresses that the storage unit provides, even though our interface only provides a 171 address.
I still need my targets on the storage network. So I am going to change the config files back to automatic for the needed targets having an outbound route to the 171.20.16.3 address:
node.startup=automatic
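This can also be done with iscsiadm instead of editing the node record files by hand (the target name and portal are assumptions):

```shell
# First set every discovered record to manual ...
iscsiadm -m node --op=update -n node.startup -v manual
# ... then re-enable automatic startup only for the records on the storage network:
iscsiadm -m node -T iqn.2018-02.com.example:gfs2 -p 171.20.16.3:3260 \
         --op=update -n node.startup -v automatic
```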
Now I am ready to log in to the iSCSI portal to get the iSCSI targets.
Now I have logged in to the portal, and I have received the NAS-provided disks.
Now here comes the fun part. iSCSI "remembers" the state of the session across boots. So if you have an established iscsi session before a server boot, that session will try to re-establish itself when the server comes back up again.
However, after a boot my iscsi sessions were not re-established, which meant I had to go error-hunting.
I was able to re-establish an iscsi session manually by running an iSCSI login. Since I was able to log in to the portal, I conclude that there was nothing wrong with the iscsi configuration.
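The manual login that re-established the session looked something like this (the target and portal are assumptions):

```shell
iscsiadm -m node -T iqn.2018-02.com.example:gfs2 -p 171.20.16.3:3260 --login
iscsiadm -m session          # verify that the session is established
```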
I found the error in the messages log of the server. There it was apparent that the iscsi process tried to log in to the portal before the network was established. It was therefore clear that in this version of the iSCSI package a somewhat important omission had been made.
These are the unit files used for iscsi. I am going to modify the controlling file in order to control the boot sequence for the iscsi service.
I am going to modify the iscsi daemon configuration file by adding one line to the file. We need to insert a line in the iscsid.service file to specify that the service requires the network to be fully established.
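A sketch of the change, shown here as a systemd drop-in override (which survives package updates); the exact line the author added is not reproduced in the text, so the dependency target below is an assumption:

```
# /etc/systemd/system/iscsid.service.d/override.conf
# (created with: systemctl edit iscsid.service)
[Unit]
Requires=network-online.target
After=network-online.target
```

Run systemctl daemon-reload afterwards so systemd picks up the change.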
By specifying Requires in the unit file we introduce a slightly longer boot sequence: we now wait for a specific service to reach an established state before we initiate the start of the second service. In return we have gained sufficient control of the boot sequence to know that networking is established before the iscsid service starts.
VMware housekeeping
You are now ready to start setting up your cluster. This is where you should take a snapshot of your
servers.
The initial configuration of my cluster is that it is going to be a 4 node cluster for GFS2 using ISCSI as
storage.
When the cluster is up and running I am going to add two more nodes to the cluster.
I am going to set constraints on the two new nodes so that the GFS2 cluster resources are not presented to them.
Setting up a Linux cluster, once you have made sure that your network configuration is correct, is a matter of installing the packages and distributing the configuration files. Setting up a Linux cluster has become a lot easier over the years. Even if the core principle is much the same today as it was 10 years ago, the process is now a lot less manual and a lot less error-prone.
Remember that defining and administrating a cluster is an exercise in democracy. In a democracy the key principle is that the majority rules. A majority in a cluster is described by the Latin word quorum, a noun that loosely translates to the ability to hold a vote and arrive at a qualified majority.
In a Linux cluster each cluster member can be viewed as a senator, and each has one vote. There are some tricks we can perform to give one of the members an additional vote, but in general there is one vote per member.
But what must the President (the cluster) do if the senate becomes divided, and both factions claim to be the true senate? In a Linux cluster this situation is called a split-brain, and to avoid total chaos it must be prevented. This is where you would bring in a quorum device (a Vice President) who carries an extra vote. If the senate then becomes divided into two equal parts, whichever side has the Vice President on it wins.
If you have a system with many cluster members, the quorum device becomes inefficient, because the Vice President would be overworked by constantly needing to keep updated with each and every one of the senators. Therefore we establish the Republican Guard (the fencing device), which is responsible for kicking out misbehaving senators.
So, in a two-node cluster you would add a quorum device. This is a special device which also has one vote. A two-node cluster with a quorum device can be described as CS=(N+Q)-1.
In a cluster with more than 2 nodes you would normally not add a quorum device, because it does not scale very well. Instead you incorporate a fencing device, which is responsible for evicting faulty cluster members from the cluster, usually by initiating a reboot of the offending node.
From this table it is obvious that an odd number of cluster nodes is preferred. And in my opinion the number of nodes in the cluster should be higher than 4, to avoid being in a critical state whenever you lose, or do maintenance on, a single node in your cluster.
There is one parameter, auto_tie_breaker, which can be used to set up the cluster so that you are able to lose up to 50% of the nodes and still be operational. This parameter has a nasty history and in my opinion should not be used in a production environment.
If you have a small, expensive, even-numbered hardware cluster, you could instead consider adding an inexpensive virtual node to your cluster and create constraints so that this cluster node does not carry any cluster resources. Its only purpose would be to increase the quorum level. A server like this is often known as a quorum server.
Installing the needed packages is very easy. There is basically one package that has all the dependencies listed, so the only package you need to install is the pcs package. This package will take care of all the other packages you need.
Executing this command is going to give you a list of some 85+ dependencies that need to be installed.
In addition to the basic cluster packages, this cluster is going to be a GFS2 cluster, so we need to install a few more packages.
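The installs can be sketched as follows (the GFS2-related package names are assumptions based on the RHEL 7 add-ons):

```shell
yum install -y pcs                       # pulls in pacemaker, corosync and the ~85 dependencies
yum install -y gfs2-utils lvm2-cluster   # GFS2 tools and clustered LVM
```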
Before you start setting up encryption, remember that this is encryption of the control traffic between the nodes. The traffic you are going to encrypt is the "Hey, I'm still here" traffic. This ensures that a man-in-the-middle attacker has no possibility of fencing your nodes, but it will not encrypt the data stored on any of your filesystems.
There is a small penalty in CPU cycles, but that is not really where it is going to hit you the most. By encrypting the control traffic between the cluster nodes you are stepping away from most of the standard documentation you will find online, and chances are that you are going to forget a step when you are adding new nodes to your cluster.
Most of the standard cluster documentation describes a procedure for adding new nodes to the cluster where, when you start the pcsd service on the new node, the system automatically synchronizes the configuration files to the new node.
Well, guess what: the encryption key file is not part of the configuration that is synced. So in your routines you need to make sure that you copy the authkey file from one of the existing cluster nodes to the new node before you start the cluster services on the new node.
When generating the authkey, the process depends on input from you to generate the entropy for the key generation. This means that you need to start typing, moving the mouse, and jumping up and down in a random fashion while corosync-keygen is running. As an alternative, you can start a process on the machine that generates the needed random traffic.
Now go back to the terminal window where you opened the dd process, and break that process by pressing Ctrl+C.
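A sketch of the two terminals (the dd command and device name are assumptions for the entropy-generating process the text refers to):

```shell
# Terminal 1: generate the key (blocks until enough entropy is available):
corosync-keygen                    # writes /etc/corosync/authkey

# Terminal 2: create disk traffic to feed the entropy pool; Ctrl+C when keygen is done:
dd if=/dev/sda of=/dev/null bs=1M
```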
You need to copy this file to the other nodes that will be part of your cluster.
When you install the cluster packages (yum install pcs), the installation creates the hacluster user with a disabled password.
This user is used for cluster intercommunication, so you need to set the same password for the user on all nodes in your cluster.
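For example (run on every node; the password is obviously just a placeholder):

```shell
echo 'ClusterSecret1' | passwd --stdin hacluster
```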
As you can see from the corosync.conf cluster configuration file, the cluster has no knowledge of the key file. We need to modify the configuration file and push it out to the nodes.
On one of the cluster nodes, add two lines to the configuration file.
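The two lines themselves are not reproduced in the text, so the following is an assumption of what a crypto-enabled totem section can look like, followed by pushing the file out:

```
# /etc/corosync/corosync.conf
totem {
    ...
    crypto_cipher: aes256
    crypto_hash: sha1
}

# Then sync the configuration to all nodes and restart the cluster:
#   pcs cluster sync
#   pcs cluster stop --all && pcs cluster start --all
```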
The reason for the high load on the system is that the corosync process runs at a very high priority. So when a situation occurs where the cluster processes clvmd and libqb are spinning out of control, corosync has such high priority that it is scheduled before other system tasks like networking. It may even block critical kernel tasks.
The workaround to avoid this situation is to instruct the corosync process to run with the standard realtime priority instead of the highest realtime priority, by adding -p to the /etc/sysconfig/corosync file.
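The change is one line in the sysconfig file (a sketch; check the corosync(8) man page for the exact flag semantics on your version):

```
# /etc/sysconfig/corosync
COROSYNC_OPTIONS="-p"
```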
Setting up fencing
Fencing agents
You need to know what type of fencing mechanism you are going to use in your cluster. You may, naturally, install all the fencing agents to avoid having to make a decision now, or you can install only the agents you need.
To install only the fencing agents you need, you can list the available fencing agents. Or, if you already know which agent you are going to use, install only that agent. I am only going to install the scsi fencing agent.
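A sketch of both options (the package names follow the RHEL 7 fence-agents split):

```shell
yum list available 'fence-agents-*'   # list the individual fencing agent packages
yum install -y fence-agents-scsi      # install only the scsi fencing agent
yum install -y fence-agents-all       # or: install everything
```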
In my configuration I will be using scsi fencing. Scsi fencing is based on a distributed totem where each node will continually update a key in order for the system to establish that the cluster node is still present.
If you run the command vgdisplay you will see the devices with the status "clustered yes". An empty device list in the definition of the scsi fencing device will protect all devices which are clustered.
If you opt to name the devices in the definition of your fencing device and you miss one or more devices, you risk that the filesystems on those devices become corrupt.
If you are going to use multipath for your storage there are some additional steps you should take.
1. Each of your cluster nodes must be identified by a cluster-unique key (I use the cluster node number as the key, i.e. privcl01 has key 1, privcl02 has key 2, etc.)
2. You must change the path_checker from the default value of "tur" to "readsector0" in the devices section.
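A hedged fragment of the multipath change (step 2); the vendor and product values are placeholders for your storage:

```shell
# /etc/multipath.conf (fragment) - vendor/product are placeholders
devices {
    device {
        vendor       "MyVendor"
        product      "MyProduct"
        path_checker readsector0
    }
}
```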
In the cluster documentation they are a little cavalier with the use of the term "device", which can lead to some misunderstanding. The documentation says you need a separate device for each node. The word device in this case means the software fencing thingy. You will create the software fencing gizmo on the same shared disk devices for all nodes, but the fencing devices will be unique for each of the nodes, based on the value of the key and the pcmk_host_list of each of the defined fencing devices. The value of the pcmk_host_list parameter must be the name of the cluster node.
So for each of your cluster nodes you would define (the commands can be run on one of the nodes):
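A hedged sketch of the per-node definitions; the resource names, node names, and key values follow my lab convention, and the key parameter is only relevant for the multipath setup described above:

```shell
# One fence_scsi device per node. pcmk_host_list names the node the device
# controls; "meta provides=unfencing" is mandatory for fence_scsi.
pcs stonith create scsi-shooter1 fence_scsi \
    pcmk_host_list="privcl01" key=1 meta provides=unfencing
pcs stonith create scsi-shooter2 fence_scsi \
    pcmk_host_list="privcl02" key=2 meta provides=unfencing
```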
Explaining the parameters used. There are additional parameters that can be set, but keeping it simple is good advice.

Parameter: pcmk_host_list
Description: A space-separated list of the host names of the nodes that will be controlled by this fencing device. This parameter is listed as optional, but if your cluster traffic is running on a secondary network, as in this example, your cluster may have problems finding the shortnames of the cluster nodes. Remember to update this parameter if you add nodes to your cluster.

Parameter: meta provides=unfencing
Description: With the fence_scsi agent you must specify "meta provides=unfencing" to make sure the node is allowed to re-enter the cluster after a reboot.
Notes on some additional parameters you will often find in other documents online:
Because the fence_scsi device is limited to a soft removal of the node's disk access from the cluster resources, and does not reboot the server, I also need to install a watchdog service which will be my kill switch. If you do not implement a separate kill switch you risk filesystem corruption on your GFS2 filesystem. In addition, there is no way to re-enter the node into the cluster without a reboot.
This is especially true if you have an I/O-intensive system and you use the default mount option of the GFS resource, "withdraw". With the withdraw option it can take a long time for the file access to be removed, long after the node has been fenced and is out of the cluster's control.
Watchdog
Note: There are hardware devices on the market which are known as watchdog devices. They are not to be mistaken for the software watchdog that I am setting up here.
Make sure that you have set the kernel panic parameters correctly.
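A hedged sketch of the panic parameters (the file name and the 5-second timeout are my own choices):

```shell
# Reboot 5 seconds after a kernel panic instead of hanging, and panic on oops.
cat > /etc/sysctl.d/99-kernel-panic.conf <<'EOF'
kernel.panic = 5
kernel.panic_on_oops = 1
EOF
sysctl --system
```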
In order for the watchdog.service to be able to monitor the fence_scsi process we need to create a symlink into the /etc/watchdog.d directory. Comparing the two check scripts will show that there are no differences, but according to the documentation you should link the fence_scsi_check script.
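The symlink itself, assuming the script location used on RHEL 7:

```shell
# Let the watchdog daemon run the fence_scsi check script.
mkdir -p /etc/watchdog.d
ln -s /usr/share/cluster/fence_scsi_check /etc/watchdog.d/fence_scsi_check
systemctl enable watchdog.service
systemctl start watchdog.service
```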
The fencing mechanism I am setting up here is a software watchdog service. It is set up outside of the cluster and is not known to the cluster, so just accept that the cluster property says there is no watchdog.
If you want to verify the incompatibility between the watchdog service for the fence_scsi and selinux
in enforcing mode you may do that now.
The current selinux-policy does not allow the watchdog daemon to execute a script with the context that exists on the fence_scsi watchdog scripts, and SELinux also does not allow several actions carried out by those scripts. The result is that watchdog fails to run the enabled script, which triggers a reboot.
If you are affected by the bug you should see immediate results in your log terminal: the fence_scsi_check_hardreboot script returns 1, followed by an immediate reboot of the server.
I know I said that the fence-agents-vmware-soap agent does not work on the free version of VMware ESXi, which is what I have. However, having a fencing agent that returns an error is a good thing, because now I can show you how to use fencing levels.
Fencing levels are a method of setting up a cascading fencing mechanism: if the first fencing device fails, continue to the next level. As mentioned, the fence-agents-vmware-soap agent will not work, so I will set it up as my first fencing mechanism. It will fail (every time), which means the fencing will continue to the next level. The next level in this case is my fence_scsi devices, which I know work as designed.
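A hedged sketch of the levels for one node; the resource names are my own:

```shell
# Level 1: the vmware shooter (known to fail on free ESXi);
# level 2: the scsi device that actually works.
pcs stonith level add 1 privcl01 vmware-shooter
pcs stonith level add 2 privcl01 scsi-shooter
pcs stonith level    # show the configured levels
```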
On both my ESXi servers I have defined a user which has administrative privileges on these servers.
Username : cerebrum
Password: MyLittleSecret
Test the fencing mechanism outside of the cluster with just a status call, to verify that I am able to log in to the systems using my cerebrum user.
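The status call could look like this; the ESXi host name (esxi01) and the guest name are placeholders from my lab:

```shell
# Verify the login and that the agent can see the guest, outside the cluster.
fence_vmware_soap --ip esxi01 --username cerebrum --password MyLittleSecret \
    --ssl --ssl-insecure --action status --plug privcl01
```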
Since I am on a secondary network with a shortname that corresponds to neither the hostname of the server nor the name of the VMware guest as it is registered on my ESXi hosts, I need to create a mapping between the shortname and the VMware guest name. To do that, use the pcmk_host_map property.
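A hedged sketch of one vmware shooter with the host map; the ESXi host name and the guest names as registered in vCenter/ESXi are placeholders:

```shell
# pcmk_host_map maps cluster shortname to VMware guest name
# ("shortname:guestname", entries separated by ";").
pcs stonith create vmware-shooter1 fence_vmware_soap \
    ipaddr="esxi01" login="cerebrum" passwd="MyLittleSecret" \
    ssl=1 ssl_insecure=1 \
    pcmk_host_map="privcl01:privcl01-guest;privcl02:privcl02-guest"
```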
Because I have two ESXi servers I am setting some constraints, just to make sure the fencing agent for a guest runs on a node that is not on the same VMware host as the guest itself.
So now, in theory, I should be able to fence one of my nodes, and it would be powered off through the fence_vmware_soap API.
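Such a constraint can be sketched as (node names are placeholders for the guests running on the same ESXi host as the shooter's targets):

```shell
# Keep the shooter for the guests on esxi01 away from the nodes
# that themselves run on esxi01.
pcs constraint location vmware-shooter1 avoids privcl01 privcl02
```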
To verify that the vmware shooters fail, I need to remove the scsi-shooter so that it does not just complete and interfere with my reasoning.
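A hedged sketch of the test; the device and node names are from my lab:

```shell
# Remove the working scsi device, then fence a node and watch
# the vmware shooter fail in the logs.
pcs stonith delete scsi-shooter
pcs stonith fence privcl02
```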
So now I have two vmware shooters which are configured to handle the vmware guests on each of
the ESXi servers.
From this first stage of the experiment I can see that finding the VM guest returns Success, but turning off the power returns an error. The error is caused by the SOAP API being a restricted (free) version. So because fence_vmware_soap fails, my test is successful.
By the way, you may notice that the cluster resources stayed online when I deleted the scsi-shooter. If there had been only one fencing device and you deleted it, all the cluster resources would stop.
As it is configured now, if I fence any of my nodes the system will first try the vmware shooter, and if that fails it will continue with the scsi shooter.
So this works as expected. I still get the "Throw vim.fault.RestrictedVersion" error message in the logfiles on my ESXi server, but the node that I am fencing does go down.
GFS2
Set up the start constraint for the two resources we have created: first the lock manager (dlm), then clvmd.
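The constraints could be sketched like this, assuming the usual cloned dlm/clvmd resource names:

```shell
# dlm must start before clvmd, and they must run on the same node.
pcs constraint order start dlm-clone then clvmd-clone
pcs constraint colocation add clvmd-clone with dlm-clone
```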
Let's see how that turned out by running the pvscan or pvs command on any or all of the nodes in the cluster.
Create the volume group. This is done at the operating-system level and is not very different from what you have always done.
The exception might be the --autobackup parameter, which you are normally advised to switch off. I have not found any documentation as to why it is set to yes here, but it might be because this is going to be a clustered volume group. I will not comment on why you should use the --clustered y parameter.
I am going to use all available space in the volumegroup when creating the logical volume.
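A hedged sketch; the device names and the VG/LV names are from my lab:

```shell
# Clustered volume group with autobackup on, then an LV using all free space.
vgcreate --autobackup y --clustered y clustervg /dev/sdg /dev/sdh
lvcreate --extents 100%FREE --name gfsvol clustervg
```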
In the GFS2 cluster we are using the dlm locking mechanism [-p lock_dlm] [-t clustername:locktablename] [-j number of journals].
The first part of the lock table name must be the name of the cluster; otherwise you will not be able to mount the filesystem in the cluster.
The number of journals: set this to the number of nodes in your cluster. We are going to add some nodes to our cluster later, so we will add more journals to the filesystem when we add the nodes.
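The mkfs call could then look like this; "privcluster" stands in for your cluster name and the VG/LV names are my lab placeholders:

```shell
# 4 journals for a 4-node cluster; the part before ":" must match
# the cluster name.
mkfs.gfs2 -p lock_dlm -t privcluster:gfsvol -j 4 /dev/clustervg/gfsvol
```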
For someone who is used to working with Oracle databases, the journals in the filesystem can be seen as redo logs. The journals are the mechanism which keeps the filesystem consistent and intact after a crash. With GFS2 the journals take up space inside your filesystem, by default 128MB per journal. So with 4 nodes you are using 128MB x 4 = 512MB for internal handling. You can deviate from the standard and reduce the journal size down to the minimum of 32MB. For a small filesystem like mine that might make some sense, but I will just keep the default value.
This is the exciting point where you actually add the GFS2 resource to your cluster.
This is the moment of truth. This is where you will see that you have a GFS2 cluster.
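A hedged sketch of adding the filesystem resource; the device path and mount point are placeholders, while gfsvolfs is the resource name used in this document:

```shell
# Cloned Filesystem resource so the GFS2 filesystem is mounted on all nodes.
pcs resource create gfsvolfs Filesystem \
    device="/dev/clustervg/gfsvol" directory="/gfsvol" fstype="gfs2" \
    options="noatime" clone interleave=true
```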
You may have noticed that in the previous section I did not use all of my physical volumes when I created the gfsvolfs resource. That was intentional: I did it so that I can now show you how to add more disk to your cluster right away.
As you can see here, the physical volume /dev/sdi is not assigned to any volume group (VG). I can now choose to create a new volume group or extend the existing one. Let's just go ahead and extend the existing volume group.
Extend the VG
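Extending could be sketched as follows; /dev/sdi is the unused PV mentioned above, the other names are my lab placeholders:

```shell
# Add the unused PV to the existing VG, grow the LV, then grow the
# mounted GFS2 filesystem.
vgextend clustervg /dev/sdi
lvextend --extents +100%FREE /dev/clustervg/gfsvol
gfs2_grow /gfsvol
```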
Add the constraint so that clvmd starts before the GFS filesystem.
Remember that we set the number of journals on the filesystems to the number of nodes that existed in our cluster at the time. You can find the number of journals defined in a gfs2 filesystem with the following commands. This must be done for each of the filesystems that are managed by the cluster.
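A hedged sketch, assuming my lab names for the LV and mount point:

```shell
# Count the journals in the filesystem, then add one for the new node.
gfs2_edit -p jindex /dev/clustervg/gfsvol | grep journal
gfs2_jadd -j 1 /gfsvol
```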
- Minimal installation
- Updated to the correct level
- Network configuration
- Firewall configuration
- Time synchronization
- Installed packages: iscsi, pcs, gfs2-utils, lvm2-cluster, fence-agents-scsi, fence-agents-vmware-soap, watchdog
- Configured the service unit files
- Locking type = 3
We need to set the password of the hacluster user to the same password that is used on the other cluster nodes.
Make sure the lvm locking_type is set to 3 on the new node; if it is not, you need to change the parameter.
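The locking change can be sketched as:

```shell
# Enable cluster-wide LVM locking (sets locking_type = 3 in lvm.conf),
# then verify the result.
lvmconf --enable-cluster
grep "locking_type" /etc/lvm/lvm.conf
```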
Now you must authorize the new node into the cluster.
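Authorization and adding could look like this; privcl06 is my new number 6 node:

```shell
# From an existing cluster node: authorize, add, and start the new node.
pcs cluster auth privcl06 -u hacluster
pcs cluster node add privcl06
pcs cluster start privcl06
```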
Because the new cluster node is not included in any fencing mechanism, the services will not start. It is possible to modify the fencing device, but the downside of doing that is that all services currently protected by the fencing device will be stopped.
In my case I have multiple fencing devices protecting my resources. What I can do, without stopping the resources, is delete the fencing devices and recreate them one by one with the new node included.
What I am going to do is go through the routine one more time, and then I will show you that there is a node number 6 in my cluster when I am done.
Even though one of my cluster nodes is now out of the cluster, the number 6 node still takes part in the quorum calculation.
If I would like to make sure that the GFS2 filesystems will never run on my number 6 node, I could ban the GFS2 resources. But since we have already set up that the dlm resource must start on a node before any GFS2 resources are started, I am going to ban the dlm resource instead.
The warning text is telling you that you have banned the resource forever, and this may have consequences if all the other nodes go down.
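The ban itself is a one-liner; the resource name assumes the common dlm-clone convention:

```shell
# Keep dlm, and therefore any GFS2 filesystem, off the new node.
pcs resource ban dlm-clone privcl06
```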
To get the resource id we need to run the constraint command with the --full parameter.
Since my cluster is an opt-in cluster where all nodes are able to run all resources, there are not very many resources to move. But to show how it is done, I am going to move the scsi-shooter resource. You should take care with moving resources, because the move creates an INFINITY location constraint.
So let’s have some fun. I have moved the scsi-shooter resource to my privcl04 node.
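The move itself can be sketched as:

```shell
# Move the fencing resource to the number 4 node.
pcs resource move scsi-shooter privcl04
```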
Rest easy, your cluster remains operational and the fencing resource is still relocated to a different cluster node, but when the number 4 node is re-introduced to the cluster, the fencing device is relocated back to the number 4 node.
Therefore I am going to delete the move constraint for the fencing device.
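A hedged sketch; pcs names the constraint created by a move cli-prefer-&lt;resource&gt;:

```shell
# Find the location constraint created by the move, then remove it by id.
pcs constraint --full | grep cli-prefer
pcs constraint remove cli-prefer-scsi-shooter
```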
When I examine the cluster I see that 6 nodes is one too many, so I am going to remove the number 6 node from the cluster.
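Removal could be sketched as:

```shell
# Stop the node and remove it from the cluster configuration.
pcs cluster node remove privcl06
```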
That’s it