Proof of concept
February 2018
Contents

About the writer
Preface
Disclaimer
Credits
Current environment
    My home test environment
Initial installation
    Common server configuration
Initial warnings
Restrict the server (part 1), ssh access
    Repo server
Update the yum config
Update the server
Restrict the server (part 2), Restrict host access
    TCP_WRAPPERS
    Linux jump host
Restrict the server (part 3), Restrict sudo
Preparing the server
    Establish old notation of network interfaces
    Uninstall NetworkManager
    Alter the network interface card configuration files
Modify grub
    Disable IPv6
Time synchronization
    Setting up chrony
    sudo chronyc sourcestats -v
Reboot
Configure firewalld
systemd
ISCSI
    Install iscsi package
    Naming the iSCSI initiator
    Connect to iscsi target
    Modify iscsi service
About the writer

I have worked on most of the common platforms (HP-UX, AIX, Tru64, Linux and Windows) as a system administrator and as an Oracle database administrator since 1991 (Oracle database version 6), and as an SAP Basis administrator since 1998. I have worked in Norwegian governmental organizations and in the Norwegian private sector.
When I am not clonking on a keyboard, chances are that I am out on a bike ride. You can follow me on Strava:
https://www.strava.com/athletes/2233700
Preface
When I write documents that I plan to publish on the Internet, I like to write about something that has not been covered much before.
This time I am writing about Linux clustering, a topic on which a very large number of documents already exists on the Internet. I am writing this document anyway because clustering is something that is constantly evolving. Also, many of the documents you can find online are about two-node clusters, and there are some differences between a two-node cluster and a cluster that consists of more than two nodes.
I am going to show you how to build a four-node cluster, which I will then expand to a six-node cluster, and finally I am going to remove one node from the cluster.
In this document, I have altered some output to make it more readable. Mostly this means that I have removed whitespace so that the output fits on a single line where that has been possible.
Disclaimer
This document was written as a proof of concept to show the possibilities of a certain
configuration. The writer or anyone associated with the writer cannot be made liable for any
damage to your systems nor can they be made liable for any loss of data if you choose to
follow the concept shown in this document.
Trademarked names, logos, and images may appear in this document. Rather than use a trademark symbol with every occurrence of a trademarked name, logo or image, I use the names, logos or images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The use in this document of trade names, trademarks, service marks, and similar terms, even if they
are not identified as such, is not to be taken as an expression of opinion as to whether or not they are
subject to proprietary rights.
While the advice and information in this document are believed to be true and accurate at the date of
writing, neither the writer nor anyone associated with the writer can accept any legal responsibility for
any errors or omissions that may be made.
Neither the writer nor anyone associated with the writer makes any warranty, express or implied, with respect to the material contained within this document.
The comments and statements made in this document are solely those of the writer and do not reflect the views or opinions of the writer's employer or anyone associated with the writer.
Credits
Linux is a registered trademark of Linus Torvalds.
Current environment
My home test environment

[Figure: my home test environment, showing the NTP and DNS servers (amygdala, cerebellum) and the two ESXi hosts (sinister, dextro)]
In my configuration I have chosen to separate the different types of network traffic. If you are running your cluster on Fibre Channel, you would obviously reduce the number of network interfaces. In addition I have separated the traffic so that the cluster intercommunication runs on a separate network. This is not necessary; you can run the cluster private traffic over the same network as your client traffic. In the old days the cluster private interconnect was called a heartbeat, and there was even a process called heartbeat. The cluster private interconnect is now handled by the corosync process.
The cluster interconnect is quite chatty, and it is very time critical for cluster responses. If your system is I/O intensive to such an extent that you risk creating latency on a single network, I advise that you move the cluster private interconnect onto a separate network.
Initial installation
All the servers in this document are VMware guest servers located on my ESXi host servers.
Odd-numbered guests are located on the sinister host, and even-numbered guests are located on the dexter host.
Both the public name and the private name are registered in my DNS.
The IP addresses of the additional NICs follow the last octet of the public IP address. So a server with the IP address 10.47.253.101 will also have the IP addresses 171.20.16.101 and 172.16.20.101.
On each of my servers I have created one additional user apart from root. The user was created during the installation, and the user is an administrator (a member of the wheel group).
During the installation and configuration I will keep to the default settings as much as possible. I have made some exceptions where it has been necessary to implement the configuration I want to demonstrate, and some alterations in order to restrict access to the server.
Initial warnings
I will repeat these warnings at the appropriate location within the document. I just list some of them
here to give you a heads up before you start reading.
History lesson: I once tried to remove all core dump files from a running production system. I had a
word document with the command
characterset difference caused the – before the name parameter to be void. The resulting command
turned out to be
I almost put a company out of business that day. The system was saved because the find started in
the /bin directory making the find command unable to fork and delete the rest of the system.
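The original commands are not reproduced here, but the failure mode can be illustrated with a hypothetical reconstruction (deliberately left as comments; do not run these):

```shell
# Intended command (ASCII hyphen before "name"):
#   find / -name "core*" -type f -exec rm -f {} \;
#
# A word processor's "smart" dash is not an ASCII hyphen, so find no longer
# sees a -name test. The stray words are then treated as start directories,
# and -exec rm runs on everything found beneath them:
#   find / name "core*" -type f -exec rm -f {} \;
```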
Warning:
I am not a big fan of disabling security measures. So because I am going to reduce security by running SELinux in permissive mode, I am going to increase security on my servers by other means.
Repo server
If you are planning an environment where you will be adding new servers into your cluster configuration, it is wise to be able to control the configuration and the versions of the packages installed on your systems, in order to have consistency over time.
If you don’t have a local yum repository server you can skip the configuration changes I am outlining
here.
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

#released updates
[updates]
name=CentOS-$releasever - Updates
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

#additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=centosplus&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/centosplus/$basearch/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
Change each occurrence of the mirrorlist URL to the URL of your local server:
baseurl=http://<reposerver>/centos/$releasever/updates/$basearch/
Since I am in control of my package repository, I now know that the installed packages are the same on all of my installations, and it is time to start the modifications that prepare the server to become a cluster server.
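The mirrorlist-to-baseurl change can also be scripted. Here is a sketch, demonstrated on a copy of the file; on a real system the target would be /etc/yum.repos.d/CentOS-Base.repo, and the repo server name is an assumption:

```shell
# Create a small sample of the repo file to demonstrate on:
cat > /tmp/CentOS-Base.repo <<'EOF'
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
gpgcheck=1
EOF

# Swap every mirrorlist line for a baseurl pointing at the local repo server,
# keeping a .bak backup of the original file:
sed -i.bak 's|^mirrorlist=.*$|baseurl=http://reposerver.example.com/centos/$releasever/os/$basearch/|' /tmp/CentOS-Base.repo

grep '^baseurl=' /tmp/CentOS-Base.repo
```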
TCP_WRAPPERS
As part of the update from the vanilla installation (from the DVD), our system now also has tcp_wrappers installed, which allows me to restrict which hosts are allowed to connect to my server.
With these settings the only ssh connection into my server is from my jump host, and I have disabled direct root access over ssh.
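A minimal sketch of such a restriction (the jump host address is an assumption for this example):

```
# /etc/hosts.allow
sshd : 10.47.253.10

# /etc/hosts.deny
sshd : ALL

# /etc/ssh/sshd_config
PermitRootLogin no
```

Remember to restart sshd (systemctl restart sshd) after changing sshd_config; hosts.allow and hosts.deny are evaluated on every new connection.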
Do a reboot now to start the server on the new kernel (there was a new kernel in the update).
Now I will restrict things so that no user is allowed to become root by using su or sudo -i. I am also restricting things so that no user will be able to modify the /etc/sudoers file. Users must enter their password for sudo commands if more than 3 minutes have passed since the last time they entered it (set the timeout to 0 to enforce a password for every command; make sure you are either good friends with your administrator colleagues, or not friends at all, before you set it to 0). And all sudo commands are logged.
To view the log of activities where the sudo command has been used, run:
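A sketch of /etc/sudoers entries matching the description above (edit with visudo; the timeout value and log file path are assumptions):

```
Defaults    timestamp_timeout=3          # minutes before sudo asks for the password again (0 = always ask)
Defaults    logfile="/var/log/sudo.log"  # log every sudo command to this file
```

With a logfile setting like this, the sudo activity could be viewed with, for example, sudo cat /var/log/sudo.log.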
In RHEL 7 and its clones there was a change in how the interfaces on the server are named. Traditionally the network interfaces were named eth[0123]. With the release of version 7 there are five different ways (schemes) of naming the interfaces, as long as biosdevname is enabled (the default). The naming process cascades through the schemes from 1 to 5:
Scheme 1: BIOS or firmware indexing for onboard devices (eno)
Scheme 2: BIOS or firmware indexing for PCI Express hotplug devices (ens)
Scheme 3: Naming by physical location of the device (enp)
Scheme 4: Naming by MAC address (enx)
Scheme 5: If all else fails, use traditional naming (eth).
From this list you can see that even though I don't have any IPv6 in my network, the default is still to configure link-local IPv6 addresses for each NIC. These link-local addresses can be viewed as localhost IP addresses, there just to ensure that some local processes have an IPv6 address to talk to. I don't want IPv6 on my network, so one of the first things I will do is to disable IPv6 completely. This will be done when modifying the grub configuration file a little later.
To disable the new naming schemes for the interfaces, you need to modify the parameters used during the boot sequence.
NOTE: Prior to release 7.3 there was a compatibility issue between systemd and VMware, causing the interfaces to get ACPI index numbers during the boot sequence (actually, you would get a non-initialized parameter as the result of the ACPI index). If you were lucky your interface would get the same number after a boot, but you could also get a different interface name after a boot. This is fixed in release 7.3.
Source: https://thornelabs.net/2016/07/23/kickstart-centos-7-with-eth0-instead-of-predictable-network-interface-names.html
Uninstall NetworkManager
In order to replace the new notation of the network interfaces, we need to remove the NetworkManager package from the server.
Copy and alter the interface scripts accordingly. I also create a backup of the original file. (I also remove the quotation marks, basically because they look out of place.)
Modify grub
In order to re-establish the old notation of the network interfaces, we need to modify the command line for the boot loader.
The text file for the configuration of the boot loader is
/etc/default/grub
net.ifnames=0 biosdevname=0
Disable IPv6
Since I also want to disable IPv6, I need to append the following to the same line:
ipv6.disable=1
Modify the GRUB_CMDLINE_LINUX line by appending at the end, inside the quotation marks.
When you are done, GRUB_CMDLINE_LINUX should look something like this:
Now you are done with modifying the grub text file. Next you need to create the binary file which is read during the boot sequence.
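For illustration, the result could look like this (the pre-existing options are assumptions; keep whatever your file already has), followed by regenerating the compiled grub configuration:

```shell
# /etc/default/grub
# GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet net.ifnames=0 biosdevname=0 ipv6.disable=1"

# Rebuild the binary configuration that is read at boot (BIOS systems):
grub2-mkconfig -o /boot/grub2/grub.cfg
```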
Time synchronization
When setting up a cluster it is vital that the nodes in the cluster are time synchronized, to avoid performance degradation caused by unnecessary inode updates when the nodes are out of sync.
Red Hat recommends that you use the Precision Time Protocol (PTP) provided in the linuxptp package, but they also state that the cluster nodes must be time synchronized to within a few minutes. I like to think that this must be a fault in the documentation. If I am unable to keep my cluster nodes within a fraction of a second with normal NTP, I would say that I have a network issue that needs to be investigated and solved.
Since I am using VMware for my servers, my network interfaces do not support PTP.
Setting up chrony
In my network I have set up one NTP server (10.47.253.9). This server is synchronized from the outside world using a number of public NTP servers, but internally all my servers are synchronized from just this one server.
The configuration file for chrony is /etc/chrony.conf, and there are just a few lines I need to change for my needs.
Verify that you are now using your local server for time synchronization. To get an explanation of the different fields used in the output, use the -v parameter.
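A sketch of the change and the verification (the server address is mine from above; the pool lines to comment out are assumptions based on the default CentOS file):

```shell
# /etc/chrony.conf -- replace the default pool servers with the local NTP server:
#   #server 0.centos.pool.ntp.org iburst    <- comment out the four default entries
#   server 10.47.253.9 iburst               <- add the local server

systemctl restart chronyd
chronyc sources            # verify the local server is selected (marked ^*)
chronyc sourcestats -v     # -v adds an explanation of each field in the output
```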
Reboot
Configure firewalld
If you are still not getting the correct zones, that is a known issue. Don't worry, you have done the correct steps. We just need to tell the server that we really mean what we have just done.
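The firewalld steps in this section can be sketched like this (the zone and interface names are assumptions; the high-availability service covers the cluster ports):

```shell
firewall-cmd --permanent --zone=internal --change-interface=eth1
firewall-cmd --permanent --zone=internal --add-service=high-availability
firewall-cmd --reload
firewall-cmd --get-active-zones        # verify the zone/interface mapping

# Without NetworkManager the zone binding may not stick across restarts;
# setting it in the interface script as well makes the intent explicit:
echo 'ZONE=internal' >> /etc/sysconfig/network-scripts/ifcfg-eth1
systemctl restart network
```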
Almost finished.
systemd
For some time now, many Linux distributions have replaced the boot sequence based on SysV init with systemd.
As opposed to the System V init boot process, which was serial, the boot process most Linux systems use now is a multithreaded parallel boot process, which boots your system a lot faster. However, when you start a lot of processes in parallel, you risk that some dependencies are violated during the boot process unless you are careful with the configuration. By being careful I mean that the processes must be configured with the necessary dependencies in their startup configuration files.
This is exactly what happened with the iscsid process in the default installation. With my iscsi setup there is a missing dependency, which in some cases makes the service try to log in to the iscsi targets before the network is online.
Let me show you how this creates errors in the system, and how to correct the situation.
ISCSI
In an iSCSI network, each iSCSI element that uses the network has a unique and permanent iSCSI name and is assigned an address for access.
iSCSI Qualified Names (IQNs) are prefixed with iqn, followed by the date the naming authority was established and the reverse domain name of the naming authority, and postfixed with a unique identifier for the device.
In my case I am my own naming authority, and I am using the domain name example.com.
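For example, an initiator name following that structure could look like this (the date and the identifier are assumptions; on the node it is set in /etc/iscsi/initiatorname.iscsi):

```
iqn.2018-02.com.example:privcl01
```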
For these servers I am using my 171 network for storage. So, in order to be able to specify my storage routing, I need to configure a specific iscsi storage interface.
To find the needed information, extract the IP address and the hardware address of your NIC.
Then we need to create a pseudo iscsi interface for the storage network.
If you want a more readable output, try a different print (info) level.
The iser iscsi interface is a backward-compatibility interface name. Don't worry about it, and don't modify its settings. In fact, if you want to modify any iscsi interface you should create a new interface; if you modify the settings of either the default or the iser interface, you will break backward compatibility.
Give the pseudo NIC the hardware address from the device config, and finally give the pseudo NIC the IP address that corresponds to the NIC.
To get a full listing of the interface parameters, use print level 0 (-P0).
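The steps above can be sketched like this (the interface name, MAC and IP address are assumptions for this example):

```shell
ip -4 addr show eth1                      # IP address of the storage NIC
ip link show eth1                         # hardware (MAC) address of the NIC

iscsiadm -m iface -I iface171 --op=new    # create the pseudo iscsi interface
iscsiadm -m iface -I iface171 --op=update -n iface.hwaddress -v 00:50:56:aa:bb:cc
iscsiadm -m iface -I iface171 --op=update -n iface.ipaddress -v 171.20.16.101

iscsiadm -m iface -P1                     # more readable overview of all interfaces
iscsiadm -m iface -I iface171             # parameter listing for the new interface
```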
Now we are ready to connect to the iSCSI portal (the storage unit) to receive a list of the provided iSCSI targets.
Remember that when you run the discovery command against an iSCSI portal, it will give you all the resources of that portal on all the NICs the portal has.
My storage unit has 4 NICs, and even though I have specified that IPv6 is disabled on my storage unit, it still creates a default storage gateway on IPv6. So when I run the discovery command against my portal, I get a list containing multiple targets on all the storage unit's NICs.
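The discovery itself is a one-liner (the portal address is an assumption):

```shell
iscsiadm -m discovery -t sendtargets -p 171.20.16.3 -I iface171
```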
ISCSI targets
On my storage unit I have defined three targets which I will use as fencing devices in my cluster. Each of these targets is connected to a LUN of 1 GB (which is the smallest unit I can create). The LUNs are created in three different disk groups on my storage unit, so that if one of the disk groups or LUNs fails, there will still be two more to ensure the integrity of my cluster.
I have also created one target which is connected to a number of LUNs. This target is going to be my connection to the disks on which I will create my clustered GFS2 filesystems.
These are the targets defined on my iSCSI portal that I will use in my cluster:
In order to disable traffic over unwanted interfaces, I am going to modify these files. In my configuration I have specified that the iSCSI traffic goes over my iface171 iSCSI interface, which has a specified hardware address. But even so, if we let the iSCSI process run free it will try to log in to the portal with an inbound route on the 171 network and request data on the outbound routes of the 10, 172 and 173 networks (and over the IPv6 network, which really makes a mess of things).
So the default configuration is that the iscsi service is going to start each of the nodes on each of the IP addresses that the storage unit provides, even though our interface only provides a 171 address.
I still need my targets on the storage network. So I am going to change the config files back to automatic for the needed targets having an outbound route to the 171.20.16.3 address:
node.startup=automatic
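This can also be done with iscsiadm instead of editing the node record files by hand (the target name and portal are assumptions):

```shell
# First set every discovered record to manual ...
iscsiadm -m node --op=update -n node.startup -v manual
# ... then re-enable automatic startup only for the records on the storage network:
iscsiadm -m node -T iqn.2018-02.com.example:gfs2 -p 171.20.16.3:3260 \
         --op=update -n node.startup -v automatic
```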
Now I am ready to log in to the iSCSI portal to get the iSCSI targets.
Now I have logged in to the portal, and I have received the NAS-provided disks.
Now here comes the fun part. iSCSI "remembers" the state of the session across boots. So if you have an established iscsi session before a server boot, that session will try to re-establish itself when the server comes back up again.
However, after a boot my iscsi sessions were not re-established, which meant I had to go error-hunting.
I was able to re-establish an iscsi session manually by running an iSCSI login. Since I was able to log in to the portal, I conclude that there was nothing wrong with the iscsi configuration.
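The manual login that re-established the session looked something like this (the target and portal are assumptions):

```shell
iscsiadm -m node -T iqn.2018-02.com.example:gfs2 -p 171.20.16.3:3260 --login
iscsiadm -m session          # verify that the session is established
```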
I found the error in the messages log of the server. There it was apparent that the iscsi process tried to log in to the portal before the network was established. It was therefore clear that in this version of the iSCSI package a somewhat important omission had been made.
These are the unit files used for iscsi. I am going to modify the controlling file in order to control the boot sequence for the iscsi service.
I am going to modify the iscsi daemon configuration file by adding one line to the file. We need to insert a line in the iscsid.service file to specify that the service requires the network to be fully established.
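A sketch of the change, shown here as a systemd drop-in override (which survives package updates); the exact line the author added is not reproduced in the text, so the dependency target below is an assumption:

```
# /etc/systemd/system/iscsid.service.d/override.conf
# (created with: systemctl edit iscsid.service)
[Unit]
Requires=network-online.target
After=network-online.target
```

Run systemctl daemon-reload afterwards so systemd picks up the change.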
By specifying Requires in the unit file we introduce a slightly longer boot sequence: we now wait for a specific service to reach an established state before we initiate the start of the second service. In return we have gained sufficient control of the boot sequence to know that networking is established before the iscsid service starts.
VMware housekeeping
You are now ready to start setting up your cluster. This is where you should take a snapshot of your
servers.
The initial configuration of my cluster is that it is going to be a 4 node cluster for GFS2 using ISCSI as
storage.
When the cluster is up and running I am going to add two more nodes to the cluster.
I am going to set constraints on the two new nodes so that the GFS2 cluster resources are not presented to them.
Setting up a Linux cluster, once you have made sure that your network configuration is correct, is a matter of installing the packages and distributing the configuration files. Setting up a Linux cluster has become a lot easier over the years. Even if the core principle is much the same today as it was 10 years ago, the process is now a lot less manual and a lot less error-prone.
Remember that defining and administrating a cluster is an exercise in democracy. In a democracy the key principle is that the majority rules. A majority in a cluster is described by the Latin word quorum, a noun that loosely translates to the ability to hold a vote and arrive at a qualified majority.
In a Linux cluster each cluster member can be viewed as a senator, and each has one vote. There are some tricks we can perform to give one of the members an additional vote, but in general there is one vote per member.
But what must the President (the cluster) do if the senate becomes divided, and both factions claim to be the true senate? In a Linux cluster this situation is called a split-brain, and to avoid total chaos it must be prevented. This is where you would bring in a quorum device (a Vice President) who carries an extra vote. If the senate then becomes divided into two equal parts, whichever side has the Vice President on it wins.
If you have a system with many cluster members, the quorum device becomes inefficient, because the Vice President would be overworked by constantly needing to keep updated with each and every one of the senators. Therefore we establish the Republican Guard (the fencing device), which is responsible for kicking out misbehaving senators.
So, in a two-node cluster you would add a quorum device. This is a special device which also has one vote. A two-node cluster with a quorum device can be described as CS=(N+Q)-1.
In a cluster with more than 2 nodes you would normally not add a quorum device, because it does not scale very well. Instead you incorporate a fencing device, which is responsible for evicting faulty cluster members from the cluster, usually by initiating a reboot of the offending node.
From this table it is obvious that an odd number of cluster nodes is preferred. And in my opinion the number of nodes in the cluster should be higher than 4, to avoid being in a critical state whenever you lose, or do maintenance on, a single node in your cluster.
There is one parameter, auto_tie_breaker, which can be used to set up the cluster so that you are able to lose up to 50% of the nodes and still be operational. This parameter has a nasty history and in my opinion should not be used in a production environment.
If you have a small, expensive, even-numbered hardware cluster, you could instead consider adding an inexpensive virtual node to your cluster and create constraints so that this cluster node does not carry any cluster resources. Its only purpose would be to increase the quorum level. A server like this is often known as a quorum server.
Installing the needed packages is very easy. There is basically one package that has all the dependencies listed, so the only package you need to install is the pcs package. This package will take care of all the other packages you need.
Executing this command is going to give you a list of some 85+ dependencies that need to be installed.
In addition to the basic cluster packages, this cluster is going to be a GFS2 cluster, so we need to install a few more packages.
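The installs can be sketched as follows (the GFS2-related package names are assumptions based on the RHEL 7 add-ons):

```shell
yum install -y pcs                       # pulls in pacemaker, corosync and the ~85 dependencies
yum install -y gfs2-utils lvm2-cluster   # GFS2 tools and clustered LVM
```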
Before you start setting up encryption, remember that this is encryption of the control traffic between the nodes. The traffic you are going to encrypt is the "Hey, I'm still here" traffic. This ensures that a man-in-the-middle attacker has no possibility of fencing your nodes, but it will not encrypt the data stored on any of your filesystems.
There is a small penalty in CPU cycles, but that is not really where it is going to hit you the most. By encrypting the control traffic between the cluster nodes you are stepping away from most of the standard documentation you will find online, and chances are that you are going to forget a step when you are adding new nodes to your cluster.
Most of the standard cluster documentation describes a procedure for adding new nodes to the cluster where, when you start the pcsd service on the new node, the system automatically synchronizes the configuration files to the new node.
Well, guess what: the encryption key file is not part of the configuration that is synced. So in your routines you need to make sure that you copy the authkey file from one of the existing cluster nodes to the new node before you start the cluster services on the new node.
When generating the authkey, the process depends on input from you to generate the entropy for the key generation. This means that you need to start typing, moving the mouse, and jumping up and down in a random fashion while corosync-keygen is running. As an alternative, you can start a process on the machine that generates the needed random traffic.
Now go back to the terminal window where you opened the dd process, and break that process by pressing Ctrl+C.
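A sketch of the two terminals (the dd command and device name are assumptions for the entropy-generating process the text refers to):

```shell
# Terminal 1: generate the key (blocks until enough entropy is available):
corosync-keygen                    # writes /etc/corosync/authkey

# Terminal 2: create disk traffic to feed the entropy pool; Ctrl+C when keygen is done:
dd if=/dev/sda of=/dev/null bs=1M
```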
You need to copy this file to the other nodes that will be part of your cluster.
When you install the cluster packages (yum install pcs), the installation creates the hacluster user with a disabled password.
This user is used for cluster intercommunication, so you need to set the same password for the user on all nodes in your cluster.
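For example (run on every node; the password is obviously just a placeholder):

```shell
echo 'ClusterSecret1' | passwd --stdin hacluster
```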
As you can see from the corosync.conf cluster configuration file, the cluster has no knowledge of the key file. We need to modify the configuration file and push it out to the nodes.
On one of the cluster nodes, add two lines to the configuration file.
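The two lines themselves are not reproduced in the text, so the following is an assumption of what a crypto-enabled totem section can look like, followed by pushing the file out:

```
# /etc/corosync/corosync.conf
totem {
    ...
    crypto_cipher: aes256
    crypto_hash: sha1
}

# Then sync the configuration to all nodes and restart the cluster:
#   pcs cluster sync
#   pcs cluster stop --all && pcs cluster start --all
```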
The reason for the high load on the system is that the corosync process runs at a very high priority. So when a situation occurs where the cluster processes clvmd and libqb are spinning out of control, corosync has such high priority that it is scheduled before other system tasks like networking. It may even block critical kernel tasks.
The workaround to avoid this situation is to instruct the corosync process to run with the standard realtime priority instead of the highest realtime priority, by adding -p to the /etc/sysconfig/corosync file.
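The change is one line in the sysconfig file (a sketch; check the corosync(8) man page for the exact flag semantics on your version):

```
# /etc/sysconfig/corosync
COROSYNC_OPTIONS="-p"
```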
Setting up fencing
Fencing agents
You need to know what type of fencing mechanism you are going to use in your cluster. You may, naturally, install all the fencing agents to avoid having to make a decision now, or you can install only the agents you need.
To install only the fencing agents you need, you can list the available fencing agents. Or, if you already know which agent you are going to use, install only that agent. I am only going to install the scsi fencing agent.
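A sketch of both options (the package names follow the RHEL 7 fence-agents split):

```shell
yum list available 'fence-agents-*'   # list the individual fencing agent packages
yum install -y fence-agents-scsi      # install only the scsi fencing agent
yum install -y fence-agents-all       # or: install everything
```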
In my configuration I will be using scsi fencing. Scsi fencing is based on a distributed totem where each node will continually update a key in order for the system to establish that the cluster node is still present.
If you run the command vgdisplay you will see the devices with the status "clustered yes". An empty device list in the definition of the scsi fencing device will protect all devices which are clustered.
If you opt to name the devices in the definition of your fencing device and you miss one or more devices, you risk that the filesystems on those devices become corrupt.
If you are going to use multipath for your storage there are some additional steps you should take.
1. Each of your cluster nodes must be identified by a cluster-unique key (I use the cluster node number as the key, i.e. privcl01 has key 1, privcl02 has key 2, etc.)
2. You must change the path_checker from the default value of "tur" to "readsector0" in the devices section.
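A hedged fragment of the multipath change (step 2); the vendor and product values are placeholders for your storage:

```shell
# /etc/multipath.conf (fragment) - vendor/product are placeholders
devices {
    device {
        vendor       "MyVendor"
        product      "MyProduct"
        path_checker readsector0
    }
}
```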
In the cluster documentation they are a little cavalier with the use of the term "device", which can lead to some misunderstanding. The documentation says you need a separate device for each node. The word device in this case means the software fencing thingy. You will create the software fencing gizmo on the same shared disk devices for all nodes, but the fencing devices will be unique for each of the nodes, based on the value of the key and the pcmk_host_list of each of the defined fencing devices. The value of the pcmk_host_list parameter must be the name of the cluster node.
So for each of your cluster nodes you would define (the commands can be run on one of the nodes):
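A hedged sketch of the per-node definitions; the resource names, node names, and key values follow my lab convention, and the key parameter is only relevant for the multipath setup described above:

```shell
# One fence_scsi device per node. pcmk_host_list names the node the device
# controls; "meta provides=unfencing" is mandatory for fence_scsi.
pcs stonith create scsi-shooter1 fence_scsi \
    pcmk_host_list="privcl01" key=1 meta provides=unfencing
pcs stonith create scsi-shooter2 fence_scsi \
    pcmk_host_list="privcl02" key=2 meta provides=unfencing
```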
Explaining the parameters used. There are additional parameters that can be set, but keeping it simple is good advice.

Parameter: pcmk_host_list
Description: A space-separated list of the host names of the nodes that will be controlled by this fencing device. This parameter is listed as optional, but if your cluster traffic is running on a secondary network, as in this example, your cluster may have problems finding the shortnames of the cluster nodes. Remember to update this parameter if you add nodes to your cluster.

Parameter: meta provides=unfencing
Description: With the fence_scsi agent you must specify "meta provides=unfencing" to make sure the node is allowed to re-enter the cluster after a reboot.
Notes on some additional parameters you will often find in other documents online:
Because the fence_scsi device is limited to a soft removal of the node's disk access from the cluster resources, and does not reboot the server, I also need to install a watchdog service which will be my kill switch. If you do not implement a separate kill switch you risk filesystem corruption on your GFS2 filesystem. In addition, there is no way to re-enter the node into the cluster without a reboot.
This is especially true if you have an I/O-intensive system and you use the default mount option of the GFS resource, "withdraw". With the withdraw option it can take a long time for the file access to be removed, long after the node has been fenced and is out of the cluster's control.
Watchdog
Note: There are hardware devices on the market which are known as watchdog devices. They are not to be mistaken for the software watchdog that I am setting up here.
Make sure that you have set the kernel panic parameters correctly.
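A hedged sketch of the panic parameters (the file name and the 5-second timeout are my own choices):

```shell
# Reboot 5 seconds after a kernel panic instead of hanging, and panic on oops.
cat > /etc/sysctl.d/99-kernel-panic.conf <<'EOF'
kernel.panic = 5
kernel.panic_on_oops = 1
EOF
sysctl --system
```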
In order for the watchdog.service to be able to monitor the fence_scsi process we need to create a symlink into the /etc/watchdog.d directory. Comparing the two check scripts will show that there are no differences, but according to the documentation you should link the fence_scsi_check script.
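The symlink itself, assuming the script location used on RHEL 7:

```shell
# Let the watchdog daemon run the fence_scsi check script.
mkdir -p /etc/watchdog.d
ln -s /usr/share/cluster/fence_scsi_check /etc/watchdog.d/fence_scsi_check
systemctl enable watchdog.service
systemctl start watchdog.service
```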
The fencing mechanism I am setting up here is a software watchdog service. It is set up outside of the cluster and is not known to the cluster, so just accept that the cluster property says there is no watchdog.
If you want to verify the incompatibility between the watchdog service for the fence_scsi and selinux
in enforcing mode you may do that now.
The current selinux-policy does not allow the watchdog daemon to execute a script with the context that exists on the fence_scsi watchdog scripts, and SELinux also does not allow several actions carried out by those scripts. The result is that watchdog fails to run the enabled script, which triggers a reboot.
If you are affected by the bug you should see immediate results in your log terminal: the fence_scsi_check_hardreboot script returns 1, followed by an immediate reboot of the server.
I know I said that the fence-agents-vmware-soap agent does not work on the free version of VMware ESXi, which is what I have. However, having a fencing agent that returns an error is a good thing, because now I can show you how to use fencing levels.
Fencing levels are a method of setting up a cascading fencing mechanism: if the first fencing device fails, continue to the next level. As mentioned, the fence-agents-vmware-soap agent will not work, so I will set it up as my first fencing mechanism. It will fail (every time), which means the fencing will continue to the next level. The next level in this case is my fence_scsi devices, which I know work as designed.
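A hedged sketch of the levels for one node; the resource names are my own:

```shell
# Level 1: the vmware shooter (known to fail on free ESXi);
# level 2: the scsi device that actually works.
pcs stonith level add 1 privcl01 vmware-shooter
pcs stonith level add 2 privcl01 scsi-shooter
pcs stonith level    # show the configured levels
```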
On both my ESXi servers I have defined a user which has administrative privileges on these servers.
Username : cerebrum
Password: MyLittleSecret
Test the fencing mechanism outside of the cluster with just a status call, to verify that I am able to log in to the systems using my cerebrum user.
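The status call could look like this; the ESXi host name (esxi01) and the guest name are placeholders from my lab:

```shell
# Verify the login and that the agent can see the guest, outside the cluster.
fence_vmware_soap --ip esxi01 --username cerebrum --password MyLittleSecret \
    --ssl --ssl-insecure --action status --plug privcl01
```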
Since I am on a secondary network with a shortname that corresponds to neither the hostname of the server nor the name of the VMware guest as it is registered on my ESXi hosts, I need to create a mapping between the shortname and the VMware guest name. To do that, use the pcmk_host_map property.
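A hedged sketch of one vmware shooter with the host map; the ESXi host name and the guest names as registered in vCenter/ESXi are placeholders:

```shell
# pcmk_host_map maps cluster shortname to VMware guest name
# ("shortname:guestname", entries separated by ";").
pcs stonith create vmware-shooter1 fence_vmware_soap \
    ipaddr="esxi01" login="cerebrum" passwd="MyLittleSecret" \
    ssl=1 ssl_insecure=1 \
    pcmk_host_map="privcl01:privcl01-guest;privcl02:privcl02-guest"
```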
Because I have two ESXi servers I am setting some constraints, just to make sure the fencing agent for a guest runs on a node that is not on the same VMware host as the guest itself.
So now, in theory, I should be able to fence one of my nodes, and it would be powered off through the fence_vmware_soap API.
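Such a constraint can be sketched as (node names are placeholders for the guests running on the same ESXi host as the shooter's targets):

```shell
# Keep the shooter for the guests on esxi01 away from the nodes
# that themselves run on esxi01.
pcs constraint location vmware-shooter1 avoids privcl01 privcl02
```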
To verify that the vmware shooters fail, I need to remove the scsi-shooter so that it does not just complete and interfere with my reasoning.
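A hedged sketch of the test; the device and node names are from my lab:

```shell
# Remove the working scsi device, then fence a node and watch
# the vmware shooter fail in the logs.
pcs stonith delete scsi-shooter
pcs stonith fence privcl02
```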
So now I have two vmware shooters which are configured to handle the vmware guests on each of
the ESXi servers.
From this first stage of the experiment I can see that finding the VM guest returns Success, but turning off the power returns an error. The error is caused by the SOAP API being a restricted (free) version. So because fence_vmware_soap fails, my test is successful.
By the way, you may notice that the cluster resources stayed online when I deleted the scsi-shooter. If there had been only one fencing device and you deleted it, all the cluster resources would stop.
As it is configured now, if I fence any of my nodes the system will first try the vmware shooter, and if that fails it will continue with the scsi shooter.
So this works as expected. I still get the "Throw vim.fault.RestrictedVersion" error message in the logfiles on my ESXi server, but the node that I am fencing does go down.
GFS2
Set up the start constraint for the two resources we have created: first the lock manager (dlm), then clvmd.
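The constraints could be sketched like this, assuming the usual cloned dlm/clvmd resource names:

```shell
# dlm must start before clvmd, and they must run on the same node.
pcs constraint order start dlm-clone then clvmd-clone
pcs constraint colocation add clvmd-clone with dlm-clone
```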
Let's see how that turned out by running the pvscan or pvs command on any or all of the nodes in the cluster.
Create the volume group. This is done at the operating-system level and is not very different from what you have always done.
The exception might be the --autobackup parameter, which you are normally advised to switch off. I have not found any documentation as to why it is set to yes here, but it might be because this is going to be a clustered volume group. I will not comment on why you should use the --clustered y parameter.
I am going to use all available space in the volumegroup when creating the logical volume.
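A hedged sketch; the device names and the VG/LV names are from my lab:

```shell
# Clustered volume group with autobackup on, then an LV using all free space.
vgcreate --autobackup y --clustered y clustervg /dev/sdg /dev/sdh
lvcreate --extents 100%FREE --name gfsvol clustervg
```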
In the GFS2 cluster we are using the dlm locking mechanism [-p lock_dlm] [-t clustername:locktablename] [-j number of journals].
The first part of the lock table name must be the name of the cluster; otherwise you will not be able to mount the filesystem in the cluster.
The number of journals: set this to the number of nodes in your cluster. We are going to add some nodes to our cluster later, so we will add more journals to the filesystem when we add the nodes.
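The mkfs call could then look like this; "privcluster" stands in for your cluster name and the VG/LV names are my lab placeholders:

```shell
# 4 journals for a 4-node cluster; the part before ":" must match
# the cluster name.
mkfs.gfs2 -p lock_dlm -t privcluster:gfsvol -j 4 /dev/clustervg/gfsvol
```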
For someone who is used to working with Oracle databases, the journals in the filesystem can be seen as redo logs. The journals are the mechanism which keeps the filesystem consistent and intact after a crash. With GFS2 the journals take up space inside your filesystem, by default 128MB per journal. So with 4 nodes you are using 128MB x 4 = 512MB for internal handling. You can deviate from the standard and reduce the journal size down to the minimum of 32MB. For a small filesystem like mine that might make some sense, but I will just keep the default value.
This is the exciting point where you actually add the GFS2 resource to your cluster.
This is the moment of truth. This is where you will see that you have a GFS2 cluster.
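A hedged sketch of adding the filesystem resource; the device path and mount point are placeholders, while gfsvolfs is the resource name used in this document:

```shell
# Cloned Filesystem resource so the GFS2 filesystem is mounted on all nodes.
pcs resource create gfsvolfs Filesystem \
    device="/dev/clustervg/gfsvol" directory="/gfsvol" fstype="gfs2" \
    options="noatime" clone interleave=true
```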
You may have noticed that in the previous section I did not use all of my physical volumes when I created the gfsvolfs resource. That was intentional: I did it so that I can now show you how to add more disk to your cluster right away.
As you can see here, the physical volume /dev/sdi is not assigned to any volume group (VG). I can now choose to create a new volume group or extend the existing one. Let's just go ahead and extend the existing volume group.
Extend the VG
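Extending could be sketched as follows; /dev/sdi is the unused PV mentioned above, the other names are my lab placeholders:

```shell
# Add the unused PV to the existing VG, grow the LV, then grow the
# mounted GFS2 filesystem.
vgextend clustervg /dev/sdi
lvextend --extents +100%FREE /dev/clustervg/gfsvol
gfs2_grow /gfsvol
```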
Add the constraint so that clvmd starts before the GFS filesystem.
Remember that we set the number of journals on the filesystems to the number of nodes that existed in our cluster at the time. You can find the number of journals defined in a gfs2 filesystem with the following commands. This must be done for each of the filesystems that are managed by the cluster.
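A hedged sketch, assuming my lab names for the LV and mount point:

```shell
# Count the journals in the filesystem, then add one for the new node.
gfs2_edit -p jindex /dev/clustervg/gfsvol | grep journal
gfs2_jadd -j 1 /gfsvol
```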
- Minimal installation
- Updated to the correct level
- Network configuration
- Firewall configuration
- Time synchronization
- Installed packages: iscsi, pcs, gfs2-utils, lvm2-cluster, fence-agents-scsi, fence-agents-vmware-soap, watchdog
- Configured the service unit files
- Locking type = 3
We need to set the password of the hacluster user to the same password that is used on the other cluster nodes.
Make sure the lvm locking_type is set to 3 on the new node; if it is not, you need to change the parameter.
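The locking change can be sketched as:

```shell
# Enable cluster-wide LVM locking (sets locking_type = 3 in lvm.conf),
# then verify the result.
lvmconf --enable-cluster
grep "locking_type" /etc/lvm/lvm.conf
```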
Now you must authorize the new node into the cluster.
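Authorization and adding could look like this; privcl06 is my new number 6 node:

```shell
# From an existing cluster node: authorize, add, and start the new node.
pcs cluster auth privcl06 -u hacluster
pcs cluster node add privcl06
pcs cluster start privcl06
```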
Because the new cluster node is not included in any fencing mechanism, the services will not start. It is possible to modify the fencing device, but the downside of doing that is that all services currently protected by the fencing device will be stopped.
In my case I have multiple fencing devices protecting my resources. What I can do, without stopping the resources, is delete the fencing devices and recreate them one by one with the new node included.
What I am going to do is go through the routine one more time, and then I will show you that there is a node number 6 in my cluster when I am done.
Even though one of my cluster nodes is now out of the cluster, the number 6 node still takes part in the quorum calculation.
If I would like to make sure that the GFS2 filesystems will never run on my number 6 node, I could ban the GFS2 resources. But since we have already set up that the dlm resource must start on a node before any GFS2 resources are started, I am going to ban the dlm resource instead.
The warning text is telling you that you have banned the resource forever, and this may have consequences if all the other nodes go down.
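The ban itself is a one-liner; the resource name assumes the common dlm-clone convention:

```shell
# Keep dlm, and therefore any GFS2 filesystem, off the new node.
pcs resource ban dlm-clone privcl06
```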
To get the resource id we need to run the constraint command with the --full parameter.
Since my cluster is an opt-in cluster where all nodes are able to run all resources, there are not very many resources to move. But to show how it is done, I am going to move the scsi-shooter resource. You should take care with moving resources, because the move creates an INFINITY location constraint.
So let’s have some fun. I have moved the scsi-shooter resource to my privcl04 node.
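The move itself can be sketched as:

```shell
# Move the fencing resource to the number 4 node.
pcs resource move scsi-shooter privcl04
```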
Rest easy, your cluster remains operational and the fencing resource is still relocated to a different cluster node, but when the number 4 node is re-introduced to the cluster, the fencing device is relocated back to the number 4 node.
Therefore I am going to delete the move constraint for the fencing device.
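A hedged sketch; pcs names the constraint created by a move cli-prefer-&lt;resource&gt;:

```shell
# Find the location constraint created by the move, then remove it by id.
pcs constraint --full | grep cli-prefer
pcs constraint remove cli-prefer-scsi-shooter
```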
When I examine the cluster I see that 6 nodes is one too many, so I am going to remove the number 6 node from the cluster.
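Removal could be sketched as:

```shell
# Stop the node and remove it from the cluster configuration.
pcs cluster node remove privcl06
```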
That’s it