
MODULE 1: KERNEL

Exercise 1: Recovering from a boot loop


Time Estimate: 20 minutes
Step
1.

Action
Log in to the clustershell and execute the following command
cluster1::> cluster show
Node                  Health  Eligibility
--------------------- ------- -----------
cluster1-01           true    true
cluster1-02           false   true
cluster1-03           true    true
cluster1-04           true    true
4 entries were displayed.

2.

Note that the health of node clusterX-02 is false.


Try to log in to the nodeshell of clusterX-02 to find out the problem.
If you are unable to access the nodeshell of clusterX-02, try to access it through its console.
What do you see?

3.

How do you fix this?

MODULE 2: M-HOST
Exercise 1: Fun with mgwd and mroot
Time Estimate: 20 minutes
Step
1.

Action
On a node that does not hold epsilon, log in to your cluster as admin via the
console and go into the systemshell.
::> set diag
::*> systemshell local

2.

Execute the following:


% ps -A|grep mgwd
  913  ??  Ss       0:11.76 mgwd -z
 2794  p1  DL+      0:00.00 grep mgwd

The above listing shows that the process ID of the running instance of mgwd on this
node is 913.
Kill mgwd as follows:
% sudo kill <PID of mgwd as obtained above>

3.

You see the following output. Why?


server closed connection unexpectedly: No such file or
directory
login:
Log in as admin again, as shown below:
server closed connection unexpectedly: No such file or
directory
login: admin
Password:
What happens?

4.

You are now in clustershell. Drop to systemshell as follows:


::> set diag

::*> systemshell local


In systemshell execute the following:
% cd /etc
% sudo ./netapp_mroot_unmount
% exit
logout
When would we expect the node to use/need this script?

5.

Now you are back in clustershell. Execute the following:


cluster1::> set diag

Warning: These diagnostic commands are for use by NetApp


personnel only.
Do you want to continue? {y|n}: y

cluster1::*> cluster show


Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ ------------
cluster1-01          true    true         true
cluster1-02          true    true         false
cluster1-03          true    true         false
cluster1-04          true    true         false
4 entries were displayed.


cluster1::*> vol modify -vserver studentX -volume studentX_nfs -size 45M
  (volume modify)

Error: command failed: Failed to queue job 'Modify studentX_nfs'. IO error in
       local job store

cluster1::*> cluster show


Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ ------------
cluster1-01          true    false        true
cluster1-02          false   true         false
cluster1-03          false   true         false
cluster1-04          false   true         false
4 entries were displayed.

Do we see a difference in cluster show? If so, why? What's broken?

6.

To fix this without rebooting and without manually re-mounting /mroot, restart mgwd (see the sketch below).
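A minimal sketch of one way to do this, reusing the kill-and-respawn pattern shown in Exercise 5, step 7, of this module (spmd restarts mgwd, and the new mgwd instance re-mounts /mroot and /clus):

::> set diag
::*> systemshell local
% ps -A | grep mgwd          (note the PID of mgwd)
% sudo kill <PID of mgwd>    (spmd respawns mgwd, which re-mounts /mroot)
% exit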

7.

In which phase of the boot process could we see this behavior occur?

Exercise 2: Configuration backup and recovery


Time Estimate: 40 minutes

Action
1.

Run the following commands:


::> set advanced
::*> man system configuration backup create
::*> man system configuration recovery node
::*> man system configuration recovery cluster
::*> system configuration backup show -node <nodename>
What do each of the commands show?

2.

Where in systemshell can you find the files listed above?
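As a hint, the backup archives live under a fixed path that also appears later in this exercise; a quick way to look (a sketch, not the full answer):

% ls -l /mroot/etc/backups/config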

3.

Create a new system configuration backup of the node and the cluster as follows:
cluster1::*> system configuration backup create -node cluster1-01
  -backup-type node -backup-name cluster1-01.node
[Job 164] Job is queued: Local backup job.
::*> job private show
::*> job private show -id [Job id given as output of the backup create command above]
::*> job private show -id [id as above] -fields uuid
::*> job store show -id [uuid obtained from the command above]

cluster1::*> system configuration backup create -node cluster1-01
  -backup-type cluster -backup-name cluster1-01.cluster
[Job 495] Job is queued: Cluster Backup OnDemand Job.
::> job show

4.

The following KB shows how to scp the backup files you created, as well as one of
the system-created backups off to the Linux client:
https://kb.netapp.com/support/index?page=content&id=1012580
Use the following to install p7zip on your Linux client and use it to unzip the backup
files.
# yum install p7zip

This is the recommended practice on live nodes; however, for vsims scp does not
work.
So in the current lab setup, drop to the systemshell and cd to
/mroot/etc/backups/config
Unzip the system created backup file by doing the following:

% 7za e [system created backup file name]

What is in this file?

cd into one of the folders created by the unzip. There will be another 7z file. Extract
it:
% 7za e [file name]
What's in this file?
Extract the file:
% 7za e [file name]
What's inside it?

Compare it to what is in /mroot/etc of one of the cluster nodes. What are some of the
differences?

5.

cd into cluster_config in the backup. What is different from


/mroot/etc/cluster_config on the node?

6.

cd into cluster_replicated_records at the root of the folder you originally extracted


the backup to and issue an ls command.
What do you see?

7.

Unzip the node and cluster backups you created. What do you notice about the
contents of these files?

Exercise 3: Moving mroot to a new aggregate


Time Estimate: 30 minutes

Step
1.

Action
Move a node's root volume to a new aggregate.
Work with your lab partners and do this on only one node.
For live nodes, the following KB contains the steps to do this:
https://kb.netapp.com/support/index?page=content&id=1013350&actp=LIST
However, for vsims the root volume that is created by default is only 20MB and
too small to hold the cluster configuration information.
Hence, follow the steps given below:

2.

Run the following command to create a new 3-disk aggregate on the desired node:
cluster1::> aggr create -aggregate new_root -diskcount 3 -nodes local
[Job 276] Job succeeded: DONE
cluster1::> aggr show -nodes local
Aggregate     Size Available Used% State   #Vols  Nodes        RAID Status
--------- -------- --------- ----- ------- ------ ------------ -----------
aggr0_cluster1_02_0
             900MB   15.45MB   98% online       1 cluster1-02  raid_dp,
                                                               normal
student2     900MB   467.4MB   48% online       8 cluster1-02  raid_dp,
                                                               normal
2 entries were displayed.

3.

Ensure that the node does not own an epsilon. If it does, run the following command
to move it to another node in the cluster:
cluster1::> set diag

Warning: These diagnostic commands are for use by NetApp


personnel only.
Do you want to continue? {y|n}: y

cluster1::*> cluster show


Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ ------------
cluster1-01          true    true         false
cluster1-02          true    true         true
cluster1-03          true    true         false
cluster1-04          true    true         false
4 entries were displayed.

Run the following command to move the epsilon and modify it to 'false' on the
owning node:
::*> cluster modify -node cluster1-02 -epsilon false

Then, run the following command to modify it to 'true' on the desired node:
::*> cluster modify -node cluster1-01 -epsilon true

::*> cluster show


Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ ------------
cluster1-01          true    true         true
cluster1-02          true    true         false
cluster1-03          true    true         false
cluster1-04          true    true         false
4 entries were displayed.

4.

Run the following command to set the cluster eligibility on the node to 'false':
::*> cluster modify -node cluster1-02 -eligibility false

Note: This action must be performed from a node other than the one being marked ineligible.

5.

Run the following command to reboot the node into maintenance mode
cluster1::*> reboot local
(system node reboot)

Warning: Are you sure you want to reboot the node? {y|n}: y

login:
Waiting for PIDS: 718.
Waiting for PIDS: 695.
Terminated
.
Uptime: 2h12m14s
System rebooting...
\
Hit [Enter] to boot immediately, or any other key for command
prompt.
Booting...
x86_64/freebsd/image1/kernel data=0x7ded08+0x1376c0
syms=[0x8+0x3b7f0+0x8+0x274a
8]
x86_64/freebsd/image1/platform.ko size 0x213b78 at 0xa7a000
NetApp Data ONTAP 8.1.1X34 Cluster-Mode
Copyright (C) 1992-2012 NetApp.
All rights reserved.
md1.uzip: 26368 x 16384 blocks
md2.uzip: 3584 x 16384 blocks
*******************************
* Press Ctrl-C for Boot Menu. *
*******************************
^CBoot Menu will be available.
Generating host.conf.

Please choose one of the following:

(1) Normal Boot.


(2) Boot without /etc/rc.
(3) Change password.
(4) Clean configuration and initialize all disks.
(5) Maintenance mode boot.
(6) Update flash from backup config.
(7) Install new software first.
(8) Reboot node.
Selection (1-8)? 5
.
WARNING: Giving up waiting for mroot

Tue Sep 11 11:23:27 UTC 2012


*> Sep 11 11:23:28 [cluster1-02:kern.syslog.msg:info]: root
logged in from SP NONE

*>

6.

Run the following command to set the options for the new aggregate to become the
new root:
Note: It might be required to set the aggr options to CFO instead of SFO:
*> aggr options new_root root
aggr options: This operation is not allowed on aggregates with sfo HA
Policy

*> aggr options new_root ha_policy cfo


Setting ha_policy to cfo will substantially increase the client
outage during giveback for cluster volumes on aggregate new_root.
Are you sure you want to proceed? y
*> aggr options new_root root
Aggregate 'new_root' will become root at the next boot.
*>

7.

Run the following command to reboot the node:


*> halt
Sep 11 11:27:49 [cluster1-02:kern.cli.cmd:debug]: Command line
input: the command is 'halt'. The full command line is 'halt'.

.
Uptime: 6m26s

The operating system has halted.


Please press any key to reboot.

System halting...
\
Hit [Enter] to boot immediately, or any other key for command
prompt.
Booting in 1 second...

8.

Once the node is booted, a new root volume named AUTOROOT will be created. In
addition, the node will not be in quorum yet. This is because the new root volume
will not be aware of the cluster.
login: admin
Password:
***********************
**  SYSTEM MESSAGES  **
***********************

A new root volume was detected. This node is not fully
operational. Contact support personnel for the root volume
recovery procedures.

cluster1-02::>

9.

Increase the size of AUTOROOT on the node by doing the following


Log in to the systemshell of a node that is in quorum and execute the following D-blade ZAPIs to:
a) Get the UUID of volume AUTOROOT on the node where the root volume was
changed
b) Increase the size of the same AUTOROOT volume by 500m
c) Check that the size was successfully changed
% zsmcli -H <cluster ip address of the node where new root
volume was created> d-volume-list-info-iter-start desired-attrs=name,uuid
<results status="passed">
  <next-tag>cookie=0,desired_attrs=name,uuid</next-tag>
</results>
% zsmcli -H <cluster ip address of the node where new root
volume was created> d-volume-list-info-iter-next maximum-records=10
tag='cookie=0,desired_attrs=name,uuid'
<results status="passed">
  <volume-attrs>
    <d-volume-info>
      <name>vol0</name>
      <uuid>014df353-bbc1-11e1-bb4c-123478563412</uuid>
    </d-volume-info>
    <d-volume-info>
      <name>student2_root</name>
      <uuid>044f53fa-e784-11e1-ab6e-123478563412</uuid>
    </d-volume-info>
    <d-volume-info>
      <name>student2_LS_root</name>
      <uuid>0ea7ae4c-e790-11e1-ab6e-123478563412</uuid>
    </d-volume-info>
    <d-volume-info>
      <name>AUTOROOT</name>
      <uuid>30d8f742-fc04-11e1-bbf5-123478563412</uuid>
    </d-volume-info>
    <d-volume-info>
      <name>student2_cifs</name>
      <uuid>b8868843-e788-11e1-ab6e-123478563412</uuid>
    </d-volume-info>
    <d-volume-info>
      <name>student2_cifs_child</name>
      <uuid>c07f13ce-e788-11e1-ab6e-123478563412</uuid>
    </d-volume-info>
    <d-volume-info>
      <name>student2_nfs</name>
      <uuid>c861f83b-e788-11e1-ab6e-123478563412</uuid>
    </d-volume-info>
% zsmcli -H 192.168.71.33 d-volume-set-info desired-attrs=size
id=30d8f742-fc04-11e1-bbf5-123478563412 volume-attrs='[d-volume-info=[size=+500m]]'
<results status="passed"/>
% zsmcli -H 192.168.71.33 d-volume-list-info id=30d8f742-fc04-11e1-bbf5-123478563412 desired-attrs=size
<results status="passed">
  <volume-attrs>
    <d-volume-info>
      <size>525m</size>
    </d-volume-info>
  </volume-attrs>
</results>

10.

Clear the root recovery flags if required by doing the following:


Log in to the systemshell of the node where the new root volume was created and

check if the bootarg.init.boot_recovery bit is set

% sudo kenv bootarg.init.boot_recovery


If a value is returned (that is, the output is not "kenv: unable to get
bootarg.init.boot_recovery"), clear the bit:

% sudo sysctl kern.bootargs=--bootarg.init.boot_recovery
kern.bootargs:  ->

Check that the bit is cleared


% sudo kenv bootarg.init.boot_recovery
kenv: unable to get bootarg.init.boot_recovery
%

11.

From a healthy node, with all nodes booted, run the following command:
::*> system configuration recovery cluster rejoin -node <the node
where new root volume was created>

Warning: This command will rejoin node "cluster1-02" into the


local cluster, potentially overwriting critical cluster
configuration files. This command should only be used
to recover from a disaster. Do not perform any other
recovery
operations while this operation is in progress. This
command will cause node "cluster1-02" to reboot.
Do you want to continue? {y|n}: y
Node "cluster1-02" is rebooting. After it reboots, verify that
it joined the new cluster.

12.

After a boot, check the cluster to ensure that the node is back and eligible:
cluster1::> cluster show
Node                  Health  Eligibility
--------------------- ------- -----------
cluster1-01           true    true
cluster1-02           true    true
cluster1-03           true    true
cluster1-04           true    true
4 entries were displayed.

13.

If the cluster is still not in quorum, run the following command:


::*> system configuration recovery cluster sync -node <node where new root
volume was created>


Warning: This command will synchronize node "cluster1-02" with the
cluster configuration, potentially overwriting critical cluster
configuration files on the node. This feature should only be used to
recover from a disaster. Do not perform any other recovery
operations while this operation is in progress. This command will
cause all the cluster applications on node "node4" to restart,
interrupting administrative CLI and Web interface on that node.
Do you want to continue? {y|n}: y
All cluster applications on node "cluster1-02" will be restarted.
Verify that the cluster applications go online.

14.

After the node is in quorum, run the following command to add the new root vol to
VLDB. This is necessary because it is a 7-Mode volume and will not be
displayed until it is added:
cluster1::> set diag
cluster1::*> vol show -vserver cluster1-02
  (volume show)
Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- --------- -----
cluster1-02
          vol0         aggr0_cluster1_02_0
                                    online     RW      851.5MB   283.3MB   66%

cluster1::*> vol add-other-volumes -node cluster1-02
  (volume add-other-volumes)

cluster1::*> vol show -vserver cluster1-02
  (volume show)
Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- --------- -----
cluster1-02
          AUTOROOT     new_root     online     RW        525MB   379.2MB   27%
cluster1-02
          vol0         aggr0_cluster1_02_0
                                    online     RW      851.5MB   283.3MB   66%
2 entries were displayed.

15.

Run the following command to remove the old root volume from VLDB
cluster1::*> vol remove-other-volume -vserver cluster1-02 -volume vol0
  (volume remove-other-volume)

cluster1::*> vol show -vserver cluster1-02
  (volume show)
Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- --------- -----
cluster1-02
          AUTOROOT     new_root     online     RW        525MB   379.2MB   27%

16.

Destroy the old root vol by running the following command from the node shell of the
node where the new root volume has been created
cluster1::*> node run local
Type 'exit' or 'Ctrl-D' to return to the CLI
cluster1-02> vol status vol0
         Volume State           Status                Options
           vol0 online          raid_dp, flex         nvfail=on
                                64-bit
                Volume UUID: 014df353-bbc1-11e1-bb4c-123478563412
        Containing aggregate: 'aggr0_cluster1_02_0'
cluster1-02> vol offline vol0
Volume 'vol0' is now offline.
cluster1-02> vol destroy vol0
Are you sure you want to destroy volume 'vol0'? y
Volume 'vol0' destroyed.
And the old root aggr can be destroyed if desired:
From cluster shell:
cluster1::*> aggr show -node <node where new root vol was created>
Aggregate     Size Available Used% State   #Vols  Nodes        RAID Status
--------- -------- --------- ----- ------- ------ ------------ -----------
aggr0_cluster1_02_0
             900MB   899.7MB    0% online       0 cluster1-02  raid_dp,
                                                               normal
new_root     900MB   371.9MB   59% online       1 cluster1-02  raid_dp,
                                                               normal
student2     900MB   467.2MB   48% online       8 cluster1-02  raid_dp,
                                                               normal
3 entries were displayed.

cluster1::*> aggr delete -aggregate <old root aggregate name>

Warning: Are you sure you want to destroy aggregate
         "aggr0_cluster1_02_0"? {y|n}: y
[Job 277] Job succeeded: DONE

17.

Use the following KB to rename the root volume (AUTOROOT) to vol0:


https://kb.netapp.com/support/index?page=content&id=2015985

18.

What sort of things regarding the root vol did you observe during this?

Exercise 4: Locate and Repair Aggregate Issues


Time Estimate: 15 minutes

Action
1.

Login to clustershell of clusterX and execute the following:


::> aggr show -aggregate VLDBX (team member 1 use X=1 and team
member 2 use X = 2)
There are no entries matching your query.
One aggregate is showing as missing from the cluster shell:

Execute the following:


::> aggr show -aggregate WAFLX -instance
      Aggregate: WAFLX
           Size:
      Used Size:
Used Percentage:
 Available Size:
          State: unknown
          Nodes: cluster1-02
Another aggregate is showing as unknown:

Fix the issue.

2.

Issue the following command. Do you see anything wrong?


::*> debug vreport show aggregate

3.

What nodes do the aggregates belong to? How do you know?

4.

Use the debug vreport fix command to resolve the problem.
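A hedged sketch of the command form (argument names recalled from the 8.x diag CLI; confirm them with "debug vreport fix ?" before running):

::*> debug vreport show
::*> debug vreport fix -type aggregate -object <aggregate name as reported by vreport>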

5.

List some of the reasons why customers could have this problem.

6.

Was any data lost? If so, which aggregate?

Exercise 5: Replication failures


Time Estimate: 20 minutes
Action
1.

Note: Participants working with cluster2 should replace student1 with student3
and student2 with student4 in all the steps of this exercise.

Log in to the systemshell of clusterX-02 (make sure it does not hold epsilon).


Unmount mroot and clus and prevent mgwd from being monitored by spmctl, as
follows:
% sudo umount -f /mroot
% sudo umount -f /clus
% spmctl -d -h mgwd

2.

Login to ngsh on clusterX-02 and execute the following:


cluster1::*> volume create -vserver student1 -volume test -aggregate
Info: Node cluster1-01 that hosts aggregate aggr0 is offline
      Node cluster1-03 that hosts aggregate aggr0_cluster1_03_0 is offline
      Node cluster1-04 that hosts aggregate aggr0_cluster1_04_0 is offline
      Node cluster1-01 that hosts aggregate student1 is offline
    aggr0                aggr0_cluster1_03_0   aggr0_cluster1_04_0
    new_root             student1              student2

cluster1::*> volume create -vserver student1 -volume test -aggregate student2


Error: command failed: Replication service is offline
cluster1::*> net int create -vserver student1 -lif test -role
data -home-node cluster1-02 -home-port e0c -address
10.10.10.10 -netmask 255.255.255.0 -status-admin up


(network interface create)
Info: An error occurred while creating the interface, but a new routing group
      d10.10.10.0/24 was created and left in place
Error: command failed: Local unit offline

cluster1::*> vserver create -vserver test -rootvolume test -aggregate student1
-ns-switch file -rootvolume-security-style unix
Info: Node cluster1-01 that hosts aggregate student1 is offline
Error: create_imp: create txn failed
       command failed: Local unit offline

3.

Login to ngsh on clusterX-01 and execute the following:


cluster1::> volume create test -vserver student2 -aggregate
Info: Node cluster1-02 that hosts aggregate new_root is offline
      Node cluster1-02 that hosts aggregate student2 is offline
    aggr0                aggr0_cluster1_03_0   aggr0_cluster1_04_0
    new_root             student1              student2

cluster1::> volume create test -vserver student2 -aggregate student2 -size 20MB
Info: Node cluster1-02 that hosts aggregate student2 is offline
Error: command failed: Failed to create the volume because cannot determine the
       state of aggregate student2.

cluster1::> volume create test -vserver student2 -aggregate student1 -size 20MB
[Job 368] Job succeeded: Successful
Note: when a volume is created on an aggregate not hosted on clusterX-02,
the volume create succeeds.

cluster1::> net int create -vserver student1 -lif data2 -role
data -data-protocol nfs,cifs,fcache -home-node cluster1-02
-home-port e0c -address 10.10.10.10 -netmask 255.255.255.0
  (network interface create)
Info: create_imp: Failed to create virtual interface
Error: command failed: Routing group d10.10.10.0/24 not found

cluster1::> net int create -vserver student1 -lif data2 -role
data -data-protocol nfs,cifs,fcache -home-node cluster1-01
-home-port e0c -address 10.10.10.10 -netmask 255.255.255.0
  (network interface create)
Note: when an interface is created on a port not hosted on clusterX-02, the
interface create succeeds.

cluster1::*> vserver create -vserver test -rootvolume test -aggregate student2
-ns-switch file -rootvolume-security-style unix
Info: Node cluster1-02 that hosts aggregate student2 is offline
Error: create_imp: create txn failed
       command failed: Local unit offline

cluster1::*> vserver create -vserver test -rootvolume test -aggregate student1
-ns-switch file -rootvolume-security-style unix
[Job 435] Job succeeded: Successful
Note: when a vserver is created and its root volume is created on an aggregate
that is not hosted on clusterX-02, the vserver create succeeds.

4.

Log in to systemshell of clusterX-02.


Execute the following:
cluster1-02% mount
/dev/md0 on / (ufs, local, read-only)
devfs on /dev (devfs, local)
/dev/ad0s2 on /cfcard (msdosfs, local)
/dev/md1.uzip on / (ufs, local, read-only, union)
/dev/md2.uzip on /platform (ufs, local, read-only)
/dev/ad3 on /sim (ufs, local, noclusterr, noclusterw)
/dev/ad1s1 on /var (ufs, local, synchronous)
procfs on /proc (procfs, local)

/dev/md3 on /tmp (ufs, local, soft-updates)


/mroot/etc/cluster_config/vserver on /mroot/vserver_fs (vserverfs, union)
Note that /mroot and /clus are not mounted

5.

From the systemshell of clusterX-02 run the following commands:

% rdb_dump

What do you see?

% tail -100 /mroot/etc/mlog/mgwd.log | more
What do you see?
Log in to the systemshell of clusterX-01 and run the following command:
% tail -100 /mroot/etc/mlog/mgwd.log | more
What do you see?

6.

From systemshell of clusterX-02 run:


%spmctl
What do you see?

7.

What happened?

Fixing these issues:


a) Re-add mgwd to spmctl with:
% ps aux | grep mgwd
root 779 0.0 17.6 303448 133136 ?? Ss 1:53PM 0:44.12 mgwd -z
diag 3619 0.0 0.2 12016 1204 p2 S+ 4:39PM 0:00.00 grep mgwd
% spmctl -a -h mgwd -p 779
b) Then restart mgwd, which will re-mount /mroot and /clus:
% sudo kill <PID>

Exercise 6: Troubleshooting Autosupport


Time Estimate: 20 minutes

Action
1.

From clustershell of each node send a test autosupport as follows: (y takes the
values 1,2,3,4)
::*> system autosupport invoke -node clusterX-0y -type test
You will see an error such as:
Error: command failed: RPC: Remote system error - Connection refused

2.

Let's find out why.


Connection refused means that we couldn't talk to the application for some reason.
In this case, notifyd is the application.
When we look at systemshell for the process, it's not there:
cluster1-01% ps aux | grep notifyd
diag 5442 0.0 0.2 12016 1160 p0 S+ 9:20PM 0:00.00 grep notifyd

3.

spmctl manages notifyd


We can check to see why spmctl didn't start notifyd back up:
cluster-1-01% cat spmd.log | grep -i notify
0000002e.00001228 0002ba73 Tue Aug 09 2011 21:26:31 +00:00
[kern_spmd:info:739]
0x800702d30: INFO: spmd::ProcessController:
sendShutdownSignal:process_controller.cc:186 sending SIGTERM to
5498:
0000002e.00001229 0002ba73 Tue Aug 09 2011 21:26:31 +00:00
[kern_spmd:info:739]
0x8007023d0: INFO: spmd::ProcessWatcher: _run:process_watcher.cc:152
kevent
returned: 1
0000002e.0000122a 0002ba73 Tue Aug 09 2011 21:26:31 +00:00
[kern_spmd:info:739]
0x8007023d0: INFO: spmd::ProcessControlManager:
dumpExitConditions:process_control_manager.cc:732 process
(notifyd:5498) exited on
signal 15
0000002e.0000122b 0002ba7d Tue Aug 09 2011 21:26:32 +00:00
[kern_spmd:info:739]
0x8007023d0: INFO: spmd::ProcessWatcher: _run:process_watcher.cc:148

wait for
incoming events.
And then we check spmctl to see if it's still monitoring notifyd:
cluster-1-01% spmctl | grep notify
In this case, it looks like notifyd got removed from spmctl and we need to re-add it:
cluster-1-01% spmctl -e -h notifyd
cluster-1-01% spmctl | grep notify
Exec=/sbin/notifyd -n;Handle=56548532-c334-4633-8cd877ef97682d3d;Pid=15678;State=Running
cluster-1-01% ps aux | grep notify
root 15678 0.0 6.7 112244 50568 ?? Ss 4:06PM 0:02.42 /sbin/notifyd
diag 15792 0.0 0.2 12016 1144 p2 S+ 4:06PM 0:00.00 grep notify

4.

Try to send a test autosupport.


::*> system autosupport invoke -node clusterX-0y -type test

What happens?

MODULE 3: SCON
Exercise 1: Vifmgr and MGWD interaction
Time Estimate: 30 minutes
Step
1.

Action
Try to create an interface:
clusterX::*> net int create -vserver studentY -lif test -role
data -data-protocol nfs,cifs,fcache -home-node clusterX-02 -home-port

You see the following error:
Warning: Unable to list entries for vifmgr on node clusterX-02. RPC: Remote
         system error - Connection refused

    Home Port
    {<netport>|<ifgrp>}

2.

Ping the interfaces of clusterX-02, the node whose ports seem inaccessible:


clusterX::*> cluster ping-cluster

-node clusterX-02

What do you see?

3.

Perform data access:


Attempt cifs access to \\student2\student2(cluster1) or \\student4\student4(cluster2)
from the windows machine
What happens?

4.

Execute the following:


clusterX::*> net int show
What do you see?

5.

Run net port show:

clusterX::*> net port show
What do you see?

6.

Check the system logs:
clusterX::*> debug log files modify -incl-files vifmgr,mgwd
clusterX::*> debug log show -node clusterX-02 -timestamp "Mon Oct 10*"
What do you see?

7.

Log in to systemshell on clusterX-02 and run ps to see if vifmgr is running:


clusterX-02% ps -A |grep vifmgr

8.

Run rdb_dump from the systemshell of clusterX-02:

clusterX-02% rdb_dump
What do you see?

9.

Run the following from the systemshell of clusterX-02:
clusterX-02% spmctl | grep vifmgr
What do you see?

10.

In cluster shell execute cluster ring show


clusterX::*> cluster ring show

11.

What is the issue?

How do you fix it?
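A sketch of the likely repair, following the same spmctl pattern used for notifyd in Module 2, Exercise 6 (verify the process handle name on your system before re-adding it):

clusterX-02% spmctl | grep vifmgr
clusterX-02% spmctl -e -h vifmgr
clusterX-02% ps -A | grep vifmgr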

Exercise 2: Duplicate lif IDs


Time Estimate: 30 minutes

Step

Action
From the clustershell create a new network interface as follows: Y ∈ {1,2,3,4}

1.

clusterX::*> net int create -vserver studentY -lif data1 -role data
-data-protocol nfs,cifs,fcache -home-node clusterX-0Y -home-port e0c
-address 192.168.81.21Y -netmask 255.255.255.0 -status-admin up
(network interface create)

Info: create_imp: Failed to create virtual interface


Error: command failed: Duplicate lif id

2.

Execute the following:


clusterX::*> net int show
What do you see?

3.

View the mgwd log file on the node where you are issuing the net int create
command and determine the lif ID that is being reported as duplicate.

4.
Execute the following:
clusterX::*>debug smdb table vifmgr_virtual_interface show
-node clusterX-0* -lif-id [lifid/vifid determined from step
3]
What do you see?

5.

Execute the following:


clusterX::*> debug smdb table vifmgr_virtual_interface delete
-node clusterX-0Y -lif-id <the duplicate id>
clusterX::*> debug smdb table vifmgr_virtual_interface show
-node clusterX-0Y -lif-id <the duplicate id>
There are no entries matching your query.

6.

Create new lif:


clusterX::*> net int create -vserver studentY -lif testY -role data
-data-protocol nfs,cifs,fcache -home-node clusterX-0Y -home-port e0c
-address 192.168.81.21Y -netmask 255.255.255.0 -status-admin up
(network interface create)

MODULE 4: NFS
Exercise 1: Mount issues
Time Estimate: 20 minutes
Step
1.

Action
From the Linux Host execute the following:
# mkdir /cmodeY
# mount studentY:/studentY_nfs /cmodeY
You see the following:
mount: mount to NFS server 'studentY' failed: RPC Error:
Program not registered.

2.

Find out the node being mounted:


From the Linux Host execute the following to find the IP address being accessed:
#ping studentY
PING studentY (192.168.81.115) 56(84) bytes of data.
64 bytes from studentY (192.168.81.115): icmp_seq=1 ttl=255
time=1.09 ms
From the clustershell use the following to find out the current node and port on which
the above IP address is hosted
clusterX::*> net int show -vserver studentY -address 192.168.81.115 -fields curr-node,curr-port
  (network interface show)
vserver  lif            curr-node   curr-port
-------- -------------- ----------- ---------
studentY studentY_data1 clusterX-01 e0d

3.

Execute the following to start a packet trace from the nodeshell of the node that was
being mounted and attempt the mount once more
clusterX::*> run -node clusterX-01
Type 'exit' or 'Ctrl-D' to return to the CLI
clusterX-01> pktt start e0d
e0d: started packet trace
From the Linux Host attempt the mount once more as shown below:

# mount student1:/student1_nfs /cmode1


Back in the nodeshell of the node that was mounted dump and stop the packet trace
clusterX-01> pktt dump e0d
clusterX-01> pktt stop e0d
e0d: Tracing stopped and packet trace buffers released.
From the systemshell of the node where the packet trace was captured view the
packet trace using tcpdump
clusterX-01> exit
logout

clusterX::*> systemshell -node clusterX-01


clusterX-01% cd /mroot
clusterX-01% ls
e0d_20120925_131928.trc    etc    home    trend    vserver_fs
clusterX-01% tcpdump -r e0d_20120925_131928.trc

What do you see? Why?

4.

How do you fix the issue?
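A hedged starting point (not the full answer): check from the clustershell whether the NFS service is actually configured and running on the vserver, for example:

clusterX::> vserver nfs status -vserver studentY
clusterX::> vserver nfs show -vserver studentY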

5.

After fixing the issue check that the mount is successful.


Note: If the mount succeeds, please unmount. This step is very important, or the
rest of the exercises will be impacted.

Exercise 2: Mount and access issues


Time Estimate: 30 minutes

Step
1.

Action
From the Linux Host attempt to mount volume studentX_nfs.

# mount studentX:/studentX_nfs /cmode


mount: studentX:/studentX_nfs failed, reason given by server:
Permission denied

2.

From the clustershell execute the following to find the export policy associated with the
volume studentX_nfs:
cluster1::*> vol show -vserver studentX -volume studentX_nfs -instance
Next, use export-policy rule show to find the properties of the export policy
associated with the volume studentX_nfs.
Why did you get an access denied error?
How will you fix the issue? (A sketch follows below.)
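A sketch of the kind of fix, using the same command that appears in step 6 of this exercise (substitute the policy name and rule index you found above):

clusterX::> export-policy rule show -vserver studentX -policyname <policy> -ruleindex <index>
clusterX::> export-policy rule modify -vserver studentX -policyname <policy> -ruleindex <index> -rorule any -rwrule any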

3.

Now once again attempt to mount studentX_nfs from the Linux Host
# mount studentX:/studentX_nfs /cmode
mount: studentX:/studentX_nfs failed, reason given by server:
No such file or directory
What issue is occurring here?

4.

Now once again attempt to mount studentX_nfs from the Linux Host
# mount studentX:/studentX_nfs /cmode
Is the mount successful?
If yes, cd into the mount point
#cd /cmode
-bash: cd: /cmode: Permission denied
How do you resolve this?
Note: Depending on how you resolved the issue with the export-policy in step
2, you may not see any error here. In that case, move on to step 5.
If you unmount and remount, does it still work?

5.
Try to write a file into the mount:
[root@nfshost cmode]# touch f1

What does ls -la show?

[root@nfshost cmode]# ls -la
total 16
drwx------   2 admin admin 4096 Sep 25 08:06 .
drwxr-xr-x  26 root  root  4096 Sep 25 06:03 ..
-rw-r--r--   1 admin admin    0 Sep 25 08:06 f1
drwxrwxrwx  12 root  root  4096 Sep 25 08:05 .snapshot

What do you see the file permissions as?


Why are the permissions and owner set the way they are?

6.

From clustershell Execute:


clusterX::> export-policy rule modify -vserver studentY -policyname studentY -ruleindex 1 -rorule any -rwrule any
(vserver export-policy rule modify)

Exercise 3: Stale file handle


Time Estimate: 30 minutes

Step
1.

Action
From the Linux Host execute:
# cd /nfsX
-bash: cd: /nfsX: Stale NFS file handle

2.

Unmount the volume from the client and try to re-mount. What happens?

3.

From the Linux Host:


# ping studentX
PING studentX (192.168.81.115) 56(84) bytes of data.
The IP above (192.168.81.115) is the IP of the vserver being mounted.
Find the node in the cluster that is currently hosting this IP
From your clustershell
::*> net int show -address 192.168.81.115 -fields curr-node
  (network interface show)
vserver  lif            curr-node
-------- -------------- -----------
studentX studentX_data1 clusterY-0X


The node shown above (clusterY-0X) is the node that is currently hosting the IP.
Log in to the systemshell of this node and view the vldb logs
cluster1::*> systemshell -node clusterY-0X
cluster1-01% tail /mroot/etc/mlog/vldb.log
What do you see?

4.

Look for volumes with the MSID in the error shown in the vldb log as follows:
From clustershell execute the following to find the aggregate where the volume
being mounted(nfs_studentX) lives and on which node that aggregate lives:
cluster1::*> vol show -vserver studentX -volume nfs_studentX -fields aggregate
  (volume show)
vserver  volume       aggregate
-------- ------------ ---------
studentX nfs_studentX studentX

cluster1::*> aggr show -aggregate studentX -fields nodes
aggregate nodes
--------- -----------
studentX  clusterY-0X

Go to the nodeshell of the node (shown above) that hosts the volume and its
aggregate, use the showfh command, and convert the MSID from hex.
::>run node clusterY-0X
>priv set diag
*>showfh /vol/nfs_studentX
flags=0x00 snapid=0 fileid=0x000040 gen=0x5849a79f
fsid=0x16cd2501 dsid=0x0000000000041e msid=0x00000080000420

0x00000080000420 converted to decimal is 2147484704


Exit from the nodeshell back to the clustershell and execute debug vreport show in diag
mode:
cluster1-01*> exit
logout

cluster1::*> debug vreport show


What do you see?

5.

What is the issue here?

6.

How would you fix this?

MODULE 5: CIFS
Instructions to Students:
As mentioned in the lab handout the valid windows users in the domain
Learn.NetApp.local are:
a) Administrator
b) Student1
c) Student2

Exercise 1: Using diag secd


Time Estimate: 20 minutes

Step
1.

Action
Find the node where the IP(s) for vserver studentX is hosted
From the RDP machine do the following to start a command window
Start->Run->cmd
In the command window type
ping studentX
From the clustershell find the node on which the IP is hosted (Refer to NFS Exercise
3)
Login to the console of that node and execute the steps of this exercise

2.

Type the following:


::> diag secd
What do you see and why?

3.

Note: for all the steps of this exercise clusterY-0X should be the name of the
local node
Type the following to verify the name mapping of the Windows user student1:

::diag secd*> name-mapping show -node local -vserver
studentX -direction win-unix -name student1

4.

From the RDP machine do the following to access a cifs share


Start -> Run -> \\studentX
Type the following to query for the Windows SID of your windows user name
cluster1::diag secd*> authentication show-creds -node local
-vserver studentX -win-name <username that you have used
to RDP to the windows machine>
DC Return Code: 0
Windows User: Administrator Domain: LEARN Privs: a7
Primary Grp: S-1-5-21-3281022357-2736815186-1577070138-513
Domain: S-1-5-21-3281022357-2736815186-1577070138
Rids: 500, 572, 519, 518, 512, 520, 513
Domain: S-1-5-32 Rids: 545, 544
Domain: S-1-1 Rids: 0
Domain: S-1-5 Rids: 11, 2
Unix ID: 65534, GID: 65534
Flags: 1
Domain ID: 0
Other GIDs:
cluster1::diag secd*> authentication translate -node local -vserver student1 -win-name <username that you have used to RDP
to the windows machine>
S-1-5-21-3281022357-2736815186-1577070138-500

5.

Type the following to test a Windows login for your user windows name in diag secd
cluster1::diag secd*> authentication login-cifs -node
local -vserver studentX -user <username that you have
used to RDP to the windows machine>

Enter the password: <your windows password i.e Netapp123>


Windows User: Administrator Domain: LEARN Privs: a7
Primary Grp: S-1-5-21-3281022357-2736815186-1577070138-513
Domain: S-1-5-21-3281022357-2736815186-1577070138
Rids: 500, 513, 520, 512, 518, 519, 572
Domain: S-1-1 Rids: 0
Domain: S-1-5 Rids: 11, 2
Domain: S-1-5-32 Rids: 544

Unix ID: 65534, GID: 65534


Flags: 1
Domain ID: 0
Other GIDs:
Authentication Succeeded.

6.

Type the following to view active CIFS connections in secd


cluster1::diag secd*> connections show -node clusterY-0X -vserver studentX
[ Cache: NetLogon/learn.netapp.local ]
Queue> Waiting: 0, Max Waiting: 1, Wait Timeouts: 0, Avg
Wait: 0.00ms
Performance> Hits: 0, Misses: 1, Failures: 0, Avg
Retrieval: 24505.00ms

(No connections active or currently cached)

[ Cache: LSA/learn.netapp.local ]
Queue> Waiting: 0, Max Waiting: 1, Wait Timeouts: 0, Avg
Wait: 0.00ms
Performance> Hits: 1, Misses: 4, Failures: 0, Avg
Retrieval: 6795.40ms

(No connections active or currently cached)

[ Cache: LDAP (Active Directory)/learn.netapp.local ]


Queue> Waiting: 0, Max Waiting: 1, Wait Timeouts: 0, Avg
Wait: 0.00ms
Performance> Hits: 1, Misses: 3, Failures: 1, Avg
Retrieval: 2832.75ms

(No connections active or currently cached)


Type the following to clear active CIFS connections in secd
cluster1::diag secd*> connections clear -node clusterY-0X -vserver studentX

Test connections on vserver student1 marked for removal.


NetLogon connections on vserver student1 marked for
removal.
LSA connections on vserver student1 marked for removal.
LDAP (Active Directory) connections on vserver student1
marked for removal.
LDAP (NIS & Name Mapping) connections on vserver student1
marked for removal.
NIS connections on vserver student1 marked for removal.

7.

Type the following to view the server discovery information


cluster1::diag secd*> server-discovery show-host -node
clusterY-0X

Host Name: win2k8-01


Cifs Domain:
AD Domain:
IP Address: 192.168.81.10

Host Name: win2k8-01


Cifs Domain:
AD Domain:
IP Address: 192.168.81.253
Type the following to achieve the same result as ONTAP 7G's cifs resetdc:
cluster1::diag secd*> server-discovery reset -node
clusterY-0X -vserver studentX
Discovery Reset succeeded for Vserver: student1
To verify, type the following:
cluster1::diag secd*> server-discovery show-host -node
clusterY-0X
Type the following to achieve the same result as ONTAP 7G's cifs testdc:

cluster1::diag secd*> server-discovery test -node clusterY-0X -vserver studentX


Discovery Global succeeded for Vserver: studentX

8.

Type the following to view current logging level in secd


cluster1::diag secd*> log show -node clusterY-0X
Log Options
----------------------------------
Log level:                   Debug
Function enter/exit logging: OFF

Type the following to set and view the current logging level in secd
cluster1::diag secd*> log set -node clusterY-0X -level err
Setting log level to "Error"

cluster1::diag secd*> log show -node clusterY-0X


Log Options
----------------------------------
Log level:                   Error
Function enter/exit logging: OFF

9.

Type the following to enable tracing in secd to capture the logging level specified
cluster1::diag secd*> trace show -node local
Trace Spec
--------------------------------------Trace spec has not been set.
cluster1::diag secd*> trace set -node cluster1-01 -traceall yes
Trace spec set successfully for trace-all.

cluster1::diag secd*> trace show -node cluster1-01


Trace Spec
---------------------------------------
TraceAll: Tracing all RPCs

10.

Type the following to check the secd configuration for comparison with the ngsh settings:
cluster1::diag secd*> config query -node local -source-name
    cifs-server             machine-account        kerberos-realm
    nis-domain              vserver                vserverid-to-name
    unix-group-membership   local-unix-user        local-unix-group
    kerberos-keyblock       ldap-config            ldap-client-config
    ldap-client-schema      name-mapping           nfs-kerberos
    cifs-server-security    dns                    virtual-interface
    routing-group-routes    cifs-server-options    cifs-preferred-dc
    secd-cache-config

cluster1::diag secd*> configuration query -node local -source-name machine-account


vserver: 5
cur_pwd:
0100962681ce82e2d6da20df35ce86964fea2c495d9609d395a51994
31d3d4531144f845fcfd675e15143fe76932ced271ddcf57c9d8fe59
a63b0bc68f717077fc88ca28aa0fdbba4b8d8509bb25ebe2
new_pwd:
installdate: 1345202770
sid: S-1-5-21-3281022357-2736815186-1577070138-1609

vserver: 6
cur_pwd:
01433517c8acbbf66c2e287b4bee56f5d8b707cfb69710737bfb2061
6ebe61fc31163acde2b5a827f3c2d395b89fef15f28a8f514c147906
580cbaa30b4a1361444f76036d2c590222ce1a0feaa56779
new_pwd:
installdate: 1345202787
sid: S-1-5-21-3281022357-2736815186-1577070138-1610

11.

Type the following to clear the cache(s) one at a time


cluster1::diag secd*> cache clear -node clusterY-0X -vserver studentX -cache-name
    ad-to-netbios-domain      ems-delivery              netbios-to-ad-domain
    ldap-groupid-to-name      ldap-groupname-to-id      ldap-userid-to-creds
    ldap-username-to-creds    log-duplicate             name-to-sid
    sid-to-name               nis-groupid-to-name       nis-groupname-to-id
    nis-userid-to-creds       nis-username-to-creds     nis-group-membership
    netgroup                  lif-bad-route-to-target   schannel-key

cluster1::diag secd*> cache clear -node clusterY-0X -vserver studentX -cache-name ad-to-netbios-domain
Type the following to clear all caches together
cluster1::diag secd*> restart -node clusterY-0X

You are attempting to restart a process in charge of


security services. Do not
restart this process unless the system has generated a
"secd.config.updateFail"
event or you have been instructed to restart this process
by support personnel.

This command can take up to 2 minutes to complete.

Are you sure you want to proceed? {y|n}: y

Restart successful! Security services are operating correctly.

12.

From the RDP machine close the cifs share \\studentX opened in windows explorer

Exercise 2: Authentication issues


Time Estimate: 30 minutes
Step
1.

Action
From the RDP machine access the cifs share \\studentX
Start->Run->\\studentX
What error message do you see?

2.

Refer to step 1 of exercise 1 and


Find the node where the IP(s) for vserver studentX is hosted
Login to the console of that node and execute the steps of this exercise
From clustershell of the node , run the following commands:
::> set diag
::*> diag secd authentication translate -node local -vserver
studentX -win-name <your windows username>
::*> diag secd authentication sid-to-uid -node local -vserver
studentX -sid <sid from previous command>
::*> diag secd authentication show-creds -node local -vserver
studentX -win-name <username>

Does the user seem to be functioning properly? If not, what error do you get?

3.

Run the following command:


::> event log show

What message do you see?

4.

Run the following command:


::> diag secd name-mapping show -node local -vserver
student1 -direction win-unix -name <your windows username>
::> vserver name-mapping show -vserver studentX -direction
win-unix -position *
::> cifs options show -vserver studentX

5.

Which log in systemshell can we look at to see errors for this problem?

6.

What issues did you find?

7.

cluster1::*> unix-user create -vserver studentX -user pcuser -id 65534 -primary-gid 65534
  (vserver services unix-user create)

cluster1::*> cifs options modify -vserver studentX -default-unix-user pcuser

8.

The Windows Explorer window that opens when you navigate to Start->Run->\\studentX shows 2 shares:
a) studentX
b) studentX_child
Try to access the shares
What happens?
Do the following:

Enable debug logging for secd on the node that owns your data lifs

cluster1::*> diag secd log set -node local -level debug


Setting log level to "Debug"
cluster1::*> trace set -node local -trace-all yes
(diag secd trace set)
Trace spec set successfully for trace-all.

Close the CIFS session on the Windows host and run net use /d * from
cmd to clear cached sessions and retry the connection

Enter systemshell and cd to /mroot/etc/mlog

Type tail -f secd.log


What do you see?

9.

Given the results of the previous tests, what could the issue be here?

10.

From ngsh (clustershell) run:


cluster1::> vserver show -vserver studentX -fields rootvolume
vserver  rootvolume
-------- -------------
studentX studentX_root


The value in the rootvolume column is the root volume of the vserver you are accessing.
cluster1::> vserver cifs share show -vserver studentX -share-name studentX

                      Vserver: studentX
                        Share: studentX
     CIFS Server NetBIOS Name: STUDENTX
                         Path: /studentX_cifs
             Share Properties: oplocks
                               browsable
                               changenotify
           Symlink Properties:
      File Mode Creation Mask:
 Directory Mode Creation Mask:
                Share Comment:
                    Share ACL: Everyone / Full Control
File Attribute Cache Lifetime:

cluster1::*> vserver cifs share show -vserver studentX -share-name studentX_child

                      Vserver: studentX
                        Share: studentX_child
     CIFS Server NetBIOS Name: STUDENTX
                         Path: /studentX_cifs_child
             Share Properties: oplocks
                               browsable
                               changenotify
           Symlink Properties:
      File Mode Creation Mask:
 Directory Mode Creation Mask:
                Share Comment:
                    Share ACL: Everyone / Full Control
File Attribute Cache Lifetime:

From the above commands, obtain the names of the volumes being accessed via the
shares.

11.

Now that you know the volumes you are trying to access, use fsecurity show to view
the permissions on them.
cluster1::*> vol show -vserver studentX -volume studentX_cifs -instance

Find the node that hosts the aggregate where studentX_cifs lives.
From the nodeshell of that node run:
cluster1-01> fsecurity show /vol/studentX_cifs
What do you see?

cluster1::*> vol show -vserver studentX -volume studentX_cifs_child -instance

Find the node that hosts the aggregate where studentX_cifs_child lives.
From the nodeshell of that node run:
cluster1-01> fsecurity show /vol/studentX_cifs_child
What do you see?

Find the node that hosts the aggregate where studentX_root lives.
From the nodeshell of that node run:
cluster1-01> fsecurity show /vol/studentX_root
What do you see?

12.

From ngsh run:


cluster1::*> volume modify -vserver studentX -volume
studentX_root -unix-permissions 755
Queued private job: 167
Are you able to access both the shares now?

13.

From ngsh run:


cluster1::*> volume modify -vserver studentX -volume
studentX_cifs -security-style ntfs
Queued private job: 168
Does this resolve the issue?

Exercise 3: Authorization issues


Time Estimate: 20 minutes

Step
1.

Action
From a client go Start -> Run -> \\studentX\studentX
What do you see?

2.

Try to view the permissions on the share. What do you see?

3.

From the nodeshell of the node where the volume and its aggregate is hosted run:
cluster1-01> fsecurity show /vol/student1_cifs
[/vol/student1_cifs - Directory (inum 64)]
Security style: NTFS
Effective style: NTFS

DOS attributes: 0x0010 (----D---)

Unix security:
uid: 0
gid: 0
mode: 0777 (rwxrwxrwx)

NTFS security descriptor:


Owner: S-1-5-32-544
Group: S-1-5-32-544
DACL:
Allow - S-1-5-21-3281022357-2736815186-1577070138-500 0x001f01ff (Full Control)

4.

From the above command, obtain the sid of the owner of the volume.
From ngsh run:

cluster1::*> diag secd authentication translate -node local -vserver studentX -sid S-1-5-32-544
What do you see?

5.

How do you resolve this issue?

Exercise 4: Export Policies


Time Estimate: 20 minutes
Step
1.

Action
Try to access \\studentX\studentX
What do you see?

2.

What error do you see?

3.

What does the event log show? What about the secd log? (Exercise 2, steps 3 and 8)

4.

From the nodeshell of the node that hosts the volume and its aggregate run:
fsecurity show /vol/studentX_cifs
Do the permissions show that access should be allowed?

5.

From clustershell obtain the name of the export-policy associated with the volume as
follows:
cluster1::> volume show -vserver studentX -volume studentX_cifs -fields policy
Now view details of the export-policy obtained in the previous command:
cluster1::> export-policy rule show -vserver studentX -policyname <policy name obtained from the above command>
cluster1::> export-policy rule show -vserver studentX -policyname <policy name obtained from the above command> -ruleindex <rule index applicable>
What do you see?
How do you fix the issue?
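If the rule turns out to exclude the CIFS protocol, one possible fix is sketched below (the -protocol field belongs to the export-policy rule command family; confirm the current value with the show command first):

cluster1::> export-policy rule show -vserver studentX -policyname <policy name> -ruleindex <rule index> -instance
cluster1::> export-policy rule modify -vserver studentX -policyname <policy name> -ruleindex <rule index> -protocol any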

MODULE 6: SCALABLE SAN


Exercise 1: Enable SAN features and create a LUN and connect via ISCSI
Time Estimate: 20 minutes

Step
1.

Action
Review your SAN configuration on the cluster:
- Licenses
- SAN protocol services
- Interfaces

2.

Create a lun in your studentX_san volume.

3.

Create an igroup and add the ISCSI IQN of your host to the group.

4.

Configure the ISCSI initiator

5.

Map the lun and access from lab host. Format the lun and write data to it.
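A hedged sketch of steps 2, 3, and 5 from the clustershell (the LUN path, igroup name, and size are placeholders; the initiator IQN comes from the iSCSI Initiator control panel on the Windows host, which also handles the target discovery and login for step 4):

cluster1::> lun create -vserver studentX -path /vol/studentX_san/lun1 -size 100MB -ostype windows_2008
cluster1::> lun igroup create -vserver studentX -igroup studentX_ig -protocol iscsi -ostype windows -initiator <host IQN>
cluster1::> lun map -vserver studentX -path /vol/studentX_san/lun1 -igroup studentX_ig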

6.

From clustershell
cluster1::*> iscsi show
What do you see?
cluster1::*> debug seqid show
What do you see?

7.

1. Locate the UUIDs of your iSCSI LIFs


::> debug smdb table vifmgr_virtual_interface show -lifname <iscsi_lif>

2. Display the statistics for these LIFs


cluster1::statistics*> show -node cluster1-01 -object
iscsi_lif -counter iscsi_read_ops -instance <UUID obtained
from the above command>

EXERCISE 2
TASK 1: TROUBLESHOOT QUORUM ISSUES

In this task, you experience quorum failure on a node of the cluster.

STEP ACTION

1.

Team member 1: log in to the console of clusterY-01 as admin.

From here on, this node will be referred to as Node 1.

2.

Team member 2: log in to the console of clusterY-02 as admin.

From here on, this node will be referred to as Node 2.

3.

Team member 1 on the Node 1 console ngsh


::> set diag

4.

Team member 2 on the Node 2 console ngsh


::> set diag

5.

Team member 2 on the Node 2 ngsh , verify cluster status


::*> cluster show

6.

Team member 2 on the Node 2 ngsh, view the current LIFs:


::*> net int show

7.

Team member 2 on the Node 2 ngsh, view the current cluster kernel status:
::*> cluster kernel-service show -instance

8.

Team member 2 on the Node 2 ngsh, bring down the cluster network LIFs:
::*> net int modify -vserver clusterY-02 -lif clus1,clus2 -status-admin down

STEP ACTION

9.

Team member 2 on the Node 2 ngsh, view the current cluster kernel status:
::*> cluster kernel-service show -instance

10.

Team member 1 on the Node 1 ngsh, view the current cluster kernel status:
::*> cluster kernel-service show -instance

11.

On the Node 2 PuTTY interface, enable the cluster network LIFs on the interface:
::*> net int modify -vserver cluster1-02 -lif clus1,clus2 -status-admin up

12.

Team member 2 on the Node 2 ngsh, view the current cluster kernel status:
::*> cluster kernel-service show -instance
What do you see?

13.

Team member 1 on the Node 1 ngsh, view the current cluster kernel status:
::*> cluster kernel-service show -instance
What do you see?

14.

cluster1::*> debug smdb table bcomd_info show


What do you see?

STEP ACTION

15.

Team member 1 on the Node 1 ngsh, view the current bcomd information:
cluster1::*> debug smdb table bcomd_info show
What do you see?

16.

Team member 2: reboot Node 2 to have it start participating in SAN quorum again:
::*> reboot -node clusterY-02

17.

Team member 2 console log in on Node2 as admin

18.

Team member 2 on Node2, verify cluster health:

::> cluster show

19.

Team member 2 on Node2


::> set diag

20.

Verify that both nodes have a cluster kernel status of in quorum (INQ):
::*> cluster kernel-service show -instance
::*> debug smdb table bcomd_info show

TASK 2: TROUBLESHOOT LOGICAL INTERFACE ISSUES

In this task, you bring down the LIFs that are associated with a LUN.
STEP ACTION

1.

Console log in as admin on clusterY-0X and view the current LIFs:

::*> net int show

2.

On your own, disable the LIFs that are associated with studentX_iscsi and determine how this action
impacts connectivity to your LUN on the Windows host.
END OF EXERCISE

Exercise 3: Diag level SAN debugging


Time Estimate: 25 minutes

Step

Action

1.

What are two ways we can see where the nvfail option is set on a volume?

2.

How would we clear an nvfail state if we saw it?

3.

How would we show virtual disk object information for a lun?

4.

How do you manually dump a rastrace?

MODULE 7: SNAPMIRROR
Exercise 1: Setting up Intercluster SnapMirror
Time Estimate: 20 minutes

Step
1.

Action
From clustershell of cluster1 run:
cluster1::> snapmirror create -source-path
cluster1://student1/student1_snapmirror -destination-path
cluster2://student3/student3_dest -type DP -tries 8 -throttle unlimited

Error: command failed: Volume "cluster2://student3/student3_dest" not found.
       (Failed to contact peer cluster with address 192.168.81.193. No
       intercluster LIFs are configured on this node.)

2.

From the clustershell of cluster1 run:

::> set diag
cluster1::*> cluster peer address stable show
What do you see?
cluster1::*> net int show -role intercluster
What do you see?
cluster1::*> cluster peer show -instance
What do you see?
cluster1::*> cluster peer health show -instance
What do you see?

3.

Run the following command:

::*> cluster peer ping -type data


What do you see?

4.

Run the following command:


::*> cluster peer ping -type icmp
What do you see now? What addresses, if any, seem to be having issues?

5.

Run the following command:


::> job history show -event-type failed
What jobs are failing?
To examine why they are failing:
cluster1::*> event log show -node cluster1-01 -messagename
cpeer*
Why are the jobs failing?

6.

Try to modify the cluster peer. What happens?


cluster1::*> cluster peer modify -cluster cluster2 -peer-addrs 192.168.81.193,192.168.81.194 -timeout 60

7.

How did you resolve the issue?
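For reference, a sketch of the kind of fix that matches the error in step 1 (missing intercluster LIFs); the port and address are placeholders, and the exact parameter set can vary by release:

cluster1::> net int create -vserver cluster1-01 -lif ic1 -role intercluster -home-node cluster1-01 -home-port e0d -address <free lab address> -netmask 255.255.255.0

Repeat for the other intercluster-facing nodes as needed, then re-check with cluster peer show -instance and cluster peer ping.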

Exercise 2: Intercluster DP mirrors


Time Estimate: 30 minutes

Step
1.

Action
From clustershell of cluster1 run:
cluster1::*> snapmirror create -source-path
cluster1://student1/student1_snapmirror -destination-path
cluster2://student3/student3_dest -type DP -tries 8 throttle unlimited
What error do you see? What might you be doing wrong?

2.

From the clustershell of cluster2 run:

cluster2::> snapmirror create -source-path
cluster1://student1/student1_snapmirror -destination-path
cluster2://student3/student3_dest -type DP -tries 8 -throttle
unlimited

What do you see? Why?

3.

After correcting the issue, run the following command in clustershell of cluster2:
cluster2::> snapmirror create -source-path
cluster1://student1/student1_snapmirror -destination-path
cluster2://student3/student3_dest -type DP -tries 8 -throttle unlimited

Does the command complete?

How do you verify that the snapmirror exists?

::> snapmirror show
What do you see? Is the snapmirror functioning?
How do you get the mirror working if it's not?

4.

After the snapmirror is confirmed as functional, check to see how long it has been
since the last update (snapmirror lag).
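One way to check (lag-time is a standard field of snapmirror show):

cluster2::> snapmirror show -destination-path cluster2://student3/student3_dest -fields state,lag-time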

Exercise 3: LS Mirrors
Time Estimate: 20 minutes

Step
1.

Action
Create two LS mirrors that point to your studentX_snapmirror volume.
clusterY::*> volume create -vserver studentX -volume
studentX_LS_snapmirror -aggregate studentX -size 100MB -state online -type DP
[Job 265] Job succeeded: Successful

clusterY::*> volume create -vserver studentX -volume
studentX_LS_snapmirror2 -aggregate studentX -size 100MB -state online -type DP
[Job 266] Job succeeded: Successful

clusterY::*> snapmirror create -source-path
clusterY://studentX/studentX_snapmirror -destination-path
clusterY://studentX/studentX_LS_snapmirror2 -type LS
[Job 273] Job is queued: snapmirror create the relationship with destination clu
[Job 273] Job succeeded: SnapMirror: done

clusterY::*> snapmirror create -source-path
clusterY://studentX/studentX_snapmirror -destination-path
clusterY://studentX/studentX_LS_snapmirror -type LS
[Job 275] Job is queued: snapmirror create the relationship with destination clu
[Job 275] Job succeeded: SnapMirror: done

What steps did you have to consider? Check the MSIDs and DSIDs for the source
and destination volumes. What do you notice?
clusterY::*> volume show -vserver studentX -fields msid,dsid

2.

Attempt to initialize one of the mirrors using the snapmirror initialize command.
cluster1::*> snapmirror initialize -destination-path
cluster1://student1/student1_LS_snapmirror
[Job 276] Job is queued: snapmirror initialize of destination
cluster1://student1/student1_LS_snapmirror.

cluster1::*> snapmirror initialize -destination-path


cluster1://student1/student1_LS_snapmirror2
[Job 277] Job is queued: snapmirror initialize of destination
cluster1://student1/student1_LS_snapmirror2.

cluster1::*> job show


What happens? How would you view the status of the job? If it didn't work, how
would you fix it? Why didn't it work?

cluster1::*> job history show -id 276


What do you see?
How do you fix it?
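If the per-destination initialize jobs fail, a hedged hint: LS mirrors are normally initialized as a set from the source volume rather than one destination at a time, for example:

cluster1::*> snapmirror initialize-ls-set -source-path cluster1://student1/student1_snapmirror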

3.

After initializing the LS mirrors, try to update the mirrors using snapmirror update.
clusterY::*> snapmirror update -destination-path
clusterY://studentX/studentX_LS_snapmirror
[Job 279] Job is queued: snapmirror update of destination
clusterY://studentX/studentX_LS_snapmirror.
clusterY::*> job show

What happens? How do you view the status of the job?


What is the issue?

4.

Run the following command:


::> vol show -vserver studentX -fields junction-path

What do you see?

Unmount the volume from the cluster shell:
::> vol unmount -vserver studentX -volume studentX_snapmirror

Run the following:
::> vol show -vserver studentX -fields junction-path
What do you see now?

Then remount the volume to a new junction path /studentX_snapmirror:
::> vol mount -vserver studentX -volume studentX_snapmirror -junction-path /studentX_snapmirror
Now what do you see?

5.

clusterY::*> snapmirror update-ls-set -source-path
clusterY://studentX/studentX_snapmirror
clusterY::*> snapmirror update-ls-set -source-path
clusterY://studentX/studentX_root

clusterY::*> volume modify -vserver studentX -volume
studentX_snapmirror -unix-permissions 000

clusterY::*> volume show -vserver studentX -fields unix-permissions


What do you see?

Mount the volume from your Linux host using -o nfsvers=3:


[root@nfshost DATAPROTECTION]# mount -o nfsvers=3
student1:/student1_snapmirror /cmode
[root@nfshost DATAPROTECTION]# cd /cmode
[root@nfshost cmode]# ls
[root@nfshost cmode]# cd
[root@nfshost ~]# ls -latr /cmode
Now execute:
[root@nfshost ~]# umount /cmode
From clustershell run:
clusterY::*> snapmirror update-ls-set -source-path
clusterY://studentX/studentX_snapmirror
From Linux Host run:
[root@nfshost ~]# mount -o nfsvers=3
student1:/student1_snapmirror /cmode
[root@nfshost ~]# ls -latd /cmode
What do you see?
Modify the volume back to 777 on the cluster (using vol modify)
clusterY::*> volume modify -vserver studentX -volume
studentX_snapmirror -unix-permissions 777
Queued private job: 162
Check permissions on the unix host again.
[root@nfshost ~]# ls -latd /cmode
ls: /cmode: Permission denied
[root@nfshost ~]# cd /cmode
What do you see?
Are you able to cd into the mount now?

Update the LS mirror set.


clusterY::*> snapmirror update-ls-set -source-path

clusterY://studentX/studentX_snapmirror
What do you see in ls on the host? Why?
Modify the source volume to 000
clusterY::*> volume modify -vserver studentX -volume
studentX_snapmirror -unix-permissions 000
Queued private job: 163

What do you see in ls on the host? Why?
