EMC ISILON CUSTOMER TROUBLESHOOTING GUIDE

ONEFS UPGRADE FAILURES

Abstract
This guide helps you troubleshoot OneFS upgrade failures and error
messages received during upgrades.
September 15, 2015

1 - EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures


We appreciate your help in improving this document. Submit your feedback at http://bit.ly/isi-docfeedback.

Contents and overview

Note
Follow all of these steps, in order, until you reach a resolution.

1. Follow these steps.
   Before you begin (Page 3)
   Best practices and useful information (Page 4)

2. Perform troubleshooting steps in order.
   Start Troubleshooting (Page 5)
   Nodes did not all come back online (Page 8)
   Simultaneous Upgrade (Page 11)
   Rolling Upgrade (Page 12)

3. Appendices
   Appendix A: If you need further assistance
   Appendix B: How to use this flow chart


Before you begin

CAUTION!
If the node, subnet, or pool you are working on goes down during the course of
troubleshooting and you do not have any other way to connect to the cluster, you could
experience data unavailability.

Therefore, make sure you have more than one way to connect to the cluster before you
start this troubleshooting process. The best method is to have a serial cable available.
That way, if you are unable to connect through the network, you will still be able to
connect to the cluster physically.
For specific requirements and instructions for making a physical connection to the
cluster, see article 16744 on the EMC Online Support site.
Before you begin troubleshooting, confirm that you can either connect through another
subnet or pool, or that you have physical access to the cluster.

Configure logging through SSH


We recommend configuring screen logging to log all session input and output during your troubleshooting session. This log
file can be shared with EMC Isilon Technical Support if you require assistance at any point during troubleshooting.
Note: The screen session capability does not work in OneFS 7.1.0.6 and 7.1.1.2. If you are running either of these versions,
configure logging using your local SSH client's logging feature.
1. Open an SSH connection to the cluster and log in using the root account. Note: If the cluster is in compliance mode, use
the compadmin account to log in. All compadmin commands must be preceded by the sudo prefix.
2. Change the directory to /ifs/data/Isilon_Support by running:
cd /ifs/data/Isilon_Support

3. Run the following command to capture all input and output of the session:
screen -L

This will create a file called screenlog.0 that will be appended to during your session.
4. Perform troubleshooting.
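
The version note above can be expressed as a small check. This is only a sketch: the function names screen_supported and logging_method are hypothetical helpers, not OneFS commands, and the version string is assumed to be supplied by you.

```shell
# Hypothetical helper: succeed (0) when screen logging is expected to
# work on the given OneFS version, fail (1) for the two releases that
# the note names as affected.
screen_supported() {
    case "$1" in
        7.1.0.6|7.1.1.2) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical helper: print the logging approach for a version string.
logging_method() {
    if screen_supported "$1"; then
        echo "screen -L"
    else
        echo "local SSH client logging"
    fi
}
```

For example, logging_method 7.1.0.6 prints "local SSH client logging".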


Best practices and useful information


Introduction
This page explains why upgrades often fail
and how to prevent upgrade problems in the
future.

Most upgrade problems occur during rolling upgrades that are initiated
from the OneFS web administration interface.
For best results, do the following:

- Use the command-line interface (CLI) to perform upgrades.
- Initiate the upgrade from the highest-numbered node in the cluster,
  unless the highest-numbered node is an Accelerator. If the
  highest-numbered node is an Accelerator, initiate the upgrade from
  node 1.

Use the command-line interface


It is best to initiate the upgrade from the command-line interface. The
CLI displays more detailed information than the web interface, and it
does not depend on the web interface services being available. You can
also launch a screen session, which enables you to resume from where you
left off if you get disconnected.

Initiate the upgrade from the highest-numbered node


The node that you initiate the upgrade from is called the "master node."
During an upgrade, each node is upgraded and rebooted in turn, in
ascending numerical order, starting with the lowest-numbered node.
When the master node is the highest-numbered node, the upgrade
starts with node 1, and the last node to be rebooted is the master node.
The system should always upgrade and reboot the master node last,
regardless of which numbered node it is, but this does not always
happen. Sometimes, when the master node is not the highest-numbered
node, the system starts upgrading with node 1 as usual, but when it
reaches the master node, it upgrades and reboots that node in its
numerical order. This stops the upgrade process because, after it is
rebooted, the master node can no longer tell the rest of the nodes to
upgrade. Therefore, you should always initiate the upgrade from the
highest-numbered node in the cluster (unless, as stated above, the
highest-numbered node is an Accelerator; in this case, you should
initiate the upgrade from node 1).
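
The selection rule above can be sketched as follows. The function name pick_master_node and its arguments are hypothetical; you supply the highest node number and whether that node is an Accelerator.

```shell
# Hypothetical helper: print the node number to start the upgrade from.
#   $1 = highest node number in the cluster
#   $2 = "yes" if that node is an Accelerator, anything else otherwise
pick_master_node() {
    highest=$1
    is_accelerator=$2
    if [ "$is_accelerator" = "yes" ]; then
        # Rule from the text: fall back to node 1.
        echo 1
    else
        echo "$highest"
    fi
}
```

For example, pick_master_node 6 no prints 6, and pick_master_node 6 yes prints 1.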


Troubleshooting
Analysis
Page 5

Introduction
Start troubleshooting here. If you need help understanding the flow chart
conventions used in this guide, see Appendix B: How to use this flow chart.

Note
Most upgrade problems occur during rolling upgrades that are initiated
from the OneFS web administration interface. Therefore, we will use the
command-line interface exclusively to troubleshoot your issue and get your
upgrade restarted. For more information, see "Best practices and useful
information" on page 4.

Start

If you have not done so already, log in to the cluster and configure
logging through SSH, as described on page 3.

Did the upgrade fail with a specific error displayed on the screen?
    No: Go to Page 6.
    Yes: Follow the prompts and onscreen instructions, then answer the next question.

Can the upgrade be completed successfully now?
    Yes: End troubleshooting.
    No: Go to Page 6.


Troubleshooting, continued
Analysis, continued
Page 6

You could have arrived here from:
Page 5 - Analysis
Page 8 - Nodes did not all come back online

Run the following command to see which nodes were successfully upgraded:
isi_for_array -s "uname -a"

The output provides a list of all the nodes and indicates which version of
OneFS each is running. For an example of the output, see Appendix C.
Note: If a node did not fully reboot or is down, it does not appear in the
output. Also, if the upgrade was a rolling upgrade, an error might appear
stating that a node did not come back online.
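
To make the version list easier to scan, the output can be reduced to one node and version per line. This is a sketch only: node_versions is a hypothetical helper, and it assumes each node's line has the shape shown in Appendix C (node name, "Isilon OneFS", host name, version).

```shell
# Hypothetical helper: read `isi_for_array -s "uname -a"` output on
# stdin and print "<node> <version>" for every responding node.
node_versions() {
    awk '$2 == "Isilon" && $3 == "OneFS" {
        node = $1
        sub(/:$/, "", node)   # drop the trailing colon from "cluster-1:"
        print node, $5        # field 5 is the first version string
    }'
}
```

Usage: isi_for_array -s "uname -a" | node_versions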

After running the command, do you see this error?
ERROR Client connected from an unprivileged port number 50230. Refusing the connection
[Errno 54] RPC session disconnected
    Yes: Install a patch as described in the following article, then continue troubleshooting on Page 7:
         OneFS: After a failed or paused upgrade, commands sent from nodes that are not yet upgraded might fail, article 198906.
    No: Go to Page 7.


Troubleshooting, continued
Analysis, continued
Page 7

You could have arrived here from:
Page 6 - Analysis, continued

Run the following command:
isi status -q
In the output, look at the Health DASR column to see if any nodes report
-D- (Down). For an example of the output, see Appendix D.
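
The check of the Health DASR column can be sketched as a filter. The helper name down_nodes is hypothetical, and the sketch assumes that node rows are pipe-separated as in Appendix D (ID, IP address, DASR flags) and that a down node shows the letter D somewhere in its flags.

```shell
# Hypothetical helper: read `isi status -q` output on stdin and print
# the ID of every node whose DASR flags contain a D (Down).
down_nodes() {
    awk -F'|' '$1 ~ /^ *[0-9]+ *$/ && $3 ~ /D/ {
        gsub(/ /, "", $1)   # strip the padding around the node ID
        print $1
    }'
}
```

Usage: isi status -q | down_nodes. No output means that no node row was flagged as down.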

Do any nodes report as down? A down node means that it failed to join the
cluster following the upgrade.
    Yes: Go to Page 8.
    No: Answer the next question.

Using the output of the isi_for_array -s "uname -a" command from Page 6,
are all the nodes running the new version of OneFS?
    Yes: End troubleshooting.
    No: Go to Page 9.


Troubleshooting, continued
Nodes did not all come back online
Page 8

You could have arrived here from:
Page 7 - Analysis, continued

Has it been at least 15 minutes since the nodes rebooted as part of the
upgrade?
    No: Wait 15 minutes, then go back to Page 6.
    Yes: Note the page number that you are currently on. Upload log files
         and contact Isilon Technical Support, as instructed in Appendix A.

Troubleshooting, continued
Analysis, continued
Page 9

You could have arrived here from:
Page 7 - Analysis, continued

Did you follow the steps in the "Planning an Upgrade" and "Completing
pre-upgrade tasks" sections of the OneFS Upgrade Planning and Process
Guide before beginning the upgrade?
    Yes: Go to Page 10.
    No: Follow the steps in the "Planning an Upgrade" and the "Completing
        pre-upgrade tasks" sections of the OneFS Upgrade Planning and
        Process Guide. Then go to Page 10.


Troubleshooting, continued
Analysis, continued
Page 10

You could have arrived here from:
Page 9 - Analysis, continued

Did you perform a simultaneous upgrade or a rolling upgrade?
    Simultaneous: Go to Page 11.
    Rolling: Go to Page 12.

Troubleshooting, continued
Simultaneous upgrade
Page 11

You could have arrived here from:
Page 10 - Analysis, continued

Run the following command:
isi status -q
In the output, look at the Health DASR column to see if any nodes report
-D- (Down). For an example of the output, see Appendix D.

Do any nodes report as down? A down node means that it failed to join the
cluster following the upgrade.
    Yes: Note the page number that you are currently on. Upload log files
         and contact Isilon Technical Support, as instructed in Appendix A.
    No: Go to Page 14.

Troubleshooting, continued
Rolling upgrade
Page 12

You could have arrived here from:
Page 10 - Analysis, continued

Run the following command to determine which nodes did not get upgraded:
isi_for_array -s "uname -a"
The output provides a list of all the nodes and indicates which version of
OneFS each is running. For an example of the output, see Appendix C.

For each node that did not get upgraded, run the following command to search
that node's update engine log files (/var/log/update_engine*) for errors with
a timestamp from the day of the upgrade. In the command, replace <YYYY-MM-DD>
with the date of the upgrade:
grep '^<YYYY-MM-DD>' /var/log/update_engine*
For example:
grep '^2015-04-15' /var/log/update_engine*
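
A mistyped date makes the grep above return nothing, which is easy to mistake for "no errors". The following sketch wraps the same grep with a format check. The helper name upgrade_log_lines is hypothetical; the log files to search are passed as arguments.

```shell
# Hypothetical helper: print the lines that start with the given date.
#   $1 = upgrade date as YYYY-MM-DD; remaining arguments = log files
upgrade_log_lines() {
    day=$1
    shift
    case "$day" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "expected a date in YYYY-MM-DD form" >&2; return 2 ;;
    esac
    # Same pattern as the command above: match the date at line start.
    grep "^$day" "$@"
}
```

Usage: upgrade_log_lines 2015-04-15 /var/log/update_engine*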

Are there errors on a node that did not get upgraded?
    No: Go to Page 14.
    Yes: Answer the next question.

Is the following error present?
Unable to claim upgrade daemon on one or more nodes.
    Yes: Go to Page 13.
    No: Go to Page 14.

Troubleshooting, continued
Rolling upgrade, continued
Page 13

You could have arrived here from:
Page 12 - Rolling upgrade, continued

Run the following command:
isi services -a isi_upgrade_d

Is the service enabled or disabled?
    Enabled: You are still in the middle of an upgrade and unable to
             proceed. Disable the service by running the following command:
             isi services -a isi_upgrade_d disable
             Then go to Page 14.
    Disabled: Note the page number that you are currently on. Upload log
              files and contact Isilon Technical Support, as instructed in
              Appendix A.


Troubleshooting, continued
Restart the upgrade
Page 14

You could have arrived here from:
Page 11 - Simultaneous upgrade
Page 12 - Rolling upgrade, continued
Page 13 - Rolling upgrade, continued

Open an SSH connection to the highest-numbered node in the cluster,
and log in using the root account.

Open a screen session by running the following command, where <session name> is a name that
you provide. Record the name in case you need to use it later. The screen session enables you to
easily reconnect to the upgrade process if the session gets disconnected during the upgrade.
screen -S <session name>
If you get disconnected, you can use the following command to reconnect:
screen -x <session name>
Note: If you are running OneFS 7.1.1.2 or 7.1.0.6, skip this step. The screen session feature
does not work in OneFS 7.1.1.2 or 7.1.0.6.

Restart the upgrade by running one of the following commands:
For a rolling upgrade:
isi update --rolling
For a simultaneous upgrade:
isi update
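
The choice between the two restart commands can be sketched as a lookup. The helper name upgrade_restart_cmd is hypothetical. It only prints the command from this page so that you can review it before running it.

```shell
# Hypothetical helper: print the restart command for an upgrade type.
#   $1 = "rolling" or "simultaneous"
upgrade_restart_cmd() {
    case "$1" in
        rolling)      echo "isi update --rolling" ;;
        simultaneous) echo "isi update" ;;
        *) echo "unknown upgrade type: $1" >&2; return 1 ;;
    esac
}
```

For example, upgrade_restart_cmd rolling prints "isi update --rolling".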

Did the upgrade restart?
    Yes: Wait for the upgrade to complete. Then go to Page 15.
    No: Note the page number that you are currently on. Upload log files
        and contact Isilon Technical Support, as instructed in Appendix A.


Troubleshooting, continued
Restart the upgrade, continued
Page 15

You could have arrived here from:
Page 14 - Restart the upgrade, continued

Run the following command to determine whether any more nodes were upgraded:
isi_for_array -s "uname -a"
The output provides a list of all the nodes and indicates which version of
OneFS each is running. For an example of the output, see Appendix C.
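
The comparison against the target version can be sketched as follows. The helper name all_upgraded is hypothetical, and it assumes the line shape shown in Appendix C. Nodes that are down do not appear in the output, so this check covers responding nodes only.

```shell
# Hypothetical helper: read `isi_for_array -s "uname -a"` output on
# stdin; succeed only if every responding node reports the target
# version. Nodes on another version are printed.
#   $1 = target version string, for example v7.1.0.0
all_upgraded() {
    awk -v target="$1" '
        $2 == "Isilon" && $3 == "OneFS" {
            seen++
            if ($5 != target) { stale++; print $1, $5 }
        }
        END { if (seen == 0 || stale > 0) exit 1 }
    '
}
```

Usage: isi_for_array -s "uname -a" | all_upgraded v7.1.0.0 && echo "all responding nodes upgraded"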

Have all of the nodes been upgraded?
    Yes: Go to Page 16.
    No: Note the page number that you are currently on. Upload log files
        and contact Isilon Technical Support, as instructed in Appendix A.


Troubleshooting, continued
Post-upgrade checks
Page 16

You could have arrived here from:
Page 15 - Restart the upgrade, continued

Run the following command:
isi status -q
In the output, look at the Health DASR column to see if any nodes report
-D- (Down). For an example of the output, see Appendix D.

Do any nodes report as down? A down node means that it failed to join the
cluster following the upgrade.
    Yes: Go to Page 17.
    No: End troubleshooting.

Troubleshooting, continued
Nodes did not all join the cluster
Page 17

You could have arrived here from:
Page 16 - Post-upgrade checks

If you want to determine root cause, contact Isilon Technical Support
before continuing. If you do not want root cause analysis, then continue.

Reboot each down node as follows:
1. If possible, use a serial console to connect to the node.
Otherwise, log in to the node by using SSH.
For instructions about connecting through a serial console, see
article 16744 on the EMC Online Support site.
2. After you are connected to the node, run the following command
to reboot the node:
shutdown -r now
3. Wait for the rebooted nodes to come back online.

Run the following command:
isi status -q
In the output, look at the Health DASR column to see if any nodes
report -D- (Down). For an example of the output, see Appendix D.

Do any nodes report as down? A down node means that it failed to join the
cluster following the reboot.
    Yes: Note the page number that you are currently on. Upload log files
         and contact Isilon Technical Support, as instructed in Appendix A.
    No: End troubleshooting.

Appendix A: If you need further assistance


Contact EMC Isilon Technical Support
If you need to contact Isilon Technical Support during troubleshooting, reference the page or step that you need help with.
This information and the log file will help Isilon Technical Support staff resolve your case more quickly.

Upload node log files and the screen log file to EMC Isilon Technical Support
1. When troubleshooting is complete, type exit to end your screen session.

2. Gather and upload the node log set and include the SSH screen log file by using the command appropriate for your
method of uploading files. If you are not sure which method to use, then use FTP.
ESRS:
isi_gather_info --esrs --local-only -f /ifs/data/Isilon_Support/screenlog.0
FTP:
isi_gather_info --ftp --local-only -f /ifs/data/Isilon_Support/screenlog.0
HTTP:
isi_gather_info --http --local-only -f /ifs/data/Isilon_Support/screenlog.0
SMTP:
isi_gather_info --email --local-only -f /ifs/data/Isilon_Support/screenlog.0
SupportIQ:
Copy and paste the following command.
Note: When you copy and paste the command into the command-line interface, it will appear on multiple lines (exactly
as it appears on the page), but when you press Enter the command will run as it should.
isi_gather_info --local-only -f /ifs/data/Isilon_Support/screenlog.0 --noupload \
--symlink /var/crash/SupportIQ/upload/ftp
3. If you receive a message that the upload was unsuccessful, refer to article 16759 on the EMC Online Support site for
directions for uploading files over FTP.
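
The per-method commands in step 2 can be collected in one lookup. This is a sketch: gather_cmd is a hypothetical helper that only prints the matching command from this appendix so that you can review it before running it.

```shell
# Hypothetical helper: print the isi_gather_info command for a method.
#   $1 = esrs | ftp | http | smtp | supportiq
gather_cmd() {
    log=/ifs/data/Isilon_Support/screenlog.0
    case "$1" in
        esrs) echo "isi_gather_info --esrs --local-only -f $log" ;;
        ftp)  echo "isi_gather_info --ftp --local-only -f $log" ;;
        http) echo "isi_gather_info --http --local-only -f $log" ;;
        smtp) echo "isi_gather_info --email --local-only -f $log" ;;
        supportiq)
            echo "isi_gather_info --local-only -f $log --noupload --symlink /var/crash/SupportIQ/upload/ftp" ;;
        *) echo "unknown upload method: $1" >&2; return 1 ;;
    esac
}
```

For example, gather_cmd ftp prints the FTP command shown in step 2.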


Appendix B: How to use this flow chart

The flow chart pages in this guide use the following elements:

Introduction: Describes what the section helps you to accomplish.
You could have arrived here from: Lists the pages that lead to the current page, in the form "Page # - Page title".
Page #: The number of the current page.
Note: Provides context and additional information. Sometimes a note is linked to a process step with a colored dot.
Directional arrows: Indicate the path through the process flow.
Decision diamond: A question with Yes and No branches.
Process step: A step to perform.
Process step with command: A step to perform, followed by the command to run (shown as "command xyz").
CAUTION!: Caution boxes warn that a particular step needs to be performed with great care, to prevent serious consequences.
Go to Page #: Continue on the indicated page.
Optional process step: A step that you can skip.
End point: The end of the troubleshooting flow.
Document shape: Calls out supporting documentation for a process step. When possible, these shapes contain links to the
reference document. Sometimes linked to a process step with a colored dot.


Appendix C: Output of the isi_for_array -s "uname -a" command

You could have arrived here from:
Page 6 - Analysis, continued
Page 12 - Rolling upgrade, continued
Page 15 - Restart the upgrade, continued

Example output for isi_for_array -s "uname -a":
cluster-1: Isilon OneFS cluster-1 v7.0.2.5 Isilon OneFS v7.0.2.5
B_7_0_2_216(RELEASE): 0x7000250005000D8:Mon Nov 25 20:16:16 PST 2013
root@fastbuild-08.west.isilon.com:/build/mnt/obj.RELEASE/build/mnt/src/sys/
IQ.amd64.release amd64


Appendix D: Output of the isi status -q command

You could have arrived here from:
Page 7 - Analysis, continued
Page 11 - Simultaneous upgrade
Page 16 - Post-upgrade checks
Page 17 - Nodes did not all join the cluster

Example output for isi status -q:
Cluster Name: mycluster
Cluster Health:     [ ATTN ]
Cluster Storage:  HDD                 SSD
Size:             11G (23G Raw)       0 (0 Raw)
VHS Size:         11G
Used:             573M (5%)           0 (n/a)
Avail:            11G (95%)           0 (n/a)

                   Health  Throughput (bps)  HDD Storage      SSD Storage
ID |IP Address     |DASR |   In   Out Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+----------------
  1|192.168.146.128|-A-- | 396K| 828K| 1.2M| 144M/ 2.8G(  5%)|    (No SSDs)
  2|192.168.146.129| OK  |  49K| 3.2M| 3.2M| 145M/ 2.8G(  5%)|    (No SSDs)
  3|192.168.146.130| OK  | 3.5K| 162K| 165K| 142M/ 2.8G(  5%)|    (No SSDs)
  4|192.168.146.131| OK  |  49K| 356K| 405K| 143M/ 2.8G(  5%)|    (No SSDs)
-------------------+-----+-----+-----+-----+-----------------+----------------
Cluster Totals:          | 498K| 4.5M| 5.0M| 573M/  11G(  5%)|    (No SSDs)

Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
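
The Health Fields legend can be applied to a single DASR value as follows. This is a sketch: decode_dasr is a hypothetical helper, and it assumes that each set flag appears as its letter (as in "-A--") and that a value with no flag letters, such as "OK", means healthy.

```shell
# Hypothetical helper: print the meaning of each flag that is set in a
# DASR value such as "-A--"; print "OK" when no flag letter is present.
decode_dasr() {
    flags=$1
    case "$flags" in *D*) echo "Down" ;; esac
    case "$flags" in *A*) echo "Attention" ;; esac
    case "$flags" in *S*) echo "Smartfailed" ;; esac
    case "$flags" in *R*) echo "Read-Only" ;; esac
    case "$flags" in *[DASR]*) ;; *) echo "OK" ;; esac
}
```

For example, decode_dasr "-A--" prints "Attention", which matches node 1 in the example output.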


2011 - 2013 EMC Corporation. All Rights Reserved.


EMC believes the information in this publication is accurate as of its publication date. The information is subject to change
without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR
WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY
DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
EMC2, EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other
countries.
All other trademarks used herein are the property of their respective owners.
