Abstract
This document provides a summary of useful Isilon OneFS
commands you can run to examine the performance metrics
available on an Isilon cluster.
Unless otherwise noted, the commands in this document apply to Isilon OneFS 8.0.x.
Best practices
Follow best-practice procedures during pre-deployment of an Isilon OneFS cluster to prevent issues from arising. Integrating changes after a system is deployed, to bring the system into best-practice compliance, can be operationally difficult. For more information, refer to the best practices for Isilon OneFS use cases.
Disk failures
To observe disk drives that are marked as smartfail, empty, stalled, or down, run the following command:
isi status
Workload
Protocol traffic and balance
You can identify the balance of protocol traffic within a workflow by running the following isi statistics
command and viewing the busiest protocols as returned by NumOps:
isi statistics protocol list --totalby Op,Proto --protocols all --output Proto,Ops --sort NumOps --long
Example output
NOTE: The Identify Registration Protocol (IRP) protocol runs across InfiniBand and should not be evaluated as
a client protocol.
Example output
The top section of the output shows the protocol command rates. The Read, Write, and Metadata operation rates together equal the Total. Using the isi statistics pstat output from the example, subtract the Read value (829 ops/s) and the Write value (333 ops/s) from the Total value (1256 ops/s), which leaves 94 ops/s for Metadata operations. In this example, the Read, Write, and Metadata ratio for this protocol is the following:
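The subtraction above can be checked with a short sketch; the operation counts come from the example, and the percentage split illustrates the ratio the text refers to.

```python
# Metadata ops are the remainder after subtracting Read and Write from Total.
# Values are taken from the isi statistics pstat example above.
total_ops = 1256
read_ops = 829
write_ops = 333

metadata_ops = total_ops - read_ops - write_ops
split = {name: round(100 * ops / total_ops, 1)
         for name, ops in (("Read", read_ops),
                           ("Write", write_ops),
                           ("Metadata", metadata_ops))}
print(metadata_ops)  # 94
print(split)         # {'Read': 66.0, 'Write': 26.5, 'Metadata': 7.5}
```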
Users by protocol
You can identify the top 20 users, and the external protocols they are using, by running the following
command:
isi statistics client --protocols external --no-footer | awk '{print $1 " " $6 " " $8}' | head -22
To establish if any users are dominant or out of balance with other users of the same protocol, observe the
Ops count for each user. To determine the difference between a busy user and a non-busy user, you can
increase the command output to observe at what point the user operation counts begin to decrease.
Isilon stores 1024 user name records for each 15-second window. A user name is displayed as UNKNOWN if it is not included in the 1024 records for the window in which the command was issued. An asterisk (*) is displayed in the UserName column for protocols that do not supply a user name.
The protocols are divided into external protocols and internal protocols, as shown in the following table.
External Internal
nfs4 smb2
isi statistics heat --nodes all --totalby path | awk '{print $1 " " $5}' | sort -n -r | head -20
To determine the difference between a busy file and a non-busy file, you can increase the command output to
observe at what point the file operation rates begin to decrease.
A path can be displayed as UNKNOWN when the path refers to one of the following:
• A system file
• A file with a path name that is too long
• A snapshot that no longer exists
• An unlinked file that is still referenced somewhere
Example 1 output displays entries for identical paths with different operation rates. Each entry for the same path is a different event (for example, a read, getattr, lookup, or other operation). Multiple instances of the same path aggregate to indicate the total operation rate for that path.
Run the following command to display all node operation rates and file paths.
isi statistics heat --nodes all --totalby path | awk '{print $1 " " $5}' | sort -n -r | head -20
Example 1 output
419.4 /ifs
357.0 /ifs
182.7 /ifs/.ifsvar
145.1 /ifs/.ifsvar
76.9 /ifs/Test
66.3 /ifs
48.5 /ifs/Test/3
46.4 UNKNOWN
42.5 /ifs/.ifsvar/modules
41.4 /ifs/.ifsvar/modules/tardis
34.1 /ifs/Test/3/vdb.1_12.dir/vdb.2_23.dir/vdb.3_9.dir
33.1 /ifs/Test/3/vdb.1_12.dir/vdb.2_18.dir/vdb.3_9.dir
31.5 /ifs/.ifsvar/modules
30.2 /ifs/Test/3/vdb.1_12.dir/vdb.2_23.dir/vdb.3_9.dir
29.8 /ifs/Test/3/vdb.1_12.dir/vdb.2_19.dir/vdb.3_9.dir
29.0 /ifs/Test/3/vdb.1_12.dir/vdb.2_13.dir/vdb.3_9.dir
28.6 /ifs/Test/3/vdb.1_12.dir/vdb.2_5.dir/vdb.3_9.dir
28.3 UNKNOWN
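Since each path can appear once per event type, the per-path totals the text describes can be obtained by summing the duplicates; a minimal sketch using a subset of the rows above:

```python
from collections import defaultdict

# Subset of the Example 1 output rows: (ops/s, path). Duplicate paths are
# separate events (read, getattr, lookup, and so on).
rows = [(419.4, "/ifs"), (357.0, "/ifs"), (182.7, "/ifs/.ifsvar"),
        (145.1, "/ifs/.ifsvar"), (66.3, "/ifs")]

totals = defaultdict(float)
for rate, path in rows:
    totals[path] += rate

print(round(totals["/ifs"], 1))          # 842.7
print(round(totals["/ifs/.ifsvar"], 1))  # 327.8
```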
To view the events that produce each path instance in Example 1 output, run the command to display node-by-node operation rates and file paths, and add the event column to the awk print statement. The following example runs on node 1.
Example 2
Run the following command to display one node (node 1), the operation rate, events, and file paths:
isi statistics heat --nodes 1 | awk '{print $1 " " $4 " " $5}' | sort -n -r | head -20
The output displays the events that are associated with each instance of duplicate paths.
Example 2 output
As an alternative, you can navigate to the /ifs/.ifsvar/modules/fsa/pub/latest directory. List the files within this directory and observe the size and date of the results.db file. Verify that the file has contents and note the date on which the last FSA job was run. If the file has contents, use the following query to obtain information about the 20 largest files in the following order: physical size in MB, logical size in MB, and the path.
Example output
15873.7 10563.6 data/pg/LS/reference/isaac2/iSAACIndex.20150312/Temp/neighbors.dat
15250.1 10148.6 data/pg/LS/reference/dbsnp_138.b37.vcf
10036 6678.19 SPEC/iobw.tst
5940.46 3953.25 data/pg/LS/src/parallel_studio_xe_2015_update2.tgz
4585.88 3051.67 data/pg/LS/reference/hg19.fa
4495.92 2991.83 data/pg/LS/reference/hg19.fa.bwt
2497.35 1661.91 data/pg/LS/reference/isaac2/iSAACIndex.20150312/hg19.fa-32mer-6bit
2497.35 1661.91 data/pg/LS/reference/isaac2/iSAACIndex.20150312/Temp/hg19.fa-32mer
To view the smallest 20 files from the 1,000 recorded files, change the desc parameter to asc, potentially
providing an indication of the file size distribution. The ratio of the physical to logical size can be used to gain
an indication of storage efficiency.
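As a worked example of that efficiency ratio, using the first row of the largest-files output above:

```python
# Physical vs. logical size (MB) from the first row of the example output.
phys_mb, logical_mb = 15873.7, 10563.6

# Roughly 1.5x as much physical space as logical data for this file.
overhead = phys_mb / logical_mb
print(round(overhead, 2))  # 1.5
```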
Example output
phys_size sum(file_cnt)
---------- -------------
0 3536
8192 599126
131072 14144
1048576 240301
10485760 63276
104857600 1177
1073741824 147
1073741824 10
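The histogram rows above can be summed to give the total number of files represented; a short sketch with the rows copied from the example output:

```python
# (phys_size bucket, file count) pairs from the example output.
rows = [(0, 3536), (8192, 599126), (131072, 14144), (1048576, 240301),
        (10485760, 63276), (104857600, 1177), (1073741824, 147),
        (1073741824, 10)]

total_files = sum(count for _, count in rows)
print(total_files)  # 921717
```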
OneFS 8.0.x records the FSA results into multiple databases for parallel access. The list of the 1,000 largest
files still exists. However, in OneFS 8.0.x, the list is in a separate database outside of the results.db file.
Navigate to the /ifs/.ifsvar/modules/fsa/pub/latest directory. List the files within this directory, and observe the size and date of the list_top_n_files_by_phys_size.db file. Verify that the file has contents and note the date on which the last FSA job was run. If the file exists, run the following query to obtain information about the 20 largest files by physical size in MB, logical size in MB, and the path:
Example output
You must identify the most recently completed FSA job and then query it for file size results.
With OneFS 8.0.x, you can use the Isilon OneFS Application Programming Interface (API) to interrogate the FSA database by including the ID number of the completed FSA job.
Example output
"begin_time" : 1463695240,
"content_path" : "/ifs/.ifsvar/modules/fsa/pub/job.378/results.db",
"delete_link" : "https://localhost:8080/platform/3/fsa/results",
"end_time" : 1463702519,
"fsa_state" : "publish",
"id" : 378,
"job_state" : [ "9", "STATE_FINISHED" ],
…
"version" : 3
Note the number for the top-most entry with "STATE_FINISHED." In this example the ID number is 378.
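Programmatically, the most recent finished job can be picked out of the API response; a sketch assuming the jobs are returned in a top-level list (the "results" wrapper key here is an assumption, not taken from the example):

```python
import json

# Hypothetical excerpt of the FSA job listing; the per-job field names
# follow the example output above, but the "results" key is assumed.
payload = json.loads("""
{"results": [
  {"id": 378, "job_state": ["9", "STATE_FINISHED"], "end_time": 1463702519},
  {"id": 371, "job_state": ["9", "STATE_FINISHED"], "end_time": 1463602519}
]}
""")

finished = [job for job in payload["results"]
            if "STATE_FINISHED" in job["job_state"]]
latest = max(finished, key=lambda job: job["end_time"])
print(latest["id"])  # 378
```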
To obtain a count of the files occurring within the 11 predefined file size buckets, insert the cluster root password and the ID number of the most recently completed FSA job into the following command.
1.5 GB   "value" : 54
60 MB    "value" : 405317
30 MB    "value" : 314076
15 MB    "value" : 35231545
         "value" : 47
7.5 MB   "value" : 57
5 MB     "value" : 41
         "value" : 16
2.5 MB   "value" : 2
.5 MB    "value" : 0
85 KB    "value" : 0
4 KB
> 4 KB
The misalignment is at the file level, not at the file system level, and results from a storage abstraction layer (such as virtual machine storage) not matching the OneFS storage blocking. The cost of each misaligned write request depends on many variables and causes additional I/O load, ranging from 10% to 20%.
Run the following command and record the misaligned write request counts per node. Wait 30 seconds and
then run the command again.
Example output
Calculate the difference between the misaligned write request counts from each command. Divide this
difference by the time between samples. The result is the rate-per-second of misaligned write requests.
The rate of the misaligned write requests must be weighed against the rate of write requests for the protocol
that is servicing the abstraction layer (probably VM) to see if an additional 10% to 20% I/O load is deemed
significant. You can run the following command to see write requests for a specific protocol.
For example, if the protocol write request rate is 200 writes per second, and the misaligned write requests are
30 per second, the overhead of misalignment might be significant and causing an impact. Conversely, if the
protocol write request rate is 200 writes per second, and the misaligned write request rate is 2 per second,
this is not a significant performance factor.
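The rate calculation described above can be sketched as follows; the two counter samples are hypothetical, and the protocol write rate is the 200 writes-per-second figure from the example:

```python
# Two samples of the misaligned write counter, 30 seconds apart
# (hypothetical values; substitute your own counter readings).
count_first, count_second = 1200, 2100
interval_s = 30

misaligned_per_s = (count_second - count_first) / interval_s
print(misaligned_per_s)  # 30.0

# Weigh against the protocol write rate (200 writes/s in the example).
protocol_writes_per_s = 200
overhead_pct = 100 * misaligned_per_s / protocol_writes_per_s
print(overhead_pct)  # 15.0 -- in the 10-20% range, possibly significant
```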
Blocked and Contended events tend to be correlated: the new lock requester is blocked, and the current lock holder receives the contended callback. Deadlock events are very different; they have no timeout and should be infrequent.
To obtain information on 50 recent lock events, you can run the following command:
Disk activity
You can capture an overview profile of disk drive activity by running the following command, and examining
the Max, Min, and Average values for the disk time in queue and the number in queue (queue depth). For
information on SAS drives, you can include SAS instead of SATA.
Time in Queue
isi statistics drive --nodes=all --degraded --no-header --no-footer | awk '/SATA/
{sum+=$8; n++; if (n==1 || $8<min) min=$8; if ($8>max) max=$8} END {print
"Min = ",min; print "Max = ",max; print "Average = ",sum/n}'
Number in Queue
isi statistics drive --nodes=all --degraded --no-header --no-footer | awk '/SATA/
{sum+=$9; n++; if (n==1 || $9<min) min=$9; if ($9>max) max=$9} END {print
"Min = ",min; print "Max = ",max; print "Average = ",sum/n}'
Network factors
A baseline of output values is essential prior to an in-depth performance investigation. Comparing the metrics
from a performance-acceptable timeframe can help you make decisions about network factors that might be
adversely affecting performance.
NOTE: Unless otherwise stated, you must determine the significance of network command output values
based on a known baseline.
The quality of a network connection to the cluster can be defined by Hop Count, Latency (Response Time or
Round Trip Time), Jitter (which is a variation in Latency), and Packet Loss. Maximum Transmission Unit (MTU)
and bandwidth are also important factors in assessing network health.
To display route and transit delays for packets over IP, run the traceroute command.
Example output
In the example, only one line of output exists, which means no hops were encountered. As requested, the
latency values of 5 connection attempts are displayed.
A successful run of this command indicates an MTU of 9000. A result of "ping: sendto: Message too long" indicates that the MTU is 1500.
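A 9000-byte MTU is commonly probed with a ping payload of 8972 bytes, because the IPv4 and ICMP headers consume 28 bytes of each frame's MTU; the arithmetic:

```python
# Payload size that exactly fills a 9000-byte MTU frame.
MTU = 9000
IPV4_HEADER = 20  # IPv4 header without options
ICMP_HEADER = 8

payload = MTU - IPV4_HEADER - ICMP_HEADER
print(payload)  # 8972
```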
Bandwidth
To measure bandwidth, run the following iperf command. You must set up the target first.
Results are available on either the source or target. Close iperf on the target using Ctrl-C.
Example output
------------------------------------------------------------
Client connecting to 10.245.108.26, TCP port 5001
TCP window size: 19.3 KByte (default)
------------------------------------------------------------
[ 3] local 10.245.108.81 port 60141 connected with 10.245.108.26 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 942 Mbits/sec
Results are available on either the source or target node. Close iperf on the target using Ctrl-C.
Example output
------------------------------------------------------------
Client connecting to 10.245.108.27, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 9.00 KByte (default)
------------------------------------------------------------
[ 3] local 10.245.108.26 port 44598 connected with 10.245.108.27 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec
[ 3] Sent 893 datagrams
[ 3] Server Report:
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec 0.003 ms 0/ 893 (0%)
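The UDP report above is internally consistent, which can be checked with a short sketch (iperf counts MBytes as 2^20 bytes and Mbits as 10^6 bits):

```python
# Transfer and loss figures from the UDP example output.
transferred_mbytes = 1.25
interval_s = 10.0
lost, total = 0, 893

# 1.25 MBytes over 10 s is about 1.05 Mbit/s.
bits = transferred_mbytes * 2**20 * 8
mbits_per_s = bits / 1e6 / interval_s
print(round(mbits_per_s, 2))  # 1.05

loss_pct = 100 * lost / total
print(loss_pct)  # 0.0
```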
Fast Ethernet (100BASE-X)         100 Mbit/s   12.5 MB/s
Gigabit Ethernet (1000BASE-X)     1 Gbit/s     125 MB/s
10 Gigabit Ethernet (10GBASE-X)   10 Gbit/s    1.25 GB/s
40 Gigabit Ethernet (40GBASE-X)   40 Gbit/s    5 GB/s
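The MB/s column follows from dividing the line rate in bits by 8 bits per byte:

```python
# Line rate (Mbit/s) -> theoretical maximum throughput (MB/s).
rates_mbit = {"Fast Ethernet": 100, "Gigabit Ethernet": 1000,
              "10 Gigabit Ethernet": 10000, "40 Gigabit Ethernet": 40000}

mb_per_s = {name: mbit / 8 for name, mbit in rates_mbit.items()}
print(mb_per_s["Fast Ethernet"])        # 12.5
print(mb_per_s["40 Gigabit Ethernet"])  # 5000.0 (5 GB/s)
```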
Example output
Interpret the retransmission rate as a percentage of the total transmission. A retransmission rate below 0.1% (one tenth of one percent) of total transmitted bytes is acceptable for a local network. From the example output:
Since this retransmission rate is below 0.1%, the retransmission can be interpreted as not significant and not an issue.
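That percentage check can be sketched as follows; the counter values are hypothetical stand-ins for the figures from your own netstat output:

```python
# Hypothetical retransmission counters; substitute netstat values.
retransmitted = 4000
total_transmitted = 9_000_000

retrans_pct = 100 * retransmitted / total_transmitted
print(round(retrans_pct, 3))  # 0.044
print(retrans_pct < 0.1)      # True: acceptable for a local network
```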
Hostcache.list
The hostcache.list holds a cache of the recent host connection information allowing faster re-connection of
returning hosts. You can examine the IP address, Round Trip Time (RTT), and Variance in RTT (RTTVAR) to verify
whether the round trip time in the hostcache.list matches expectations for the IP address and its physical
location. High RTT or RTTVAR values compared to a known baseline can indicate a problem on that client or
subnet. To view the hostcache.list, run the following command.
Example output
net.inet.tcp.hostcache.list:
IP address MTU SSTRESH RTT RTTVAR BANDWIDTH CWND SENDPIPE RECVPIPE HITS UPD EXP
net.inet.tcp.hostcache.list:
IP address MTU SSTRESH RTT RTTVAR BANDWIDTH CWND SENDPIPE RECVPIPE HITS UPD EXP
192.168.70.158 0 0 18ms 32ms 0 32727 0 0 206290 3913 3600
10.245.108.28 0 0 18ms 31ms 0 32727 0 0 12838 3668 3300
127.0.0.1 0 0 4ms 8ms 0 39696 0 0 195836 54048 3600
net.inet.tcp.hostcache.list:
IP address MTU SSTRESH RTT RTTVAR BANDWIDTH CWND SENDPIPE RECVPIPE HITS UPD EXP
Notice: Do not use the clear, set, or interactive mode commands without instructions or knowledge
of the potential implications.
To observe response times (µs) for cluster DNS servers, run the following command. In the output, note large
variations in response times (or other metrics) between DNS servers.
Example output
To observe distributed cache response times (µs) from peer nodes in the cluster, run the following command.
In the output, note large variations in response times (or other metrics) between nodes.
Example output
To observe cache effectiveness (workload dependent), run the following command. In the output, observe the
hit rate, where above 70% is deemed effective use of cache. Check the count of expired entries, where fewer
entries shows that the workload is cache-friendly.
Example output
Cache:
entries: 10 - entries installed in the cache
max_entries: 33 - entries allocated, including for I/O and free list
expired: 12251 - entries that reached their TTL and were removed from the cache
probes: 56946 - count of attempts to match an entry in the cache
hits: 40879 (71%) - count of times that a match was found
updates: 3 - entries in the cache replaced with a new reply
response_time: 0.000004 - average turnaround time for cache hits
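The 71% hit rate in the output above is simply hits divided by probes, truncated to a whole percent:

```python
# Counters from the example cache statistics.
probes, hits = 56946, 40879

hit_rate_pct = 100 * hits / probes
print(int(hit_rate_pct))  # 71
print(hit_rate_pct > 70)  # True: effective use of cache per the guideline
```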
Protocol operations
Protocol operations most used
You can find which protocol operation is used most by running the following command.
Connection distribution
You can observe the balanced connection distribution across nodes of the cluster by running the following
command.
Slow authentication
You can detect slow or timed-out responses from Windows domain controllers by running the following
command.
Protocol latency
WARNING: In OneFS versions earlier than OneFS 8.x, SMB protocol latency (TimeAvg) numbers can be skewed
by Change Notify operations. Requests for change notifications receive an initial response but also a response
when a file changes in the folder that is being monitored. The response might be immediate, after 3 seconds,
after 3 hours, or never. To verify whether Change Notify is inflating latency numbers, report by class. If the majority of time is spent in file_state, Change Notify is skewing the numbers. This issue is resolved in OneFS 8.0.x and later.
The following table outlines the common expectations about protocol latency times.
< 10 ms   Good
> 20 ms   Bad
> 50 ms   Investigate
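The expectations in this table can be encoded as a small helper; TimeAvg values from isi statistics are reported in microseconds, and the 10–20 ms gap, which the table does not classify, is labeled borderline here as an assumption:

```python
def classify_latency(time_avg_us: float) -> str:
    """Map a protocol TimeAvg (microseconds) to the table's categories."""
    ms = time_avg_us / 1000.0
    if ms < 10:
        return "Good"
    if ms > 50:
        return "Investigate"
    if ms > 20:
        return "Bad"
    return "Borderline"  # 10-20 ms: not classified by the table

print(classify_latency(4200))   # Good (4.2 ms)
print(classify_latency(35000))  # Bad (35 ms)
print(classify_latency(65000))  # Investigate (65 ms)
```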
To observe which class of operation is taking the longest, run the following command. The output TimeAvg
must be converted to milliseconds (ms) if you are comparing to the standard expectations in the table. The
output of this command is only meaningful with active traffic.
Example output
To observe the free capacity on the cluster and then the storage pool, run the following commands.
CPU
To observe the 10 processes using the most CPU on each node, run the following command.
Memory
To observe the status of memory for each node, examine the output of the following commands.
To observe the consumption of memory by process, run the following command. Use the up and down controls in less to read the list before closing less.
isi_for_array -s 'uptime'
Example output
Node node.disk.xfers.rate.sum
---------------------------------
1 5.400000
2 7.000000
3 0.200000
average 4.200000
---------------------------------
Total: 4
To check the status of the nodes, run the following command and observe the Out column of Throughput to
determine the load of the nodes. Assess the throughput balance across the nodes in relation to the IP
connections.
isi status
Utilization/Busy     Average of time the disk was busy over the sample interval
Disk Percent Busy    Average of time the disk was busy over an interval, expressed as a percentage
Service Time         Time from the device controller request to the end of transfer, including delays due to queuing and latency
Response Time        Disk service time plus all other delays, such as network, until data is at the host
Throughput           Average amount of data transferred within a period of time (for example, MB/s)
Time in Queue        Average time requests waited in queue to be processed by the disk
OneFS reports metrics in a standard way. However, the OneFS I/O per second (IOPS) measurement is taken at the point where the file system dispatches requests to the device queue. I/O requests can be coalesced while in the device queue, and, for SAS drives, multiple drive operations can be executed in parallel. Usually, the end result is that the drive sees far fewer operations than are fed into the top of the device queue. OneFS, however, reports from the top of the device queue, which can result in IOPS numbers that are higher than those given by block devices and higher than expected. Take this fact into account when commenting on OneFS IOPS measurements.
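The coalescing effect can be illustrated with a toy model: adjacent or overlapping block requests in the device queue merge into a single drive operation, so the drive-level operation count is lower than the queue-level count that OneFS reports. This is an illustration only, not the OneFS implementation.

```python
def coalesce(requests):
    """Merge queued (start_block, length) requests whose ranges touch."""
    merged = []
    for start, length in sorted(requests):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            prev_start, prev_len = merged[-1]
            new_end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, new_end - prev_start)
        else:
            merged.append((start, length))
    return merged

# Five requests enter the device queue; two runs are sequential.
queued = [(0, 8), (8, 8), (16, 8), (100, 8), (108, 8)]
drive_ops = coalesce(queued)
print(len(queued), "queued ->", len(drive_ops), "drive ops")  # 5 queued -> 2 drive ops
```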
The following IOPS measurements are typical for each drive type.
To obtain the maximum, minimum, and average values for disk time in queue for SATA drives, run the
following command. For information on SAS drives, you can include SAS instead of SATA.
isi statistics drive --nodes=all --degraded --no-header --no-footer | awk '/SATA/
{sum+=$8; n++; if (n==1 || $8<min) min=$8; if ($8>max) max=$8} END {print
"Min = ",min; print "Max = ",max; print "Average = ",sum/n}'
To display time in queue for 30 drives sorted highest-to-lowest, run the following command:
To obtain the maximum, minimum, and average values for disk queue depth of SATA drives, run the following
command. For information on SAS drives, you can include SAS instead of SATA. If a large difference exists
between the maximum number and average number in the queue, conduct further investigation to see if an
individual drive is working excessively.
isi statistics drive --nodes=all --degraded --no-header --no-footer | awk '/SATA/
{sum+=$9; n++; if (n==1 || $9<min) min=$9; if ($9>max) max=$9} END {print
"Min = ",min; print "Max = ",max; print "Average = ",sum/n}'
To display queue depth for 30 drives sorted highest-to-lowest, run the following command:
isi statistics drive --nodes=all --degraded --no-header --no-footer | awk '/SATA/
{sum+=$10; n++; if (n==1 || $10<min) min=$10; if ($10>max) max=$10} END {print
"Min = ",min; print "Max = ",max; print "Average = ",sum/n}'
To display disk percent busy for 30 drives sorted highest-to-lowest, run the following command.
How to use iperf between a client and server to measure basic network throughput