Navisphere Analyzer
Purpose of the script: EMC Navisphere Analyzer allows you to view storage-system performance statistics in various
types of charts. These charts can help you find and anticipate bottlenecks in the disk storage component of a computer
system. Today's session takes a look at the Navisphere Analyzer user interface. In particular, it covers the
different views available and provides some basic starting points for checking whether your existing configuration is being
stressed or is working in a well-utilized manner. This script is designed for use with Navisphere Manager 6.x software.
Please do not alter any of the workstation or CLARiiON Storage System configuration details unless instructed
to do so by these instructions or by a member of the EMC presentation team.
In this session there are primarily two exercises, covering archive retrieval and viewing, and on-array real-time analysis.
In addition to the instructions for these exercises, you will find more exercises and reference material in this handout.
If you have time to do so, please explore those additional sections during this session.
EMC World – May 2010
The newest file listed could be up to 5.5 hours old, so you may need to
use Create New to force the logger to create a new archive containing
recent statistical data from its buffer.
You have the option of retrieving archives from SP-A or SP-B. Although
they should contain almost identical data, it is worth retrieving from
both SPs in case there is a problem with viewing one of the files, or in
case one SP was rebooted during an archive and is missing samples for
that time.
The service is called NaviGovernor. Note: this is only required for off-array management and offline
Analyzer archive file viewing.
Login <Enter the user as emcw and password emcw>. You will be presented with the standard Navisphere view of the
Domain you logged into.
In the Archives tab view, check the default path for archives, and the
Performance Survey for the initial view, and check that the Initially
Check All Tree Objects box is ticked. When you have many objects you may
wish to be more granular in your selections and choose not to initially
check all objects.
Customize is also available for the array environment, so when you set
an array option it will remain set for anyone logging into the array for
viewing real-time data, covered in the next exercise.
If no forced flushes are seen, that is a good reason to re-check that
cache is on, although none is also an indication that the write cache is
not being worked too hard.
Although we don't have any forced flushes here, write throughput is the
reported write cache hits combined with any forced flushes; i.e. a write
causing page(s) of write cache to be flushed to make room for the write
does not count in the write cache hit total (unless the write size
satisfies the write-aside value and bypasses cache; more performance
architecture background is needed if that wasn't clear).
The disks at varying times are working very hard. We can see them
reaching over 350 IOPS per disk. For small random IO we have a
rule of thumb (ROT) stating that a 15K rpm disk can sustain 180 IOPS
of mixed random load with good response time. When running the disks
at higher loads we can expect an impact on the observed response time.
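As a back-of-envelope check, the rule of thumb above can be turned into a quick utilization estimate. This is a minimal sketch, not an EMC tool: the 180 IOPS per 15K rpm disk figure comes from the text, while the disk count and observed rate below are placeholder values.

```python
# Rough headroom check against the 180 IOPS-per-disk rule of thumb (ROT).
ROT_IOPS_15K = 180          # small random IO per 15K rpm disk, per the ROT

def rg_utilization(disks, observed_iops_per_disk):
    """Return observed raid-group load as a fraction of the ROT capacity."""
    capacity = disks * ROT_IOPS_15K
    observed = disks * observed_iops_per_disk
    return observed / capacity

# Example: the ~350 IOPS per disk seen in this exercise (disk count invented)
util = rg_utilization(disks=5, observed_iops_per_disk=350)
print(f"{util:.2f}x ROT capacity")   # ~1.94x: expect degraded response time
```

Anything much above 1.0x suggests the disks are being pushed past the point where good response time can be expected.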
<Uncheck Queue>
You can see that the Response Time follows the Average Busy Queue Length,
as the service time was fairly stable.
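Why response time tracks queue depth can be sketched with a simple model: if each request takes a roughly constant service time S and finds Q requests ahead of it (including the one in service), its response time is about S times Q. This is an illustrative approximation only, not how Analyzer computes the statistic, and the numbers are invented.

```python
# Back-of-envelope: with a stable service time, response time scales with
# the busy queue depth. Illustrative model only; numbers are invented.
def approx_response_ms(service_ms, avg_busy_queue):
    """Response time for a request waiting behind the average busy queue."""
    return service_ms * avg_busy_queue

samples = [(5.0, 1.0), (5.0, 4.0), (5.0, 8.0)]   # (service ms, queue depth)
for s, q in samples:
    print(f"queue={q:.0f} -> ~{approx_response_ms(s, q):.0f} ms")
```

With the service time pinned at 5 ms, doubling the queue doubles the modeled response time, which matches the graph behaviour described above.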
This exercise directs you around some of the views in Analyzer while the array is under a
simulated load from a Windows server. You'll be directed to look at some of the key statistics
that indicate whether a system is functioning within acceptable parameters. The exercise is intended
to extend your experience and expand on the descriptions of the statistics you are looking at:
select and deselect components and statistics to overlay graphs, but consider the scale of your
selections; with high IO/s on the same graph as disk queue length, for example, queue variation
will not be easy to distinguish. If you have time, you'll be directed to look at some specific
statistics in order to determine where there is a problem with the current load on the array.
Login <Enter the user as emcw and password emcw>. You will be presented with the standard
Navisphere view of the Domain you logged into. Also refer back to Exercise A for the Customize
options required to be set on the array. You have already looked at the logging mechanism, so now
we want to start viewing real-time statistics.
If only the Local Domain is shown, expand the view by clicking on the + by the Domain icon.
Performance statistics can be viewed for individual components, or you can select the storage
system and then view a selection of components.
<Right-click on your array and move the pointer over the Analyzer selection to see the expanded
list of options>
To get this window, you can select the array, SP, raid group, Thin Pool, storage group, LUN,
Thin LUN or disk to choose which Analyzer view to look at. Here we'll select the array to present
all objects available.
The Performance Survey view will start to plot current statistics based
on a 60-second sample period. Please wait until you have at least two
plots to continue, i.e. wait at least 2 minutes for the plots to show.
Dirty Pages
Dirty pages are protected write cache data that hasn't been committed to
disk yet.
To see the appropriate value selections available in the lower left part
of the detail view, you must select a component item in the upper left
part of the view. Dirty Pages will only be an available option when you
have clicked on a Storage Processor (SP).
Dirty pages that peak at 99% indicate cache saturation, resulting in
force flushing that can hurt performance. We'll look at LUN force
flushes later in this exercise.
To help when viewing a graph plot, you can click on the legend item in
the lower right window pane and it will highlight that statistic in the
graph. Also, you can customize the graph views by right-clicking on the
graph and selecting Chart Configuration.
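The saturation behaviour described above can be illustrated with a toy simulation. This is purely illustrative: the watermark values, cache size, and rates below are invented placeholders, and real FLARE cache behaviour is more involved. Incoming writes dirty pages, the SP starts flushing above the high watermark, stops below the low watermark, and a force flush occurs whenever the cache fills.

```python
# Toy model of write-cache dirty pages with high/low watermarks.
# All values are invented placeholders; real cache behaviour is more complex.
def simulate(incoming, flush_rate, high=0.8, low=0.6, size=100):
    dirty, force_flushes, flushing = 0, 0, False
    for writes in incoming:
        dirty = min(dirty + writes, size)
        if dirty == size:
            force_flushes += 1        # cache saturated: host writes stall
        if dirty / size >= high:
            flushing = True           # start flushing at the high watermark
        if flushing:
            dirty = max(dirty - flush_rate, 0)
            if dirty / size <= low:
                flushing = False      # stop at the low watermark
    return force_flushes

# A sustained burst faster than the flush rate eventually force-flushes.
print(simulate(incoming=[30] * 10, flush_rate=10))   # prints 7
```

A burst the flusher can keep up with never force-flushes; a sustained overload force-flushes on nearly every interval, which is the 99% dirty-pages symptom described above.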
LUN Bandwidth
Selecting both read and write bandwidth tells us about the load on the
LUN; however, you will need to check IO sizes and data locality to
determine whether the values seen are expected based on the load. We can
check locality by looking at seek distances at the drive level later in
this exercise.
Remember to uncheck previously viewed selections to change the graph
scale, unless you need to see how one statistic plots against another.
Looking at the average LUN IO size for read or write in the detail
view can be misleading: it is only an average, so a low write IO rate
will not be accurately represented. You really need to use the IO
Distribution Summary for the LUN to see the IO distribution.
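A quick illustration of why the average hides the distribution. The IO samples here are invented: a workload of many 4 KB writes with an occasional 256 KB write has an average size that matches neither population, which is exactly what the IO Distribution Summary would reveal.

```python
# Why average IO size misleads: invented sample of mostly-4KB writes
# with a few large 256KB writes mixed in.
io_sizes_kb = [4] * 90 + [256] * 10

avg = sum(io_sizes_kb) / len(io_sizes_kb)
print(f"average IO size: {avg:.1f} KB")      # 29.2 KB: matches neither group

# A crude distribution (what the IO Distribution Summary shows properly):
small = sum(1 for s in io_sizes_kb if s <= 8)
large = len(io_sizes_kb) - small
print(f"<=8KB: {small}%  >8KB: {large}%")
```

The 29.2 KB average suggests a medium-sized workload that does not actually exist; the distribution shows the truth.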
You could look at the disk Average Busy Queue Length and compare it with
the Queue Length. If the average is bigger than the reported Queue
Length, this may be an indication of bursty activity. Cached writes may
also result in bursty activity at the disk level as the write cache
flushes data. The trick is to not let that activity lead you to think
your host activity is bursty when it isn't.
LUN & Disks write size
LUN 50 will show this but LUN 2 doesn't; can you explain why?
Tip: check the LUN 2 IO Distribution and write IO rate.
CLARiiON cache is great at optimizing back-end disk access, which is
particularly beneficial for Raid 5 and Raid 6 options that have write
penalties associated with small-block random write activity.
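The write penalty mentioned above can be made concrete. For the classic RAID types, each small random host write costs multiple back-end disk IOs: the textbook factors are 2 for RAID 1/0 (two mirrored writes), 4 for RAID 5 (read data, read parity, write data, write parity), and 6 for RAID 6. This sketch applies those factors; actual back-end load also depends on how well the cache coalesces writes, which is the point of the LUN 50 vs LUN 2 question.

```python
# Back-end disk IOPS implied by a host workload, using the textbook
# small-random-write penalties: RAID 1/0 = 2, RAID 5 = 4, RAID 6 = 6.
PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}

def backend_iops(read_iops, write_iops, raid):
    """Host reads cost 1 disk IO each; writes cost the RAID write penalty."""
    return read_iops + write_iops * PENALTY[raid]

# Example: 1000 host reads/s plus 500 host writes/s
for raid in ("raid10", "raid5", "raid6"):
    print(raid, backend_iops(1000, 500, raid))
```

The same host load lands very differently on the disks depending on the RAID type, which is why cache coalescing matters most for Raid 5 and Raid 6.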
In the overview view you can see more detailed properties of cache,
together with three key statistics for the overall array: Throughput,
Bandwidth and Dirty Pages. One particularly useful detail is the
watermark settings, as they are not visible anywhere else when looking
at an Analyzer NAR file offline from the array it came from.
Don't be fooled by settings that may have been changed during the
logging period, though; i.e. it may show cache as enabled or disabled
here, but you should verify that with the read cache statistics and
dirty pages in the other views.
The 2 exercises have explored the options and views available to you, with some explanation and
guidance on what the statistics mean and how they help in characterizing your IO.
The following section, should you have time to look at it, will guide you to look at the loads on a
specific set of LUNs sharing the same set of disks.
You have explored the views, now the task is to analyze a specific area where we have an issue.
Now, please explore the interface and look at the following attributes
for this load on the array, with a focus on Raid Group 0, LUNs 50 & 51.
Look at the following for each of these LUNs and see if you can draw any
conclusions (make notes on the worksheet table at the back of this
handout).
Although these exercises cover the performance statistics from hosts
accessing LUNs in the array, there may be additional load generated
internal to the array. This load could include:
SnapView Snapshot sessions
SnapView clones
MirrorView/S activity
MirrorView/A activity
SAN Copy activity
Raid Group rebuild activity
Hot spare equalize activity
LUN migration operations
Background zeroing for bind operations
Background verifying
This will start a command window that will go to the default installation
directory c:\Program Files\EMC\Navisphere CLI. (The username and password
will be emcw for the following commands.) Note: the desktop shortcut used
here is not created for you during installation; you have to create it
yourself if you want that shortcut available on your own systems.
<Retrieve the Navisphere archive files using the following command;
“naviseccli –user <username> -password <password> -scope 0
–address <SP IP> analyzer –archive -all”>
Scope will be 1 if the account details used are local and not global.
The username and password can be omitted if you have set up the security
file for NaviSecCli.
Be careful with this command, as you may have many archives to download
when selecting all, and it could take a long time to complete. By
omitting the “-all” you are presented with a selection list where you
can select one or more archives to retrieve.
Do not do this here, but you can reset the statistical data with the
following command if you are looking to collect data for a specific test
period only and are not interested in previously collected data:
“naviseccli –user <username> -password <pwd> -scope <0|1>
-address <SP IP> analyzer –logging -reset”
The username used here does not have privileges to reset data logging
on the arrays being used.
Prior to release 24 you would need to use the java archiveretrieve
command to get the archive from the array:
“java –jar archiveretrieve.jar –User <username> -Password
<password> -Scope 0 –Address <array IP> –File archive_emc.nar –
Location “C:\program files\emc\Navisphere cli” –Overwrite 1 –Retry 2
–v”
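For convenience, the retrieval command above can be scripted to pull archives from both SPs, as recommended earlier. This is a minimal sketch: the SP addresses are placeholders, and the script only prints the commands rather than running them, so nothing happens to a live array by accident.

```shell
#!/bin/sh
# Build the archive-retrieval command for each SP. Addresses are
# placeholders; swap echo for eval "$CMD" to run against a live array.
USER="emcw"
PASS="emcw"
for SP in "10.1.1.1" "10.1.1.2"; do     # SP-A and SP-B (placeholders)
  CMD="naviseccli -user $USER -password $PASS -scope 0 -address $SP analyzer -archive -all"
  echo "$CMD"
done
```

On Windows the same loop can be written in a batch file or PowerShell; the point is simply to retrieve from both SPs in one step.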
This example outputs stats for SPs, LUNs and disks (-object s,l,d). If
you leave off the -object qualifier, it will output all statistics for
all objects. To get stats for metaLUNs, etc., please consult the
document Navisphere Analyzer Administrator's Guide.pdf.
The Navisphere UI has an Analyzer dump wizard that guides you through
device and attribute selection prior to dumping to a CSV file.
You can try the archivedump command and be more specific with some of
the qualifiers shown previously, for example:
“naviseccli –user <username> -password <password> -scope 0
–address <SP IP> analyzer -archivedump -data test1.nar -out
test1.csv -object s –format u,dp”
This will output test1.csv containing SP statistics of utilization and
write cache dirty pages. You could also have a go at the dump wizard
from the Analyzer drop-down in the Navisphere Manager GUI, using
off-array Navisphere.
Another option is archivemerge, used to merge multiple NAR files
together. We don't use that in this session, but remember that it is
useful if you want to view data access trends that span more than the
typical NAR file size of 5 hours. It is not necessary to merge NAR files
from both SPs, as each SP has the same data.
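Once you have a CSV dump like test1.csv, any scripting language can post-process it. A hypothetical sketch: the column names below are invented for illustration (check the header row of your own dump for the real names); it finds the sample with the peak dirty-pages value.

```python
# Post-process an archivedump-style CSV. The column names used here
# ("Utilization (%)", "Dirty Pages (%)") are hypothetical; check the
# header row of your own dump for the real names.
import csv
import io

sample_csv = """Poll Time,Object Name,Utilization (%),Dirty Pages (%)
10:00,SP A,45,60
10:01,SP A,72,85
10:02,SP A,68,99
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
peak = max(rows, key=lambda r: int(r["Dirty Pages (%)"]))
print(f'peak dirty pages {peak["Dirty Pages (%)"]}% at {peak["Poll Time"]}')
```

Replace the inline sample with `open("test1.csv")` to run it against a real dump.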
Supplemental notes
The following operations are executed at the disk level to provide data integrity features associated with
redundant RAID types as well as consistency of data stripes that could be at risk due to media issues.
Background zero; before user data can be written to the physical disks within a LUN, the area has to
undergo a zero operation. New disks are initially supplied in a zero state where data can be written to the
disks immediately after binding LUNs; however, if the disks have been used before, i.e. bound and
unbound, they have to be re-zeroed.
You can zero the disks using a naviseccli command in readiness for grouping and binding LUNs later on,
or the array will zero the disks when you create new LUNs on them. This zero operation results in
512KB SCSI write-same commands to the disks in a sequential manner, unless the array has to zero-on-
demand an area the user is writing to that is in the queue but hasn't been zeroed yet. There is some
other small activity on the disks during zeroing, as checkpoint operations keep track of progress.
Typically, with no access to the LUNs, any zeroing will complete in a matter of a few hours, although a
busy array and activity to the disks being zeroed will delay completion. Also, the 512KB write-same
command will not consume back-end bandwidth but will affect disk load and utilization.
Background verify; this operation validates the consistency of data protection at the disk level and is
automatically performed on newly created LUNs. The IO profile at the disk level is 64KB reads and, like
zeroing, it can take hours to complete and is also governed by array and disk activity.
Background zero, zero-on-demand, and background verify operations exhibit relatively large IO sizes
that can affect one's analysis of the array. Also, if considering user testing, it's worth noting that these
operations may affect the performance the array can present, due to the parallel action of user data
access and these preliminary operations.
Also be aware that these operations run in a sequential manner for any given raid group (RG); e.g. if you
bind 5 LUNs on an RG, 0 through 4, LUN 0 will start to zero and, when complete, will perform a background
verify. This is followed by the second LUN in that RG. Each LUN will zero then verify until all newly
created LUNs complete that process. Thereafter the only regular IO you will see at the disk level due to
internal operations will be SNiiFFER, where you will see approximately 1 IO per second at 512KB in size
to each disk in an RG. SNiiFFER is a data-checking operation that cycles through every block in every
LUN in the array to ensure data availability, even for data you might not have touched for months or years.
Any data inconsistency detected through SNiiFFER will automatically invoke recovery and remap of the
affected blocks. RGs will run through zero, verify and SNiiFFER operations independently of each other.
Zeroing will have the most effect on performance, so consider this when testing. Verify may have a small
effect and SNiiFFER a negligible effect on performance.
Always check disk stats to see what IO sizes are taking place at that level. With an RG idle, disk activity
showing 512KB writes indicates zeroing, 64KB reads indicate verifying, and 512KB reads indicate sniffing.
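The size and direction signatures in the last paragraph lend themselves to a tiny lookup. This is a note-taking aid, not EMC code: it simply maps an observed idle-RG disk IO pattern to the background operation the supplemental notes say produces it.

```python
# Map an idle raid group's disk IO signature to the background operation,
# per the supplemental notes above. A reading aid, not EMC code.
SIGNATURES = {
    ("write", 512): "background zeroing",
    ("read", 64): "background verify",
    ("read", 512): "SNiiFFER (sniff verify)",
}

def classify(direction, size_kb):
    """Classify an (IO direction, IO size in KB) pair seen on an idle RG."""
    return SIGNATURES.get((direction, size_kb), "unknown / host IO")

print(classify("write", 512))   # background zeroing
print(classify("read", 64))     # background verify
print(classify("read", 512))    # SNiiFFER (sniff verify)
```

Remember these signatures only apply when the RG is otherwise idle; under host load the disk stats are a mix.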
LUN ID             | 50 | 51
Owner SP           |    |
SP Utilization     |    |
LUN Read IOPs      |    |
LUN Read size      |    |
LUN Write IOPs     |    |
LUN Write size     |    |
LUN Read MB/s      |    |
LUN Write MB/s     |    |
LUN response time  |    |
LUN Queue          |    |
Disk Read IOPs     |    |
Disk Read size     |    |
Disk Write IOPs    |    |
Disk Write size    |    |
Disk Queue         |    |
Average disk seek  |    |
Disk response time |    |

LUN ID             |    |
Owner SP           |    |
SP Utilization     |    |
LUN Read IOPs      |    |
LUN Read size      |    |
LUN Write IOPs     |    |
LUN Write size     |    |
LUN Read MB/s      |    |
LUN Write MB/s     |    |
LUN response time  |    |
LUN Queue          |    |
Disk Read IOPs     |    |
Disk Read size     |    |
Disk Write IOPs    |    |
Disk Write size    |    |
Disk Queue         |    |
Average disk seek  |    |
Disk response time |    |