AND MANAGEMENT
Version 8.2
PARTICIPANT GUIDE
Dell Confidential and Proprietary
Copyright © 2019 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies,
Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its
subsidiaries. Other trademarks may be trademarks of their respective owners.
Course Introduction
Module 1
    Isilon Community
    Scenario
    Summary
Module 2 - Networking
    Current Progression
    Module 2 Goal - Configure Features to Enable Access
    Module 2 Structure
    Scenario
    Summary
This course takes you on a journey from a freshly installed cluster to a configured
cluster using Isilon's features and functions. During your journey, you will confront
challenges that need to be solved. Challenges include configuration and
administration tasks, participating in class discussions, and providing feedback
and answering the questions presented.
Course Objectives
Prerequisite Skills
Course Agenda
Module 1 - 4 Topics
Module 5 - 8 Topics
Introductions
Introduction
Module 1
Introduction
Isilon Community
Links
Isilon Info Hub: https://community.emc.com/docs/DOC-44304
Customer Troubleshooting Hub: https://community.emc.com/docs/DOC-49017
Self-Service Platform Hub: https://community.emc.com/docs/DOC-52103
SolVe: https://community.emc.com/community/support/solve
Support: https://community.emc.com/community/products/isilon#support
Scenario
Introduction
Scenario
Isilon clusters are a network attached storage or NAS solution. NAS has two
architectures, scale-up and scale-out. With a scale-up platform, if more storage is
needed, another independent NAS system is added to the network. Scale-up
storage is the traditional architecture that is dominant in the enterprise space.
Scale-up is characterized by extremely high-performance, highly available single
systems with a fixed capacity ceiling. A scale-up solution has controllers that
connect to trays of disks and provide the computational throughput. The two
controllers can run active-active or active-passive. For more capacity, add another
disk array. Each of these components is added individually. As more systems are
added, NAS sprawl becomes an issue. Traditional NAS is great for specific types of
workflows, especially those applications that require block-level access.
With a clustered NAS solution, or scale-out architecture, all the NAS boxes, or
Isilon nodes, belong to a unified cluster with a single point of management. In a
scale-out solution, the computational throughput, disks, disk protection, and
management are combined and exist for a single cluster. Not all clustered NAS
solutions are the same. Some vendors overlay a management interface across
multiple independent NAS boxes. This gives a unified management interface, but
does not unify the file system. While this approach does ease the management
overhead of traditional NAS, it still does not scale well.
Scale-Out NAS
Scale-out NAS is now a mainstay in most data center environments and is growing
eight times faster than the overall NAS market. The next wave of scale-out NAS
innovation has enterprises embracing the value of NAS and adopting it as
the core of their infrastructure. Enterprises want to raise the bar on enterprise
grade resilience, with a no tolerance attitude toward data loss and data unavailable
situations and support for features to simplify management. Organizations need to
see massive scale and performance with smaller data center rack footprints that
are driven by performance-centric workloads. Enterprises increasingly need
consistently high performance and near-unlimited scale, with demand projected
to grow 2 to 2.5 times by 2020.
With traditional NAS systems the file system, volume manager, and the
implementation of RAID are all separate entities. The file system is responsible for
the higher-level functions of authentication and authorization. The volume manager
controls the layout of the data while RAID controls the protection of the data. The
functions of each are clearly defined and separate. OneFS is the operating system
and the underlying file system that drives and stores data. OneFS creates a single
file system for the cluster. OneFS also performs the duties of the volume manager
and applies protection to the cluster as a whole. There is no partitioning, and no
need for volume creation. Because all information is shared among nodes, the
entire file system is accessible by clients connecting to any node in the cluster. All
data is striped across all nodes. As nodes are added, the file system grows
dynamically and content is redistributed. Each Isilon storage node contains globally
coherent RAM, meaning that, as a cluster becomes larger, it also becomes faster.
When adding a node, the performance scales linearly.
OneFS Architecture
Shown are clients connecting to the resources stored on an Isilon cluster using
standard file access protocols. Each cluster node is also connected to a back-end
GbE or InfiniBand network that enables communication and coordination.
Isilon Nodes
The basic building block of an Isilon cluster is a node. Nodes are the hardware on
which OneFS runs. Every node is a peer to every other node in a cluster. Each
node in the cluster has the ability to handle a data request. No single node acts as
the controller or the filer. OneFS unites all the nodes into a globally coherent pool
of memory, CPU, and capacity. As new nodes are added to the cluster, the
aggregate disk, cache, CPU, and network capacity of the cluster increases. Gen 6
nodes have internal M.2 vault disks that are required for the node journal, and have
a battery backup. In Gen 6, a node mirrors its journal to its paired node.
Challenge
Introduction
Scenario
To address the challenge of predicting performance as the Isilon cluster scales, the
Gen 6 platform was designed to optimize hardware components in order to
maximize performance. The predefined compute bundles optimize memory, CPU
and cache to simplify configuration selection based on an organization's
performance, capacity and cost profile. In order to focus on scale, Isilon leverages
standard technologies to eventually target a greater than 400 node capacity. With
the OneFS 8.2.0 release, the cluster maximum node limit is 252 nodes. Changes to
the back-end infrastructure, such as adopting Ethernet for back-end communication
between nodes, allows us to push through the limitations enforced by older
technologies.
A good use case for performance and scale is media and entertainment, or M&E.
An M&E production house needs high single stream performance at PB scale that
is cost optimized. The organization requires cloud archive in a single name space,
archive optimized density with a low TCO solution. This environment typically has
large capacities and employs new performance technologies at will.
Data Protection
To improve cluster resilience Gen 6 nodes focused on removing any single point of
failure. For example, Gen 6 has no dependency on the flash boot drive. Gen 6
nodes boot from boot partitions on the data drives. These drives are protected
using erasure coding to remove the dependency on dedicated boot drives. Next,
Gen 6 uses SSD drives for the journal to remove the NVRAM dependency present
on Gen 5 nodes. There are now multiple distributed copies of the journal.
Along with changes to the boot partitions and the journal, Gen 6 decreased the size
of the failure domains. Creating smaller failure domains, with significantly fewer
drives in each node pool and neighborhood, increases the reliability of the system
by reducing the spindle-to-CPU ratio. The increased reliability enables the cluster
to use larger capacity drives, without the risk of overburdening the system in the
event of a drive failure. A use case is an organization in the financial sector that
focuses on data protection and availability.
Sizing
To address the challenges of agility and lower TCO requires a predictable sizing,
planning and support environment. The ability to start small (with high storage
efficiency) and then grow performance and/or capacity easily and non-disruptively
is crucial. Gen 6 supports in-place compute upgrades and the ability to grow cache.
Gen 6 incorporates dedicated cache drives and offers one or two SSD
configurations in various capacities to maximize front end performance.
The Gen 6 family has six different offerings that are based on the need for
performance and capacity. Because Gen 6 is a modular architecture, you can scale
out compute and capacity separately. The F800 is the all-flash array with ultra
compute and high capacity. The F800 sits at the top in both performance and
capacity. When paired with 15.4-TB drives, the F800 is both the fastest and the
densest node in the product line. Next, in terms of compute power, are
the H600 and H500 nodes. The H is for "hybrid" targeting both performance with a
level of capacity. The H600 and H500 are spinning media nodes with variable
compute. The H600 combines turbo compute and 2.5" SAS drives. The H500 is
comparable to a top of the line X410, a high compute bundle with SATA drives. The
H400 uses a medium compute bundle with SATA 4kN drives. The A200 uses the
low compute bundle, and the front-end network is only offered at 10 GbE. The
A2000 is a deep archive solution with the lowest cost per TB.
Gen 6 Components
Shown is the rear view and front view of a Gen 6 chassis. The chassis holds four
compute nodes and 20 drive sled slots. The chassis comes in two different depths,
the normal depth is about 37 inches and the deep chassis is about 40 inches.
Examining the compute nodes first, compute module bays 1 and 2 make up one
node pair, and bays 3 and 4 make up the other node pair. Scaling out a cluster with
Gen 6 nodes is done by adding node pairs. Each node can have 1 or 2 SSDs that
are used as L3 cache, global namespace acceleration (GNA), or other SSD
strategies. In the event of a compute module power supply failure, the power
supply from the peer compute module in the node pair will temporarily provide
power to both nodes. Gen 6 nodes do not have power buttons, both compute
modules in a node pair power on immediately when one is connected to a power
source. 10 GbE and 40 GbE are the connectivity for client and application. For
backend communication, a Gen 6 node supports 10 GbE, 40 GbE, and InfiniBand.
10 GbE backend is used in A2000 and A200 nodes that are members of a new
Gen 6 cluster. InfiniBand with Gen 6 nodes is only used when Gen 6 nodes are
added to a cluster that has, or had, older generation nodes.
Gen 6 nodes have an increased journal size that increases storage performance.
Larger journals offer more flexibility in determining when data should be moved to
disk. Each node has a dedicated M.2 vault drive for the journal. Nodes mirror their
journal to their peer node. The node writes the journal contents to the vault in the
event of power loss. A backup battery helps maintain power while data is stored in
the vault.
Each node has five corresponding slots for drive sleds in the chassis. Depending
on the length of the chassis and type of drive, each node can handle up to 30
drives or as few as 15. Nodes require a consistent set of drive types in each sled.
The sleds themselves are either a deep sled or a standard sled. A standard sled
fits three 3.5" SATA drives per sled, or three to six 2.5" SAS or flash drives per
sled. A deep sled fits four 3.5" drives per sled.
Node Interconnectivity
There are two speeds for the back-end Ethernet switches, 10 GbE and 40 GbE.
Some nodes, such as archival nodes, might not need to use all of a 10 GbE port
bandwidth while other workflows might need the full utilization of the 40 GbE port
bandwidth. Ethernet has all the performance characteristics needed to make it
comparable to InfiniBand. Administrators should not see any performance
differences if moving from InfiniBand to Ethernet. Isilon nodes with different
backend speeds can connect to the same backend switch and not see any
performance issues. For example, an environment has a mixed cluster where A200
nodes have 10 GbE backend ports and H600 nodes have 40 GbE backend ports.
Both node types can connect to a 40 GbE switch without affecting the performance
of other nodes on the switch. The 40 GbE switch provides 40 GbE to the H600
nodes and 10 GbE to the A200 nodes. The Ethernet performance is consistent, so
there should be no performance issues or bottlenecks with mixed-performance
nodes in a single cluster.
The port that the 40 GbE uses is the same as the one the InfiniBand uses. You
cannot identify the backend from looking at the node. If you plug Ethernet into the
InfiniBand NIC, it switches the backend NIC from one mode to the other and will
not come back to the same state. Do not plug a backend Ethernet topology into a
backend InfiniBand NIC. One slot will always be for the backend and one will
always be for the frontend. A new, all-Gen 6 cluster supports Ethernet only.
The Gen 6 back-end topology in OneFS 8.2.0 supports scaling an Isilon cluster to
252 nodes. Shown in the graphic is an example of a leaf-spine topology for a
cluster with 132 nodes. The new topology uses the maximum internal bandwidth
and 32-port count of Dell Z9100 switches. Leaf-spine is a two-level hierarchy where
nodes connect to leaf switches, and leaf switches connect to spine switches. Leaf
switches do not connect to one another, and spine switches do not connect to one
another. Each leaf switch connects with each spine switch and all leaf switches
have the same number of uplinks to the spine switches. When planning for growth,
F800 and H600 nodes should connect over 40 GbE ports whereas A200 nodes
may connect using 4x1 breakout cables. Scale planning enables nondisruptive
upgrades, meaning as nodes are added, no recabling of the back-end network is
required. Ideally, plan for three years of growth. The table shows the switch
requirements as the cluster scales. Maximum nodes indicate that each node is
connected to a leaf switch using a 40 GbE port.
Leaf-Spine Considerations
Challenge
Introduction
Scenario
Four options are available for managing the cluster. The web administration
interface (WebUI), the command-line interface (CLI), the serial console, or the
platform application programming interface (PAPI), also called the OneFS API. The
first management interface that you may use is a serial console to node 1. A serial
connection using a terminal emulator, such as PuTTY, is used to initially configure
the cluster. The serial console gives you serial access when you cannot or do not
want to use the network. Other reasons for accessing using a serial connection
may be for troubleshooting, site rules, a network outage, and so on. Shown are the
terminal emulator settings.
isi config
The isi config command (isi, pronounced "izzy") opens the configuration console.
The console contains configured settings from when the Wizard ran, and
administrators can use the console to change initial configuration settings.
The changes command displays a list of changes to the cluster configuration that
are entered into the console but have not been applied to the system. joinmode
[<mode>] displays the current cluster add-node setting when run without an
argument. When joinmode is appended with the manual argument, it configures the
cluster to add new nodes in a separate, manually run process. When appended
with the secure argument, it configures the cluster to disallow any new node from
joining the cluster externally. The version command shows details of the OneFS
version installed on the cluster. The output information is useful for interpreting
what is happening on a cluster, and for communication with technical support to
resolve a complex issue. When in the isi config console, other configuration
commands are unavailable. Type exit to get back to the default CLI.
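The console commands described above can be sketched as a short session. The prompts and output are illustrative, not verbatim, and the commands only run on an Isilon node.

```shell
# Illustrative isi config session (output abridged; run on an Isilon node).
isi config                  # open the configuration console

# Inside the console:
#   changes                 # list configuration changes not yet applied
#   joinmode                # show the current add-node setting
#   joinmode manual         # require a manually run join for new nodes
#   version                 # show OneFS version details
#   commit                  # apply pending changes (assumed subcommand)
#   exit                    # return to the default CLI
```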
The WebUI is a graphical interface that is used to manage the cluster. It requires
that at least one IP address is configured on one of the external Ethernet ports on
one of the nodes. The Ethernet port IP address is either configured manually or by
using the Configuration Wizard. To access the web administration interface from
another computer, use an Internet browser to connect to port 8080. Log in using the
root account, admin account, or an account with log on privileges. After opening the
web administration interface, there is a four-hour login timeout.
In OneFS 8.2.0, the WebUI uses the HTML5 doctype, meaning it is HTML5
compliant in the strictest sense, but does not use any HTML5-specific features.
Previous versions of OneFS require Flash.
Access the CLI out of band using a serial cable connected to the serial port on the
back of each node. As many laptops no longer have a serial port, a USB-serial port
adapter may be needed. The CLI can be accessed in-band once an external IP
address has been configured for the cluster. Both ways are done using any SSH
client such as OpenSSH or PuTTY. Access to the interface changes based on the
assigned privileges. Because OneFS is built upon FreeBSD, you can use many
UNIX-based commands, such as cat, ls, and chmod.
Every node runs OneFS, including the many FreeBSD kernel and system utilities.
OneFS commands are code that is built on top of the UNIX environment and are
specific to OneFS management. The UNIX shell enables scripting and execution of
many UNIX and OneFS commands. CLI commands can be customized with
options, also known as switches and flags. A single command with multiple options
results in many permutations, and each combination performs a different action.
Commands can be used together in compound command structures combining
UNIX commands with customer facing and internal commands. Follow guidelines
and procedures to appropriately implement the scripts to not interfere with regular
cluster operations. Improper use of a command or using the wrong command can
be potentially dangerous to the cluster, the node, or to customer data.
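As a generic illustration of a compound command structure, the following chains standard UNIX commands in a pipeline; on a cluster, isi commands can be piped the same way. The /tmp/onefs_demo path is a hypothetical scratch directory, not an OneFS path.

```shell
# Create a hypothetical scratch directory and a file to work with.
mkdir -p /tmp/onefs_demo
printf 'node1 ok\nnode2 ok\nnode3 degraded\n' > /tmp/onefs_demo/status.txt
chmod 644 /tmp/onefs_demo/status.txt          # set owner rw, group/other r

# Compound structure: cat feeds grep, which feeds wc to count matching lines.
cat /tmp/onefs_demo/status.txt | grep 'ok' | wc -l   # prints 2

# Clean up the scratch directory.
rm -r /tmp/onefs_demo
```

The same pattern applies when filtering OneFS command output, which is why improper compound commands can be dangerous: every stage runs with the privileges of the shell.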
CLI Usage
The man isi and isi --help commands are important for a new
administrator. They provide an explanation of the many isi commands
and command options available. You can also view a basic description of any
command and its available options by typing the -h option after the command.
Licensing
In OneFS versions prior to OneFS 8.1, each licensed feature was represented by
an individual license key. OneFS 8.1 introduces a single license file that contains
all the licensed feature information in a single location. Upgrading to OneFS 8.1
automatically converts the individual keys present on a cluster to the license file.
This licensing process is seamless, except for clusters without internet access. In
environments with no Internet access, the administrator should consult Isilon
support for assistance in manually licensing the cluster.
Administrators can enable evaluation licenses directly from their cluster. License
management is available through the CLI or the GUI.
Two different numbers, the device ID and logical node number or LNN, identify
nodes. The status advanced command in the isi config console shows
the LNNs and device IDs. The lnnset command is used to change an LNN. When
a node joins a cluster, it is assigned a unique node ID number. An LNN is based on
the order in which a node joins the cluster. Device ID numbers are never repeated or
duplicated, and they never change. Unique device IDs make nodes easily
identifiable in logfile entries. For example, if node 3 is replaced with a new node,
the new node is assigned a new device ID, which in this case is 5. Also, if a node is
removed from the cluster and then rejoined, the node is assigned a new device ID.
You can change an LNN in the configuration console for a cluster. The scenario
shown in the graphic changes the LNN to maintain the sequential numbering of the
nodes. Use lnnset <OldNode#> <NewNode#>. The example shows changing
LNN 3 to LNN 5 to match the device ID.
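The LNN change described in this scenario can be sketched as a console session; the node numbers match the example above, and the output is illustrative.

```shell
# Illustrative session inside the isi config console (output abridged).
isi config
#   status advanced        # show device IDs and LNNs for all nodes
#   lnnset 3 5             # renumber LNN 3 to LNN 5 to match the device ID
#   commit                 # apply the change (assumed subcommand)
#   exit
```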
When adding new nodes to a cluster, the cluster gains more CPU, memory, and
disk space. The methods for adding a node are: using the front panel, using the
configuration Wizard, the WebUI, or the CLI and running the isi devices
command. Join the nodes in the order that the nodes should be numbered. Nodes
are automatically assigned node numbers and IP addresses on the internal and
external networks. A node joining the cluster with a newer or older OneFS version
is automatically reimaged to match the OneFS version of the cluster. A reimage
may take up to 5 minutes.
Compatibility
Hardware compatibility is a concern when mixing Gen 4 and Gen 5 nodes. For
example, when adding a single S210 node to a cluster with S200 nodes, will the
S210 node be compatible? Without compatibility, a minimum of three S210 nodes
is required, which creates a separate node pool, meaning node pools from
additional S210 nodes cannot merge with the S200 node pools.
Node series compatibility depends upon the amount of RAM, the SSD size, number
of HDDs, and the OneFS version. The guide details the compatibility requirements
between Gen 4 and Gen 5 nodes. The Isilon Supportability and Compatibility Guide
covers software, protocols, and hardware.
Cluster Shutdown
Administrators can restart or shut down the cluster using the WebUI or the CLI. The
WebUI Hardware page has a tab for Nodes to shut down a specific node, or the
Cluster tab to shut down the cluster. Do not shut down nodes using the UNIX
shutdown -p, halt, or reboot commands. Using the UNIX
commands may result in NVRAM not flushing properly on Gen 5 nodes. Native UNIX
commands do not elegantly interact with OneFS, because the OneFS file system is
built as a separate layer on top of UNIX. The file system may show the node
mounts when it is not connected, and some services can be left with incomplete
operations, or stop responding.
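As a sketch of the supported CLI path, the isi config console can shut down or restart nodes; the shutdown and reboot subcommands and their arguments shown here are assumptions, so verify them with help inside the console.

```shell
# Illustrative: shut down or restart through OneFS, not the UNIX shell.
isi config
#   shutdown all           # cleanly shut down every node (assumed syntax)
#   reboot 3               # or restart only the node with LNN 3 (assumed)
#   exit
```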
Challenge
Introduction
Scenario
The cluster time property sets the date and time settings, either manually or by
synchronizing with an NTP server. After an NTP server is established, setting the
date or time manually is not allowed. After a cluster is joined to an AD domain,
adding an NTP server can cause time synchronization issues. The NTP server
takes precedence over the SMB time synchronization with AD and overrides the
domain time settings on the cluster. SMB time is enabled by default and is used to
maintain time synchronization between the AD domain time source and the cluster.
Nodes use NTP between themselves to maintain cluster time. When the cluster is
joined to an AD domain, the cluster must stay synchronized with the time on the
domain controller. If the time differential is more than five minutes, authentication
may fail.
The best case support recommendation is to not use SMB time and only use NTP if
possible on both the cluster and the AD domain controller. The NTP source on the
cluster should be the same source as the AD domain NTP source. If SMB time
must be used, disable NTP on the cluster and only use SMB time.
NTP Configuration
By default, if the cluster has more than three nodes, three of the nodes are
selected as chimers. If the cluster has three nodes or fewer, only one node is
selected as a chimer. If no external NTP server is set, the chimers use the local
clock instead. Chimer nodes are selected by the lowest node number that is not
excluded from chimer duty. Administrators can configure specific chimer nodes by
excluding other nodes using the isi_ntp_config {add | exclude} <node#> command.
The exclusion list uses node numbers that are separated by spaces.
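Using the syntax above, excluding nodes from chimer duty might look like the following sketch; the node numbers are illustrative, and the list subcommand is an assumption.

```shell
# Exclude nodes 4 and 5 from chimer duty (node numbers separated by spaces).
isi_ntp_config add exclude 4 5

# Review the current NTP configuration, including exclusions (assumed subcommand).
isi_ntp_config list
```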
Link:
https://edutube.emc.com/Player.aspx?vno=dt8syW/XF3A0nwMwoHFunA==&autoplay=true
Shown are the authentication providers that OneFS supports. Active Directory
authenticates and authorizes users and computers in a Windows domain.
Lightweight directory access protocol, or LDAP, is an application protocol for
accessing and maintaining distributed directory information services. Network
Information Service, or NIS, provides authentication and identity uniformity across
local area networks. OneFS includes a NIS authentication provider to enable
cluster integration with NIS infrastructure. The local provider authenticates and
looks up facilities for user accounts that an administrator adds. Local authentication
is useful when Active Directory, LDAP, or NIS directory services are not configured
or when a user or application needs access to the cluster. A file provider enables a
third-party source of user and group information. A third party source is useful in
UNIX and Linux environments that synchronize /etc/passwd, /etc/group, and
/etc/netgroup files across multiple servers.
During the process of joining the domain, a single computer account is created for
the entire cluster. If using the WebUI to join the domain, you must enable pop-up
windows in the browser.
OneFS 8.2 includes short names for AD to enable multiple connections to the same
AD domain. The enhancement allows an administrator to create an AD provider
instance even if an instance for the same domain exists globally or in a different
access zone. Use the --instance option to give the provider a different name
than its domain name. For example, isi auth ads create dees.lab
--user=administrator --instance=my-dees.
Commands can use the instance name to refer to the specific AD provider. For
example, isi auth ads modify my-dees --sfu-support=rfc2307. If the
instance names and machine accounts are different, administrators can create two
distinct AD instances that reference the same domain. For example:
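A hedged sketch of what two providers for the same domain might look like: the instance names are illustrative, and the --machine-account option is an assumption used here to give each instance a distinct machine account.

```shell
# Two AD provider instances referencing the same domain (names illustrative).
isi auth ads create dees.lab --user=administrator --instance=my-dees
isi auth ads create dees.lab --user=administrator --instance=my-dees2 \
    --machine-account=dees-clstr2    # --machine-account is an assumed option
```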
Video Link:
https://edutube.emc.com/Player.aspx?vno=Xu/3IyDNSxbuNMOcLHrqBg==&autoplay=true
OneFS uses access zones to partition a cluster into multiple virtual containers.
Access zones support configuration settings for authentication and identity
management services. Access zones are discussed shortly.
LDAP Overview
LDAP uses a simple directory service that authenticates users and groups
accessing the cluster. It supports Windows and Linux clients. It supports netgroups and
supports the ldapsam schema, which enables NTLM to authenticate over SMB.
LDAP is often used as a meta directory. It sits between other directory systems and
translates between them, acting as a sort of bridge directory service. It enables
users to access resources between disparate directory services or as a single sign-
on resource. It does not offer advanced features that exist in other directory
services such as Active Directory.
Each LDAP entry has a set of attributes. Each attribute has a name and one or
more associated values, similar to the directory structure in
AD. Each entry consists of a distinguished name, or DN, which also contains a
relative distinguished name (RDN). The base DN is also known as a search DN
because a given base DN is used as the starting point for any directory search.
Link:
https://edutube.emc.com/Player.aspx?vno=JKBFLVJaUoqGz8DJmH4zqg==&autoplay=true
Challenge
Introduction
Scenario
OneFS 8.2.0 includes support for Multi-Factor Authentication (MFA) with the Duo
service, configuring SSH using the CLI, and the storing of public SSH keys in
LDAP. The enhancements give a consistent configuration experience, greater
security, and tighter access control for SSH access.
Duo MFA supports the Duo App, SMS, Voice, and USB Keys. Duo requires an
account with the Duo service (duo.com). Duo provides the host, integration key
(ikey), and secret key (skey) needed for configuration. The ikey is a key for the
account, and the skey should be treated as a secure credential. Duo can be
disabled and re-enabled without reentering the host, ikey, and skey.
Duo MFA is on top of existing password and/or public key requirements. Duo
cannot be configured if the SSH authentication type is set to any. Specific users or
groups can bypass MFA if specified on the Duo server. Duo enables the creation of
one time or date/time limited bypass keys for a specific user. A bypass key does
not work if auto push is set to true as no prompt option is shown to the user.
Note that Duo uses a simple name match and is not AD aware. The AD user
'DOMAIN\john' and the LDAP user 'john' are the same user to Duo.
In the first step, the process generates three components to use on the Isilon
cluster to finalize integration with the Duo service: the integration key, the secret
key, and the API hostname. When configuring the Duo service, the Isilon cluster is
represented as a "UNIX application". The second step adds users to the Duo
service and configures how the user gets Duo notifications. In addition to a phone
number, other devices can be linked with a user account, such as YubiKeys,
hardware tokens (which must be plugged in to the computer), and tablets or
smartphones with the Duo Mobile app. The third step is to use the isi ssh
modify command to configure the cluster.
Specify the group option for use with the Duo service or for exclusion from the Duo
service. One or more groups can be associated. Shown are the three types of
groups you can configure. Administrators can create a local or remote provider
group as an exclusion group using the CLI. Users in this group are not prompted
for a Duo key. Note that zsh may require escaping the '!'. If using such an
exclusion group, precede it with an asterisk to ensure that all other groups require
the Duo one-time key (--groups="*,!<group>").
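With straight quotes and the zsh escape noted above, the exclusion might be entered as the following sketch; the group name sshbypass is illustrative, and the --groups option of isi auth duo modify is an assumption.

```shell
# All groups require the Duo one-time key except members of the
# illustrative exclusion group "sshbypass" (note the leading asterisk).
isi auth duo modify --groups="*,\!sshbypass"   # zsh: escape the '!'
```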
Note that OneFS checks the exclusion before contacting Duo. This is a method for
creating users that can SSH into the cluster when the Duo Service is not available
and failback mode is set to secure.
SSH now has CLI support to view and configure exposed settings, isi ssh
settings view and isi ssh settings modify. Also, public keys that are
stored in LDAP may now be used by SSH for authentication. An upgrade imports
the existing SSH configuration into gconfig. The upgrade includes settings
exposed and not exposed by the CLI.
Note that the current SSH session stays connected after configuration changes are
made. Keep the session open until the configuration changes are tested. Closing
the current session with a bad configuration may prevent SSH login.
Settings are configured using the isi ssh settings modify command. Note
that match blocks usually span multiple lines. If the option starts with --match=",
zsh allows line returns and spaces until reaching the closing quote (").
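As a hedged sketch, a multi-line match block might be passed like this; the Match criteria shown are illustrative, not a required configuration:

```shell
# zsh accepts line returns inside the quoted value until the closing
# quote, so the Match block can be typed across several lines.
isi ssh settings modify --match="Match User admin
    PasswordAuthentication yes"
```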
The isi auth duo modify command is used to configure MFA. The
example shows enabling Duo with autopush set to false, meaning the user is
prompted with a list of devices. The failmode is set to safe. Two modes
determine Duo behavior when the service is unavailable: safe mode and secure
mode. In safe mode, SSH allows normal authentication; when secure mode is
set, authentication fails even for bypass users.
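The settings described above might be applied as follows; verify the exact option names against your OneFS version with isi auth duo modify --help:

```shell
# Enable Duo MFA, disable autopush (the user picks from a device list),
# and set safe failmode (normal authentication if Duo is unreachable).
isi auth duo modify --enabled=true --autopush=false --failmode=safe
```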
OneFS 8.2.0 enables the use of public SSH keys from LDAP rather than from a
user's home directory on the cluster. The most common attribute for the
--ssh-public-key-attribute option is the sshPublicKey attribute from the
ldapPublicKey objectClass. You can specify multiple keys in the LDAP
configuration. When there is a match, the key that corresponds to the private key
presented in the SSH session is used. The user needs a home directory on the
cluster; without a home directory, the user gets an error when logging in.
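A minimal sketch of pointing SSH at LDAP-stored keys, assuming an LDAP provider named ldap1 (the provider name is hypothetical):

```shell
# Use the sshPublicKey attribute from the ldapPublicKey objectClass as
# the source of users' public SSH keys.
isi auth ldap modify ldap1 --ssh-public-key-attribute=sshPublicKey
```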
Authentication Process
Shown in the table is the difference between the SSH authentication process in
versions prior to OneFS 8.2.0 and the process in OneFS 8.2.0.
Challenge
Summary
Introduction
Module 2
Introduction
Current Progression
This module discusses the other building blocks for client access, starting with
access zones.
Module 2 Structure
The graphic shows the different areas to configure for an IP address pool.
Module 2 is structured to follow the flow of the configuration.
Scenario
Introduction
Scenario
Although the default view of a cluster is that of one physical machine, you can
partition a cluster into multiple virtual containers called access zones. Access
zones enable you to isolate data and control who can access data in each zone.
Access zones support configuration settings for authentication and identity
management services on a cluster. Configure authentication providers and
provision protocol directories, such as SMB shares and NFS exports, on a zone-by-
zone basis. Creating an access zone automatically creates a local provider, which
enables you to configure each access zone with a list of local users and groups.
You can also authenticate through a different authentication provider in each
access zone.
The System access zone is the default access zone within the cluster. The System
access zone is configured by OneFS. By default, all cluster IP addresses connect
to the System zone. The System zone automatically references groupnet0 on the
cluster.
The example in this slide displays two more zones that are created, the finance
access zone and the engineering, or eng, access zone. Only an administrator who
is connected through the System access zone can configure access zones. Each
access zone has its own authentication providers configured. Multiple instances
of the same provider can occur in different access zones, though doing this is not a
best practice.
SMB shares bound to an access zone are only accessible to users connecting to
the SmartConnect zone and IP pool that align to the access zone. SMB
authentication and access are assigned on a per-access-zone basis.
A good practice is to create unique base directories for each access zone. OneFS
creates a /ifs/data directory, but avoid it as a base directory. Splitting data by
access zone is the recommended implementation method. However, a few
workflows can benefit from having one access zone being able to see the dataset
of another access zone. For example, create /ifs/eng/dvt as the base
directory for the dvt access zone, inside the eng access zone base directory. Overlapping
access zones enables the eng workers to put data on a cluster, while enabling the
dvt workers to take that data and use it. When you set it up this way, you maintain
the different authentication contexts while enabling the second group access.
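The eng/dvt example above might be created as follows; the exact syntax is a sketch to verify with isi zone zones create --help:

```shell
# Create the eng zone, then the overlapping dvt zone whose base
# directory sits inside the eng base directory.
isi zone zones create eng /ifs/eng
isi zone zones create dvt /ifs/eng/dvt
```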
There are three things to know about joining multiple authentication sources
through access zones. Joined authentication sources do not belong to any zone,
meaning the zone does not own the authentication source. Because there is no
ownership, other zones can include an authentication source that may be in use by
an existing zone. For example, although the finance zone has the DEES.LAB provider,
the administrator can also create the sales zone with the DEES.LAB provider.
Second, when joining AD domains, only join domains that are not in the same
forest. AD manages trusts within the same forest, and joining them could enable
unwanted authentication between zones.
Video Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=08ieHpVlyvyD+A8mTzHopA
You can avoid configuration problems on the cluster when creating access zones
by following best-practices guidelines. Create unique base directories. To achieve
data isolation, use a unique base directory path for each access zone. Base
directory paths should not overlap or be nested inside the base directory of another
access zone. Overlapping is allowed, but should only be used if your workflows
require shared data. Separate the function of the System zone from other access
zones. Reserve the System zone for configuration access, and create more zones
for data access. To isolate data access for different clients or users, create access
zones. Do not create access zones if a workflow requires data sharing between
different classes of clients or users. Avoid overlapping UID or GID ranges for
authentication providers in the same access zone. The potential for zone access
conflicts is slight, but possible if overlapping UIDs or GIDs are present in the same
access zone.
Challenge
Introduction
Scenario
Groupnets reside at the top tier of the networking hierarchy and are the
configuration level for managing multiple tenants on your external network. A
groupnet is a container that includes subnets, IP address pools, and provisioning
rules. Groupnets can contain one or more subnets, and every subnet is assigned to
a single groupnet. Each cluster has a default groupnet named groupnet0.
Groupnet0 contains an initial subnet, subnet0, an initial IP address pool, pool0, and
an initial provisioning rule, rule0.
Groupnets are how the cluster communicates with the world. DNS client settings,
such as name servers and a DNS search list, are properties of the groupnet. If the
cluster communicates with another customer's authentication domain, your cluster
needs to find that domain. To find another authentication domain, you need a DNS
setting to route to that domain. With OneFS 8.0 and later releases, groupnets can
contain individual DNS settings, whereas prior OneFS versions had a single global
entry.
Because groupnets are the top networking configuration object, they have a close
relationship with access zones and the authentication providers. Having multiple
groupnets on the cluster means that you are configuring access to separate and
different networks, which are shown as org1 and org2. Different groupnets enable
portions of the cluster to have different networking properties for name resolution.
Configure another groupnet if separate DNS settings are required. If necessary, but
not required, you can have a different groupnet for every access zone. The
limitation of 50 access zones enables the creation of up to 50 groupnets.
When the cluster joins an Active Directory server, the cluster must know which
network to use for external communication to the external AD domain. Because of
this, if you have a groupnet, both the access zone and the authentication provider must
exist within the same groupnet. Access zones and authentication providers can exist
within only one groupnet. Active Directory provider org2 must exist within the
same groupnet as access zone org2.
Configuring Groupnets
Shown is the Cluster management > Network configuration > External network
> Add a groupnet window. When creating a groupnet with access zones and
providers in the same zone, create them in the proper order. First,
create the groupnet. Then create the access zone and assign it to the groupnet.
Next, create the subnet and pool. Then add the authentication provider and
associate them with the groupnet. Finally, associate the authentication providers
with the access zone.
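The ordering above can be sketched as follows. All names (groupnet1, zoneB, dees.lab, the DNS server address) are hypothetical, and the option spellings should be verified with each command's --help:

```shell
# 1. Create the groupnet with its own DNS settings.
isi network groupnets create groupnet1 --dns-servers=192.168.2.10 --dns-search=dees.lab
# 2. Create the access zone and assign it to the groupnet.
isi zone zones create zoneB /ifs/zoneB --groupnet=groupnet1
# 3. Create the subnet and IP address pool under groupnet1 (not shown).
# 4. Add the authentication provider and associate it with the groupnet.
isi auth ads create dees.lab --user=Administrator --groupnet=groupnet1
# 5. Associate the provider with the access zone.
isi zone zones modify zoneB --add-auth-providers=lsa-activedirectory-provider:dees.lab
```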
Challenge
Introduction
Scenario
Connectivity Overview
In a cluster there are two types of networks, an internal network and an external
network. The internal network enables nodes to communicate with each other
over a high-speed, low-latency internal Ethernet network. In an all-Gen 6 cluster
running OneFS 8.2.0, the internal network follows a leaf-and-spine topology. A
second internal network enables failover for redundancy. The external network
enables client connectivity to the cluster using Ethernet. The Isilon cluster supports
network communication protocols including NFS, SMB, HDFS, HTTP, FTP, and
Swift. The cluster includes various external Ethernet connections providing
flexibility for a wide variety of network configurations.
While working on the cluster connectivity, ask the 'big picture' questions:
What does the application workflow look like?
Do you need direct client connections to the performance tier?
What are the protocols to support?
What are the SLAs with client departments?
Do you need VLAN support and NIC aggregation?
What are the IP ranges available for use? Do you have multiple ranges?
Will the IP addresses be limited per range?
Network Interfaces
A client can connect to the cluster on any of the external interfaces depending on
the configuration. Each front-end adapter on the node can answer the client-based
requests or administrator function calls. It is a good practice to verify the external
adapter configuration by pinging it from the web administrator interface, or by
connecting to a share.
Using the isi network interfaces list -v command, you can see both
the interface name and its associated network interface card, or NIC, name. For
example, ext-1 would be an interface name and em1 would be a NIC name. NIC
names are required for tools such as tcpdump and may be required for other
command syntax. Understand that more than one name can identify an Ethernet port.
Link Aggregation
Round robin is a static aggregation mode that rotates connections through the
nodes in a first-in, first-out sequence, handling all processes without priority. Round
robin balances outbound traffic across all active ports in the aggregated link and
accepts inbound traffic on any port. Client requests are served one after the other
based on their arrival. In the graphic, client request 2, client request 3 and so on
follow client request 1. Note that round robin is not recommended if the cluster is
using TCP/IP workloads.
Active/Passive failover is a static aggregation mode that switches to the next active
interface when the primary interface becomes unavailable. The primary interface
handles traffic until there is an interruption in communication. At that point, one of
the secondary interfaces takes over the work of the primary.
In the example, the nodes serve the incoming client requests. If any of the nodes
become unavailable or interrupted due to an issue, the next active node takes over
and serves the upcoming client request.
LACP enables a network device to negotiate and identify any LACP enabled
devices and create a link. LACP monitors the link status and if a link fails, fails
traffic over. LACP accepts incoming traffic from any active port. Isilon is passive in
the LACP conversation and listens to the switch to dictate the conversation
parameters.
Fast EtherChannel, or FEC, is a static aggregation method. FEC accepts all
incoming traffic and balances outgoing traffic over the aggregated interfaces
based on hashed protocol header information, which includes source and destination
addresses. In the example shown, the node accepts and serves all the incoming
client requests. The node balances outgoing traffic.
When planning link aggregation, remember that the pools that use the same
aggregated interface cannot have different aggregation modes. For example, if
they are using the same two external interfaces, you cannot select FEC for one
pool and Round-robin for the other pool. Select the same aggregation method for
all participating devices. A node's external interfaces cannot be used by an IP
address pool both in an aggregated configuration and as individual interfaces. Remove the
node's individual interfaces from all pools before configuring an aggregated NIC.
Also, enable NIC aggregation on the cluster before enabling it on the switch to ensure
continued communication. Enabling on the switch first may stop communication
from the switch to the cluster and result in unexpected downtime.
OneFS uses link aggregation primarily for NIC failover purposes. For example,
aggregating two 10 GbE ports does not create a 20 GbE link. Each NIC serves a
separate stream or conversation between the cluster and a single client. In general,
do not mix aggregated and non-aggregated interfaces in the same pool. Mixing results in
intermittent behavior on the single connection. Also, the aggregated NICs must
reside on the same node. You cannot aggregate a NIC from node 1 with a NIC from
node 2.
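Setting the aggregation mode on a pool might look like the following sketch; the pool name and the mode value spellings (lacp, fec, roundrobin, failover) are assumptions to verify with isi network pools modify --help:

```shell
# Use LACP for the pool's aggregated interfaces; configure the switch
# side for LACP only after the cluster side is enabled.
isi network pools modify groupnet0.subnet0.pool0 --aggregation-mode=lacp
```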
Challenge
Lesson - SmartConnect
Introduction
Scenario
DNS Primer
When discussing Domain Name System, or DNS, on an Isilon cluster, there are
two facets to differentiate, DNS client and DNS server. DNS serves the cluster with
names and numbers for various reasons, notably authentication. The cluster acts
as a DNS client. SmartConnect serves DNS information to inbound queries and as
such acts as a DNS server. DNS is a hierarchical distributed database. The names
in a DNS hierarchy form a tree, which is called the DNS namespace. A set of
protocols specific to DNS allows for name resolution, more specifically, Fully
Qualified Domain Name, or FQDN, to IP address resolution.
An FQDN is the DNS name of an object in the DNS hierarchy. A DNS resolver
query must resolve an FQDN to its IP address so that a connection can be made
across the network or the Internet. If a computer cannot resolve a name or FQDN
to an IP address, the computer cannot make a connection, establish a session or
exchange information. An example of an FQDN looks like
Server7.support.emc.com.
The root domain, represented by a single “.” dot, is the top level of the DNS
architecture. Below the root domain are the top-level domains. Top-level domains
represent companies, educational facilities, nonprofits, and country codes such as
*.com, *.edu, *.org, *.us, *.uk, *.ca, and so on. A name registration authority
manages the top-level domains. The secondary domain represents the unique
name of the company or entity, such as EMC, Isilon, Harvard, or MIT. The last record
in the tree is the host record, which indicates an individual computer or server.
The allocation of IPv6 addresses and their format is more complex than IPv4. In an
IPv6 environment use the AAAA record in DNS, and consult with the network
administrator to ensure that you are representing the IPv6 addresses correctly.
The Name Server Record, or NS Records, indicate which name servers are
authoritative for the zone or domain. Companies that want to divide their domain
into sub domains use NS records. Sub domains indicate a delegation of a portion
of the domain name to a different group of name servers. You create NS records to
point the name of this delegated sub domain to different name servers.
Use one name server record for each SmartConnect zone name or alias. Isilon
recommends creating one delegation for each SmartConnect zone name or for
each SmartConnect zone alias on a cluster. This method permits failover of only a
portion of the workflow—one SmartConnect zone—without affecting any other
zones. This method is useful for scenarios such as testing disaster recovery
failover and moving workflows between data centers.
Isilon does not recommend creating a single delegation for each cluster and then
creating the SmartConnect zones as sub records of that delegation. Using this
method would enable Isilon administrators to change, create, or modify the
SmartConnect zones and zone names as needed without involving a DNS team,
but causes failover operations to involve the entire cluster and affects the entire
workflow, not just the affected SmartConnect zone.
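A per-zone delegation might look like the following BIND-style records, where the zone name, host name, and SSIP address are all hypothetical:

```
; Delegate one SmartConnect zone to the cluster's SmartConnect service IP.
sales.dees.lab.        IN  NS  cluster-ssip.dees.lab.
cluster-ssip.dees.lab. IN  A   192.168.3.100
```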
SmartConnect Overview
In Isilon OneFS 8.2, SmartConnect supports connection service for 252 nodes.
Licensing
SmartConnect Advanced enables multiple network pools within each subnet, and it
supports dynamic IP allocation and NFS failover. The advanced license also
enables multiple SmartConnect zones to be defined to support multiple subnets,
NFS failover, and rebalancing of IP addresses across the cluster. Multiple
SmartConnect zones enable the storage administrator to decide which nodes
should participate in a specific connection balancing configuration strategy. In other
words, any specific node can be selected to be excluded or included from any or all
balancing schemes for each Isilon cluster.
Video Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=UxQVoTIjUy8pLCL8TqMHM
g
OneFS 8.2 provides multiple SSIPs for each subnet. As the cluster scales, it would
need multiple SSIPs to serve the requests. Multiple SSIPs are for failover and not
intended for DNS server load balancing. Each node requests all the SSIPs in its
subnet. A node may own more than one SSIP but should not own all the SSIPs. If a
node owns too many SSIPs, an integrated function called "bullying" is used to
automatically release the SSIPs.
The SmartConnect service IP answers queries from DNS. There can be multiple
SIPs per cluster and they reside on the node with the lowest array ID for their node
pool. For a large cluster that contains multiple node pools with multiple subnets, the
SIP for each subnet resides on the node with the lowest array ID for that subnet. If
you know the IP address of the SIP and want to know only the zone name, use
For this approach, create the Service Principal Name (SPN) records in Active
Directory or in MIT Kerberos for the SmartConnect zone names, as a component of
the cluster’s machine account. To create the SPN records, use the CLI isi auth
command after you add the zone alias, similar to the following: isi auth ads
spn check --domain=<domain.com> --repair.
SmartConnect load balances client connections across the front-end ports based
on the choice of the balancing option that is selected by the administrator for the
cluster. The options are different depending on whether SmartConnect is licensed
or not. If a cluster is licensed, the administrator has four options to load balance:
Round-robin, Connection count, Throughput, and CPU usage. If the cluster does
not have SmartConnect licensed, it uses Round-robin only.
Round Robin selects the next available node on a rotating basis. If no policy is
selected, round robin is the default policy.
Connection Count is a load balancing option that determines the number of open
TCP connections on each available node to optimize the cluster usage.
Network throughput is a load balancing option that sets the overall average
throughput volume on each available node to optimize the cluster usage.
CPU usage sends the client connections to the node with the least CPU utilization
at the time the client connects. The policy helps spread the load across the nodes
and does not over burden any one node.
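Selecting a balancing policy for a pool might be sketched as below; the policy value names (round_robin, conn_count, throughput, cpu_usage) are assumptions to verify with isi network pools modify --help:

```shell
# Send new client connections to the node with the least CPU usage.
isi network pools modify groupnet0.subnet0.pool0 --sc-connect-policy=cpu_usage
```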
SmartConnect load balances client connections across the front-end ports based
on what the administrator has determined to be the best choice for their cluster.
Because each SmartConnect zone is managed as an independent SmartConnect
environment, they can have different attributes, such as the client connection
policy. For environments with different workloads, varying balancing options
provide flexibility in how cluster resources are allocated. Clients use one DNS
name to connect to the performance zone and another to connect to the general
use nodes. The performance zone could use CPU utilization as the basis for
distributing client connections, while the general use zone could use round-robin or
connection count.
For example, a customer can create a subnet or pool for use by a high compute
farm to give a higher level of performance. A second subnet or pool is created with
a different zone name for general use, often desktops, that do not need as high
level of performance. The lower performance zone is shown as the general use
zone. Each group connects to a different name and gets different levels of
performance. This way, no matter what the desktop users are doing, it does not
affect the performance to the cluster.
IP address pools partition the external network interfaces into groups or pools of IP
address ranges in a subnet. Address pools enable customization of how users
connect. Pools control connectivity into the cluster by allowing different functional
groups, such as sales, engineering, and marketing, access to different nodes. This
is important for clusters that have different node types.
In OneFS 8.2, all the nodes within the subnet race to lock a file under
/ifs/.ifsvar/modules/smartconnect/resource/vips. The
node that locks the file owns the SSIP.
An administrator can choose either static pools or dynamic pools when configuring
IP address pools on the cluster. A static pool is a range of IP addresses that
allocates only one IP address at a time. Like most computers and servers, a single
IP address would be allocated from the pool to the chosen NIC. If there are more IP
addresses than nodes, new nodes that are added to the pool get the additional IP
addresses. Static pools are best used for SMB clients because of the stateful
nature of the SMB protocol. When an SMB client establishes a connection with the
cluster, the session or “state” information is negotiated and stored on the server or
node. If the node goes offline, the state information goes with it, and the SMB client
has to reestablish a connection to the cluster. SmartConnect is intelligent enough
to hand out the IP address of an active node when the SMB client reconnects.
Dynamic pools are best used for NFSv3 clients. Dynamic pools assign all the IP
addresses in their range to the NICs on the cluster. You can identify a dynamic
range by the way the IP addresses appear on an interface, such as .110-.112 or
.113-.115, instead of a single IP address. NFSv3 is a stateless protocol. A
stateless connection maintains the session or “state” information on the client side.
If a node goes down, the IP address that the client is connected to fails over to
another node in the cluster. For example, a Linux client connects to a node hosting
IP address ending with .110. If the node goes down, the .110, .111, and .112 IP
addresses are distributed equally to the remaining nodes in the pool. The Linux
client seamlessly fails over to one of the active nodes. The client would not know
that their original node had failed.
When Node 1 goes offline, the static node IP for Node 1 is no longer available. The
NFS failover IPs, and the connected clients associated with Node 1, failover to the
remaining nodes based on the IP failover policy.
If a node with client connections established goes offline, the behavior is protocol-
specific. The practice for NFSv3 and NFSv4 clients is to set the IP allocation
method to dynamic. NFSv3 automatically reestablishes an IP connection as part of
NFS failover. Although NFSv4 is stateful, OneFS 8.x and later versions keep the
connection state information for NFSv4 in sync across multiple nodes. In other
words, if the IP address gets moved off an interface because that interface went
down, the TCP connection is reset. NFSv3 and NFSv4 clients reestablish the
connection with the IP on the new interface and retry the last NFS operation.
For the second pool in the same subnet, the IP allocation method is set to dynamic.
Dynamic IP allocation is only available with SmartConnect Advanced and is only
recommended for use with NFSv3. Dynamic IP allocation ensures that all available
IP addresses in the IP address pool are assigned to member interfaces when the
pool is created. Dynamic IP allocation enables clients to connect to any IP address
in the pool and receive a response. If a node or an interface becomes unavailable,
its IP addresses are automatically moved to other member interfaces in the IP
address pool. Dynamic IP allocation has the following advantages:
NFSv3 is stateless and in almost all cases performs best in a dynamic
pool. The NFSv4 protocol introduced state, making it a better fit for static zones in
most cases, as it expects the server to maintain session-state information.
However, OneFS 8.0 introduced session-state information across multiple nodes
for NFSv4, making dynamic pools the better option.
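Switching a pool that serves NFSv3 clients to dynamic allocation might look like this sketch (the pool name is hypothetical):

```shell
# Dynamic allocation spreads every IP in the pool's range across member
# interfaces so addresses can fail over when a node goes offline.
isi network pools modify groupnet0.subnet0.pool1 --alloc-method=dynamic
```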
Video Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=4hL0i4iBe2BLqJzlT4dN/Q
Challenge
Introduction
Scenario
Routing Overview
The graphic shows three subnets that are created on the cluster. Only one gateway
is created per subnet, however, each of the gateways has a priority. OneFS always
uses the highest-priority gateway that is operational, regardless of where the traffic
originated. The Network 1 gateway has the lowest number and therefore the highest
priority. If all the subnets that are in Network 1, 2, or 3 are known, the approach
might work, but you need to define static routes for those subnets.
Source-Based Routing
Source-Based Routing, or SBR, simplifies routing when there are multiple access
routes and the default gateway is not the best route available. As
shown, the client must send a packet to the cluster at IP address 10.3.1.90.
First, the client determines that the destination IP address is not local and it does
not have a static route that is defined for that address. The client sends the packet
to its default gateway, Router C, for further processing. Next, Router C receives the
packet from the client and examines the destination IP address in the packet. It
determines that it has a route to the destination through the router “A” at 10.1.1.1.
Then, router A receives the packet on its external interface and determines that it
has a direct connection to the destination IP address, 10.3.1.90. Router A sends
the packet directly to its destination using its internal interface on the 40-GbE
switch.
Next, the Isilon must send a response packet to client. Without SBR, it determines
that the destination IP address, 10.2.1.50, is not local and that it does not have a
static route that is defined for that address. OneFS determines which gateway to
send the response packet to based on its priority numbers. OneFS has two default
gateways: 10.1.1.1 with a priority of 1 and 10.3.1.1 with a priority of 10. OneFS
chooses the gateway with the lower priority number and sends the packet to
gateway 10.1.1.1 through the 1-GbE interface, not the 40-GbE interface.
For the return route, OneFS uses an internal gateway and creates a dynamic route
to facilitate the return of the packet.
Configuring SBR
SBR is enabled from the CLI or the WebUI. Shown is the SBR checkbox on the
Settings tab on the Network configuration page. Using the CLI, SBR can be
enabled or disabled by running the isi network external modify command.
To view the SBR setting, run the isi network external view command.
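Sketch of the CLI steps just described; the exact --sbr option spelling should be verified with isi network external modify --help:

```shell
# Enable source-based routing cluster-wide, then confirm the setting.
isi network external modify --sbr=true
isi network external view
```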
Virtual LAN, or VLAN, tagging is an optional front-end network setting that enables
a cluster to participate in multiple virtual networks. A VLAN is a group of hosts that
communicate as though they are connected to the same local area network
regardless of their physical location. Enabling VLAN supports multiple cluster
subnets without multiple network switches. It also provides increased security and
privacy because network traffic across one VLAN is not visible to another VLAN.
To configure VLAN on the cluster, use the isi network subnets modify
command or from the WebUI go to Cluster management > Network
configuration > External network tab.
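Enabling VLAN tagging on a subnet might be sketched as follows; the subnet name and VLAN ID are hypothetical, and the option names are assumptions to verify with isi network subnets modify --help:

```shell
# Tag traffic for this subnet with VLAN ID 22.
isi network subnets modify groupnet0.subnet0 --vlan-enabled=true --vlan-id=22
```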
NANON
Isilon clusters can be large, in the hundreds of PBs. At a certain point most
customers are expanding their clusters, not because they need more front-end IO,
but because they need more capacity. Imagine a 100 node cluster with 20 A2000
nodes. Each A2000 node has 2x 10 GbE links per node. The total potential
bandwidth for the A2000 nodes is 2x10x20=400 Gbps, or 50 GBps. Usually adding
nodes at this point is done for capacity and aggregated cache/CPU/disk spindle
count reasons, rather than front-end IO. As a result, some customers choose to
stop connecting more nodes to the front-end network, because the cost of network
switches and optics cannot be justified. NANON enables lower network costs. You
can perform maintenance on NANON nodes at any time if enough nodes are online
to meet protection criteria. With enough nodes online, patching and firmware
updates on NANON nodes do not disrupt clients. The
reasons why NANON may not be advisable follow.
There are certain features, like anti-virus, that require all the nodes that access files
to have IP addresses that can reach the ICAP (Internet control adaptation protocol)
server. Also, the lowest LNN should always be connected as there are cluster-wide
notifications that go out using the lowest LNN. If using SMB, have all nodes
connected to the network. The lowest LNN communicates notifications,
SupportIQ information, ESRS data, and log files from the cluster, so ensure that there
are no clock skew or time issues. ESRS works without all nodes being able to directly
communicate with the ESRS gateway; however, requests from nonconnected nodes
must be proxied through connected nodes, and as such NANON is not recommended.
Challenge
Summary
Introduction
Module 3
Introduction
Current Progression
This module discusses role-based access control, user identity mapping, and user
access control.
Scenario
Introduction
Scenario
RBAC Overview
Built-In Roles
Shown are the built-in roles that have a predefined set of privileges. Administrators
cannot modify built-in roles. OneFS 8.2.0 introduces zone-aware RBAC, or
ZRBAC. The ZRBAC feature enhancement provides flexibility for organization
administrators to manage resources according to their specific organization. The
example shows that the "Sales" organization has a dedicated access zone. The
administrator for the Sales organization is given access only for that zone and
when managing the system cannot view, configure, or monitor other zones.
Privileges
Note that the WebUI privilege names differ from the names seen in the CLI:
ISI_PRIV_AUTH Privilege
Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=tQkWrNubtdORFBHxoRlMAg
Some best practices for assigning users to roles: first, perform an in-depth,
needs-based security review. Once individuals are identified, define their roles
based on job requirements. Role-based access is a matter of who needs what
access and why. Assign users to roles that contain the minimum set of necessary
privileges. For most purposes, the default permission policy settings, System
access zone, and built-in roles are sufficient. If not, create custom roles. A fail-safe
root account and password should be generated and distributed among a quorum
of responsible corporate officers. To ensure that roles are used appropriately and that
membership remains sufficient and up-to-date, add an audit review process. Exceeding
200 roles could impact cluster performance.
Challenge
Introduction
Scenario
Layers of Access
Connectivity with the cluster has four layers of interaction. The first layer is the
protocol layer. Protocols may be Server Message Block, or SMB, Network File
System, or NFS, File Transfer Protocol, or FTP, or some other protocol. The
authentication layer identifies a user using a system such as NIS, local files, or
Active Directory. The third layer is identity assignment. This layer is
straightforward and based on the results of the authentication layer, but some
cases need identity mediation within the cluster, or assign roles within the
cluster based on user identity. Finally, based on the established connection
and authenticated user identity, the file and directory permissions are evaluated.
The evaluation determines whether the user is entitled to perform the requested
data activities.
Identity Management
The OneFS identity management maps the users and groups from separate
services. The mapping provides a single unified identity on a cluster and uniform
access control to files and directories, regardless of the incoming protocol. This
illustration shows the authentication providers OneFS uses to first verify a user
identity after which users are authorized to access cluster resources. The top
layers are access protocols – NFS for UNIX clients, SMB for Windows clients, and
FTP and HTTP for all. Between the protocols and the lower-level service
providers, with their associated data repositories, sits the OneFS lsassd
daemon. lsassd mediates between the authentication protocols that clients use
and the authentication providers, which check their data repositories for user
identity and file access.
When the cluster receives an authentication request, lsassd searches the
configured authentication sources for matches to an incoming identity. If the identity
is verified, OneFS generates an access token. This token is not the same as an
Active Directory or Kerberos token, but an internal token that reflects the OneFS
identity management system. When a user attempts to access cluster resources,
OneFS allows or denies access based on matching the identity, user, and group
memberships to this same information on the file or folder.
Access tokens form the basis of who you are when performing actions on the
cluster. The tokens supply the primary owner and group identities to use during file
creation. When the cluster builds an access token, it must begin by looking up
users in external directory services. By default, the cluster matches users with the
same name in different authentication providers and treats them as the same user.
The ID-mapping service populates the access token with the appropriate identifiers.
Finally, the on-disk identity is determined.
Overview
URL:
https://edutube.emc.com/Player.aspx?vno=MmSHIH1OvcP5nHsi0hd51g==&autopl
ay=true
Primary Identities
OneFS supports three primary identity types: UIDs, GIDs, and SIDs. The user
identifier, or UID, is a 32-bit string that uniquely identifies users on the cluster.
UNIX-based systems use UIDs for identity management. The group identifier, or
GID, for UNIX serves the same purpose for groups that UID does for users. The
security identifier, or SID, is a unique identifier that begins with the domain identifier
and ends with a 32-bit Relative Identifier (RID). Most SIDs take the form S-1-5-21-
<A>-<B>-<C>-<RID>, where <A>, <B>, and <C> are specific to a domain or
computer, and <RID> denotes the object inside the domain. SID is the primary
identifier for users and groups in Active Directory.
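The SID layout described above can be illustrated with a short, hypothetical parser. This is not OneFS code; the function name and split logic are assumptions used only to show how the domain identifier prefix and the trailing RID relate.

```python
def parse_sid(sid: str):
    """Split a SID of the form S-1-5-21-<A>-<B>-<C>-<RID> into its
    domain identifier prefix and the trailing 32-bit RID."""
    parts = sid.split("-")
    domain = "-".join(parts[:-1])   # everything up to the last sub-authority
    rid = int(parts[-1])            # the RID identifies the object in the domain
    return domain, rid

# Two accounts in the same domain share the domain prefix and differ only in RID.
domain, rid = parse_sid("S-1-5-21-1004336348-1177238915-682003330-1001")
print(domain)   # prints S-1-5-21-1004336348-1177238915-682003330
print(rid)      # prints 1001
```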
Secondary Identifiers
Kerberos and NFSv4 define principals that require all names to have a format
similar to an email address. For example, given username sera and the domain
dees.lab, dees\sera and sera@dees.lab are valid names for a single object in
Active Directory. Whenever a name is provided as an identifier, OneFS looks up
the corresponding primary identifier: a UID, GID, or SID.
Multiple Identities
Another factor to consider is merging UIDs from different environments on the
cluster. Do not put UIDs from different environments, and their authentication
providers, in the same access zone; otherwise the UIDs from different
environments can map as the same user. Mapping gets further complicated if
another NAS product provides UIDs for Windows users that overlap with a range
used elsewhere. Limit the overlap with the use of access zones. When two
identifiers are for the same user, OneFS builds the user token with all
appropriate IDs. If the same numeric ID is used for two different users, do not
place the two users in the same access zone or directory structure; if they are
in the same access zone, the two users are treated as the same user. The final
challenge in a multiprotocol environment is to appropriately apply the
permissions. Verification may require some testing and experimenting on the
administrator's part to fully understand what different permission settings mean
when applied to a user.
ID Mapper Database
On-Disk Identity
OneFS stores a single authoritative identity on disk for each user and group.
On-disk identities enable administrators to choose whether to store the UNIX or
Windows identity, or to let the system determine the correct identity to store
automatically.
Though OneFS creates a user token from information on other management
systems, OneFS stores an authoritative version of the identity as the preferred on-
disk identity. The graphic shows the token of Windows user Sera with a UID as the
on-disk identity.
The available on-disk identity types are Native, UNIX, and SID. The on-disk identity
is a global setting. Because most protocols require some level of mapping to
operate correctly, choose the preferred identity to store on-disk.
The use case for the default Native setting is an environment that has NFS and
SMB client and application access. With the Native on-disk identity set, lsassd
attempts to locate the correct identity to store on disk by running through each ID
mapping method. The preferred object to store is a real UNIX identifier. OneFS
uses a real UNIX identifier when found. If a user or group does not have a real
UNIX identifier (UID or GID), OneFS stores the real SID.
Setting the UNIX on-disk identity always stores the UNIX identifier when
available. During authentication, lsassd looks up any incoming SIDs in the
configured authentication sources. If a UID or GID is found, the SID is
converted to that UID or GID. If a UID or GID does not exist on the cluster,
whether it is local to the client or part of an untrusted AD domain, OneFS
stores the SID instead. This setting is recommended for NFSv3, which uses UIDs
and GIDs exclusively. If the SID on-disk identity type is set, the system always
stores a SID when available. lsassd searches the configured authentication
sources for SIDs that match an incoming UID or GID. If no SID is found, OneFS
stores the UID on disk.
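The selection rules above can be summarized as a small decision function. This is an illustrative model of the documented behavior only, not the actual lsassd implementation; the function name and signature are assumptions.

```python
def choose_on_disk_identity(policy, uid=None, sid=None):
    """Illustrative model of OneFS on-disk identity selection.
    policy: 'native', 'unix', or 'sid'. uid/sid are the real
    identifiers found for the object (None when no real mapping exists)."""
    if policy == "sid":
        # Always store a SID when available; otherwise fall back to the UID.
        return ("SID", sid) if sid is not None else ("UID", uid)
    # At this level of detail, 'unix' and 'native' both prefer a real UNIX
    # identifier; the difference lies in the lookup methods lsassd runs.
    return ("UID", uid) if uid is not None else ("SID", sid)

# A Windows user with a real UID mapping is stored by UID under Native.
print(choose_on_disk_identity("native", uid=2001, sid="S-1-5-21-1-2-3-1001"))
```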
Troubleshooting Resources
Challenge
Introduction
Scenario
Permission Overview
OneFS must not only store an authoritative version of the original file permissions
for the file sharing protocol, but map the authoritative permissions to an acceptable
form for the other protocol. OneFS must do so while maintaining the security
settings for the file and meeting user expectations for access. The result of the
transformation preserves the intended security settings on the files. The result also
ensures that users and applications can continue to access the files with the same
behavior.
When a file has only UNIX permissions, OneFS dynamically generates a synthetic
ACL for an SMB client. Because OneFS derives the synthetic ACL from mode bits,
it can express only as much permission information as mode bits can, and no
more.
POSIX Overview
In a UNIX environment, you modify permissions for owners, groups, and others to
allow or deny file and directory access as needed. These permissions are saved in
16 bits, which are called mode bits. You configure permission flags to grant read
(r), write (w), and execute (x) permissions to users, groups, and others in the form
of permission triplets. You set permissions flags to grant permissions to each of
these classes. Assuming the user is not root, the class determines if the requested
access to the file should be granted or denied. The classes are not cumulative.
OneFS uses the first class that matches. Common practice is to grant permissions
in decreasing order, with the highest permissions that are given to the file owner
and the lowest to users who are not the owner or in the owning group. The
graphic shows that the owner of the /ifs/boston/hr directory has read, write,
and execute permission while the group and all others have read and execute
permission.
The information in the upper 7 bits can also encode what the file can do, although it
has no bearing on file ownership. An example of such a setting would be the “sticky
bit.”
OneFS does not support POSIX ACLs, which are different from Windows ACLs.
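The 16-bit mode layout described above can be inspected with Python's standard stat module. This is a generic client-side illustration, independent of OneFS:

```python
import stat

# The lower 9 bits are the rwx triplets for owner, group, and others.
print(stat.filemode(0o100644))   # regular file, mode 644: prints -rw-r--r--
print(stat.filemode(0o100755))   # regular file, mode 755: prints -rwxr-xr-x

# The upper bits encode the file type and special flags such as the sticky bit.
print(stat.filemode(0o041777))   # directory, mode 1777: prints drwxrwxrwt
```

The trailing "t" in the last line is the sticky bit mentioned above, shown in place of the "others" execute flag.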
chmod
OneFS supports the standard UNIX tools for changing permissions, chmod and
chown. The change mode command, chmod, can change permissions of files and
directories. The man page for chmod documents all options. Changes that are
made using chmod can affect Windows ACLs. Shown is changing the permissions
on a directory so that group members and all others can only read the directory.
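A permission change like the one described can be reproduced on any UNIX-like client with Python's os.chmod, as a sketch. The specific mode in the course graphic is not reproduced here; 755 is an assumed example (owner full access, group and others read and traverse only):

```python
import os
import stat
import tempfile

# Create a scratch directory, then restrict it so that group members and
# all others can only read and traverse it.
path = tempfile.mkdtemp()
os.chmod(path, 0o755)            # equivalent to: chmod 755 <directory>

# Read the permission bits back, masking off the file-type bits.
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))                 # prints 0o755
```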
chown Command
The chown command is used to change ownership of a file. Root user access is
needed when changing the owner of a file. The basic syntax for chown is chown
[-R] newowner filenames. Using the -R option changes the ownership on
subdirectories recursively. In the example shown, user penni is an LDAP user who
is responsible for the content of the /ifs/boston/hr directory. The chgrp
command changes the group. View the man pages for command definitions.
In Windows environments, ACLs define file and directory access rights. A Windows
ACL is a list of access control entries, or ACEs. Each entry contains a user or
group and a permission that allows or denies access to a file or folder. While you
can apply permissions for individual users, Windows administrators usually use
groups to organize users, and then assign permissions to groups instead of
individual users. Group memberships can cause a user to have several
permissions to a folder or file. Windows includes many rights that you can assign
individually or you can assign rights that are bundled together as permissions. For
example, the Read permission includes the rights to read and execute a file
while the Full Control permission assigns all user rights. Full Control includes
the right to change
ownership and change the assigned permissions of a file or folder. When working
with Windows, note the important rules that dictate the behavior of Windows
permissions. First, if a user has no permission that is assigned in an ACL, then the
user has no access to that file or folder. Second, permissions can be explicitly
assigned to a file or folder and they can be inherited from the parent folder. By
default, when creating a file or folder, it inherits the permissions of the parent folder.
If moving a file or folder, it retains the original permissions. View the
security permissions in the properties of the file or folder in Windows
Explorer. If the check boxes in the Permissions dialog are not available, the
permissions are inherited. You can explicitly assign permissions. Remember that
explicit permissions override inherited permissions. The last rule to remember
is that deny permissions take precedence over allow permissions.
OneFS has configurable ACL policies that control permission management and
processing. You can change the default ACL settings globally or individually, to
best support your environment. The global permissions policies change the
behavior of permissions on the system. For example, selecting UNIX only changes
the individual ACL policies to correspond with the global setting. The permissions
settings of the cluster are handled uniformly across the entire cluster, rather than
by each access zone.
If a General ACL Setting or Advanced ACL Setting needs changing, select the
Custom environment global setting. Shown is the CLI command and how the
WebUI translates to the CLI options. The isi auth settings acls modify
command is used to configure the ACL settings using the CLI.
Isilon takes advantage of standard UNIX commands and has enhanced some commands
for specific use on Isilon clusters. The list directory contents command, ls,
provides file and directory permissions information when using an SSH session to
the cluster. Isilon has added specific options to enable reporting on
ACLs and POSIX mode bits. The ls command options are all designed for long
notation format, which is displayed when the -l option is used. The -l option also
displays the actual permissions that are stored on disk.
Adding the -e option prints the ACLs associated with the file. The -n option
displays user and group IDs numerically rather than converting them to a user or
group name. Use the options in combination to report the wanted permissions
information. The different options change the output.
A Windows client processes only ACLs; it does not process UNIX permissions.
When viewing the permission of a file from a Windows client, OneFS must translate
the UNIX permissions into an ACL. Synthetic ACL is the name of the OneFS
translation. Synthetic ACLs are not stored anywhere, instead they are dynamically
generated as needed and then discarded. Running the ls -le command shows
the synthetic ACLs for files and directories.
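The mode-bits-to-synthetic-ACL translation can be modeled as follows. This is an illustrative sketch of the idea only, not OneFS's actual ACL generation code, and the ACE format shown is simplified and hypothetical:

```python
def synthetic_acl(mode: int):
    """Derive a simplified allow-ACE list from 9-bit POSIX mode bits.
    Each triplet (owner, group, everyone) becomes one access control entry."""
    trustees = ["owner", "group", "everyone"]
    masks = [(4, "read"), (2, "write"), (1, "execute")]
    acl = []
    for i, who in enumerate(trustees):
        bits = (mode >> (6 - 3 * i)) & 0o7   # extract this class's triplet
        rights = [name for mask, name in masks if bits & mask]
        acl.append({"trustee": who, "type": "allow", "rights": rights})
    return acl

# Mode 750: owner gets everything, group gets read+execute, everyone gets nothing.
for ace in synthetic_acl(0o750):
    print(ace)
```

Because the ACL is derived, discarding and regenerating it is cheap, which matches the "generated as needed and then discarded" behavior described above.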
If a file has Windows-based ACLs (and not only UNIX permissions), OneFS
considers it to have advanced, or real ACLs. Advanced ACLs display a plus (+)
sign when listed using an ls -l command. POSIX mode bits are present when a
file has a real ACL, however these bits are for protocol compatibility and are not
used for access checks.
Overview
Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=EN8uMS3WuRwjY4Q0mIUa
Zw
If a user has no write permission to the share, /dvt, then the user cannot write
to the /linux and /win directories or to files within those directories.
Two options are available when creating a share, Do not change existing
permissions and Apply Windows default ACLs. Understand the Apply
Windows default ACLs settings. This setting can destroy or at a minimum alter
explicitly defined directory permissions that are created on the share. For example,
carefully migrated permissions can change, creating more work and the potential
for
causing data unavailability. Files and directories can be either POSIX authoritative
or ACLs authoritative.
A synthetic ACL does not exist on the file system and is not stored anywhere.
Instead, OneFS generates a synthetic ACL as needed, and then discards it. OneFS
creates the synthetic ACL in memory when a client that only understands ACLs,
such as Windows clients, queries the permissions on a file that only has POSIX
permissions.
With synthetic ACLs, POSIX mode bits are authoritative. POSIX mode bits handle
permissions in UNIX environments and govern the synthetic ACLs. Permissions
are applied to users, groups, and everyone, and allow or deny file and directory
access as needed. The read, write, and execute bits form the permissions triplets
for users, groups, and everyone. The mode bits can be modified using the WebUI
or the CLI standard UNIX tools such as chmod and chown. Since POSIX governs
the synthetic ACLs, changes made using chmod change the synthetic ACLs. For
example, running chmod 775 on the /ifs/dvt directory changes the mode bits to
read-write-execute for group, changing the synthetic ACL for the group. The same
behavior happens when making the access more restrictive, for example, running
chmod 755, changes the synthetic ACL to its corresponding permission. The
chmod behavior is different when ACLs are authoritative.
In the example, the directory /ifs/dvt/win has a real ACL. The POSIX mode bits
are 775. Running chmod 755 does not change the POSIX mode bits, since merging
775 with 755 gives the combined value of 775. Shown is an excerpt from the
Isilon cluster WebUI page that describes the different chmod behaviors.
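One way to model the merge result quoted above is a bitwise OR of the existing and requested mode bits; this interpretation is an assumption that happens to reproduce the documented 775-plus-755 outcome:

```python
# When a real ACL is authoritative, the displayed POSIX mode bits are merged
# with (rather than replaced by) the bits a chmod requests.
existing = 0o775
requested = 0o755
merged = existing | requested     # bitwise OR of the two permission sets

print(oct(merged))                # prints 0o775 : the apparent bits do not change
```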
The first example shows that the share permission is everyone read-only although
the POSIX indicates read-write-execute. Windows users can write to the share
based on the synthetic ACLs. The second example shows POSIX at 755. Although
the ACL is set to a user with full control, the user cannot write to the share—POSIX
is authoritative.
The “+” indicates a real or native ACL that comes directly from Windows and is
applied to the file. Access control entries make up Windows ACLs. An administrator
can remove the real ACL permission using the chmod -b command. ACLs are
more complex than mode bits and can express a richer set of access rules.
However, POSIX mode bits cannot represent all Windows ACLs, any more than
Windows ACLs can represent all POSIX mode bits.
Once a file is given an ACL, its previous POSIX mode bits are no longer
enforced—the ACL is authoritative. The first example shows a real ACL used,
POSIX set for 777, and the share permissions for the user set to read-only.
Although the POSIX show read-write-execute for everyone, the user cannot write
because of the ACL. In contrast, the second example shows the case where the
user can write.
Troubleshooting Resources
Challenge
Module Summary
Introduction
Module 4
Data Access
Current Progression
Module 1 covered authentication, in Module 2 you configured access zones and the
network components, and Module 3 discussed authorization.
Scenario
Introduction
Scenario
There are several methods that Isilon clusters use for caching. Each Gen 6
storage node contains DDR4, or double data rate fourth-generation, synchronous
dynamic random-access memory. RAM is primarily used to cache data on the
particular storage node that clients are connected to. RAM access is effectively
instant compared to other latencies, and OneFS caches active metadata in RAM.
Also, each node contributes, and has access to a cluster-wide cache that is
accessible and coherent across all nodes. A portion of the RAM is dynamically
allocated and adjusted as read and write cache as needed. Each node
communicates with the cache that is contained on every other node and extracts
any available cached file data as needed. Some node pools use SSDs as a
specialized cache. Waiting for HDD access is about 50 to 100 times slower than
SSD access. The use of SSDs for cache is optional but enabled by default.
Shown is the RAM for Gen 6 nodes; older generation nodes may have less.
OneFS Caching
Caching maintains a copy of metadata and/or user data blocks in a location other
than primary storage. The copy is used to accelerate access to the data by
placing the copy on a medium with faster access than the drives. Because cache is
a copy of the metadata and user data, any data that is contained in cache is
temporary and can be discarded when no longer needed. Cache in OneFS is
divided into levels. Each level serves a specific purpose in read and write
transactions. The cache levels provide guidance to the immediacy of information
from a client-side transaction perspective. The cache level accounts for the relative
latency or time to retrieve or write information. The immediacy determines how the
cache is refreshed, how long the data is available, and how the data is emptied or
flushed from cache.
Cache Levels
Caching in OneFS consists of the client-side level 1, or L1, cache and write
coalescer, and level 2, or L2 storage and node-side cache. Both L1 cache and L2
cache are managed and maintained in RAM. However, OneFS is also capable of
using SSDs as level 3, or L3 cache. As displayed, L3 cache interacts with the L2
cache and is contained on SSDs. Each cache has its own specialized purpose and
works together to provide performance improvements across the entire cluster.
L1 Cache
L1 cache is the client-side cache. It is the immediate buffer on the node that is
connected to the client and is involved in any immediate client data transaction.
OneFS L1 cache refers to read transaction requests, or when a client requests data
from the cluster. L1 cache collects the requested data from the L2 cache of the
nodes that contain the data. L1 cache is stored in a segmented area of the node
RAM and as a result is fast. Following a successful read transaction, the data in L1
cache is flushed or emptied to provide space for other transactions.
Related to L1 cache is the write cache or the write coalescer that buffers write
transactions from the client. The write coalescer collects the write blocks and
performs the additional process of optimizing the write to disk. The write cache is
flushed after successful write transactions. In OneFS, the two similar caches are
distinguished based on their read or write functionality. Client-side caching includes
both the in and out client transaction buffers.
L2 Cache
L2 cache is the storage side or node-side buffer. L2 cache stores blocks from
previous read and write transactions. L2 buffers write transactions to be written to
disk and prefetches anticipated blocks for read requests, sometimes called read
ahead caching. L2 cache is also contained in the node RAM. It is fast and available
to serve L1 cache read requests and take data handoffs from the write coalescer.
For write transactions, L2 cache works with the journaling process to ensure
protected committed writes. As L2 cache becomes full, it flushes according to the
age of the data. L2 flushes the least recently used, or LRU, data.
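The least-recently-used flushing that L2 applies can be illustrated with a minimal LRU structure. This is a generic sketch of the eviction policy, not OneFS internals:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: reads refresh an entry's age; when full,
    the least recently used entry is flushed first."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # flush the LRU entry

cache = LRUCache(2)
cache.put("block-a", b"...")
cache.put("block-b", b"...")
cache.get("block-a")                 # block-a is now most recently used
cache.put("block-c", b"...")         # capacity exceeded: block-b is flushed
```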
L2 cache is node-specific. L2 cache interacts with the data that is contained on the
specific node. The interactions between the drive subsystem, the HDDs, and the
SSDs on the node go through the L2 cache for all read and write transactions. L2
cache on any node responds when the L1 cache or write coalescer of any other
node makes a request.
L3 Cache
L3 cache provides an additional level of storage node-side cache, using the SSDs as
read cache. L3 cache is good for random, read heavy workflows accessing the
same data sets. Also, L3 cache benefits metadata read operations, assuming
metadata has been loaded. L3 cache has no prefetch. SSD access is slower than
access to RAM and is relatively slower than L2 cache but faster than access to
data on HDDs. L3 cache is an extension of the L2 read cache functionality.
Because SSDs are larger than RAM, SSDs can store significantly more cached
metadata and user data blocks than RAM. Like L2 cache, L3 cache is node-specific
and only caches data that is associated with the specific node. Advanced
algorithms are used to determine the metadata and user data blocks that are
cached in L3. L3 cached data is durable and survives a node reboot without
requiring repopulation. When L3 cache becomes full and new metadata or user
data blocks are loaded into L3 cache, the oldest existing blocks are flushed from L3
cache. Flushing is based on first in first out, or FIFO. L3 cache should be filled with
blocks being rotated as node use requires.
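The first-in-first-out flushing that L3 uses differs from L2's LRU policy: eviction order depends only on insertion order, not on how recently a block was read. A minimal sketch, again not OneFS internals:

```python
from collections import OrderedDict

class FIFOCache:
    """Minimal FIFO cache: when full, the oldest-inserted entry is
    flushed first, regardless of how recently it was read."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        return self.data.get(key)           # reads do NOT refresh age (unlike LRU)

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            self.data.popitem(last=False)   # flush the first-in entry

        self.data[key] = value

cache = FIFOCache(2)
cache.put("block-a", b"...")
cache.put("block-b", b"...")
cache.get("block-a")                 # the read does not protect block-a
cache.put("block-c", b"...")         # block-a (first in) is flushed anyway
```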
Shown is an eight node cluster that is divided into two node pools with a detailed
view of one of the nodes. Clients connect to L1 cache and the write coalescer. The
L1 cache is connected to the L2 cache on the other nodes and within the same
node. The connection to other nodes occurs over the internal network when data
that is contained on those nodes is required for read or write. The L2 cache on the
node connects to the disk storage on the same node. The L3 cache is connected to
the L2 cache and serves as a read-only buffer. L3 cache is spread across the
SSDs in the same node and enabled per node pool. Pre-Gen 6 accelerator nodes
do not allocate memory for L2 cache. Accelerator nodes do not write any data
to their local disks, so there are no blocks to cache. Instead, accelerator
nodes use all their memory for L1 cache to service their clients. An
accelerator's entire read cache is L1, since it has no local disks storing file
system data; all the data that an accelerator handles is remote data. In a
cluster consisting of storage
and accelerator nodes, the primary performance advantage of accelerators is the
ability to serve more clients.
Anatomy of a Read
When a client requests a file, the client-connected node uses the isi get
command to determine where the blocks that comprise the file are located. The first
file inode is loaded, and the file blocks are read from disk on all other nodes. If the
data is not already in the L2 cache, data blocks are copied in the L2. The blocks
are sent from other nodes through the backend network. If the data was already in
L2 cache, it is not loaded from the hard disks—OneFS waits for the data blocks
from the other nodes to arrive. Otherwise, the node gets the data load from the
local hard disks, and then the file is reconstructed in L1 cache and sent to the
client.
When a client requests a file write to the cluster, the client-connected node
receives and processes the file. The client-connected node creates a write plan for
the file including calculating FEC. Data blocks assigned to the node are written to
the journal of that node. Data blocks assigned to other nodes travel through the
internal network to their L2 cache, and then to their journal. Once all nodes have all
the data and FEC blocks journaled, a commit is returned to the client. Data
blocks assigned to the client-connected node stay cached in L2 for future reads,
and then the data is written onto the HDDs.
The Block Allocation Manager, or BAM, on the node that initiated a write
operation makes the layout decisions. The BAM decides where best to write the
data blocks to ensure that the file is properly protected. To decide, the BAM
Safe Write, or BSW, path generates a write plan, which comprises all the steps
that are required to safely write the new data blocks across the protection
group. Once complete, the BSW runs this write plan and guarantees its successful
completion. OneFS does not write files at less than the desired protection
level.
Endurant Cache
Endurant Cache, or EC, is only for synchronous writes or writes that require
returning a stable write acknowledgement to the client. EC provides ingest and
staging of stable synchronous writes. EC manages the incoming write blocks and
stages them to the journal. EC also provides stable synchronous write loss
protection by creating multiple mirrored copies of the data, further guaranteeing
protection from single node and multiple node failures. The EC process lowers the
latency that is associated with synchronous writes by reducing the “time to
acknowledge” back to the client. The process removes the Read-Modify-Write
operations from the acknowledgement latency path.
Shown is an example of a synchronous write of a new file, and how the write
process occurs in OneFS with endurant cache. The example is an NFS client
sending 4-KB blocks writing a 512-KB file with a simple return acknowledgement
after the entire file is written. We will assume N+1 protection. First, a client sends a
file to the cluster requesting a synchronous write acknowledgement. The client
begins the write process by sending 4-KB data blocks. The blocks are received into
the node’s write coalescer. The write coalescer manages the write in the most
efficient and economical manner according to the BAM and the BAM Safe Write
path processes.
EC manages how the write request comes into the system. Once the write
coalescer receives the file, the EC log writer process writes mirrored copies of the
data blocks, with log file-specific information added. The mirrored copy writes
happen in parallel with the EC logfiles, which resides in the journal. Once in the
journal, the write is protected and considered stable. The protection level of the
mirrored EC logfiles is based on the drive-loss protection level that is
assigned to the data file to be written; the number of mirrored copies is two,
three, four, or five accordingly.
Once the EC logfiles receive the data copies, a stable write exists and the write
acknowledgement is sent back to the client. The acknowledgement indicates a
stable write of the file. The client assumes that the write is completed and can close
out the write cycle with its application or process. The client considers the write
process complete. The latency or delay time is measured from the start of the
process to the return of the acknowledgement to the client. This process is similar
to many block storage systems.
From this point forward, the standard asynchronous write process is followed. Once
the asynchronous write process is stable with copies of the different blocks on each
of the involved node L2 cache and journal, the EC logfile copies are deallocated.
The write is secure throughout the process. Finally, the write to the hard disks
is completed and the file copies in the journal are deallocated. Copies of the
writes in L2 cache remain in L2 cache until flushed through one of the normal
processes.
The Write Coalescer fills and is flushed as needed. The file is divided into 128-KB
data stripe units. Protection is calculated, and FEC stripe units are created. Then
the write plan is determined. The 128-KB stripe units and FEC units are written to
their corresponding node L2 cache and journal. Then the EC logfiles are cleared
from the journal. Then the stripe and FEC units are written to physical disk from L2.
Once written to physical disk, the stripe and FEC unit copies created during the
asynchronous write are deallocated from the journal. The stripe and FEC units
remain in L2 cache until flushed to make room for more recently accessed data.
The write process is now complete.
L3 Cache Settings
L3 cache is enabled by default for all new node pools that are added to a cluster.
Shown on the left is the WebUI global setting to change the default behavior. The
graphic on the right shows that each node pool can be enabled or disabled
separately. L3 cache is either on or off and no other visible configuration settings
are available.
L3 cache cannot coexist with other SSD strategies on the same node pool, such as
metadata read acceleration, metadata read/write acceleration, and data on SSD.
SSDs in an L3 cache enabled node pool cannot participate as space used for GNA.
L3 acts as an extension of L2 cache regarding reads and writes on a node. The
process of reading or writing, except for larger available cache, is substantially
unchanged.
L3 cache cannot be enabled in all-flash nodes (F800). In Gen 6 nodes, the cache
SSD drive slots are separate from the data drive slots. Because all data drives
are SSDs in the F800, the dedicated cache SSD slots are not populated.
Shown are the CLI commands to disable globally and to enable at the node pool
level.
Shown is the command to query historical statistics for cache. The first command
lists the keys related to cache. The number and granularity of available keys is
numerous. The keys give administrators insight to the caching efficiency and can
help isolate caching related issues. The second command shows the key to list the
L1 metadata read hits for node 2, the node connected over SSH.
A use case is running the command to determine the L3 hit and miss stats to
indicate if the node pool needs more SSDs.
Challenge
Introduction
Scenario
SMB shares provide Windows clients network access to file system resources on
the cluster. In OneFS 7.2.1 and earlier, an SMB client connects to a single node. If
this node goes down or if there is a network interruption between the client and the
node, the client would have to reconnect to the cluster manually. Clients using
SMB 1.0 and SMB 2.x rely on a time-out service at the SMB or TCP layer. The
time-out services must wait for a specific period before notifying the client
that the server is down. The time-outs can take 30 to 45 seconds, which creates
high latency that is disruptive to enterprise applications. To continue working,
the client must manually reconnect to the share on the cluster. Too many
disconnections would prompt clients to open help desk tickets with their local
IT department to determine the nature of the data unavailability.
OneFS 8.0 introduces support for Continuously Available shares, or CA. CA enables
SMB clients to fail over transparently and automatically to another node if a network
or node fails. CA is supported with Microsoft Windows 8, Windows 10, and Windows
Server 2012 R2 clients. CA enables a continuous workflow from the client side with
no apparent disruption to working time.
CA is not enabled by default and must be enabled when the share is created. An
existing share without CA enabled must be re-created in order to enable CA.
Server-side copy offloads copy operations to the server when the involvement of
the client is unnecessary. File data no longer needs to traverse the network for
copy operations that the server can perform. Clients using server-side copy can
experience considerable performance improvements for file copy operations, like
CopyFileEx or "copy-paste" when using Windows Explorer. Server-side copy only
affects file copy or partial copy operations in which the source and destination file
handles are open on the same share and does not work for cross-share operations.
The server-side copy feature is enabled by default in OneFS version 8.0 and later.
To disable the feature, use the CLI. Note that in OneFS, server-side copy is
incompatible with the SMB CA. If CA is enabled for a share and the client opens a
persistent file handle, server-side copy is automatically disabled for that file.
To enable SMB, in the WebUI, go to the Protocols > Windows sharing (SMB)
page, and then select the SMB server settings tab. The SMB server settings page
contains the global settings that determine how the SMB file sharing service
operates. These settings include enabling or disabling support for the SMB service.
The SMB service is enabled by default.
A case for disabling the SMB service is when testing disaster readiness. The
organization fails over the production cluster or directory to a remote site. When the
remote data is available and users write to the remote cluster, all SMB traffic
should be halted on the production site. Preventing writes on the production site
prevents data loss when the remote site is restored back to the production site.
Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=aMwue+nqUbFdOFoqKa98F
g
The demonstration walks through the process of creating an SMB share, mapping
the share, and verifying access.
Select the correct access zone before creating the SMB share. The share name
can contain up to 80 characters, and can only contain alphanumeric characters,
hyphens, and spaces. The description field contains basic information about the
share. There is a 255 character limit. A description is optional, but is helpful when
managing multiple shares. Type the full path of the share in the path field,
beginning with /ifs. You can also browse to the share. If the directory does not
exist, the Create SMB share directory if it does not exist option creates the
required directory.
Use caution when applying the default ACL settings as it may overwrite existing
permissions in cases where the data has been migrated onto the cluster. When a
cluster is set up, the default permissions on /ifs may or may not be appropriate
for the permissions on your directories. As an example, /ifs/eng is an NFS
export and you explicitly want the /ifs/eng mode bit rights set based on UNIX
client application requirements. Selecting the Apply Windows default ACLs
option, as shown in the screen capture, overwrites the original ACLs, which can
break the application. Thus, there is risk associated with using Apply Windows
default ACLs with an existing directory.
Conversely, say that /ifs/eng is a new directory that was created using the CLI,
and Windows users must be able to create and delete files in the directory. When
creating the share, if Do not change existing permissions is set and users then
attempt to save files to the share, an access denied error occurs because Everyone
has only read access. Even as an administrator, you cannot modify the Security tab
of the directory to add Windows users because the mode bits limit access to root
only. In this case, the recommendation is to select Apply Windows default ACLs
and then, once the share is created, go into the Windows Security tab and assign
permissions to users as needed.
OneFS supports the automatic creation of SMB home directory paths for users.
Using variable expansion, user home directories are automatically provisioned.
Home directory provisioning creates a single home share that redirects users to
their SMB home directories. If one does not exist, a directory is automatically
created. To create a share that automatically redirects users to their home
directories, select the Allow variable expansion box. Variable expansion
automatically expands the %U and %D variables in the path to the specified user
name and domain name. To automatically create a directory for the user, check the
Auto-create directories box. You may also set the appropriate flags by using the
isi smb command in the command-line interface.
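The %U and %D expansion described above can be illustrated with a short sketch. This is a hypothetical helper, not the OneFS implementation; the path template and user names are illustrative.

```python
# Illustrative sketch of SMB home-directory variable expansion.
# %U expands to the user name and %D to the domain name.
def expand_share_path(template: str, user: str, domain: str) -> str:
    """Expand %U and %D in an SMB share path template."""
    return template.replace("%U", user).replace("%D", domain)

# A share path of /ifs/home/%D/%U redirects each user to a
# per-domain, per-user home directory.
path = expand_share_path("/ifs/home/%D/%U", user="jsmith", domain="DEES")
print(path)  # /ifs/home/DEES/jsmith
```

With Auto-create directories enabled, the expanded path is created on first access if it does not already exist.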
Adjustments made to Advanced settings override the default settings for this
share only. Administrators can make access zone global changes to the default
values in the Default share settings tab. Changing the default share settings is
not recommended. In the command-line interface, you can create shares using the
isi smb shares create command. You can also use the isi smb shares
modify to edit a share and isi smb shares list to view the current Windows
shares on a cluster.
Challenge
Introduction
Scenario
NFS Overview
The NFS service enables you to create as many NFS exports as needed.
Video link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=qjvfjdLECp0nd099PzoK6Q
Script: NFS relies upon remote procedure call, or RPC, for client authentication and
port mapping. RPC is the NFS method that is used for communication between a
client and server over a network. RPC is on Layer 5 of the OSI model. Because
RPC deals with the authentication functions, it serves as gatekeeper to the cluster.
The procedure always starts with a CALL from a client. When the server receives
the CALL, it performs the service that is requested and sends back the REPLY to
the client. During a CALL and REPLY, RPC looks for client credentials, that is,
identity and permissions. A server can reject a client CALL for one of two reasons.
If the server is not running a compatible version of the RPC protocol, it sends an
RPC_MISMATCH. If the server rejects the identity of the caller, it sends an
AUTH_ERROR.
The Internet Assigned Numbers Authority (IANA) defines which services should run
on which well-known ports; for example, port 25 is used for SMTP email. In the
same way, calling a specific RPC program number is the same as calling a
particular service. For example, MOUNT is always registered as program number
100005. Not all RPC services are registered at known ports. As an example, NFS
often requires the use of rpc.mountd or rpc.statd, yet these services use a random
IP port that is assigned by the cluster. Because IP ports can dynamically change,
portmapper is needed.
Portmapper provides the client RPC process with service ports. It acts as a
gatekeeper by mapping RPC ports to IP ports on the cluster so that the right
service is offered. Clients calling for an RPC service need two pieces of
information, the number of the RPC program it wants to call and the IP port
number. RPC services cannot run unless they register with portmapper. Let us look
at the flow of a request by a client. When an RPC service starts up on the cluster,
it registers with portmapper. The service tells portmapper what port number it is
listening on and what RPC program numbers it is prepared to serve. In this
example, an NFS client requests access to a file. Portmapper knows that RPC
program 100003 (NFS) is offered at IP port 2049.
Step 1 – The client wants to use NFS to access a file. The client forms a CALL
to the cluster requesting the port for RPC program 100003, the program number
that is assigned to NFS services.
Step 2 – The request goes to IP port 111, portmapper.
Step 3 – Then portmapper queries the cluster, gets the IP port of the service,
and then responds to the client.
Step 4 – The client now knows that the NFS service is found on IP port 2049.
Step 5 – Next the client makes its RPC CALL to IP port 2049 on the cluster.
Step 6 – The cluster sends a REPLY, and the client and server can start
negotiating.
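The six steps above can be sketched as a toy model. Real RPC uses binary XDR messages over the network; this sketch only mirrors the lookup flow, and the `Portmapper` class is an illustration, not a real library.

```python
# Toy model of the portmapper exchange: services register their
# program number and listening port, clients resolve programs to ports.
PORTMAPPER_PORT = 111  # portmapper itself always listens on IP port 111

class Portmapper:
    def __init__(self):
        self.registry = {}  # RPC program number -> IP port

    def register(self, program: int, port: int):
        # On startup, each RPC service registers what port it listens
        # on and which program numbers it serves.
        self.registry[program] = port

    def lookup(self, program: int) -> int:
        # Steps 2-3: the client asks portmapper where a program lives.
        return self.registry[program]

pm = Portmapper()
pm.register(100003, 2049)    # NFS registers at IP port 2049
pm.register(100005, 32771)   # MOUNT registers at a dynamic port

# Steps 1-4: the client resolves NFS (program 100003) to its IP port,
# then makes its RPC CALL to that port (steps 5-6).
nfs_port = pm.lookup(100003)
print(nfs_port)  # 2049
```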
In OneFS 7.2.1 and earlier versions, when an NFSv4 client connects to the cluster,
it connects to a single node. If the node goes down or if there is a network
interruption between the client and the node, the NFSv4 client has to reconnect to
the cluster manually. The reconnect is due in part to the stateful nature of the
protocol. Reconnecting is an issue because it is a noticeable interruption to the
client's work. To continue working, the client must manually reconnect to the
cluster. Too many disconnections would also prompt the clients to open help desk
tickets with their local IT department to determine the nature of the interruption.
To enable and disable NFS using the WebUI, click Protocols > UNIX sharing
(NFS) > Global settings tab. The NFS service is enabled by default. The NFS
global settings determine how the NFS file sharing service operates. The settings
include enabling or disabling support for different versions of NFS. Enabling NFSv4
is nondisruptive, and it runs concurrently with NFSv3. Enabling NFSv4 does not
impact any existing NFSv3 clients.
If changing a value in the Export settings, that value changes for all NFS exports
in the access zone. Modifying the access zone default values is not recommended.
You can change the settings for individual NFS exports as you create them, or edit
the settings for individual exports as needed. If NFSv4 is enabled, specify the name
for the NFSv4 domain in the NFSv4 domain field on the Zone setting page.
Other configuration options on the UNIX sharing (NFS) page include reloading the
cached NFS exports configuration to ensure that any DNS or NIS changes take
effect immediately. You can also customize the user/group mappings, the security
types (UNIX and/or Kerberos), and other advanced NFS settings.
Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=x8fM3V3tRC61RpWeP6qvrQ
Create and manage NFS exports using either the WebUI or the CLI. For the CLI,
use the isi nfs exports command. Using the WebUI from the Protocols >
UNIX sharing (NFS) > NFS exports page, choose the access zone, and select
Create an export button. Shown is the Create an export window with the paths to
export highlighted. When multiple exports are created for the same path, the more
specific rule takes precedence. For example, suppose the 192.168.3 subnet has
read-only access and the 192.168.3.3 client has read/write access. In this case, the
192.168.3.3 client has read/write access, even though it is within the 192.168.3
subnet, because its rule is more specific. OneFS can have multiple exports with
different rules that apply to the same directory. A network hostname, an IP address,
a subnet, or a netgroup name can be used for reference. The same export settings
and rules that are created here apply to all the listed directory paths. If no clients
are listed in any entries, no client restrictions apply to attempted mounts.
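The "more specific rule wins" behavior can be sketched with longest-prefix matching. This is a hypothetical illustration of the precedence idea, not OneFS's internal matching logic.

```python
import ipaddress

# Illustrative export client rules: a /24 subnet with read-only access
# and a single /32 client within it granted read/write access.
rules = [
    (ipaddress.ip_network("192.168.3.0/24"), "read-only"),
    (ipaddress.ip_network("192.168.3.3/32"), "read/write"),
]

def access_for(client_ip: str) -> str:
    client = ipaddress.ip_address(client_ip)
    # Among all rules that match the client, the most specific
    # (longest prefix) rule takes precedence.
    matches = [(net, acc) for net, acc in rules if client in net]
    net, acc = max(matches, key=lambda m: m[0].prefixlen)
    return acc

print(access_for("192.168.3.3"))   # read/write (the /32 rule wins)
print(access_for("192.168.3.10"))  # read-only (only the /24 matches)
```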
Permissions settings can restrict access to read-only and enable mount access to
subdirectories. Other export settings are user mappings. The Root user mapping
default maps root users to nobody, and the group default is none. The default
security type is UNIX (system). Kerberos security can be set in addition to, or
instead of, UNIX (system). Scrolling down in the Create an export window shows
the Advanced settings.
NFSv3 does not track state. A client can be redirected to another node, if
configured, without interruption to the client. NFSv4 tracks state, including file
locks. Automatic failover is not an option in NFSv4. Because of the advances in the
protocol specification, NFSv4 can use Windows ACLs. NFSv4 mandates strong
authentication, and can be used with or without Kerberos. NFSv4 drops support for
UDP communications, and only uses TCP because of the need for larger packet
payloads than UDP supports. File caching can be delegated to the client. A read
delegation implies a guarantee by the server that no other clients are writing to the
file. A write delegation means that no other clients are accessing the file at all.
NFSv4 adds byte-range locking, moving this function into the protocol, whereas
NFSv3 relied on NLM for file locking. NFSv4 exports are mounted and browsable in
a unified hierarchy on a pseudo root (/) directory, which differs from previous
versions of NFS.
NFS Considerations
NFSv3 and NFSv4 clients should use dynamic IP address pools. For OneFS 8.0
and later, the recommended SmartConnect IP allocation setting for NFSv4 clients
is dynamic pools. For earlier versions of OneFS, the NFSv4 IP allocation setting
should use static pools. Before OneFS 8.0, Isilon supported up to 1,000 exports;
however, many customers required or requested a larger number of exports. With
OneFS 8.0, to meet the demands of large and growing customers, Isilon supports
up to 40,000 exports.
Challenge
Lesson - Auditing
Introduction
Scenario
Auditing Overview
Auditing is the ability to log specific activities on the cluster. Auditing provides the
capability to track whether data was accessed, modified, created, or deleted. The
auditing capabilities in OneFS include monitoring both preaccess and postaccess
events on the cluster: preaccess events are cluster login failures and successes,
and postaccess events are changes to protocols and configurations. The two audit
activities are auditing configuration changes and auditing client protocol activity.
Audit capabilities are required to meet regulatory and organizational compliance
mandates. These include HIPAA, SOX, governmental agency, and other
requirements. Only the configuration changes made through PAPI are logged.
The audit system also provides the capability to make the audit logs available to
third-party audit applications for review and reporting.
Audit Capabilities
In OneFS, if the configuration audit topic is selected, all data regardless of the zone
is logged in the audit_config.log by default. The audit_config.log is in
the /var/log directory. Protocol auditing tracks and stores activity that is
performed through SMB, NFS, and HDFS protocol connections. You can enable
and configure protocol auditing for one or more access zones in a cluster. Shown is
the Cluster management > Auditing page. Enabling protocol auditing for an access
zone records file access events through the SMB, NFS, and HDFS protocols in the
protocol audit topic directories. You can specify which events to log in each access
zone. For example, you might want to audit the default set of protocol events in the
System access zone, but audit only successful attempts to delete files in a different
access zone. The audit events are logged on the individual nodes where the SMB,
NFS, or HDFS client initiated the activity. Then the events are stored in a binary file
under /ifs/.ifsvar/audit/logs. The logs automatically roll over to a new file
after the size reaches 1 GB.
You can configure the cluster to log audit events and forward them to syslog by
using the syslog forwarder. By default, all protocol events that occur on a
particular node are forwarded to the /var/log/audit_protocol.log file,
regardless of the access zone the event originated from.
User-defined audit success and failure events are also eligible to be forwarded to
syslog.
Event Forwarding
You can configure OneFS to send protocol auditing logs to servers that support the
Common Event Enabler, or CEE. The CEE enables third-party auditing applications
to collect and analyze protocol auditing logs. The CEE has been tested and verified
to work with several third-party software vendors' applications.
OneFS 8.2.0 improves protocol audit events to add control over what protocol
activity is audited. In OneFS 8.2.0, auditing stops the collection of audit events that
third-party applications do not register for or need. Shown are the detail_type
events. Use the CLI command isi audit settings view to list the events.
The events are a direct mapping to CEE audit events: create, close, delete,
rename, set_security, get_security, write, and read. The CEE servers listen, by default,
on port 12228.
The first command sets a create_file audit event upon success. The second
example logs all audit failures. To view the configured events for the access
zone, use the command that is shown.
Errors encountered while delivering audit events to an external CEE server are
shown in /var/log/isi_audit_cee.log. Protocol-specific logs show issues
that the audit filter has encountered:
/var/log/lwiod.log – SMB
/var/log/nfs.log – NFS
/var/log/hdfs.log – HDFS
OneFS uses an audit log compression algorithm when the file rolls over. The
algorithm uses on-the-fly compression and decompression of on-disk audit data.
Compression is transparent to the user. The estimated space saving from this
compression is 90%. Audit logfiles are located in
/ifs/.ifsvar/audit/logs/nodeXXX/topic directory and are compressed
as binary files.
Troubleshooting Resources
https://community.emc.com/docs/DOC-49017
Challenge
Introduction
Scenario
Hadoop Introduction
The Data Lake represents a paradigm shift away from the linear data flow model. A
Data Lake is a central data repository that enables you to access and manipulate
the data using various clients and protocols. The flexibility keeps IT from managing
and maintaining a separate storage solution (silo) for each type of data such as
SMB, NFS, Hadoop, SQL, and others. Using Isilon to hold the Hadoop data gives
you all of the protection benefits of the Isilon OneFS operating system. You can
select any of the data protection levels that OneFS offers, giving you both disk and
node fault tolerance.
A Data Lake-based ingest captures a wider range of data types than were possible
in the past. Data is stored in raw, unprocessed forms to ensure that no information
is lost. Massively parallel processing and in memory technologies enable data
transformation in real time as data is analyzed. Because the Data Lake has a
single, shared repository, more tools can be made available on demand, enabling
data scientists and analysts to find insights. The Data Lake makes it simple to
surface the insights in a consistent way to executives and managers so that
decisions are made quickly, reducing the time it takes from having an idea to
identifying insight, to action, and creating value. A Data Lake helps IT and the
business run better.
Link: URL:
https://edutube.emc.com/html5/videoPlayer.htm?vno=wZCty171ec2RjiMSRZZe9g
Script: Hadoop enables the distributed processing of large datasets across clusters
of servers. Hadoop clusters can dynamically scale up and down based on the
available resources and the required services levels. Let us show a traditional
Hadoop cluster. The components are the NameNode, secondary NameNode, and
DataNodes. The NameNode holds the metadata, or the location information for
every file in the cluster. There is also a secondary NameNode that is a backup for
the NameNode. The secondary NameNode is passive. As its name implies, the
DataNode is where the data resides. Data is spread across the nodes with a 3x
mirror. A logical, compute process runs on each DataNode, handling compute
operations such as MapReduce that run analytics jobs. In a traditional Hadoop only
environment, the HDFS is a read-only file system. As you can imagine, it would be
difficult to do an analysis on a dataset that constantly changes. Typically, Hadoop
data exists in silos. Production data is maintained on production server and then
copied to a landing zone server, which imports or ingests the data into HDFS. The
data on HDFS is not production data, it is copied from another source.
The NameNode gets all the IPs in the access zone, in this example, the hadoop
access zone. Next, the NameNode looks at the rack configuration and gets the IP
addresses for the rack. The NameNode also checks whether any IP addresses are
blacklisted. Then the NameNode gives out rack IP addresses first, based on the
client IP; otherwise it returns IP addresses from across the entire zone.
In closing, there are two top known issues with NameNode to DataNode IP address
allocation. First, when there are multiple access zones for HDFS, the NameNode
can give out IP addresses from a different access zone. Second, opening multiple
security contexts can cause the status_Too_Many_Files_Open and "All
datanodes are bad" errors. The Pipeline Write Recovery feature fixes the security
context issue.
To recap the overview, all production data resides on Isilon, removing the need to
export it from your production applications and import it as with a traditional
Hadoop environment. The MapReduce continues to run on dedicated Hadoop
compute nodes. Isilon requires this Hadoop front end to do the data analysis. Isilon
holds the data so that Hadoop, applications, or clients can manipulate it.
Benefits
Data Protection – Hadoop does 3X mirror for data protection and has no
replication capabilities. OneFS supports snapshots, clones, and replication.
No data migration – Hadoop requires a landing zone to stage data before using
tools to ingest data to the Hadoop cluster. Isilon enables cluster data analysis
by Hadoop. Consider the time that it takes to push 100 TB across the WAN and
wait for it to migrate before any analysis can start. Isilon does in place analytics
so no data moves around the network.
Security – Hadoop does not support Kerberized authentication. It assumes that
all members of the domain are trusted. Isilon supports integrating with AD or
LDAP, and gives you the ability to safely segment access.
Dedupe – Hadoop natively mirrors files 3x, meaning 33% storage efficiency.
Isilon is 80% efficient.
Compliance and security – Hadoop has no native encryption. Isilon supports
Self-Encrypting Drives, using ACLs and Mode bits, access zones, RBAC, and is
SEC-compliant.
Multi Distribution Support – Each physical HDFS cluster can only support one
distribution of Hadoop. Isilon can co-mingle physical and virtual versions of any
Apache standards-based distribution.
Scales compute and storage independently – Hadoop pairs storage with
compute, so adding more space may require you to pay for more CPU that may
go unused. If you need more compute, you end up with lots of overhead space.
With Isilon, you scale compute or storage as needed, aligning your costs with
your requirements.
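The efficiency figures in the bullets above come from simple arithmetic, sketched below. The 16+4 FEC layout is only an illustration; actual Isilon efficiency depends on node count and the Requested Protection level chosen.

```python
# Back-of-envelope storage efficiency comparison.
# HDFS default 3x mirroring stores three full copies of every block.
def mirror_efficiency(copies: int) -> float:
    return 1 / copies

# N+M FEC protection stores M parity units per N data units,
# so usable capacity is N / (N + M).
def fec_efficiency(data_units: int, fec_units: int) -> float:
    return data_units / (data_units + fec_units)

print(round(mirror_efficiency(3) * 100))  # 33 (% usable with 3x mirror)
print(round(fec_efficiency(16, 4) * 100)) # 80 (% usable with 16+4 FEC)
```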
For additional information about in-place analytics:
http://www.emc.com/collateral/TechnicalDocument/docu50638.pdf
Hadoop Settings
HDFS Enhancements in OneFS 8.0 include a WebUI interface, and support for
auditing, CloudPools, and SMB file filtering. Shown is the WebUI Protocols >
Hadoop (HDFS) > Settings page and the corresponding isi hdfs settings
command.
The Default block size determines how the OneFS HDFS daemon returns data
upon read requests from a Hadoop compute client. Leave the default block size
at 128 MB. If the customer runs an older version of HDFS, the block size may
need to be lowered to 64 MB. If the block size is set too high, many read/write
errors and performance problems occur. Tune on setup.
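The impact of the block size can be sketched with simple arithmetic: the block size determines how many read requests a Hadoop client issues for a given file. The file size used here is only an example.

```python
import math

# The HDFS block size sets how many chunks a compute client
# requests when reading a file.
def num_blocks(file_size_mb: int, block_size_mb: int) -> int:
    return math.ceil(file_size_mb / block_size_mb)

# A 1 GB file read with the default 128 MB block size vs 64 MB:
print(num_blocks(1024, 128))  # 8 read requests
print(num_blocks(1024, 64))   # 16 read requests
```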
The Default checksum type is used for old HDFS workflows. Because OneFS uses
forward error correction, checksums for every transaction are not used. Enabling
checksums can cause a performance issue.
Odp version – on updates, the Hortonworks version must match the version that is
seen in Ambari. A version conflict is common when the customer upgrades
Hortonworks and can cause jobs not to run. Installation also fails when the Odp
version does not match.
Proxy users for secure impersonation can be created on the Proxy users tab. For
example, create an Apache Oozie proxy user to securely impersonate a user called
HadoopAdmin, enabling the Oozie user to request that the HadoopAdmin user
perform Hadoop jobs. Apache Oozie is an application that can automatically
schedule, manage, and run Hadoop jobs.
On the Virtual racks tab, nodes can be preferred, along with an associated group
of Hadoop compute clients, to optimize access to HDFS data.
Troubleshooting Resources
https://community.emc.com/docs/DOC-49017
Challenge
Introduction
Scenario
OneFS supports Isilon Swift, an object storage interface compatible with the
OpenStack Swift 1.0 API. Isilon Swift is a hybrid between the two storage types,
storing Swift metadata as an alternative data stream. Through Isilon Swift, users
can access file-based data that is stored on the cluster as objects. The Swift API is
implemented as Representational State Transfer, or REST, web services over
HTTP or HTTPS. Since the Swift API is considered a protocol, content and
metadata can be ingested as objects and concurrently accessed through protocols
that are configured on the cluster. The cluster must be licensed to support Isilon
Swift.
File storage deals with a specific set of users who require shared access to a
specific set of files. Shared access led to file access permissions and locking
mechanisms, enabling users to share and modify files without affecting each
other’s changes. A file system stores data in a hierarchy of directories,
subdirectories, folders, and files. The file system manages the location of the data
within the hierarchy. If you want to access a specific file, you need to know where
to look for the file. Queries to a file system are limited. You can search for a specific
file type such as *.doc, or file names such as serverfile12*.*, but you cannot parse
through the files to find the content contained within them. Determining the context
of a file is also difficult. For example, should you store the file in an archival tier or
will you access the information regularly? It is difficult to determine the content of
the data from the limited metadata provided. A document might contain the minutes
of a weekly team meeting, or contain confidential personal performance evaluation
data.
Object storage combines the data with richly populated metadata to enable
searching for information by file content. Instead of a file that tells you the create or
modified date, file type, and owner, you can have metadata that tells you the
project name, formula results, personnel assigned, location of test and next run
date. The rich metadata of an object store enables applications to run analytics
against the data.
Object storage has a flat hierarchy and stores its data within containers as
individual objects. An object storage platform can store billions of objects within its
containers, and you can access each object with a URL. The URL associated with
a file enables the file to be located within the container. Hence, the path to the
physical location of the file on the disk is not required. Object storage is well suited
for workflows with static file data or cloud storage.
Shown is the Swift logical data layout. Accounts are the administrative control point
for containers and objects, containers organize objects, and objects contain user
data. For users to access objects, they must have an account on the system. An
account is the top of the hierarchy.
Administrators must provision the accounts before users can use the service. The
general steps are: enable the Swift license, decide upon file system user or group
ownership, create accounts using the isi swift command, and then assign
users access to the account. Make any necessary file system permission changes
if you are relocating data into the account.
The example shows creating a Swift account in the sales access zone and using
an Active Directory user and group. The isi swift accounts list shows the
accounts that are created in the access zone. The isi swift accounts view
shows the account details.
Storage URL
Shown is what a Swift Storage URL looks like. URIs identify objects in the form
http://<cluster>/v1/account/container/object. In the example
shown, 192.168.0.1 identifies the cluster. HTTP requests are sent to an internal
web service listening on port 28080. This port is not configurable. HTTPS requests
are proxied through the Apache web server listening on port 8083. This port is not
configurable. OpenStack defines the protocol version /v1. The reseller prefix is
/AUTH_bob, where AUTH is a vestige of the OpenStack implementation's internal
details and the _bob portion is the account name used. The container /c1 is the
container in which an object is stored, and the object /obj1 is the object.
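The URL structure described above can be decomposed with a short parser. This is an illustrative sketch using the example values from the text, not part of Isilon Swift.

```python
from urllib.parse import urlparse

# Parse a Swift storage URL of the form
# http://<cluster>/v1/account/container/object
def parse_swift_url(url: str) -> dict:
    parts = urlparse(url).path.strip("/").split("/")
    version, account, container, obj = parts
    return {"version": version, "account": account,
            "container": container, "object": obj}

info = parse_swift_url("http://192.168.0.1:28080/v1/AUTH_bob/c1/obj1")
print(info["account"])    # AUTH_bob
print(info["container"])  # c1
print(info["object"])     # obj1
```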
Isilon Swift supports up to 150 concurrent active connections per cluster node.
When uploading objects or listing containers, the Isilon Swift service can become
memory-constrained and cause a service outage. To avoid an outage, maintain the
Swift Service memory load within 384 MB. Account and container listing requests
initiate a full file system walk from the requested entity. Workloads can expect
longer response times during the listing operations as the number of containers or
objects increase. To prevent response time issues, redistribute or reduce the
objects and containers until the response times are within the acceptable limits.
You cannot submit a PUT request to create a zero-length object because PUT is
incorrectly interpreted as a pseudo-hierarchical object. If the container is not empty,
you cannot submit a DELETE request to delete a container. As a best practice,
delete all the objects from the container before deleting the container. When
authenticating with Active Directory and Isilon Swift, the user name in the X-Auth-
User header must include the fully qualified AD domain name in the form test-
name@mydomain.com unless the domain has been configured as the default
through the assume-default-domain configuration parameter in the AD provider
configuration.
Pre OneFS 8.0 Swift accounts are deactivated when upgrading to OneFS 8.0 and
later. After the upgrade, Swift no longer uses home directories for accounts. The
upgrade plan should determine which users are using Swift. Create new accounts
under the new Swift path, and then move the data from the old accounts into the
newly provisioned accounts. Swift is not compatible with the auditing feature.
Challenge
Summary
Introduction
Module 5
Introduction
Scenario
Introduction
Scenario
OneFS stripes the data stripe units and FEC stripe units across the nodes. Some
protection schemes use more than one drive per node. OneFS uses advanced data
layout algorithms to determine data layout for maximum efficiency and
performance. Data is evenly distributed across nodes in the node pool as it is
written. The system can continuously reallocate where the data is stored and make
storage space more usable and efficient. Depending on the file size and the stripe
width, as the cluster size increases, the system stores large files more efficiently.
Every disk within each node is assigned both a unique GUID (globally unique
identifier) and a logical drive number. The disks are subdivided into 32-MB cylinder
groups that are composed of 8-KB blocks. Each cylinder group is responsible for
tracking, using a bitmap, whether its blocks are used for data, inodes, or other
metadata constructs. The combination of node number, logical drive number, and
block offset makes up the block or inode address, which is controlled by the Block
Allocation Manager.
Displayed is a simple example of the write process.
Step 1 – The client saves a file to the node it is connected to.
Step 2 – If the file is greater than 128 KB, the node the client is connected to
divides the file into data stripe units.
Step 3 – The data stripe units are assembled into the maximum stripe widths for
the file.
Step 4 – FEC stripe units are calculated to meet the Requested Protection level.
Step 5 – The data and FEC stripe units are striped across nodes.
The data stripe units and protection stripe units are calculated for each file stripe by
the Block Allocation Manager (BAM) process. The file data is broken into 128-KB
data stripe units consisting of 16 x 8-KB blocks per data stripe unit. A single file
stripe width can contain up to 16 128-KB data stripe units, for a maximum of 2 MB
as the file's data portion. A large file can have thousands of file stripes distributed
across the node pool. The protection is calculated based on the Requested
Protection level for each file stripe, using the data stripe units that are assigned to
that file stripe. The BAM process calculates 128-KB FEC stripe units to meet the
protection level for each file stripe. The higher the protection level, the more FEC
stripe units are calculated.
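The arithmetic above can be sketched in a few lines of Python. This is a rough model built only from the constants stated in the text (8-KB blocks, 16 blocks per 128-KB stripe unit, up to 16 data stripe units per file stripe), not OneFS code:

```python
import math

BLOCK_KB = 8                          # OneFS block size
BLOCKS_PER_UNIT = 16                  # 16 x 8-KB blocks per stripe unit
UNIT_KB = BLOCK_KB * BLOCKS_PER_UNIT  # 128-KB data stripe unit
MAX_UNITS_PER_STRIPE = 16             # up to 2 MB of file data per file stripe

def stripe_counts(file_kb, fec_units_per_stripe):
    """Return (data stripe units, file stripes, total FEC stripe units)
    for a file of file_kb kilobytes at a given FEC-units-per-stripe level."""
    data_units = math.ceil(file_kb / UNIT_KB)
    stripes = math.ceil(data_units / MAX_UNITS_PER_STRIPE)
    return data_units, stripes, stripes * fec_units_per_stripe

# A 1-GB file at a level needing 2 FEC units per stripe:
# 8192 data stripe units spread over 512 file stripes.
print(stripe_counts(1024 * 1024, 2))  # (8192, 512, 1024)
```

The 768-KB file discussed later in this module works out to six data stripe units in a single file stripe under this same model.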
Files written to Isilon are divided into file stripes. File stripe is a descriptive term;
file stripes are also called stripes, protection stripes, or data stripes. File stripes are
portions of a file that are contained in a single data and protection band that is
distributed across nodes on the cluster. Each file stripe contains both data stripe
units and protection stripe units. The file stripe width, or stripe size, varies based on
the file size, the number of nodes in the node pool, and the applied Requested
Protection level. The number of file stripes can range from a single stripe to
thousands of stripes per file.
Mirrored data protection is exactly what the description would indicate. The
protection blocks are copies of the original set of data blocks. OneFS includes the
capability to use 2X to 8X mirrored protection. The number indicates the total
number of data copies to store. The original data blocks plus one to seven
duplicate copies. Also, mirroring is used to protect the file metadata and some
system files that exist under /ifs in hidden directories. Mirroring can be explicitly set
as the Requested Protection level in all available locations. One particular use case
is where the system is used to only store small files. A file of 128 KB or less is
considered a small file. Some workflows store millions of 1 KB to 4-KB files.
Explicitly setting the Requested Protection to mirroring can save fractions of a
second per file and reduce the write ingest time for the files. Under certain
conditions, mirroring is set as the Actual Protection on a file even though another
Requested Protection level is specified. If a file is small, the FEC protection for the
file results in mirroring. The loss protection requirements of the Requested
Protection determine the number of mirrored copies. Mirroring is also used if the
node pool is not large enough to support the Requested Protection level. For
example, with five nodes in a node pool and N+3n Requested Protection, OneFS
saves the file at a 4x mirror level as the Actual Protection.
N+Mn illustrates the primary protection level in OneFS. N represents the number of
data stripe units, and Mn represents the number of simultaneous drive or node
failures that can be tolerated without data loss. M also represents the number of
protection or FEC stripe units that are created and added to the protection stripe to
meet the failure tolerance requirements. The available N+Mn Requested Protection
levels are +1n, +2n, +3n, and +4n. N must be greater than M to gain benefit from
the data protection. Referring to the chart, the minimum number of nodes required
in the node pool for each Requested Protection level is displayed: three nodes for
N+1n, five nodes for N+2n, seven nodes for N+3n, and nine nodes for N+4n. If N
equals M, the protection overhead is 50 percent. If N is less than M, the protection
results in a level of FEC-calculated mirroring. The drives in each node are
separated into related sub pools. The sub pools are created across the nodes
within the same node pool, creating more drive failure isolation zones for the node
pool. The number of sustainable drive failures is per sub pool, on separate nodes.
Multiple drive failures on a single node are equivalent to a single node failure. The
drive loss protection level is applied per sub pool. With N+Mn protection, only one
stripe unit is on a single node, and each stripe unit is written to a single drive on the
node. Assuming the node pool is large enough, the maximum file stripe width is 16
data stripe units plus the Requested Protection stripe units. The maximum stripe
width per N+Mn protection level is displayed.
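The minimum-node rule above (2M + 1 nodes for N+Mn) and the resulting overhead at that minimum pool size can be expressed as a short sketch. This is an illustration of the chart's numbers, not a OneFS API:

```python
def nmn_requirements(m):
    """For N+Mn protection: minimum node-pool size (2M + 1 nodes) and the
    percent protection overhead at that minimum size, per the chart."""
    min_nodes = 2 * m + 1
    n = min_nodes - m              # data stripe units at the minimum pool size
    overhead_pct = round(m / (n + m) * 100)
    return min_nodes, overhead_pct

# +1n needs 3 nodes (33% overhead), +2n needs 5 (40%),
# +3n needs 7 (43%), +4n needs 9 (44%).
for m in (1, 2, 3, 4):
    print(f"+{m}n: {nmn_requirements(m)}")
```

As the pool grows beyond the minimum, N increases toward 16 and the overhead fraction M / (N + M) falls, which is why overhead declines on larger clusters.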
Some protection schemes use a single drive per node per protection stripe. As
displayed in the graphic, only a single data stripe unit or a single FEC stripe unit is
written to each node. These protection levels are N+M or N+Mn. In the OneFS
WebUI and CLI, the syntax is represented as +Mn. M represents the number of
simultaneous drive failures on separate nodes that can be tolerated at one time. It
also represents the number of simultaneous node failures at one time. A
combination of both drive failures on separate nodes and node failures is also
possible.
The table shows each N+Mn Requested Protection level over the minimum
number of required nodes for each level. The data stripe units and protection stripe
units can be placed on any node in the node pool and in any order. The number of
data stripe units depends on the size of the file and the size of the node pool, up to
the maximum stripe width. As illustrated, N+1n has one FEC stripe unit per
protection stripe, N+2n has two, N+3n has three, and N+4n has four. N+2n and
N+3n are the two most widely used Requested Protection levels for larger node
pools, those with around 15 nodes or more. The ability to sustain either drive or
node loss drives their use when possible.
N+Md:Bn uses multiple drives per node as part of the same data stripe with
multiple stripe units per node. N+Md:Bn protection lowers the protection overhead
by increasing the size of the protection stripe. N+Md:Bn simulates a larger node
pool by using the multiple drives per node. The single protection stripe spans the
nodes and each of the included drives on each node. The supported N+Md:Bn
protections are N+2d:1n, N+3d:1n, and N+4d:1n. N+2d:1n is the default node
pool Requested Protection level in OneFS. M is the number of stripe units or drives
per node, and the number of FEC stripe units per protection stripe. The same
maximum of 16 data stripe units per stripe is applied to each protection stripe. The
maximum stripe width for each Requested Protection level is displayed in the chart.
The other FEC protection schemes use multiple drives per node. The multiple
drives contain parts of the same protection stripe. Multiple data stripe units and
FEC stripe units are placed on a separate drive on each node. The scheme is
called N+M:B or N+Md:Bn protection. These protection schemes are
represented as +Md:Bn in the OneFS web administration interface and the CLI.
The M value represents the number of simultaneous tolerable drive failures on
separate nodes without data loss. It also represents the number of FEC stripe units
per protection stripe. The : (colon) represents an “or” conjunction. The B value
represents the number of tolerated node losses without data loss. Unlike N+Mn,
N+Md:Bn has different values for the number of drive loss and node losses that
are tolerated before data loss may occur. When a node is lost, multiple stripe units
from each protection stripe are unavailable, and the tolerable drive loss limit is
reached. Displayed is an example of a 1-MB file with a Requested Protection of
+2d:1n. Two stripe units, either data or protection stripe units, are placed on
separate drives in each node. Two drives on different nodes per disk pool, or a
single node, can be lost simultaneously without risk of data loss.
Displayed are examples for the available N+Md:Bn protection levels. The data
stripe units and FEC stripe units can be placed on any node in the node pool in any
order. N+2d:1n contains two FEC stripe units, and has two stripe units per node.
N+3d:1n contains three FEC stripe units, and has three stripe units per node. As
displayed, N+4d:1n contains four FEC stripe units, and has four stripe units per
node. N+2d:1n is the default Requested Protection in OneFS and is an acceptable
protection level for smaller node pools and node pools with smaller drive sizes.
N+3d:1n and N+4d:1n are most effective with larger file sizes on smaller node pools.
Smaller files are mirrored when these protection levels are requested.
In addition to the previous N+Md:Bn, there are two advanced forms of Requested
Protection. M represents the number of FEC stripe units per protection stripe.
However, the number of drives per node and the number of stripe units per node is
set at two. The number of stripe units per node does not equal the number of FEC
stripe units per protection stripe. The benefit of the advanced N+Md:Bn protection
levels is that they provide a higher level of node loss protection. Besides the drive
loss protection, the node loss protection is increased. The available Requested
Protection levels are N+3d:1n1d and N+4d:2n. N+3d:1n1d includes three FEC stripe
units per protection stripe, and provides protection for three simultaneous drive
losses, or one node and one drive loss. The higher protection provides the extra
safety during data rebuilds associated with the larger drive sizes of 4 TB and 6 TB.
The maximum number of data stripe units is 15 and not 16 when using N+3d:1n1d
Requested Protection. N+4d:2n includes four FEC stripe units per stripe, and
provides protection for four simultaneous drive losses, or two simultaneous node
failures.
A 768-KB file requires six data stripe units. The desired protection includes the
ability to sustain the loss of two hard drives. On an eight-node cluster, two FEC
stripe units would be calculated on the six data stripe units using an N+2n
protection level. The protection overhead in this case is 25 percent. However,
suppose there is only a four-node cluster to write to. When using N+2n protection,
the 768-KB file would be placed into three separate data stripes, each with two
protection stripe units. Six protection stripe units are required to deliver the
Requested Protection level for the six data stripe units, and the protection
overhead is 50 percent. Using N+2d:1n protection, the same 768-KB file requires
one data stripe, two drives wide per node, and only two protection stripe units. The
eight stripe units are written to two different drives per node. The protection
overhead is the same as on the eight-node cluster: 25 percent.
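The worked example above reduces to one ratio, FEC units over total units, which a few lines of Python can confirm:

```python
def overhead(data_units, fec_units):
    """Protection overhead: fraction of stripe units consumed by FEC."""
    return fec_units / (data_units + fec_units)

# Eight-node pool, N+2n: one stripe of 6 data + 2 FEC units -> 25%.
print(overhead(6, 2))
# Four-node pool, N+2n: three stripes of 2 data + 2 FEC units each,
# so 6 data + 6 FEC units in total -> 50%.
print(overhead(6, 6))
# Four-node pool, N+2d:1n: one stripe, two drives per node,
# 6 data + 2 FEC units -> back to 25%.
print(overhead(6, 2))
```

This is why N+Md:Bn lowers overhead on small pools: by spanning multiple drives per node, one wide stripe replaces several narrow ones, so fewer FEC units protect the same data.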
Protection Overhead
The protection overhead for each protection level depends on the file size and the
number of nodes in the cluster. The percentage of protection overhead declines as
the cluster gets larger. In general, N+1n protection has a protection overhead equal
to the capacity of one node. N+2n protection has a protection overhead equal to
the capacity of two nodes. N+3n is equal to the capacity of three nodes, and so on.
OneFS also supports optional data mirroring from 2x-8x, enabling from two to eight
mirrors of the specified content. Data mirroring requires significant storage
overhead and may not always be the best data-protection method. For example, if
you enable 3x mirroring, the specified content is explicitly duplicated three times on
the cluster. Depending on the amount of content being mirrored, the mirrors can
require a significant amount of capacity. The table that is shown indicates the
relative protection overhead that is associated with each FEC Requested
Protection level available in OneFS. Indicators include when the FEC protection
would result in mirroring.
Gen 6 nodes support all the same data protection levels used by the previous
generations of Isilon hardware. However, for better reliability, better efficiency, and
simplified protection, use N+2d:1n, N+3d:1n1d, or N+4d:2n, as indicated with a
red box.
Introduction
Scenario
In Gen 6 nodes, changes around data protection and efficiency focus on three
critical areas: the mirrored journal, mirrored boot drives, and smaller
neighborhoods. Mirrored journals improve node reliability: if a node fails, there is a
consistent copy of the journal, either locally on flash or on the peer node. Mirrored
boot drives that are on the data drives, rather than on separate flash drives, are a
win for supportability. For example, there have been situations where a customer
accidentally pulled out the bootflash drives, not realizing what they were. With the
boot partitions on the existing data drives, there is no chance for a customer or
support personnel to accidentally make that error. Smaller neighborhoods improve
efficiency: the fewer devices there are within a neighborhood, the less chance that
multiple devices fail simultaneously.
Shown are the high-level descriptions that are used when discussing data
protection in OneFS. Requested protection is what is configured, Suggested
protection is what OneFS recommends, and Actual Protection is what OneFS
enforces. Mirrored Protection makes multiple copies of the data.
Requested Protection
The cluster-wide default data protection setting is made using the default file pool
policy. The setting applies to any file or directory that does not have a higher
priority setting. To view or edit the default setting, go to File system > Storage
pools > File pool policies, and click View / Edit on the Default policy.
The View Default Policy Details window is displayed with the current default file
pool policy settings. The current protection is displayed under Requested
protection. The default setting is to use the Requested protection setting at the
node pool level as highlighted in the Edit default policy details window. To
change the setting, use the drop-down to show the available options.
The CLI example sets the Requested protection for the file pool policy at +3d:1n.
The default file pool policy protection setting uses the node pool or tier setting.
Requested protection at the node pool level is set per node pool. When a node
pool is created, the default Requested protection that is applied to the node pool is
+2d:1n. The minimum Requested protection for an archive-series node pool is
+3d:1n1d. To meet the minimum, modify the archive-series node pool Requested
protection. The Requested protection should meet the minimum Requested
protection level for the node pool configuration.
To view and edit the Requested protection setting for the node pools in the WebUI,
go to the File system > Storage pools > SmartPools page. The current
Requested protection for each node pool is displayed in the Tiers and node pools
section. To modify the settings, click View / Edit. To expand the Requested
protection options, click the drop-down list. After selecting the new Requested
Protection level, click Save.
The CLI example shows setting the Requested protection of a node pool to +2n.
Use the WebUI File system explorer to view directories and files on the cluster.
OneFS stores the properties for each file. To view the files and the next level
subdirectories, click the specific directory. Search for a file using the Search button
or browse directly to a directory or file. To modify the protection level, click View /
Edit.
Manual settings can be used to modify the protection on specific directories or
files. The settings can be changed at the directory, subdirectory, and file level. Best
practices recommend against using manual settings, because manual settings can
return unexpected results and create management issues as the data and cluster
age. Once manually set, reset the settings to default to use automated file pool
policy settings, or continue as manually managed settings. Manual settings
override file pool policy automated changes. Manual changes are made using the
WebUI File system explorer or the CLI isi set command.
The example use case for setting a directory Requested protection is that the
/ifs/finance/data directory requires a 4x mirror whereas all other node pool
directories use the +2d:1n node pool setting.
Shown is a workflow that moves data to an archive tier of storage. The archive tier
is on a node pool that is created on the A200 nodes. A file pool policy moves the
data from the production H600 node pool to the archive pool. The protection on the
production node pool is higher than the protection of the archive node pool. The
Requested protection settings for the use case can be set at the node pool level or
at the directory level.
Suggested Protection
OneFS notifies the administrator when the Requested protection setting is different
from the Suggested protection for a node pool. The notification gives the
suggested setting; node pools that are within Suggested protection levels are not
displayed. As shown, Suggested protection is part of the SmartPools health
status reporting. The message indicates that the node pool is below the Suggested
protection level. Displayed is an example of the v200_24gb_2gb node pool with a
Requested Protection level that is different than the suggested. To modify the
settings, go to the SmartPools tab and click View / Edit on the pool.
Actual Protection
The actual protection applied to a file depends on the Requested protection level,
the size of the file, and the number of node pool nodes. The actual protection level
is the protection level OneFS sets. Actual protection is not necessarily the same as
the Requested protection level. The rules are:
Actual protection depends upon file size. In Case 2, a 128-KB file is protected using
3x mirroring, because at that file size the FEC calculation results in mirroring.
In both cases, the actual protection applied to the file exceeds the minimum drive
loss protection of two drives and node loss protection of one node. The exception
to meeting the minimum Requested protection is if the node pool is too small and
unable to support the Requested protection minimums. For example, a node pool
with four nodes and set to +4n Requested protection. The maximum supported
protection is 4x mirroring in this scenario.
Shown is a chart indicating the actual protection that is applied to a file according to
the number of nodes in the node pool. The dark blue shows files protected at 50%
storage overhead, while offering the Requested protection level. The gray indicates
that the maximum size of the protection stripe is reached and a subset of the
available nodes is used for the file. Red shows the actual protection that is applied
is changed from the Requested protection while meeting or exceeding the
Requested protection level.
The isi get command displays the protection settings on an entire directory path
or, as shown, a specific file without any options. The POLICY or Requested
protection policy, the LEVEL or Actual protection, the PERFORMANCE or data
access pattern are displayed for each file. Used with a directory path, the command
displays the properties for every file and subdirectory under the specified directory
path. The output can show files where protection is set manually. If there is no / in
the output, it implies a single drive per node. Mirrored file protection is represented
as 2x to 8x in the output.
isi get
The isi get command provides detailed file or directory information. The primary
options are -d <path> for directory settings and -DD <path>/<filename> for
individual file settings. Shown is the isi get -DD output. The output has three
primary locations containing file protection. The locations are a summary in the
header, line item detail settings in the body, and detailed per stripe layout per drive
at the bottom.
Challenge
Introduction
Scenario
There are four variables that combine to determine how data is laid out. The
variables make the possible outcomes almost unlimited when trying to understand
how the cluster behaves with varying workflows. The number of nodes in the
cluster affects the data layout: because data is laid out across all nodes in the
cluster, the number of nodes determines the stripe width.
The protection level also affects data layout. You can change the protection level of
your data down to the file level. Changing the protection level on a file changes
how it stripes across the cluster. The file size also affects data layout because the
system employs different layout options for larger files than for smaller files to
maximize efficiency and performance. The disk access pattern modifies both
prefetching and data layout settings that are associated with the node pool. Setting
a disk access pattern at a file or directory level enables using different patterns
across the cluster.
Ultimately OneFS lays out data in the most efficient, economical, highest
performing way possible. You can manually define some aspects of how it
determines what is best, but the process is automated. The maximum number of
drives for streaming is six drives per node across the node pool for each file.
An administrator can optimize layout decisions that OneFS makes to better suit the
workflow. The data access pattern influences how a file is written to the drives
during the write process. Concurrency is used to optimize workflows with many
concurrent users accessing the same files. The preference is that each protection
stripe for a file is placed on the same drive or drives, depending on the Requested
protection level. For example, for a larger file with 20 protection stripes, each stripe
unit from each protection stripe would prefer placement on the same drive in each
node. Concurrency is the default data access pattern. Concurrency influences the
prefetch caching algorithm to prefetch and cache a reasonable amount of
anticipated data during a read access.
Streaming is used for large streaming workflow data such as movie or audio files.
Streaming prefers to use as many drives as possible, within the given pool, when
writing multiple protection stripes for a file. Each file is written to the same sub pool
within the node pool. Streaming maximizes the number of active drives per node as
the streaming data is retrieved. Streaming also influences the prefetch caching
algorithm to be highly aggressive and gather as much associated data as possible.
A random access pattern prefers using a single drive per node for all protection
stripes for a file, like a concurrency access pattern. With random however, the
prefetch caching request is minimal. Most random data does not benefit from
prefetching data into cache.
The process of striping spreads all write operations from a client across the nodes
of a cluster. The graphic illustrates a 256-MB file that is broken down into chunks,
after which it is striped across disks in the cluster along with the FEC. Even though
a client is connected to only one node, when that client saves data to the cluster,
the write operation occurs in multiple nodes. The scheme is true for read
operations also. A client is connected to only one node at a time. However when
that client requests a file from the cluster, the client connected node does not have
the entire file locally on its drives. The client-connected node retrieves and rebuilds
the file using the back-end network.
All files 128 KB or less are mirrored. For a protection strategy of N+1, a 128-KB file
would have 2x mirroring: the original data and one mirrored copy.
The example shows a file that is not evenly distributed in 128-KB chunks. Blocks in
the chunk that are not used are free for use in the next stripe unit. Unused blocks in
a chunk are not wasted.
A 1-MB file is divided into eight data stripe units and three FEC units. The data is
laid out in three stripes, one drive wide.
A 1-MB file is divided into eight data stripe units and three FEC units. The data is
laid out in three stripes. With a streaming access pattern, more spindles are
preferred.
OneFS also supports several hybrid protection schemes such as N+2:1 and N+3:1.
N+2:1 and N+3:1 protect against two drive failures or one node failure, and three
drive failures or one node failure, respectively. These protection schemes are
useful for high-density node configurations, where each node contains up to
thirty-six multiterabyte SATA drives. Here, the probability of multiple drives failing
far surpasses that of an entire node failure. In the unlikely event that multiple
devices fail simultaneously, such that the file is "beyond its protection level,"
OneFS reprotects everything possible. OneFS reports errors on the individual files
that are affected to the cluster logs.
Shown is N+2d:1n protection of a 1-MB file. The file is divided into eight data stripe
units and three FEC units. The data is laid out in two stripes over two drives per
node to achieve the protection.
Data layout is managed the same way as Requested protection, except that data
layout is not set at the node pool level. Configuring the data access pattern is done
on the file pool policy, or manually at the directory and file level. Set data access
patterns using the WebUI, or use isi set for the directory and file level and
isi filepool policy for the file pool policy level.
For WebUI administration, go to File system > Storage pools > File pool policies.
Modify either the default policy or an existing file pool policy.
Challenge
Introduction
Scenario
Shown are the storage pool building blocks. Storage pools are an abstraction that
encompasses neighborhoods, node pools, and tiers. Storage pools also monitor
health and status at the node pool level. With storage pools, multiple tiers of
cluster nodes can coexist within a single file system with a single point of
management. With SmartPools, administrators can specify exactly which files they
want to live on a particular node pool or storage tier. Node pool membership
changes through the addition or removal of nodes to the cluster. Tiers are
groupings of different node pools.
SmartPools manages global settings for the cluster, such as L3 cache enablement
status, global namespace acceleration enablement, virtual hot spare management,
and global spillover settings. This lesson covers these settings.
Storage pools differ in Gen 6 and Gen 5. Unless noted, the graphics and lesson
content address Gen 6. Shown is a Gen 6 node pool that has two chassis and
eight nodes, with each node having five drive sleds of three disks. Gen 6 drive
sleds have three, four, or six drives.
Disk pools are the smallest unit and are a subset of neighborhoods. Disk pools
provide separate failure domains. Each drive within the sled is in a different disk
pool, limiting the chance for data unavailability. Each color in the graphic
represents a separate disk pool. Data protection stripes or mirrors do not span disk
pools, making disk pools the granularity at which files are striped to the cluster.
Disk pool configuration is automatic and cannot be configured manually.
Considering each disk pool default protection is +2d:1n, removing a sled does not
cause data unavailability as only one disk per disk pool is temporarily lost.
Similar node drives are automatically provisioned into neighborhoods. The graphic
shows eight nodes with all the nodes in a single neighborhood. Neighborhoods
span 4 to 20 nodes in a node pool. Gen 5 disk pools span 3 to 40 nodes in a node
pool. A node pool is used to describe a group of similar nodes. With Gen 6 and
OneFS 8.2.0, there can be from 4 up to 252 nodes in a single node pool. OneFS
versions prior to OneFS 8.2.0 are limited to 144 nodes. All the nodes with identical
hardware characteristics are automatically grouped in one node pool. A node pool
is the lowest granularity of storage space that users manage.
Neighborhoods
Neighborhood Splits
Gen 6 introduces new failure modes, such as simultaneous peer node journal
failure and chassis failure. If both journals fail, both nodes fail. When a
neighborhood splits, peer nodes are provisioned to protect against a simultaneous
peer node journal failure by placing a node peer in a separate fault domain. Though
a chassis-wide failure is highly unlikely, OneFS takes precautions against chassis
failure once a cluster is large enough. Nodes sharing a chassis are split across
fault domains, or neighborhoods, to reduce the number of node failures occurring
within one fault domain. The split is done automatically. The left image shows that
nodes have a single neighborhood from 1-to-18 nodes. When the 19th and 20th
nodes are added, the single neighborhood splits into two neighborhoods, with one
node from each node-pair moving into separate neighborhoods.
The neighborhoods split again when the node pool reaches 40 nodes. At 40 nodes,
each node within the chassis belongs to a separate neighborhood thus ensuring
that if a chassis fails, only one node from each neighborhood is down. Given a
protection of +2d:1n, the loss of a single chassis does not result in a data
unavailable or a data loss scenario.
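The Gen 6 split points described above can be summarized in a short sketch. The thresholds here are taken directly from the text (one neighborhood up to 18 nodes, a two-way split at the 19th and 20th nodes, a further split at 40 nodes into one neighborhood per chassis slot), not from OneFS source:

```python
def gen6_neighborhoods(node_count):
    """Approximate number of neighborhoods in a Gen 6 node pool,
    per the split points described in the text."""
    if node_count >= 40:
        return 4   # each node in a 4-node chassis is in its own neighborhood
    if node_count >= 19:
        return 2   # node pairs are split across two fault domains
    return 1       # a single neighborhood up to 18 nodes

print(gen6_neighborhoods(18), gen6_neighborhoods(20), gen6_neighborhoods(40))
```

At 40 nodes and beyond, a full chassis failure costs each neighborhood only one node, which +2d:1n protection can absorb.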
SmartPools Licensing
More advanced features are available in SmartPools with a license. With the
advanced features you can create multiple tiers and file pool policies that direct
specific files and directories to a specific node pool or a specific tier. Because of
the availability to have multiple data target locations, some additional target options
are enabled in some global settings. Advanced features include the ability to create
multiple storage tiers, multiple file pool policy targets, and multiple file pool policies.
Each policy can have its own protection, I/O optimization, SSD metadata
acceleration, and node pool spillover settings. The advanced feature, disk pool
spillover management, enables the choice whether write operations are redirected
to another node pool when the target node pool is full. If SmartPools is unlicensed,
spillover is automatically enabled.
SmartPools Settings
Shown is the SmartPools settings page. The example shows the default state of
the options, except for VHS. Discussed first is the option Increase directory
protection to a higher requested protection than its contents. The option
increases the amount of protection for directories to a higher level than the
directories and files that they contain. For example, if a +2d:1n protection is set
and the disk pool suffers three drive failures, the data that is not lost can still be
accessed. Enabling the option ensures that intact data is still accessible. If the
option is disabled, the intact file data is not accessible.
The CLI commands show disabling the Increase directory protection to a higher
requested protection than its contents option.
The SmartPools feature enables you to combine different node pools in the same
cluster, all in a single file system. SmartPools can automatically transfer data
among tiers with different performance and capacity characteristics. Tiering data
enables you to store data appropriately, based on its value and how it is
accessed. Global namespace acceleration, or GNA, enables the use of SSDs for
metadata acceleration across the entire cluster. GNA also uses SSDs in one part of
the cluster to store metadata for nodes that have no SSDs. The result is that critical
SSD resources are maximized to improve performance across a wide range of
workflows.
GNA can be enabled if 20% or more of the nodes in the cluster contain SSDs and
1.5% or more of the total cluster storage is SSD-based. The recommendation is
that at least 2.0% of the total cluster storage is SSD-based before enabling GNA.
Going below the 1.5% SSD total cluster capacity requirement automatically
disables GNA. If you SmartFail a node that has SSDs, the SSD total size
percentage or the percentage of nodes containing SSDs could drop below the
minimum requirement, disabling GNA. Any node pool with L3 cache enabled is
excluded from GNA space calculations and does not participate in GNA
enablement.
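As a quick sanity check, the two thresholds can be computed with simple arithmetic; the cluster figures below are hypothetical, not from this course:

```shell
# Hypothetical cluster: 20 nodes, 5 with SSDs; 2000 TB total, 36 TB of SSD.
nodes_total=20
nodes_with_ssd=5
cluster_tb=2000
ssd_tb=36

# GNA needs >= 20% of nodes with SSDs and >= 1.5% SSD capacity
# (2.0% recommended). Work in tenths of a percent to stay in integers.
node_pct=$((nodes_with_ssd * 100 / nodes_total))   # 25 percent
ssd_pct_tenths=$((ssd_tb * 1000 / cluster_tb))     # 18 tenths = 1.8 percent

if [ "$node_pct" -ge 20 ] && [ "$ssd_pct_tenths" -ge 15 ]; then
  echo "GNA can be enabled"
else
  echo "GNA requirements not met"
fi
```

Both thresholds pass in this example, although the 1.8% SSD figure is still below the recommended 2.0%.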
Selecting GNA
Adding nodes for GNA may require an inordinate investment. Adding SSDs to
existing nodes may make more sense. With the cost of SSDs decreasing, it is
reasonable to add SSDs and avoid the GNA complexity. The table highlights the
pros and cons of enabling GNA.
Virtual hot spare, or VHS, allocation enables space to rebuild data when a drive
fails. VHS is available with the licensed and unlicensed SmartPools module. By
default, all available free space on a cluster is used to rebuild data. The virtual hot
spare option reserves free space for this purpose. VHS provides a mechanism to
assure there is always space available and to protect data integrity when the
cluster space is overused. For example, if specifying two virtual drives or 3%, each
node pool reserves virtual drive space that is equivalent to two drives or 3% of their
total capacity for VHS, whichever is larger. You can reserve space in node pools
across the cluster for this purpose, equivalent to a maximum of four full drives. If
using a combination of virtual drives and total disk space, the larger number of the
two settings determines the space allocation, not the sum of the numbers.
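The larger-of-the-two rule can be illustrated with a small arithmetic sketch; the pool figures are hypothetical:

```shell
# Hypothetical node pool: 400 TB capacity, 8 TB drives; VHS configured
# as two virtual drives or 3 percent, whichever is larger.
drive_size_tb=8
num_virtual_drives=2
pool_capacity_tb=400
percent=3

drives_space=$((drive_size_tb * num_virtual_drives))   # 2 drives = 16 TB
percent_space=$((pool_capacity_tb * percent / 100))    # 3% of 400 TB = 12 TB

# The larger of the two settings determines the reservation;
# the values are not summed.
reserved=$((drives_space > percent_space ? drives_space : percent_space))
echo "${reserved} TB reserved for VHS"
```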
If you select the option to reduce the amount of available space, free-space
calculations exclude the space that is reserved for the VHS. The reserved VHS free
space is used for write operations unless you select the option to deny new data
writes.
The CLI example shows reserving 10 percent of capacity for VHS. The isi
storagepool settings modify --virtual-hot-spare-limit-drives
<integer> command sets the number of virtual drives to reserve for VHS.
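A sketch of the related VHS commands; the flag names follow recent OneFS releases but should be verified with isi storagepool settings modify --help:

```shell
# Reserve 10 percent of each node pool for virtual hot spare.
isi storagepool settings modify --virtual-hot-spare-limit-percent 10

# Or reserve the equivalent of two drives per node pool.
isi storagepool settings modify --virtual-hot-spare-limit-drives 2

# Optionally exclude the reservation from free-space calculations
# and deny new data writes into the reserved space.
isi storagepool settings modify --virtual-hot-spare-hide-spare true \
  --virtual-hot-spare-deny-writes true
```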
Global Spillover
The Enable global spillover and Spillover data target options configure how
OneFS handles a write operation when a node pool is full. Spillover is node
capacity overflow management. With the licensed SmartPools module, you can
direct data to spillover to a specific node pool or tier group. If spillover is not
wanted, disable spillover so that a file will not move to another node pool. VHS
reservations can affect when spillover would occur. If the VHS reservation is 10
percent of storage pool capacity, spillover occurs if the storage pool is 90 percent
full.
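A sketch of the spillover settings from the CLI; the flag names are assumptions to confirm against your OneFS release:

```shell
# Disable global spillover so writes fail rather than move to another pool.
isi storagepool settings modify --spillover-enabled false

# Or re-enable spillover and allow it anywhere in the cluster (the default).
isi storagepool settings modify --spillover-enabled true --spillover-target anywhere
```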
Action Settings
Shown is the isi storagepool settings view command noting the settings
that were made in the previous CLI examples.
Challenge
Summary
Introduction
Module 6
Introduction
Scenario
Introduction
Scenario
File pool policies are used to determine where data is placed, how it is protected,
and what policy settings are applied. Settings are based on the user-defined and
default storage pool policies. File pool policies add the capability to modify the
settings at any time, for any file or directory. Filters select files and directories,
and actions are applied to the files that match the filter criteria. The policies are
used to change the storage pool location, requested protection settings, and I/O
optimization settings. The management is file-based and not hardware-based.
Each file is managed independently of the hardware, and is controlled through the
OneFS operating system. The policies are applied in order through the SmartPools
job.
The default file pool policy is defined under the default policy. The individual
settings in the default file pool policy apply to files without settings defined in
another file pool policy that you create. You cannot reorder or remove the default
file pool policy.
To modify the default file pool policy, click File System, click Storage Pools, and
then click the File Pool Policies tab. On the File Pool Policies page, next to the
default policy, click View/Edit. After finishing the configuration changes, submit
and then confirm your changes. You can specify a pool for data and a pool for
snapshots. For data, you can choose any node pool or tier, and the snapshots can
either follow the data, or go to a different storage location. You can also apply the
cluster default protection level to the default file pool, or specify a different
protection level.
Streaming access works best for medium to large files that have sequential reads.
This access pattern uses aggressive prefetching to improve overall read
throughput.
You can create the filters in the File matching criteria section when creating or
editing a file pool policy. In the File Matching Criteria section, click the drop-down
list and select the appropriate filter and the appropriate operators. Operators can
vary according to the selected filter. Next, you can configure the comparison value,
which also varies according to the selected filter and operator.
At least one criterion is required, and multiple criteria are allowed. You can add
AND or OR statements to a list of criteria. Using AND adds a criterion to the
selected criteria block. Files must satisfy each criterion to match the filter. You can
configure up to three criteria blocks per file pool policy. The Ignore case box should
be selected for files that are saved to the cluster by a Windows client. File pool
policies with path-based policy filters and storage pool location actions are run
during the write of a file matching the path criteria. Path-based policies are first
applied when the SmartPools job runs; after that, they are applied during each
matching file write. For file pool policies with storage pool location actions and
policy filters based on attributes other than path, matching files are initially
written to the node pool with the highest available capacity. This ensures that
write performance is not sacrificed for initial data placement.
SSD Usage
If a node pool has SSDs, by default the L3 cache is enabled on the node pool. To
use the SSDs for other strategies, first disable L3 cache on the node pool. The
metadata read acceleration option is the recommended SSD strategy. With
metadata read acceleration, OneFS directs one copy of the metadata to SSDs, and
the data and remaining metadata copies are directed to reside on HDDs. The
benefit of using SSDs for file-system metadata includes faster namespace
operations for file lookups. The settings that control SSD behavior are configured
either in the default file pool policy or, when SmartPools is licensed, in the
individual file pool policy settings. Manual settings can be used to enable SSD
strategies on specific files and directories, but are not recommended.
Selecting metadata read acceleration creates one metadata mirror on the SSDs
and writes the rest of the metadata mirrors plus all user data on HDDs. Selecting
metadata read/write acceleration writes all metadata mirrors to SSDs. This
setting can consume up to six times more SSD space than metadata read
acceleration, which can impact the ability of OneFS to manage snapshot operations.
Selecting Use SSDs for data and metadata writes all data and metadata for a file
on SSDs. Selecting Avoid SSDs writes all associated file data and all metadata
mirrors to HDDs only and does not use SSDs.
SSDs are node pool specific and used within only the node pool containing the
data. The exception is with global namespace acceleration (GNA). When enabling
GNA, data on node pools without SSDs can have additional metadata mirrors on
SSDs elsewhere in the cluster. If a node pool has SSDs and GNA is enabled,
OneFS uses the node pool SSDs first for GNA before using SSDs contained on
other node pools.
Converting from L3 cache to Use SSDs for metadata read acceleration or Use
SSDs for metadata read/write acceleration requires a migration. OneFS must
populate the SSDs with metadata. However, the L3 cache content does not need to
be migrated out, as OneFS can simply discard the cached data. Switching the SSD
strategy to L3 cache
from metadata acceleration or Use SSDs for data and metadata requires
migration of data and metadata from SSD to HDD. The migration is automatic, but
can take many hours or days to complete.
The table highlights the pros and cons of setting Use SSDs for metadata read
acceleration.
The table highlights the pros and cons of setting Use SSDs for metadata
read/write acceleration.
The table highlights the pros and cons of setting Use SSDs for data and
metadata.
File pool policies are applied to the cluster by a job. When SmartPools is
unlicensed, the SetProtectPlus job applies the default file pool policy. When
SmartPools is licensed, the SmartPools job processes and applies all file pool
policies. By default, the job runs at 22:00 hours every day at a low priority. Policies
are checked in order from top to bottom.
The FilePolicy job uses a file system index database on the file system instead of
the file system itself to find files needing policy changes. The FilePolicy job was
introduced in OneFS 8.2.0.
The SmartPoolsTree job is used to apply selective SmartPools file pool policies.
The job runs the isi filepool apply command. The Job Engine manages the
resources that are assigned to the job. The job enables testing file pool policies
before applying them to the entire cluster.
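A hedged sketch of testing policies on a subtree before a cluster-wide run; the path is hypothetical, and the dry-run flag should be checked against isi filepool apply --help:

```shell
# Report what the file pool policies would change in this subtree
# without actually moving data (dry run).
isi filepool apply --dont-apply /ifs/data/projects

# Apply the policies to the subtree; the SmartPoolsTree job runs
# this same operation under Job Engine control.
isi filepool apply /ifs/data/projects
```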
Policy Template
When the template is used, the basic settings are preset to the name of the
template along with a brief description. You can change the settings. A filter is also
preconfigured to achieve the specified function, in this case to archive files older
than two months. You can configure more criteria using the links in the filter box.
Decide where to store the archived files and what, if any, changes to make to the
protection level. Also, you can change the I/O optimization levels. You can use a
template as a policy by changing the name and settings you desire, and then
saving the policy.
Templates may only be used to create policies in the web administration interface.
In the CLI, the templates provide a guide to creating the CLI text used to create the
policy.
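As a sketch, an archive-style policy similar to the template might be created from the CLI as follows; the policy name, tier name, and exact filter syntax are illustrative and vary by OneFS release:

```shell
# Archive files not accessed in 60 days to a hypothetical "archive" tier.
isi filepool policies create Archive_Old_Files \
  --description "Move cold files to the archive tier" \
  --begin-filter --accessed-time 60D --operator gt --end-filter \
  --data-storage-target archive \
  --data-ssd-strategy avoid
```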
Challenge
Lesson - CloudPools
Introduction
Scenario
Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=wx4VTLcN32kSlHGFwGLE1Q
CloudPools 2.0 provides data access using standard file operations through the
kernel. It also eliminates data CoW to snapshots, implements more space savings
by storing sparse files efficiently in the cloud, and improves performance.
Also, CloudPools 2.0 prevents enabling compliance mode on stubs. Archiving a file
before it is committed and moving a stub into a compliance directory is denied.
CloudPools Considerations
In a public cloud, enterprises may pay only for the capacity they use per month. For
instance, storage of 100 TB on a public cloud might be three thousand dollars per
month. Once data is stored in the cloud, fees are incurred at a low rate for reading,
higher for writing or copying of the data, and still higher for the removal of data
back to private resources. Pricing varies widely based on performance
requirements and other agreements.
Private clouds use similar arrays of compute and storage resources. Private clouds
are offered either within the company network, or connected through a private
direct connection rather than the Internet, possibly through a VPN connection. The
private object stores may use Dell Technologies ECS or Isilon solutions as their
base infrastructure and offer various services similar to a public cloud.
When accessing files on the cluster, whether through SMB, NFS, HDFS, SWIFT,
and so on, files that are stored in the cloud vs. stored locally on the cluster appear
identical. The cluster makes the appropriate read request to bring the file to view
for the client when opening a file stored in the cloud. These read requests incur
more latency dependent on the quality of networking and service connection to the
cloud resource, but the client behavior remains the same. Updates to the file are
stored in the stub data cache on the Isilon cluster. At a designated interval, the
Isilon cluster flushes cached changes out to the cloud, updating the files. The
design enables administrators greater control of cloud storage costs, as writes
often incur more fees.
CloudPools Administration
Once the SmartPools and CloudPools licenses are applied, the web administration
interface shows the cloud storage account options. Creating a cloud storage
account defines the connection details for a cloud service. After a cloud storage
account is defined and
confirmed, the administrator can define the cloud pool itself. The file pool policies
enable the definition of a policy to move data out to the cloud.
Shown here is the window for creating a cloud storage account. All the fields are
required. The Name or alias must be unique to the cluster. The Type is the type of
cloud account, and options are on the drop-down list. The URI must use HTTPS
and match the URI used to set up the cloud account. The User Name is the name
that is provided to the cloud provider. The Key is the account password that is
provided to (or received from) the cloud provider.
SmartPools file pool policies are used to move data from the cluster to the selected
CloudPool storage target. When configuring a file pool policy, you can apply
CloudPools actions to the selected files. As part of the setting, you select the
CloudPool storage target from the available list. You can elect to encrypt the data
before sending to the specified CloudPool. You can compress the data before
transfer to improve the transfer rate.
CloudPools Settings
Various default advanced CloudPool options are configured. You may want to
modify the settings for the file pool policy based on your requirements.
Modifications are not necessary for most workflows.
From the CLI, you can manage specific files. You can archive files to the CloudPool
and recall files from the CloudPool using the isi cloud archive and isi
cloud recall commands. The CloudPools job is outside of the Job Engine.
Separate commands to manage the CloudPools jobs are provided using the isi
cloud jobs command. To view the files associated with a specific CloudPools
job, use the isi cloud jobs file command.
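A sketch of the commands named above; the paths are hypothetical, and the subcommand spellings should be verified with isi cloud --help:

```shell
# Archive a directory tree to the CloudPool target (leaves stubs behind).
isi cloud archive /ifs/data/finance/2018 --recursive true

# List CloudPools jobs, then view the files in a specific job.
isi cloud jobs list
isi cloud jobs files list 12
```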
Files that are stored in the cloud can be fully recalled using the isi cloud
recall command. Recall can only be done using the CLI. When recalled, the full
file is restored to its original directory. The file may be subject to the same file pool
policy that originally archived it and rearchived to the cloud the next time the
SmartPools job runs. If rearchiving is unintended, the recalled file should be moved
to a different, unaffected, directory. The recalled file overwrites the stub file. The
command can be started for an individual file or recursively for all files in a directory
path.
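A sketch of the recall operation for a single file and for a directory tree (the paths are hypothetical):

```shell
# Recall one file from the cloud, overwriting its stub.
isi cloud recall /ifs/data/finance/2018/report.xlsx

# Recall an entire directory tree recursively.
isi cloud recall /ifs/data/finance/2018 --recursive true
```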
The CloudPools C2S feature offers an integrated solution with AWS Commercial
Cloud Services (C2S), a private instantiation of the AWS commercial cloud. This
service is 'air gapped,' meaning it has no direct connection to the Internet.
C2S also provides support (from AIMA) to securely store certificates, validate, and
refresh if needed.
CloudPools Limitations
In a standard node pool, file pool policies can move data from high-performance
tiers to storage tiers and back as defined by their access policies. However, data
that moves to the cloud remains stored in the cloud unless an administrator
explicitly requests data recall to local storage. If a file pool policy change is made
that rearranges data on a normal node pool, data is not pulled from the cloud.
Public cloud storage often places the largest fees on data removal, thus file pool
policies avoid removal fees by placing this decision in the hands of the
administrator.
The connection between a cluster and a cloud pool has limited statistical features.
The cluster does not track the data storage that is used in the cloud, therefore file
spillover is not supported. Spillover to the cloud again presents the potential for file
recall fees. Spillover is designed as a temporary safety net, once the target pool
capacity issues are resolved, data would be recalled back to the target node pool.
Statistical details, such as the number of stub files on a cluster or how much cached
data is stored in stub files and would be written to the cloud on a flush of that
cache, are not easily available. Finally, no historical data is tracked on the network
usage between the cluster and cloud, either in write traffic or in read requests.
These network usage details should be found by referring to the cloud service
management system.
Challenge
Lesson - SmartQuotas
Introduction
Scenario
SmartQuotas Overview
SmartQuotas is a software module that is used to limit, monitor, thin provision, and
report disk storage usage at the user, group, and directory levels. Administrators
commonly use file system quotas for tracking and limiting the storage capacity that
a user, group, or project can consume. SmartQuotas can send automated
notifications when storage limits are exceeded or approached.
Quotas are a useful way to ensure that a user or department uses only their share
of the available space. SmartQuotas are also useful for enforcing an internal
chargeback system. SmartQuotas contain flexible reporting options that can help
administrators analyze data usage statistics for their Isilon cluster. Both
enforcement and accounting quotas are supported, and various notification
methods are available.
Before OneFS 8.2, SmartQuotas reported the quota free space only on directory
quotas with a hard limit. For user and group quotas, SmartQuotas reported the size
of the entire cluster capacity or parent directory quota, not the size of the quota.
OneFS 8.2.0 includes enhancements to report the quota size for users and groups.
The enhancements reflect the true available capacity that is seen by the user.
Enforcement quotas include the functionality of accounting quotas and enable the
sending of notifications and the limiting of disk storage. Using enforcement quotas,
you can logically partition a cluster to control or restrict the storage use by a user,
group, or directory. Enforcement quotas support three subtypes and are based on
administrator-defined thresholds. Hard quotas limit disk usage to a specified
amount. Writes are denied after reaching the hard quota threshold and are only
permitted when the used capacity falls below the threshold. Soft quotas enable an
administrator to configure a grace period that starts after the threshold is exceeded.
After the grace period expires, the boundary becomes a hard quota, and writes are
denied. If the usage drops below the threshold, writes are again permitted.
Advisory quotas do not deny writes to the disk, but they can trigger alerts and
notifications after the threshold is reached.
Quota Types
There are six types of quotas that can be configured, which are directory, default
directory, user, default user, group, and default group. Directory quotas are placed
on a directory, and apply to all directories and files within that directory, regardless
of user or group. Directory quotas are useful for shared folders where many users
store data, and the concern is that the directory will grow unchecked.
User quotas are applied to individual users, and track all data that is written to a
specific directory. User quotas enable the administrator to control how much data
any individual user stores in a particular directory. Default user quotas are applied
to all users, unless a user has an explicitly defined quota for that directory. Default
user quotas enable the administrator to apply a quota to all users, instead of
individual user quotas.
Group quotas are applied to groups and limit the amount of data that the collective
users within a group can write to a directory. Group quotas function in the same
way as user quotas, except for a group of people and instead of individual users.
Default group quotas are applied to all groups, unless a group has an explicitly
defined quota for that directory. Default group quotas operate like default user
quotas, except on a group basis. Do not configure any quotas on the root of the file
system (/ifs), as it could result in significant performance degradation.
With default quotas, you can apply a template configuration to another quota
domain. Versions previous to OneFS 8.2.0 have default quotas for users and
groups, but not for directory quotas. Common directory quota workflows are home
directories and project management, and having a default directory quota simplifies
quota management. Shown is an example of creating a 10-GB hard quota, default
directory quota on the /ifs/sales/promotions directory. The directory default quota
is not in and of itself a quota on the promotions directory. Directories below the
promotions directory, such as the /Q1 and /Q2 directories inherit and apply the 10-
GB quota. The /Q1 domain and the /Q2 domain are independent of each other.
Subdirectories such as /storage and /servers do not inherit the 10-GB directory
quota.
Given this example, if 10 GB of data is reached in the /Q2 folder, that linked quota
is independent of the 10-GB default directory quota on the parent directory.
Modifications to the default directory quota on promotions are reflected in the
inherited quotas asynchronously. Inheritance is seen when listing quotas, querying
an inheriting quota record, or when I/O happens in the subdirectory tree.
In OneFS 8.2.0, the default directory quota can only be created using the CLI. The
WebUI can be used to view the created quotas and their links. The isi quota
command is used to create the default directory quota. The example shows
creating a template on the Features directory with a hard limit of 10 GB, an
advisory at 6 GB, a soft limit at 8 GB with a grace period of 2 days.
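The command might look like the following sketch; the thresholds come from the example above, the default-directory quota type requires OneFS 8.2.0 or later, and the flag spellings should be verified with isi quota quotas create --help:

```shell
# Default directory quota template on the Features directory:
# 10 GB hard, 8 GB soft with a 2-day grace period, 6 GB advisory.
isi quota quotas create /ifs/training/Features default-directory \
  --hard-threshold 10G \
  --soft-threshold 8G --soft-grace 2D \
  --advisory-threshold 6G

# List the quotas, including those inherited from the default.
isi quota quotas list
```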
The default directory quotas can be viewed either using the CLI or the WebUI. The
directory quota /ifs/training/Features/Quota is linked to the default quota.
Selecting Unlink in the WebUI makes the quota independent of the parent,
meaning modifications to the default directory quota no longer apply to the sub
directory. This example shows removing the link on the Screen_shots sub directory
and then modifying the default directory quota on the parent, Quota, directory.
Remove the link using the button on the WebUI or with isi quota quotas
modify --path=/ifs/training/Features/Quota/Screen_shots
--type=directory --linked=false. Using the --linked=true option
relinks or links to the default directory quota.
Nesting Quotas
Nesting quotas is having multiple quotas within the same directory structure. In the
example shown, all quotas are hard enforced. At the top of the hierarchy, the
/ifs/sales folder has a directory quota of 1 TB. Any user can write data into this
directory, or the /ifs/sales/proposals directory, up to a combined total of 1 TB. The
/ifs/sales/promotions directory has a user quota assigned that restricts the total
amount that any single user can write into this directory to 25 GB. Even though the
parent directory (sales) is below its quota restriction, a user is restricted within the
promotions directory. The /ifs/sales/customers directory has a directory quota of
800 GB that restricts the capacity of this directory to 800 GB. However, if users
place 500 GB of data in the /ifs/sales/proposals directory, only 500 GB can be
placed in the other directories, as the parent directory cannot exceed 1 TB.
You can nest default directories. The example views the promotions directory with
a hard limit of 10 GB, advisory at 6 GB, soft limit at 8 GB with a 2 day grace period.
/Q2 is a default directory that is nested within the promotions default directory.
Quota Accounting
The quota accounting options are Include snapshots in the storage quota, and
Enforce the limits for this quota based on: Physical size, or File system
logical size, or Application logical size.
The default quota accounting setting enforces the File system logical size quota
limits. The default setting is to only track user data, not accounting for metadata,
snapshots, or protection. The option to Include snapshots in the storage quota
tracks both the user data and any associated snapshots. A single path can have
two quotas that are applied to it, one without snapshot usage (default) and one with
snapshot usage. If snapshots are in the quota, more files are in the calculation.
Include snapshots in the storage quota option cannot be changed after the
quota is created. The quota must be deleted and re-created to disable snapshot
tracking. The Physical size option tracks the user data, metadata, and any
associated FEC or mirroring overhead. This option can be changed after the quota
is defined.
OneFS 8.2.0 and later have the option to track quotas that are based on the
Application logical size. Application logical size tracks the usage on the
application or user view of each file. Application logical size is typically equal to or
less than the file system logical size. The view is in terms of how much capacity is
available to store logical data regardless of data reduction, tiering technology, or
sparse blocks. The option enforces quota limits, and reports the total logical data
across different tiers, such as CloudPools. The example shows the reporting
behavior on a 1-MB file.
Overhead Calculations
In OneFS 8.2.0, advisory and soft quota limits can be viewed as a percent of the
hard quota limit. Only advisory and soft quota limits can be defined. A hard limit
must exist to set the advisory and soft percentage. Administrators cannot set both
an absolute and a percent-based limit on a directory.
Quota Notifications
In OneFS 8.2.0, administrators can configure quota notification for multiple users.
PAPI supports an email ID list in the action_email_address property:
{"action_email_address":
["user1@isilon.com","user2@isilon.com"]}. The maximum size of the
comma-separated email IDs list is 1024 characters. The isi quota command
option --action-email-address accepts multiple comma-separated
values.
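A hedged sketch of the command; the positional threshold and condition arguments are assumptions to check against isi quota quotas notifications --help:

```shell
# Email two recipients when the advisory threshold is exceeded
# on a hypothetical directory quota.
isi quota quotas notifications modify /ifs/sales directory advisory exceeded \
  --action-email-address "user1@isilon.com,user2@isilon.com"
```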
Notification Template
Shown is one of the available quota templates that are located in the /etc/ifs
directory. The <ISI_QUOTA_DOMAIN_TYPE> variable indicates what type of quota
has been reached. The template for an advisory or soft quota that is reached
includes the hard quota variable <ISI_QUOTA_HARD_LIMIT>.
SmartQuota Considerations
Challenge
Introduction
Scenario
File filtering enables administrators to deny or allow file access on the cluster that is
based on the file extension. File filtering controls both the ability to write new files to
the cluster or access existing files on the cluster. An explicit deny list blocks only
the extensions in the list. An explicit allow list permits access only to files with the
listed extensions. There is no limit or predefined list of extensions.
Administrators can create custom extension lists based on specific needs and
requirements. The top level of file filtering is set up per access zone and
controls all access zone aware protocols such as SMB, NFS, HDFS, and Swift. The
file filtering rules limit any client on any access zone aware protocol. At a lower
level, file filtering is configurable for the SMB default share, and is configurable as
part of any individual SMB share setup. File filtering was introduced in OneFS 8.0
and requires no license.
The example shows that .avi files are prevented from being written to the finance
access zone.
If enabling file filtering on an access zone with existing shares or exports, the file
extensions determine access to the files. Users cannot access any file with a
denied extension. The extension can be denied through the denied extensions list,
or because the extension was not included as part of the allowed extensions list.
Administrators can still access existing files. Administrators can read the files or
delete the files. Modifying or updating a file is not permitted. If a user or
administrator accesses the cluster through an access zone or SMB share without
applying file filtering, files are fully available. How the file filtering rule is applied to
the file determines where the file filtering occurs. Administrators with direct access
to the cluster can manipulate the files. File filters are applied only when accessed
using the four protocols.
A use case to enforce file filtering is to adhere to organizational policies. With the
compliance considerations today, organizations struggle to meet many of the
requirements. For example, many organizations are required to make all email
available for litigation purposes. To help ensure that email is not stored longer than
wanted, deny users from storing .pst files on the cluster. Another use
case is to limit the cost of storage. Organizations may not want typically large files,
such as video files, to be stored on the cluster, so they can deny .mov or .mp4 file
extensions. An organizational legal issue is copyright infringement. Many users
store their .mp3 files on the cluster, opening a potential copyright infringement
issue. Another use case is to limit an access zone for a specific application
with its unique set of file extensions. File filtering with an explicit allow list of
extensions limits the access zone or SMB share for its singular intended purpose.
You can configure file filtering at three separate levels within the cluster. Shown is
configuring at the access zone level. To configure file filtering at the access zone
level, go to Access > File filter > File filter settings. Next, select deny or
allow, enter the file extension, and click Submit. The file extension
window does not permit the use of wildcards or special characters, only the (.)
period and extension, such as .mp3, .doc, .jpg.
You can configure file filters on the Protocols > Windows sharing (SMB) >
Default share settings page. For more granular control, you can configure file
filters on individual SMB shares. You can set file filters for SMB shares using the
isi smb shares create and isi smb shares modify commands as well
as using the WebUI. If using RBAC to delegate control of the task, the user must
have the ISI_PRIV_FILE_FILTER privilege.
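Sketches of both configuration levels from the CLI; the zone name, share name, and flag spellings are assumptions to verify with isi file-filter settings modify --help and isi smb shares modify --help:

```shell
# Zone level: deny .avi files in the finance access zone.
isi file-filter settings modify --zone finance \
  --file-filtering-enabled true \
  --file-filter-type deny \
  --file-filter-extensions .avi

# Share level: deny .mp3 and .mov files on a hypothetical "finance" share.
isi smb shares modify finance \
  --file-filtering-enabled true \
  --file-filter-type deny \
  --file-filter-extensions .mp3,.mov
```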
Challenge
Lesson - SnapshotIQ
Introduction
Scenario
You can use snapshots to protect data against accidental deletion and
modification. If a user modifies a file and determines that the changes were
unwanted, the earlier file version can be copied back from the snapshot. Also,
because snapshots are available locally, users can restore their data without the
administrator intervention, saving administrators time. Also, you can use snapshots
for staging content to export, and ensuring that a consistent point-in-time copy of
your data is replicated or backed up. To use SnapshotIQ, you must activate a
SnapshotIQ license on the cluster. However, some OneFS operations generate
snapshots for internal system use without requiring a SnapshotIQ license. If an
application generates a snapshot, and a SnapshotIQ license is not configured, you
can still view the snapshot. However, all snapshots that OneFS operations generate
are automatically deleted when no longer needed. You can disable or enable
SnapshotIQ at any time.
SnapshotIQ uses both CoW and RoW for its differential snapshots. Basic
SnapshotIQ functions include automatically creating and deleting snapshots.
CoW is used for user-generated snapshots, and RoW is used for system-defined
snapshots. Both have pros and cons, and OneFS dynamically picks which method
to use to maximize performance and keep overhead to a minimum. With CoW, a
new write to HEAD results in the old blocks being copied out to the snapshot
version first. Shown here, changes are made to D.
Although this incurs a double write penalty, there is less fragmentation of the HEAD
file, which is better for cache prefetch and related file reading functions. Typically,
CoW is most prevalent in OneFS, and is primarily used for small changes, inodes,
and directories. RoW avoids the double write penalty by writing changes to a
snapshot protected file directly to another free area of the file system. However,
RoW has increased file fragmentation. RoW in OneFS is used for more substantial
changes such as deletes and large sequential writes.
Operation of Snapshots
Snapshot files are in two places. First, they are within the path that is being
snapped. For example, if snapping a directory located at
/ifs/data/students/name1, view the hidden .snapshot directory using the CLI or
Windows Explorer. The path would look like /ifs/data/students/name1/.snapshot.
The second location to view the .snapshot files is at the root of the /ifs directory.
From /ifs you can view all the .snapshots on the system, but users can only open
the .snapshot directories for which they already have permissions. They would be
unable to open or view any .snapshot file for any directory to which they did not
already have access rights.
There are two paths through which to access snapshots. The first is through the
/ifs/.snapshot directory. This is a virtual directory where you can see all the snaps
listed for the entire cluster. The second way to access your snapshots is to access
the .snapshot directory in the path in which the snapshot was taken. So if you are
snapping /ifs/data/media, you can change directory or browse your way to the
/ifs/data/media path, and you will have access to the /.snapshot directory for just
the snapshots taken on this directory.
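For example, the two access routes described above look like this from a cluster shell (the /ifs/data/media path is hypothetical):

```shell
# Route 1: the cluster-wide virtual directory at the /ifs root.
# Lists all snapshots on the system; users can only enter the
# directories they already have permissions for.
ls /ifs/.snapshot

# Route 2: the hidden .snapshot directory inside the snapped path;
# only snapshots taken on /ifs/data/media appear here.
ls /ifs/data/media/.snapshot
```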
Because a snapshot is a picture of a file or directory at a point in time,
permissions are preserved in the snapshot. If you restore a snapshot from
three months ago and the owner of that data has since left the company, you
must restore the file and then update the permissions. Snapshots are
read-only: they are pointers to a point in time in the past. As the data is
modified, the changed blocks become owned by the snapshots, and the new
blocks are owned by the current version. You cannot go back to the pointers
and modify the blocks they point to after the fact. Isilon does, however,
provide similar functionality through clones, or writable snapshots. Clones
can be created on the cluster using the cp command and do not require you to
license the SnapshotIQ module.
The isi snapshot list | wc -l command tells you how many snapshots
you currently have on disk.
Snapshot Permissions
You can take snapshots at any point in the directory tree. Each department or user
can have their own snapshot schedule. All snapshots are accessible in the virtual
directory /ifs/.snapshot. Snapshots are also available in any directory in the path
where a snapshot was taken, such as /ifs/marketing/matt/.snapshot. OneFS
remembers which .snapshot directory you entered.
Permissions are preserved at the time of the snapshot. If the permissions or owner
of the current file change, it does not affect the permissions or owner of the
snapshot version. The snapshot of /ifs/sales/forecast/dave can be accessed from
/ifs/.snapshot or /ifs/sales/forecast/dave/.snapshot. Permissions for ../dave are
maintained, and the ability to traverse the .snapshot directory matches those
permissions.
Manage Snapshots
You can manage snapshots by using the web administration interface or the
command line.
To manage SnapshotIQ at the command line, use the isi snapshot command.
Creating Snapshots
Creating more than one snapshot per directory is advantageous. You can assign
shorter expiration periods to snapshots that are generated more frequently,
and longer expiration periods to snapshots that are generated less
frequently. The default cluster limit is 20,000 snapshots, and the default
maximum is 1,024 snapshots per directory path.
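A tiered schedule along these lines can be sketched with the isi snapshot schedules command. The schedule names, naming patterns, schedule strings, and durations below are illustrative assumptions; check the exact syntax against the CLI reference for your OneFS version:

```shell
# Frequent snapshots with a short expiration (example values).
isi snapshot schedules create hourly-media /ifs/data/media \
    hourly-%Y-%m-%d_%H:%M "every day every 1 hours" --duration=2D

# Less frequent snapshots kept longer.
isi snapshot schedules create weekly-media /ifs/data/media \
    weekly-%Y-%m-%d "every week on sunday at 11:00 PM" --duration=3M
```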
Restoring Snapshots
For example, here is a file system with writes and snapshots at different times:
• Time 3: A’, B, C, D’
• Time 4: A’, B, C, D’, E
So, what happens when the user wants to recover A that was overwritten in Time 3
with A’?
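To recover A, the user copies the earlier version back from a snapshot taken before Time 3. Because snapshots are read-only, recovery is a copy out of the .snapshot tree; the path and snapshot name here are hypothetical:

```shell
# Copy the pre-change version of the file back into the live
# file system from a snapshot that predates the unwanted write.
cp /ifs/data/media/.snapshot/snap-time2/fileA /ifs/data/media/fileA
```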
Challenge
Lesson - SyncIQ
Introduction
Scenario
Replication
Replication makes extra copies of data. Depending on the solution,
the copies are actively updated as changes are made to the source, or the copy
can be static and stand-alone. While replication is used for many purposes, it is
most often implemented as part of a business continuity plan. Replication for
business continuity is implemented either between block arrays or NAS devices.
Most enterprise NAS products offer some type of replication feature. Isilon uses the
SyncIQ feature for replication. Replication most often takes place between two
storage devices, a primary and a secondary. For a synchronization solution, clients
access and update the source data on the primary. The secondary is the target of
the replication and holds a copy of the data. When the source data gets updated on
the primary, those updates are replicated to the target.
OneFS 8.2.0 and later supports over-the-wire encryption to protect against man-in-
the-middle attacks, making data transfer between OneFS clusters secure.
SyncIQ Function
SyncIQ uses snapshot technology, taking a point in time copy of the SyncIQ
domain when the SyncIQ job starts. The first time the policy runs, an initial or full
replication of the data occurs. On subsequent policy runs, changes are
tracked as they occur, and a new snapshot is taken when the next
synchronization begins, starting a new change list.
The secondary system acknowledges receipt of the data, returning an ACK once
the entire file or update is securely received. When a SyncIQ job completes
successfully, a snapshot is taken on the target cluster. This snapshot replaces the
previous last known good snapshot. If a sync job fails, the last known good
snapshot is used to reverse any target cluster modifications. On the primary, when
a SyncIQ job completes successfully, the older source snapshot is deleted. With
SnapshotIQ licensed, administrators can choose to retain the snapshots for
historical purposes.
Each cluster can have target and source directories. A single directory cannot be
both a source and a target between the same two clusters, which would cause an
infinite loop. Only one policy can be configured per SyncIQ domain, and each
replication set is one way, from the source to the target. During a failover,
client access moves to the secondary cluster. Users continue to read and
write to the secondary cluster while the primary cluster is repaired.
SyncIQ Policies
SyncIQ jobs are the operations that do the work of moving the data from one Isilon
cluster to another. SyncIQ generates these jobs according to replication policies.
A SyncIQ policy can copy or synchronize source data to meet organizational goals.
If a mirrored copy of the source is the goal, create a sync policy. If the goal is to
have all source data copied and to retain deleted file copies, then create a copy
policy. When creating a SyncIQ policy, choose a replication type of either sync or
copy.
Sync maintains a duplicate copy of the source data on the target. Any files
that are deleted on the source are removed from the target. Sync does not
provide protection from file deletion, unless the synchronization has not yet
taken place. Copy also maintains a duplicate copy of the source data on the
target. However, files that are deleted on the source are retained on the
target. In this way, copy offers file deletion protection, but not file
change protection. This retention is passive, not secure retention. Copy
policies can include file filter criteria that are not available with the
synchronization option.
You can always license SnapshotIQ on the target cluster and retain historic SyncIQ
associated snapshots to aid in file deletion and change protection.
To create a policy in the WebUI, go to the Policies tab on the SyncIQ page and
click the Create a SyncIQ Policy button. In the Settings section, assign a unique
name to the policy. Optionally you can add a description of the policy. The Enable
this policy box is checked by default. If you cleared the box, it would disable the
policy and stop the policy from running.
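An equivalent policy can be sketched at the CLI with isi sync policies create. The policy name, paths, target host, and schedule below are examples only, and the option names should be verified against your OneFS release:

```shell
# Create an enabled sync-type policy that replicates a source
# directory to a target cluster every night (values illustrative).
isi sync policies create media-sync sync /ifs/data/media \
    target-cluster.example.com /ifs/data/media-dr \
    --schedule "every day at 10:00 PM"

# Review the configured policies.
isi sync policies list
```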
In the Source Cluster criteria, the Source root directory is the SyncIQ domain.
The path has the data that you want to protect by replicating it to the target
directory on the secondary cluster. Unless otherwise filtered, everything in the
directory structure from the source root directory and below replicates to the target
directory on the secondary cluster.
The Included directories field permits adding one or more directory paths below
the root to include in the replication. Once an include path is listed that means that
only paths listed in the include path replicate to the target. Without include paths all
directories below the root are included. The Excluded directories field lists
directories below the root you want explicitly excluded from the replication process.
You cannot fail back replication policies that specify include or exclude
settings. The DomainMark job does not work for policies with subdirectories
mentioned in Include or Exclude. Using includes or excludes for directory
paths does not affect performance.
The File matching criteria enables the creation of one or more rules to
filter which files do and do not get replicated. When you create multiple
rules, connect them with Boolean AND or OR statements. When adding a new
filter rule, click either the Add an “And” condition or Add an “Or”
condition link. File matching criteria says that if the file matches these
rules, then replicate it. If the file does not match the rules, do not
replicate it. File criteria can be based on file filters as
shown in the graphic. Filename includes or excludes files that are based on the file
name. Path includes or excludes files that are based on the file path. Paths can
also use wildcards. File type includes or excludes files that are based on one of
the following file system object types: soft link, regular file, or directory. Modified
includes or excludes files that are based on when the file was last modified.
Accessed includes or excludes files that are based on when the file was last
accessed. The Accessed option is available only if the global access-time-tracking
option of the cluster is enabled. Created includes or excludes files that are based
on when the file was created. Size includes or excludes files that are based on their
size. File sizes are represented in multiples of 1024, not 1000. Specifying file
criteria in a SyncIQ policy slows down a copy or synchronization job.
Selecting Run on all nodes in this cluster means that the cluster can use any of
its external interfaces to replicate the data to the secondary cluster. Selecting Run
the policy only on nodes in the specified subnet and pool uses only those
interfaces that are members of the specified pool for replication traffic.
This option, in effect, selects a SmartConnect zone to use for replication
traffic. The drop-down list shows all the subnets and pools on the primary
cluster. SyncIQ supports only static IP address pools. If a replication job
connects to a dynamically allocated IP address, SmartConnect might reassign the
address while a replication job is running. The IP address reassignment
disconnects the job, causing it to fail.
The target cluster identification is required for each policy. You specify the target
host using the target SmartConnect zone IP address, the fully qualified domain
name, or local host. Local host is used for replication to the same cluster. You also
specify the target SyncIQ domain root path. Best practices suggest including
the source cluster name and the access zone name in the target directory
path. An option is provided to restrict the nodes that process the
replication to only the nodes connected within the SmartConnect zone.
Snapshots are used on the target directory to retain one or more consistent recover
points for the replicated data. You can specify whether and how these
snapshots are generated. To retain the snapshots that SyncIQ takes, select Enable capture of
snapshots on the target cluster. SyncIQ always retains one snapshot of the most
recently replicated delta set on the secondary cluster to facilitate failover,
regardless of this setting. Enabling capture snapshots retains snapshots beyond
the time period that is needed for SyncIQ. The snapshots provide more recover
points on the secondary cluster.
The Snapshot Alias Name is the default alias for the most recently taken
snapshot. The alias name pattern is SIQ_%(SrcCluster)_%(PolicyName). For
example, a cluster called “cluster1” for a policy called “policy2” would have the alias
SIQ_cluster1_policy2. You can specify the alias name as a Snapshot naming
pattern. For example, the pattern %{PolicyName}-on-%{SrcCluster}-latest
produces names similar to newPolicy-on-Cluster1-latest.
Select either Snapshots do not expire or Snapshots expire after..., and then
stipulate the time period. The expiration options are days, weeks, months,
and years. It is recommended to always select a snapshot expiration period.
You can retain SyncIQ job reports for a specified time. With an increased number
of SyncIQ jobs in OneFS 8.0, the report retention period could be an important
consideration. If tracking file and directory deletions that are performed during
synchronization on the target, you can select to Record deletions on
synchronization. The Deep copy for CloudPools setting applies to those policies
that have files in a CloudPools target. Deny, the default, enables only stub
file replication; the source and target clusters must be at least OneFS 8.0
to support Deny. Allow lets the SyncIQ policy determine whether a deep copy
should be performed. Force enforces a deep copy for all CloudPools data that
is contained within the SyncIQ domain. Allow or Force is required for target
clusters that are not CloudPools aware.
SyncIQ Failover
Failover is the process of changing the role of the target replication directories into
the role of the source directories for assuming client read, write, and modify data
activities. The example shows a failover. Failovers can happen when the primary
cluster is unavailable for client activities. The reason could be from any number of
circumstances including natural disasters, site communication outages, or power
outages. The reason could be a planned event, such as testing a disaster recovery
plan or as a result of upgrade or other schedule maintenance activities. Failover
changes the target directory from read-only to a read/write status. Failover is
managed per SyncIQ policy. Only policies that are failed over are modified. SyncIQ
only changes the directory status and does not change other required operations
for client access to the data. Network routing and DNS must be redirected to the
target cluster. Any authentication resources such as AD or LDAP must be available
to the target cluster. All shares and exports must be available on the target cluster
or be created as part of the failover process.
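On the secondary cluster, a failover is typically performed per policy with isi sync recovery allow-write, which makes the target directory writable. The policy name is hypothetical, and option details should be checked against your OneFS version:

```shell
# Run on the TARGET cluster: make the target directory of the
# policy writable so clients can fail over to it.
isi sync recovery allow-write media-sync

# Check the state of the replication targets on this cluster.
isi sync target list
```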
SyncIQ Failback
Failover Revert
A failover revert undoes a failover job in process. Use revert if the primary cluster
once again becomes available before any writes happen to the target. A temporary
communications outage or if doing a failover test scenario are typical use cases for
a revert. Failover revert stops the failover job and restores the cluster to a sync
ready state. Failover revert enables replication to the target cluster to once again
continue without performing a failback. Revert may occur even if data modifications
have happened to the target directories. If data has been modified on the original
target cluster, perform a failback operation to preserve those changes. Not doing a
failback loses the changes made to the target cluster. Before a revert can take
place, a failover of a replication policy must have occurred. A revert is not
supported for SmartLock directories.
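Assuming the same hypothetical policy, a revert and a full failback are driven by the isi sync recovery commands; verify the exact options for your release:

```shell
# Revert an in-process failover on the target cluster, returning
# the policy to a sync-ready state (discards writes made on the
# target after failover).
isi sync recovery allow-write media-sync --revert

# Failback instead: on the original source cluster, prepare the
# resync (creates the mirror policy that copies changes back
# from the target).
isi sync recovery resync-prep media-sync
```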
SyncIQ can synchronize CloudPools data from the CloudPools aware source
cluster to an Isilon target cluster. SyncIQ provides data protection for CloudPools
data and provides failover and failback capabilities. SyncIQ uses the CloudPools
API tools to enable support. The processes and capabilities of SyncIQ are based
on the OneFS version relationship between the source cluster and the target
cluster. This relationship determines the capabilities and behaviors available for
SyncIQ policy replication.
When OneFS 8.0 or later runs on both the source and target clusters, SyncIQ can
replicate and understand the CloudPools data natively. The CloudPools data
contains the stub file and the cached CloudPools synchronization data. SyncIQ
replicates and synchronizes both data components to the target cluster. Both the
source cluster and target cluster are CloudPools aware. If CloudPools is configured
and licensed, the target cluster supports direct access to CloudPools data. Failback
to the original source cluster updates the stub file information and current cached
CloudPools data as part of the process.
SyncIQ can support target clusters running OneFS 6.5 through OneFS 7.2.1.
These OneFS versions are pre CloudPools and are not aware of CloudPools stub
files. In this scenario, SyncIQ initiates a deep copy of the CloudPools data to the
target. The files that are synchronized contain the CloudPools information that is
stored as part of the file along with a full copy of the file data. The target cluster
cannot connect directly to the CloudPools and relies on the deep copy data that is
stored locally on the cluster. The synchronization behaves like any standard
SyncIQ job updating the target data. If failing over or failing back, the target relies
on the local copy of the data. During failback, the source cluster recognizes when a
file has been tiered to the cloud and updates the cloud with data from the target.
Changes made to the target file data are saved as a new file version on the cloud.
Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=6cyyA4XvBqkyHJwXs6ltdg
Troubleshooting Resources
https://community.emc.com/docs/DOC-49017
Challenge
Introduction
Scenario
Deduplication Overview
The user should not experience any difference except for greater efficiency in data
storage on the cluster, because the user visible metadata remains untouched.
Administrators can designate which directories to deduplicate, so as to manage
cluster resources. Not all workflows are right for every cluster.
Because the amount of time that deduplication takes is heavily dependent on the
size and usage level of the cluster, a large and complex environment benefits not
only from using the dry run procedure, but also from consultation with high-level
support or engineering.
Deduplication Considerations
SmartDedupe does not deduplicate files that are 32 KB or smaller, because doing
so would consume more cluster resources than the storage savings are worth. The
default size of a shadow store is 2 GB, and each shadow store can contain up to
256,000 blocks. Each block in a shadow store can be referenced up to 32,000
times. When deduplicated files are replicated to another Isilon cluster or backed up
to a tape device, the deduplicated files no longer share blocks on the target cluster
or backup device. Although you can deduplicate data on a target Isilon cluster, you
cannot deduplicate data on an NDMP backup device. Shadow stores are not
transferred to target clusters or backup devices. Because of this, deduplicated files
do not consume less space than non-deduplicated files when they are replicated or
backed up. To avoid running out of space, ensure that target clusters and tape
devices have free space to store deduplicated data. You cannot deduplicate the
data that is stored in a snapshot. However, you can create snapshots of
deduplicated data.
Deduplication Function
A job in the OneFS Job Engine runs through blocks that are saved in every
disk pool and compares the block hash values. If a match is found and
confirmed as a true copy, the block is moved to the shadow store, and the
file block references are updated in the metadata. The job has a few phases.
The job first builds an index of blocks, against which comparisons are done
in a later phase, and ultimately confirmations and copies take place. The
deduplication job can be time consuming, but because it runs as a job with
throttled system load, the impact is minimal. Administrators find that their
cluster space usage has dropped once the job completes. Because the
deduplication job is a post-process form of deduplication, data must be
written to the system before it is inspected. Writing data before
deduplication enables faster cluster writes, but the disadvantage is that the
cluster may hold duplicate data until the job eliminates the duplicates.
Dedupe Phases
The process of deduplication consists of four phases. The first phase is sampling,
in which blocks in files are taken for measurement, and hash values calculated. In
the second phase, blocks are compared with each other using the sampled data. In
the sharing phase, matching blocks are written to shared locations. Finally the
index of blocks is updated to reflect what has changed. The deduplication time is
heavily dependent on the cluster size and cluster usage level.
The deduplication dry run has three phases; compared to the full
deduplication job, the sharing phase is missing. Because sharing is the
slowest phase, skipping it enables customers to get a quick overview of how
much data storage they are likely to reclaim through deduplication. The dry
run has no licensing requirement, so customers can run it before licensing.
The only factors that are open to customer alteration are scheduling, the
job impact policy, and which directories on the cluster to deduplicate.
A good use case for deduplication is home directories. A home directory scenario
where many users save copies of the same file can offer excellent opportunities for
deduplication. Static, archival files are another example. Typically archival data is
seldom changing, therefore the storage that is saved may far outweigh the load
dedupe places on a cluster. Deduplication is more justifiable when the data is
relatively static. Workflows that create many copies of uncompressed virtual
machine images can benefit from deduplication. Deduplication does not work
well with compressed data because the compression process tends to rearrange
data to the point that identical files in separate archives are not
identified as such. Environments with many unique files offer little
benefit: the files do not duplicate each other, so the chances of finding
identical blocks are low. Rapid changes in the file system tend to undo
deduplication, so the net savings achieved at any one time are low. If in
doubt, or when attempting to establish the viability of deduplication,
perform a dry run.
Deduplication Jobs
Because the sharing phase is the slowest deduplication phase, a dry run, or
DedupeAssessment, returns an estimate of capacity savings. The dry run places
minor load on the cluster and completes more quickly than a full deduplication run.
The assessment enables a customer to decide if the savings that are offered by
deduplication are worth the effort, load, and cost. Shown in the screen capture are
the jobs that are associated with deduplication, Dedupe and DedupeAssessment.
The administrator can start the dry run and edit the job type. Editing the Dedupe or
DedupeAssessment jobs enables the administrator to change the Default
priority, Default impact policy, and Schedule. The Default priority gives the job
priority as compared to other system maintenance jobs running simultaneously.
The Default impact policy is the amount of system resources that the job uses
compared to other system maintenance jobs running simultaneously. With the
Schedule options, you can start the job manually or set it to run on a
regularly scheduled basis.
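For instance, the assessment and the full job are both started through the Job Engine; the commands below are illustrative:

```shell
# Dry run: estimate savings without sharing any blocks.
isi job jobs start DedupeAssessment

# Full deduplication job.
isi job jobs start Dedupe

# Review the results once the jobs complete.
isi dedupe reports list
isi dedupe stats
```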
Deduplication Interface
After enabling the SmartDedupe license, you can find SmartDedupe under the File
system menu. From the Deduplication window you can start a deduplication job
and view any reports that have been generated. On the Settings tab, the
paths to deduplicate must be entered. Selecting specific directories gives
the administrator granular control to avoid attempting to deduplicate data
where no duplicate blocks are expected, such as large collections of
compressed data. Deduplicating an entire cluster without considering the
nature of the data is likely to be inefficient.
Challenge
Module Summary
Introduction
Module 7
Introduction
Scenario
Introduction
Scenario
The Job Engine performs cluster-wide automation of tasks. The Job Engine
comprises the daemons, isi_job_d, that run on each node. Each daemon manages
the separate jobs that are run on the cluster. The daemons run continuously
and spawn processes to perform jobs as necessary. Individual jobs are
procedures that run until complete. Individual jobs are scheduled to run at
certain times, started by an event such as a drive failure, or started
manually by the administrator. Jobs do not run on a continuous basis.
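A few illustrative commands for observing and driving the Job Engine from the CLI (output varies by cluster, and the job type chosen here is only an example):

```shell
# Summarize running, paused, and recently finished jobs.
isi job status

# List job instances on the cluster.
isi job jobs list

# Start a job manually (job type illustrative).
isi job jobs start Collect
```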
The isi_job_d daemons on each node communicate with each other to confirm
that actions are coordinated across the cluster. This communication ensures that
jobs are shared between nodes to keep the work load as evenly distributed as
possible. Each job is broken down into work units. The work units are handed off to
nodes based on node speed and workload. Every unit of work is tracked. That way,
if a job is paused, it can be restarted from where it last stopped.
Jobs are given impact policies that define the maximum amount of usable cluster
resources. The relationship between the running jobs and the system resources is
complex. A job running with a high impact policy can use a significant percentage
of cluster resources, resulting in a noticeable reduction in cluster performance.
Because jobs are used to perform cluster maintenance activities and are often
running, most jobs are assigned a low impact policy. Do not assign high impact
policies without understanding the potential risk of generating errors and impacting
cluster performance. Several dependencies exist between the category of the
different jobs and the amount of system resources that are consumed before
resource throttling begins. The default job settings, job priorities, and impact
policies are designed to balance the job requirements to optimize resources.
OneFS does not enable administrators to define custom jobs, but it does
permit administrators to change the configured priority and impact policies
for existing jobs. Changing the job priority can impact the system's ability
to maintain data protection and integrity. The recommendation is to not
change the default impact policies or job priorities without consulting
qualified Isilon engineers.
Job - An application that is built on the distributed work system of the Job Engine.
A specific instance of a job is controlled primarily through its job ID that is returned
using the isi job jobs start command.
Phase - One complete stage of a job. Some jobs have only one phase, while
others, like MediaScan, have as many as seven. If an error occurs in a phase, the
job is marked failed at the end of the phase and does not progress. Each phase of
a job must complete successfully before advancing to the next stage or being
marked complete returning a job state Succeeded message.
Task - A task is a division of work. A phase is started with one or more tasks that
are created during job startup. All remaining tasks are derived from those original
tasks similar to the way a cell divides. A single task does not split if one of the
halves reduces to a unit less than whatever makes up an item for the job. For
example, if a task derived from a restripe job has the configuration setting to a
minimum of 100 logical inode number (LINS), then that task does not split further if
it derives two tasks, one of which produces an item with fewer than 100 LINs. A LIN
is the indexed information that is associated with specific data.
Task result - A task result is a small set of statistics about the work that is done by
a task up to that point. A task produces one or more results, usually several,
sometimes hundreds. Task results are produced by merging item results,
usually on the order of 500 or 1,000 item results in one task result. The coordinator
accumulates and merges the task results. Each task result that is received on the
coordinator updates the status of the job phase that is seen in the isi job
status command.
Checkpoints - Tasks and task results are written to disk, along with some details
about the job and phase, to provide a restart point.
Jobs can have several phases. There might be only one phase, for simpler jobs,
but more complex ones can have multiple phases. Each phase is run in turn, but
the job is not finished until all the phases are complete. Each phase is broken down
into tasks. These tasks are distributed to the nodes by the coordinator, and the job
is run across the entire cluster. Each task consists of a list of items. The result of
each item execution is logged, so that if there is an interruption, the job can restart
from where it stopped.
The coordinator, the directors, the managers, and the workers are the four main
functional components of the Job Engine. The coordinator is the executive of
the Job Engine: its thread starts and stops jobs, and processes work results
as they are returned during the execution of the job. The job daemons elect
a job coordinator; the election is won by the first daemon to respond when a
job is started.
The director runs on each node, communicates with the job coordinator, and
coordinates tasks with the managers.
Each manager process manages a single job at a time on the node, and is
responsible for managing the flow of tasks and task results throughout the node.
The managers on each node coordinate and manage the tasks with the workers on
their respective node. If three jobs run simultaneously, each node would have three
manager processes, each with its own number of worker threads. Managers
request and exchange work with each other and supervise the worker processes
they assign.
If any task is available, each worker is given a task. Then the worker processes the
task item by item until the task is complete or the manager removes the task from
the worker. The impact policy sets the number of workers that are assigned to a
task. The impact policy applied to the cluster is based on the highest impact policy
for all current running jobs.
Job Coordinator
The job daemons elect a coordinator by racing to lock a file. The node that first
locks the file becomes the coordinator. Racing is an approximate way of choosing
the least busy node as the coordinator. If the coordinator node goes offline and the
lock is released, the next node in line becomes the new coordinator. Then the
coordinator coordinates the execution of each job, and shares out the parts of each
job. To find the coordinator node, run isi_job_d status from the CLI. The node
number that is displayed is the node array ID.
Job Workers
The job daemon uses threads to enable it to run multiple tasks simultaneously. A
thread is the processing of a single command by the CPU. The coordinator tells
each node job daemon what the impact policy of the job is, and how many threads
should be started to complete the job. Each thread handles its task one item at a
time, and the threads operate in parallel. The number of threads determines the
number of items being processed. The maximum number of assigned threads is
determined by the defined impact level, which limits the load that is placed
on any one node.
It is possible to run enough threads on a node that they can conflict with each
other. An example would be five threads all trying to read data off the same hard
drive. Since serving each thread at once cannot be done, threads are queued and
wait for each other to complete.
The Job Engine includes the concept of job exclusion sets. Job phases are
grouped into three categories: restripe, mark, and all other job phase
activities. The restripe and mark categories modify core data and metadata. Up
to three jobs can run simultaneously. However, multiple restripe or mark job
phases cannot run at the same time without interfering with each other and
risking data corruption. The Job Engine therefore permits at most one
restripe-category job phase and one mark-category job phase to run
simultaneously. MultiScan is both a restripe job and a mark job; when MultiScan
runs, no additional restripe or mark job phases are permitted. Jobs outside
these categories can fill the remaining simultaneous job slots alongside the
running restripe or mark job phases. Only one instance of any job may run at a
time. Shown are the valid simultaneous job combinations.
When the Job Engine detects that the available space on one or more diskpools
has fallen below a low space threshold, it regards the cluster as running out
of space and enters low space mode. When available space rises back above the
high threshold, the Job Engine exits low space mode.
Low space mode enables jobs that free space (space saving jobs) to run before the
Job Engine or even the cluster become unusable. It enables jobs like TreeDelete
and Collect to complete so that they free space.
A space saving job is identified with a flag in the job-config output:
isi_gconfig -t job-config jobs.types.<job type>.pi_alloc_reserved. If the
flag is true, the job is a space saving job. The jobs with the flag set by
default are shown.
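For example, checking whether TreeDelete is a space saving job might look like the following (the exact lowercase spelling of the job type key is an assumption):

```shell
# Query the Job Engine config tree for the TreeDelete space-saving flag.
# A value of true marks the job as a space saving job.
isi_gconfig -t job-config jobs.types.treedelete.pi_alloc_reserved
```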
Challenge
Introduction
Scenario
Jobs in Context
Many functions and features of an Isilon cluster depend on jobs, which means that
the Job Engine jobs are critical to cluster health. Jobs play a key role in data
reprotection and balancing data across the cluster, especially if the hardware fails
or the cluster is reconfigured. Features such as anti-virus scanning and quota
calculation also involve jobs. Additional jobs or job phases that are limited
by exclusion sets are queued and run sequentially. Higher priority jobs run
before lower priority jobs. Jobs with the same priority run in the order that
the job start request is made, a first-in-queue, first-to-run order. Because
jobs run sequentially, one job that holds up other jobs can affect cluster
operations. If contention occurs, examine which jobs are running, which jobs
are queued, when the jobs started, and the priority and impact policies for
the jobs.
Some jobs can take a long time to complete and should be paused so that jobs of
higher immediate importance can complete. MediaScan can take days to complete,
which is why its default priority is set to eight, the lowest priority. All
other jobs may interrupt MediaScan. Pausing and restarting is an example of the
balance between job priorities that was considered when the default settings
were determined.
The most common Job Engine jobs can be broken into different types of use. Job
types are jobs that are related to the distribution of the data on the cluster, jobs that
are related to testing the data integrity and protection, jobs that are associated with
specific feature functionality, and other jobs which are used selectively for particular
needs. Jobs are not mutually exclusive and often work in conjunction, calling
other jobs to complete their tasks.
Four of the most common jobs are used to help distribute data across the cluster.
Collect runs a mark-and-sweep looking for orphaned or leaked inodes or blocks.
AutoBalance scans the drives of an imbalanced cluster and balances the
distribution of files across the node pools and tiers. AutoBalanceLin is a
LIN-based version of AutoBalance. MultiScan is a combination of AutoBalance and
Collect, and is triggered after every group change. The Collect portion runs
only if Collect has not run recently; the default window is the last two weeks.
Data integrity and protection jobs are regularly run on the cluster. These jobs can
be further broken down into active error detection and reprotection of the data. The
active error detection includes jobs that are often found running for long periods of
time. The jobs run when no other jobs are active and look primarily for errors on the
drives or within the files.
MediaScan scans the drives looking for error correction code (ECC) error
entries. MediaScan has many phases, with the general purpose of moving any file
system information off ECC-producing areas and repairing any damage.
IntegrityScan, like the first phase of Collect, identifies everything valid in
the file system. The inspection process is meant to catch invalid file system
elements.
The reprotection jobs focus on returning data to a fully protected state. Events such
as a drive failure trigger reprotection jobs. FlexProtect restores the protection level
of individual files. FlexProtect ensures that a file protected at, say, 3x, is still
protected at 3x. FlexProtect is run automatically after a drive or node removal (or
failure). FlexProtectLin is a LIN-based version of FlexProtect.
ShadowStoreProtect reprotects shadow store data to a higher protection level
when referenced by a LIN with a higher protection level.
Feature-related jobs are jobs that run as part of specific features scheduled in
OneFS. SetProtectPlus is the unlicensed version of SmartPools. SetProtectPlus
enforces the default system pool policies but does not enforce user pool policies.
SetProtectPlus is disabled when a SmartPools license is activated on the cluster.
When SmartPools is licensed, SmartPools maintains the layout of files in the node
or file pools according to file pool policies. SmartPoolsTree enables administrators
to run SmartPools on a particular directory tree, rather than the whole file system at
once. When SmartQuotas is licensed, QuotaScan scans modified quota domains
to incorporate existing data into new quotas. Quota creation automatically triggers
a QuotaScan.
In order from the oldest to newest deleted snapshot, SnapshotDelete deletes the
file reference in the snapshot, and then deletes the snapshot itself. With
SnapshotIQ licensed, SnapRevert reverts an entire snapshot back to the original
version. AVScan scans the file system for viruses. AVScan uses an external anti-
virus server. FSAnalyze gathers data for InsightIQ, or file system analytics to
provide cluster data such as file counts, heat mapping, and usage by user.
When SmartLock is in use, the WormQueue job scans the SmartLock directories for
uncommitted files pending retention, and commits the appropriate files to a
WORM state.
The last category of jobs contains the jobs that are selectively run for specific
purposes. These jobs may be scheduled, however, the administrator runs them
only when the job is required.
Exclusion sets were discussed earlier. The needs of a job's individual phases
determine its exclusion set categories. A job belonging to an exclusion set
does not mean that all of its phases fit into the same exclusion set. OneFS
makes the exclusion determination at the outset of each phase, not for the
entire job. FlexProtect can be part of an exclusion set when run proactively.
When run as an event-triggered job, FlexProtect overrides and pauses all other
jobs.
The MultiScan job performs the AutoBalance action and optionally a Collect
action. The Collect action is always enabled when MultiScan is started by an
external start request. AutoBalance balances free space in the diskpool, and the
Collect job reclaims unused blocks from drives that were unavailable when the
blocks needed to be freed. In OneFS 8.2.0, one of two conditions can trigger a
MultiScan after at least one drive comes up (either new or back from being
down).
FlexProtect Job
FlexProtect is the highest priority job on the cluster. FlexProtect can be run
manually as a non-event-triggered job and coexist with other Job Engine jobs on
the cluster. An example is proactively SmartFailing a drive for replacement
with an SSD during a hardware upgrade activity. If a drive
failure triggers FlexProtect, FlexProtect takes exclusive ownership of the Job
Engine. All other jobs are paused or suspended until the FlexProtect job
completes. FlexProtect ownership is normal behavior and is intended to reprotect
the data as quickly as possible to minimize any potential risk of data loss. The
FlexProtect job does not pause when there is only one temporarily unavailable
device in a diskpool, when a device SmartFails, or for dead devices.
Job Priority
Every job is assigned a priority that determines the order of precedence relative to
other jobs. The lower the number assigned, the higher the priority of the job. As an
example, FlexProtect is assigned a priority of 1, which is the top job priority.
When multiple jobs attempt to run simultaneously, the job with the highest priority
takes precedence over the lower priority jobs. If a lower priority job is running and a
higher priority job is called, the lower priority job is interrupted and paused until the
higher priority job completes. The paused job restarts from the point at which
it was interrupted. New jobs with the same or lower priority than a currently
running job are queued and started after the current job completes.
Changing the priority of a job can have a negative effect on the cluster. Job
priority is a trade-off of importance. Historically, many issues have been
created by changing job priorities. Job priorities should remain at their
defaults unless a senior level support engineer instructs you to change them.
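The default priority and impact policy of any job type can be checked from the CLI before deciding whether a change is really needed; FlexProtect is used here only as an example:

```shell
# Show the default priority, impact policy, and schedule for one job type.
isi job types view FlexProtect

# List all job types with their defaults.
isi job types list --verbose
```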
Every job is assigned an impact policy that determines the amount of cluster or
node resources that are assigned to the job. The determination of what is more
important must be made, the system resources to complete the job, or the available
resources for processing workflow requirements. The set default impact policy is
based on how much load the job places on the system. Complex calculations are
used in determining how cluster resources are allocated. By default, the system
includes default impact profiles with varying impact levels assigned.
By default, most jobs have the LOW impact policy, which has a minimum impact on
the cluster resources. More time-sensitive jobs have a MEDIUM impact policy.
These jobs have a higher urgency of completion that is usually related to data
protection or data integrity concerns. The use of the HIGH impact policy is
discouraged because it can affect cluster stability. HIGH impact policy use can
cause contention for cluster resources and locks that can result in higher error
rates and negatively impact job performance. The OFF_HOURS impact policy
enables greater control of when jobs run, minimizing the impact on the cluster and
providing the resources to handle workflows.
Impact policies in the Job Engine are based on the highest impact policy for any
running job. Impact policies are not cumulative between jobs but set the resource
levels and number of workers that are shared between the jobs. Significant issues
are caused when cluster resources are modified in the job impact settings.
Lowering the number of workers for a job can cause jobs to never complete.
Raising the impact level can generate errors or disrupt production workflows. Use
the default impact policies for the jobs whenever possible. If customer workflows
require reduced impact levels, create a custom schedule that is based on the
OFF_HOURS impact policy.
The graphic shows the default job priority and impact policy for each of the system
jobs. Only a few jobs are priority 1, and have the MEDIUM impact policy. All three
of these jobs are related to data protection and data integrity. The two jobs
with a priority of 2 and a MEDIUM impact policy are jobs that must complete
quickly to ensure no disruption to system processes. No jobs have the HIGH
impact policy. Few workflows can tolerate the disruption in cluster
responsiveness when a HIGH impact policy is used. The Job Engine starts the
DomainMark and SnapshotDelete jobs, but they run under the SyncIQ framework.
The SyncIQ framework uses a different mechanism to perform tasks.
Challenge
Introduction
Scenario
The Job Engine is directly managed using the WebUI or through the CLI. Some
feature-related jobs are scheduled through the feature settings.
Management Capabilities
The cluster health depends on the Job Engine and the configuration of jobs in
relationship to each other. The system is engineered to maintain a delicate balance
between cluster maintenance and cluster performance. Many capabilities are
available through the WebUI, CLI, and PAPI.
Job status and history are viewed easily. Failed jobs or jobs with frequent starts or
restarts are identifiable. Administrators can view and modify job settings, change
job priorities, impact policies, and schedule jobs. Administrators can also
manipulate running jobs by pausing or stopping jobs at any time. Jobs can also run
manually.
If it is necessary to run a job with a priority or impact level different from
the default, run the job manually and set the priority and impact level at
start time. OneFS does not provide the capability to create custom jobs or
custom impact levels. If the job impact level must be adjusted, create a custom
schedule using the OFF_HOURS impact policy and adjust the impact levels based
on time and day.
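Starting a job manually with a one-time priority and impact policy might be sketched as follows (MultiScan and the values shown are examples only):

```shell
# Run MultiScan once with a non-default priority and impact policy.
# The change applies to this run only; the job type defaults are untouched.
isi job jobs start MultiScan --priority 3 --policy MEDIUM
```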
Management - WebUI
The WebUI is the primary interface for the Job Engine. You can view job status, job
histories, view and change current job schedules, view and manage job priorities
and impact policies, and run jobs manually. Job management in the WebUI can
vary in different versions of OneFS.
The isi job status command is used to view running, paused, or queued jobs, and
the status of the most recent jobs. Failed jobs are clearly indicated with
messages. The output provides job-related cluster information, including the
identity of the coordinator node and whether any nodes are disconnected from
the cluster.
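A quick status check might look like this (output columns vary by OneFS version):

```shell
# Summarize running, paused, and queued jobs, recent job history,
# the coordinator node, and any disconnected nodes.
isi job status

# Add more detail, including recently failed jobs.
isi job status --verbose
```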
The isi job statistics command has the options to list and view. Shown is the
verbose option, providing detailed information about the job operations. To get
the most information about all current jobs, use the isi job statistics
list -v command. Use the isi job statistics view <jobID> -v command to limit
the information to a specific job. The command provides granular, real-time
information on running jobs for troubleshooting.
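For example, assuming a running job with ID 42 (the ID is hypothetical):

```shell
# Granular, real-time statistics for all running jobs.
isi job statistics list -v

# Limit the output to a single job by its job ID.
isi job statistics view 42 -v
```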
Misconfigured jobs can affect cluster operations. Most of these failures can be
observed by examining how the jobs are configured to run, how they have been
running, and whether jobs are failing. Failed jobs can also be an indicator of
other cluster issues. For example, many starts and restarts of the MultiScan or
Collect jobs indicate frequent group changes. Group changes occur when drives
or nodes leave or join the cluster.
The job events and operations summary either from the WebUI or the CLI are
useful for immediate history. Often an issue is recurring over time and can be more
easily spotted from the job history or job reports. For example, a high priority job
constantly pushes other jobs aside, but a less consistent queue backup can still
prevent features from properly operating. This can require much deeper dives into
the job history to see what isn’t running, or is running only infrequently.
reason for the change. Look for alternative configuration options to achieve the
goal.
Impact level changes directly affect the job completion time and the cluster
resources. For example, an administrator modified the LOW impact policy to have
0.1 maximum workers or threads per storage unit. The result was that no low
impact job ever completed. The customer then changed all of the jobs with LOW
impact policies to a MEDIUM impact policy. When the jobs ran, they negatively
impacted cluster performance. Investigation showed that the customer had made
the changes to limit the impact during peak workflow hours. To fix the issue,
all settings were restored to the system defaults, and a custom schedule based
on a modified OFF_HOURS policy was implemented, achieving the intended goal.
Challenge
Module Summary
Introduction
Module 8
Introduction
Scenario
Introduction
Scenario
A full operating system upgrade is done when upgrading OneFS, requiring a cluster
reboot. Two types of upgrade can be done, rolling and simultaneous. OneFS 8.0
introduced the option of rolling or simultaneous upgrades. A rolling upgrade reboots
the cluster nodes one at a time. Only one node is offline at a time. Nodes are
upgraded and restarted sequentially. Hosts connected to a restarting node are
disconnected and reconnected. Rolling upgrades are not available between all
OneFS versions. Simultaneous upgrades are faster than rolling upgrades, but
reboot all nodes simultaneously, thus incurring an interruption in data access. Isilon
has redesigned and rebuilt the architecture surrounding upgrades to ensure that all
supported upgrades can be performed in a rolling fashion. The upgrade to OneFS
8.0 requires a simultaneous reboot to implement the new upgrade infrastructure.
Only upgrades from OneFS 8.0 and later have the option of choosing the type of
upgrade.
Rolling upgrades are nondisruptive to clients that can seamlessly failover their
connections between nodes. These clients include NFSv3, NFSv4 with CA, and
SMB 3.0 with CA shares and witness protocol features. SMB 2.0 is a stateful
protocol and does not support transparent failover of the connection. Stateful
protocol clients have a brief disruption when a node is rebooted into the new code.
Shown is the matrix to upgrade to OneFS 8.2.0. No direct upgrades are supported
to OneFS 8.2.0 from OneFS versions previous to OneFS 8.0.0. Refer to the OneFS
Upgrades - Isilon Info Hub community page at
https://community.emc.com/docs/DOC-44007 for supported upgrade paths,
upgrade details, and documentation. If the cluster version of OneFS is not
supported and an upgrade to a supported version cannot be done, contact Isilon
Technical Support.
Upgrade - WebUI
Shown is the WebUI page for upgrades. A preupgrade check can be run before the
upgrade to help with upgrade planning and address issues that may impact the
upgrade before it happens. The preupgrade check is also run automatically as the
first step of any upgrade. Selecting Upgrade OneFS launches the Upgrade
OneFS window. In the upgrade settings, you can specify the upgrade type, select
specific nodes to upgrade, and set the node order to upgrade. You can monitor
the upgrade progress using the WebUI and the CLI. OneFS generates alerts on
upgrade success or failure.
Any good change management process includes the ability to back out of changes.
Administrators can roll back to the previously installed OneFS version with all
cluster data fully intact. The rollback gives organizations the ability to stop
or back out of an upgrade plan. Perform a rollback any time before the release
is committed. The upgrade type does not affect the ability to roll back.
Organizations can remain in an upgraded, uncommitted state for ten days, after
which OneFS prompts to commit to the upgrade. Initiate a rollback using the
WebUI or CLI. The rollback initiates a cluster-wide reboot to return the cluster to
the prior state. Any data written after the initiation of the upgrade remains intact
with any applicable user changes during that time. However, configuration changes
specific to features in the upgraded version that are unsupported in the prior
version are lost upon rollback. The rollback feature is available only after upgrading
from OneFS 8.0. A rollback cannot be done to a release earlier than OneFS 8.0.
If no issues are found, the administrator can commit the release. Once the
commit is initiated, any post-upgrade jobs that could not be safely rolled back
are started, and the entire upgrade process completes.
OneFS 8.2.0 enables the pausing and resuming of a OneFS upgrade. Pause and
resume are useful when the maintenance window ends. The upgrade can be
paused, and then resumed in a later window. The commands are isi upgrade
pause and isi upgrade resume.
Pausing is not immediate. The upgrade remains in a pausing state until the node
currently upgrading completes. Other nodes do not upgrade until the upgrade is
resumed. The pausing state can be viewed with isi upgrade view, with
isi_upgrade_status, or by viewing the pause file data.
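A maintenance window that ends mid-upgrade might be handled as follows:

```shell
# Pause the upgrade; the node currently upgrading finishes first.
isi upgrade pause

# Confirm the state (shows pausing until the current node completes).
isi upgrade view

# Continue the upgrade in the next window.
isi upgrade resume
```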
Rolling Reboot
In OneFS 8.2.0, a rolling reboot can be initiated from the CLI on an arbitrary set of
cluster nodes using the upgrade framework. The rolling reboot functionality
provides better visibility of the process, and access to relevant logging the upgrade
framework provides. Use the isi upgrade rolling-reboot command. The
isi upgrade view command shows the node that is rebooting. The graphic
shows node LNN 2 rebooting.
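For example, rebooting a subset of nodes (the node-selection option spelling is an assumption; confirm it for your release):

```shell
# Rolling reboot of nodes 2 through 4 via the upgrade framework.
isi upgrade rolling-reboot --nodes 2-4

# Watch which node is currently rebooting.
isi upgrade view
```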
If the OneFS upgrade does not progress for 15 minutes, the upgrade framework
sends a notification, and a critical alert is generated in the WebUI. Given the
prolonged nature of an upgrade on a large cluster, a stalled or non-progressing
upgrade is otherwise easily overlooked.
Shown is an example of the event message and the Critical Events section of the
isi status command output.
In OneFS 8.2.0, the OneFS upgrade can include a patch install at post reboot. The
release enables administrators to view the behavior of the updated OneFS and
patch before committing the upgrade. OneFS 8.2.0 only supports the install of one
patch.
The command example upgrades OneFS and applies the patch after the node
reboots. The node may require a second reboot depending on the specific patch
requirements.
Considerations
The non-disruptive features that are enabled for rolling upgrades extend to patches
and firmware updates as well. The intent is to eliminate maintenance disruptions
wherever possible. If reboots or service restarts are required, they can be
controlled, monitored, and performed in a rolling fashion to minimize any disruption.
Also, new features are enabled to support protocols, such as improving handling of
connection transition from one node to the next.
All recommended patches, and any other patches that could affect the workflow,
should be installed. There are two types of patches, a standard patch and a rollup
patch. A standard patch addresses known issues for a major, minor, or MR release
of OneFS. Some patches contain minor enhancements or more logging
functionality that can help Isilon Technical Support troubleshoot issues with your
cluster. A rollup patch addresses multiple issues that are related to one
component of OneFS functionality, such as SMB. It might also contain fixes from
previous standard patches that addressed issues related to that component.
Similar to OneFS upgrades, firmware updates and even some patches may require
services to go down across the cluster and cause outages. Due to these
interruptions, the recommendation is to stay current with the latest patch and
firmware updates.
Upgrade Logs
Challenge
Introduction
Scenario
Obsolete drive firmware can affect the cluster performance or hardware reliability.
To ensure overall data integrity, you may update the drive firmware to the latest
revision by installing the drive support package or the drive firmware package.
Upgrading drive firmware can be divided into four categories, viewing the firmware
status, getting the package from the support page, updating the drive firmware, and
verification. The recommendation is to contact Isilon Technical Support before
updating the drive firmware.
To determine if you need a firmware update, view the status of the drive firmware
on the cluster. If the Desired field in the command output shows empty, a firmware
update is not required.
Update the firmware using the command shown. If updating specific nodes using
the node-number instead of all, wait for the node to finish updating before
initiating an update on the next node.
For the final step, ensure that no updates are in progress and then confirm. Run
isi devices drive list --node-lnn all to verify that all drives are
operating in a healthy state.
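The steps above can be sketched as follows (the firmware subcommand spellings are assumptions; verify them against your OneFS version before use):

```shell
# 1. View firmware status; an empty Desired field means no update needed.
isi devices drive firmware list

# 2. Update drive firmware on all nodes (or one node at a time).
isi devices drive firmware update start all

# 3. Verify no updates remain in progress and drives are healthy.
isi devices drive firmware update list
isi devices drive list --node-lnn all
```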
When replacing a drive in a node, OneFS automatically formats the drive, adds it to
the cluster, and updates the drive firmware. The new drive firmware matches the
current drive support package that is installed on the cluster. The drive firmware is
not updated for the entire cluster, only for the new drive. If you prefer to format and
add drives manually, disable Automatic Replacement Recognition.
OneFS 8.2.0 can upgrade node firmware on multiple nodes simultaneously using
the CLI. The list of node LNNs is used to upgrade node firmware. Shown is the
high-level workflow. The isi_upgrade_helper tool generates the recommended
series of simultaneous firmware upgrade commands.
isi_upgrade_helper
isi_upgrade_helper Default
Only run a simultaneous firmware upgrade with remote consultation. Perform the
initial simultaneous firmware upgrade on a subset of nodes before moving onto
the remainder of the cluster. The isi_upgrade_helper tool logs to
/var/log/isi_upgrade_helper.log on the node where the tool is run. Add the
--debug flag for more logging data. Nodes that are not selected for
simultaneous firmware upgrade quickly proceed through the Committed >
Upgrade Ready > Committed states. Nodes that are selected for simultaneous
firmware upgrade proceed through the Committed > non-responsive (Rebooting) >
Upgrade Ready > Committed states more slowly.
Challenge
Introduction
Scenario
InsightIQ Overview
InsightIQ Resources
Adding a Cluster
Adding clusters to monitor is done on the Settings > Monitored Clusters page.
After clicking Add Cluster, you can enter the information in the Add Cluster dialog
box. In the User name field, enter insightiq. In the Password box, type the local
InsightIQ user password exactly as it is configured on the monitored cluster, and
then click OK. InsightIQ begins monitoring the cluster.
InsightIQ Dashboard
Shown here is the dashboard page that you see after logging in. There are five
tabs to view data and configure settings. The DASHBOARD provides an
aggregated cluster overview and a cluster-by-cluster overview. This graphic shows
that InsightIQ is configured and monitoring three clusters. The view can be
modified to represent any time period where InsightIQ has collected data. Also,
breakouts and filters can be applied to the data. In the Aggregated Cluster
Overview section, you can view the status of all monitored clusters as a whole.
There is a list of all the clusters and nodes that are monitored. Total capacity, data
usage, and remaining capacity are shown. Overall health of the clusters is
displayed. There are graphical and numeric indicators for connected clients,
active clients, network throughput, file system throughput, and average CPU
usage.
There is also an expandable cluster-by-cluster overview section.
Depending on the chart type, preset filters enable you to view specific data.
For example, In/Out displays inbound traffic compared with outbound traffic.
You can also view data by file access protocol, individual node, disk, network
interface, and individual file or directory name. If displaying the data by the client
only, the most active clients are represented in the displayed data. Displaying data
by event can include an individual file system event, such as read, write, or lookup.
Filtering by operation class displays data by the type of operation being performed.
If FSA is enabled, you can view data by when a file was last accessed, by when a
file was last modified, by the size of files in each disk pool, and by file extension.
You can also view data by a user-defined attribute. To view, you must first define
the attributes through the CLI. Viewing data by logical file size includes only data
and does not include data-protection overhead, whereas physical file size
calculations include data-protection overhead.
Capacity Analysis
The InsightIQ dashboard includes a capacity analysis pie chart. The estimate of
usable capacity is based on the existing ratio of user data to overhead, and
assumes that data usage factors remain fairly constant as usage grows. If a
customer has been using the Isilon cluster for many small files and then wants
to add some large files, the result is not precisely what the system predicts.
Default Reports
You can monitor clusters through customizable reports that display detailed cluster
data over specific periods of time. InsightIQ enables you to view two general types
of reports, performance reports and file system reports. Performance reports have
information about cluster activity and capacity. For example, if you want to
determine whether clusters are performing as expected or if you want to investigate
the specific cause of a performance issue, the reports are useful. File system
reports include data about the files that are stored on a cluster. The reports can be
useful if, for example, you want to identify the types of data being stored and where
on a cluster that data is stored. Before applying a file system report, enable
InsightIQ File System Analytics for that cluster.
InsightIQ supports live versions of reports that are available through the InsightIQ
web application. You can create live versions of both performance and file system
reports. You can modify certain attributes as you view the reports, including the
time period, breakouts, and filters.
The administrator can drill down to file system reporting to get a capacity reporting
interface that displays more detail about usage, overhead and anticipated capacity.
The administrator can select cluster information and use that as a typical usage
profile to estimate when the cluster reaches 90% full. The information is useful for
planning upgrades ahead of time to avoid delays around procurement and order
fulfillment.
Shown is the Capacity Forecast, displaying the amount of data that can be added
to the cluster before the cluster reaches capacity. The Plot data shows the
granularity of
the reporting available. The Forecast data shows the breakout of information that
is shown in the forecast chart. Depending on the frequency and amount of
variation, outliers can have a major impact on the accuracy of the forecast usage
data.
FSA provides detailed information about files and directories on an Isilon cluster.
Unlike InsightIQ datasets, which are stored in the InsightIQ datastore, FSA result
sets are stored on the monitored cluster in the /ifs/.ifsvar/modules/fsa directory. The
monitored cluster routinely deletes result sets to save storage capacity. You can
manage result sets by specifying the maximum number of result sets that are
retained. The Job Engine runs the FSAnalyze job daily. The job collects information
across the cluster, such as the number of files per location or path, the file sizes,
and the directory activity tracking. InsightIQ collects the FSA data from the cluster
for display to the administrator.
Enable FSA
Before you can view and analyze data usage and properties through InsightIQ, you
must enable the FSA feature. Open the Monitored Clusters page by clicking
Settings > Monitored Clusters. In the Actions column for the cluster that you
want to enable or disable FSA, click Configure. The Configuration page displays.
Click the Enable FSA tab. To enable the FSA job, select Generate FSA reports
on the monitored cluster. To enable InsightIQ to display FSA reports, select
View FSA reports in InsightIQ.
Considerations
InsightIQ 4.x supports all versions of OneFS from 7.0 and later, including Isilon SD
Edge.
By default, web browsers connect to InsightIQ over HTTPS or HTTP, using port 443
for HTTPS and port 80 for HTTP. Reverting to a snapshot or modifying the
InsightIQ datastore can cause datastore corruption. The maximum number of
clusters that you can simultaneously monitor depends on the system resources
available to the Linux computer or virtual machine. It is recommended that you
monitor no more than eight storage clusters or 150 nodes with a single instance of
InsightIQ.
Troubleshooting Resources
Challenge
Introduction
Scenario
The three main commands that enable you to view the cluster from the command
line are isi status, isi devices, and isi statistics. The isi status
command displays information about the current status of the cluster, alerts, and
jobs. The isi devices command displays information about devices in the
cluster and can change their status; multiple actions are available, including
adding drives and nodes to the cluster. The isi statistics command offers
approximately 1,500 combinations of data that you can display as statistical output
of cluster operations.
The isi statistics command provides cluster and node statistics. The
collected statistics are stored in a sqlite3 database under the /ifs folder on the
cluster. Other services, such as InsightIQ, the WebUI, and SNMP, also gather the
information they need using the isi statistics command. The isi
statistics command enables you to view cluster throughput based on
connection type, protocol type, and open files per node. You can also use this
information to troubleshoot the cluster as needed. In the background,
isi_stats_d is the daemon that performs much of the data collection. For
more information, run man isi statistics from any node. To display usage,
run isi statistics --help.
The isi statistics command can list over 1,500 statistics and dump all
collected values, which is useful when you want to run the query subcommand
on a specific statistic. You can also build a custom isi statistics query
that is not covered by the provided subcommands.
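As a sketch of building such a custom query, the following composes an isi statistics query command line as a string rather than executing it, since statistic keys are cluster-specific. The key name node.cpu.user.avg is an assumption for illustration; list the real key names on your cluster before querying.

```shell
#!/bin/sh
# Minimal sketch: compose a custom `isi statistics` query command without
# running it. The statistic key node.cpu.user.avg is a hypothetical
# example; verify available keys on your cluster first.
build_stats_query() {
    key="$1"    # statistic key to query
    nodes="$2"  # node spec, e.g. "all"
    echo "isi statistics query current --keys=$key --nodes=$nodes"
}

build_stats_query node.cpu.user.avg all
```

Composing the command as a string first makes it easy to review (or log) exactly what would run before executing it against the cluster.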
You can use the isi statistics command within a cron job to gather raw
statistics over a specified time period. A cron job can run on UNIX-based systems
to schedule periodic jobs. Because cron works differently on an Isilon cluster
compared with a standard UNIX machine, contact support before configuring cron jobs.
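Purely as an illustration of the idea, a crontab entry that captures drive statistics every ten minutes might look like the following; the output path is a hypothetical example, and, as noted above, you should involve support before adding cron jobs on a cluster.

```shell
# Hypothetical crontab entry: every 10 minutes, append current drive
# statistics to a log file under /ifs (path chosen for illustration only).
*/10 * * * * isi statistics drive --nodes all >> /ifs/data/stats_drive.log 2>&1
```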
The command that is shown gives the general cluster statistics with the most
active nodes on top, and the output refreshes every two seconds. Data is broken
down by protocol and interface. If you would like a result that is sorted by
node number, one option is to run while true ; do isi
statistics system --nodes all | sort -n ; sleep 2 ; done.
The example output shows the isi statistics drive command, using
isi_for_array to examine all the nodes in the cluster. The head -5 option
limits the output to the most active results on each node. Each line shows the
node providing the data, and each node displays its top three drives and their
levels of activity. The output is useful for identifying an imbalanced load across
the cluster: the drive subcommand makes each node report where its busiest drives
are and how active they are.
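To show the kind of ranking involved, the sketch below sorts drives by activity from a captured sample. The two-column sample (node:bay and a busy percentage) is a simplified, made-up stand-in for real isi statistics drive output, which has more columns.

```shell
#!/bin/sh
# Rank drives by activity from a captured sample. The sample data is
# fabricated for illustration: column 1 is node:bay, column 2 is a
# busy percentage.
sample='1:bay1 92.1
1:bay2 15.0
2:bay1 44.7
2:bay2 88.3'

# Sort on the busy column, highest first, and keep the two busiest drives.
printf '%s\n' "$sample" | sort -k2 -rn | head -2
```

On a real cluster, you would feed the live command output through the same sort/head pipeline to spot which nodes are carrying a disproportionate drive load.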
Practical Skills
Challenge
Lesson - SRS
Introduction
Scenario
SRS Overview
SRS Architecture
Communications between the customer site and Dell Technologies support flow
over an encrypted HTTPS connection, which means that sensitive information does
not traverse the internet unprotected. SRS can be configured for redundancy with
more than one SRS instance installed, allowing reports through SRS in the event of
a hardware or partial data environment failure.
SRS has improved over the years, just as OneFS has. The SRS installation is a
service provided by Dell Technologies staff; presently, the configuration and
installation are not open for customers to perform. A dedicated virtual machine runs
the SRS gateway software, which eliminates dependency on a particular product or
operating system, such as Windows. SRS treats each node as a separate device, and
each node is connected to SRS individually; the cluster is not monitored as a whole.
SRS can operate through different subnets. By crafting the right set of subnets, a
storage administrator can address any set of network interfaces on any set of Isilon
cluster nodes.
Isilon logs, even compressed, can be many gigabytes of data. There are ways of
reducing the log burden, such as gathering incremental logs rather than complete
log records, or selecting specific logs to gather. Even so, logs on Isilon tend to be
large. Uploading logs may require significant bandwidth and can take time, with
the risk of timeouts and restarts. The support scripts are based on the
isi_gather_info tool. The remote support scripts are located in the
/ifs/data/Isilon_Support/ directory on each node. The scripts can be run
automatically to collect information about the cluster configuration settings and
operations. SRS uploads the information to a secure Isilon FTP site so that it is
available for Technical Support personnel to analyze. The remote support scripts
do not affect cluster services or the availability of your data.
NANON clusters are clusters where not all the nodes are on the network, which
can be a deliberate design choice for a number of reasons. CELOG alerts that go
through an SRS channel are always directed through a network-connected node.
SRS can also perform a log gather for the whole cluster through a connected node,
rather than having to reach each node individually. In this way the connected node
acts as a proxy for the inaccessible nodes, but SRS cannot reach the
disconnected nodes directly. SRS recognizes each node as a separate device and
has no unified concept of the cluster; the cluster is not semantically accessible to
SRS as a service.
Challenge
Module Summary