Hardware Overview
The building block of XtremIO is an X-Brick.
XtremIO X-Brick
An X-Brick is composed of two storage controllers, a DAE holding 25 SSDs, and two
battery backup units. Each X-Brick at GA will utilize 25x 400GB eMLC SSDs and provide
10TB of raw capacity.
The array scales by adding X-Bricks in a scale-out manner (max of 4 X-Bricks at GA):
X-Brick Scaleout
The interconnect between all X-Bricks is 40Gbps InfiniBand (similar to Isilon, except
Isilon uses 10Gbps InfiniBand). If we think of Isilon as scale-out NAS, XtremIO
represents the scale-out all-flash block category.
XtremIO Node
Connectivity Ports and Node details:
For front-end host connectivity XtremIO supports both 8Gb Fibre Channel and
10Gb iSCSI.
Connectivity to the disks is via 6Gbps SAS connections (very similar to a VNX).
Each node contains 2x SSDs that serve as a dump area for metadata if the node
were to lose power.
Each node also contains 2x SAS drives to house the operating system. In this
way, the disks in the DAE are decoupled from the controllers since they only hold
data, and this should facilitate easy controller upgrades in the future when
better/faster hardware becomes available.
Software Features
Inline Deduplication. Part of the system architecture; lowers effective cost
while increasing performance and reliability by reducing write amplification.
Thin Provisioning. Also part of the system architecture in how writes are managed,
and carries no performance penalty.
XDP Data Protection. RAID6-style protection designed for all-flash arrays. Low overhead
with better than RAID1 performance. No hot spares.
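To make the write-amplification point concrete, here is a minimal Python sketch (not XtremIO code) of a content store that only writes a 4K block to the back end when its content hash has not been seen before; duplicate front-end writes just increment a reference count. The hash choice and data structures are assumptions for illustration only.

```python
import hashlib

class DedupStore:
    """Toy content store: one physical write per unique 4K block."""
    def __init__(self):
        self.blocks = {}      # hash -> block contents (stand-in for the SSDs)
        self.refcount = {}    # hash -> number of logical references
        self.backend_writes = 0

    def write(self, block: bytes) -> str:
        digest = hashlib.sha1(block).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = block       # only new content hits the back end
            self.backend_writes += 1
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

store = DedupStore()
zero_block = bytes(4096)
for _ in range(100):                          # 100 identical front-end writes...
    store.write(zero_block)
print(store.backend_writes)                   # ...but only 1 back-end write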
Architecture
XtremIO Software Architecture
The second thing to note is that the D-module is free to write the data anywhere it sees
fit, since there is no coupling between a host disk address and a back-end SSD address
thanks to the A2H and H2P tables. This is further optimized because the content of the data
becomes the address for lookups: ultimately the Hash Value is what determines the
physical disk location via the H2P table. This gives XtremIO tremendous flexibility in
performing data management and optimizing writes for SSD.
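Conceptually, the double mapping can be sketched as two dictionaries: A2H maps a host logical address to a content hash, and H2P maps that hash to a physical SSD location. The toy model below is only meant to show the indirection described above; the "next free slot" placement policy is an assumption, not how the D-module actually places data.

```python
import hashlib
from itertools import count

a2h = {}                      # logical block address -> content hash
h2p = {}                      # content hash -> physical SSD location
next_slot = count()           # trivial stand-in for the real placement policy

def write(lba: int, block: bytes) -> None:
    digest = hashlib.sha1(block).hexdigest()
    if digest not in h2p:                 # new content: place it anywhere convenient
        h2p[digest] = next(next_slot)
    a2h[lba] = digest                     # the host address only ever points at a hash

def read(lba: int) -> int:
    return h2p[a2h[lba]]                  # physical location found via the two tables

write(100, b"A" * 4096)
write(100, b"B" * 4096)       # overwrite of the same LBA lands in a new physical slot
print(read(100))              # -> 1, not 0: no update-in-place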
Module Communications
With an understanding of the relationship between the R, C, and D modules and their
functions, the next thing to look at is how exactly they communicate with each other. In a
system with multiple CPU sockets, performance is better when utilizing the local
CPU socket to which the PCIe adapter is connected. The R, C, and D module distribution
was chosen to optimize the configuration based on field testing. For example, the SAS card
is connected to a PCIe slot attached to CPU socket 2, so the D-module
runs out of socket 2 to optimize SSD I/O performance. This is a great example of
how, while a software storage stack like XtremIO is hardware independent and could
be delivered as a software-only product, there are optimizations for the underlying
hardware which must be taken into consideration. The value of understanding the
underlying hardware applies not only to XtremIO but to all storage stacks. These are the
types of things you do NOT want to leave to chance or for an end user to make decisions
on. Never confuse hardware independence with hardware knowledge and optimization;
there is great value in the latter. The great thing about the XIOS architecture is that, since
it is hardware independent and modular, XIOS can easily take advantage of improvements
in the hardware architecture as they arrive.
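The socket-affinity idea can be illustrated with a small Linux-only Python sketch that pins the current process to one CPU socket's cores. The core numbering (socket 2 owning logical CPUs 16-23) is purely an assumption for illustration and says nothing about how XIOS actually pins its modules.

```python
import os

# Assumed topology for illustration: socket 2 owns logical CPUs 16-23.
SOCKET2_CPUS = set(range(16, 24))

def pin_to_socket2() -> None:
    """Restrict this process to the socket closest to the (assumed) SAS HBA."""
    os.sched_setaffinity(0, SOCKET2_CPUS)   # 0 = current process (Linux only)

pin_to_socket2()
print(os.sched_getaffinity(0))              # confirm the new CPU mask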
Moving on to the communication mechanism between the modules, we can see that no
preference is given to locality of modules. Meaning, when the R-module selects a
C-module, it does not prefer the C-module local to itself. All communication between the
modules is done via RDMA or RPC (depending on whether it is a data path or a control path
communication) over InfiniBand. The total latency budget for an I/O in an XtremIO system is
600-700µs, and the overhead added by InfiniBand communication is 7-16µs. The result of this
design is that as the system scales, the latency does NOT increase. Whether there is 1
X-Brick or 4 X-Bricks, or more in the future, the latency for I/O remains the same since the
communication path is identical. The C-module selection by the R-module is done
utilizing the same calculated data hashes, which ensures a completely random
distribution of module selection across the system, and this is done for each 4K block.
For example, if there are 8 controllers in the cluster with 8x R, C, and D modules, there is
communication happening between all of them evenly. In this way, every corner of the
XtremIO box is exercised evenly and uniformly with no hot spots. Everything is very
linear, deterministic, and predictable. If a node fails, the performance degradation can be
predicted, the same as the performance gain when adding node(s) to the system.
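A minimal sketch of how hash-driven selection avoids hot spots: each 4K block's content hash deterministically picks the owning C-module, so over many blocks every controller receives roughly the same share of work. The modulo-on-hash rule below is an assumption used only to illustrate the idea, not XtremIO's actual selection function.

```python
import hashlib
import os
from collections import Counter

NUM_C_MODULES = 8                             # e.g. 4 X-Bricks = 8 controllers

def owning_c_module(block: bytes) -> int:
    """Pick a C-module from the block's content hash (illustrative rule only)."""
    digest = hashlib.sha1(block).digest()
    return int.from_bytes(digest[:8], "big") % NUM_C_MODULES

# Distribute 100,000 random 4K blocks and count how many land on each C-module.
load = Counter(owning_c_module(os.urandom(4096)) for _ in range(100_000))
print(load)   # roughly 12,500 blocks per C-module: no hot spots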
XDP (XtremIO Data Protection)
A critical component of the XtremIO system is how it does data protection. RAID5?
RAID6? RAID10? None of the above. It uses a data protection scheme called XDP,
which can be broadly thought of as RAID6 for all-flash arrays, meaning it provides
double parity protection but without any of the penalties associated with typical RAID6.
The issue with traditional RAID6 applied to SSDs is that as random I/O comes into the
array forcing updates/overwrites, the 4K block(s) need to be updated in place on the
RAID stripe, and this causes massive amounts of write amplification. This is exactly the
situation we want to avoid. For example, in a RAID6 stripe, if we want to update a single
4K block we have to read that 4K block plus the two 4K parity blocks (3 reads), then
calculate new parity and write the new 4K block and two new 4K parity blocks (3 writes).
Hence for every one front-end write I/O we have three back-end write I/Os, giving us a
write amplification of 300%, or said another way, a 3x overhead per front-end write. The
solution to this problem is to never do in-place updates of 4K blocks, and this is the
foundation of XDP. Because there is an additional layer of indirection via the A2H and
H2P tables, XtremIO has complete freedom (within reason) on where to place the
physical block despite the application updating the same address. If an application
updates the same address with different 4K content, a new hash will be calculated and
thus the 4K block will be put in a different location. In this way, XtremIO can avoid any
update-in-place operations. This is the power of content-aware addressing, where the
data is the address. It should also be noted that being able to write data anywhere is
not enough by itself; it is this coupled with flash that makes the architecture feasible,
since flash is a random-access medium that has no latency penalty for random I/O, unlike
an HDD with physical heads. The previously described process is illustrated below.
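The arithmetic from the RAID6 example, next to the out-of-place alternative, in a short sketch. This only counts I/Os; the 23+2 stripe geometry and full-stripe batching are illustrative assumptions, not a model of XDP's actual layout.

```python
def raid6_update_in_place(front_end_writes: int) -> dict:
    """Classic RAID6 small-write penalty: read data + 2 parity, write data + 2 parity."""
    return {"back_end_reads": 3 * front_end_writes,
            "back_end_writes": 3 * front_end_writes}

def out_of_place_update(front_end_writes: int, parity_per_stripe: int = 2,
                        data_blocks_per_stripe: int = 23) -> dict:
    """Out-of-place writes are batched into full stripes, so parity cost is amortized
    across the whole stripe instead of paid per 4K update (illustrative numbers)."""
    stripes = front_end_writes / data_blocks_per_stripe
    return {"back_end_reads": 0,
            "back_end_writes": front_end_writes + stripes * parity_per_stripe}

print(raid6_update_in_place(1000))     # 3000 back-end writes -> 3x amplification
print(out_of_place_update(1000))       # ~1087 back-end writes -> ~1.09x amplification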
Configuration Tab:
Hardware Tab:
Event Tab:
Monitor Tab:
Administration Tab:
Storage Provisioning:
Note: Before provisioning storage, the host first needs to be zoned with the
storage array.
Steps involved in storage provisioning:
1. Create the volume(s) and a folder to hold them.
2. Create an initiator group from the host PWWNs.
3. Create a mapping view between the volumes and the initiator group.
4. Map the volumes and rescan on the host to discover the new devices.
Step 2:
Click on Add Volume (highlighted in red).
Step 3:
Click on Add Multiple (highlighted in red).
Step 4:
Specify volume name and size.
Step 5:
Create a new folder where the previously created volumes will be kept.
Step 6:
We can see the paras folder and the Paras_xio_01 volume created.
Step 7:
To create an Initiator Group, click on Add and select the PWWN.
Step 8:
Specify the parent folder for the Initiator Group.
Step 9:
Click on the volumes and the Initiator Group to create a Mapping view.
Step 10:
Click on Map All and then click Apply. The storage is now visible to the host.
Step 2:
Click on Rescan to identify the new devices connected to the ESXi host.
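For readers who prefer to script the create-volume / create-initiator-group / map workflow rather than clicking through the GUI, a rough sketch using Python's requests library against the XMS REST API follows. The endpoint paths, field names, object names, and credentials are assumptions for illustration only; consult the XtremIO REST API guide for your XMS version before relying on any of them.

```python
import requests

XMS = "https://xms.example.local"     # assumed XMS address for illustration
AUTH = ("admin", "password")          # assumed credentials

def create(obj_type: str, payload: dict) -> dict:
    """POST one object to the XMS. The endpoint layout is an assumption; verify
    it against the XtremIO REST API guide for your XMS version."""
    resp = requests.post(f"{XMS}/api/json/v2/types/{obj_type}",
                         json=payload, auth=AUTH, verify=False)
    resp.raise_for_status()
    return resp.json()

# Roughly mirrors Steps 2-10 of the GUI walkthrough (names taken from it).
create("volumes", {"vol-name": "Paras_xio_01", "vol-size": "100g"})
create("initiator-groups", {"ig-name": "paras_ig"})     # add the host PWWNs here
create("lun-maps", {"vol-id": "Paras_xio_01", "ig-id": "paras_ig"})
```

After the mapping is applied, the host-side step is the same as in the walkthrough: rescan the ESXi host to discover the new devices.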