EVA 4400 - need help with performance issues in VMware
Hi,

we currently have the following EVA 4400 configuration, which is used for both a DWH system
and our VMware vSphere environment:

- EVA 4400 (09534000), 4 enclosures, 32x 10k FC 300GB disks, 1 disk group
- attached to two independent fabrics (2x HP StorageWorks 8/8 SAN switches)

The EVA is accessed by the following hosts:

- DWH: 2x DL380 G6 (W2K8 SP2, 1x QLogic FC1242SR)

Each server accesses two Vdisks on the EVA, one RAID5 for MSSQL data, one RAID1 for
MSSQL transaction logs. All four Vdisks are presented to Controller 1 and there are four paths to
each LUN. However, the servers are told via HP MPIO DSM Manager to only use Controller 1.

- VMware: 3x DL380 G6 (ESXi 4.1, 2x QLogic FC1142SR)

The ESXi servers access four RAID5 Vdisks. All Vdisks are presented to Controller 2 and there
are four paths to each LUN. The path policy on the ESXi hosts is set to Round Robin which,
because of ALUA, only chooses the two paths to Controller 2 as active paths. Each Vdisk holds
6-8 virtual machines (mainly Windows), 27 in total.
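
To make the path layout concrete, here is a minimal Python sketch of how I understand the selection to work. It assumes each of the two HBAs sees exactly one port on each controller (which would give the four paths mentioned above); the names vmhba1/vmhba2 and the helper functions are only placeholders for illustration, not anything read from the hosts.

```python
from itertools import cycle, islice

def build_paths(hbas, controllers):
    """Every HBA is assumed to reach every controller once: 2 x 2 = 4 paths."""
    return [(hba, ctrl) for hba in hbas for ctrl in controllers]

def optimized_paths(paths, owning_controller):
    """With ALUA, only paths to the controller that owns the Vdisk are optimized."""
    return [p for p in paths if p[1] == owning_controller]

def round_robin(paths, n):
    """Return the paths used for the next n I/Os when rotating over 'paths'."""
    return list(islice(cycle(paths), n))

# Placeholder names, not taken from the real hosts.
paths = build_paths(["vmhba1", "vmhba2"], ["Controller 1", "Controller 2"])
active = optimized_paths(paths, "Controller 2")

if __name__ == "__main__":
    print(len(paths), "paths in total,", len(active), "used by Round Robin")
    for hba, ctrl in round_robin(active, 4):
        print(hba, "->", ctrl)
```

If this picture is right, the VMware I/O never touches Controller 1 directly, so the two workloads should only meet on the shared disk group.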

From time to time we have trouble with slow response times of virtual machines. In fact,
this happens every time the DWH servers generate what looks to me like heavy load on the
EVA. As I am no expert in debugging storage performance, I am not sure whether it really is
heavy load, but when looking at EVAperf during those times I can see the following:

http://www.abload.de/image.php?img=evaperf_12o78.png
http://www.abload.de/image.php?img=evaperf_2qo46.png

As you can see, compared to the VMware Vdisks there is a lot of I/O on one DWH Vdisk. On the
second screenshot you can see that there is a lot of load on Controller 1, which is serving the
DWH Vdisks. Controller 2, which is used for the VMware Vdisks only, is nearly idle compared to
Controller 1.

Even though Controller 2 is nearly idle and there is not much I/O happening on the
VMware Vdisks, all VMs feel extremely sluggish during these times. For example, when working
via RDP on a Windows VM, it feels like working on a computer where a running virus scanner is
slowing the hard disk down: opening the control panel takes seconds and you can watch every
single icon appear slowly. The impact on end user applications running in these VMs is
noticeable (though not in all cases) but has not been a critical issue so far.

The problem affects all VMs that are stored on the EVA. VMs running on the same hosts but
stored on an MSA2312fc are not affected and continue to run fine. The problem disappears
immediately when the I/O on the DWH Vdisks lowers.

While the problem occurs, the disk latency of an ESXi host fluctuates between 10 and 50 ms,
which is higher than the usual 10 to 15 ms:

http://www.abload.de/image.php?img=esx_1zhd2.png

Is there a way to debug this deeper to find out what exactly is limiting here? As I have written
above, I am no expert in measuring and analyzing storage performance. But according to my
tests the problems are definitely caused by the storage system.

Maybe 32 spindles are just too few to serve 27 VMs (even though they are lightly utilized)
and one fully loaded DWH system?
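
To put a rough number on that question, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: the figure of about 140 IOPS per 10k FC disk is a common rule of thumb and not something I measured, and the read/write mixes are hypothetical. The write penalties (2 back-end I/Os per write for RAID1, 4 for RAID5) are the usual textbook values. Sparing, controller cache and sequential I/O are ignored.

```python
DISKS = 32
IOPS_PER_10K_DISK = 140          # assumption: rule-of-thumb value, not measured
WRITE_PENALTY = {"RAID1": 2, "RAID5": 4}

def disk_group_budget(disks, iops_per_disk):
    """Rough random-I/O budget of the whole disk group (back-end IOPS)."""
    return disks * iops_per_disk

def backend_iops(read_iops, write_iops, write_penalty):
    """Back-end I/Os caused by a given front-end read/write load."""
    return read_iops + write_iops * write_penalty

def max_frontend_iops(budget, read_fraction, write_penalty):
    """Front-end IOPS that fit into 'budget' for a given read fraction (0..1)."""
    return budget / (read_fraction + (1.0 - read_fraction) * write_penalty)

if __name__ == "__main__":
    budget = disk_group_budget(DISKS, IOPS_PER_10K_DISK)
    print("disk group budget:", budget, "back-end IOPS")
    # Hypothetical read/write mixes, only to show the effect of the RAID5 penalty.
    for read_fraction in (1.0, 0.7, 0.5):
        fit = max_frontend_iops(budget, read_fraction, WRITE_PENALTY["RAID5"])
        print(f"RAID5, {read_fraction:.0%} reads: about {fit:.0f} front-end IOPS")
```

Since both workloads sit in the same single disk group, whatever the DWH Vdisks consume from this budget is no longer available to the VMware Vdisks, regardless of which controller serves them.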

Any suggestions are highly appreciated!


Sam
