
SDT1

Understanding DBMS IO Performance


In this exercise, we will use fio, the flexible IO tester developed by Jens Axboe. This tool is
now available by default in most Linux distributions. There are packages for Windows and other OSes
(see the README). On MacOS, you will need to get the source code (either via git or via a tar.bz2
snapshot -- see the source section of the README), then build fio (simply with make and make
install executed as root; see the building section of the README).

1. A Quick Introduction to FIO


fio is both a simple and a very powerful tool. See the HOWTO for an introduction and some
context, and the README for a description of the command line arguments and of the
configuration file format.
In a nutshell, fio executes jobs that submit IOs. For each job, it is possible to give a set of parameters
that define the IOs that it submits. There is a set of example job definitions in the examples
directory of the fio distribution.
I prepared a couple of job definitions in the files named seqWrites.fio and randReads.fio.
Our goal is to understand the IO patterns submitted by a database system. In this respect, the key
parameters of a job definition that we are interested in are the following (see HOWTO for a detailed
list of parameters):
We focus on direct IOs (that bypass the file system cache):
    o direct=1
We focus on asynchronous IOs (now the default on Linux):
    o ioengine=libaio (on Linux)
    o ioengine=posixaio (on OS X)
    o ioengine=windowsaio (on Windows)
    o Note that the synchronous read and write primitives require ioengine=sync
For data, the size of the blocks written to disk is equal to the page size (4, 8, 16 or 32k); the same
size is used for reads and writes:
    o bs=4k,4k
For log records, 4k pages are the unit of transfer (regardless of the log buffer size):
    o bs=4k,4k
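
To see how the parameters above fit together, here is a minimal sketch that writes a small job file from the shell. It is not one of the course files (seqWrites.fio and randReads.fio may look different). The file name example-data.fio, the job name data-randread, size=64m and the target directory /mnt/data are assumptions chosen for illustration, and ioengine=libaio assumes a Linux guest.

```shell
#!/bin/sh
# Write a small, hypothetical fio job file that combines the parameters
# discussed above. The file name, job name, size and directory are
# illustrative assumptions, not values taken from the course material.
cat > example-data.fio <<'EOF'
; example-data.fio -- illustrative only, not one of the course job files
[global]
; asynchronous IO on Linux (use posixaio on OS X, windowsaio on Windows)
ioengine=libaio
; direct IO: bypass the file system cache
direct=1
; 4k blocks for both reads and writes
bs=4k,4k
; amount of data per job and where the test file is created (assumptions)
size=64m
directory=/mnt/data

[data-randread]
rw=randread
iodepth=1
EOF

echo "wrote example-data.fio; run it with: fio example-data.fio"
```

Options in the [global] section apply to every job in the file, so a second job (for example one with rw=write for sequential writes) would inherit the same engine, block size and direct IO setting.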

2. Understanding IO Performance
Your goal with this assignment is to get some insight into IO performance within the Virtual
Machine instance we will use for the assignments.
Overall, you should address the following questions:
1. How fast can your host system perform sequential writes and random reads?
2. How fast can your VM instance perform the same sequential writes and random reads?
3. How do the fio parameters impact sequential writes and random reads?
4. How do the VM virtual storage parameters impact sequential writes and random reads?
More specifically, you should:
Create a new virtual disk. You should simply add a new disk (let us call it newDisk1.vdi) to the
SATA controller that already contains the system image (DB2-10.1-students.vdi). You should
choose a vdi format for that disk. You might want to experiment with two options (fixed-size
image vs. dynamically allocated image) to investigate question 4. Note that if your host disk is an
SSD, you should check the SSD box when creating the disk. You should also make sure that host
IO caching is disabled on the VirtualBox SATA controller.

Once the virtual disk is created, you can mount it from Linux. Log in as student (see
GettingStarted for the default password). On my setup, the new virtual disk showed up as a new
device named /dev/sdb. To use this device, we need to partition it (sudo fdisk /dev/sdb;
we now have one partition, /dev/sdb1) and format it (sudo mkfs -t ext4 /dev/sdb1). You
might want to experiment with different file systems to investigate question 4. Now, you need to
mount this formatted partition onto the file system. Create a data directory under /mnt (sudo
mkdir /mnt/data) and mount the partition on it (sudo mount /dev/sdb1 /mnt/data). You can now change
the ownership of /mnt/data so that you can access it from the student account (sudo chown
student /mnt/data followed by sudo chgrp student /mnt/data).
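
The preparation steps above can be collected into one script. The sketch below only prints the commands by default, because partitioning and formatting destroy any data on the device. The APPLY switch and the function names run and prepare_disk are hypothetical helpers introduced for this example. /dev/sdb is the device name seen on the author's setup and may differ on yours, so check it (for instance with lsblk) before applying anything.

```shell
#!/bin/sh
# Dry-run sketch of the disk preparation steps described above.
# DEV and MNT default to the names used in the text; APPLY is a
# hypothetical switch: set APPLY=1 to really execute the commands.
DEV="${DEV:-/dev/sdb}"
MNT="${MNT:-/mnt/data}"
APPLY="${APPLY:-0}"

# Either execute a command or just print it, depending on APPLY.
run() {
    if [ "$APPLY" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

prepare_disk() {
    run sudo fdisk "$DEV"              # interactive: create a single partition
    run sudo mkfs -t ext4 "${DEV}1"    # format the new partition as ext4
    run sudo mkdir -p "$MNT"           # create the mount point
    run sudo mount "${DEV}1" "$MNT"    # mount the partition
    run sudo chown student "$MNT"      # give the student account ownership
    run sudo chgrp student "$MNT"
}

prepare_disk
```

To try another file system for question 4, change the type passed to mkfs -t, for example to xfs if the corresponding tools are installed in the guest.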
Work with the fio scripts inside the VM Linux guest:
cd YOUR_WORKING_DIR
wget http://www.itu.dk/courses/SDT1/F2013/FIO/randReads.fio
wget http://www.itu.dk/courses/SDT1/F2013/FIO/seqWrites.fio
You can now simply run the scripts and find out about the default performance of random reads
and sequential writes (i.e., question 2). Before you execute the following scripts, take a couple
of minutes to think about what you expect the average throughput to be for sequential writes and
random reads:
fio randReads.fio
fio seqWrites.fio
The output of fio is both clear and detailed; it is explained in the HOWTO.
Now you can vary some of the fio parameters to evaluate their impact and study question 3 (e.g.,
iodepth, i.e., the number of outstanding asynchronous IO requests that fio maintains as it
submits IOs; numjobs, i.e., the number of concurrent jobs submitting the same IOs; bs, i.e., the
size of the blocks written to or read from disk).
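
One possible way to vary a parameter systematically is a small shell loop. The sketch below sweeps iodepth for 4k random reads and passes all job parameters on the fio command line instead of editing the course job files. The job names, log file names, the 64m size, the list of depths and the helper fio_cmd are assumptions made for this example. If fio or the target directory is missing, the script only prints the command it would have run.

```shell
#!/bin/sh
# Sketch: sweep iodepth for 4k random reads with direct, asynchronous IO.
# TARGET_DIR, SIZE and the list of depths are illustrative assumptions.
# TARGET_DIR must not contain spaces, since the command is word-split below.
TARGET_DIR="${TARGET_DIR:-/mnt/data}"
SIZE="${SIZE:-64m}"

# Print the fio command line for one queue depth (helper for this sketch).
fio_cmd() {
    echo "fio --name=randread-qd$1 --rw=randread --direct=1" \
         "--ioengine=libaio --bs=4k --size=$SIZE" \
         "--directory=$TARGET_DIR --iodepth=$1 --output=randread-qd$1.log"
}

for depth in 1 2 4 8 16 32; do
    cmd=$(fio_cmd "$depth")
    if command -v fio >/dev/null 2>&1 && [ -d "$TARGET_DIR" ]; then
        # Run the benchmark; the report is written to randread-qd<depth>.log.
        $cmd || echo "fio failed for iodepth=$depth"
    else
        echo "skipping (fio or $TARGET_DIR missing): $cmd"
    fi
done
```

The same loop structure works for numjobs or bs: replace the swept option and keep the others fixed, so that each run changes only one parameter.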
On the host, you should install fio and run the same fio scripts you ran on the Linux guest. What do
you expect? What do you observe?
