
Improving Performance of a Distributed File System Using OSDs and Cooperative Cache

Submitted By: Parvez Gupta, Varenya Agrawal

Introduction
This work describes the cooperative cache algorithm used in zFS and explores the effectiveness of this algorithm and of zFS as a file system. This is done by comparing the system's performance to NFS using the IOZONE benchmark.

Results show that:

zFS performs better than NFS when the cooperative cache is activated

Using pre-fetching in zFS also increases performance significantly

zFS
It is a distributed file system that uses Object Store Devices (OSDs) and a set of cooperating machines. The objectives of the zFS design are:

Achieving a scalable file system

Built from off-the-shelf components

Making use of the memory of all participating machines

Linear increase in performance with each added machine

Separation of storage management from file management

The Architecture
zFS has six components: Front End (FE), Cooperative Cache (Cache), File Manager (FMGR), Lease Manager (LMGR), Transaction Server (TSVR), and Object Store (ObS).

The Components
Object Store
It is the storage device on which files and directories are created and from which they are retrieved. It handles the physical disk chores of block allocation and mapping. The ObS API enables the creation and deletion of objects (files).

Front End
Runs on every workstation from which a client wants to use zFS. Provides access to zFS files and directories.

Lease Manager
Leases are used to maintain data integrity in zFS; they have an expiration period that is set in advance. Each ObS has one lease manager, which acquires the major lease for that ObS and grants exclusive leases on objects residing on it.
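The paper does not show a lease structure; a minimal Python sketch of a lease with a preset expiration period might look like this (the class and method names are illustrative, not zFS's actual interfaces):

    import time

    class Lease:
        # Illustrative sketch, not from the paper. A lease grants access to
        # an object for a preset period; integrity holds because a holder
        # must stop using the object once the lease expires, so a failed
        # node's leases simply time out.
        def __init__(self, object_id, duration_secs):
            self.object_id = object_id
            self.expires_at = time.time() + duration_secs  # expiration set in advance

        def is_valid(self):
            return time.time() < self.expires_at

        def renew(self, duration_secs):
            # Meaningful only while still valid; an expired lease must be re-acquired.
            self.expires_at = time.time() + duration_secs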

File Manager
Each zFS file is managed by a single file manager, which obtains the exclusive lease for the file from the lease manager and keeps track of each accomplished open() and read() request.

Cooperative Cache
Due to fast network connections, it takes less time to retrieve data from another machine's memory than from a local disk.

Transaction Server
Each directory operation is protected inside a transaction, which helps maintain the consistency of the file system. The transaction server acquires all required leases and holds onto them for as long as it can.

The Cooperative Cache


It is integrated with the Linux kernel page cache because:

The OS does not need two separate caches with different policies that may interfere with each other

This gives zFS local performance comparable to other local file systems in Linux

As a result, the following is achieved:

The kernel invokes page eviction when available memory is low

Caching is done on a per-page basis, not on whole files

Pages of zFS and of other file systems are treated equally

Pages remain in the cache until memory pressure causes the kernel to discard them

When eviction is invoked and a zFS page is the candidate, the decision is passed to a zFS routine

Cooperative cache algorithm


A page in the cooperative cache is either a singlet (the only cached copy in the system) or replicated. When a client A wants to open a file for reading (a sketch of this read path follows the list):

1. The local cache is checked for the page
2. In case of a cache miss, zFS requests the page and its lease from the file manager
3. The file manager checks whether the requested pages are already present in another machine's memory
4. If not, zFS grants the leases to the client, which in turn reads the pages directly from the OSD, marking each page as a singlet
5. If the requested pages reside in the memory of some other node B, the file manager sends a message to B to send the pages and leases to A; both A and B then mark the pages as replicated. Node B is called a third-party node
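The paper gives no code for this path; the following minimal Python sketch illustrates it under simplifying assumptions (the Node and FileManager classes, the single-holder locations map, and the omission of lease handling are all illustrative, not zFS's actual interfaces):

    SINGLET, REPLICATED = "singlet", "replicated"

    class Node:
        def __init__(self, name):
            self.name = name
            self.cache = {}        # page_no -> (data, state)

    class FileManager:
        def __init__(self):
            self.locations = {}    # page_no -> a Node caching it (one holder, for simplicity)

        def read_page(self, client, page_no, osd):
            # 1. Check the client's local cache first.
            if page_no in client.cache:
                return client.cache[page_no][0]
            # 2. Cache miss: check whether another node already holds the page.
            holder = self.locations.get(page_no)
            if holder is None:
                # 3a. No cached copy anywhere: read directly from the OSD
                #     and mark the page as a singlet.
                data = osd[page_no]
                client.cache[page_no] = (data, SINGLET)
            else:
                # 3b. Third-party node B forwards the page (and its lease);
                #     both A and B then mark their copies as replicated.
                data = holder.cache[page_no][0]
                holder.cache[page_no] = (data, REPLICATED)
                client.cache[page_no] = (data, REPLICATED)
            self.locations[page_no] = client
            return data

    # Usage: B reads page 7 from the OSD (singlet); A then gets it from B (replicated).
    osd = {7: b"page-7"}
    fmgr, a, b = FileManager(), Node("A"), Node("B")
    fmgr.read_page(b, 7, osd)
    fmgr.read_page(a, 7, osd)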

Cooperative cache algorithm


When memory becomes scarce, the kernel invokes page eviction:

If the page is replicated, it is simply discarded

If the page is a singlet, it is forwarded to another node using the following steps:

1. A message is sent to the zFS file manager indicating that the page is being sent to another machine B, the node with the largest free memory known to A
2. The page is forwarded to B
3. The page is discarded from the page cache of A
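Continuing the Python sketch from the read path above, the eviction decision and the three-step forwarding order might look as follows (the free_memory field and the peers list are additional illustrative assumptions):

    SINGLET, REPLICATED = "singlet", "replicated"

    def evict_page(node, page_no, file_mgr, peers):
        data, state = node.cache[page_no]
        if state == REPLICATED:
            # Another copy exists elsewhere, so the page can simply be dropped.
            del node.cache[page_no]
            return
        # Singlet: the only cached copy, so forward it rather than lose it.
        # The order of the three steps matters for the failure cases below.
        target = max(peers, key=lambda p: p.free_memory)  # B: largest free memory known to A
        # Step 1: tell the file manager the page is being sent to B.
        file_mgr.locations[page_no] = target
        # Step 2: forward the page to B, where it is still a singlet.
        target.cache[page_no] = (data, SINGLET)
        # Step 3: only now discard the page from A's cache.
        del node.cache[page_no]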

Cooperative cache algorithm


Effects of Node Failure and Network Delays

Node failure: it is acceptable for the file manager to assume the existence of pages on nodes that no longer hold them, but unacceptable to have pages on nodes the file manager is unaware of. Thus the order of the steps for forwarding a singlet page is important:

1. Node failure before step 1: the file manager will eventually detect this and update its data
2. Node failure after step 1: the file manager is informed that the page is on B although this is not true; same situation as case 1
3. Failure after step 2: does not pose any problem

Cooperative cache algorithm


Network delays:

Case 1:

A replicated page residing on nodes M and N is discarded from M

The zFS file manager sends a singlet message to N

Due to network delay, this message reaches N only after memory pressure developed on N and N discarded the page, since it was still marked replicated

Cooperative cache algorithm


Case 2:

A page has not yet arrived at N, while a singlet message for it arrived and was ignored; N then sent a reject message when asked to forward the page

There is no problem if the page never arrives; however, if the page arrives after the reject message is sent, it causes an inconsistency

Cooperative cache algorithm


Case 3:

Cooperative cache algorithm


Case 4:

A page was moved from N to M to O, where its recirculation count exceeded its limit

O sends a release_lease message, which arrives before the move notification

Choosing proper third party node


The zFS FMGR uses an enhanced round-robin method (see the sketch below)

For each page range granted to node N, the FMGR records the time t(N)

For every request, the FMGR scans all nodes holding the page range

For each scanned node Ni, the FMGR checks whether currentTime - t(Ni) > C; this checks whether enough time has passed for the pages granted to Ni to reach it

If true, Ni is marked as a potential provider and the next node is checked

Among the marked nodes, the node with the largest range, Nmax, is chosen

For the next request, the FMGR starts the scan from node Nmax+1
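A minimal Python sketch of this selection loop, assuming an illustrative threshold C and per-node bookkeeping (the class and field names are not from the paper):

    import time

    C = 0.05  # assumed time (seconds) for granted pages to reach a node; illustrative only

    class ThirdPartyChooser:
        def __init__(self):
            self.grant_time = {}   # node -> t(N), when the page range was granted to it
            self.range_size = {}   # node -> size of the range held by that node
            self.next_start = 0    # the next scan starts from this index

        def choose(self, holders):
            # Scan all nodes holding the page range, round-robin from next_start.
            candidates = []
            n = len(holders)
            for i in range(n):
                node = holders[(self.next_start + i) % n]
                # Has enough time passed for the granted pages to reach this node?
                if time.time() - self.grant_time.get(node, 0.0) > C:
                    candidates.append(node)   # mark as a potential provider
            if not candidates:
                return None
            # Among the marked nodes, choose the one holding the largest range (Nmax).
            best = max(candidates, key=lambda nd: self.range_size.get(nd, 0))
            # The next request starts scanning from the node after Nmax.
            self.next_start = (holders.index(best) + 1) % n
            return best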

Pre-fetching data in zFS


The overhead of transmitting a data block over a network is composed of two parts:

The network setup overhead

The transmission time of the data block

It is therefore more efficient to transmit k pages in one message than to transmit each page in a separate message, as illustrated by the model below.

The researchers measured the time it takes to transmit a file of N pages in chunks of 1...k pages per message. The best results were achieved for k=4 and k=8. Similar performance was achieved by the zFS pre-fetching mechanism.
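A back-of-the-envelope Python model makes the trade-off concrete. Assuming a fixed per-message setup overhead and a per-page transmission time (the numbers below are illustrative, not measurements from the paper), sending N pages in chunks of k costs one setup per message:

    import math

    def transfer_time(n_pages, k, setup, per_page):
        # One setup per message plus the per-page transmission cost.
        messages = math.ceil(n_pages / k)
        return messages * setup + n_pages * per_page

    # Illustrative numbers: 0.2 ms setup, 0.05 ms per page, 1024-page file.
    for k in (1, 4, 8, 16):
        print(k, round(transfer_time(1024, k, 0.2e-3, 0.05e-3) * 1e3, 1), "ms")

With these numbers the setup cost is amortized quickly, consistent with k=4 and k=8 already capturing most of the benefit.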

zFS Testing environment


One server PC ran an OSD simulator. Another PC ran the Lease Manager, File Manager, and Transaction Server. Four PCs ran the zFS front end.

NFS Testing environment


The server PC ran an NFS server with eight NFS daemons (nfsd). Four PCs ran the NFS clients.

Methodology Used
The IOZONE benchmark tool was used to compare zFS performance to that of NFS. NFS does not carry out pre-fetching, so to compensate, IOZONE was configured to read the NFS-mounted file using record sizes of n = 1, 4, 8, 16 pages. zFS-mounted files were read with a record size of one page but with pre-fetching parameter R = 1, 4, 8, 16 pages.

Comparing zFS and NFS


Two scenarios were investigated during testing:

I. The file size was smaller than the server's cache, so all the data resided in the server's cache

II. The file size was much larger than the server's cache

Results for scenario I

Results for scenario II

Observations
The performance of NFS was almost the same for different block sizes, but it was almost four times better when the file fit entirely in memory. The performance of zFS with the cooperative cache is much better than that of NFS. When the cooperative cache was deactivated, different behaviors were observed for different page ranges.

Observations
The performance of zFS for R=1 is lower than that of NFS. For larger ranges, the performance of zFS was slightly better than that of NFS due to pre-fetching. When the cooperative cache is used, zFS performance is significantly better than NFS. Performance with the cooperative cache is lower in scenario II due to memory pressure and discarded pages generating reject messages.

Conclusion
The results show that using the caches of all the clients as one cooperative cache gives better performance than NFS, and better performance than zFS without the cooperative cache. The results also show that pre-fetching with ranges of four and eight pages yields much better performance.
