
Improving Performance of a Distributed File System Using OSDs and Cooperative Cache

Submitted By: Parvez Gupta, Varenya Agrawal

Introduction
This work describes the cooperative cache algorithm used in zFS and explores the effectiveness of this algorithm and of zFS as a file system. This is done by comparing the system's performance to NFS using the IOZONE benchmark.

Results show that:

zFS performs better than NFS when the cooperative cache is activated

Using pre-fetching in zFS also increases performance significantly

zFS
It is a distributed file system that uses Object Store Devices (OSDs) and a set of cooperating machines. The objectives of the zFS design are:

Achieving a scalable file system

Built from off-the-shelf components

Making use of the memory of all participating machines

Linear increase in performance with each added machine

Separation of storage management from file management

The Architecture
zFS has six components: Front End (FE), Cooperative Cache (Cache), File Manager (FMGR), Lease Manager (LMGR), Transaction Server (TSVR), and Object Store (ObS).

The Components
Object Store
It is the storage device on which files and directories are created and from which they are retrieved. It handles the physical disk chores of block allocation and mapping. The ObS API enables the creation and deletion of objects (files).

Front End
Runs on every workstation from which a client wants to use zFS. Provides access to zFS files and directories.

Lease Manager
Leases are used to maintain data integrity in zFS; they have an expiration period that is set in advance. Each ObS has one lease manager, which acquires the major lease for that ObS and grants exclusive leases on objects residing on it.
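The paper does not show a lease structure; a minimal Python sketch of a lease with a preset expiration period might look like this (the class and method names are illustrative, not zFS's actual interfaces):

    import time

    class Lease:
        # Illustrative sketch, not from the paper. A lease grants access to
        # an object for a preset period; integrity holds because a holder
        # must stop using the object once the lease expires, so a failed
        # node's leases simply time out.
        def __init__(self, object_id, duration_secs):
            self.object_id = object_id
            self.expires_at = time.time() + duration_secs  # expiration set in advance

        def is_valid(self):
            return time.time() < self.expires_at

        def renew(self, duration_secs):
            # Meaningful only while still valid; an expired lease must be re-acquired.
            self.expires_at = time.time() + duration_secs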

File Manager
Each zFS file is managed by a single file manager, which obtains the exclusive lease for the file from the lease manager and keeps track of each accomplished open() and read() request.

Cooperative Cache
Due to fast network connections, it takes less time to retrieve data from another machine's memory than from a local disk.

Transaction Server
Each directory operation is protected inside a transaction, which helps maintain the consistency of the file system. The transaction server acquires all required leases and holds onto them for as long as it can.

The Cooperative Cache


It is integrated with the Linux kernel page cache because:

The OS does not need two separate caches with different policies that may interfere with each other

This gives zFS local performance comparable to other local file systems in Linux

As a result, the following is achieved:

The kernel invokes page eviction when available memory is low

Caching is done on a per-page basis, not on whole files

Pages of zFS and of other file systems are treated equally

Pages remain in the cache until memory pressure causes the kernel to discard them

When eviction is invoked and a zFS page is the candidate, the decision is passed to a zFS routine

Cooperative cache algorithm


A page in the cooperative cache is either a singlet (the only cached copy in the system) or replicated. When a client A wants to open a file for reading (a sketch of this read path follows the list):

1. The local cache is checked for the page
2. In case of a cache miss, zFS requests the page and its lease from the file manager
3. The file manager checks whether the requested pages are already present in another machine's memory
4. If not, zFS grants the leases to the client, which in turn reads the pages directly from the OSD, marking each page as a singlet
5. If the requested pages reside in the memory of some other node B, the file manager sends a message to B to send the pages and leases to A; both A and B then mark the pages as replicated. Node B is called a third-party node
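The paper gives no code for this path; the following minimal Python sketch illustrates it under simplifying assumptions (the Node and FileManager classes, the single-holder locations map, and the omission of lease handling are all illustrative, not zFS's actual interfaces):

    SINGLET, REPLICATED = "singlet", "replicated"

    class Node:
        def __init__(self, name):
            self.name = name
            self.cache = {}        # page_no -> (data, state)

    class FileManager:
        def __init__(self):
            self.locations = {}    # page_no -> a Node caching it (one holder, for simplicity)

        def read_page(self, client, page_no, osd):
            # 1. Check the client's local cache first.
            if page_no in client.cache:
                return client.cache[page_no][0]
            # 2. Cache miss: check whether another node already holds the page.
            holder = self.locations.get(page_no)
            if holder is None:
                # 3a. No cached copy anywhere: read directly from the OSD
                #     and mark the page as a singlet.
                data = osd[page_no]
                client.cache[page_no] = (data, SINGLET)
            else:
                # 3b. Third-party node B forwards the page (and its lease);
                #     both A and B then mark their copies as replicated.
                data = holder.cache[page_no][0]
                holder.cache[page_no] = (data, REPLICATED)
                client.cache[page_no] = (data, REPLICATED)
            self.locations[page_no] = client
            return data

    # Usage: B reads page 7 from the OSD (singlet); A then gets it from B (replicated).
    osd = {7: b"page-7"}
    fmgr, a, b = FileManager(), Node("A"), Node("B")
    fmgr.read_page(b, 7, osd)
    fmgr.read_page(a, 7, osd)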

Cooperative cache algorithm


When memory becomes scarce, the kernel invokes page eviction:

If the page is replicated, it is simply discarded

If the page is a singlet, it is forwarded to another node using the following steps:

1. A message is sent to the zFS file manager indicating that the page is being sent to another machine B, the node with the largest free memory known to A
2. The page is forwarded to B
3. The page is discarded from the page cache of A
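Continuing the Python sketch from the read path above, the eviction decision and the three-step forwarding order might look as follows (the free_memory field and the peers list are additional illustrative assumptions):

    SINGLET, REPLICATED = "singlet", "replicated"

    def evict_page(node, page_no, file_mgr, peers):
        data, state = node.cache[page_no]
        if state == REPLICATED:
            # Another copy exists elsewhere, so the page can simply be dropped.
            del node.cache[page_no]
            return
        # Singlet: the only cached copy, so forward it rather than lose it.
        # The order of the three steps matters for the failure cases below.
        target = max(peers, key=lambda p: p.free_memory)  # B: largest free memory known to A
        # Step 1: tell the file manager the page is being sent to B.
        file_mgr.locations[page_no] = target
        # Step 2: forward the page to B, where it is still a singlet.
        target.cache[page_no] = (data, SINGLET)
        # Step 3: only now discard the page from A's cache.
        del node.cache[page_no]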

Cooperative cache algorithm


Effects of Node Failure and Network Delays

Node failure: it is acceptable for the file manager to assume the existence of pages on nodes that no longer hold them, but unacceptable to have pages on nodes the file manager is unaware of. Thus the order of the steps for forwarding a singlet page is important:

1. Node failure before step 1: the file manager will eventually detect this and update its data
2. Node failure after step 1: the file manager is informed that the page is on B although this is not true; same situation as case 1
3. Failure after step 2: does not pose any problem

Cooperative cache algorithm


Network delays:

Case 1:

A replicated page residing on nodes M and N is discarded from M

The zFS file manager sends a singlet message to N

Due to network delay, this message reaches N only after memory pressure developed on N and N discarded the page, since it was still marked replicated

Cooperative cache algorithm


Case 2:

A page has not yet arrived at N, while a singlet message for it arrived and was ignored; N then sent a reject message when asked to forward the page

There is no problem if the page never arrives; however, if the page arrives after the reject message is sent, it causes an inconsistency

Cooperative cache algorithm


Case 3:

Cooperative cache algorithm


Case 4:

A page was moved from N to M to O, where its recirculation count exceeded its limit

O sends a release_lease message, which arrives before the move notification

Choosing proper third party node


The zFS FMGR uses an enhanced round-robin method (see the sketch below)

For each page range granted to node N, the FMGR records the time t(N)

For every request, the FMGR scans all nodes holding the page range

For each scanned node Ni, the FMGR checks whether currentTime - t(Ni) > C; this checks whether enough time has passed for the pages granted to Ni to reach it

If true, Ni is marked as a potential provider and the next node is checked

Among the marked nodes, the node with the largest range, Nmax, is chosen

For the next request, the FMGR starts the scan from node Nmax+1
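A minimal Python sketch of this selection loop, assuming an illustrative threshold C and per-node bookkeeping (the class and field names are not from the paper):

    import time

    C = 0.05  # assumed time (seconds) for granted pages to reach a node; illustrative only

    class ThirdPartyChooser:
        def __init__(self):
            self.grant_time = {}   # node -> t(N), when the page range was granted to it
            self.range_size = {}   # node -> size of the range held by that node
            self.next_start = 0    # the next scan starts from this index

        def choose(self, holders):
            # Scan all nodes holding the page range, round-robin from next_start.
            candidates = []
            n = len(holders)
            for i in range(n):
                node = holders[(self.next_start + i) % n]
                # Has enough time passed for the granted pages to reach this node?
                if time.time() - self.grant_time.get(node, 0.0) > C:
                    candidates.append(node)   # mark as a potential provider
            if not candidates:
                return None
            # Among the marked nodes, choose the one holding the largest range (Nmax).
            best = max(candidates, key=lambda nd: self.range_size.get(nd, 0))
            # The next request starts scanning from the node after Nmax.
            self.next_start = (holders.index(best) + 1) % n
            return best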

Pre-fetching data in zFS


The overhead of transmitting a data block over a network is composed of two parts:

The network setup overhead

The transmission time of the data block

It is therefore more efficient to transmit k pages in one message than to transmit each page in a separate message, as illustrated by the model below.

The researchers measured the time it takes to transmit a file of N pages in chunks of 1...k pages per message. The best results were achieved for k=4 and k=8. Similar performance was achieved by the zFS pre-fetching mechanism.
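A back-of-the-envelope Python model makes the trade-off concrete. Assuming a fixed per-message setup overhead and a per-page transmission time (the numbers below are illustrative, not measurements from the paper), sending N pages in chunks of k costs one setup per message:

    import math

    def transfer_time(n_pages, k, setup, per_page):
        # One setup per message plus the per-page transmission cost.
        messages = math.ceil(n_pages / k)
        return messages * setup + n_pages * per_page

    # Illustrative numbers: 0.2 ms setup, 0.05 ms per page, 1024-page file.
    for k in (1, 4, 8, 16):
        print(k, round(transfer_time(1024, k, 0.2e-3, 0.05e-3) * 1e3, 1), "ms")

With these numbers the setup cost is amortized quickly, consistent with k=4 and k=8 already capturing most of the benefit.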

zFS Testing environment


One server PC ran an OSD simulator. Another PC ran the Lease Manager, File Manager, and Transaction Server. Four PCs ran the zFS front end.

NFS Testing environment


The server PC ran an NFS server with eight NFS daemons (nfsd). Four PCs ran the NFS clients.

Methodology Used
The IOZONE benchmark tool was used to compare zFS performance to that of NFS. NFS does not carry out pre-fetching, so to compensate, IOZONE was configured to read the NFS-mounted file using record sizes of n = 1, 4, 8, 16 pages. zFS-mounted files were read with a record size of one page but with pre-fetching parameter R = 1, 4, 8, 16 pages.

Comparing zFS and NFS


Two scenarios were investigated during testing:

I. The file size was smaller than the server's cache, so all the data resided in the server's cache

II. The file size was much larger than the server's cache

Results for scenario I

Results for scenario II

Observations
The performance of NFS was almost the same for different block sizes, but it was almost four times better when the file fit entirely in memory. The performance of zFS with the cooperative cache is much better than that of NFS. When the cooperative cache was deactivated, different behaviors were observed for different page ranges.

Observations
The performance of zFS for R=1 is lower than that of NFS. For larger ranges, the performance of zFS was slightly better than that of NFS due to pre-fetching. When the cooperative cache is used, zFS performance is significantly better than NFS. Performance with the cooperative cache is lower in scenario II due to memory pressure and discarded pages generating reject messages.

Conclusion
The results show that using the caches of all the clients as one cooperative cache gives better performance than NFS, and better performance than zFS without the cooperative cache. The results also show that pre-fetching with ranges of four and eight pages yields much better performance.
