
University of Pennsylvania

Distributed File Systems

11/21/00

Remote Files
File service vs. file server
  File service interface: the specification of what the file system offers to its clients.
  File server: a process that runs on some machine and helps implement the file service.
File Service Model (Fig 13-1)
  upload/download model
  remote access model
  comparison between the two models
The directory service
  creating and deleting directories
  naming and renaming files
  moving files


Goals

1 Network transparency: users do not have to be aware of the location of files to access them.
  location transparency: the name of a file does not reveal the file's physical storage location. Example: with /server1/dir1/dir2/X, server1 can be moved anywhere (e.g., from CIS to SEAS).
  location independence: the name of a file does not need to be changed when the file's physical storage location changes. Example: the file X above cannot be moved to server2 without changing its name, even if server1 is full and server2 is not so full, so this name is not location independent.
2 High availability: system failures or scheduled activities such as backups or the addition of nodes should not make files unavailable.
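The difference between location transparency and location independence in goal 1 can be illustrated with one level of indirection. This is a minimal sketch, not a scheme taken from the slides: the two tables, the file ID 17, and the function names `locate` and `migrate` are all hypothetical.

```python
# Hypothetical sketch: names map to stable file IDs, and a separate
# table maps file IDs to the server that currently stores them.
name_to_id = {"/dir1/dir2/X": 17}      # what users see (no server name in it)
id_to_server = {17: "server1"}         # where the file lives right now


def locate(name):
    """Resolve a user-visible name to the server currently holding the file."""
    return id_to_server[name_to_id[name]]


def migrate(name, new_server):
    """Move a file: only the location table changes, never the name."""
    id_to_server[name_to_id[name]] = new_server


before = locate("/dir1/dir2/X")
migrate("/dir1/dir2/X", "server2")     # e.g. because server1 is full
after = locate("/dir1/dir2/X")
print(before, "->", after)
```

Because the name carries no server component, the file can move between servers while every client keeps using the same name.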


Architecture
Computation model
  file servers -- machines dedicated to storing files and performing storage and retrieval operations (for high performance)
  clients -- machines used for computational activities; may have a local disk for caching remote files
Two most important services
  name server -- maps user-specified names to stored objects, files and directories
  cache manager -- reduces network delay and disk delay; problem: inconsistency
Typical data access actions
  open, close, read, write, etc.


Design Issues
Naming and name resolution
Semantics of file sharing (Fig 13-4, Fig 13-5)
Stateless versus stateful servers (Fig 13-8)
Caching -- where to store files (Fig 13-9)
Cache consistency (Fig 13-11)
Replication (Fig 13-12)


Naming and Name Resolution


a name space -- a collection of names
name resolution -- mapping a name to an object
same or different view of a directory hierarchy (Fig. 13-3)

3 traditional ways to name files in a distributed environment
  1 concatenate the host name with the names of files stored on that host (e.g., /machine/usr/foo): system-wide uniqueness is guaranteed and a file is simple to locate; however, not network transparent and not location independent
  2 mount remote directories onto local directories: once mounted, files can be referenced in a location-transparent manner
  3 provide a single global directory: requires a unique file name for every file; location independent; cannot encompass heterogeneous environments and wide geographical areas
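Name resolution under the mount approach can be sketched as a longest-prefix lookup in a mount table. This is an illustration only: the table contents, the server names, and the function `resolve` are assumptions, not taken from the slides or from any particular file system.

```python
# Hypothetical mount table: local mount point -> (server, exported directory).
MOUNTS = {
    "/usr/remote": ("server1", "/export/usr"),
    "/home": ("server2", "/export/home"),
}


def resolve(path):
    """Return (server, remote_path), or ("local", path) if nothing is mounted."""
    best = None
    for mount_point in MOUNTS:
        inside = path == mount_point or path.startswith(mount_point + "/")
        if inside and (best is None or len(mount_point) > len(best)):
            best = mount_point  # prefer the longest matching mount point
    if best is None:
        return ("local", path)
    server, export = MOUNTS[best]
    return (server, export + path[len(best):])


print(resolve("/home/alice/notes.txt"))
print(resolve("/etc/passwd"))
```

The user-visible path never names a server, which is what makes references location transparent once the directory is mounted.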

Semantics of File Sharing


Assume open; reads/writes; close

Consistency Semantics Problem (Fig 13-4): read after write
1 UNIX semantics: the value read is the value stored by the last write. Writes to an open file are visible immediately to others that have this file open at the same time. Easy to implement if there is one server and no cache.
2 Session semantics: writes to an open file by one user are not visible immediately to other users that already have the file open. Once the file is closed, the changes made to it are visible only to sessions started later.
3 Immutable-shared-files semantics: a sharable file cannot be modified. File names cannot be reused and file contents may not be altered. Simple to implement.
4 Transactions: all changes have the all-or-nothing property. With P1 = W1;W2 and P2 = R1;R2, the interleaving W1,R1,R2,W2 is not allowed.
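Session semantics (item 2) can be made concrete with a toy in-memory server in which each open works on a private snapshot. This is a sketch under assumptions: the classes `SessionFileServer` and `Session` are hypothetical, and conflicting concurrent writers are not handled.

```python
class SessionFileServer:
    """Toy in-memory server; each open() starts a session on a private copy."""

    def __init__(self):
        self.files = {}  # file name -> committed contents

    def open(self, name):
        return Session(self, name, self.files.get(name, ""))


class Session:
    def __init__(self, server, name, snapshot):
        self.server, self.name, self.data = server, name, snapshot
        self.dirty = False

    def read(self):
        return self.data

    def write(self, text):
        self.data = text      # visible only inside this session
        self.dirty = True

    def close(self):
        if self.dirty:        # publish changes only when the file is closed
            self.server.files[self.name] = self.data


server = SessionFileServer()
server.files["X"] = "old"
writer = server.open("X")
reader = server.open("X")            # opened before the writer closes
writer.write("new")
seen_while_open = reader.read()      # still the snapshot taken at open
writer.close()
seen_by_old_session = reader.read()  # an already-open session is unaffected
seen_by_new_session = server.open("X").read()
print(seen_while_open, seen_by_old_session, seen_by_new_session)
```

Under UNIX semantics the reader would instead have seen the new contents as soon as the write was issued.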

Stateful versus Stateless Service


Two approaches to server-side information
1 stateful file server
  a client performs open on a file
  the server keeps file information (e.g., file descriptor entry, offset)
  Adv: increased performance
  on server crash, it loses all its volatile state information
  on client crash, the server needs to know so that it can reclaim the state space
2 stateless file server -- each request is self-contained
  each request identifies the file, the position, and read/write
  server failure is identical to a slow server (the client retries...)
  each request must be idempotent
  NFS employs this.
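The stateless approach can be sketched as a server whose requests always name the file and an absolute position, so a retried request has the same effect as the original. The class and method names below are hypothetical and do not reproduce the NFS protocol.

```python
class StatelessFileServer:
    """Keeps file contents only: no open-file table, no per-client offsets."""

    def __init__(self, files):
        self.files = {name: bytearray(data) for name, data in files.items()}

    def read(self, name, offset, count):
        # The request carries everything needed: file, position, length.
        return bytes(self.files[name][offset:offset + count])

    def write(self, name, offset, data):
        # Writing at an absolute offset is idempotent; an append would not be.
        buf = self.files[name]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # pad a gap with zeros
        buf[offset:offset + len(data)] = data
        return len(data)


server = StatelessFileServer({"X": b"hello world"})
first = server.read("X", 6, 5)
retry = server.read("X", 6, 5)   # a retried request returns the same bytes
server.write("X", 0, b"HELLO")
server.write("X", 0, b"HELLO")   # a duplicated write leaves the same contents
final = bytes(server.files["X"])
print(first, retry, final)
```

The client, not the server, remembers the current offset, which is why a server restart looks to the client like a slow response.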

Caching
Four places to store files (Fig. 13-9)

server's disk: slow performance
server caching: in main memory
  cache management issues: how much to cache, replacement strategy
  still slow due to network delay
  used in high-performance web-search engine servers
client caching in main memory
  can be used by diskless workstations
  faster to access from main memory than from disk
  competes with the virtual memory system for physical memory space
  three options (Fig. 13-10)
client cache on a local disk
  large files can be cached
  virtual memory management is simpler
  a workstation can function even when it is disconnected from the network
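The slide names the replacement strategy as a cache management issue without choosing one. As an illustration, here is a minimal sketch of least-recently-used (LRU) replacement, one common choice; the class `LRUBlockCache` and the `fetch` callback are hypothetical.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Holds at most `capacity` blocks; evicts the least recently used one."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch            # called on a miss, e.g. a disk or remote read
        self.blocks = OrderedDict()   # block id -> data, least recently used first
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.fetch(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used
        return data


cache = LRUBlockCache(capacity=2, fetch=lambda block_id: f"data-{block_id}")
for block_id in [1, 2, 1, 3, 1, 2]:
    cache.read(block_id)
print(cache.misses, list(cache.blocks))
```

The same structure applies to a server-side or a client-side cache; only the cost of a miss (disk access versus network round trip) differs.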


A Comparison of Caching and Remote Service


1 Caching reduces remote accesses (especially when locality is exploited), which reduces network traffic and server load.
2 Total network overhead is lower for transmitting big chunks of data (caching) than for a series of responses to specific requests (remote service).
3 Disk access can be optimized better for large requests than for random disk blocks.
4 The cache-consistency problem is the major drawback of caching. If there are frequent writes, the overhead due to the consistency problem is significant.
5 The OS is simpler for remote service.


Cache Consistency

Reflecting changes in a local cache to the master copy
Reflecting changes in the master copy to the local caches

[Figure: Copy 1 -- write --> Master copy -- update --> Copy 2]


Update algorithms for client caching


write-through: all writes are carried out immediately at the server
  Reliable: little information is lost in the event of a client crash
  Slow: the cache is not that useful for writes
delayed-write: delays writing at the server
  possible to perform many writes to a block in the cache before it is written to the server
  if data is written and then deleted soon afterwards, it need not be written at all (20-30% of new data is deleted within 30 secs)
write-on-close: delays writing until the file is closed at the client
  if the file is open for a short duration, works fine
  if the file is open for a long time, susceptible to losing data in the event of a client crash
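The trade-off between write-through and delayed-write can be sketched by counting the write messages that reach the server. All class names here are hypothetical, and `flush` stands in for whatever trigger a real system uses (a timer, or file close for write-on-close).

```python
class Server:
    def __init__(self):
        self.blocks = {}
        self.writes = 0       # number of write messages received

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.writes += 1


class WriteThroughCache:
    def __init__(self, server):
        self.server, self.blocks = server, {}

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.server.write(block_id, data)   # every write goes to the server


class DelayedWriteCache:
    def __init__(self, server):
        self.server, self.blocks, self.dirty = server, {}, set()

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.dirty.add(block_id)            # remember, but do not send yet

    def delete(self, block_id):
        # Simplification: assumes the block was never flushed, so dropping it
        # locally is enough; a real cache would also tell the server.
        self.blocks.pop(block_id, None)
        self.dirty.discard(block_id)

    def flush(self):
        for block_id in sorted(self.dirty):
            self.server.write(block_id, self.blocks[block_id])
        self.dirty.clear()


through_server, delayed_server = Server(), Server()
through, delayed = WriteThroughCache(through_server), DelayedWriteCache(delayed_server)
for version in ("v1", "v2", "v3"):
    through.write(0, version)
    delayed.write(0, version)
delayed.write(1, "temp")
delayed.delete(1)             # short-lived data never reaches the server
delayed.flush()
print(through_server.writes, delayed_server.writes)
```

Anything still marked dirty when the client crashes is lost, which is the reliability cost of delaying writes.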


Cache Coherence

How can consistency be maintained between locally cached data and the master data when the data has been modified by another client?
1 Client-initiated approach -- the client checks the validity of its cached data with the server
  on every access: too much overhead
  on first access to a file (e.g., file open)
  at every fixed time interval
2 Server-initiated approach -- the server records, for each client, the (parts of) files it caches. After the server detects a potential inconsistency, it reacts.
3 Do not allow caching when concurrent-write sharing occurs. Allow many readers. If a client opens a file for writing, inform all clients to purge their cached data.
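The client-initiated approach with a check on file open can be sketched using per-file version numbers. The version counter, the class names, and the methods `get_version` and `fetch` are assumptions for illustration; a real system might compare modification timestamps instead.

```python
class Server:
    def __init__(self):
        self.data, self.version = {}, {}

    def write(self, name, contents):
        self.data[name] = contents
        self.version[name] = self.version.get(name, 0) + 1

    def get_version(self, name):   # cheap validity check
        return self.version[name]

    def fetch(self, name):         # full transfer of the master copy
        return self.data[name], self.version[name]


class CachingClient:
    def __init__(self, server):
        self.server, self.cache, self.fetches = server, {}, 0

    def open(self, name):
        """Validate the cached copy against the server on each open."""
        cached = self.cache.get(name)
        if cached is None or cached[1] != self.server.get_version(name):
            self.cache[name] = self.server.fetch(name)
            self.fetches += 1
        return self.cache[name][0]


server = Server()
server.write("X", "v1")
client = CachingClient(server)
a = client.open("X")      # miss: fetched from the server
b = client.open("X")      # cached copy still valid: no transfer
server.write("X", "v2")   # another client modifies the master copy
c = client.open("X")      # stale copy detected and refetched
print(a, b, c, client.fetches)
```

In the server-initiated approach the roles reverse: the server would keep a list of caching clients and notify them when the version changes.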


Cache consistency, cont.


Potential inconsistency:

  In session semantics, a potential inconsistency arises when a client closes a modified file.
  In UNIX semantics, the server must be notified whenever a file is opened, and the intended mode (read or write) must be indicated for every open. Caching is disabled when a file is opened in conflicting modes.


Replication
Reasons:
  increase reliability
  improve availability
  balance the servers' workload
how to make replication transparent (Fig. 13-12)
how to keep the replicas consistent
Problems -- mainly with updates
  1 a replica is not updated due to its server failure
  2 the network is partitioned
Replication Management:
  1 weighted vote for read and write
  2 current synchronization site for each file group to control access
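The slide only names weighted voting. A minimal sketch of the usual quorum rules follows: with N total votes, a read quorum r and a write quorum w are chosen so that r + w > N (every read quorum overlaps every write quorum) and w > N/2 (two writes cannot both succeed on opposite sides of a partition). The replica names, vote weights, and function names are hypothetical.

```python
def valid_quorums(votes, r, w):
    """Check the two usual weighted-voting constraints."""
    total = sum(votes.values())
    return r + w > total and 2 * w > total


def has_quorum(votes, reachable, needed):
    """True if the reachable replicas together hold at least `needed` votes."""
    return sum(votes[name] for name in reachable) >= needed


votes = {"A": 2, "B": 1, "C": 1}   # hypothetical replicas and vote weights
r, w = 2, 3

print(valid_quorums(votes, r, w))
print(has_quorum(votes, {"A"}, r))        # A alone can serve reads
print(has_quorum(votes, {"B", "C"}, w))   # B and C cannot write without A
```

If the network partitions into {A} and {B, C}, both sides can still read, but neither side alone can write, so the replicas cannot diverge.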


Current research issues


Scalability
Mobile Users
  disconnected operation
  low bandwidth communication
Security

