
Linux Virtual File System

Peter J. Braam

P.J.Braam/CMU -- 1
Aims
• Present the data structures in Linux VFS
• Provide information about flow of control
• Describe methods and invariants needed to
implement a new file system
• Illustrate with some examples

History

• BSD implemented VFS for NFS: aim dispatch to
different filesystems
• VMS had elaborate filesystem
• NT/Win95 have VFS type interfaces
• Newer systems integrate VM with buffer cache.

[Figure: File access]

Linux Filesystems
• Media based
– ext2 - Linux native
– ufs - BSD
– fat - DOS FS
– vfat - win 95
– hpfs - OS/2
– minix - well….
– Isofs - CDROM
– sysv - Sysv Unix
– hfs - Macintosh
– affs - Amiga Fast FS
– NTFS - NT’s FS
– adfs - Acorn-strongarm
• Network
– nfs
– Coda
– AFS - Andrew FS
– smbfs - LanManager
– ncpfs - Novell
• Special ones
– procfs - /proc
– umsdos - Unix in DOS
– userfs - redirector to user

Linux Filesystems (ctd)
• Forthcoming:
– devfs - device file system
– DFS - DCE distributed FS
• Varia:
– cfs - crypt filesystem
– cfs - cache filesystem
– ftpfs - ftp filesystem
– mailfs - mail filesystem
– pgfs - Postgres versioning file system
• Linux serves (unrelated to the VFS!)
– NFS - user & kernel
– Coda
– AppleShare - netatalk/CAP
– SMB - samba
– NCP - Novell

Usefulness

Linux is Obsolete

Andrew Tanenbaum

Linux VFS
• Multiple interfaces build up VFS:
– files
– dentries
– inodes
– superblock
– quota
• VFS can do all caching &
provides utility functions to FS
• FS provides methods to
VFS; many are optional

[Figure: File access]

User level file access
• Typical user level types and code:
– pathnames: “/myfile”
– file descriptors: fd = open(“/myfile”…)
– attributes in struct stat: stat(“/myfile”, &mybuf),
chmod, chown...
– offsets: write, read, lseek
– directory handles: DIR *dh = opendir(“/mydir”)
– directory entries: struct dirent *ent = readdir(dh)
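
The user level types listed above can be seen together in a small POSIX program fragment. This is only a sketch: the helper names `file_size` and `count_dir_entries` are made up for illustration, and error handling is reduced to returning -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size in bytes of the regular file at `path`,
 * or -1 on any error. */
long long file_size(const char *path)
{
    int fd = open(path, O_RDONLY);          /* pathname -> file descriptor */
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) { /* attributes */
        close(fd);
        return -1;
    }

    off_t end = lseek(fd, 0, SEEK_END);     /* offset of the end of file */
    close(fd);
    return end < 0 ? -1 : (long long)end;
}

/* Hypothetical helper: number of entries readdir() reports for the
 * directory at `path`, or -1 if it cannot be opened. */
int count_dir_entries(const char *path)
{
    DIR *dh = opendir(path);                /* directory handle */
    if (dh == NULL)
        return -1;

    int n = 0;
    struct dirent *ent;
    while ((ent = readdir(dh)) != NULL)     /* one directory entry per call */
        n++;

    closedir(dh);
    return n;
}
```

Calling `file_size("/myfile")` exercises open, fstat and lseek; `count_dir_entries("/mydir")` exercises opendir and readdir. Each of these calls ends up as a request to the VFS.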

VFS
• Manages kernel level file abstractions in one
format for all file systems
• Receives system call requests from user level (e.g.
write, open, stat, link)
• Interacts with a specific file system based on
mount point traversal
• Receives requests from other parts of the kernel,
mostly from memory management

File system level
• Individual File Systems
– responsible for managing file & directory data
– responsible for managing meta-data: timestamps,
owners, protection etc
– translates data between
• particular FS data: e.g. disk data, NFS data,
Coda/AFS data
• VFS data: attributes etc in standard format
– e.g. nfs_getattr(….) returns attributes in VFS format,
acquires attributes in NFS format to do so.
Anatomy of stat system call
sys_stat(path, buf) {
    /* Establish VFS data */
    dentry = namei(path);
    if ( dentry == NULL ) return -ENOENT;
    inode = dentry->d_inode;

    /* Call into inode layer of filesystem */
    rc = inode->i_op->i_permission(inode);
    if ( rc ) { dput(dentry); return -EPERM; }

    /* Call into inode layer of filesystem */
    rc = inode->i_op->i_getattr(inode, buf);
    dput(dentry);
    return rc;
}
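
The dispatch through `i_op` can be tried out in user space. The following is a toy model, not kernel code: every `toy_` and `blockfs_` name is invented for illustration, the namei step is left out, and the permission method is treated as optional on the assumption that a missing method means "no extra check".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Attributes in the common "VFS format" (toy stand-in for struct stat). */
struct toy_attr { long size; unsigned mode; };

struct toy_inode;

/* Methods a filesystem hands to the VFS; NULL means "not provided". */
struct toy_inode_ops {
    int (*permission)(struct toy_inode *inode);
    int (*getattr)(struct toy_inode *inode, struct toy_attr *out);
};

struct toy_inode {
    const struct toy_inode_ops *i_op;
    long blocks;      /* FS private: size kept in 512-byte blocks */
    unsigned mode;
};

/* Hypothetical FS method: translate private data into VFS format. */
int blockfs_getattr(struct toy_inode *inode, struct toy_attr *out)
{
    out->size = inode->blocks * 512;
    out->mode = inode->mode;
    return 0;
}

/* This toy filesystem supplies getattr only. */
const struct toy_inode_ops blockfs_ops = { NULL, blockfs_getattr };

/* VFS side: same control flow as the stat outline, minus namei. */
int toy_stat(struct toy_inode *inode, struct toy_attr *buf)
{
    if (inode == NULL)
        return -ENOENT;

    if (inode->i_op->permission != NULL &&
        inode->i_op->permission(inode) != 0)
        return -EPERM;

    if (inode->i_op->getattr == NULL)
        return -ENOSYS;
    return inode->i_op->getattr(inode, buf);
}
```

The VFS function never looks at `blocks`; only the filesystem's own method knows how to turn it into a size.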
Anatomy of fstatfs system call

sys_fstatfs(fd, buf) {    /* for things like “df” */
    /* Translate fd to VFS data structure */
    file = fget(fd);
    if ( file == NULL ) return -EBADF;

    superb = file->f_dentry->d_inode->i_super;

    /* Call into superblock layer of filesystem */
    rc = superb->sb_op->sb_statfs(superb, buf);
    fput(file);
    return rc;
}
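
From user level, the same path can be exercised with the POSIX call `fstatvfs`, which is used here in place of `fstatfs` for portability. A minimal sketch, assuming a hypothetical helper name `fs_bytes_available`:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Hypothetical helper: store in *avail the number of bytes available to
 * unprivileged users on the filesystem that holds `path`.
 * Returns 0 on success, -1 on error. */
int fs_bytes_available(const char *path, unsigned long long *avail)
{
    int fd = open(path, O_RDONLY);   /* any open file on the FS will do */
    if (fd < 0)
        return -1;

    struct statvfs sv;
    int rc = fstatvfs(fd, &sv);      /* fd -> file -> superblock -> statfs */
    close(fd);
    if (rc != 0)
        return -1;

    /* f_bavail counts free blocks in units of f_frsize bytes. */
    *avail = (unsigned long long)sv.f_bavail * sv.f_frsize;
    return 0;
}
```

This is roughly the per-filesystem figure that a tool like df reports as available space.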

Data structures
• VFS data structures for:
– VFS handle to the file: inode (BSD: vnode)
– User instantiated file handle: file (BSD: file)
– The whole filesystem: superblock (BSD: vfs)
– A name to inode translation: dentry

Shorthand method notation
• super block methods: sss_methodname
• inode methods: iii_methodname
• dentry methods: ddd_methodname
• file methods: fff_methodname

• instead of inode->i_op->lookup we write iii_lookup
namei
VFS functions; the FS methods they call are marked /* FS */:

struct dentry *namei(parent, name) {
    if (dentry = d_lookup(parent, name))   /* FS: ddd_hash(parent, name) */
        ddd_revalidate(dentry)              /* FS */
    else
        iii_lookup(parent, name)            /* FS */
}

struct inode *iget(ino, dev) {
    /* try cache else .. */
    sss_read_inode(…)                       /* FS */
}

Superblocks
• Handle metadata only (attributes etc)
• Responsible for retrieving and storing
metadata from the FS media or peers
• Struct superblocks hold things like:
– device, blocksize, dirty flags, list of dirty inodes
– super operations
– wait queue
– pointer to the root inode of this FS
Super Operations (sss_)
• Ops on Inodes:
– read_inode
– put_inode
– write_inode
– delete_inode
– clear_inode
– notify_change
• Superblock manips:
– read_super (mount)
– put_super (unmount)
– write_super (unmount)
– statfs (attributes)

Inodes
• Inodes are VFS abstraction for the file
• Inode has operations (iii_methods)
• VFS maintains an inode cache, NOT the
individual FS’s (compare NT, BSD etc)
• Inodes contain an FS specific area where:
– ext2 stores disk block numbers etc
– AFS would store the FID
• Extraordinary inode ops are good for
dealing with stale NFS file handles etc.
What’s inside an inode - 1
Caching:
list_head i_hash
list_head i_list
list_head i_dentry
int i_count

Identifies file:
long i_ino
int i_dev

Usual stuff:
{m,a,c}time
{u,g}id
mode
size
n_link

What’s inside an inode -2
Which FS:
superblock i_sb
inode_ops i_op

For mmap, networking, waiting:
wait objects, semaphore
lock
vm_area_struct
pipe/socket info
page information

FS specific info (blockno’s, fids etc):
union {
ext2fs_inode_info i_ext2
nfs_inode_info i_nfs
coda_inode_info i_coda
..} u
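
The idea of a common header followed by an FS specific union can be written out as a small C declaration. The field layouts below are illustrative assumptions, not the kernel's definitions; all `toy_` names are invented.

```c
#include <assert.h>
#include <string.h>

/* Illustrative private areas; the real structures hold much more. */
struct toy_ext2_info { unsigned long block[4]; };               /* block numbers */
struct toy_coda_info { unsigned long volume, vnode, unique; };  /* a file id */

enum toy_fs_type { TOY_EXT2, TOY_CODA };

struct toy_inode {
    unsigned long i_ino;      /* identifies the file */
    int i_count;              /* use count */
    enum toy_fs_type fs;      /* stands in for i_sb / i_op */
    union {                   /* one slot, interpreted per filesystem */
        struct toy_ext2_info i_ext2;
        struct toy_coda_info i_coda;
    } u;
};

/* Return the first disk block of an ext2-style inode, or 0 when the
 * inode belongs to a filesystem that has no disk blocks. */
unsigned long toy_first_block(const struct toy_inode *inode)
{
    return inode->fs == TOY_EXT2 ? inode->u.i_ext2.block[0] : 0;
}
```

Because the private areas share one union, the inode is as large as its biggest member, and only code that knows which filesystem owns the inode may interpret `u`.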
Inode state
• Inode can be on one or two lists:
– (hash & in_use) or (hash & dirty) or unused
– inode has a use count i_count
• Transitions
– unused → hash: iget calls sss_read_inode
– dirty → in_use: sss_write_inode
– hash → unused: call on sss_clear_inode, but if
i_nlink = 0: iput calls sss_delete_inode when
i_count falls to 0
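
These transitions can be modelled as a small state machine. The sketch below is a user-space toy with invented `toy_` names; the super operations are replaced by call counters, and writing a dirty inode back before clearing it is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_UNUSED, TOY_IN_USE, TOY_DIRTY };

struct toy_inode {
    enum toy_state state;
    int i_count;              /* use count */
    int i_nlink;              /* names still referring to the file */
};

/* Counters standing in for the filesystem's super operations. */
struct toy_super_calls {
    int read_inode, write_inode, delete_inode, clear_inode;
};

/* unused -> hash: the first user triggers sss_read_inode. */
void toy_iget(struct toy_inode *inode, struct toy_super_calls *fs)
{
    if (inode->state == TOY_UNUSED) {
        fs->read_inode++;
        inode->state = TOY_IN_USE;
    }
    inode->i_count++;
}

void toy_mark_dirty(struct toy_inode *inode)
{
    if (inode->state == TOY_IN_USE)
        inode->state = TOY_DIRTY;
}

/* dirty -> in_use: sss_write_inode. */
void toy_sync(struct toy_inode *inode, struct toy_super_calls *fs)
{
    if (inode->state == TOY_DIRTY) {
        fs->write_inode++;
        inode->state = TOY_IN_USE;
    }
}

/* Drop one reference; the last one deletes a file with no names left. */
void toy_iput(struct toy_inode *inode, struct toy_super_calls *fs)
{
    assert(inode->i_count > 0);
    if (--inode->i_count > 0)
        return;
    if (inode->i_nlink == 0) {
        fs->delete_inode++;
        inode->state = TOY_UNUSED;
    }
    /* otherwise the inode stays cached until toy_free reclaims it */
}

/* hash -> unused: reclaim a cached inode nobody uses. Returns 1 if freed. */
int toy_free(struct toy_inode *inode, struct toy_super_calls *fs)
{
    if (inode->state == TOY_UNUSED || inode->i_count > 0)
        return 0;
    toy_sync(inode, fs);      /* assumption: write back before clearing */
    fs->clear_inode++;
    inode->state = TOY_UNUSED;
    return 1;
}
```

Two `toy_iget` calls on the same inode read it from the filesystem once; only the matching second `toy_iput` makes it eligible for `toy_free` or, with `i_nlink` at 0, for deletion.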
Inode Cache
Players:
1. iget: if i_count>0 ++
2. iput: if i_count>1 --
3. free_inodes
4. syncing inodes

[Figure: inode cache. Boxes: inode_hashtable, unused inodes, used inodes,
dirty inodes (media fs only, mark_inode_dirty), FS storage.
Labelled transitions: sss_read_inode (iget);
sss_clear_inode (freeing inos) or sss_delete_inode (iput);
sss_write_inode (sync one)]

Sales
Red Hat Software sold 240,000 copies of Red Hat
Linux in 1997 and expects to reach 400,000 in
1998.

Estimates of installed servers (InfoWorld):


- Linux: 7 million
- OS/2: 5 million
- Macintosh: 1 million

Inode operations (iii_)
• lookup: return inode
– calls iget
• creation/removal
– create
– link
– unlink
– symlink
– mkdir
– rmdir
– mknod
– rename
• symbolic links
– readlink
– follow link
• pages
– readpage, writepage, updatepage - read or write
page. Generic for mediafs.
– bmap - return disk block number of logical block
• special operations
– revalidate - see dentry sect
– truncate
– permission

Dentry world
• Dentry is a name to inode translation structure
• Cached aggressively by VFS
• Eliminates lookups by FS & private caches
– timing on Coda FS: ls -lR 1000 files after priming cache
• linux 2.0.32: 7.2 secs
• linux 2.1.92: 0.6 secs
– disk fs: less benefit, NFS even more
• Negative entries!
• Namei is dramatically simplified

Inside dentries
• name
• pointer to inode
• pointer to parent dentry
• list head of children
• chains for lots of lists
• use count

Dentry associated lists
[Figure: two diagrams of inodes and dentries. Legend: inode, dentry.]

dentry inode relationship
– inode i_dentry list head
– d_inode pointer
– d_alias chains
– place: d_instantiate
– remove: dentry_iput

dentry tree relationship
– inode i_dentry list head
– d_parent pointer
– d_child chains
– place: d_alloc
– remove: d_prune, d_invalidate, d_put

Dcache
• namei tries cache: d_lookup
– ddd_compare
• Success: ddd_revalidate
– d_invalidate if fails
– proceed if success
• Failure: iii_lookup
– find inode
– iget
• sss_read_inode
– finish:
• d_add
– can give negative entry in dcache

[Figure: dcache. dentry_hashtable (d_hash chains) with list heads indexed by
dhash(parent, name); unused dentries (d_lru chains).
Labelled transitions: namei, iii_lookup, d_add, prune, d_invalidate, d_drop]
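
The cache-then-filesystem flow, including negative entries, can be sketched in user space. This toy uses a linear scan instead of hash chains, omits revalidation, and all names (`toy_namei`, `toy_fs_lookup` and so on) are invented; the tiny directory contents are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_DCACHE_SLOTS 8

struct toy_dentry {
    int used;
    int parent;           /* inode number of the parent directory */
    char name[32];
    long ino;             /* 0 marks a negative entry */
};

struct toy_dcache {
    struct toy_dentry slot[TOY_DCACHE_SLOTS];
    int fs_lookups;       /* how often the FS had to be asked */
};

/* Hypothetical FS method (iii_lookup): the root directory, inode 1,
 * contains only "etc" and "bin". Returns 0 when the name is absent. */
long toy_fs_lookup(int parent, const char *name)
{
    if (parent != 1)
        return 0;
    if (strcmp(name, "etc") == 0)
        return 10;
    if (strcmp(name, "bin") == 0)
        return 11;
    return 0;
}

/* d_lookup stand-in: scan the cache for (parent, name). */
struct toy_dentry *toy_d_lookup(struct toy_dcache *dc, int parent,
                                const char *name)
{
    for (int i = 0; i < TOY_DCACHE_SLOTS; i++) {
        struct toy_dentry *d = &dc->slot[i];
        if (d->used && d->parent == parent && strcmp(d->name, name) == 0)
            return d;
    }
    return NULL;
}

/* namei for one path component: inode number, or 0 if the name is absent. */
long toy_namei(struct toy_dcache *dc, int parent, const char *name)
{
    struct toy_dentry *d = toy_d_lookup(dc, parent, name);
    if (d != NULL)
        return d->ino;                   /* cache hit, possibly negative */

    dc->fs_lookups++;                    /* cache miss: ask the FS */
    long ino = toy_fs_lookup(parent, name);

    for (int i = 0; i < TOY_DCACHE_SLOTS; i++) {   /* d_add stand-in */
        d = &dc->slot[i];
        if (!d->used && strlen(name) < sizeof d->name) {
            d->used = 1;
            d->parent = parent;
            d->ino = ino;                /* ino == 0 caches the miss */
            strcpy(d->name, name);
            break;
        }
    }
    return ino;
}
```

Looking up a missing name twice asks the filesystem only once: the second lookup is answered by the negative entry.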
Dentry methods
• ddd_revalidate: can force new lookup
• ddd_hash: compute hash value of name
• ddd_compare: are names equal?
• ddd_delete, ddd_put, ddd_iput: FS cleanup
opportunity

Dentry particulars:
• ddd_hash and ddd_compare have to deal
with extraordinary cases for msdos/vfat:
– case insensitive
– long and short filename pleasantries
• ddd_revalidate -- can force new lookup if
inode not in use:
– used for NFS/SMBfs aging
– used for Coda/AFS callbacks
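
For the case insensitive part, a hash and a compare method must agree: names that compare equal have to hash to the same value. A minimal sketch for ASCII names follows; the function names are invented, long/short filename handling is left out, and returning 0 for "equal" is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>

/* ddd_hash stand-in: hash the lower-cased name so that "A.TXT" and
 * "a.txt" land on the same hash chain. */
unsigned long toy_ci_hash(const char *name)
{
    unsigned long h = 5381;
    for (; *name != '\0'; name++)
        h = h * 33 + (unsigned long)tolower((unsigned char)*name);
    return h;
}

/* ddd_compare stand-in: 0 when the names are equal ignoring ASCII case,
 * nonzero otherwise. */
int toy_ci_compare(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 1;
        a++;
        b++;
    }
    return *a != *b;      /* equal only if both strings ended together */
}
```

With these two methods, a cached entry created under one spelling is found again under any other spelling of the same name.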
Style

Dijkstra probably hates me

Linus Torvalds

Memory mapping
• vm_area structure has
– vm_operations
– inode, addresses etc.
• vm_operations
– map, unmap
– swapin, swapout
– nopage -- read when page isn’t in VM
• mmap
– calls on iii_readpage
– keeps a use count on the inode until unmap
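
Seen from user level, this machinery is what serves an ordinary `mmap` of a file. The sketch below uses only POSIX calls; the helper name `count_newlines_mmap` is invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count newline bytes in the regular file at `path`
 * by mapping it into memory. Returns -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {            /* mmap rejects a zero length */
        close(fd);
        return 0;
    }

    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;

    long n = 0;
    for (off_t i = 0; i < st.st_size; i++)   /* first touch faults a page in */
        if (p[i] == '\n')
            n++;

    munmap(p, (size_t)st.st_size);    /* unmap releases the file */
    return n;
}
```

No read call appears in the loop: pages are brought in on demand when they are first touched, and closing the descriptor early is safe because the mapping itself keeps the file in use until `munmap`.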
