
ZFS Internal Structure

Ulrich Gräf, Senior SE, Sun Microsystems

ZFS: Filesystem of a New Generation


- Integrated Volume Manager
- Transactions for every change on the disk
- Checksums for everything
- Self Healing
- Simplified Administration
  - Also accelerated
  - Changes online
- Performance through control of the datapath

Everything new? No! But new in this combination!

Another explanation of why to use ZFS


Current Trends in Datacenters
- Larger filesystems
- Data lives longer on disks
- Backup devices are sufficient
- Enough devices for restore: expensive
- Backups are complemented by copies on disk
- Copies on disks are more vulnerable to failures

ZFS and failures

ZFS can correct structural errors caused by
- Bit errors (1 sector in 10^16 reads)
- Errors caused by mis-positioning
  - Phantom writes
  - Misdirected reads
  - Misdirected writes
- DMA parity errors
- Bugs in software and firmware
- Administration errors

ZFS Self Healing

Elements:
- Integrated Volume Manager
- (Large!) Checksums inside of Block Pointer

How does it work?
- Read a block determined by Block Pointer
- Create a checksum
- Compare it with the checksum in the Block Pointer
- On error: use/compute block (redundancy)

Structural Integrity (remember: Star Trek)
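The read path above can be sketched in a few lines. This is a minimal illustration only, not ZFS code: the mirror is modelled as a list of dictionaries, SHA-256 stands in for the checksum, and the names `checksum` and `self_healing_read` are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in checksum for this sketch (assumption: SHA-256 for brevity).
    return hashlib.sha256(data).digest()


def self_healing_read(mirror, block_no, expected):
    """Read one block from a mirror, verify it, and repair bad copies.

    mirror   : list of dicts mapping block number -> bytes (one per side)
    expected : checksum stored in the parent block pointer
    """
    bad_sides = []
    for side, disk in enumerate(mirror):
        data = disk[block_no]
        if checksum(data) == expected:
            # Good copy found: rewrite every copy that failed before it.
            for bad in bad_sides:
                mirror[bad][block_no] = data
            return data
        bad_sides.append(side)
    raise IOError("no copy of block %d matches its checksum" % block_no)
```

Because the checksum is kept in the parent block pointer and not next to the data, a block that was written to the wrong place or never written at all still fails verification.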

ZFS Self Healing

- Is different from other filesystems
- Is a quality not available from other filesystems
- Is only possible when combining
  - Integrated Volume Manager
  - Redundant Setup
  - Large Checksums
- Is not available on Reiser, ext3/ext4, WAFL, xfs
- Will be available on btrfs, when it is finished (but not all other ZFS features)

ZFS Self Healing

[Figure: three panels, each showing an Application on top of a ZFS mirror]

ZFS Structure

- Uberblock
- Tree with Block Pointers
- Data only in leaves

ZFS Structure: vdev

A ZFS pool (zpool) is built from
- Whole disks
- Disk partitions
- Files

These are called physical vdevs.

ZFS Structure: Configuration

Configuration can be
- Single device
- Mirrored (mirror)
- RAID-5/RAID-6 (raidz, raidz2)
- Recently: raidz3 (raidzn is in planning)

ZFS: physical vdev

Each physical vdev contains
- 4 vdev labels (256 KB each)
  - 2 labels at the beginning
  - 2 labels at the end
- A 3.5 MB hole for boot code
- 128 KB blocks for data of the zpool
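The layout arithmetic can be written out as a small sketch. It uses only the sizes given above and assumes the device size is a multiple of 256 KB; the function name `vdev_layout` and the returned keys are hypothetical.

```python
LABEL_SIZE = 256 * 1024        # each vdev label is 256 KB
BOOT_HOLE_SIZE = 3584 * 1024   # 3.5 MB hole for boot code


def vdev_layout(device_size: int) -> dict:
    """Return byte offsets of the labels and the data area of one vdev.

    Assumption: device_size is a multiple of LABEL_SIZE.
    """
    front = [0, LABEL_SIZE]                               # 2 labels at the beginning
    back = [device_size - 2 * LABEL_SIZE,
            device_size - LABEL_SIZE]                     # 2 labels at the end
    boot_start = 2 * LABEL_SIZE
    data_start = boot_start + BOOT_HOLE_SIZE              # 512 KB + 3.5 MB = 4 MB
    return {
        "labels": front + back,
        "boot_hole": (boot_start, data_start),
        "data": (data_start, back[0]),
    }
```

With these numbers the data area of every physical vdev starts 4 MB into the device.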

[Figure: layout of a physical vdev with two labels (L) at the beginning and two at the end]

ZFS: vdev label

A vdev label contains 3 parts
- Gap (avoid conflicts with disk labels)
- nvlist (name value pair list) (128 KB)
  - Attributes of the zpool
  - Including the configuration of the zpool
- Uberblock array (128 entries, each 1 KB)

Configuration also defines logical vdevs
- mirror or raidz, log and cache devices

ZFS: nvlist in a vdev label (1)

    $ zdb -v -v data
        version=4
        name='data'
        state=0
        txg=162882
        pool_guid=1442865571463645041
        hostid=13464466
        hostname='nunzio'
        vdev_tree
        ...

ZFS: nvlist in a vdev label (2)

        vdev_tree
            type='root'
            id=0
            guid=1442865571463645041
            children[0]
                type='disk'
                id=0
                guid=15247716718277951357
                path='/dev/dsk/c1t0d0s7'
                devid='id1,sd@SATA_____SAMSUNG_HM251JJ_______S1J...
                phys_path='/pci@0,0/pci1179,1@1f,2/disk@0,0:h'
                whole_disk=0
                metaslab_array=14
                metaslab_shift=27
                ashift=9
                asize=25707413504
                is_log=0

ZFS: uberblock

Verification
- Magic number (0x00bab10c), also used to detect endianness
- Version
- Transaction Group number
- Time-stamp
- Checksum

Content:
- Pointer to the root of the zpool tree
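A sketch of how these verification fields could be used to pick the active uberblock out of the array in a label. The `Uberblock` class, the `checksum_ok` flag, and both function names are hypothetical. Choosing the valid entry with the highest transaction group number, with the time-stamp as tie-breaker, is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

UBERBLOCK_MAGIC = 0x00BAB10C


@dataclass
class Uberblock:
    magic: int
    version: int
    txg: int
    timestamp: int
    checksum_ok: bool  # hypothetical: result of verifying the stored checksum


def byte_order(raw_magic: bytes) -> str:
    """Guess the writer's byte order from the 8 raw bytes of the magic field."""
    for order in ("little", "big"):
        if int.from_bytes(raw_magic, order) == UBERBLOCK_MAGIC:
            return order
    raise ValueError("magic number not found: not an uberblock")


def active_uberblock(slots: Sequence[Optional[Uberblock]]) -> Optional[Uberblock]:
    """Return the valid uberblock with the highest txg, or None."""
    valid = [ub for ub in slots
             if ub is not None and ub.magic == UBERBLOCK_MAGIC and ub.checksum_ok]
    if not valid:
        return None
    return max(valid, key=lambda ub: (ub.txg, ub.timestamp))
```

An entry that was only partly written fails its checksum and is skipped, so the previous consistent state is selected.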

ZFS: uberblock: Example

    $ zdb -v -v data
    ...
    Uberblock
        magic = 0000000000bab10c
        version = 4
        txg = 262711
        guid_sum = 16690582289741596398
        timestamp = 1256864671 UTC = Fri Oct 23 12:04:31 2009
        rootbp = ...

ZFS: block pointer

Data virtual address (1, 2 or 3 dva)
- Points to another block
- References a vdev number defined in the configuration
- Contains the number of the block in the vdev
- Grid information (for raidz)
- Gang bit ("gang chaining" of smaller blocks)

Type and size of block (logical, allocated)
Compression information (type, size)
Transaction group number
Checksum of the block that the dva points to
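One of the checksums ZFS can store in a block pointer is fletcher4. The sketch below shows the usual form of that algorithm: four running 64-bit sums over 32-bit words. Little-endian input is an assumption here, and the function is an illustration, not the ZFS source.

```python
import struct

MASK64 = (1 << 64) - 1


def fletcher4(data: bytes) -> tuple:
    """Return the four 64-bit fletcher4 sums of data (length must be a multiple of 4)."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):  # little-endian 32-bit words
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d
```

The four sums together give a 256-bit value, which is why a checksum shows up as four colon-separated numbers.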

ZFS: block pointer: Example

    rootbp = [L0 DMU objset] 400L/200P
        DVA[0]=<0:5c8087800:200>
        DVA[1]=<0:4c81a2a00:200>
        DVA[2]=<0:3d002ca00:200>
        fletcher4 lzjb LE Contiguous birth=262711 Fill=324
        cksum=914be711d:3ab1cae4571:c07d93434c9b:1ab1618a08eccd
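To read such a line programmatically, the DVA fields can be pulled out with a regular expression. This sketch assumes each DVA is printed as `<vdev:offset:size>` with the vdev number in decimal and offset and size in hexadecimal; `parse_dvas` is a hypothetical helper, not part of zdb.

```python
import re

# Assumption: DVA[n]=<vdev:offset:size>, vdev decimal, offset and size hex.
DVA_RE = re.compile(r"DVA\[(\d+)\]=<(\d+):([0-9a-fA-F]+):([0-9a-fA-F]+)>")


def parse_dvas(text: str) -> list:
    """Extract every DVA found in one block pointer dump."""
    return [
        {
            "index": int(index),
            "vdev": int(vdev),
            "offset": int(offset, 16),
            "size": int(size, 16),
        }
        for index, vdev, offset, size in DVA_RE.findall(text)
    ]
```

Applied to the rootbp above, this yields three copies of the same 0x200-byte (512-byte) block, all on vdev 0 at different offsets.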

ZFS: some block pointers in a zpool

[Figure: block pointers in a zpool]

ZFS: Transactions

1. Starting at a consistent structure
2. Blocks may be changed by programs
   - Only prepared in main memory
   - Blocks are never overwritten on disk
3. Transaction is prepared
   - Structure is completed up to the root block
   - Blocks are written to vdevs
   - Only free blocks are used
4. Transaction is committed
   - The next uberblock slot is written

ZFS: Transaction
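The four transaction steps can be imitated with a toy copy-on-write pool. Everything here is a simplification for illustration: the class `Pool`, its one-level "tree" (a root block mapping names to data blocks), and the slot choice `txg % slots` are assumptions of this sketch, not the ZFS implementation.

```python
class Pool:
    def __init__(self, slots: int = 4):
        self.disk = {}                     # block number -> content, written once
        self.next_free = 0                 # only free blocks are ever used
        self.uberblocks = [None] * slots   # each entry: (txg, root block number)
        self.txg = 0

    def _alloc(self, content):
        block_no = self.next_free
        self.next_free += 1
        self.disk[block_no] = content      # never overwrites an existing block
        return block_no

    def root(self):
        live = [ub for ub in self.uberblocks if ub is not None]
        return max(live)[1] if live else None   # entry with the highest txg

    def commit(self, changes: dict) -> int:
        """changes (name -> bytes) were prepared in memory; make them durable."""
        old_root = self.root()
        table = dict(self.disk[old_root]) if old_root is not None else {}
        for name, data in changes.items():       # write new data blocks
            table[name] = self._alloc(data)
        new_root = self._alloc(table)             # complete structure up to the root
        self.txg += 1                             # commit: write next uberblock slot
        self.uberblocks[self.txg % len(self.uberblocks)] = (self.txg, new_root)
        return new_root

    def read(self, name: str, root=None) -> bytes:
        root = self.root() if root is None else root
        return self.disk[self.disk[root][name]]
```

Until the uberblock slot is written, the old root still describes a complete, consistent tree, so an interrupted transaction leaves nothing to repair.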

ZFS DMU Objects

All data in a zpool is structured in objects
- dnode defines an object
  - Type and size, indirection depth
  - List of block pointers
  - Bonus buffer (e.g. for standard file attributes)

DMU object set
- Object that contains an array of dnodes
- Uberblock: points to the Meta Object Set

ZFS: Object Structure

ZFS: Intent Log

- Stores all synchronously written data
- Uses unallocated blocks
- Is rooted in the Object Set

ZFS: Dataset and Snapshot Layer

DSL: Dataset and Snapshot Layer
- Filesystems
- Snapshots, clones
- ZFS volumes

Meta Object Set contains Object Set and
- Number of DSL directory (ZAP object)
- Copy of the vdev configuration
- Block pointers to be freed

ZFS: DSL Structure

ZFS hierarchical names
- Child Dataset Entries in the DSL Directory
- Each Child has its own DSL Directory

DSL Dataset
- Implemented by a DMU dnode

Snapshots and Clones
- Linked List rooted at the DSL Dataset

ZFS: DSL Structure

ZFS Attribute Processor

ZAP: ZFS Attribute Processor
- Name / value pairs
- Hash table with overflow lists
- Used for
  - Directories
  - ZFS hierarchical names
  - ZFS attributes

ZFS microZAP / FatZAP

microZAP
- One block (up to 128k)
- Simple Attributes (64 bit number)
- Name length limited (50 bytes)

FatZAP
- Object
- Hash into Pointer Table
- Pointers go to Name/Value storage
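The microZAP limits can be turned into a small check that decides whether a set of name/value pairs still fits. This is a sketch under stated assumptions: the 64-byte entry size is not given on the slide, the 50-byte limit is applied to the UTF-8 encoded name, and `fits_microzap` is a hypothetical name.

```python
MICROZAP_MAX_BLOCK = 128 * 1024   # one block, up to 128k
MICROZAP_MAX_NAME = 50            # name length limit in bytes
MICROZAP_ENTRY_SIZE = 64          # assumption: bytes used per entry


def fits_microzap(entries: dict) -> bool:
    """Return True if every entry is simple enough for a microZAP."""
    for name, value in entries.items():
        if len(name.encode("utf-8")) > MICROZAP_MAX_NAME:
            return False                      # name too long
        if isinstance(value, bool) or not isinstance(value, int):
            return False                      # only plain numbers allowed
        if not 0 <= value < 2 ** 64:
            return False                      # must fit in 64 bits
    return len(entries) * MICROZAP_ENTRY_SIZE <= MICROZAP_MAX_BLOCK
```

Anything that fails this check would need the FatZAP form with its pointer table.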

ZFS Posix Layer / Volume

ZFS Posix Layer
- Implements a Posix filesystem with objects
- Directories are ZAP objects
- Files are DMU objects
- Additional: Delete Queue

ZFS Volume
- Only one object in the DSL Object set: the Volume

ZFS: Misc

- Data is compressed when specified
- Metadata is compressed by default
  - All internal nodes
  - ZAP
  - DSL Directories, DSL Datasets

Copies are implemented with DVA in BP
- Zpool data is stored in 3 copies
- ZFS data is stored in 2 copies
- Data can be stored in up to 3 copies

ZFS Internal Structure

Questions?
