
Lawrence Livermore National Laboratory

ZFS on Linux for Lustre LUG11


April 13, 2011

Brian Behlendorf

Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551
This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344

LLNL-PRES-479831

ZFS/Lustre History

2007
Livermore raises ldiskfs scalability/performance concerns
  - Fsck, filesystem size, random IO, data integrity, etc.
  - Alternate backend is needed for large Lustre filesystems
ZFS identified as technically the best solution
  - Addresses all known ldiskfs limitations
  - Proven production quality implementation
  - Licensing concerns can be addressed
  - Must be ported to Linux
CFS/Sun start ZFS/Lustre user space implementation


ZFS/Lustre History

2008
Livermore starts porting ZFS to the kernel
  - Intended to determine viability of a kernel port
  - No unsurmountable technical issues discovered
  - Initial performance results are encouraging
Sun Lustre-osd development
  - Shift in strategy, the Livermore kernel port is adopted
  - Brian joins the Sun Lustre-osd development team
  - Continued Lustre-osd development
Licensing concerns unresolved... work continues...


ZFS/Lustre History

2009
Livermore ZFS development
  - Focus on a production quality ZFS port
  - Built quarter scale prototype ZFS/Lustre filesystem
Sun/Oracle Lustre-osd development
  - Oracle acquires Sun
  - Lustre-osd development continues unchanged
  - Zerocopy, grants, large dnodes, quotas, utilities, etc.
Licensing concerns unresolved... work continues...


ZFS/Lustre History

2010
Livermore ZFS development
  - Linux integration (utilities, udev, zevents, disk failures)
  - Built a full scale ZFS/Lustre filesystem
Oracle Lustre-osd development
  - Announced ZFS/Lustre only available for Solaris
  - Lustre-osd development continues on Linux
  - Oracle cancels Lustre... progress is delayed...
Licensing concerns unresolved... work continues at LLNL...


ZFS/Lustre History

2011
Livermore ZFS development
  - ZFS Posix Layer (ZPL) added
  - Lustre-osd development branch publicly available
Whamcloud Lustre-osd development
  - Contracted by Livermore to complete Lustre-osd
  - Most of the original Lustre-osd developers are at Whamcloud
Licensing concerns unresolved... work continues...

Late 2011
Livermore plans a ZFS/Lustre filesystem for Sequoia
  - 50 PB capacity, 512 GB/s to 1 TB/s bandwidth

ZFS Overview

  - Developed by Sun (now Oracle) on Solaris
  - Combined filesystem, logical volume manager, RAID
  - Copy-on-write
  - Built-in data integrity
  - Intelligent online scrubbing and resilvering
  - Very large filesystem limits


LLNL's Reasons for porting ZFS

Lustre servers currently use ext4 (ldiskfs)
  - Random writes bound by disk IOPS rate, not disk bandwidth
  - OST size limits
  - fsck time is unacceptable
  - Expensive hardware required to make disks reliable
Late 2011 requirement: 50 PB, 512 GB/s to 1 TB/s
  - At a price we can afford

  - COW sequentializes random writes
  - No longer bound by drive IOPS
  - Single volume size limit of 16 EiB
  - Zero fsck time. On-line data integrity and error handling
  - Expensive RAID controllers are unnecessary
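
The point that copy-on-write (COW) sequentializes random writes can be illustrated with a toy allocator. This is a minimal sketch under loud assumptions, not how ZFS allocates space: `CowAllocator` and `physical_layout` are hypothetical names, and the model ignores transaction groups, metaslabs, and the reclaiming of superseded blocks. It only shows that writes to scattered logical blocks land on consecutive physical blocks when nothing is overwritten in place.

```python
from typing import Dict, List, Tuple


class CowAllocator:
    """Toy copy-on-write block map (illustrative only, not the ZFS allocator)."""

    def __init__(self) -> None:
        self.next_free = 0                    # next unused physical block
        self.block_map: Dict[int, int] = {}   # logical block -> current physical block

    def write(self, logical_block: int) -> int:
        """Record a write and return the physical block it landed on."""
        physical = self.next_free
        self.next_free += 1
        # Any older copy of this logical block is left untouched, never overwritten.
        self.block_map[logical_block] = physical
        return physical


def physical_layout(logical_writes: List[int]) -> List[Tuple[int, int]]:
    """Return (logical, physical) pairs for a sequence of logical block writes."""
    alloc = CowAllocator()
    return [(lb, alloc.write(lb)) for lb in logical_writes]


if __name__ == "__main__":
    # Hypothetical random write pattern; block 12 is rewritten once.
    for logical, physical in physical_layout([907, 12, 4411, 12, 63]):
        print(f"logical {logical:5d} -> physical {physical}")
```

In this model the disk sees one sequential stream regardless of the logical access pattern, which is why throughput stops being tied to the drive's random IOPS rate.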

Licensing Concerns

Component       License
Lustre          GPL
ZFS             CDDL
SPL             GPL
Linux Kernel    GPL

CDDL: Common Development and Distribution License
GPL: (GNU) General Public License

Licensing Concerns

Distributing Source
  - CDDL is an open source license
  - CDDL provides an explicit patent license
  - ZFS changes contributed as CDDL code
  - ZFS sources kept separate from all GPL code
Distributing Binaries
  - Linux kernel allows non-GPL third party modules
      Nvidia, ATI, etc...
  - Linus views the kernel module interface as LGPL
  - ZFS uses no GPL-only symbols
  - Included headers do not make a derived work


Licensing Concerns

ZFS is NOT a derived work of Linux

"It would be rather preposterous to call the Andrew FileSystem a 'derived work' of Linux, for example, so I think it's perfectly OK to have a AFS module, for example."
  Linus Torvalds

"Our view is that just using structure definitions, typedefs, enumeration constants, macros with simple bodies, etc., is NOT enough to make a derivative work. It would take a substantial amount of code (coming from inline functions or macros with substantial bodies) to do that."
  Richard Stallman (The FSF's view)


Solaris Porting Layer
Linux/ZFS Glue

[Diagram: ZFS, SPL, Linux Kernel]


ZFS and Lustre Components

[Diagram: component stack]
User
  ZFS CLI, libzfs
Kernel
  Lustre:                      MDT, MDD, OST, OFD, ZFS OSD
  Interface Layer:             ZPL, ZVOL, /dev/zfs
  Transactional Object Layer:  ZIL, ZAP, DMU, Traversal, DSL
  Pooled Storage Layer:        ARC, ZIO, VDEV, Configuration


Ported by LLNL

[Diagram: the component stack from the preceding slide (ZFS CLI, libzfs, Lustre MDT/MDD/OST/OFD, ZFS OSD, Interface Layer, Transactional Object Layer, Pooled Storage Layer), marking the components ported by LLNL]


CFS + Sun + Oracle + Whamcloud

[Diagram: the component stack from the preceding slide (ZFS CLI, libzfs, Lustre MDT/MDD/OST/OFD, ZFS OSD, Interface Layer, Transactional Object Layer, Pooled Storage Layer), marking the components developed by CFS, Sun, Oracle, and Whamcloud]


ZFS/Lustre Prototype (Zeno)


OSS SSU (Zeno)

Component    Bandwidth
QDR IB       25.6 GB/s
Host SAS     96.0 GB/s
JBOD SAS     96.0 GB/s
Disk         56.0 GB/s

  - 896 TB / SSU, 25.6 GB/s
  - 70 2TB disks / host
  - 7 x 8+2 Raid-Z2 groups
  - 1 x 112 TB OST / host
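
The capacity figures on this slide follow from simple arithmetic, sketched below. The function names are hypothetical, the calculation counts only data disks (it ignores any filesystem metadata overhead), and the host count per SSU is an inference from 896 TB / 112 TB rather than a number stated on the slide.

```python
def ost_capacity_tb(groups: int, data_disks: int, disk_tb: int) -> int:
    """Usable capacity of one OST: parity disks hold no user data."""
    return groups * data_disks * disk_tb


def disks_per_host(groups: int, data_disks: int, parity_disks: int) -> int:
    """Total drives consumed by the Raid-Z2 groups on one host."""
    return groups * (data_disks + parity_disks)


if __name__ == "__main__":
    # Slide figures: 7 groups of 8+2, 2 TB drives, 896 TB per SSU.
    ost_tb = ost_capacity_tb(groups=7, data_disks=8, disk_tb=2)
    drives = disks_per_host(groups=7, data_disks=8, parity_disks=2)
    hosts = 896 // ost_tb  # inferred host count, not stated on the slide
    print(ost_tb, drives, hosts)  # prints "112 70 8"
```

The result matches the slide: 7 groups of 8 data disks at 2 TB give one 112 TB OST built from 70 drives, and 896 TB per SSU corresponds to eight such hosts.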


OSS SSU (Zeno3)

Component    Bandwidth
QDR IB       38.4 GB/s
Host SAS     38.4 GB/s
JBOD SAS     96.0 GB/s
Disk         60.0 GB/s

  - 960 TB / SSU, 38.4 GB/s
  - 50 2TB disks / host
  - 5 x 8+2 Raid-Z2 groups
  - 1 x 80 TB OST / host
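
The SSU bandwidth quoted next to the capacity equals the smallest entry in the component table, which is the usual bottleneck rule for a pipeline of stages. A minimal sketch of that rule follows; `bottleneck` is a hypothetical helper, the key "interconnect" stands for the first table row (whose label is hard to read in the source), and the pairing of labels to values follows the order in which they appear on the slides.

```python
from typing import Dict, List, Tuple


def bottleneck(bandwidth_gbs: Dict[str, float]) -> Tuple[float, List[str]]:
    """Return the limiting bandwidth and every component that sits at that limit."""
    limit = min(bandwidth_gbs.values())
    limiting = [name for name, bw in bandwidth_gbs.items() if bw == limit]
    return limit, limiting


# Component bandwidths in GB/s, as read from the two SSU slides.
FIRST_CONFIG = {"interconnect": 25.6, "host SAS": 96.0, "JBOD SAS": 96.0, "disk": 56.0}
SECOND_CONFIG = {"interconnect": 38.4, "host SAS": 38.4, "JBOD SAS": 96.0, "disk": 60.0}

if __name__ == "__main__":
    for label, config in (("first", FIRST_CONFIG), ("second", SECOND_CONFIG)):
        limit, names = bottleneck(config)
        print(f"{label} configuration: {limit} GB/s, limited by {', '.join(names)}")
```

Under this reading, the second configuration trades SAS headroom on the host side for more interconnect bandwidth, so the two stages become balanced at 38.4 GB/s.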


ZFS Performance Comparison

[Figure: bar chart of write and read bandwidth in GiB/s (axis ticks 10 to 30). Legend entries: Current SSU Lustre+IOR, ZFS SSU Lustre+IOR, Current SSU Hardware Limit, ZFS SSU ZPIOS, ZFS SSU Hardware Limit]

  - Same number of drives
  - SATA vs SAS disk
  - RAID-Z2 vs RAID-6
  - Write performance is limited by the ZFS port
  - Read performance is limited by Lustre/CPU
  - ZFS is unoptimized, this can all be improved!


Single Node Write Performance

[Figure: ZPIOS Write Performance, Pool Size vs MiB/s. X axis: Total Disks (10 per vdev), 0 to 70. Y axis: MiB/s, 0 to 1600. Series: Stripe, Mirror, Raid-Z, Raid-Z2, Raid-Z3]

  - Write performance is consistent with Lustre
  - Lustre workload
      Random 1 MiB I/Os
      128 threads to 4096 objects
  - 60 MiB/s per disk for small pools (10 disks)
  - Limited by taskq when scaled up
  - This is fixable


Single Node Read Performance

[Figure: ZPIOS Read Performance, Pool Size vs MiB/s. X axis: Total Disks (10 per vdev), 0 to 70. Y axis: MiB/s, 0 to 4500. Series: Stripe, Mirror, Raid-Z, Raid-Z2, Raid-Z3]

  - Read performance is significantly better than Lustre
  - Lustre workload
      Random 1 MiB I/Os
      128 threads to 4096 objects
  - Shows good scaling
  - Prefetch disabled
  - 50-60 MiB/s per disk even for large pools
  - [?]00% CPU utilization when using 70 disks
  - Can be optimized
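
As a rough cross-check of the per-disk read figure against the chart's axis range, the aggregate rate for a pool is just the per-disk rate times the disk count. The helper below is a hypothetical sketch that assumes perfectly linear scaling, which the slide itself only describes as "good scaling".

```python
from typing import Tuple


def aggregate_mib_s(disks: int, per_disk_low: int, per_disk_high: int) -> Tuple[int, int]:
    """Aggregate throughput range in MiB/s, assuming linear scaling across disks."""
    return disks * per_disk_low, disks * per_disk_high


if __name__ == "__main__":
    low, high = aggregate_mib_s(disks=70, per_disk_low=50, per_disk_high=60)
    print(f"70 disks at 50-60 MiB/s each: {low} to {high} MiB/s")
```

For 70 disks this gives 3500 to 4200 MiB/s, which falls inside the 0 to 4500 MiB/s range of the read chart.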


More Information

ZFS and SPL: http://zfsonlinux.org
  - Mailing lists
      zfs-announce@zfsonlinux.org
      zfs-discuss@zfsonlinux.org
      zfs-devel@zfsonlinux.org
  - Download software
  - Documentation
Lustre support for ZFS: http://zfsonlinux.org/lustre.html
Licenses
  - CDDL: http://hub.opensolaris.org/bin/view/Main/licensing_faq
  - GPLv2: http://www.gnu.org/licenses/gpl-2.0.html
  - Linus: http://linuxmafia.com/faq/Kernel/proprietary-kernel-modules.html
  - RMS: http://lkml.indiana.edu/hypermail/linux/kernel/0301.1/0362.html
