
Prof. Hasso Plattner
Dictionary Encoding
Motivation

• Main memory access is the new bottleneck
• Compression reduces number of I/O operations to main memory
• Operation directly on compressed data
• Offsetting with bit-encoded fixed-length data types
• Based on limited value domain
Dictionary Encoding
Example
• 8 billion humans
• Attributes:
  - first name
  - last name
  - gender
  - country
  - city
  - birthday
  → 200 byte per tuple
• Each attribute is dictionary encoded
Sample Data

recID  fname  lname    gender  city       country  birthday
...    ...    ...      ...     ...        ...      ...
39     John   Smith    m       Chicago    USA      12.03.1964
40     Mary   Brown    f       London     UK       12.03.1964
41     Jane   Doe      f       Palo Alto  USA      23.04.1976
42     John   Doe      m       Palo Alto  USA      17.06.1932
43     Peter  Schmidt  m       Potsdam    GER      11.11.1973
...    ...    ...      ...     ...        ...      ...

Table: world_population
Dictionary Encoding a Column
• A column is split into a dictionary and an attribute vector
• Dictionary stores all distinct values with implicit valueID
• Attribute vector stores valueIDs for all entries in the column
• Position is stored implicitly
• Enables offsetting with bit-encoded fixed-length data types
recID  fname
...    ...
39     John
40     Mary
41     Jane
42     John
43     Peter
...    ...

Dictionary for "fname"

valueID  Value
...      ...
23       John
24       Mary
25       Jane
26       Peter
...      ...

Attribute Vector for "fname"

position  valueID
...       ...
39        23
40        24
41        25
42        23
43        26
...       ...
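
The split into a dictionary and an attribute vector can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular database: the names `dictionary_encode` and `bits_needed` are hypothetical, plain Python lists stand in for bit-packed storage, and valueIDs start at 0 here instead of at 23 as in the slide excerpt.

```python
def dictionary_encode(column):
    """Split a column into (dictionary, attribute_vector).

    The index of a value in `dictionary` is its implicit valueID;
    the index of an entry in `attribute_vector` is its implicit position.
    """
    dictionary = []        # distinct values, in order of first appearance
    value_to_id = {}       # helper map used only while encoding
    attribute_vector = []  # one valueID per row of the column
    for value in column:
        if value not in value_to_id:
            value_to_id[value] = len(dictionary)
            dictionary.append(value)
        attribute_vector.append(value_to_id[value])
    return dictionary, attribute_vector


def bits_needed(cardinality):
    """Fixed number of bits needed to encode `cardinality` distinct valueIDs."""
    return max(1, (cardinality - 1).bit_length())


fname = ["John", "Mary", "Jane", "John", "Peter"]
dictionary, attribute_vector = dictionary_encode(fname)
print(dictionary)                    # ['John', 'Mary', 'Jane', 'Peter']
print(attribute_vector)              # [0, 1, 2, 0, 3]
print(bits_needed(len(dictionary)))  # 2
```

Because every valueID has the same fixed bit width, the entry at a given position can be located by offset arithmetic alone, which is what the last bullet refers to.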
Querying Data using Dictionaries
Search for attribute value
(i.e. retrieve all persons with fname "Mary")

1. Search valueID for requested value ("Mary")
2. Scan attribute vector for valueID ("24")
3. To create the result set, substitute valueIDs with their corresponding dictionary values
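
The three steps can be sketched as follows. This is an illustrative Python example under stated assumptions: `find_positions` and `materialize` are hypothetical helper names, the dictionary is unsorted (so step 1 is a linear search), and valueIDs start at 0 rather than at 23.

```python
# Encoded "fname" column: John, Mary, Jane, John, Peter
dictionary = ["John", "Mary", "Jane", "Peter"]
attribute_vector = [0, 1, 2, 0, 3]


def find_positions(dictionary, attribute_vector, value):
    # Step 1: search the dictionary for the valueID of the requested value.
    try:
        value_id = dictionary.index(value)
    except ValueError:
        return []  # value does not occur in the column
    # Step 2: scan the attribute vector, comparing integers only.
    return [pos for pos, vid in enumerate(attribute_vector) if vid == value_id]


def materialize(dictionary, attribute_vector, positions):
    # Step 3: substitute valueIDs with their dictionary values.
    return [dictionary[attribute_vector[pos]] for pos in positions]


positions = find_positions(dictionary, attribute_vector, "Mary")
print(positions)                                            # [1]
print(materialize(dictionary, attribute_vector, positions)) # ['Mary']
```

The scan in step 2 never touches a string, only small fixed-width integers.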
Sorted Dictionary
• Dictionary entries are sorted either by their numeric value or lexicographically
  → Dictionary lookup complexity: O(log(n)) instead of O(n)
• Selection criteria with ranges are less expensive
• Dictionary entries can be further compressed to reduce the amount of required storage
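
A short Python sketch of why sorting helps, using the standard `bisect` module for the binary search. The helper names `lookup_value_id` and `range_positions` are assumptions made for this example; the point is that a value range maps to one contiguous range of valueIDs, so a range predicate needs only two dictionary lookups followed by an integer scan.

```python
from bisect import bisect_left, bisect_right

# Sorted dictionary and the matching attribute vector
# for the column John, Mary, Jane, John, Peter.
sorted_dictionary = ["Jane", "John", "Mary", "Peter"]
attribute_vector = [1, 2, 0, 1, 3]


def lookup_value_id(sorted_dictionary, value):
    """Binary search: O(log n) instead of the O(n) linear search."""
    i = bisect_left(sorted_dictionary, value)
    if i < len(sorted_dictionary) and sorted_dictionary[i] == value:
        return i
    return None


def range_positions(sorted_dictionary, attribute_vector, low, high):
    """Positions whose value v satisfies low <= v <= high."""
    lo_id = bisect_left(sorted_dictionary, low)    # first valueID >= low
    hi_id = bisect_right(sorted_dictionary, high)  # one past last valueID <= high
    return [pos for pos, vid in enumerate(attribute_vector)
            if lo_id <= vid < hi_id]


print(lookup_value_id(sorted_dictionary, "Mary"))                     # 2
print(range_positions(sorted_dictionary, attribute_vector, "J", "K")) # [0, 2, 3]
```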
Compression Rate
• Depends on cardinality / entropy
• Cardinality
  - Table cardinality: number of tuples in a relation
  - Column cardinality: number of distinct values in a column
• Entropy
  - Measure for information density
  - Entropy = column cardinality / table cardinality
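
As a small worked example, the ratio defined above can be computed directly. Note that this follows the slide's definition of entropy (column cardinality divided by table cardinality), not Shannon entropy; the function names are chosen for illustration only.

```python
def column_cardinality(column):
    """Number of distinct values in the column."""
    return len(set(column))


def entropy(column):
    """Column cardinality / table cardinality, as defined on the slide."""
    if not column:
        raise ValueError("entropy is undefined for an empty column")
    return column_cardinality(column) / len(column)


fname = ["John", "Mary", "Jane", "John", "Peter"]
print(column_cardinality(fname))  # 4
print(entropy(fname))             # 0.8

# A gender column with 2 distinct values over 8 billion tuples:
print(2 / 8_000_000_000)          # 2.5e-10
```

The lower this ratio, the more repetition there is in the column and the more dictionary encoding saves.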
Data Size Examples

Column       Cardinality  Bits Needed  Item Size  Plain Size  Size with Dictionary (Dictionary + Column)  Compression Factor
First names  5 million    23 bit       50 Byte    400 GB      250 MB + 23 GB                              ≈ 17
Last names   8 million    23 bit       50 Byte    400 GB      400 MB + 23 GB                              ≈ 17
Gender       2            1 bit        1 Byte     8 GB        2 B + 1 GB                                  ≈ 8
City         1 million    20 bit       50 Byte    400 GB      50 MB + 20 GB                               ≈ 20
Country      200          8 bit        47 Byte    376 GB      9.4 kB + 8 GB                               ≈ 47
Birthday     40000        16 bit       2 Byte     16 GB       80 kB + 16 GB                               ≈ 1
Totals                                 200 Byte   ≈ 1.6 TB    ≈ 92 GB                                     ≈ 17
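
The arithmetic behind the table can be reproduced with a short Python sketch. The assumptions are: 8 billion rows, decimal units (1 GB = 10^9 bytes), per-column inputs of 5 million first names at 50 bytes, 8 million last names at 50 bytes, 2 genders at 1 byte, 1 million cities at 50 bytes, 200 countries at 47 bytes and 40000 birthdays at 2 bytes, and an attribute vector that uses exactly the minimum number of bits per entry. The function name `sizes` is hypothetical.

```python
ROWS = 8_000_000_000  # 8 billion tuples


def sizes(cardinality, item_size_bytes, rows=ROWS):
    """Return (bits, plain_bytes, dictionary_bytes, vector_bytes, factor)."""
    bits = max(1, (cardinality - 1).bit_length())  # bits per valueID
    plain = rows * item_size_bytes                 # uncompressed column
    dictionary = cardinality * item_size_bytes     # one entry per distinct value
    vector = rows * bits / 8                       # bit-packed attribute vector
    return bits, plain, dictionary, vector, plain / (dictionary + vector)


columns = {
    "First names": (5_000_000, 50),
    "Last names": (8_000_000, 50),
    "Gender": (2, 1),
    "City": (1_000_000, 50),
    "Country": (200, 47),
    "Birthday": (40_000, 2),
}

total_plain = total_compressed = 0
for name, (cardinality, item_size) in columns.items():
    bits, plain, dictionary, vector, factor = sizes(cardinality, item_size)
    total_plain += plain
    total_compressed += dictionary + vector
    print(f"{name}: {bits} bit, {plain / 1e9:.0f} GB plain, "
          f"{(dictionary + vector) / 1e9:.2f} GB encoded, factor {factor:.1f}")

print(f"Total: {total_plain / 1e12:.1f} TB plain, "
      f"{total_compressed / 1e9:.0f} GB encoded, "
      f"factor {total_plain / total_compressed:.1f}")
```

For columns with few distinct values the dictionary is negligible and the factor is essentially item size in bits divided by bits needed. Birthday gains nothing because a 2-byte item is already as small as its 16-bit valueID.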
