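[Figure: Using the row hash to locate the target AMP. The Parser passes the PI value (38 in this example) to the hashing algorithm, which produces the row hash 1177 7C3C. The first half of the row hash, 1177, is the DSW. The Hash Maps translate the DSW into an AMP number, and the PI value and data are sent to that AMP (one of AMP 1 through AMP n). In the AMP's data table, each row is stored as a Row ID (row hash plus uniqueness value) followed by the row data; row hash values range from x'00000000' to x'FFFFFFFF'.]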
Summary
The MPL (Message Passing Layer) uses the DSW, 1177, to locate bucket #1177 in the Hash Map.
Bucket #1177 contains the number of the AMP that owns this hash value, which is effectively the AMP that holds this row.
Hashing algorithm -> Row Hash -> DSW (or Hash Bucket #) -> Hash Map -> AMP #
Two systems with the same number of AMPs will have the same
Hash Map.
Changing the number of AMPs in a system requires a change to
the Hash Map.
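The chain from row hash to AMP number can be sketched in a few lines of Python. This is only an illustration: the names `dsw`, `build_hash_map`, and `target_amp` are hypothetical, and the round-robin bucket assignment is an assumption made for the sketch, not the way a real Hash Map is populated. It does show the two points above: the map depends only on the number of AMPs, and it changes when that number changes.

```python
NUM_BUCKETS = 1 << 16  # a 16-bit DSW can address 65,536 buckets


def dsw(row_hash: int) -> int:
    """Return the first 16 bits of a 32-bit row hash (the DSW / hash bucket #)."""
    return (row_hash >> 16) & 0xFFFF


def build_hash_map(num_amps: int) -> list[int]:
    """Assign every bucket to an AMP.

    ASSUMPTION: simple round-robin assignment, chosen only for illustration.
    """
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def target_amp(row_hash: int, hash_map: list[int]) -> int:
    """Look up the AMP number stored in the bucket selected by the DSW."""
    return hash_map[dsw(row_hash)]


if __name__ == "__main__":
    hash_map = build_hash_map(16)
    row_hash = 0x11777C3C  # row hash from the PI value 38 example
    print(f"DSW = {dsw(row_hash):04X}, AMP = {target_amp(row_hash, hash_map)}")
```

Because `build_hash_map` is a pure function of `num_amps`, two systems built with the same AMP count get identical maps in this sketch, while adding an AMP produces a different map.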
A Hashing Example
Order table (Order Number is the PK and the UPI):

Order Number   Customer Number   Order Date   Order Status
7325           2                 4/13         O
7324           3                 4/13         O
7415           3                 4/13         O
7415           1                 4/13         C
7103           1                 4/10         O
7225           2                 4/15         C
7384           1                 4/12         C
7402           3                 4/12         C
7188           1                 4/13         C
7202           2                 4/09         C

Order Number 7202 -> Hashing Algorithm -> Row Hash 691B 14AE (hexadecimal)
The first 16 bits, 691B, select the Hash Map entry; 14AE is the remaining 16 bits.
HASH MAP

       0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
690   07  06  07  06  07  04  05  06  05  05  14  09  14  13  03  04
691   15  08  02  04  01  00  14  14  03  02  03  09  01  00  02  15
692   01  00  15  11  14  14  13  13  14  14  08  09  15  10  09  09
693   07  06  15  13  11  06  15  08  15  15  08  08  11  07  05  10
694   04  12  11  13  05  10  07  07  03  02  11  04  01  00  11  13
695   11  11  12  10  03  02  06  13  01  00  06  05  07  06  05  12

Bucket 691B (row 691, column B) contains 09, so the row is sent to AMP 9:

AMP 9
7202  2  4/09  C
Note: This partial Hash Map is based on a 16 AMP system and AMPs are shown in decimal format.
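The lookup for order 7202 can be traced with a short Python sketch. The grid below is one reading of the partial Hash Map on this slide (rows 690 to 695, columns 0 to F, values taken column by column), and `lookup_amp` is a hypothetical helper name; only the six rows shown on the slide are available.

```python
# Partial hash map from the slide. Keys are the first three hex digits of the
# bucket number; each list holds the AMP numbers (decimal) for columns 0-F.
PARTIAL_HASH_MAP = {
    0x690: [7, 6, 7, 6, 7, 4, 5, 6, 5, 5, 14, 9, 14, 13, 3, 4],
    0x691: [15, 8, 2, 4, 1, 0, 14, 14, 3, 2, 3, 9, 1, 0, 2, 15],
    0x692: [1, 0, 15, 11, 14, 14, 13, 13, 14, 14, 8, 9, 15, 10, 9, 9],
    0x693: [7, 6, 15, 13, 11, 6, 15, 8, 15, 15, 8, 8, 11, 7, 5, 10],
    0x694: [4, 12, 11, 13, 5, 10, 7, 7, 3, 2, 11, 4, 1, 0, 11, 13],
    0x695: [11, 11, 12, 10, 3, 2, 6, 13, 1, 0, 6, 5, 7, 6, 5, 12],
}


def lookup_amp(row_hash: int) -> int:
    """Find the AMP for a 32-bit row hash using the partial map above."""
    bucket = (row_hash >> 16) & 0xFFFF  # first 16 bits, e.g. 0x691B
    row = bucket >> 4                   # first three hex digits, e.g. 0x691
    column = bucket & 0xF               # last hex digit, e.g. 0xB
    return PARTIAL_HASH_MAP[row][column]


if __name__ == "__main__":
    print(lookup_amp(0x691B14AE))  # row hash of order number 7202
```

For the row hash 691B 14AE, the helper selects row 691 and column B, which holds 09 in the grid, matching the AMP 9 result on the slide.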
Identifying Rows
A row hash is not adequate to uniquely identify a row.
Consideration #1
A row hash is 32 bits, which gives about 4.3 billion (2^32) possible values.
Because there is an infinite number of possible data values, some data values will have to share the same row hash.
Hash Synonyms: the PI values 1254 and 7769 both pass through the hash algorithm to 10A2 2936.

Consideration #2
A Primary Index may be non-unique (NUPI).
Different rows will have the same PI value and thus the same row hash.
NUPI Duplicates: (John) 'Smith' and (Dave) 'Smith' both pass through the hash algorithm to 0016 5557.

Conclusion
A row hash is not adequate to uniquely identify a row.
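Both considerations can be demonstrated with a deliberately tiny hash. The sketch below is an illustration under stated assumptions: `tiny_hash` is a hypothetical 8-bit hash built from the low byte of CRC-32, not the hashing algorithm described on these slides. With only 256 possible outputs, any 257 distinct inputs must contain at least one pair of hash synonyms.

```python
import zlib


def tiny_hash(value: str) -> int:
    """Hypothetical 8-bit hash: the low byte of CRC-32 (for illustration only)."""
    return zlib.crc32(value.encode("utf-8")) & 0xFF


def find_synonyms(count: int = 257):
    """Return (first_value, second_value, shared_hash) for the first collision.

    With 257 distinct inputs and 256 possible outputs, a collision is guaranteed.
    """
    seen = {}
    for i in range(count):
        value = str(i)
        h = tiny_hash(value)
        if h in seen:
            return seen[h], value, h
        seen[h] = value
    return None


if __name__ == "__main__":
    a, b, h = find_synonyms()
    print(f"hash synonyms: {a!r} and {b!r} share hash {h:02X}")
    # NUPI duplicates: the same PI value always yields the same hash.
    print(tiny_hash("Smith") == tiny_hash("Smith"))
```

The same argument applies at 32 bits: the output space is larger, but it is still finite, so synonyms remain possible.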
The Row ID
To uniquely identify a row, we add a 32-bit uniqueness value.
The combined row hash and uniqueness value is called a Row ID.
Each stored row has a Row ID as a prefix.

Row ID = Row Hash (32 bits) + Uniqueness Id (32 bits)
Stored row = Row ID + Row Data

Row ID                      Row Data
Row Hash     Unique ID      Emp_No   Last_Name   First_Name
3B11 5032    0000 0001      1018     Reynolds    Jane
3B11 5032    0000 0002      1020     Davidson    Evan
3B11 5032    0000 0003      1031     Green       Jason
3B11 5033    0000 0001      1014     Jacobs      Paul
3B11 5034    0000 0001      1012     Chevas      Jose
3B11 5034    0000 0002      1021     Carnet      Jean
:            :              :        :           :
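One way to picture the Row ID is as a single 64-bit value with the row hash in the upper half and the uniqueness value in the lower half. The Python sketch below assumes that packing purely for illustration; `make_row_id` and `split_row_id` are hypothetical names, and the actual stored layout is not specified here.

```python
MASK_32 = 0xFFFFFFFF


def make_row_id(row_hash: int, uniqueness: int) -> int:
    """Combine a 32-bit row hash and a 32-bit uniqueness value into 64 bits."""
    if not (0 <= row_hash <= MASK_32 and 0 <= uniqueness <= MASK_32):
        raise ValueError("both parts must fit in 32 bits")
    return (row_hash << 32) | uniqueness


def split_row_id(row_id: int) -> tuple[int, int]:
    """Recover (row_hash, uniqueness) from a packed 64-bit Row ID."""
    return row_id >> 32, row_id & MASK_32


if __name__ == "__main__":
    # Three rows from the table above that share the row hash 3B11 5032.
    ids = [make_row_id(0x3B115032, n) for n in (1, 2, 3)]
    for row_id in ids:
        print(f"{row_id:016X}")
```

With this packing, sorting Row IDs numerically orders rows by row hash first and by uniqueness value second, which matches the order of the rows in the table.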
Storing Rows (1 of 2)
Assumptions:
Last_Name is defined as a NUPI.
All rows in this example hash to the same AMP.

Add a row for 'John Smith'
'Smith' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                      Row Data
Row Hash     Unique ID      Last_Name   First_Name   Etc.
0016 5557    0000 0001      Smith       John

Add a row for 'Sam Adams'
'Adams' -> Hash Algorithm -> 1058 9829 -> Hash Map -> AMP #3

Row ID                      Row Data
Row Hash     Unique ID      Last_Name   First_Name   Etc.
0016 5557    0000 0001      Smith       John
1058 9829    0000 0001      Adams       Sam
Storing Rows (2 of 2)
Add a row for 'Fred Smith' (NUPI Duplicate)
'Smith' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                      Row Data
Row Hash     Unique ID      Last_Name   First_Name   Etc.
0016 5557    0000 0001      Smith       John
0016 5557    0000 0002      Smith       Fred
1058 9829    0000 0001      Adams       Sam

Add a row for 'Dan Jones' (Hash Synonym)
'Jones' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                      Row Data
Row Hash     Unique ID      Last_Name   First_Name   Etc.
0016 5557    0000 0001      Smith       John
0016 5557    0000 0002      Smith       Fred
0016 5557    0000 0003      Jones       Dan
1058 9829    0000 0001      Adams       Sam
Given the row hash, what other information would be needed to find the 'Dan Jones' row?
The 'Fred Smith' row?
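The four inserts above can be replayed with a small Python sketch. It assumes the uniqueness value is simply the next counter value for each row hash on the AMP, which is consistent with the tables on these slides but is still an assumption; the class name `AmpTable` is hypothetical, and the row hashes are copied from the slides rather than computed.

```python
from collections import defaultdict

# Row hashes as shown on the slides (not computed here).
SLIDE_HASHES = {"Smith": 0x00165557, "Adams": 0x10589829, "Jones": 0x00165557}


class AmpTable:
    """Rows stored on one AMP, kept in Row ID order."""

    def __init__(self):
        self._last_uniqueness = defaultdict(int)  # row hash -> last value used
        self.rows = []

    def insert(self, last_name: str, first_name: str):
        row_hash = SLIDE_HASHES[last_name]
        # ASSUMPTION: the next uniqueness value is a per-row-hash counter.
        self._last_uniqueness[row_hash] += 1
        row = (row_hash, self._last_uniqueness[row_hash], last_name, first_name)
        self.rows.append(row)
        self.rows.sort()  # order by row hash, then uniqueness value
        return row


if __name__ == "__main__":
    table = AmpTable()
    for last, first in [("Smith", "John"), ("Adams", "Sam"),
                        ("Smith", "Fred"), ("Jones", "Dan")]:
        table.insert(last, first)
    for row_hash, uniq, last, first in table.rows:
        print(f"{row_hash:08X} {uniq:08X} {last} {first}")
```

Note that 'Fred Smith' (a NUPI duplicate) and 'Dan Jones' (a hash synonym) are handled identically: each receives the next uniqueness value for row hash 0016 5557.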
[Figure: Locating a row on AMP #3. The AMP holds one Master Index and one Cylinder Index per cylinder (Cyl 1 Index through Cyl 7 Index are shown). The Cylinder Index leads to a data block, which contains the data row (Row ID plus row data).]

START WITH:             APPLY TO:         FIND:
Table Id, Row Hash      Master Index      Cylinder #
Table Id, Row Hash      Cylinder Index    Data Block
Row Hash, PI Value      Data Block        Data Row
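The three-step search (Master Index, then Cylinder Index, then data block) can be sketched in Python. Everything about the layout here is a simplifying assumption made for illustration: each index is modeled as a sorted list of (lowest key, target) pairs, the table id 100 and the block names are made up, and the rows reuse the 'Smith', 'Jones', and 'Adams' examples.

```python
from bisect import bisect_right

# Hypothetical layout: key = (table_id, row_hash); entries sorted by lowest key.
MASTER_INDEX = [((100, 0x00000000), 1), ((100, 0x10000000), 2)]  # -> cylinder #
CYLINDER_INDEXES = {
    1: [((100, 0x00000000), "block-A")],  # -> data block
    2: [((100, 0x10000000), "block-B")],
}
DATA_BLOCKS = {
    "block-A": [
        (0x00165557, 1, "Smith", "John"),
        (0x00165557, 2, "Smith", "Fred"),
        (0x00165557, 3, "Jones", "Dan"),
    ],
    "block-B": [(0x10589829, 1, "Adams", "Sam")],
}


def find_entry(index, key):
    """Return the target of the last index entry whose lowest key is <= key."""
    keys = [entry_key for entry_key, _ in index]
    pos = bisect_right(keys, key) - 1
    if pos < 0:
        raise KeyError(key)
    return index[pos][1]


def find_rows(table_id: int, row_hash: int, pi_value: str):
    """Master Index -> Cylinder Index -> data block -> matching rows."""
    cylinder = find_entry(MASTER_INDEX, (table_id, row_hash))
    block = find_entry(CYLINDER_INDEXES[cylinder], (table_id, row_hash))
    # The row hash alone may match hash synonyms, so the PI value is compared too.
    return [row for row in DATA_BLOCKS[block]
            if row[0] == row_hash and row[2] == pi_value]


if __name__ == "__main__":
    print(find_rows(100, 0x00165557, "Jones"))
    print(find_rows(100, 0x00165557, "Smith"))
```

In this sketch the final filter on the PI value is what separates 'Dan Jones' from the two 'Smith' rows that share his row hash.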
Review Questions
Fill in the Blanks
1. The output of the hashing algorithm is called the _____ _____.
2. To determine the target AMP, the Message Passing Layer must look up an entry in the
Hash Map based on the ________ number.
3. Two different PI values which hash to the same value are called Hash ___________ .
4. A Row ID consists of a row hash plus a ____________ value.
5. A uniqueness value is required to produce a unique Row ID because of _______
_________ and ______ ___________ .
6. Once the target AMP has been determined for a PI search, the _______ ________ for that
AMP must be consulted.
7. The Cylinder Index points us to the address and length of the data _______ .