SNU-OOPSLA-LAB
File Structures
SNU-OOPSLA Lab.
Chapter Objectives(1)
Chapter Objectives(2)
Contents
8.1 Cosequential operations
8.2 Application of the OO Model to a General Ledger Program
8.3 Extension of the OO Model to Include Multiway Merging
8.4 A Second Look at Sorting in Memory
8.5 Merging as a Way of Sorting Large Files on Disk
8.6 Sorting Files on Tape
8.7 Sort-Merge Packages
8.8 Sorting and Cosequential Processing in Unix
Cosequential operations
merging, or union
matching, or intersection
combination of above
if the names are the same, output the name and read the
next names from the two lists
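The rule above is one branch of the matching loop. A minimal sketch of the whole loop, assuming both lists are already sorted in ascending order and contain no duplicate names (the function name `matchLists` and the use of in-memory vectors instead of files are illustrative assumptions, not taken from the slides):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the names that appear in both lists (the intersection).
// Assumes each list is sorted ascending and has no duplicate names.
std::vector<std::string> matchLists(const std::vector<std::string>& list1,
                                    const std::vector<std::string>& list2) {
    std::vector<std::string> matches;
    std::size_t i = 0, j = 0;
    // Matching can stop as soon as either list is exhausted.
    while (i < list1.size() && j < list2.size()) {
        if (list1[i] < list2[j]) {
            ++i;                          // name in list 1 is behind: advance list 1
        } else if (list1[i] > list2[j]) {
            ++j;                          // name in list 2 is behind: advance list 2
        } else {
            matches.push_back(list1[i]);  // same name: output it once
            ++i;                          // and read the next name from both lists
            ++j;
        }
    }
    return matches;
}
```

For example, matching {ADAMS, CARTER, DAVIS} against {BROWN, CARTER, DAVIS, EVANS} yields CARTER and DAVIS.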
[Figure: flowchart of the match procedure. It uses the input() and initialize() procedures, compares Item(1) from List 1 with Item(2) from List 2 (branches include "same name" and "Item(1) > Item(2)"), and ends with return 1.]
A Subclass to support lists that are files of strings, one per line
HighValue
InitializeList(1, List1Name);
InitializeList(2, List2Name);
InitializeOutput(OutputListName);
MoreItems1 = NextItemInList(1);
MoreItems2 = NextItemInList(2);
while (MoreItems1 || MoreItems2) { // if either file has more
    if (Item(1) < Item(2)) { // list 1 has next item to be processed
        ProcessItem(1);
        MoreItems1 = NextItemInList(1);
    }
    else if (Item(1) == Item(2)) { // same item in both lists: process once, advance both
        ProcessItem(1);
        MoreItems1 = NextItemInList(1);
        MoreItems2 = NextItemInList(2);
    }
    else { // Item(1) > Item(2): list 2 has next item to be processed
        ProcessItem(2);
        MoreItems2 = NextItemInList(2);
    }
}
FinishUp();
return 1;
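The merge loop above can be sketched as a self-contained function. This is only an illustration under stated assumptions: the lists are in-memory vectors rather than files, each is sorted ascending with no duplicates, and the name `mergeLists` is hypothetical. Explicit end-of-list checks stand in for the HighValue sentinel.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Merges two sorted lists into one sorted list (the union).
// A name that appears in both lists is output only once.
std::vector<std::string> mergeLists(const std::vector<std::string>& list1,
                                    const std::vector<std::string>& list2) {
    std::vector<std::string> output;
    std::size_t i = 0, j = 0;
    while (i < list1.size() || j < list2.size()) {  // either list has more
        if (j == list2.size() ||
            (i < list1.size() && list1[i] < list2[j])) {
            output.push_back(list1[i++]);           // list 1 has the next item
        } else if (i == list1.size() || list2[j] < list1[i]) {
            output.push_back(list2[j++]);           // list 2 has the next item
        } else {
            output.push_back(list1[i]);             // equal items: output once,
            ++i;                                    // then advance both lists
            ++j;
        }
    }
    return output;
}
```

Merging {ADAMS, CARTER, DAVIS} with {BROWN, CARTER, EVANS} gives ADAMS, BROWN, CARTER, DAVIS, EVANS.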
[Figure: List 1 (NAME_1) and List 2 (NAME_2) feeding OutputList]
Assumptions
Essential Components
initialization - reads the first logical record from each list
one main synchronization loop
- continues as long as relevant records remain
if (Item(1) > Item(2)) then ..........
else if (Item(1) < Item(2)) then .........
else ........... /* current keys equal */
endif
Essential components (contd)
substitute
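[Figure: general ledger example with monthly balance columns (Feb, Mar, Apr; values 200, 270, 129 and 170, 320, 230) and a Debit/Credit column (-30, -39, +60, -78, -30)]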
begin loop
determine which list has the key with the lowest value
output that key
move ahead one key in that list
if the same key appears in more than one list, move ahead in each of those lists
loop again
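A minimal sketch of this loop for K lists, assuming each list is an in-memory vector sorted ascending with no repeated keys inside a single list (the name `kWayMerge` is hypothetical). It finds the lowest key with a plain linear scan over the K current keys, which is reasonable only when K is small.

```cpp
#include <cstddef>
#include <vector>

// Merges K sorted lists, writing each distinct key once.
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& lists) {
    std::vector<std::size_t> pos(lists.size(), 0);  // current position in each list
    std::vector<int> output;
    while (true) {
        // Determine which list has the key with the lowest value.
        bool found = false;
        int lowest = 0;
        for (std::size_t k = 0; k < lists.size(); ++k) {
            if (pos[k] < lists[k].size()) {
                int key = lists[k][pos[k]];
                if (!found || key < lowest) {
                    lowest = key;
                    found = true;
                }
            }
        }
        if (!found) break;                          // every list is exhausted
        output.push_back(lowest);                   // output that key
        // Move ahead one key in every list whose current key equals it.
        for (std::size_t k = 0; k < lists.size(); ++k) {
            if (pos[k] < lists[k].size() && lists[k][pos[k]] == lowest) {
                ++pos[k];
            }
        }
    }
    return output;
}
```

Merging {7, 10, 17}, {9, 19, 23}, and {8, 16, 29} produces 7, 8, 9, 10, 16, 17, 19, 23, 29.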
K-way merge
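Selection Tree
[Figure: selection tree over the input lists List 0 (7, 10, 17, ...), List 1 (9, 19, 23, ...), ..., List 7 (8, 16, 29, ...); the internal nodes shown hold the keys 7, 11, 5, and 8]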
Heap
[Figure: a heap in tree form, with each key's array position in parentheses: B (2), C (3), E (4), H (5), I (6), D (7), G (8), F (9)]
Heap array (positions 1-9) after each key is inserted:

New key   Heap array after insertion
F         F
D         D F
C         C F D
G         C F D G
H         C F D G H
I         C F D G H I
B         B F C G H I D
E         B E C F H I D G
A         A B C E H I D G F

[Figure: selected heaps in tree form]
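The insertions traced above can be reproduced with a short sift-up routine. This is a sketch under stated assumptions: the heap is a min-heap of single-character keys stored in a `std::string`, positions are 0-based (the slides number them from 1), and the names `heapInsert` and `buildHeap` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Inserts one key into a min-heap stored as an array (0-based positions).
void heapInsert(std::string& heap, char key) {
    heap.push_back(key);                       // place the key at the end
    std::size_t child = heap.size() - 1;
    while (child > 0) {
        std::size_t parent = (child - 1) / 2;  // parent of a 0-based position
        if (heap[parent] <= heap[child]) break;
        std::swap(heap[parent], heap[child]);  // move the smaller key up
        child = parent;
    }
}

// Builds a heap by inserting the keys one at a time, in the order given.
std::string buildHeap(const std::string& keys) {
    std::string heap;
    for (char key : keys) {
        heapInsert(heap, key);
    }
    return heap;
}
```

Calling `buildHeap("FDCGHIBEA")` returns `"ABCEHIDGF"`, the final array in the trace.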
[Figure: 80 runs, each containing 10,000 sorted records, feeding a single merge]
Assumptions
Four I/Os (refer to page 39)
Four Steps(1)
Step1: Reading records into RAM for sorting and forming runs
Four Steps(2)
Four Steps(3)
Step4: need 4,000 seeks total
[Figure: 800MB file; 800,000 sorted records; 80 buffers (10MB)]
Sort phase
Merge phase
Hardware-based Improvements
K is proportional to N
Multiple-step merge(1)
[Figure: multiple-step merge in which the runs are first merged in sets of 32 runs]
Multiple-step merge(2)
Facts of Life
Replacement Selection
Idea
always select the key from memory that has the lowest value
output the key
replace it with a new key from the input list
use 2 heaps in the memory buffer
Implementation
step1: read records and sort using heap sort
this heap is the primary heap
step2: write out only the record with the lowest value
step3: bring in new record and compare its key with that of
record just output
step3-a: if the new key is higher, insert the new record into its proper place in the
primary heap along with the other records selected for output
step3-b: if the new key is lower, place the record in a secondary heap
with key values lower than already written out
step4: repeat step 3 while there are records in the primary heap and
there are records to be read in. When the primary heap is empty, make
the secondary heap into the primary heap and repeat step2 & step3
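The steps above can be sketched with two min-heaps. This is a hedged illustration, not the textbook's implementation: it uses `std::priority_queue` for both heaps, integer keys in place of records, an in-memory vector as the input list, and a hypothetical function name `replacementSelection`; `p` is the number of records that fit in memory. A new key equal to the one just written is treated as belonging to the current run.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;

// Forms sorted runs from `input` using replacement selection with room
// for p keys in memory. Returns the runs in the order they are produced.
std::vector<std::vector<int>> replacementSelection(const std::vector<int>& input,
                                                   std::size_t p) {
    std::vector<std::vector<int>> runs;
    MinHeap primary, secondary;
    std::size_t next = 0;

    // Step 1: fill memory with the first p keys (the primary heap).
    while (next < input.size() && primary.size() < p) {
        primary.push(input[next++]);
    }

    while (!primary.empty()) {
        std::vector<int> run;
        while (!primary.empty()) {
            // Step 2: write out only the key with the lowest value.
            int lowest = primary.top();
            primary.pop();
            run.push_back(lowest);

            // Step 3: bring in a new key and compare it with the one just output.
            if (next < input.size()) {
                int key = input[next++];
                if (key >= lowest) {
                    primary.push(key);    // 3-a: still fits in the current run
                } else {
                    secondary.push(key);  // 3-b: too late, hold for the next run
                }
            }
        }
        runs.push_back(run);
        // Step 4: the secondary heap becomes the primary heap for the next run.
        primary.swap(secondary);
    }
    return runs;
}
```

With p = 3 and keys read in the order 16, 47, 5, 12, 67, 21, 7, 17, 14, 58, 24, 18, 33, this sketch produces two runs: 5, 12, 16, 21, 47, 67 and 7, 14, 17, 18, 24, 33, 58.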
Remaining input   Memory (p=3)    Output run
21, 67, 12        5   47  16      -
21, 67            12  47  16      5
21                67  47  16      12, 5
-                 67  47  21      16, 12, 5
-                 67  47  -       21, 16, 12, 5
-                 67  -   -       47, 21, 16, 12, 5
-                 -   -   -       67, 47, 21, 16, 12, 5
Replacement Selection(1)
What happens if a key arrives in memory too late to be output into its
proper position relative to the other keys? (e.g., if the 4th key is 2 rather than 12)
Two questions
Approach                        # of Records   Size of   # of Seeks    Merge   Total      Seek + Rotational
                                per Seek to    Runs      Required to   Order   Number     Delay Time
                                Form Runs      Formed    Form Runs     Used    of Seeks   (hr:min)

800 RAM sorts followed          10,000         10,000    1,600         800     681,600    4:58
by an 800-way merge

Replacement selection           2,500          15,000    6,400         534     521,134    3:48
followed by 534-way merge
(records in random order)

Replacement selection           2,500          40,000    6,400         200     206,400    1:30
followed by 200-way merge
(records partially ordered)
(Heap sort!)
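Remaining input                          Memory (P=3)       Output run (A)
33, 18, 24, 58, 14, 17, 7, 21, 67, 12    5    47   16       -
33, 18, 24, 58, 14, 17, 7, 21, 67        12   47   16       5
33, 18, 24, 58, 14, 17, 7, 21            67   47   16       12, 5
33, 18, 24, 58, 14, 17, 7                67   47   21       16, 12, 5
33, 18, 24, 58, 14, 17                   67   47   (7)      21, 16, 12, 5
33, 18, 24, 58, 14                       67   (17) (7)      47, 21, 16, 12, 5
33, 18, 24, 58                           (14) (17) (7)      67, 47, 21, 16, 12, 5

Keys in parentheses arrived too late for run A and are held for the next run.

Remaining input                          Memory (P=3)       Output run (B)
33, 18, 24, 58                           14   17   7        -
33, 18, 24                               14   17   58       7
33, 18                                   24   17   58       14, 7
33                                       24   18   58       17, 14, 7
-                                        24   33   58       18, 17, 14, 7
-                                        -    33   58       24, 18, 17, 14, 7
-                                        -    -    58       33, 24, 18, 17, 14, 7
-                                        -    -    -        58, 33, 24, 18, 17, 14, 7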
the two-step merges & replacement selection are still better, but the
results are less dramatic
Approach                       # of Records   Merge     Number of   Seek +       Total    Total          Total of Seek,
                               per Seek to    Pattern   Seeks for   Rotational   Passes   Transmission   Rotation, and
                               Form Runs      Used      Sorts and   Delay Time   over     Time (min)     Transmission
                                                        Merges      (min)        the File                Times (min)

RAM sorts                      10,000         800-way   681,700     298          -        43             341

Replacement selection          2,500          534-way   521,134     228          -        43             271
(records in random order)

Replacement selection          2,500          200-way   206,400     90           -        43             133
(records partially ordered)
Approach                       # of Records   Merge           Number of   Seek +       Total    Total          Total of Seek,
                               per Seek to    Pattern         Seeks for   Rotational   Passes   Transmission   Rotation, and
                               Form Runs      Used            Sorts and   Delay Time   over     Time (min)     Transmission
                                                              Merges      (min)        the File                Times (min)

RAM sorts                      10,000         25 x 32-way     127,200     56           -        65             121
                                              (one 25-way)

Replacement selection          2,500          19 x 28-way     124,438     55           -        65             120
(records in random order)                     (one 19-way)

Replacement selection          2,500          20 x 10-way     110,400     48           -        65             113
(records partially ordered)                   (one 20-way)
Sort phase
Merge phase
[Figure: disk1, input buffers, heap, output buffers, disk2]
More drives?
More processors?
mainframes
vector and array processors
massively parallel machines
very fast local area networks
Effects of Multiprogramming
the number of initial runs is so large that seek and rotation time is much
greater than transmission time
Step1   Tape   Contains runs
        T1     R1 R3 R5 R7 R9
        T2     R2 R4 R6 R8 R10
        T3     ---
        T4     ---
(Fig 8.30) [Figure: run lengths on tapes T1-T4 at Step1 through Step4 of the merge]
(Fig 8.31) [Figure: run lengths on tapes T1-T4 at Step1 through Step5 of the merge]
Given N runs and J tape drives, is there some way to compute the
optimal merging performance so we have a yardstick against which
to compare the performance of any specific algorithm?
Sorting in Unix
Compares: cmp
Difference: diff
Common: comm
Let's Review!
8.1 Cosequential operations
8.2 Application of the Model to a General Ledger Program
8.3 Extension of the Model to Include Multiway Merging
8.4 A Second Look at Sorting in Memory
8.5 Merging as a Way of Sorting Large Files on Disk
8.6 Sorting Files on Tape
8.7 Sort-Merge Packages
8.8 Sorting and Cosequential Processing in Unix