
Advanced Sorting

CS221N, Data Structures

1
Radar
Midterm
Two weeks from this Friday, on 3/27
In class
Closed book

2
Improved Sorting Techniques
What we have seen
Bubble, selection and insertion
 Easy to implement
 Slower: O(n²)
Mergesort
 Faster: O(n log n)
 More memory (temporary array)

What we’ll now see
Shellsort, O(n log² n)
Quicksort, O(n log n)
Radix sort, O(kn)
 Limited to integers
3
Shellsort
Insertion sort, but adds a new feature
Not quite as fast as Quicksort and Mergesort
But:
Easier to implement
Still much faster than the basic sorts

Best: For arrays of a few thousand items

4
Recall Insertion Sort....
A subarray to the left is ‘partially sorted’
Start with the first element
The player immediately to the right is ‘marked’
The ‘marked’ player is inserted into the correct place in the partially sorted subarray:
 Remove the marked player first
 The marked player ‘walks’ to the left
 Shift appropriate elements to the right until we hit a smaller one
5
The problem
If a small item is very far to the right
Like in this case ->
You must shift many intervening large items one space to the right
Almost N copies
Average case N/2
N items, N²/2 copies
Better if:
Move a small item many spaces, without shifting

6
Remember
What made insertion
sort the best of the
basic sorts?
If the array is almost
sorted, O(n)

Shellsort has two steps:
‘Almost sort’ the array
Run insertion sort

7
The “Almost Sort” step
Say we have a 10-element array:
60 30 80 90 0 20 70 10 40 50
Sort indices 0, 4, and 8:
0 30 80 90 40 20 70 10 60 50
Sort indices 1, 5, and 9:
0 20 80 90 40 30 70 10 60 50
Sort indices 2, 6:
0 20 70 90 40 30 80 10 60 50
Sort indices 3, 7
0 20 70 10 40 30 80 90 60 50

8
The “Almost Sort” step
This is called a “4-sort”:
60 30 80 90 0 20 70 10 40 50
0 30 80 90 40 20 70 10 60 50
0 20 80 90 40 30 70 10 60 50
0 20 70 90 40 30 80 10 60 50
0 20 70 10 40 30 80 90 60 50

Once we’ve done this, the array is almost sorted, and we can run insertion
sort on the whole thing
Should be about O(n) time

9
Interval Sequencing
4-sort was sufficient for a 10-element array
For larger arrays, you’ll want to do several of these passes to achieve an array that
is really almost sorted. For example, for 1000 items:
364-sort
121-sort
40-sort
13-sort
4-sort
insertion sort

10
Knuth’s Interval Sequence
How do we determine the intervals?
Knuth’s algorithm:
Start at h=1
Repeatedly apply the function h=3*h+1, until you pass the number of items in
the array:
h = 1, 4, 13, 40, 121, 364, 1093….
Thus the previous sequence for a 1000-element array

So what interval sequence should we use for a 10,000-element array?
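A quick sketch of the initialization in code (illustrative, not a listing from the book; the method name is our own):

// Advance h through Knuth's sequence, stopping at the largest
// interval whose successor would pass the number of items.
static int initialInterval(int nElems) {
    int h = 1;
    while (h * 3 + 1 < nElems) {   // 1 -> 4 -> 13 -> 40 -> 121 -> 364 -> ...
        h = h * 3 + 1;
    }
    return h;                      // later descend via h = (h - 1) / 3
}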

11
Why is it better?
When h is very large, you are sorting small numbers of elements and
moving them across large distances
Efficient
When h is very small, you are sorting large numbers of elements and
moving them across small distances
Becomes more like traditional insertion sort
But with each successive sort, the overall array is more sorted
So we should be nearing O(n)

That’s the idea!

12
Let’s do our own example…
Sort these fifteen elements:
8 10 1 15 7 4 12 13 2 6 11 14 3 9 5

What sequence of h should we use?


Let’s do it on the board.

13
Java Implementation, page 322
What our function needs to do:
Initialize h properly
Have an outer loop start at outer=h and count up
Have an inner loop which sorts outer, outer-h, outer-2h, etc.

For example, if h is 4:
 We must sort (8, 4, 0), and (9, 5, 1), etc.

Let’s check it out
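
Here is a minimal sketch of such a method (illustrative, not the book’s page-322 listing; the class and variable names are our own):

// Illustrative Shellsort using Knuth's interval sequence.
public class ShellSortSketch {

    public static void shellSort(long[] a) {
        int n = a.length;
        int h = 1;
        while (h <= n / 3) {           // initialize h via Knuth: 1, 4, 13, 40, ...
            h = h * 3 + 1;
        }
        while (h > 0) {                // one h-sort per interval, ending with h = 1
            for (int outer = h; outer < n; outer++) {
                long temp = a[outer];  // insertion sort on outer, outer-h, outer-2h, ...
                int inner = outer;
                while (inner > h - 1 && a[inner - h] >= temp) {
                    a[inner] = a[inner - h];   // shift h-spaced elements right
                    inner -= h;
                }
                a[inner] = temp;
            }
            h = (h - 1) / 3;           // step down the sequence
        }
    }

    public static void main(String[] args) {
        long[] data = {8, 10, 1, 15, 7, 4, 12, 13, 2, 6, 11, 14, 3, 9, 5};
        shellSort(data);               // slide 13's fifteen elements
        System.out.println(java.util.Arrays.toString(data));
    }
}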

14
Other Interval Sequences
Original Shellsort:
h = h / 2
Inefficient, leads to O(n²)
Variation:
h = h / 2.2
Need extra effort to make sure we eventually hit h=1
Flamig’s approach (yields results similar to Knuth’s)
if (h < 5) h = 1; else h = (5*h-1) / 11;

Rule of thumb: Values of h should be relatively prime
 Makes it more likely that each pass will intermingle all the items from the previous pass
15
Number of Operations
Too difficult to determine analytically

Estimates have been made
Range: O(N^(3/2)) to O(N^(7/6))
Accepted: O(n log² n)
Even if we assume the worst case, it is still much better than insertion sort:
 n=10: Shellsort 32 operations, insertion sort 100
 n=100: Shellsort 1000 operations, insertion sort 10,000
 n=1000: Shellsort 32,000 operations, insertion sort 1 million
 n=10,000: Shellsort 1 million operations, insertion sort 100 million
The more optimistic O(n log² n) estimate yields about 160,000 operations for n=10,000

16
Quicksort
This will be the best sort we see in terms of both performance and
memory
One must exercise care to avoid the worst case, which is still very
expensive

The central mechanism is partitioning, which we’ll look at first

17
Partitioning
Idea: Divide data into two groups, such that:
All items with a key value higher than a specified amount (the pivot) are in
one group
All items with a lower key value are in another

Applications:
Divide employees who live within 15 miles of the office from those who live
farther away
Divide households by income for taxation purposes
Divide computers by processor speed

Let’s see an example with an array

18
Partitioning
Say I have 12 values:
175 192 95 45 115 105 20 60 185 5 90 180
I pick a pivot=104, and partition (NOT sorting yet):
95 45 20 60 5 90 | 175 192 115 105 185 180
Note: In the future the pivot will be an actual element
Also: Partitioning need not maintain order of elements and usually won’t
 Although I did in this example

The partition is the leftmost item in the right subarray:
95 45 20 60 5 90 | 175 192 115 105 185 180
Which we return to designate where the division is located.

19
Java Implementation, page 328
Partition function must:
Accept a pivot value, and a starting and ending index
Start an integer left=start-1, and right=end
Increment left until we find an A[left] bigger than the pivot:
 Then decrement right until we find an A[right] smaller than the pivot.
 Swap A[left] and A[right].
Stop when left and right cross
Then left is the partition
 All items smaller are on the left, and larger on the right
 There can’t be anything smaller to the right, because we would’ve swapped.
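
A minimal sketch of this routine (illustrative, not the book’s page-328 listing; since the pivot here is an arbitrary value rather than an element sitting at the right end, right starts one past the end and both scans are bounds-guarded):

public class PartitionSketch {

    // Returns the partition: the index of the leftmost item in the right group.
    static int partitionIt(long[] a, int start, int end, long pivot) {
        int left = start - 1;        // just before the first element
        int right = end + 1;         // just after the last element
        while (true) {
            while (left < end && a[++left] < pivot)
                ;                    // scan right for an item >= pivot
            while (right > start && a[--right] > pivot)
                ;                    // scan left for an item <= pivot
            if (left >= right) {
                return left;         // pointers crossed: partition found
            }
            long t = a[left];        // swap the out-of-place pair
            a[left] = a[right];
            a[right] = t;
        }
    }

    public static void main(String[] args) {
        long[] data = {175, 192, 95, 45, 115, 105, 20, 60, 185, 5, 90, 180};
        int part = partitionIt(data, 0, data.length - 1, 104);  // slide 19's example
        System.out.println("partition at index " + part);
        System.out.println(java.util.Arrays.toString(data));
    }
}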

20
Efficiency: Partitioning
O(n) time
left starts at 0 and moves one-by-one to the right
right starts at n-1 and moves one-by-one to the left
When left and right cross, we stop.
 So we’ll hit each element just once

Number of comparisons is n+1

Number of swaps is at worst n/2
 In the worst case, we swap every single time
 Each swap involves two elements
Usually, it will be fewer than this
 Since in the random case, some elements will be on the correct side of the pivot
21
Modified Partitioning
In preparation for Quicksort:
Choose our pivot value to be the rightmost element
Partition the array around the pivot
Ensure the pivot is at the location of the partition
 Meaning, the pivot should be the leftmost element of the right subarray

Example:
Unpartitioned: 42 89 63 12 94 27 78 10 50 36
Partitioned around Pivot: 10 27 12 36 63 94 89 78 42 50

What does this imply about the pivot element after the partition?
22
Placing the Pivot
Goal: Pivot must be in the leftmost position in the right subarray
3 27 12 36 63 94 89 78 42 50
Our algorithm does not do this currently.
It currently will not touch the pivot
left increments till it finds an element > pivot
right decrements till it finds an element < pivot
So the pivot itself won’t be touched, and will stay on the right:
10 27 12 63 94 89 78 42 50 36

23
Options
We have this:
10 27 12 63 94 89 78 42 50 36
Our goal is the position of 36:
10 27 12 36 63 94 89 78 42 50
We could either:
Shift every element in the right subarray up (inefficient)
Just swap the leftmost element of the right subarray with the pivot! Better
We can do this because the right subarray is not in any particular order
10 27 12 36 94 89 78 42 50 63

24
Swapping the Pivot
Just takes one more line in our Java method
Basically, a single call to swap()
Swaps A[end-1] (the pivot) with A[left] (the partition index)

25
Quicksort
The most popular sorting algorithm
For most situations, it runs in O(n log n)
Remember partitioning. It’s the key step. And it’s O(n).

The basic algorithm (recursive):
1. Partition the array into left (smaller keys) and right (larger keys) groups
2. Call ourselves and sort the left group
3. Call ourselves and sort the right group

Base case: a subarray of just one element, which is already sorted

26
Java Implementation, page 333
We can probably just do this in our heads:
1. Choose the pivot to be the rightmost element
2. Partition the array into left (smaller keys) and right (larger keys) groups
3. Call ourselves and sort the left group
4. Call ourselves and sort the right group

Base case: a subarray of just one element, which is already sorted
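
A minimal sketch of the whole routine, including the pivot swap from the previous slides (illustrative, not the book’s page-333 listing; names are our own):

// Illustrative recursive Quicksort with the rightmost element as pivot.
public class QuickSortSketch {

    public static void recQuickSort(long[] a, int left, int right) {
        if (right - left <= 0) {
            return;                          // base case: zero or one element
        }
        long pivot = a[right];               // 1. rightmost element is the pivot
        int part = partitionIt(a, left, right, pivot);
        recQuickSort(a, left, part - 1);     // 2. sort the smaller-key group
        recQuickSort(a, part + 1, right);    // 3. sort the larger-key group
    }

    static int partitionIt(long[] a, int left, int right, long pivot) {
        int leftPtr = left - 1;
        int rightPtr = right;                // the pivot's own slot
        while (true) {
            while (a[++leftPtr] < pivot)
                ;                            // the pivot itself stops this scan
            while (rightPtr > left && a[--rightPtr] > pivot)
                ;                            // bounds-guarded scan from the right
            if (leftPtr >= rightPtr) {
                break;                       // pointers crossed
            }
            swap(a, leftPtr, rightPtr);
        }
        swap(a, leftPtr, right);             // place the pivot at the partition index
        return leftPtr;
    }

    static void swap(long[] a, int i, int j) {
        long t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        long[] data = {10, 70, 50, 30, 60, 90, 0, 40, 80, 20};  // slide 28's array
        recQuickSort(data, 0, data.length - 1);
        System.out.println(java.util.Arrays.toString(data));
    }
}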

27
Shall we try it on an array?
10 70 50 30 60 90 0 40 80 20
Let’s go step-by-step on the board

28
Best case…
We partition the array each time into two equal subarrays
Say we start with an array of size n = 2^i
We recurse until the base case, 1 element

Draw the tree

First call -> Partition n elements, n operations
Second calls -> Each partitions n/2 elements, 2(n/2) = n operations
Third calls -> Each partitions n/4 elements, 4(n/4) = n operations
…
(i+1)th calls -> Each partitions n/2^i = 1 element, 2^i(1) = n operations
Total: i+1 = log₂ n + 1 levels, n operations each -> O(n log n)
29
The VERY bad case….
If the array is inversely sorted.
Let’s see the problem:
90 80 70 60 50 40 30 20 10 0
What happens after the partition? This:
0 20 30 40 50 60 70 80 90 10
This is almost sorted, but the algorithm doesn’t know it.
It will then call itself on an array of zero size (the left subarray) and an
array of n-1 size (the right subarray).
Producing:
0 10 30 40 50 60 70 80 90 20

30
The VERY bad case…
In the worst case, we partition every time into an array of 0 elements and
an array of n-1 elements
This yields O(n²) time:
First call: Partition n elements, n operations
Second calls: Partition 0 and n-1 elements, n-1 operations
Third calls: Partition 0 and n-2 elements, n-2 operations
Draw the tree

Yielding:
Operations = n + (n-1) + (n-2) + … + 1 = n(n+1)/2 -> O(n²)

31
Summary
What caused the problem was “blindly” choosing the pivot from the right
end.
In the case of a reverse sorted array, this is not a good choice at all

Can we improve our choice of the pivot?

Let’s choose the median of three values

32
Median-Of-Three Partitioning
Every time you partition, choose the median of the leftmost, center, and
rightmost elements as the pivot

Example:
44 11 55 33 77 22 00 99 101 66 88

Pivot: Take the median of the leftmost, middle and rightmost:
44 11 55 33 77 22 00 99 101 66 88 -> Median: 44
Then partition around this pivot:
11 33 22 00 44 55 77 99 101 66 88
Increases the likelihood of an equal partition
Also, the pivot can never be the smallest or largest of the three samples, so it cannot possibly produce the worst case
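A sketch of the selection step (illustrative; assumes the swap() helper from the Quicksort sketch above, and moves the median to the right end so a pivot-at-the-right partition routine can be reused unchanged):

// Illustrative median-of-three pivot selection: order the three samples,
// then move the median to a[right] to serve as the pivot.
static long medianOf3(long[] a, int left, int right) {
    int center = (left + right) / 2;
    if (a[left] > a[center])  swap(a, left, center);
    if (a[left] > a[right])   swap(a, left, right);
    if (a[center] > a[right]) swap(a, center, right);
    swap(a, center, right);        // median of three now sits at a[right]
    return a[right];               // the pivot
}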
33
Let’s see how this fixes the worst case for
Quicksort
Here’s our array:
90 80 70 60 50 40 30 20 10 0

Let’s see on the board how this fixes things


In fact in a perfectly reversed array, we choose the middle element as the
pivot!
Which is optimal
We get O(n log n)

Vast majority of the time, if you use Quicksort with a Median-Of-Three partition, you get O(n log n) behavior

34
One final optimization…
After a certain point, just doing insertion sort is faster than partitioning
small arrays and making recursive calls

Once you get to a very small subarray, you can just sort with insertion sort
You can experiment a bit with ‘cutoff’ values
Knuth: n=9

35
(Time Pending) Java Implementation
QuickSort with maximum optimization
Median-Of-Three Partitioning
Insertion Sort on arrays of size less than 9
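
A sketch of how the pieces might fit together (illustrative, not the book’s listing; assumes the medianOf3, partitionIt, and swap sketches from the earlier slides):

// Illustrative hybrid: median-of-three pivot, insertion sort below Knuth's cutoff of 9.
static void quickSort(long[] a, int left, int right) {
    int size = right - left + 1;
    if (size < 9) {
        insertionSort(a, left, right);       // small subarray: insertion sort
        return;
    }
    long pivot = medianOf3(a, left, right);  // also moves the pivot to a[right]
    int part = partitionIt(a, left, right, pivot);
    quickSort(a, left, part - 1);
    quickSort(a, part + 1, right);
}

// In-place insertion sort on a[left..right] for the cutoff case.
static void insertionSort(long[] a, int left, int right) {
    for (int outer = left + 1; outer <= right; outer++) {
        long temp = a[outer];
        int inner = outer;
        while (inner > left && a[inner - 1] >= temp) {
            a[inner] = a[inner - 1];         // shift larger elements right
            inner--;
        }
        a[inner] = temp;
    }
}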

36
Operation Count Estimates
For QuickSort
n=8: 30 comparisons, 12 swaps
n=12: 50 comparisons, 21 swaps
n=16: 72 comparisons, 32 swaps
n=64: 396 comparisons, 192 swaps
n=100: 678 comparisons, 332 swaps
n=128: 910 comparisons, 448 swaps

The only competitive algorithm is mergesort
 But it takes much more memory, as we said

37
Radix Sort
Idea: Disassemble elements into individual digits
Sort digit by digit

Does not require any comparisons

Arrange data items according to digit values

Makes sense:
Arrange by the least significant digit first
Then the second least, etc.

38
Assumptions
The assumption: base-10, positive integers!

If we can assume that, we can assemble 10 groups of values:
Group array values by 1s digit
Then reassemble by 10s digit, using group 0 first and group 9 last
 So sort by 10s digit, without losing the order of the sort by 1s
 So within each group by 10s, the elements are still sorted by 1s
Then reassemble by 100s digit, same process
Repeat for all digits
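
A minimal sketch of these passes (illustrative only, using ArrayLists for the ten groups; not a substitute for the assigned implementation on the last slide):

import java.util.ArrayList;
import java.util.List;

public class RadixSortSketch {

    // LSD radix sort for base-10, non-negative integers with up to maxDigits digits.
    public static void radixSort(int[] a, int maxDigits) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int d = 0; d < 10; d++) groups.add(new ArrayList<>());
        int divisor = 1;
        for (int pass = 0; pass < maxDigits; pass++) {
            for (int v : a) {                     // copy each value to its digit group
                groups.get((v / divisor) % 10).add(v);
            }
            int i = 0;
            for (List<Integer> g : groups) {      // reassemble, group 0 first
                for (int v : g) a[i++] = v;
                g.clear();
            }
            divisor *= 10;                        // next more significant digit
        }
    }

    public static void main(String[] args) {
        int[] data = {421, 240, 35, 532, 305, 430, 124};  // slide 40's example
        radixSort(data, 3);                               // k = 3 digits
        System.out.println(java.util.Arrays.toString(data));
    }
}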

39
Example: On the Board
421 240 35 532 305 430 124
Remember: 35 has a 100s digit of zero!

40
Operations
n Elements
Copy each element once to a group, and once back again:
2n copies -> O(n)
Then you repeat this k times, where k is the maximum number of digits
in any value
So, 2*k*n copies -> O(kn)
Zero comparisons

What will k typically be?


For n=100 (unique values), you have at least 3 digits
For n=1000 (unique values), you have at least 4 digits
Generally, for n = 10^i, you have (i+1) digits, or about log₁₀ n

41
Logarithms
Independent of the base, logarithmic functions grow at the same rate, up to a constant factor
So with k ≈ log₁₀ n, you are really dealing with roughly O(n log n) complexity still
Plus, Houston, you’ve got a memory problem…
Need to store extra space for the groups!
Like mergesort, double the memory
So a solid argument can be made that Quicksort is still better
Best case:
Small numbers of digits
And remember the underlying assumption: positive integers

42
Java Implementation
Not in the book; will be left as an exercise
This is your two-week assignment! Implement the entire Radix Sort.

43
