

CSE 211: Data Structures


Lecture Notes III
by
Ender Ozcan, Şebnem Baydere

Algorithm Analysis and Performance Prediction
and Recursion
1. Algorithm Analysis and Performance Prediction

An algorithm is a finite sequence of instructions that the computer follows to solve a problem. Each of the instructions in an algorithm has a clear meaning and can be performed with a finite amount of effort in a finite length of time. When you face a problem, the first thing you need to do is to devise an algorithm to solve it. Once you have determined that the algorithm is correct, the next step is to find out the resources (time and space) the algorithm will require. This is known as algorithm analysis. If your algorithm requires more resources than your computer has (such as gigabytes of main memory), it is useless.

Data structures and algorithms are interrelated and should be studied together, because algorithms are the methods used in systematic problem solving. Without methods for storing data in them, retrieving data from them, and performing computational operations on that data, data structures are meaningless. Thus, we have to study algorithms as well. The computation time and memory space required by data structures and the algorithms that operate on them are both important.

1.1 Algorithm Analysis


The finiteness condition mentioned above implies that an algorithm never goes into an infinite loop, no matter what input we give it. It is difficult to predict the actual computation time of an algorithm without knowing the intimate details of the computer architecture, the compiler, the quality of the program, and other factors. But we can measure the time for a given algorithm by using special performance programs called benchmarks.

It is also possible to predict the performance by looking at the growth rate of an algorithm. The running time of an algorithm is a function of the input size, such as the number of elements in an array or the number of records in a file. The amount of time that any algorithm takes to run depends on the amount of input it must process. For example, sorting an array of 10000 elements requires more processing time than sorting an array of 100 elements. Another example: it is common to write programs whose running time varies with the square of the problem size. Thus, a program taking 1 sec to complete a file handling problem with 10 records in the file requires 4 sec for 20 records (not 2 sec). Increasing the file size by a factor of 10, i.e., to 100 records, will increase the running time to 100 sec; 1000 records will require 10000 sec (almost 3 hours) to complete, and 10000 records will require 1000000 sec, almost two weeks, to finish. This is a long time compared to 1 sec for the 10-record test.

This example shows that we need to know something about the growth rate of our algorithm, as the running time of a test program may grow to unacceptable values when real-world-sized data is used. An experienced programmer estimates the performance of the algorithm and takes action if necessary. In some cases there may be no alternative to a program running in "squared" time, but at least the programmer will not be surprised at the end.

1.2 Growth Rates


We try to estimate the approximate computation time by formulating it in terms of the problem size N. If we consider the system-dependent factors (such as the compiler, language, and computer) to be constant, not varying with the problem size, we can factor them out of the growth rate. The growth rate is the part of the formula that varies with the problem size. We use a notation called O-notation ("growth rate", "big-O"). The most common growth rates in data structures are:

O(1), constant;
O(log(N)), logarithmic;
O(N), linear (directly proportional to N);
O(N log(N)), usually just called N log N;
O(N^2), quadratic;
O(N^3), cubic;
O(2^N), exponential.

If you calculate these values you will see that as N grows, log(N) remains quite small and N log(N) grows fairly large, but not as large as N^2. For example, most sorting algorithms have growth rates of N log(N) or N^2. The following table shows the growth rates for a given N.

Sample Values

N      Constant   log(N)   N log(N)   N^2
1      1          0        0          1
2      1          1        2          4
4      1          2        8          16
8      1          3        24         64
32     1          5        160        1024
256    1          8        2048       65536
2048   1          11       22528      4194304

1.3 Estimating the Growth Rate


Algorithms are developed in a structured way; they combine simple statements into complex blocks
in four ways:

• Sequence, writing one statement below another
• Decision, if-then or if-then-else
• Loops
• Subprogram call

Let us estimate the big-O of some algorithm structures.

Simple statements: We assume that the statement does not contain a function call. It takes a fixed amount of time to execute. We denote the performance by O(1); if we factor out the constant execution time, we are left with 1.

Sequence of simple statements: It takes an amount of execution time equal to the sum of the execution times of the individual statements. If the performance of each individual statement is O(1), so is their sum.

Decision: For estimating the performance, the then and else parts of the algorithm are considered independently. The performance estimate of the decision is taken to be the larger of the two individual big-Os. For the case structure, we take the largest big-O of all the case alternatives.

Simple counting loop: This is the type of loop in which the counter is incremented or decremented each time the loop is executed (a for loop). If the loop contains simple statements and the number of times the loop executes is a constant, in other words independent of the problem size, then the performance of the whole loop is O(1). On the other hand, if the loop is like
Ex:
for (i=0; i< N; i++)

the number of trips depends on N, the input size, so the performance is O(N).
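As a small illustration, here is a minimal C sketch (our own, not from the original notes) contrasting a constant-trip loop with one whose trip count grows with the problem size N:

#include <stdio.h>

int main(void)
{
    int N = 1000;              /* problem size */
    long total = 0;

    /* O(1): trip count is a fixed constant, independent of N */
    for (int i = 0; i < 10; i++)
        total += i;

    /* O(N): trip count is directly proportional to N */
    for (int i = 0; i < N; i++)
        total += i;

    printf("total = %ld\n", total);
    return 0;
}

Doubling N doubles the work done by the second loop but leaves the first loop unchanged.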

Nested loops: The performance depends on the trip counts of the nested loops. For example:
Ex:
for (i=0; i< N; i++) {
    for (j=0; j< N; j++) {
        sequence of simple statements
    }
}

The number of times the inner body executes is

    Σ_{i=0}^{N-1} Σ_{j=0}^{N-1} 1 = Σ_{i=0}^{N-1} N = N^2

The outer loop count is N, but the inner loop executes N times for each outer iteration. So the body of the inner loop will execute N*N times and the entire performance will be O(N^2).
Ex:
for (i=1; i<=N; i++) {
    for (j=0; j< i; j++) {
        sequence of simple statements
    }
}

The number of times the inner body executes is

    Σ_{i=1}^{N} Σ_{j=0}^{i-1} 1 = Σ_{i=1}^{N} i = N(N+1)/2 = N^2/2 + N/2
In this case the outer trip count is N, but the trip count of the inner loop depends not only on N but on the value of the outer loop counter as well. If the outer counter is 1, the inner loop has a trip count of 1, and so on. If the outer counter is N, the inner loop trip count is N.
How many times will the body be executed?
1+2+3+...+(N-1)+N = N(N+1)/2 = (N^2 + N)/2
Therefore the performance is O(N^2). For large N the contribution of the N/2 term is negligible.
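To check the count empirically, here is a short C sketch (variable names are ours, for illustration) that tallies how many times the inner body of the triangular loop runs and compares the tally with the closed form N(N+1)/2:

#include <stdio.h>

int main(void)
{
    int N = 100;
    long count = 0;

    /* the triangular nested loop analyzed above */
    for (int i = 1; i <= N; i++)
        for (int j = 0; j < i; j++)
            count++;           /* stands in for the simple statements */

    printf("counted = %ld, N(N+1)/2 = %ld\n", count, (long)N * (N + 1) / 2);
    return 0;
}

For N = 100 both numbers come out as 5050, confirming the derivation.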
Home Exercise:
for (i=1; i<=N; i++) {
    for (j=i; j<=N; j++) {
        simple statements
    }
}
The only difference from the example above is that the trip count of the inner loop decreases rather than increases.

Generalization: A structure with k nested counting loops, where each counter is just incremented or decremented by one, has performance O(N^k) if the trip counts depend on the problem size only.

While loops: The control variable is multiplied or divided each time the loop iteration is performed. Each loop has an initialization step, a termination condition, and a modification step indicating how the control variable should be changed. In the while structure, the termination condition is checked before the iteration. Let's consider the following:

control = 1;
while (control < n) {
    Simple statements;
    control = 2*control;
}

In the above example the performance depends on the problem size N. The control variable is multiplied by 2 until it gets larger than N. The initial value of control is 1, so after k iterations we will have

control = 2^k

To find k we take the log of both sides: log2(control) = log2(2^k), so k = lg(control) (writing lg for log2).
Since the loop stops once control reaches N, the performance of the algorithm is O(lg(N)).
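A quick C sketch (our own illustration) that counts the iterations of this doubling loop and compares the count against log2(N):

#include <stdio.h>
#include <math.h>   /* link with -lm */

int main(void)
{
    int n = 1000000;
    long control = 1;
    int k = 0;

    /* the doubling while loop from the text */
    while (control < n) {
        control = 2 * control;
        k++;
    }

    printf("iterations = %d, log2(n) = %.2f\n", k, log2((double)n));
    return 0;
}

For n = 1000000 this prints 20 iterations against log2(n) ≈ 19.93, matching the O(lg(N)) estimate.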

Generalization: Assume that we multiply the control variable by some other constant, say fact. Then after k iterations

control = fact^k

and so the performance is O(log(N)), where the log is taken to base fact. In considering the performance, the base does not matter, since converting from one base to another only introduces a constant factor.

Home Exercise: Find out the performance of the following structure:

1) control = n;
   while (control > 0) {
       simple statements;
       control = control / 2;
   }

2) for (c=1; c<= n; c++) {
       i = 1;
       while (i <= n) {
           i = 2*i;
       }
   }

Function call: A function has its own big-O; treat the call as if the function's body appeared inline in the calling program.

1.4 General
• Quadratic algorithms are impractical for input sizes exceeding a few thousand.
• Cubic algorithms are impractical for input sizes exceeding a few hundred.

2. Static Searching Problems

An important use of computers is looking up data. If the data are not allowed to change, they are called static. A static search accesses static data.

Static searching problem:


Given an integer X and an array A, return the position of X in A or an indication that it
is not present. If X occurs more than once, return any occurrence. The array is never
altered during the search.

1) Sequential Search: If the input array is not sorted, we have little choice but to do a linear
sequential search. An unsuccessful search requires the examination of every item in the array, so
it is O(N). The worst-case running time is also linear.
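A minimal C sketch of sequential search (the function name and the -1 "not found" convention are our choices for illustration):

/* Return the position of x in a[0..n-1], or -1 if it is not present. */
int seq_search(const int a[], int n, int x)
{
    for (int i = 0; i < n; i++)
        if (a[i] == x)
            return i;   /* successful search */
    return -1;          /* unsuccessful: every item was examined, O(N) */
}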

2) Binary Search: If the input array is sorted, then we have an alternative: binary search.
Divide the list in half and look at the name right in the middle. If it is the one you are
looking for, exit; if the name you are looking for is smaller than the name in the middle,
then search in the first half, otherwise in the second half. After k comparisons the data
remaining to be searched is of size at most N/2^k. Hence in the worst case this method
requires O(log(N)) comparisons.
For large values of N, binary search outperforms sequential search. For instance, if N is
1000, then on average a successful sequential search requires about 500 comparisons. The
average binary search will require 8 iterations, in total 16 comparisons, for a successful
search. For small N, say 6, binary search may not be worth using.
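An iterative C sketch of binary search on a sorted array (again, the names are illustrative):

/* Return the position of x in the sorted array a[0..n-1], or -1 if absent. */
int bin_search(const int a[], int n, int x)
{
    int lower = 0, upper = n - 1;
    while (lower <= upper) {
        int middle = lower + (upper - lower) / 2;
        if (a[middle] == x)
            return middle;
        else if (a[middle] < x)
            lower = middle + 1;   /* discard the first half */
        else
            upper = middle - 1;   /* discard the second half */
    }
    return -1;   /* the loop runs at most O(log N) times */
}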

Additional Reading: Weiss, Chapter 5; Langsam, Section 1.2

3. Recursion
In this lecture we will discuss recursion and recursive algorithms.

A recursive algorithm is an algorithm that is defined in terms of itself. In other words, it either
directly or indirectly makes a call to itself. Recursion is a powerful problem solving tool. Many
interesting programming problems can be expressed easily using recursive formulation. But we must
be careful not to create circular logic that may result in infinite loops.

A function is said to be recursive if, in the course of its execution, it makes a call to itself.
This call may occur inside the function itself, in which case the function is directly recursive. In other
cases a function may call another function, which in turn makes a call to the first one. This situation
is known as indirect recursion.
The objective of a recursive function is for the program to proceed through a sequence of calls until,
at a certain point, the sequence terminates.

If the function is improperly defined the program might cycle through a never-ending sequence. To
ensure that recursive functions are well behaved, you should observe the following guidelines:

1. Every time a recursive function is called, the program should first check whether some base
condition, such as a particular parameter being equal to zero, is satisfied. If so, the
function should stop recursing.
2. Each time the function is recursively called, one or more of the arguments passed to the function
should be reduced in size in some way, so that the parameters move nearer to the base condition.
For example, a positive integer may be made smaller on each recursion so that eventually it reaches
zero.

Sometimes mathematical functions are defined recursively. Two classical examples are:

1. The sum of n integers, sum(n): we can write sum(1)=1 and sum(n) = sum(n-1)+n. Here we have
defined the function sum in terms of itself.
Remember that the recursive definition of sum is identical to the closed form n(n+1)/2, but the
recursive definition is only defined for positive integers.

unsigned long sum(int n)
{
    if (n == 1)
        return 1;
    else
        return sum(n-1) + n;
}

2. The factorial of a positive integer n, written n! or fact(n): we can write a formal definition as follows:
fact(1) = 1 and fact(n) = fact(n-1)*n

Observe that a workable recursive algorithm must always reduce the size of the data set it is
working with each time it is recursively called, and must always provide a terminating
condition, such as the first line in our algorithm.
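Following the same pattern as sum above, a C sketch of the factorial definition (valid for n >= 1, matching the recursive definition):

unsigned long fact(int n)
{
    if (n == 1)                 /* terminating condition */
        return 1;
    else
        return fact(n-1) * n;   /* smaller instance of the same problem */
}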

Recursive calculation of 4!

4! = 4 * 3!
   = 4 * (3 * 2!)
   = 4 * (3 * (2 * 1!))
   = 4 * (3 * (2 * 1))
   = 4 * (3 * 2)
   = 4 * 6
   = 24

Recursion vs Iteration
As a general rule, avoid recursion when an iterative solution is easy to find. Do not use recursion as
a substitute for a simple loop. Too much recursion can be dangerous. Do not do redundant work
recursively, or the program will be incredibly inefficient. Recursion should be preferred when the
underlying data structures in the problem are themselves recursive, such as trees.

Let's see this with an example:

The use of recursion in sum is poor, because a simple loop would do the same thing.
Another problem is illustrated by an attempt to calculate the Fibonacci numbers recursively. Let's
assume that we have written a recursive algorithm to calculate the Fibonacci numbers.

long fib(int n)
{
    if (n <= 1)     /* base case must cover n == 0 as well, since fib(2) calls fib(0) */
        return 1;
    else
        return fib(n-1) + fib(n-2);
}

This routine works but has a serious problem: it performs quite badly (fib(40) takes 4 minutes to
compute).
Problem: Redundant calculations. To compute fib(n) we recursively compute fib(n-1); when this call
returns, we compute fib(n-2). But we have already computed fib(n-2) in the process of computing
fib(n-1), so the call to fib(n-2) is wasted. It is a redundant calculation. Note that the redundancy
increases with each recursive call.
The tree of calls made when computing F5:

                        F5
                   /         \
                F4             F3
              /    \          /   \
            F3      F2      F2     F1
           /  \    /  \    /  \
         F2    F1 F1   F0 F1   F0
        /  \
      F1    F0

Compound Interest Rule: Never duplicate work by solving the same instance of a problem in
separate recursive calls.
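One standard remedy that follows this rule is to remember values that have already been computed. A memoized C sketch of fib (the memo table and its size are our own illustrative choices):

#define MAXN 90

long long memo[MAXN + 1];   /* memo[n] == 0 means "not yet computed" */

long long fib(int n)
{
    if (n <= 1)
        return 1;
    if (memo[n] == 0)                      /* solve each instance only once */
        memo[n] = fib(n - 1) + fib(n - 2);
    return memo[n];
}

With the table in place, fib(40) returns instantly, because each fib(k) is computed exactly once instead of exponentially many times.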

Example 1
Tower of Hanoi
The Tower of Hanoi puzzle was invented by the French mathematician Edouard Lucas in 1883. We
are given a tower of eight disks, initially stacked in increasing size on one of three pegs. The
objective is to transfer the entire tower to one of the other pegs, moving only one disk at a time
and never placing a larger disk onto a smaller one.

The puzzle is well known to students of Computer Science since it appears in virtually any
introductory text on data structures or algorithms. Its solution touches on two important topics
discussed later on:
• recursive functions and stacks
• recurrence relations

Assume there is a function Solve with four arguments: the number of disks and the three pegs
(source, intermediary, and destination, in this order). Then the body of the function might look like:

Solve(N, Src, Aux, Dst)
    if N is 0, exit
    Solve(N-1, Src, Dst, Aux)
    Move from Src to Dst
    Solve(N-1, Aux, Src, Dst)

This actually serves as the definition of the function Solve. The function is recursive in that it calls
itself repeatedly with decreasing values of N until the terminating condition (in our case N=0) has been
met. The sheer simplicity of the solution is breathtaking. For N=3 it translates into the following
sequence (a C rendering of Solve is sketched after the move lists):

1. Move from Src to Dst


2. Move from Src to Aux
3. Move from Dst to Aux
4. Move from Src to Dst
5. Move from Aux to Src
6. Move from Aux to Dst
7. Move from Src to Dst

Of course "Move" means moving the topmost disk. For N=4 we get the following sequence

1. Move from Src to Aux


2. Move from Src to Dst
3. Move from Aux to Dst
4. Move from Src to Aux
5. Move from Dst to Src
6. Move from Dst to Aux
7. Move from Src to Aux
8. Move from Src to Dst
9. Move from Aux to Dst
10. Move from Aux to Src
11. Move from Dst to Src
12. Move from Aux to Dst
13. Move from Src to Aux
14. Move from Src to Dst
15. Move from Aux to Dst
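A direct C rendering of Solve (a sketch; passing the pegs as characters so the moves can be printed is our own choice):

#include <stdio.h>

/* Move n disks from src to dst, using aux as the intermediary peg. */
void solve(int n, char src, char aux, char dst)
{
    if (n == 0)                   /* terminating condition */
        return;
    solve(n - 1, src, dst, aux);  /* move n-1 disks out of the way */
    printf("Move from %c to %c\n", src, dst);
    solve(n - 1, aux, src, dst);  /* move them back on top of the largest disk */
}

int main(void)
{
    solve(3, 'S', 'A', 'D');      /* prints the 7 moves listed above for N=3 */
    return 0;
}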

Source: http://www.cut-the-knot.org/recurrence/hanoi.shtml

Example 2: Reverse of a string. A palindrome is a phrase that reads the same forward and backward,
e.g., "radar" and "Madam, I'm Adam.". One way to discover whether a string of characters is a palindrome
is to find the reverse of the string and check whether the two are identical.

Algorithm:
1. If the string contains only one character, its reverse is identical to it; finish.
2. Otherwise, remove and save the first character.
3. Find the reverse of the remaining string, then concatenate the saved character onto the right-
hand end.

To find the reverse of 'ABCD'


Append 'A' to the reverse of 'BCD'
Now, to find the reverse of 'BCD'
Append 'B' to the reverse of 'CD'
Now, to find the reverse of 'CD'
Append 'C' to the reverse of 'D'
The reverse of 'D' is 'D'
Appending 'C' gives 'DC'
Appending 'B' gives 'DCB'
Appending 'A' gives 'DCBA'
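A recursive C sketch of this algorithm (the function returns a newly allocated reverse, which the caller must free; error checking is omitted for brevity):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated string containing the reverse of s. */
char *string_reverse(const char *s)
{
    size_t len = strlen(s);
    char *rev = malloc(len + 1);
    if (len <= 1) {                        /* base case */
        strcpy(rev, s);
        return rev;
    }
    char *tail = string_reverse(s + 1);    /* reverse of the remaining string */
    strcpy(rev, tail);
    rev[len - 1] = s[0];                   /* append the saved first character */
    rev[len] = '\0';
    free(tail);
    return rev;
}

int main(void)
{
    char *r = string_reverse("ABCD");
    printf("%s\n", r);                     /* prints DCBA */
    free(r);
    return 0;
}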

Define the functions:
TextHead(S): returns the first character of string S.
TextTail(S): returns the string S with the first character removed.
AppendChar(S, c): appends the character c to the end of string S.
StringReverse(S): returns the reverse of string S.
Write a function Boolean IsPalindrome(char *s) which returns true or false for a given string s.
Homework 2: Given a set of elements, print out all possible permutations of this set.

Example 3
Binary Search: If the input array is sorted, then we have an alternative: binary search. Divide
the list in half and look at the name right in the middle. If it is the one you are looking for,
exit; if the name you are looking for is smaller than the name in the middle, then search in the
first half.
Write a recursive algorithm to look up a number in a long telephone list.

int list[];
lower = 1;
upper = N;

BINSRCH (list, lower, upper, num)

    if lower > upper, return "not found"
    Compute middle = (lower+upper)/2
    Case
        num > list[middle]: BINSRCH (list, middle+1, upper, num)
        num = list[middle]: return middle
        num < list[middle]: BINSRCH (list, lower, middle-1, num)

end BINSRCH
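A recursive C sketch of BINSRCH (using 0-based indices and -1 for "not found", which are our own conventions):

/* Return the position of num in the sorted array list[lower..upper], or -1. */
int binsrch(const int list[], int lower, int upper, int num)
{
    if (lower > upper)
        return -1;                     /* base case: not found */
    int middle = lower + (upper - lower) / 2;
    if (num > list[middle])
        return binsrch(list, middle + 1, upper, num);
    else if (num < list[middle])
        return binsrch(list, lower, middle - 1, num);
    else
        return middle;                 /* found */
}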

Additional Reading: Weiss, Chapter 7-1, 7-3, 7-4.
