
ESD 611 Data Structures and Algorithms

Unit 1: INTRODUCTION TO ALGORITHMS (3 hrs)
1.1 Notion of Algorithm
1.2 Fundamentals of Algorithmic Problem Solving
1.3 Important Problem Types
1.4 Analysis Framework
1.5 Asymptotic Notations & Basic Efficiency Classes

Fundamentals of Algorithmic Problem Solving
1. Understanding the problem
2. Ascertaining the capabilities of a computational device
3. Choosing between exact and approximate problem solving
4. Deciding on appropriate data structures
5. Algorithm design techniques
6. Methods of specifying an algorithm
7. Proving an algorithm's correctness
8. Analyzing an algorithm
9. Coding an algorithm

An input to an algorithm specifies an instance of the problem the algorithm solves.

Ascertaining the capabilities of a computational device: If the machine executes instructions one after another, one operation at a time, the algorithm is called a sequential algorithm. If the machine can execute instructions in parallel, the algorithm is called a parallel algorithm.

Choosing between exact and approximate problem solving: The next principal decision is to choose between solving the problem exactly or solving it approximately. Why would one opt for an approximation algorithm?
1. There are important problems that simply cannot be solved exactly, such as extracting square roots, solving nonlinear equations, and evaluating definite integrals.
2. Available algorithms for solving a problem exactly can be unacceptably slow because of the problem's intrinsic complexity. The best known of these is the Traveling Salesman Problem of finding the shortest tour through n cities.

Deciding on appropriate data structures means choosing among structures such as stacks, queues, sets, graphs, and so on.

An algorithm design technique is a general approach to solving problems algorithmically that is applicable to a variety of problems from different areas of computing.

Methods of specifying an algorithm: An algorithm can be specified in pseudocode or as a flowchart. Pseudocode is a mixture of natural language and programming-language-like constructs. A flowchart expresses an algorithm as a collection of connected geometric shapes containing descriptions of the algorithm's steps.

Proving an algorithm's correctness:
You have to prove that the algorithm yields a required result for every legitimate input

in a finite amount of time. A common technique for proving correctness is mathematical induction, because an algorithm's iterations provide a natural sequence of steps for such proofs. To show that an algorithm is incorrect, you need just one instance of its input for which the algorithm fails. If the algorithm is found to be incorrect, you need either to redesign it under the same decisions regarding the data structures, the design technique, and so on, or to reconsider one or more of those decisions. For an approximation algorithm, we usually want to show that the error produced by the algorithm does not exceed a predefined limit.

Analyzing an algorithm: There are two kinds of algorithm efficiency: time efficiency and space efficiency. Time efficiency indicates how fast the algorithm runs; space efficiency indicates how much extra memory the algorithm needs.

Another desirable characteristic of an algorithm is simplicity. Simpler algorithms are easier to understand and easier to program; consequently, the resulting programs usually contain fewer bugs.

Another desirable characteristic is generality. There are two issues here: the generality of the problem the algorithm solves and the range of inputs it accepts. It is sometimes easier to design an algorithm for a problem posed in more general terms.

Algorithm Specification:
1. Comments begin with // and continue until the end of the line.
2. Blocks are indicated with matching braces { }.
3. An identifier begins with a letter. The data types of variables are not explicitly declared.
4. Assignment of values to variables is done using the assignment statement
       <variable> := <expression>;
5. There are two boolean values, true and false. The logical operators and, or, not and the relational operators <, <=, =, <>, >, >= are provided.
6. Elements of multidimensional arrays are accessed using [ and ].
7. The following loop statements are employed:

While loop:
    while <condition> do
    {
        <statement 1>
        ...
        <statement n>
    }

For loop:
    for variable := value1 to value2 step step do
    {
        <statement 1>
        ...
        <statement n>
    }

Repeat-until loop:
    repeat
        <statement 1>
        ...
        <statement n>
    until <condition>
8. A conditional statement has the following forms:

    if <condition> then <statement>
    if <condition> then <statement 1> else <statement 2>

Case statement:

    case
    {
        :<condition 1>: <statement 1>
        ...
        :<condition n>: <statement n>
        :else: <statement n+1>
    }
9. Input and output are done using the instructions read and write.

10. There is only one type of procedure: Algorithm. An algorithm consists of a heading and a body. The heading takes the form Algorithm Name(<parameter list>), where Name is the name of the procedure and <parameter list> is a listing of the procedure parameters.

Algorithm to find the maximum of n elements:
Algorithm Max(A, n)
// A is an array of size n.
{
    Result := A[1];
    for i := 2 to n do
        if A[i] > Result then
            Result := A[i];
    return Result;
}

A and n are procedure parameters; Result and i are local variables.

Space Complexity: The space needed by an algorithm is the sum of the following components:
- A fixed part that is independent of the characteristics (e.g., number, size) of the inputs and outputs. This part includes the instruction space (space for the code) and the space for constants.
- A variable part that consists of the space needed by component variables whose size depends on the particular problem instance being solved, the space needed by referenced variables, and the recursion stack space.

The space requirement S(P) of any algorithm P may be written as S(P) = c + Sp(instance characteristics), where c is a constant.

Example 1:

Algorithm abc(a, b, c)
{
    return a + b + b*c + (a + b - c)/(a + b) + 4.0;
}

If we assume that one word is adequate to store the values of each of a, b, c and the result, then the space needed by the abc algorithm is 4 words. Since the space needed by abc is independent of the instance characteristics, Sp = 0.

Example 2:

Algorithm Sum(a, n)
{
    s := 0.0;
    for i := 1 to n do
        s := s + a[i];
    return s;
}

The space needed by n is one word, since it is of type integer. The space needed by a is the space needed by variables of type array of floating-point numbers: at least n words to store the n elements of the array. Three more words store the values of n, i and s. The space required for the Sum algorithm is therefore S(Sum) >= n + 3.

Example 3:

Algorithm RSum(a, n)
{
    if (n <= 0) then
        return 0.0;
    else
        return RSum(a, n-1) + a[n];
}

Assume that the return address requires one word of memory. Each call to RSum requires at least 3 words (one word each for n, the return address, and a pointer to a[]). The depth of the recursion is n+1, so the space required is >= 3(n+1).

Time Complexity:

The time T(P) taken by a program P is the sum of the compile time and the run time.

The compile time does not depend on the instance characteristics.

We may assume that a compiled program will be run several times without recompilation. The run time is denoted by tp (instance characteristics):

    tp(n) = ca*ADD(n) + cs*SUB(n) + cm*MUL(n) + cd*DIV(n) + ...

where ca, cs, cm, cd, ... respectively denote the time needed for an addition, subtraction, multiplication, division, and so on, and ADD, SUB, MUL, DIV denote the number of additions, subtractions, multiplications, and divisions performed, as functions of the instance characteristics.

The time complexity depends on the number of program steps. A program step is defined as a syntactically or semantically meaningful segment of a program whose execution time is independent of the instance characteristics. For example, the entire statement

    return a + b + b*c + (a + b - c)/(a + b) + 4.0;

can be regarded as one step, since its execution time is independent of the instance characteristics.

We can determine the number of steps needed by a program to solve a particular problem instance in one of two ways. The first is to introduce a new variable, count, into the program. This is a global variable with initial value 0; it is incremented by the step count of each statement the program executes.

The second is to build a table in which we list the total number of steps contributed by each statement: determine the number of steps per execution (s/e) of each statement and the total number of times each statement is executed. The total contribution of all statements gives the step count of the program.

Example 1:

Algorithm Sum(a, n)
{
    s := 0.0;
    count := count + 1;      // for assignment
    for i := 1 to n do
    {
        count := count + 1;  // for the for statement
        s := s + a[i];
        count := count + 1;  // for assignment
    }
    count := count + 1;      // for last time of for
    count := count + 1;      // for return
    return s;
}

The above algorithm can be simplified with respect to the count variable as follows:

Algorithm Sum(a, n)
{
    for i := 1 to n do
        count := count + 2;
    count := count + 3;
}

From the algorithm, the loop increases the value of count by a total of 2n. If count is zero to start with, it will be 2n + 3 on termination. So each invocation of Sum executes a total of 2n + 3 steps.
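The same counting can be carried out in executable form. The following C++ sketch is our own illustration (the names sum_counted and count are ours, not from the notes above):

#include <cstdio>

// Hypothetical instrumented version of Sum: the global `count` tracks the
// step count exactly as in the pseudocode above (1 for s := 0.0, 2 per loop
// iteration for the for-test and the assignment, 1 for the final for-test,
// and 1 for the return).
long count = 0;

double sum_counted(const double a[], int n) {
    double s = 0.0;  count++;            // s := 0.0
    for (int i = 0; i < n; i++) {
        count++;                         // for-statement test
        s += a[i];  count++;             // assignment
    }
    count++;                             // last test of the for
    count++;                             // return
    return s;
}

int main() {
    double a[] = {1.0, 2.0, 3.0, 4.0};
    sum_counted(a, 4);
    std::printf("steps = %ld\n", count); // prints 11 = 2*4 + 3
    return 0;
}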

Example 2:

Algorithm RSum(a, n)
{
    count := count + 1;          // for the if conditional
    if (n <= 0) then
    {
        count := count + 1;      // for the return
        return 0.0;
    }
    else
    {
        count := count + 1;      // for the addition, function invocation and return
        return RSum(a, n-1) + a[n];
    }
}

Let tRSum(n) be the increase in the value of count when the algorithm terminates. The recursive formula for the step count is:

    tRSum(n) = 2                  if n = 0
    tRSum(n) = 2 + tRSum(n-1)     if n > 0

The above recurrence relation can be solved by the substitution method:

    tRSum(n) = 2 + tRSum(n-1)
             = 2 + 2 + tRSum(n-2)     = 2(2) + tRSum(n-2)
             = 2 + 2 + 2 + tRSum(n-3) = 2(3) + tRSum(n-3)
             ...
             = 2(n) + tRSum(0)
             = 2n + 2,   for n >= 0.

So, the step count for the RSum algorithm is 2n + 2.

The second method to determine the step count of an algorithm is to build a table in which we list the total number of steps contributed by each statement. The s/e (steps per execution) of a statement is the amount by which count changes as a result of the execution of that statement. The total number of times a statement is executed is known as its frequency. Combining these two quantities gives the total contribution of each statement, and adding the contributions of all statements gives the step count for the entire algorithm.

Asymptotic Notation (O, Ω, Θ)

Big O notation: The function f(n) = O(g(n)) iff there exist positive constants c and n0 such that f(n) <= c*g(n) for all n >= n0.

Examples:
1. The function 3n + 2 = O(n), as 3n + 2 <= 4n for all n >= 2.
2. The function 3n + 3 = O(n), as 3n + 3 <= 4n for all n >= 3.
3. The function 100n + 6 = O(n), as 100n + 6 <= 101n for all n >= 6.
4. The function 10n^2 + 4n + 2 = O(n^2), as 10n^2 + 4n + 2 <= 11n^2 for all n >= 5.
5. The function 6*2^n + n^2 = O(2^n), as 6*2^n + n^2 <= 7*2^n for all n >= 4.
6. The function 10n^2 + 4n + 2 = O(n^4), as 10n^2 + 4n + 2 <= 10n^4 for all n >= 2.
7. The function 3n + 2 ≠ O(1), as 3n + 2 is not less than or equal to c for any constant c and all n >= n0.
8. The function 10n^2 + 4n + 2 ≠ O(n).

The statement f(n) = O(g(n)) states only that g(n) is an upper bound on the value of f(n) for all n >= n0. If f(n) = am*n^m + ... + a1*n + a0, then f(n) = O(n^m).

Omega notation (Ω): The function f(n) = Ω(g(n)) iff there exist positive constants c and n0 such that f(n) >= c*g(n) for all n >= n0.

Examples:
1. The function 3n + 2 = Ω(n), as 3n + 2 >= 3n for all n >= 2.
2. The function 3n + 3 = Ω(n), as 3n + 3 >= 3n for all n >= 1.
3. The function 100n + 6 = Ω(n), as 100n + 6 >= 100n for all n >= 1.
4. The function 10n^2 + 4n + 2 = Ω(n^2), as 10n^2 + 4n + 2 >= n^2 for all n >= 1.
5. The function 6*2^n + n^2 = Ω(2^n), as 6*2^n + n^2 >= 2^n for all n >= 1.

For the statement f(n) = Ω(g(n)) to be informative, g(n) should be as large a function of n as possible for which the statement is true. If f(n) = am*n^m + ... + a1*n + a0 and am > 0, then f(n) = Ω(n^m).

Theta notation (Θ): The function f(n) = Θ(g(n)) iff there exist positive constants c1, c2 and n0 such that c1*g(n) <= f(n) <= c2*g(n) for all n >= n0.

Examples:
1. The function 3n + 2 = Θ(n), as 3n + 2 >= 3n for all n >= 2 and 3n + 2 <= 4n for all n >= 2; so c1 = 3, c2 = 4 and n0 = 2.
2. The function 10n^2 + 4n + 2 = Θ(n^2).
3. The function 6*2^n + n^2 = Θ(2^n).
4. The function 3n + 2 ≠ Θ(1).

The function f(n) = Θ(g(n)) iff g(n) is both an upper and a lower bound on f(n). If f(n) = am*n^m + ... + a1*n + a0 and am > 0, then f(n) = Θ(n^m).

Little oh notation (o): The function f(n) = o(g(n)) iff lim(n→∞) f(n)/g(n) = 0.

Examples:
1. The function 3n + 2 = o(n^2), since lim(n→∞) (3n + 2)/n^2 = 0.
2. The function 3n + 2 = o(n log n).
3. The function 3n + 2 = o(n log log n).
4. The function 6*2^n + n^2 = o(3^n).
5. The function 3n + 2 ≠ o(n).
6. The function 6*2^n + n^2 ≠ o(2^n).

Little omega notation (ω): The function f(n) = ω(g(n)) iff lim(n→∞) g(n)/f(n) = 0.

Unit 2: INTRODUCTION TO DATA STRUCTURES (1 hr) 2.1 Information & Meaning 2.2 Arrays 2.3 Structures

Unit 3: STACKS, RECURSION & QUEUES (5 hrs) 3.1 Definition & examples 3.2 Representing (operations on) stacks 3.3 Applications 3.4 Recursive definitions & processes 3.5 Applications 3.6 Queues & their representation 3.7 Different types of queues

Stacks and Queues

Two of the more common data objects found in computer algorithms are stacks and queues. Both of these objects are special cases of the more general data object, an ordered list. A stack is an ordered list in which all insertions and deletions are made at one end, called the top. A queue is an ordered list in which all insertions take place at one end, the rear, while all deletions take place at the other end, the front.

Given a stack S = (a[1], a[2], ..., a[n]), we say that a[1] is the bottommost element and element a[i] is on top of element a[i-1], 1 < i <= n. When viewed as a queue with a[n] as the rear element, one says that a[i+1] is behind a[i], 1 < i <= n.

Adding into a stack

procedure add(item : items);
{add item to the global stack; top is the current top of stack
 and n is its maximum size}
begin
    if top = n then stackfull;
    top := top + 1;
    stack[top] := item;
end; {of add}

Deletion from a stack

procedure delete(var item : items);
{remove the top element from the stack and put it in item}
begin
    if top = 0 then stackempty;
    item := stack[top];
    top := top - 1;
end; {of delete}

Procedure delete actually combines the functions TOP and DELETE. stackfull and stackempty are procedures which are left unspecified, since they will depend upon the particular application. Often a stackfull condition will signal that more storage needs to be allocated and the program re-run. Stackempty is often a meaningful condition.

Addition into a queue

procedure addq(item : items);
{add item to the queue q}
begin
    if rear = n then queuefull
    else begin
        rear := rear + 1;
        q[rear] := item;
    end;
end; {of addq}

Deletion from a queue

procedure deleteq(var item : items);
{delete from the front of q and put into item}
begin
    if front = rear then queueempty
    else begin
        front := front + 1;
        item := q[front];
    end;
end; {of deleteq}
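For comparison, here is a minimal C++ rendering of the same fixed-capacity stack and queue operations. This is our own sketch, not part of the original notes; the class names and the use of exceptions for the full/empty conditions are illustrative choices:

#include <stdexcept>

// Illustrative fixed-capacity stack and queue mirroring the Pascal
// procedures above: add/remove for the stack, addq/deleteq for the queue.
template <typename T, int N>
class Stack {
    T data[N];
    int top = 0;              // number of elements currently stored
public:
    void add(const T& item) {
        if (top == N) throw std::overflow_error("stack full");
        data[top++] = item;
    }
    T remove() {              // combines TOP and DELETE, as the notes say
        if (top == 0) throw std::underflow_error("stack empty");
        return data[--top];
    }
};

template <typename T, int N>
class Queue {
    T data[N];
    int front = 0, rear = 0;  // front == rear means the queue is empty
public:
    void addq(const T& item) {
        if (rear == N) throw std::overflow_error("queue full");
        data[rear++] = item;
    }
    T deleteq() {
        if (front == rear) throw std::underflow_error("queue empty");
        return data[front++];
    }
};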

Unit 4: LINKED LISTS (3 hrs) 4.1 Introduction 4.2 Different types of lists & their implementation

Linked Lists

Simple data structures such as arrays (sequential mappings) have the property that successive nodes of the data object are stored a fixed distance apart. These sequential storage schemes proved adequate for the functions one wished to perform (access to an arbitrary node in a table, insertion or deletion of nodes within a stack or queue). However, when a sequential mapping is used for ordered lists, operations such as insertion and deletion of arbitrary elements become expensive.

For example, consider the following list of three-letter English words ending in AT:

(BAT, CAT, EAT, FAT, HAT, JAT, LAT, MAT, OAT, PAT, RAT, SAT, TAT, VAT, WAT)

To make this list complete we naturally want to add the word GAT. If we are using an array to keep this list, then the insertion of GAT requires us to move elements already in the list one location higher or lower. We must either move HAT, JAT, LAT, ..., WAT or else move BAT, CAT, EAT, FAT. If we have to do many such insertions into the middle, then neither alternative is attractive because of the amount of data movement. Or suppose we decide to remove the word LAT; then again we have to move many elements to maintain the sequential representation of the list.

When our problem calls for several ordered lists of varying sizes, sequential representation again proves inadequate. By storing each list in a different array of maximum size, storage may be wasted. By maintaining the lists in a single array, a potentially large amount of data movement is needed.

Linked representation reduces the time needed for arbitrary insertion and deletion in ordered lists, as explained in this section. Unlike a sequential representation, where successive items of a list are located a fixed distance apart, in a linked representation the items may be placed anywhere in memory. Another way of saying this is that in a sequential representation the order of elements is the same as in the ordered list, while in a linked representation these two sequences need not be the same.

To access elements of the list in the correct order, we store with each element the address, or location, of the next element in the list. Thus associated with each data item in a linked representation is a pointer to the next item. This pointer is often referred to as a link. In general, a node is a collection of data fields data(1), ..., data(n) and links link(1), ..., link(m). Each item in a node is called a field; a field contains either a data item or a link.

Suppose the elements of the list are stored in a one-dimensional array called data, but no longer in sequential order: BAT before CAT before EAT, etc. Instead we relax this restriction and allow them to appear anywhere in the array and in any order. To record the real order, a second array, link, is added. The values in this array are pointers to elements in the data array. Since the list starts at data[8] = BAT, let us set a variable f = 8. link[8] has the value 3, which means it points to data[3], which contains CAT. The third element of the list is pointed at by link[3], which is EAT. By continuing in this way we can list all the words in the proper order. We recognize that we have come to the end when link has a value of zero.

It is customary to draw linked lists as an ordered sequence of nodes with links represented by arrows, and to use the name of the pointer variable that points to the list as the name of the entire list. Thus the list we consider is the list f. Notice that we do not explicitly put in the values of the pointers but simply draw arrows to indicate they are there. This reinforces in our own minds the facts that (i) the nodes do not actually reside in sequential locations, and (ii) the locations of nodes may change on different runs. Therefore, when we write a program which works with lists, we almost never look for a specific address except when we test for zero.

It is much easier to make an arbitrary insertion or deletion using a linked list than a sequential list. To insert the data item GAT between FAT and HAT, the following steps are adequate (a code sketch follows below):

- get a node which is currently unused; let its address be x;
- set the data field of this node to GAT;
- set the link field of x to point to the node after FAT, which contains HAT;

- set the link field of the node containing FAT to x.

The important thing is that when we insert GAT we do not have to move any other elements which are already in the list. We have overcome the need to move data at the expense of the storage needed for the second field, link. Now suppose we want to delete GAT from the list. All we need to do is find the element which immediately precedes GAT, which is FAT, and set link[9] to the position of HAT, which is 1. Again, there is no need to move the data around. Even though the link field of GAT still contains a pointer to HAT, GAT is no longer in the list.
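The following C++ sketch (ours, not from the original notes) shows the same insert-after and delete-after operations on a singly linked list with pointer links; the node layout and names are illustrative:

#include <string>

// Illustrative singly linked list node, mirroring the data/link fields above.
struct Node {
    std::string data;
    Node* link = nullptr;   // nullptr plays the role of the 0 "end" link
};

// Insert a new node holding `value` immediately after node `prev`
// (e.g., insert GAT after FAT). No existing element moves.
void insertAfter(Node* prev, const std::string& value) {
    Node* x = new Node;     // get an unused node; let its address be x
    x->data = value;        // set the data field of x
    x->link = prev->link;   // x points to the node after prev (HAT)
    prev->link = x;         // prev (FAT) now points to x
}

// Delete the node immediately after `prev` (e.g., delete GAT, where
// prev is the node containing FAT).
void deleteAfter(Node* prev) {
    Node* victim = prev->link;
    if (victim == nullptr) return;   // nothing to delete
    prev->link = victim->link;       // FAT now points directly to HAT
    delete victim;                   // GAT is no longer in the list
}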

Unit 5: TREES & GRAPHS (7 hrs) 5.1 Binary Trees 5.2 Binary Tree Representation 5.3 The Huffman Algorithm 5.4 Representing Lists as Trees 5.5 Balanced Search Trees 5.6 Expression Trees 5.7 Tree Traversal Techniques 5.8 Introduction to Graphs and their Representations 5.9 DFS & BFS Search 5.10 Topological Sorting

Binary Trees

A binary tree is an important type of structure which occurs very often. It is characterized by the fact that any node can have at most two branches, i.e., there is no node with degree greater than two. For binary trees we distinguish between the subtree on the left and on the right, whereas for trees the order of the subtrees is irrelevant. Also, a binary tree may have zero nodes. Thus a binary tree is really a different object than a tree.

Definition: A binary tree is a finite set of nodes which is either empty or consists of a root and two disjoint binary trees called the left subtree and the right subtree.

We can define the data structure binary tree as follows:

structure BTREE
    declare CREATE( )                --> btree
            ISMTBT(btree)            --> boolean
            MAKEBT(btree,item,btree) --> btree
            LCHILD(btree)            --> btree
            DATA(btree)              --> item
            RCHILD(btree)            --> btree
    for all p,r in btree, d in item let
        ISMTBT(CREATE) ::= true
        ISMTBT(MAKEBT(p,d,r)) ::= false
        LCHILD(MAKEBT(p,d,r)) ::= p;  LCHILD(CREATE) ::= error
        DATA(MAKEBT(p,d,r))   ::= d;  DATA(CREATE)   ::= error
        RCHILD(MAKEBT(p,d,r)) ::= r;  RCHILD(CREATE) ::= error
    end
end BTREE

This set of axioms defines only a minimal set of operations on binary trees. Other operations can usually be built in terms of these.

The distinctions between a binary tree and a tree should be analyzed. First of all, there is no tree having zero nodes, but there is an empty binary tree. Also, consider two binary trees in which the first has an empty right subtree and the second has an empty left subtree: they are different binary trees. If these are regarded as trees, then they are the same, despite the fact that they are drawn slightly differently.

Binary Tree Representations

A full binary tree of depth k is a binary tree of depth k having pow(2,k)-1 nodes; this is the maximum number of nodes such a binary tree can have. A very elegant sequential representation for such binary trees results from sequentially numbering the nodes, starting with the nodes on level 1, then those on level 2, and so on. Nodes on any level are numbered from left to right. This numbering scheme gives us the definition of a complete binary tree: a binary tree with n nodes and depth k is complete iff its nodes correspond to the nodes numbered 1 to n in the full binary tree of depth k. The nodes may now be stored in a one-dimensional array tree, with the node numbered i being stored in tree[i].

Lemma 5.3: If a complete binary tree with n nodes (i.e., depth = floor(log2 n) + 1) is represented sequentially as above, then for any node with index i, 1 <= i <= n:
(i)   parent(i) is at floor(i/2) if i is not equal to 1. When i = 1, i is the root and has no parent.
(ii)  lchild(i) is at 2i if 2i <= n. If 2i > n, then i has no left child.
(iii) rchild(i) is at 2i+1 if 2i+1 <= n. If 2i+1 > n, then i has no right child.

Proof: We prove (ii); (iii) is an immediate consequence of (ii) and the numbering of nodes on the same level from left to right, and (i) follows from (ii) and (iii). We prove (ii) by induction on i. For i = 1, clearly the left child is at 2 unless 2 > n, in which case 1 has no left child. Now assume that for all j, 1 <= j <= i, lchild(j) is at 2j. Then lchild(i+1) is at 2(i+1) = 2i + 2, unless 2(i+1) > n, in which case i+1 has no left child.

This representation can clearly be used for all binary trees, though in most cases there will be a lot of unutilized space. For complete binary trees the representation is ideal, as no space is wasted. In the worst case, a skewed tree of depth k will require pow(2,k)-1 spaces, of which only k will be occupied.

While the above representation appears to be good for complete binary trees, it is wasteful for many other binary trees. In addition, the representation suffers from the general inadequacies of sequential representations: insertion or deletion of nodes from the middle of a tree requires the movement of potentially many nodes to reflect the change in level number of these nodes. These problems can be easily overcome through the use of a linked representation. Each node has three fields, leftchild, data and rightchild, and is defined in Pascal as

type treepointer = ^treerecord;
     treerecord  = record
                     leftchild  : treepointer;
                     data       : char;
                     rightchild : treepointer;
                   end;

Binary Tree Traversal

There are many operations that we often want to perform on trees. One notion that arises frequently is the idea of traversing a tree, or visiting each node in the tree exactly once. A full traversal produces a linear order for the information in a tree. This linear order may be familiar and useful. When traversing a binary tree we want to treat each node and its subtrees in the same fashion. If we let L, D, R stand for moving left, printing the data, and moving right when at a node, then there are six possible combinations of traversal: LDR, LRD, DLR, DRL, RDL, and RLD. If we adopt the convention that we traverse left before right, then only three traversals remain: LDR, LRD, and DLR. To these we assign the names inorder, postorder and preorder, because there is a natural correspondence between these traversals and producing the infix, postfix and prefix forms of an expression.

Inorder Traversal: informally this calls for moving down the tree towards the left until you can go no farther. Then you "visit" the node, move one node to the right and continue again. If you cannot move to the right,

go back one more node. A precise way of describing this traversal is to write it as a recursive procedure:

procedure inorder(currentnode : treepointer);
{currentnode is a pointer to a node in a binary tree. For full
 tree traversal, pass inorder the pointer to the top of the tree}
begin {inorder}
    if currentnode <> nil
    then begin
        inorder(currentnode^.leftchild);
        write(currentnode^.data);
        inorder(currentnode^.rightchild);
    end
end; {of inorder}

Recursion is an elegant device for describing this traversal. A second form of traversal is preorder:

procedure preorder(currentnode : treepointer);
{currentnode is a pointer to a node in a binary tree. For full
 tree traversal, pass preorder the pointer to the top of the tree}
begin {preorder}
    if currentnode <> nil
    then begin
        write(currentnode^.data);
        preorder(currentnode^.leftchild);
        preorder(currentnode^.rightchild);
    end {of if}
end; {of preorder}

In words we would say: "visit a node, traverse left and continue again. When you cannot continue, move right and begin again, or move back until you can move right and resume." At this point it should be easy to guess the next traversal method, which is called postorder:

procedure postorder(currentnode : treepointer);
{currentnode is a pointer to a node in a binary tree. For full
 tree traversal, pass postorder the pointer to the top of the tree}
begin {postorder}
    if currentnode <> nil
    then begin
        postorder(currentnode^.leftchild);
        postorder(currentnode^.rightchild);
        write(currentnode^.data);
    end {of if}
end; {of postorder}

Depth First Search Algorithm

Algorithm DFS(G)
// Implements a depth-first search traversal of a given graph
// Input : Graph G = (V, E)
// Output : Graph G with its vertices marked with consecutive
//          integers in the order they have been first
//          encountered by the DFS traversal
    mark each vertex in V with 0 as a mark of being unvisited
    count := 0
    for each vertex v in V do
        if v is marked with 0
            dfs(v)

dfs(v)
// visits recursively all the unvisited vertices connected to
// vertex v and assigns them the numbers in the order they are
// encountered via global variable count
    count := count + 1;  mark v with count
    for each vertex w in V adjacent to v do
        if w is marked with 0
            dfs(w)

With the adjacency matrix representation of the graph, the traversal's time efficiency is in Θ(|V|^2); for the adjacency linked list representation, it is in Θ(|V| + |E|), where |V| and |E| are the number of the graph's vertices and edges respectively.

Important elementary applications of DFS include checking the connectivity and checking the acyclicity of a graph. Checking a graph's connectivity can be done as follows: start a DFS traversal at an arbitrary vertex and check, after the algorithm halts, whether all the graph's vertices have been visited. If they have, the graph is connected; otherwise it is not. If there is a back edge from a vertex to its ancestor, then the graph has a cycle.
A vertex of a connected graph is said to be an articulation point if its removal, with all edges incident to it, breaks the graph into disjoint pieces.

Breadth First Search Algorithm

BFS proceeds in a concentric manner: it visits first all the vertices that are adjacent to a starting vertex, then all unvisited vertices two edges apart from it, and so on, until all the vertices in the same connected component as the starting vertex are visited. If there still remain unvisited vertices, the algorithm has to be restarted at an arbitrary vertex of another connected component of the graph.

It is convenient to use a queue to trace the operation of breadth-first search. The queue is initialized with the traversal's starting vertex, which is marked as visited. On each iteration, the algorithm identifies all unvisited vertices that are adjacent to the front vertex, marks them as visited, and adds them to the queue; after that, the front vertex is removed from the queue.

The starting vertex serves as the root of the BFS tree. Whenever a new unvisited vertex is reached for the first time, the vertex is attached as a child to the vertex it is being reached from, with an edge called a tree edge. If an edge leading to a previously visited vertex other than the immediate predecessor is encountered, the edge is noted as a cross edge.

Algorithm BFS(G)
// Implements a breadth-first search traversal of a given graph
// Input : Graph G = (V, E)
// Output : Graph G with its vertices marked with consecutive
//          integers in the order they have been first
//          encountered by the BFS traversal
    mark each vertex in V with 0 as a mark of being unvisited
    count := 0
    for each vertex v in V do
        if v is marked with 0
            bfs(v)

bfs(v)
// visits all the unvisited vertices connected to vertex v
// and assigns them the numbers in the order they are
// encountered via global variable count
    count := count + 1;  mark v with count and initialize a queue with v
    while the queue is not empty do
        for each vertex w in V adjacent to the front vertex v do
            if w is marked with 0
                count := count + 1;  mark w with count
                add w to the queue
        remove vertex v from the front of the queue

With the adjacency matrix representation of the graph, the traversal's time efficiency is in Θ(|V|^2); for the adjacency linked list representation, it is in Θ(|V| + |E|), where |V| and |E| are the number of the graph's vertices and edges respectively. Like DFS, important elementary applications of BFS include checking the connectivity and checking the acyclicity of a graph.
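As a concrete companion to the pseudocode, here is a short C++ BFS over an adjacency list (our own sketch; the graph representation and names are our choices):

#include <cstdio>
#include <queue>
#include <vector>

// BFS numbering of vertices, following bfs(v) above:
// mark[v] == 0 means unvisited; otherwise it holds the visit number.
void bfs_all(const std::vector<std::vector<int>>& adj) {
    int n = adj.size(), count = 0;
    std::vector<int> mark(n, 0);
    for (int s = 0; s < n; s++) {
        if (mark[s] != 0) continue;     // restart in each component
        mark[s] = ++count;
        std::queue<int> q;
        q.push(s);
        while (!q.empty()) {
            int v = q.front(); q.pop();
            for (int w : adj[v])
                if (mark[w] == 0) {     // tree edge (v, w)
                    mark[w] = ++count;
                    q.push(w);
                }
        }
    }
    for (int v = 0; v < n; v++)
        std::printf("vertex %d visited %d-th\n", v, mark[v]);
}

With an adjacency list, this runs in O(|V| + |E|), matching the efficiency stated above.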

Directed Graphs: Basic Concepts

A directed graph, or digraph, is a graph with directions specified for all its edges. A digraph can be represented by an adjacency matrix or by adjacency linked lists. There are two basic differences between directed and undirected graphs:
1. The adjacency matrix of a directed graph does not have to be symmetric.
2. An edge in a digraph has just one (not two) corresponding node in the digraph's adjacency linked lists.

A DFS forest of a directed graph can exhibit all four types of edges possible in such a forest. A directed cycle in a digraph is a sequence of its vertices that starts and ends at the same vertex and in which every vertex is connected to its immediate predecessor by an edge directed from the predecessor to the successor. If a DFS forest of a directed graph has no back edges, the digraph is a dag (directed acyclic graph).

Topological Sorting

Consider a set of five required courses {C1, C2, C3, C4, C5} a part-time student has to take in some degree program. The courses can be taken in any order as long as the following course prerequisites are met:

C1 and C2 have no prerequisites; C3 requires C1 and C2; C4 requires C3; and C5 requires C3 and C4. The student can take only one course per term. In which order should the student take the courses?

The situation can be modeled by a digraph in which vertices represent courses and directed edges indicate prerequisite requirements. In terms of this digraph, the question is whether we can list its vertices in such an order that, for every edge in the graph, the vertex where the edge starts is listed before the vertex where the edge ends. Can you find such an ordering of this digraph's vertices? This problem is called topological sorting. It can be posed for an arbitrary digraph, but it is easy to see that the problem cannot have a solution if a digraph has a directed cycle.

Thus, for topological sorting to be possible, a digraph must be a dag.

Conversely, if a digraph has no directed cycles, the topological sorting problem for it has a solution. There are two efficient algorithms that both verify whether a digraph is a dag and, if it is, produce an ordering of vertices that solves the topological sorting problem.

The first algorithm is a simple application of DFS: perform a DFS traversal and note the order in which vertices become dead ends. Reversing this order yields a solution to the topological sorting problem, provided, of course, that no back edge has been encountered during the traversal. If a back edge has been encountered, the digraph is not a dag, and topological sorting of its vertices is impossible.

The second algorithm is based on a direct implementation of the decrease-by-one technique: repeatedly identify in the remaining digraph a source, which is a vertex with no incoming edges, and delete it along with all the edges outgoing from it. If there are several sources, break the tie arbitrarily; if there is none, stop, because the problem cannot be solved. The order in which the vertices are deleted yields a solution to the topological sorting problem (a code sketch follows).
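A minimal C++ sketch of the source-removal algorithm (our own illustration; the in-degree array and queue of sources are the standard bookkeeping for it):

#include <queue>
#include <vector>

// Source-removal topological sort. Returns the vertices in a valid
// topological order, or an empty vector if the digraph has a cycle
// (i.e., it is not a dag).
std::vector<int> topo_sort(const std::vector<std::vector<int>>& adj) {
    int n = adj.size();
    std::vector<int> indeg(n, 0), order;
    for (int u = 0; u < n; u++)
        for (int v : adj[u]) indeg[v]++;

    std::queue<int> sources;              // vertices with no incoming edges
    for (int v = 0; v < n; v++)
        if (indeg[v] == 0) sources.push(v);

    while (!sources.empty()) {
        int u = sources.front(); sources.pop();
        order.push_back(u);               // "delete" the source u
        for (int v : adj[u])              // remove u's outgoing edges
            if (--indeg[v] == 0) sources.push(v);
    }
    if ((int)order.size() != n) order.clear();  // cycle: no solution
    return order;
}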

Imagine a large project, e.g., in construction or research, that involves thousands of interrelated tasks with known prerequisites. The first thing to do is make sure that the set of given prerequisites is not contradictory. The convenient way of doing this is to solve the topological sorting problem for the project's digraph. Only then can we start scheduling the tasks so as to minimize the total completion time of the project.

Unit 6: DIVIDE & CONQUER (3 hrs) 6.1 Merge Sort 6.2 Quick Sort 6.3 Binary Search 6.4 Strassen's Matrix Multiplication

DIVIDE & CONQUER

Divide-and-conquer is a top-down technique for designing algorithms that consists of dividing the problem into smaller subproblems, hoping that the solutions of the subproblems are easier to find, and then composing the partial solutions into the solution of the original problem. A little more formally, the divide-and-conquer paradigm consists of the following major phases:

1. Breaking the problem into several subproblems that are similar to the original problem but smaller in size,
2. Solving the subproblems recursively (successively and independently), and then
3. Combining these solutions to the subproblems to create a solution to the original problem.

(OR) Divide-and-conquer, the most well-known algorithm design strategy:
1. Divide an instance of the problem into two or more smaller instances
2. Solve the smaller instances recursively
3. Obtain a solution to the original (larger) instance by combining these solutions

Mergesort

1. Split array A[0..n-1] into about equal halves and make copies of each half in arrays B and C
2. Sort arrays B and C recursively
3. Merge the sorted arrays B and C into array A as follows:
   - Repeat the following until no elements remain in one of the arrays:
     - compare the first elements in the remaining unprocessed portions of the arrays
     - copy the smaller of the two into A, while incrementing the index indicating the unprocessed portion of that array
   - Once all elements in one of the arrays are processed, copy the remaining unprocessed elements from the other array into A.

Pseudocode of Mergesort
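One possible rendering in C++ (a minimal sketch of our own, following the three steps above; names are illustrative):

#include <vector>

// Mergesort following the outline above: split A into halves B and C,
// sort each recursively, then merge them back into A.
void mergesort(std::vector<int>& A) {
    int n = A.size();
    if (n <= 1) return;                                // already sorted
    std::vector<int> B(A.begin(), A.begin() + n / 2);  // first half
    std::vector<int> C(A.begin() + n / 2, A.end());    // second half
    mergesort(B);
    mergesort(C);
    // Merge B and C into A.
    std::size_t i = 0, j = 0, k = 0;
    while (i < B.size() && j < C.size())
        A[k++] = (B[i] <= C[j]) ? B[i++] : C[j++];     // copy the smaller
    while (i < B.size()) A[k++] = B[i++];              // leftovers from B
    while (j < C.size()) A[k++] = C[j++];              // leftovers from C
}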

Mergesort Example (sorting 8 3 2 9 7 1 5 4):

    split:  8 3 2 9 | 7 1 5 4
    split:  8 3 | 2 9 | 7 1 | 5 4
    merge:  3 8 | 2 9 | 1 7 | 4 5
    merge:  2 3 8 9 | 1 4 5 7
    merge:  1 2 3 4 5 7 8 9

Analysis of Mergesort

1. All cases have the same efficiency: Θ(n log n)
2. The number of comparisons in the worst case is close to the theoretical minimum for comparison-based sorting: ceil(log2 n!) ≈ n log2 n - 1.44n
3. Space requirement: Θ(n) (not in-place)
4. Can be implemented without recursion (bottom-up)

Quicksort

- Select a pivot (partitioning element); here, the first element
- Rearrange the list so that all the elements in the first s positions are smaller than or equal to the pivot and all the elements in the remaining n-s positions are larger than or equal to the pivot
- Exchange the pivot with the last element in the first (i.e., <=) subarray; the pivot is now in its final position
- Sort the two subarrays recursively

Partitioning Algorithm: time complexity Θ(r - l) comparisons. (A code sketch follows below.)

Quicksort Example (sorting 5 3 1 9 8 2 4 7):
    5 3 1 9 8 2 4 7
    2 3 1 4 5 8 9 7
    1 2 3 4 5 7 8 9
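A C++ sketch of quicksort with Hoare-style partitioning using the first element as pivot, as described above (our own illustration, not the original slide's pseudocode):

#include <utility>
#include <vector>

// Hoare partition: the pivot is A[l]; returns the split position s such
// that A[l..s-1] <= A[s] <= A[s+1..r] after the final pivot swap.
// Uses Θ(r - l) comparisons.
int partition(std::vector<int>& A, int l, int r) {
    int p = A[l];
    int i = l, j = r + 1;
    while (true) {
        do { i++; } while (i <= r && A[i] < p);  // scan right for >= pivot
        do { j--; } while (A[j] > p);            // scan left  for <= pivot
        if (i >= j) break;
        std::swap(A[i], A[j]);
    }
    std::swap(A[l], A[j]);   // put the pivot into its final position
    return j;
}

void quicksort(std::vector<int>& A, int l, int r) {
    if (l >= r) return;
    int s = partition(A, l, r);
    quicksort(A, l, s - 1);  // sort the <= subarray
    quicksort(A, s + 1, r);  // sort the >= subarray
}

On the example list 5 3 1 9 8 2 4 7, the first call to partition produces 2 3 1 4 5 8 9 7, matching the trace above.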

Analysis of Quicksort

1. Best case (split in the middle): Θ(n log n)
2. Worst case (already sorted array!): Θ(n^2)
3. Average case (random arrays): Θ(n log n)
4. Improvements:
   a. better pivot selection: median-of-three partitioning
   b. switch to insertion sort on small subfiles
   c. elimination of recursion
   These combine to a 20-25% improvement.
5. Considered the method of choice for internal sorting of large files (n >= 10000)

Binary Search (simplest application of divide-and-conquer)

Binary search is an extremely well-known instance of the divide-and-conquer paradigm. Given an ordered array of n elements, the basic idea of binary search is that, for a given element, we "probe" the middle element of the array. We continue in either the lower or upper segment of the array, depending on the outcome of the probe, until we reach the required (given) element.

Problem: Let A[1...n] be an array in non-decreasing sorted order; that is, A[i] <= A[j] whenever 1 <= i <= j <= n. Let q be the query point. The problem consists of finding q in the array A. If q is not in A, find the position where q might be inserted. Formally: find the index i such that 1 <= i <= n+1 and A[i-1] < q <= A[i].
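A C++ sketch of this formulation (ours; 0-based indexing): it returns the smallest index i with q <= A[i], i.e., the insertion position described above.

#include <vector>

// Binary search for the insertion position of q in sorted array A
// (0-based): returns the smallest index i such that q <= A[i],
// or A.size() if q is greater than every element.
int lower_bound_index(const std::vector<int>& A, int q) {
    int lo = 0, hi = A.size();        // the answer lies in [lo, hi]
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2; // probe the middle element
        if (A[mid] < q)
            lo = mid + 1;             // continue in the upper segment
        else
            hi = mid;                 // continue in the lower segment
    }
    return lo;                        // A[lo-1] < q <= A[lo]
}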

Strassen's Matrix Multiplication

Basic Matrix Multiplication: Suppose we want to multiply two matrices of size N x N; for example, A x B = C. For 2x2 matrices:

    C11 = a11*b11 + a12*b21        C12 = a11*b12 + a12*b22
    C21 = a21*b11 + a22*b21        C22 = a21*b12 + a22*b22

Thus 2x2 matrix multiplication can be accomplished with 8 multiplications (since log2 8 = 3, this corresponds to the O(N^3) bound derived below).

algorithm
    void matrix_mult()
    {
        for (i = 1; i <= N; i++)
            for (j = 1; j <= N; j++)
                compute Ci,j;
    }

Time analysis:

    C(i,j) = sum for k = 1..N of a(i,k)*b(k,j)

Thus

    T(N) = sum for i = 1..N, j = 1..N, k = 1..N of c = c*N^3 = O(N^3)

Strassen showed that 2x2 matrix multiplication can be accomplished in 7 multiplications and 18 additions or subtractions (since log2 7 ≈ 2.807, the running time drops to O(N^2.807)). This reduction is achieved by a divide-and-conquer approach.

Divide and Conquer Matrix Multiply

Divide the matrices into sub-matrices A0, A1, A2, etc. and use the blocked matrix multiply equations:

    A = | A0  A1 |     B = | B0  B1 |
        | A2  A3 |         | B2  B3 |

    R = A x B = | A0*B0 + A1*B2    A0*B1 + A1*B3 |
                | A2*B0 + A3*B2    A2*B1 + A3*B3 |

Recursively multiply the sub-matrices, and terminate the recursion with a simple base case: a scalar product a0 x b0 = a0*b0.

Strassen's seven products and the result quadrants:

    P1 = (A11 + A22)(B11 + B22)
    P2 = (A21 + A22) * B11
    P3 = A11 * (B12 - B22)
    P4 = A22 * (B21 - B11)
    P5 = (A11 + A12) * B22
    P6 = (A21 - A11) * (B11 + B12)
    P7 = (A12 - A22) * (B21 + B22)

    C11 = P1 + P4 - P5 + P7
    C12 = P3 + P5
    C21 = P2 + P4
    C22 = P1 + P3 - P2 + P6

Comparison (checking C11):
    C11 = P1 + P4 - P5 + P7
        = (A11 + A22)(B11 + B22) + A22*(B21 - B11) - (A11 + A12)*B22 + (A12 - A22)*(B21 + B22)
        = A11*B11 + A11*B22 + A22*B11 + A22*B22 + A22*B21 - A22*B11 - A11*B22 - A12*B22
          + A12*B21 + A12*B22 - A22*B21 - A22*B22
        = A11*B11 + A12*B21

Divide-and-Conquer Matrix Multiply Code

The following routine divides the matrices into sub-matrices and recursively multiplies them. (Note that, as written, it performs all 8 sub-matrix products of the blocked equations rather than Strassen's 7.)

void matmul(int *A, int *B, int *R, int n)
{
    if (n == 1) {
        (*R) += (*A) * (*B);
    } else {
        matmul(A, B, R, n/4);
        matmul(A, B + (n/4), R + (n/4), n/4);
        matmul(A + 2*(n/4), B, R + 2*(n/4), n/4);
        matmul(A + 2*(n/4), B + (n/4), R + 3*(n/4), n/4);
        matmul(A + (n/4), B + 2*(n/4), R, n/4);
        matmul(A + (n/4), B + 3*(n/4), R + (n/4), n/4);
        matmul(A + 3*(n/4), B + 2*(n/4), R + 2*(n/4), n/4);
        matmul(A + 3*(n/4), B + 3*(n/4), R + 3*(n/4), n/4);
    }
}

Time Analysis
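The recurrences below are the standard analysis of the two approaches (a well-known result, stated here for completeness; they were not spelled out above). Let T(n) be the time to multiply two n x n matrices:

    Blocked (8 products):  T(n) = 8*T(n/2) + O(n^2)      =>  T(n) = O(n^(log2 8)) = O(n^3)
    Strassen (7 products): T(n) = 7*T(n/2) + 18*(n/2)^2  =>  T(n) = O(n^(log2 7)) ≈ O(n^2.807)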

Unit 7: TRANSFORM & CONQUER (3 hrs) 7.1 Balanced Search Trees, AVL Trees, 2-3 Trees, Splay Trees 7.2 Heaps and Heap Sort

Heaps and Heapsort

Definition: A heap is a binary tree with keys at its nodes (one key per node) such that:
- It is essentially complete, i.e., all its levels are full except possibly the last level, where only some rightmost keys may be missing
- The key at each node is >= the keys at its children

For example, with keys 10, 5 and 7: the tree with 10 at the root and children 5 and 7 is a heap; a tree with 5 at the root and children 10 and 7 is not a heap (the parental dominance fails); and a tree with the same keys that is not essentially complete is not a heap either.

Note: A heap's elements are ordered top down (along any path down from its root), but they are not ordered left to right.

Some Important Properties of a Heap

- Given n, there exists a unique binary tree with n nodes that is essentially complete, with height h = floor(log2 n)
- The root contains the largest key
- The subtree rooted at any node of a heap is also a heap
- A heap can be represented as an array

Heap's Array Representation

Store the heap's elements in an array (whose elements are indexed, for convenience, 1 to n) in top-down, left-to-right order.

- Left child of node j is at 2j
- Right child of node j is at 2j+1
- Parent of node j is at floor(j/2)
- Parental nodes are represented in the first floor(n/2) locations

Heap Construction (bottom-up)

Step 0: Initialize the structure with the keys in the order given
Step 1: Starting with the last (rightmost) parental node, fix the heap rooted at it, if it doesn't satisfy the heap condition: keep exchanging it with its largest child until the heap condition holds
Step 2: Repeat Step 1 for the preceding parental node

Example of Heap Construction: construct a heap for the list 2, 9, 7, 6, 5, 8.

    2 9 7 6 5 8   (fix the subtree rooted at 7: exchange with its child 8)
    2 9 8 6 5 7   (the subtree rooted at 9 already satisfies the heap condition)
    9 2 8 6 5 7   (fix the root 2: exchange with the larger child 9, then with 6)
    9 6 8 2 5 7   (the heap)
Heapsort

Stage 1: Construct a heap for a given list of n keys.
Stage 2: Repeat the following root-removal operation n-1 times:
- Exchange the keys in the root and in the last (rightmost) leaf
- Decrease the heap size by 1
- If necessary, swap the new root with its larger child until the heap condition holds

Example: sort the list 2, 9, 7, 6, 5, 8 by heapsort. Stage 1 is the heap construction shown above; Stage 2 repeatedly removes the root (the maximum).

Both worst-case and average-case efficiency: Θ(n log n). (A code sketch follows.)
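A compact C++ heapsort sketch (ours) implementing both stages, using the 0-based analogue of the array representation above:

#include <utility>
#include <vector>

// Sift the key at position i down until the heap condition holds in
// A[0..n-1] (0-based: the children of i are 2i+1 and 2i+2).
void siftDown(std::vector<int>& A, int i, int n) {
    while (2 * i + 1 < n) {
        int child = 2 * i + 1;
        if (child + 1 < n && A[child + 1] > A[child]) child++;  // larger child
        if (A[i] >= A[child]) break;   // heap condition holds
        std::swap(A[i], A[child]);
        i = child;
    }
}

void heapsort(std::vector<int>& A) {
    int n = A.size();
    // Stage 1: bottom-up heap construction, last parental node first.
    for (int i = n / 2 - 1; i >= 0; i--)
        siftDown(A, i, n);
    // Stage 2: n-1 root removals.
    for (int last = n - 1; last > 0; last--) {
        std::swap(A[0], A[last]);  // exchange the root with the last leaf
        siftDown(A, 0, last);      // restore the heap on the shrunk array
    }
}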

Unit 8: DYNAMIC PROGRAMMING (3 hrs) 8.1 Warshall's and Floyd's Algorithms 8.2 Knapsack and Memory Functions

Warshall's Algorithm

Main idea: a path exists between two vertices i, j iff
- there is an edge from i to j; or
- there is a path from i to j going through vertex 1; or
- there is a path from i to j going through vertex 1 and/or 2; or
- ...
- there is a path from i to j going through vertex 1, 2, ..., and/or k; or
- ...
- there is a path from i to j going through any of the other vertices.

Idea: dynamic programming
- Let V = {1, ..., n} and, for k <= n, let Vk = {1, ..., k}
- For any pair of vertices i, j in V, identify all paths from i to j whose intermediate vertices are all drawn from Vk: Pij(k) = {p1, p2, ...}; if Pij(k) is nonempty, then Rk[i, j] = 1
- What we want, for every pair of vertices i, j, is Rn[i, j], that is, the matrix Rn
- Starting with R0 = A, the adjacency matrix, how do we get R1 -> ... -> Rk-1 -> Rk -> ... -> Rn?

Let p be a path in Pij(k), i.e., a path from i to j with all intermediate vertices in Vk. If k is not on p, then p is also a path from i to j with all intermediate vertices in Vk-1: p is in Pij(k-1).

In the k-th stage we determine whether a path exists between two vertices i, j using just vertices among 1, ..., k:

    R(k)[i,j] = R(k-1)[i,j]                      (path using just 1, ..., k-1)
                or
                (R(k-1)[i,k] and R(k-1)[k,j])    (path from i to k and from k to j using just 1, ..., k-1)
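A direct C++ sketch of this recurrence (our own illustration), computing the transitive closure in place:

#include <vector>

// Warshall's algorithm: R starts as the boolean adjacency matrix A (R0)
// and ends as the transitive closure (Rn). R(k) is computed from R(k-1)
// in place, one k at a time.
void warshall(std::vector<std::vector<bool>>& R) {
    int n = R.size();
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                R[i][j] = R[i][j] || (R[i][k] && R[k][j]);
}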

FLOYD-WARSHALL ALGORITHM

PROBLEM STATEMENT: Find the shortest path between all pairs of vertices and determine the cost of each path.

AIM: To implement the Floyd-Warshall algorithm to find the shortest paths between all pairs of vertices and to determine the cost of each path.

ALGORITHM FLOYD-WARSHALL(W)
1. n <- rows(W)
2. D(0) <- W
3. for k <- 1 to n
4.     do for i <- 1 to n
5.         do for j <- 1 to n
6.             do dij(k) <- min(dij(k-1), dik(k-1) + dkj(k-1))
7. return D(n)

Constructing the shortest paths (predecessor matrix T):
1. for k = 0
2.     do for i <- 1 to n
3.         do for j <- 1 to n
4.             do Tij = NIL  if i = j or wij = infinity
5.             else
6.             do Tij = i    if i <> j and wij < infinity
7. for k = 1 to n
8.     do for i <- 1 to n
9.         do for j <- 1 to n
10.            if dij(k-1) <= dik(k-1) + dkj(k-1)
11.            do Tij = Tij(k-1)
12.            else
13.            do Tij = Tkj(k-1)

PRINT-ALL-PAIRS-SHORTEST-PATH(i, j)
1. if i = j
2.    then print j
3.    else print PRINT-ALL-PAIRS-SHORTEST-PATH(i, Tij)
4.         print j
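A runnable C++ sketch of the distance recurrence (ours; an INF sentinel replaces the infinity tests, and the predecessor matrix is omitted for brevity):

#include <vector>

const int INF = 1000000000;  // stands in for "no edge" / infinite weight

// Floyd-Warshall: D starts as the weight matrix W (D(0), with INF where
// there is no edge) and ends as the matrix of shortest path costs D(n).
void floyd(std::vector<std::vector<int>>& D) {
    int n = D.size();
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (D[i][k] < INF && D[k][j] < INF && D[i][k] + D[k][j] < D[i][j])
                    D[i][j] = D[i][k] + D[k][j];  // the path through k is shorter
}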

Dynamic programming is a very powerful algorithmic paradigm in which a problem is solved by identifying a collection of subproblems and tackling them one by one, smallest first, using the answers to small problems to help figure out larger ones, until the whole lot of them is solved. In dynamic programming we are not given a dag; the dag is implicit. Its nodes are the subproblems we define, and its edges are the dependencies between the subproblems: if to solve subproblem B we need the answer to subproblem A, then there is a (conceptual) edge from A to B. In this case, A is thought of as a smaller subproblem than B, and it will always be smaller in an obvious sense.

Unit 9: GREEDY TECHNIQUE (3 hrs) 9.1 Prim's Algorithm 9.2 Kruskal's Algorithm 9.3 Dijkstra's Algorithm

Minimum Spanning Trees

Spanning trees: A spanning tree of a graph is just a subgraph that contains all the vertices and is a tree. A graph may have many spanning trees; for instance, the complete graph on four vertices has sixteen spanning trees.

Minimum spanning trees: Now suppose the edges of the graph have weights or lengths. The weight of a tree is just the sum of the weights of its edges. Obviously, different trees have different weights. The problem: how do we find the minimum weight spanning tree?

Why minimum spanning trees? The standard application is to a problem like phone network design. You have a business with several offices; you want to lease phone lines to connect them with each other; and the phone company charges different amounts of money to connect different pairs of cities. You want a set of lines that connects all your offices with a minimum total cost. It should be a spanning tree, since if a network isn't a tree you can always remove some edges and save money. A less obvious application is that the minimum spanning tree can be used to approximately solve the traveling salesman problem (a convenient formal way of defining this problem is to find the shortest path that visits each point at least once).

How to find a minimum spanning tree? A good idea is to find some key property of the MST that lets us be sure that some edge is part of it, and use this property to build up the MST one edge at a time.

Kruskal's algorithm:

    sort the edges of G in increasing order by length
    keep a subgraph S of G, initially empty
    for each edge e in sorted order
        if the endpoints of e are disconnected in S
            add e to S
    return S

Note that, whenever you add an edge (u,v), it is always the smallest edge connecting the part of S reachable from u with the rest of G, so by the lemma it must be part of the MST.

This algorithm is known as a greedy algorithm, because it chooses at each step the cheapest edge to add to S. Greed does not always pay off: if you want to find a shortest path from a to b, it might be a bad idea to keep taking the shortest edges. The greedy idea only works in Kruskal's algorithm because of the key property we proved.

Analysis: The line testing whether two endpoints are disconnected looks like it should be slow (linear time per iteration, or O(mn) total). But actually there are some data structures that let us perform each test in close to constant time; this is known as the union-find problem and is discussed in Baase section 8.5. The slowest part turns out to be the sorting step, which takes O(m log n) time.

Prim's algorithm: Rather than build a subgraph one edge at a time, Prim's algorithm builds a tree one vertex at a time.

    let T be a single vertex x
    while (T has fewer than n vertices)
    {
        find the smallest edge connecting T to G-T
        add it to T
    }

Since each edge added is the smallest connecting T to G-T, the lemma we proved shows that we only add edges that should be part of the MST. Again, it looks like the loop has a slow step in it, but again some data structures can be used to speed this up. The idea is to use a heap to remember, for each vertex, the smallest edge connecting T with that vertex.

Prim with heaps:

    make a heap of values (vertex, edge, weight(edge)),
        initially (v, -, infinity) for each vertex
    let tree T be empty
    while (T has fewer than n vertices)
    {
        let (v, e, weight(e)) have the smallest weight in the heap
        remove (v, e, weight(e)) from the heap
        add v and e to T
        for each edge f = (u,v)
            if u is not already in T
                find the value (u, g, weight(g)) in the heap
                if weight(f) < weight(g)
                    replace (u, g, weight(g)) with (u, f, weight(f))
    }

Analysis: We perform n steps in which we remove the smallest element in the heap, and at most 2m steps in which we examine an edge f = (u,v). For each of those steps, we might replace a value on the heap, reducing its weight. (You also have to find the right value on the heap, but that can be done easily enough by keeping a pointer from the vertices to the corresponding values.) Reducing the weight of an element of a binary heap is easy to do in O(log n) time. Alternately, by using a more complicated data structure known as a Fibonacci heap, you can reduce the weight of an element in constant time. The result is a total time bound of O(m + n log n).

The Shortest Path Problem

Consider the problem of finding the shortest path between nodes s and t in a graph (directed or undirected). We already know an algorithm that will solve it for unweighted graphs: BFS. Now, what if the edges have weights?

Consider the dist[] array that we used in BFS to store the current shortest known distance from the source to all other vertices. BFS can be thought of as repeatedly taking the closest known vertex, u, and applying the following procedure to each of its neighbours, v:

bool relax( int u, int v ) {
    if( dist[v] <= dist[u] + 1 ) return false;
    dist[v] = dist[u] + 1;
    return true;
}

The procedure relax() returns true if we can improve our current best known shortest path from s to v by using the edge (u, v). In that case, BFS also updates dist[v] and adds v to the back of the queue.

Imagine colouring all vertices white before running BFS. Then all the vertices on the queue can be considered gray, and all the vertices that have been processed and removed from the queue are black. We can prove that BFS works by demonstrating the following invariant: at the beginning of each iteration, dist[v] is equal to the shortest path distance from s to v for all black vertices, v. At the beginning, the invariant is true because we have no black vertices. During each iteration of BFS, we pick the closest known vertex, u (one of them, if there are several), and execute relax(u, v) on all of its neighbours, v. Finally, we colour u black (pop it from the queue). Since u was the closest vertex (to the source), any other path to u that we might discover during subsequent iterations must be longer than dist[u]. Hence, the invariant holds for u, the only new black vertex that we get during one iteration. Eventually, when BFS terminates, dist[v] will be set to the length of the shortest path for all black (visited) vertices, v. All other vertices will have dist[] set to infinity: "unreachable".

Dijkstra's algorithm: The reason why BFS does not work for weighted graphs is very simple: we can no longer guarantee that the vertex at the front of the queue is the vertex closest to s. It is certainly the closest in terms of the number of edges used to reach it, but not in terms of the sum of edge weights. But we can fix this easily. Instead of using a plain queue, we can use a priority queue in which vertices are sorted by their increasing dist[] value. Then at each iteration, we pick the vertex, u, with the smallest dist[u] value and call relax(u, v) on all of its neighbours, v. The only difference is that now we add the weight of the edge (u, v) to our distance instead of just adding 1:

bool relax( int u, int v ) {
    int newDist = dist[u] + weight[u][v];
    if( dist[v] <= newDist ) return false;
    dist[v] = newDist;
    return true;
}

The proof of correctness is exactly the same as for BFS; the same loop invariant holds. However, the algorithm only works as long as we do not have edges with negative weights. Otherwise, there is no guarantee that when we pick u as the closest vertex, dist[v] for some other vertex v will not become smaller than dist[u] at some time in the future.

There are several ways to implement Dijkstra's algorithm. The main challenge is maintaining a priority queue of vertices that provides 3 operations: inserting new vertices into the queue, removing the vertex with the smallest dist[], and decreasing the dist[] value of some vertex during relaxation. We can use a set to represent the queue. This way, the implementation looks remarkably similar to BFS. In the following example, assume that graph[i][j] contains the weight of the edge (i, j).

Example 1: O(n^2 + (m+n) log n) Dijkstra's

int graph[128][128]; // -1 means "no edge"
int n;               // number of vertices (at most 128)
int dist[128];

// Compares 2 vertices first by distance and then by vertex number
struct ltDist {
    bool operator()( int u, int v ) const {
        return make_pair( dist[u], u ) < make_pair( dist[v], v );
    }
};

void dijkstra( int s )
{
    for( int i = 0; i < n; i++ ) dist[i] = INT_MAX;
    dist[s] = 0;
    set< int, ltDist > q;
    q.insert( s );
    while( !q.empty() )
    {
        int u = *q.begin();    // like u = q.front()
        q.erase( q.begin() );  // like q.pop()
        for( int v = 0; v < n; v++ ) if( graph[u][v] != -1 )
        {
            int newDist = dist[u] + graph[u][v];
            if( newDist < dist[v] )  // relaxation
            {
                if( q.count( v ) ) q.erase( v );
                dist[v] = newDist;
                q.insert( v );
            }
        }
    }
}

First, we define a comparator that compares vertices by their dist[] value. Note that we can't simply do "return dist[u] < dist[v];" because a set keeps only one copy of each unique element, and so using this simpler comparison would disallow vertices with the same dist[] value. Instead, we exploit the built-in lexicographic comparison for pairs. The dijkstra() function takes a source vertex and fills in the dist[] array with the shortest path distances from s. First, all distances are initialized to infinity, except for dist[s], which is set to 0. Then s is added to the queue and we proceed like in BFS: remove the first vertex, u, and scan all of its neighbours, v. Compute the new distance to v, and if it's better than our current known distance, update it. The order of the 3 lines inside the innermost 'if' statement is crucial. Note that the set q is sorted by dist[] values, so we can't simply change dist[v] to a new value while v is in q. This is why we first need to remove v from the set, then change dist[v], and after that re-insert it.

The running time is n*log(n) for removing n vertices from the queue, plus m*log(n) for inserting into and updating the queue for each edge, plus n*n for running the 'for(v)' loop for each vertex u. We can avoid the quadratic cost by using an adjacency list, for a total of O((m+n) log n). Another way to implement the priority queue is to scan the dist[] array every time to find the closest vertex, u.

Example 2: O(n^2) Dijkstra's

int graph[128][128], n; // -1 means "no edge"
int dist[128];
bool done[128];

void dijkstra( int s )
{
    for( int i = 0; i < n; i++ ) { dist[i] = INT_MAX; done[i] = false; }
    dist[s] = 0;
    while( true )
    {
        // find the vertex with the smallest dist[] value
        int u = -1, bestDist = INT_MAX;
        for( int i = 0; i < n; i++ )
            if( !done[i] && dist[i] < bestDist ) { u = i; bestDist = dist[i]; }
        if( bestDist == INT_MAX ) break;

        // relax neighbouring edges
        for( int v = 0; v < n; v++ )
            if( !done[v] && graph[u][v] != -1 )
                if( dist[v] > dist[u] + graph[u][v] )
                    dist[v] = dist[u] + graph[u][v];

        done[u] = true;
    }
}

We have to introduce a new array, done[]. We could also call it "black[]" because it is true for those vertices that have left the queue. First, we initialize done[] to false and dist[] to infinity. Inside the main loop, we scan the dist[] array to find the vertex, u, with the minimal dist[] value that is not black yet. If we can't find one, we break from the loop. Otherwise, we relax all of u's neighbouring edges. This seemingly low-tech method is actually pretty clever in terms of running time. The main while() loop executes at most n times, because at the end we always set done[u] to true for some u, and we can only do that n times before they are all true. Inside the loop, we do O(n) work in two simple loops. The total is O(n^2), which is faster than the first implementation as long as the graph is fairly dense (m > n^2/log n; this is if we use an adjacency list in the first implementation, otherwise the second one will almost always be faster). Dijkstra's algorithm is very fast, but it suffers from its inability to deal with negative edge weights. Having negative edges in a graph may also introduce negative weight cycles that make us rethink the very definition of "shortest path".

Unit 9: GREEDY TECHNIQUE (3 hrs)
9.1 Prim's Algorithm
9.2 Kruskal's Algorithm
9.3 Dijkstra's Algorithm

Minimum Spanning Tree (MST)

A minimum spanning tree is a subgraph of an undirected weighted graph G, such that:
- it is a tree (i.e., it is acyclic)
- it covers all the vertices V and contains |V| - 1 edges
- the total cost associated with tree edges is the minimum among all possible spanning trees
- it is not necessarily unique

Applications of MST:
- Any time you want to visit all vertices in a graph at minimum cost (e.g., wire routing on printed circuit boards, sewer pipe layout, road planning).
- Internet content distribution ($$$, also a hot research topic). Idea: a publisher produces web pages; a content distribution network replicates the web pages to many locations so consumers can access them at higher speed. But an MST may not be good enough: content distribution on a minimum cost tree may take a long time!

Prim's Algorithm

Let V = {1, 2, ..., n}, let U be the set of vertices that makes up the MST, and let T be the MST.

Initially: U = {1} and T = ∅
while (U ≠ V)
    let (u,v) be the lowest cost edge such that u ∈ U and v ∈ V − U
    T = T ∪ {(u,v)}
    U = U ∪ {v}

Prim's Algorithm implementation

Initialization:
a. Pick a vertex r to be the root
b. Set D(r) = 0, parent(r) = null
c. For all vertices v ∈ V, v ≠ r, set D(v) = ∞
d. Insert all vertices into priority queue P, using distances as the keys

Main loop: while the priority queue P is not empty:
1. Select the next vertex u to add to the tree: u = P.deleteMin()
2. Update the weight of each vertex w adjacent to u which is not in the tree (i.e., w ∈ P). If weight(u,w) < D(w):
   a. parent(w) = u
   b. D(w) = weight(u,w)
   c. Update the priority queue to reflect the new distance for w

[Worked example; the graph diagrams are not reproduced here, only the vertex/parent tables.]

The MST initially consists of the vertex e alone:

Vertex   e
Parent   -

We update the distances and parents for its adjacent vertices:

Vertex   e   b   c   d
Parent   -   e   e   e

The final minimum spanning tree:

Vertex   e   b   c   d   a
Parent   -   e   d   e   d

Running time of Prim's algorithm (with an array-based priority queue):
- Initialization of priority queue: O(|V|)
- Update loop: |V| calls
  - Choosing the vertex with the minimum cost edge: O(|V|)
  - Updating distance values of unconnected vertices: each edge is considered only once during the entire execution, for a total of O(|E|) updates
- Overall cost: O(|E| + |V|²)
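As a concrete reference, here is a minimal O(|V|²) C++ sketch of the whole procedure. It assumes the same adjacency-matrix convention as the Dijkstra examples earlier (graph[u][v] is the edge weight, -1 means "no edge") and a connected graph; having prim() return the total tree weight is an illustrative choice:

#include <climits>

int graph[128][128], n;   // -1 means "no edge"
int D[128], parent[128];  // D[w] = cheapest edge from w into the tree so far
bool inTree[128];

int prim( int r )         // r is the root; returns the total MST weight
{
    for( int i = 0; i < n; i++ ) { D[i] = INT_MAX; parent[i] = -1; inTree[i] = false; }
    D[r] = 0;
    int total = 0;
    for( int k = 0; k < n; k++ )
    {
        // pick the cheapest vertex not yet in the tree
        int u = -1;
        for( int i = 0; i < n; i++ )
            if( !inTree[i] && ( u == -1 || D[i] < D[u] ) ) u = i;
        inTree[u] = true;
        total += D[u];
        // update D and parent for u's neighbours still outside the tree
        for( int w = 0; w < n; w++ )
            if( !inTree[w] && graph[u][w] != -1 && graph[u][w] < D[w] )
            {
                D[w] = graph[u][w];
                parent[w] = u;
            }
    }
    return total;          // parent[] now describes the tree edges (parent[w], w)
}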

Another Approach: Kruskal's

- Create a forest of trees from the vertices.
- Repeatedly merge trees by adding "safe edges" until only one tree remains.
- A safe edge is an edge of minimum weight which does not create a cycle.

forest: {a}, {b}, {c}, {d}, {e}

Initialization:
a. Create a set for each vertex v ∈ V
b. Initialize the set of safe edges A, comprising the MST, to the empty set
c. Sort the edges by increasing weight

{a}, {b}, {c}, {d}, {e}
A = ∅
E = {(a,d), (c,d), (d,e), (a,c), (b,e), (c,e), (b,d), (a,b)}

For each edge (u,v) ∈ E in increasing order, while more than one set remains:
    If u and v belong to different sets:
        a. A = A ∪ {(u,v)}
        b. Merge the sets containing u and v
Return A

Use the Union-Find algorithm to efficiently determine whether u and v belong to different sets.

Forest                        A
{a}, {b}, {c}, {d}, {e}       ∅
{a,d}, {b}, {c}, {e}          {(a,d)}
{a,d,c}, {b}, {e}             {(a,d), (c,d)}
{a,d,c,e}, {b}                {(a,d), (c,d), (d,e)}
{a,d,c,e,b}                   {(a,d), (c,d), (d,e), (b,e)}
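For reference, a compact C++ sketch of Kruskal's algorithm using a simple Union-Find with path compression; the Edge struct and the function names are illustrative choices, not fixed by these notes:

#include <algorithm>
#include <vector>
using namespace std;

struct Edge { int w, u, v; };
bool byWeight( const Edge &a, const Edge &b ) { return a.w < b.w; }

int uf[128];   // Union-Find parent array
int findSet( int x ) { return uf[x] == x ? x : uf[x] = findSet( uf[x] ); }

vector<Edge> kruskal( vector<Edge> edges, int n )
{
    for( int i = 0; i < n; i++ ) uf[i] = i;        // each vertex is its own set
    sort( edges.begin(), edges.end(), byWeight );  // sort by increasing weight
    vector<Edge> A;                                // the safe edges of the MST
    for( size_t i = 0; i < edges.size() && A.size() + 1 < (size_t)n; i++ )
    {
        int ru = findSet( edges[i].u ), rv = findSet( edges[i].v );
        if( ru != rv )             // different sets: the edge is safe
        {
            A.push_back( edges[i] );
            uf[ru] = rv;           // merge the two sets
        }
    }
    return A;                      // |V| - 1 edges if the graph is connected
}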

After each iteration, every tree in the forest is an MST of the vertices it connects. The algorithm terminates when all vertices are connected into one tree. Like Dijkstra's algorithm, both Prim's and Kruskal's algorithms are greedy algorithms. The greedy approach works for the MST problem; however, it does not work for many other problems!

Dijkstra's Algorithm

Dijkstra's algorithm (named after its discoverer, E. W. Dijkstra) solves the problem of finding the shortest path from a point in a graph (the source) to a destination. It turns out that one can find the shortest paths from a given source to all points in a graph in the same time; hence this problem is sometimes called the single-source shortest paths problem. The somewhat unexpected result that all the paths can be found as easily as one further demonstrates the value of reading the literature on algorithms!

This problem is related to the spanning tree problem. The graph representing all the paths from one vertex to all the others must be a spanning tree: it must include all vertices. There will also be no cycles, as a cycle would define more than one path from the selected vertex to at least one other vertex.

For a graph G = (V, E), V is a set of vertices and E is a set of edges.

Dijkstra's algorithm keeps two sets of vertices:
  S      the set of vertices whose shortest paths from the source have already been determined
  V − S  the remaining vertices

The other data structures needed are:
  d      array of best estimates of the shortest path to each vertex
  pi     an array of predecessors for each vertex

The basic mode of operation is:
1. Initialise d and pi
2. Set S to empty
3. While there are still vertices in V − S:
   i.   Sort the vertices in V − S according to the current best estimate of their distance from the source
   ii.  Add u, the closest vertex in V − S, to S
   iii. Relax all the vertices still in V − S connected to u

Relaxation

The relaxation process updates the costs of all the vertices v connected to a vertex u if we could improve the best estimate of the shortest path to v by including (u,v) in the path to v. The process begins by initialising the estimates:

initialise_single_source( Graph g, Node s )
    for each vertex v in Vertices( g )
        g.d[v] := infinity
        g.pi[v] := nil
    g.d[s] := 0

This sets up the graph so that each node has no predecessor (pi[v] = nil) and the estimates of the cost (distance) of each node from the source (d[v]) are infinite, except for the source node itself (d[s] = 0).

Note that we have also introduced a further way to store a graph (or part of a graph, as this structure can only store a spanning tree): the predecessor subgraph, i.e. the list of predecessors of each node, pi[j], 1 <= j <= |V|. The edges in the predecessor subgraph are (pi[v], v).

The relaxation procedure checks whether the current best estimate of the shortest distance to v (d[v]) can be improved by going through u (i.e. by making u the predecessor of v):

relax( Node u, Node v, double w[][] )
    if d[v] > d[u] + w[u,v] then
        d[v] := d[u] + w[u,v]
        pi[v] := u

The algorithm itself is now:

shortest_paths( Graph g, Node s )
    initialise_single_source( g, s )
    S := { }                       /* Make S empty */
    Q := Vertices( g )             /* Put the vertices in a PQ */
    while not Empty( Q )
        u := ExtractCheapest( Q )
        AddNode( S, u )            /* Add u to S */
        for each vertex v in Adjacent( u )
            relax( u, v, w )

Operation of Dijkstra's Algorithm

This sequence of diagrams illustrates the operation of Dijkstra's Algorithm.

[The diagrams themselves are not reproduced here; their captions describe each step.]

1. Initial graph: all nodes have infinite cost except the source.
2. Choose the closest node to s. As we initialised d[s] to 0, it is s; add it to S. Relax all nodes adjacent to s, and update the predecessor (red arrows) for all nodes updated.
3. Choose the closest node, x. Relax all nodes adjacent to x, updating the predecessors for u, v and y.
4. Now y is the closest; add it to S. Relax v and adjust its predecessor.
5. u is now closest; choose it and adjust its neighbour, v.
6. Finally, add v. The predecessor list now defines the shortest path from each node to s.
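Since the predecessor array pi[] defines the shortest paths, the actual route to any vertex can be recovered by walking backwards to the source. A minimal C++ sketch, assuming pi[v] holds v's predecessor (or -1 for nil) as filled in by the algorithm above; print_path is a hypothetical helper name, not part of the original notes:

#include <cstdio>

int pi[128];   // pi[v] = predecessor of v on the shortest path, or -1 for nil

void print_path( int s, int v )
{
    if( v == s )
        printf( "%d", s );                        // reached the source
    else if( pi[v] == -1 )
        printf( "no path from %d to %d", s, v );  // v was never relaxed
    else
    {
        print_path( s, pi[v] );                   // print the path to v's predecessor
        printf( " -> %d", v );                    // then append v itself
    }
}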

Unit 10: BACKTRACKING, BRANCH & BOUND (5 hrs)
10.1 n-queens problem
10.2 Subset-sum problem
10.3 Assignment problem
10.4 Knapsack problem
10.5 Travelling-salesman problem

Sum of Subsets Problem

Given n distinct positive numbers (usually called weights), we desire to find all combinations of these numbers whose sum is M. This is called the Sum of Subsets Problem.

Ex: w = (5, 7, 10, 12, 16, 18, 20), M = 35. The number of elements in the set is n = 7, so the solution space contains 2^7 subsets. (For instance, {5,10,20}, {5,12,18}, {7,10,18} and {7,12,16} all sum to 35.)

We have to search the solution space to determine the solution of the problem instance. This searching is facilitated by using a tree organization for the solution space, which is called a state space tree. If a depth-first node generation strategy is used for the generation of problem states, together with bounding functions, the method is called Backtracking. Many problems which deal with searching for a set of solutions satisfying some constraints can be solved using Backtracking.

Conditions:
1) Weights are in nondecreasing order
2) w[1] <= m
3) The sum of all the weights is >= m

Algorithm SumOfSub( s, k, r )
{
    // generate the left child
    X[k] := 1;
    if ( s + w[k] = m ) then write( X[1:k] );
    else if ( s + w[k] + w[k+1] <= m ) then
        SumOfSub( s + w[k], k+1, r - w[k] );
    // generate the right child
    if ( ( s + r - w[k] >= m ) and ( s + w[k+1] <= m ) ) then
    {
        X[k] := 0;
        SumOfSub( s, k+1, r - w[k] );
    }
}

Comparison between Backtracking and Branch & Bound:
- Both are used to generate problem states in a tree organization.
- A bounding function is used in both techniques to kill nodes.
- Backtracking is used for constraint satisfaction problems, whereas B&B is used for optimization problems.
- The strategy used in backtracking is depth-first search.
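To make the pseudocode concrete, here is a runnable C++ version for the example instance above. It uses 1-based indices to match the pseudocode, and the k < n guards are an addition to the textbook version so that w[k+1] is never read out of bounds:

#include <cstdio>

int w[] = { 0, 5, 7, 10, 12, 16, 18, 20 };   // w[1..7], nondecreasing; w[0] unused
int n = 7, m = 35;
int x[8];                                     // x[k] = 1 means w[k] is taken

void sumOfSub( int s, int k, int r )          // s = sum so far, r = sum of w[k..n]
{
    x[k] = 1;                                 // generate the left child (take w[k])
    if( s + w[k] == m )                       // subset found: print it
    {
        for( int i = 1; i <= k; i++ ) if( x[i] ) printf( "%d ", w[i] );
        printf( "\n" );
    }
    else if( k < n && s + w[k] + w[k+1] <= m )
        sumOfSub( s + w[k], k + 1, r - w[k] );
    // generate the right child (skip w[k]) only if a solution is still possible
    if( k < n && s + r - w[k] >= m && s + w[k+1] <= m )
    {
        x[k] = 0;
        sumOfSub( s, k + 1, r - w[k] );
    }
}

int main()
{
    int r = 0;
    for( int i = 1; i <= n; i++ ) r += w[i];
    // prints the four solutions: {5,10,20}, {5,12,18}, {7,10,18}, {7,12,16}
    sumOfSub( 0, 1, r );
    return 0;
}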
