
Data Structure Definition:

A data structure is a specialized format for organizing and storing data. General data structure types include the array, the file, the record, the table, the tree, and so on. A data structure is designed to organize data to suit a specific purpose so that it can be accessed and worked with in appropriate ways. The logical or mathematical model of a particular organization of data is called a data structure. A carefully chosen data structure allows the most efficient algorithm to be used. A well-designed data structure allows a variety of critical operations to be performed using as few resources, in both execution time and memory space, as possible.

Abstract Data Type (ADT): An ADT is a set of elements with a collection of well-defined
operations. The operations can take as operands not only instances of the ADT but also other types of operands or instances of other ADTs. Similarly, results need not be instances of the ADT; however, at least one operand or the result is of the ADT type in question.

Examples of ADTs include list, stack, queue, set, tree, graph, etc.

Complexity of Algorithms
It is convenient to classify algorithms based on the relative amount of time or space they require, and to specify the growth of time/space requirements as a function of the input size. Thus, we have the notions of:
Time Complexity: the running time of the program as a function of the size of the input.
Space Complexity: the amount of computer memory required during program execution, as a function of the input size.

Algorithm Analysis and Performance Prediction:


An algorithm is a finite sequence of instructions that the computer follows to solve a problem. Once an algorithm for a problem is found to be correct, the next step is to find out the resources (time and space) the algorithm will require. This is known as algorithm analysis. If an algorithm requires more resources than your computer has (such as gigabytes of main memory), it is useless. Data structures and algorithms are interrelated and should be studied together, because algorithms are the methods used in systematic problem solving. Without methods for storing data in them, retrieving data from them, and performing computational operations on that data, data structures are meaningless. Thus, we have to study algorithms as well. The computation time and memory space required by data structures and the algorithms that operate on them are both important.

Algorithm Analysis
The finiteness condition implies that an algorithm never goes into an infinite loop no matter what input we give it.

It is difficult to predict the actual computation time of an algorithm without knowing the intimate details of the computer architecture, the compiler, the quality of the program, and other factors. But we can measure the time for a given algorithm by using special performance programs called benchmarks. It is also possible to predict the performance by looking at the growth rate of an algorithm. The running time of an algorithm is a function of the input size, such as the number of elements in an array, the number of records in a file, and so on. The amount of time that any algorithm takes to run depends on the amount of input it must process. For example, sorting an array of 10000 elements requires more processing time than sorting an array of 100 elements.

Another example: it is common to write programs whose running time varies with the square of the problem size. Thus, a program taking 1 sec to complete a file-handling problem with 10 records in the file requires 4 sec for 20 records (not 2 sec). Increasing the file size by a factor of 10, to 100 records, increases the running time to 100 sec; 1000 records require 10000 sec (about 3 hours) to complete, and 10000 records require almost two weeks to finish. This is a long time compared to the 1 sec for the 10-record test.

This example shows that we need to know something about the growth rate of our algorithm, as a test program's running time may grow to unacceptable values when real-world-sized data is used. An experienced programmer estimates the performance of the algorithm and takes corrective action if necessary. In some cases there may be no alternative to the program running in "squared" time, but at least the programmer will not be surprised at the end.

Asymptotic Notation
Suppose we are considering two algorithms, A and B, for solving a given problem. Furthermore, let us say that we have done a careful analysis of the running times of each of the algorithms and determined them to be TA(n) and TB(n), respectively, where n is a measure of the problem size. Then it should be a fairly simple matter to compare the two functions TA(n) and TB(n) to determine which algorithm is better. If it can be shown that TA(n) <= TB(n) for all n >= 0, then algorithm A is better than algorithm B regardless of the problem size. Unfortunately, we usually don't know the problem size beforehand, nor is it true that one of the functions is less than or equal to the other over the entire range of problem sizes. In this case, we consider the asymptotic behavior of the two functions for very large problem sizes.

Definition: Asymptotic notation is used to give a rough estimate of the rate of growth of a formula. The formula usually gives the run time of an algorithm. Informally, O notation is the leading (i.e. quickest growing) term of a formula with the coefficient stripped.

An Asymptotic Upper Bound - Big Oh:

In 1892, P. Bachmann invented a notation for characterizing the asymptotic behavior of functions. His invention has come to be known as big oh notation:

Definition (Big Oh): Consider a function f(n) which is non-negative for all integers n >= 0. We say that ``f(n) is big oh g(n),'' which we write f(n) = O(g(n)), if there exists an integer n0 and a constant c > 0 such that for all integers n >= n0, f(n) <= c g(n).

E.g.

n, 1000n and 5n+2 are O(n)
n + log n is O(n)
n^2 + n + log n and 10n^2 + n are O(n^2)
n log n + 10n is O(n log n)
10 log^2 n is O(log^2 n)
10000, 250 and 4 are O(1)

Computing big-O of an Algorithm: The following is a shorter way to compute big-O for an algorithm:
Atomic operations - constant time
Consecutive statements - sum of times
Conditionals - larger branch time plus test time
Loops - sum of iterations
Function calls - time of function body
Recursive functions - solve the recurrence relation

Conventions for Writing Big Oh Expressions


Certain conventions have evolved which concern how big oh expressions are normally written:

First, it is common practice when writing big oh expressions to drop all but the most significant terms. Thus, for example, instead of O(n^2 + n log n + n) we simply write O(n^2).

Second, it is common practice to drop constant coefficients. Thus, for example, instead of O(3n^2) we simply write O(n^2). As a special case of this rule, if the function is a constant, instead of, say, O(1024), we simply write O(1).

An Asymptotic Lower Bound - Omega
The big oh is an asymptotic upper bound. In this section, we introduce a similar notation for characterizing the asymptotic behavior of a function, but in this case it is a lower bound.

Definition (Omega): Consider a function f(n) which is non-negative for all integers n >= 0. We say that ``f(n) is omega g(n),'' which we write f(n) = Omega(g(n)), if there exists an integer n0 and a constant c > 0 such that for all integers n >= n0, f(n) >= c g(n).

The definition of omega is almost identical to that of big oh. The only difference is in the comparison: for big oh it is f(n) <= c g(n); for omega, it is f(n) >= c g(n).

1.2 Growth Rates


The approximate computation time is formulated in terms of the problem size N. If we consider the system-dependent factor (such as the compiler, language, computer) to be constant, not varying with the problem size, we can factor it out of the growth rate. The growth rate is the part of the formula that varies with the problem size. We use a notation called O-notation ("growth rate", "big-O"). The most common growth rates in data structures are:

expression    name
O(1)          constant
O(log n)      logarithmic
O(log^2 n)    log squared
O(n)          linear
O(n log n)    n log n
O(n^2)        quadratic
O(n^3)        cubic
O(2^n)        exponential

Table: The names of common big oh expressions. If you calculate these values you will see that as N grows, log(N) remains quite small, and N log(N) grows fairly large but not as large as N^2. Ex: most sorting algorithms have growth rates of N log(N) or N^2.

1.3 Estimating the Growth Rate


Algorithms are developed in a structured way; they combine simple statements into complex blocks in four ways:
1. Sequence: writing one statement below another
2. Decision: if-then or if-then-else
3. Loops
4. Subprogram call

Let us estimate the big-O of some algorithm structures.

Simple statements: We assume the statement does not contain a function call. It takes a fixed amount of time to execute. We denote the performance by O(1); if we factor out the constant execution time we are left with 1.

Sequence of simple statements: It takes an amount of execution time equal to the sum of the execution times of the individual statements. If the performance of each individual statement is O(1), so is their sum.

Decision: For estimating the performance, the then and else parts of the algorithm are considered independently. The performance estimate of the decision is taken to be the larger of the two individual big Os. For the case structure, we take the largest big O of all the case alternatives.

Simple counting loop: This is the type of loop in which the counter is incremented or decremented each time the loop is executed (a for loop). If the loop contains simple statements and the number of times the loop executes is a constant, in other words independent of the problem size, then the performance of the whole loop is O(1). On the other hand, if the loop is like

Ex: for (i=0; i< N; i++)

the number of trips depends on N, the input size, so the performance is O(N).

Nested loops: The performance depends on the counters at each nested loop. For example:

Ex: for (i=0; i< N; i++) {
        for (j=0; j< N; j++) {
sequence of simple statements

} }

the outer loop count is N, but the inner loop executes N times for each outer iteration. So the body of the inner loop will execute N*N times and the entire performance will be O(N^2).

Ex: for (i=1; i<=N; i++) {
        for (j=0; j< i; j++) {
sequence of simple statements

}
}

In this case the outer trip count is N, but the trip count of the inner loop depends not only on N, but on the value of the outer loop counter as well. If the outer counter is 1, the inner loop has a trip count of 1, and so on. If the outer counter is N, the inner loop trip count is N. How many times will the body be executed?

1 + 2 + 3 + ... + (N-1) + N = N(N+1)/2 = (N^2 + N)/2

Therefore the performance is O(N^2). For large N the contribution of the N/2 term is negligible.

Generalization: A structure with k nested counting loops, where each counter is just incremented or decremented by one and the trip counts depend on the problem size only, has performance O(N^k).

While loops: The control variable is multiplied or divided each time the loop iteration is performed. Each loop has an initialization step, a termination condition, and a modification step indicating how the control variable should be changed. In a while structure, the termination condition is checked before the iteration. Let's consider the following:

control=1;
while (control < n) {
Simple statements;

    control=2*control;
}

In the above example the performance depends on the problem size N. The control variable is multiplied by 2 until it gets larger than N. The initial value of control is 1; after k iterations we will have control = 2^k. In order to find k we take the log of both sides: log2(control) = log2(2^k) = k. Since the loop stops when control >= N, the performance of the algorithm is O(log2 N).

Generalization: Assume that we multiply the control variable by some other constant, say fact.

Then after k iterations, control = fact^k, so the performance is O(log(N)), where the log is taken to base fact. In considering the performance, the base does not matter, since changing from one base to another introduces only a constant factor. Quadratic algorithms are impractical for input sizes exceeding a few thousand. Cubic algorithms are impractical for input sizes exceeding a few hundred.

2.4.2. General Rules


RULE 1-FOR LOOPS:

The running time of a for loop is at most the running time of the statements inside the for loop (including tests) times the number of iterations.
RULE 2-NESTED FOR LOOPS:

Analyze these inside out. The total running time of a statement inside a group of nested for loops is the running time of the statement multiplied by the product of the sizes of all the for loops. As an example, the following program fragment is O(n^2):
for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
        k++;

RULE 3-CONSECUTIVE STATEMENTS:

These just add (which means that the maximum is the one that counts). As an example, the following program fragment, which has O(n) work followed by O(n^2) work, is also O(n^2):
for( i=0; i<n; i++ )
    a[i] = 0;
for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
        a[i] += a[j] + i + j;

RULE 4-IF/ELSE:

For the fragment


if( cond ) S1 else S2

the running time of an if/else statement is never more than the running time of the test plus the larger of the running times of S1 and S2.

STACK

A stack is a linear data structure for a collection of items, with the restriction that items can be added one at a time and can only be removed in the reverse order in which they were added (i.e. Last In First Out, LIFO). The last item represents the top of the stack. Such a stack resembles a stack of trays in a cafeteria, or a stack of boxes. Only the top tray can be removed from the stack, and it is the last one that was added to the stack. A tray can be removed only if there are some trays on the stack, and a tray can be added only if there is enough room to hold more trays.

The common operations associated with a stack are as follows:
1. push: adds a new item on top of a stack.
2. pop: removes the item on the top of a stack.
3. isEmpty: check to see if the stack is empty.
4. isFull: check to see if the stack is already full.
5. returnTop: indicate which item is at the top.

Applications: A stack is very useful in situations when data have to be stored and then retrieved in the reverse order.
1. Function calls
2. Converting an infix expression into postfix form, e.g.

Infix               Postfix
a+b*c               abc*+
(a+b)*c             ab+c*
(a + b) * (c - d)   ab+cd-*

3. Evaluation of arithmetic expressions, e.g.

5 * 3 + 2 + 6 * 4 = 41

Infix to postfix conversion: 53*2+64*+

A stack can also be used to convert an infix expression in standard form into postfix form. Infix: operator is between operands A + B Postfix : operator follows operands A B + We shall assume that the expression is a legal one (i.e. it is possible to evaluate it). 1. When an operand is read, it will be placed on output list (printed out straight away).

2. The operators are pushed on a stack. However, if the priority of the top operator in the stack is higher than that of the operator being read, the top operator is put on the output list, and the new operator is pushed onto the stack. Priority is assigned as follows (highest first):
1. ( left parenthesis in the expression
2. * /
3. + -
4. ( left parenthesis inside the stack
Association is assumed to be left to right. The left parenthesis has the highest priority when it is read from the expression, but once it is on the stack, it assumes the lowest priority.

Algorithm:
while there are more characters in the input {
    Read the next symbol ch in the given infix expression.
    If ch is an operand, put it on the output.
    If ch is an operator, i.e. *, /, +, -, or ( {
        If the stack is empty, push ch onto the stack;
        Else {
            let op be the item at the top of the stack
            while (more items in stack && priority(ch) <= priority(op)) {
                pop op and append it to the output, provided it is not an open parenthesis
                op = top element of stack
            }
            push ch onto stack
        }
    }
    If ch is a right parenthesis ) {
        Pop items from the stack to the output until a left parenthesis is reached
        Pop the left parenthesis and discard both left and right parentheses
    }
} /* now no more characters in the infix expression */
Pop the remaining items in the stack to the output.

Graphical representation, e.g. 1: [step-by-step stack trace figure omitted]

Resulting postfix expression: M P K T * +

E.g. 2: convert 2*3/(2-1)+5*(4-1) into a postfix expression (answer: 23*21-/541-*+).

E.g. 3:
a + (( b * c ) / d )
a + ( ( b c * ) / d )   (precedence of * and / are the same and they are left associative)
a + ( b c * d / )
a b c * d / +

Evaluating a Postfix Expression

We can evaluate a postfix expression using a stack.
1. Each operator in a postfix string corresponds to the previous two operands.
2. Each time we read an operand, we push it onto a stack.
3. When we reach an operator, its associated operands (the top two elements on the stack) are popped from the stack.
4. We then perform the indicated operation on them and push the result onto the top of the stack, so that it is available for use as an operand of the next operator.

Implementation of stacks using arrays:

/* Program of stack using array */
#include <stdio.h>
#include <stdlib.h>
#define MAX 50

int top = -1;
int stack_arr[MAX];

main()
{
    int choice;
    while(1)
    {
        printf("1.Push\n2.Pop\n3.Display\n4.Quit\n");
        printf("Enter your choice : ");
        scanf("%d", &choice);
        switch(choice)
        {
        case 1:
            push();
            break;
        case 2:
            pop();
            break;
        case 3:
            display();
            break;
        case 4:
            exit(1);
        default:
            printf("Wrong choice\n");
        } /*End of switch*/
    } /*End of while*/
} /*End of main()*/

push()
{
    int pushed_item;
    if(top == (MAX-1))
        printf("Stack Overflow\n");
    else
    {
        printf("Enter the item to be pushed in stack : ");
        scanf("%d", &pushed_item);
        top = top + 1;
        stack_arr[top] = pushed_item;
    }
} /*End of push()*/

pop()
{
    if(top == -1)
        printf("Stack Underflow\n");
    else
    {
        printf("Popped element is : %d\n", stack_arr[top]);
        top = top - 1;
    }
} /*End of pop()*/

display()
{
    int i;
    if(top == -1)
        printf("Stack is empty\n");
    else
    {
        printf("Stack elements :\n");
        for(i = top; i >= 0; i--)
            printf("%d\n", stack_arr[i]);
    }
} /*End of display()*/

/* Program for conversion of infix to postfix and evaluation of postfix.
   It will take only single digits in the expression */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define Blank ' '
#define Tab '\t'
#define MAX 50

long int pop();
long int eval_post();

char infix[MAX], postfix[MAX];
long int stack[MAX];
int top;

main()
{
    long int value;
    top = -1;
    printf("Enter infix : ");
    gets(infix);
    infix_to_postfix();   /* calling infix to postfix conversion function */
    printf("Postfix : %s\n", postfix);
    value = eval_post();  /* calling the function for evaluating the expression */
    printf("Value of expression : %ld\n", value);
} /*End of main()*/

infix_to_postfix()
{
    int i, p = 0, precedence;
    char next;
    for(i = 0; infix[i] != '\0'; i++)
    {
        if(!white_space(infix[i]))
        {
            switch(infix[i])
            {
            case '(':
                push(infix[i]);
                break;
            case ')':
                while((next = pop()) != '(')
                    postfix[p++] = next;
                break;
            case '+':
            case '-':
            case '*':
            case '/':
            case '%':
            case '^':
                precedence = prec(infix[i]);
                while(top != -1 && precedence <= prec(stack[top]))
                    postfix[p++] = pop();
                push(infix[i]);
                break;
            default: /* an operand comes */
                postfix[p++] = infix[i];
            } /*End of switch*/
        } /*End of if*/
    } /*End of for*/

    while(top != -1)
        postfix[p++] = pop();
    postfix[p] = '\0';   /* End postfix with '\0' to make it a string */
} /*End of infix_to_postfix()*/

/* This function returns the precedence of the operator */
prec(char symbol)
{
    switch(symbol)
    {
    case '(':
        return 0;
    case '+':
    case '-':
        return 1;
    case '*':
    case '/':
    case '%':
        return 2;
    case '^':
        return 3;
    } /*End of switch*/
} /*End of prec()*/

push(long int symbol)
{
    if(top == MAX-1)
    {
        printf("Stack overflow\n");
        exit(1);
    }
    else
    {
        top = top + 1;
        stack[top] = symbol;
    }
} /*End of push()*/

long int pop()
{
    if(top == -1)
    {
        printf("Stack underflow\n");
        exit(2);
    }
    else
        return (stack[top--]);
} /*End of pop()*/

// function to check the symbol is a blank space or not white_space(char symbol) { if( symbol == Blank || symbol == Tab || symbol == '\0') return 1; else return 0; } /*End of white_space()*/

/* Function to evaluate a postfix expression */
long int eval_post()
{
    long int a, b, temp, result, len;
    int i;
    len = strlen(postfix);
    postfix[len] = '#';          /* sentinel marking the end */
    for(i = 0; postfix[i] != '#'; i++)
    {
        if(postfix[i] <= '9' && postfix[i] >= '0')
            push(postfix[i] - 48);   /* convert digit character to its value */
        else
        {
            a = pop();
            b = pop();
            switch(postfix[i])
            {
            case '+':
                temp = b + a;
                break;
            case '-':
                temp = b - a;
                break;
            case '*':
                temp = b * a;
                break;
            case '/':
                temp = b / a;
                break;
            case '%':
                temp = b % a;
                break;
            case '^':
                temp = pow(b, a);
            } /*End of switch*/
            push(temp);
        } /*End of else*/
    } /*End of for*/
    result = pop();
    return result;
} /*End of eval_post()*/

Program to convert a decimal number to a binary number

/* Program of stack using array */
#include <stdio.h>
#define MAX 50

int top = -1;
int stack_arr[MAX];

main()
{
    int dec, rem, b;
    printf("Enter a decimal number\n");
    scanf("%d", &dec);
    while(dec > 0)
    {
        rem = dec % 2;   /* the remainder is the next binary digit */
        push(rem);
        dec = dec / 2;
    }
    printf("Binary number : \n");
    while(top != -1)
    {
        b = pop();
        printf("%d", b);
    }
}


Queue

A queue is simply a waiting line that grows by adding elements to its end and shrinks by removing elements from the front. Compared to a stack, it reflects a maxim more commonly encountered in the real world, namely, first come, first served. Waiting lines in supermarkets, banks, and food counters are common examples of queues.

Definition: It is a list from which items may be deleted at one end (front) and into which items may be inserted at the other end (rear). It is also referred to as a first-in-first-out (FIFO) data structure.

Operations on Queue:
1. enqueue(q, x): inserts item x at the rear of the queue q.
2. x = dequeue(q): removes the front element from q and returns its value.
3. isEmpty(q): checks to see if the queue is empty.
4. isFull(q): checks to see if there is space to insert more items in the queue.

Queue overflow results from trying to add an element onto a full queue, and queue underflow happens when trying to remove an element from an empty queue.

Applications of Queues: Queues have many applications in computer systems:
1. Direct applications
   - Handling jobs in a single processor computer
   - Print spooling
   - Transmitting information packets in computer networks
2. Indirect applications
   - Auxiliary data structure for algorithms
   - Component of other data structures

Program for Queue implementation through Array: #include <stdio.h>

#include <ctype.h>
#define MAXSIZE 200

int q[MAXSIZE];
int front = -1;
int rear = -1;
void enqueue(int);
int dequeue();

void main()
{
    int choice = 1, i, num;
    while(choice == 1)
    {
        printf("MAIN MENU:\n1.Add element to queue\n2.Delete element from the queue\n");
        scanf("%d", &choice);
        switch(choice)
        {
        case 1:
            printf("Enter the data... ");
            scanf("%d", &num);
            enqueue(num);
            break;
        case 2:
            i = dequeue();
            printf("Dequeued value is %d\n", i);
            break;
        default:
            printf("Invalid Choice ...\n");
        }
        printf("Do you want to continue? Press 1 for yes, any other key to exit: ");
        scanf("%d", &choice);
    } /* end of outer while */
} /* end of main */

void enqueue(int a)
{
    if(rear == MAXSIZE-1)
    {
        printf("QUEUE FULL\n");
        return;
    }
    else
    {
        rear = rear + 1;
        q[rear] = a;
        printf("Value of rear is %d and the value of front is %d\n", rear, front);
    }
}

int dequeue()
{
    int a;
    if(front == rear)
    {
        printf("QUEUE EMPTY\n");
        return(0);
    }
    else
    {
        front = front + 1;
        a = q[front];
    }
    return(a);
}


Circular Queue
Linear queues have a very big drawback: once the queue is FULL, even though we delete a few elements from the "front" and free some occupied space, we are not able to add any more elements, as the "rear" has already reached the queue's rearmost position.

Solution: Once "rear" reaches the queue's maximum limit, the "first" element will become the queue's new "rear". Now the queue is not straight but circular, i.e. once the queue is full, the "first" element of the queue becomes the "rear"-most element, if and only if the "front" has moved forward.

Circular Queue Boundary Conditions:


When the queue is empty, there is no front item and no rear item. If the front and rear variables point to valid array items, the difference between an empty queue and a non-empty queue can be identified by:
  o Method 1: set front and/or rear to an out-of-range value (such as -1) to indicate an empty queue.
  o Method 2: use a separate counter or Boolean flag to indicate an empty queue, in which case (rear+1)%MAX == front while the queue is empty (note that this is also the case when the queue is full!).
  o Method 3: always keep at least one queue entry unused. Then an empty queue has condition (rear+1)%MAX == front, while a full queue has condition (rear+2)%MAX == front.

Program for Circular Queue implementation through Array

#include <stdio.h>
#include <ctype.h>
#define MAXSIZE 8

int cq[MAXSIZE];
int front = -1, rear = -1;
void enqueue(int);
int dequeue();

void main()
{
    int choice = 1, i, num;
    while(choice == 1)
    {
        printf("MAIN MENU:\n1.Add element to Circular Queue\n2.Delete element from the Circular Queue\n");
        scanf("%d", &choice);
        switch(choice)
        {
        case 1:
            printf("Enter the data... ");
            scanf("%d", &num);
            enqueue(num);
            break;
        case 2:
            i = dequeue();
            printf("Value returned from dequeue function is %d\n", i);
            break;
        default:
            printf("Invalid Choice.\n");
        }
        printf("Do you want to do more operations on the Circular Queue? (1 for yes, any other key to exit) ");
        scanf("%d", &choice);
    } /* end of outer while */
} /* end of main */

void enqueue(int item)
{
    rear = rear + 1;
    rear = rear % MAXSIZE;
    if(front == rear)
    {
        printf("CIRCULAR QUEUE FULL\n");
        return;
    }
    else
    {
        cq[rear] = item;
        printf("Rear = %d Front = %d\n", rear, front);
    }
}

int dequeue()
{
    int a;
    if(front == rear)
    {
        printf("CIRCULAR QUEUE EMPTY\n");
        return (0);
    }
    else
    {
        front = front + 1;
        front = front % MAXSIZE;
        a = cq[front];
        printf("Rear = %d Front = %d\n", rear, front);
        return(a);
    }
}

Applications:
1. Round robin scheduling
It is one of the oldest, simplest, and most widely used scheduling algorithms, designed especially for time-sharing systems. A small unit of time, called a time slice or quantum, is defined. All runnable processes are kept in a circular queue. The CPU scheduler goes around this queue, allocating the CPU to each process for a time interval of one quantum. New processes are added to the tail of the queue. The CPU scheduler picks the first process from the queue, sets a timer to interrupt after one quantum, and dispatches the process.

25 If the process is still running at the end of the quantum, the CPU is preempted and the process is added to the tail of the queue. If the process finishes before the end of the quantum, the process itself releases the CPU voluntarily. In either case, the CPU scheduler assigns the CPU to the next process in the ready queue. Every time a process is granted the CPU, a context switch occurs, which adds overhead to the process execution time.


LINKED LISTS

A list is an ordered collection of objects. A linked list is a data structure which consists of a series of structures, which are not necessarily adjacent in memory. Each structure contains the data and a pointer to the structure containing its successor. We call this the next pointer. The last cell's next pointer points to NULL.

Arrays
 In an array, each node (element) follows the previous one physically (i.e. contiguous spaces in memory).
 Arrays are fixed size: either too big (unused space) or not big enough (overflow problem).
 The maximum size of the array must be predicted, which is sometimes not possible.
 Inserting and deleting elements in an array is difficult and requires a lot of data movement: if, in an array of size 100, an element is to be inserted after the 10th element, then all remaining 90 elements have to be shifted down by one position.

Linked Lists
 Linked lists are appropriate when the number of data elements to be represented in the data structure is not known in advance.
 Linked lists are dynamic, so the length of a list can increase or decrease as necessary.
 A linked list is a collection of nodes, each node containing a data element. Each node does not necessarily follow the previous one physically in memory; nodes are scattered at random in memory.
 Insertion and deletion can be done in linked lists by just changing the links of a few nodes, without disturbing the rest of the list. This is the greatest advantage.
 But getting to a particular node may take a large number of operations: every node from the start needs to be traversed to reach the particular node.

Types:
 Single linked list
 Circular linked list
 Doubly linked list

Single linked list:
Node structure: A node in a linked list is a structure that has at least two fields. One of the fields is a data field; the other is a pointer that contains the address of the next node in the sequence.

struct node
{
    int data;
    struct node *next;
};

The pointer variable next is called a link. Each structure is linked to a succeeding structure by way of the field next. The pointer variable next contains the address of the location in memory of the successor struct node element, or the special value NULL.

Linked list with actual pointer values

More examples of Nodes: 1. A node with one data field:

struct node { int number; struct node * link; }; 2. A node with 3 data fields:

struct student { char name[20]; int id; double grdPts; struct student *next_student; }; 3. A structure in a node:

struct person
{
    char name[20];
    char address[30];
    char phone[10];
};

struct person_node
{
    struct person data;
    struct person_node *next;
};

A simple Linked List:

The head pointer addresses the first node of the list, and each node points at its successor node. The last node has a link value of NULL.

Empty List: An empty linked list is a single pointer having the value NULL.
pHead = NULL;

Basic Linked List Operations:
1. Add a node
2. Delete a node
3. Looking up a node
4. List traversal (e.g. counting nodes)

1. Add a Node: There are four steps to add a node to a linked list:
1. Allocate memory for the new node.
2. Determine the insertion point (you need to know only the new node's predecessor).
3. Point the new node to its successor.
4. Point the predecessor to the new node.

E.g. adding two nodes to an empty list: To form a linked list which contains the two integers 39 and 60, first define a structure to hold two pieces of information in each node: an integer value, and the address corresponding to the next structure of the same type.

struct node
{
    int data;
    struct node *next;
};

struct node *pNew, *pHead;

Use malloc to fetch one node from memory, and let pNew point to this node.

pNew = (struct node *) malloc(sizeof(struct node));
pHead = NULL;

Now to store 39 in the data part of the node, we can use
(*pNew).data = 39;
A more convenient way is to use the notation
pNew->data = 39;
pNew->next = NULL;

At this moment there are no elements in the list. Pointer pHead points to NULL. First element 39 is stored in node pNew. So make it the head node. pNew->next = pHead; /* set link to NULL*/ pHead = pNew; /* point list to first node*/

Now we have got one node in the list, which is being pointed to by pHead. To put the next value 60 on the list, we must use malloc to fetch another node from the memory. pNew = (struct node *) malloc(sizeof(struct node)); Let us now assign the data value to this node: pNew->data = 60; pNew->next = NULL;

Now we need to link this new node to the pHead node, which can be done by following statement: pHead->next = pNew; This gives us the list of two nodes:


Add a node at the end of the list: Given the list pHead

let us say we are interested in adding a node containing 90 at the end of this list. As a first step, use malloc to get a new node pNew from the memory and load 90 in the data field and NULL in the next field. pNew->data = 90; pNew->next = NULL;

Now, the last node containing 80 would become the predecessor of the node containing 90 after the node is added. Let us call this node as pPre. Use a temporary variable pPre and initialize it to pHead. Now hop through the nodes till you get a node whose next link is NULL. Then pPre will be pointing to the last node. pPre = pHead; while ( pPre->next != NULL ) pPre = pPre->next; Now link pPre with pNew . pPre->next = pNew; This would result in the node pNew to be attached to the list as the last node.

Inserting a node in a Linked List: (General case)

Insertion into a linked list

So far we have been discussing separate codes for inserting a node in the middle, at the end, or at the front of a list. We can combine all the codes and write a general code for inserting a node anywhere in the list. Given the head pointer (pHead), the predecessor (pPre) and the data to be inserted (item), we first allocate memory for the new node (pNew) and adjust the link pointers.
struct node
{
    int data;
    struct node *next;
};
struct node *pHead;
pHead = NULL;

void insert_begin(int val) { struct node *pNew; pNew = (struct node *)malloc(sizeof(struct node)); pNew->data = val; if (pHead == NULL) { /*Adding to empty list*/ pHead = pNew; pHead->next= NULL; } else { /* Adding as first node*/ pNew->next = pHead; pHead = pNew; } } void insert_last(int val) { struct node *pNew, *pTemp; pNew = (struct node *)malloc(sizeof(struct node)); pNew->data = val; pNew->next= NULL; if (pHead == NULL) { /*Adding to empty list*/ pHead = pNew; pHead->next= NULL; } else

{   /* Adding as last node */
    pTemp = pHead;
    while (pTemp->next != NULL)
        pTemp = pTemp->next;
    pTemp->next = pNew;
}
}
Inserting a node into a singly linked list (general case):
void insert_between(int val)
{
    struct node *pNew, *pPre, *pTemp;
    int key;
    printf("Enter the value after which the node is to be inserted: ");
    scanf("%d", &key);
    /* Allocate memory for the new node */
    pNew = (struct node *)malloc(sizeof(struct node));
    pNew->data = val;
    pNew->next = NULL;
    if (pHead == NULL)
    {   /* Adding to empty list */
        pHead = pNew;
    }
    else
    {
        pTemp = pHead;
        /* Searching for the insertion point */
        while (pTemp != NULL)
        {
            if (pTemp->data == key)
                break;
            else
            {
                pPre = pTemp;
                pTemp = pTemp->next;
            }
        }
        if (pTemp == NULL)
            printf("The insertion point can't be found\n");
        else
        {

            /* insert after the found node (works for the head node too) */
            pNew->next = pTemp->next;
            pTemp->next = pNew;
        }
    }
}
2. Delete a node: Deleting a node requires that we logically remove the node from the list by changing various link pointers and then physically delete the node from the heap. We can delete: the first node, any node in the middle, or the end node. To logically delete a node: 1. First locate the node itself; name the current node pTemp and its predecessor node pPre. 2. Change the predecessor's link field to point to the successor of the current node. 3. Recycle the node (send it back to memory) using free.
void delete(int key)
{
    struct node *pTemp, *pPre;
    if (pHead == NULL)
        printf("Empty linked list\n");
    else
    {   /* Search for the node */
        pTemp = pHead;
        while (pTemp != NULL)
        {
            if (pTemp->data == key)
                break;
            else
            {
                pPre = pTemp;
                pTemp = pTemp->next;

} }

if (pTemp == NULL)
    printf("Node not found\n");
else
{
    if (pTemp == pHead)
        pHead = pTemp->next;
    else
        pPre->next = pTemp->next;
    free(pTemp);
    printf("Node deleted\n");
}
}
}

3. Search for an item in the list:
/* Given the item and the pointer to the head of the list, the function returns a pointer to the node which matches the item, or returns NULL if the item is not found */
struct node *Search(int item, struct node *pHead)
{
    struct node *pTemp;
    int i = 1;
    pTemp = pHead;
    if (pHead == NULL)
        printf("Empty linked list\n");
    else
    {   /* Search for the node */
        while (pTemp != NULL)
        {
            if (pTemp->data == item)
                break;
            else
            {
                pTemp = pTemp->next;
                i = i + 1;
            }
        }
        if (pTemp == NULL)
            printf("Node can't be found\n");
        else
            printf("Node found at location %d\n", i);
    }

    return pTemp;
}

4. Counting the nodes in a List:
int count(struct node *pHead)
{
    struct node *pTemp;
    int c = 0;
    pTemp = pHead;
    while (pTemp != NULL)
    {
        c = c + 1;
        pTemp = pTemp->next;
    }
    return c;
}
5. Printing the contents of a list: To print the contents, traverse the list from the head node to the last node.
void display(struct node *pHead)
{
    struct node *pTemp;
    pTemp = pHead;
    while (pTemp != NULL)
    {
        printf("%d ", pTemp->data);
        pTemp = pTemp->next;
    }
}


/* PROGRAM FOR IMPLEMENTATION OF SINGLE LINKED LIST */
#include <stdio.h>
#include <stdlib.h>
/* STRUCTURE CONTAINING A DATA PART AND A LINK PART */
struct node
{
    int data;
    struct node *next;
} *pHead = NULL;   /* pHead is a global pointer holding the address of the first node in the list */
/* THIS IS THE MAIN PROGRAM */
main()
{
    int i, num, loc;

while(1)    /* this is an indefinite loop */
{
    printf(" \n1.INSERT A NUMBER AT BEGINNING\n");
    printf(" 2.INSERT A NUMBER AT LAST\n");
    printf(" 3.INSERT A NUMBER AT A PARTICULAR LOCATION IN LIST\n");
    printf(" 4.PRINT THE ELEMENTS IN THE LIST\n");
    printf(" 5.PRINT THE NUMBER OF ELEMENTS IN THE LIST\n");
    printf(" 6.DELETE A NODE IN THE LINKED LIST\n");
    printf(" 7.GET OUT OF LINKED LIST\n");
    printf(" PLEASE, ENTER THE NUMBER:\n");
    scanf("%d",&i);          /* ENTER A VALUE FOR SWITCH */
    switch(i)
    {
        case 1: printf("ENTER THE NUMBER :-");
                scanf("%d",&num);
                insert_begin(num);
                break;
        case 2: printf("ENTER THE NUMBER :-");
                scanf("%d",&num);
                insert_last(num);
                break;
        case 3: printf(" PLEASE ENTER THE NUMBER :-");
                scanf("%d",&num);
                printf("PLEASE ENTER THE LOCATION NUMBER :-");
                scanf("%d",&loc);
                insert_after(num,loc);
                break;
        case 4: printf(" THE ELEMENTS IN THE LIST ARE : ");
                display();
                break;
        case 5: printf(" Total No Of Elements In The List are %d", count());
                break;
        case 6: printf(" PLEASE ENTER A NUMBER to be deleted :");
                scanf("%d",&num);
                delete(num);
                break;
        case 7: exit(0);
    }    /* end of switch */
}        /* end of while */
}            /* end of main */

/* ADD A NEW NODE AT BEGINNING */ void insert_begin(int val) { struct node *pNew; pNew = (struct node *)malloc(sizeof(struct node)); pNew->data = val; if (pHead == NULL) { /*Adding to empty list*/ pHead = pNew; pHead->next= NULL; } else { /* Adding as first node*/ pNew->next = pHead; pHead = pNew; } }

38 /*THIS FUNCTION ADDS A NODE AT THE LAST OF LINKED LIST */ void insert_last(int val) { struct node *pNew, *pTemp; pNew = (struct node *)malloc(sizeof(struct node)); pNew->data = val; pNew->next= NULL; if (pHead == NULL) { /*Adding to empty list*/ pHead = pNew; pHead->next= NULL; } else { /* Adding as last node*/ pTemp = pHead; while(pTemp->next!=NULL) pTemp = pTemp->next; pTemp->next = pNew; pNew->next = NULL; } }

/* THIS FUNCTION DELETES A NODE */
void delete(int key)
{
    struct node *pTemp, *pPre;
    if (pHead == NULL)
        printf("Empty linked list\n");
    else
    {   /* Search for the node */
        pTemp = pHead;
        while (pTemp != NULL)
        {
            if (pTemp->data == key)
                break;
            else
            {
                pPre = pTemp;
                pTemp = pTemp->next;
            }
        }


if (pTemp == NULL)
    printf("Node not found\n");
else
{
    if (pTemp == pHead)
        pHead = pTemp->next;
    else
        pPre->next = pTemp->next;
    free(pTemp);
    printf("Node deleted\n");
}
}
}
/* ADD A NEW NODE AFTER A SPECIFIED NO OF NODES */
void insert_after(int num, int loc)
{
    int i;
    struct node *pNew, *pPre, *pCur;
    pCur = pHead;            /* pCur starts at the first node */
    if (loc > count()+1 || loc <= 0)
    {
        printf("insertion is not possible : ");
        return;
    }
    if (loc == 1)            /* insert at the beginning */
    {
        insert_begin(num);
        return;
    }
    else
    {
        for (i = 1; i < loc; i++)
        {
            pPre = pCur;     /* pPre holds the previous node */
            pCur = pCur->next;
        }
        pNew = (struct node *)malloc(sizeof(struct node));
        pNew->data = num;
        pNew->next = pPre->next;
        pPre->next = pNew;
        return;
    }
}

40

// ADD A NEW NODE AFTER A NODE WITH A SPECIFIED VALUE
void insert_byValue(int val)
{
    struct node *pNew, *pTemp;
    int key;
    printf("Enter the value after which the node is to be inserted: ");
    scanf("%d", &key);
    // Allocate memory for the new node
    pNew = (struct node *)malloc(sizeof(struct node));
    pNew->data = val;
    pNew->next = NULL;
    if (pHead == NULL)
    {   /* Adding to empty list */
        pHead = pNew;
    }
    else
    {
        pTemp = pHead;
        /* Searching for the insertion point */
        while (pTemp != NULL)
        {
            if (pTemp->data == key)
                break;
            else
                pTemp = pTemp->next;
        }
        if (pTemp == NULL)
            printf("The insertion point can't be found\n");
        else
        {   /* insert after the found node (intermediate, head or last) */
            pNew->next = pTemp->next;
            pTemp->next = pNew;
        }
    }
}

/* THIS FUNCTION DISPLAYS THE CONTENTS OF THE LINKED LIST */
void display( )
{
    struct node *pTemp;
    pTemp = pHead;
    if (pTemp == NULL)
    {
        printf("NO ELEMENT IN THE LIST :");
        return;
    }
    else
    {   /* traverse the entire linked list */
        while (pTemp != NULL)
        {
            printf("%d \n", pTemp->data);
            pTemp = pTemp->next;
        }
    }
}

//THIS FUNCTION COUNTS THE NUMBER OF ELEMENTS IN THE LIST
int count()
{
    struct node *pTemp;
    int c = 0;
    pTemp = pHead;
    while (pTemp != NULL)
    {
        pTemp = pTemp->next;
        c++;
    }
    return (c);
}

Application of single linked list: Program to add two polynomials, e.g. (6X^2+3X+1) + (5X^3+5X+3) = 5X^3+6X^2+8X+4
#include<stdio.h>
#include<stdlib.h>

#include<conio.h>
struct node
{
    int coeff;
    int pow;
    struct node *next;
};
struct node *poly1=NULL, *poly2=NULL, *poly=NULL;
void create(struct node *n)
{
    char ch;
    do
    {
        printf("\n enter coeff:");
        scanf("%d", &n->coeff);
        printf("\n enter power:");
        scanf("%d", &n->pow);
        n->next = (struct node *)malloc(sizeof(struct node));
        n = n->next;
        n->next = NULL;
        printf("\n continue(y/n):");
        ch = getch();
    } while (ch=='y' || ch=='Y');
}
void show(struct node *n)
{
    while (n->next != NULL)
    {
        printf("%dx^%d", n->coeff, n->pow);
        n = n->next;
        if (n->next != NULL)
            printf("+");
    }
}

void polyadd(struct node *poly1,struct node *poly2,struct node *poly) { while(poly1->next && poly2->next) { if(poly1->pow>poly2->pow) { poly->pow=poly1->pow; poly->coeff=poly1->coeff; poly1=poly1->next;

}
else if(poly1->pow < poly2->pow)
{
    poly->pow = poly2->pow;
    poly->coeff = poly2->coeff;
    poly2 = poly2->next;
}
else
{
    poly->pow = poly1->pow;
    poly->coeff = poly1->coeff + poly2->coeff;
    poly1 = poly1->next;
    poly2 = poly2->next;
}
poly->next = (struct node *)malloc(sizeof(struct node));
poly = poly->next;
poly->next = NULL;
}
while(poly1->next || poly2->next)
{
    if(poly1->next)
    {
        poly->pow = poly1->pow;
        poly->coeff = poly1->coeff;
        poly1 = poly1->next;
    }
    if(poly2->next)
    {
        poly->pow = poly2->pow;
        poly->coeff = poly2->coeff;
        poly2 = poly2->next;
    }
    poly->next = (struct node *)malloc(sizeof(struct node));
    poly = poly->next;
    poly->next = NULL;
}
}
main()
{
    poly1 = (struct node *)malloc(sizeof(struct node));
    poly2 = (struct node *)malloc(sizeof(struct node));
    poly = (struct node *)malloc(sizeof(struct node));
    printf("\nenter 1st number:");

    create(poly1);
    printf("\nenter 2nd number:");
    create(poly2);
    printf("\n1st Number:");
    show(poly1);
    printf("\n2nd Number:");
    show(poly2);
    polyadd(poly1,poly2,poly);
    printf("\nAdded polynomial:");
    show(poly);
    getch();
}

CIRCULAR LINKED LIST: The circular linked list is similar to the single linked list except that the last node's next pointer points to the first node. In a circularly linked list, all nodes are linked in a continuous circle, without using NULL.
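Because the last node points back to the head rather than to NULL, traversal has to use a do-while loop: the stop condition (being back at the head) is already true before the first step. A minimal sketch of this idiom, with helper names of our own choosing:

```c
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

/* Append a value to a circular list; returns the (possibly new) head. */
struct node *cappend(struct node *head, int val)
{
    struct node *pNew = (struct node *)malloc(sizeof(struct node));
    struct node *pTemp;
    pNew->data = val;
    if (head == NULL) {
        pNew->next = pNew;          /* a single node points to itself */
        return pNew;
    }
    pTemp = head;
    while (pTemp->next != head)     /* find the last node */
        pTemp = pTemp->next;
    pTemp->next = pNew;
    pNew->next = head;              /* close the circle */
    return head;
}

/* Count nodes with the do-while idiom required by circular lists. */
int ccount(struct node *head)
{
    struct node *pTemp = head;
    int c = 0;
    if (head == NULL)
        return 0;
    do {
        c++;
        pTemp = pTemp->next;
    } while (pTemp != head);        /* stop when back at the head */
    return c;
}

/* Small self-check: build a three-node circle and count it. */
int demo(void)
{
    struct node *h = NULL;
    h = cappend(h, 1);
    h = cappend(h, 2);
    h = cappend(h, 3);
    return ccount(h);
}
```

A plain `while (pTemp != head)` loop would exit immediately without visiting any node, which is why the full program below uses do-while in its display, insert and delete routines.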

Various operations that can be performed on a circular linked list are: creation of the list, insertion of a node, deletion of a node, and display of the list.
//Circular Linked List in C
#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
struct node
{
    int data;
    struct node *next;
};
struct node *pHead;
void main()
{
    int ch,n,m,key,i,cou;
    pHead = NULL;
    while(1)
    {
        printf(" 1.CREATE A LIST\n");
        printf(" 2.INSERT A NUMBER AT BEGINNING\n");
        printf(" 3.INSERT A NUMBER AT LAST\n");
        printf(" 4.INSERT A NUMBER AFTER A PARTICULAR VALUE IN LIST\n");
        printf(" 5.DELETE A NODE IN THE LINKED LIST\n");
        printf(" 6.PRINT THE ELEMENTS IN THE LIST\n");
        printf(" 7.PRINT THE NUMBER OF ELEMENTS IN THE LIST\n");
        printf(" 8.GET OUT OF LINKED LIST\n");
        printf(" PLEASE, ENTER THE NUMBER:\n");
        scanf("%d",&ch);
        switch(ch)

{
    case 1: printf("enter no of items");
            scanf("%d",&n);
            for(i=0;i<n;i++)
            {
                printf("enter the element");
                scanf("%d",&m);
                create(m);
            }
            break;
    case 2: printf("enter the element");
            scanf("%d",&m);
            insert_begin(m);
            break;
    case 3: printf("enter the element");
            scanf("%d",&m);
            insert_last(m);
            break;
    case 4: printf("enter the element");
            scanf("%d",&m);
            printf("enter the value after which you want to insert");
            scanf("%d",&key);
            insert_after(m,key);
            break;
    case 5: printf("Enter the element to delete");
            scanf("%d",&m);
            delete(m);
            break;
    case 6: display();
            break;
    case 7: cou=count();
            printf("No. of elements in the list %d ", cou);
            break;
    case 8: exit(0);
            break;
    default: printf("wrong choice");
}
}
}
// Function to create circular list

void create(int data)
{
    struct node *pNew, *pTemp;
    pNew = (struct node *)malloc(sizeof(struct node));
    pNew->data = data;
    pNew->next = NULL;
    if (pHead == NULL)
    {
        pHead = pNew;
        pNew->next = pHead;
    }
    else
    {
        pTemp = pHead;
        while (pTemp->next != pHead)
            pTemp = pTemp->next;
        pTemp->next = pNew;
        pNew->next = pHead;
    }
    return;
}
/* ADD A NEW NODE AT BEGINNING */
void insert_begin(int val)
{
    struct node *pNew, *pTemp;
    pNew = (struct node *)malloc(sizeof(struct node));
    pNew->data = val;
    if (pHead == NULL)
    {   /* Adding to empty list */
        pHead = pNew;
        pHead->next = pNew;
        printf("Node inserted\n");
    }
    else
    {   /* Adding as first node */
        pTemp = pHead;
        while (pTemp->next != pHead)
            pTemp = pTemp->next;
        pTemp->next = pNew;
        pNew->next = pHead;
        pHead = pNew;
        printf("Node inserted\n");
    }
    return;
}

/* THIS FUNCTION ADDS A NODE AT THE LAST OF LINKED LIST */
void insert_last(int val)
{
    struct node *pNew, *pTemp;
    pNew = (struct node *)malloc(sizeof(struct node));
    pNew->data = val;
    if (pHead == NULL)
    {   /* Adding to empty list */
        pHead = pNew;
        pHead->next = pHead;
    }
    else
    {   /* Adding as last node */
        pTemp = pHead;
        while (pTemp->next != pHead)
            pTemp = pTemp->next;
        pTemp->next = pNew;
        pNew->next = pHead;
        printf("Node inserted\n");
    }
    return;
}
/* ADD A NEW NODE AFTER A SPECIFIED VALUE IN THE LIST */
void insert_after(int val, int key)
{
    struct node *pNew, *pTemp;
    /* Allocate memory for the new node */
    pNew = (struct node *)malloc(sizeof(struct node));
    pNew->data = val;
    if (pHead == NULL)
    {   /* Adding to empty list */
        pHead = pNew;
        pHead->next = pNew;
    }
    else
    {
        pTemp = pHead;
        /* Searching for the insertion point */
        do
        {
            if (pTemp->data == key)
            {
                pNew->next = pTemp->next;
                pTemp->next = pNew;
                printf("node inserted");
                return;

            }
            else
            {
                pTemp = pTemp->next;
            }
        } while (pTemp != pHead);
        printf("The insertion point can't be found\n");
    }
    return;
}
/* THIS FUNCTION DISPLAYS THE CONTENTS OF THE LINKED LIST */
void display( )
{
    struct node *pTemp;
    pTemp = pHead;
    if (pTemp == NULL)
    {
        printf("NO ELEMENT IN THE LIST :");
        return;
    }
    else
    {   /* traverse the entire linked list */
        do
        {
            printf("%d \n", pTemp->data);
            pTemp = pTemp->next;
        } while (pTemp != pHead);
    }
    return;
}
//THIS FUNCTION COUNTS THE NUMBER OF ELEMENTS IN THE LIST
int count()
{
    struct node *pTemp;
    int c = 1;
    pTemp = pHead;
    if (pHead == NULL)
        return 0;
    while (pTemp->next != pHead)
    {
        pTemp = pTemp->next;
        c++;
    }
    return (c);
}


/* THIS FUNCTION DELETES A NODE */
void delete(int key)
{
    struct node *pPre, *pTemp;
    if (pHead == NULL)              /* deletion from an empty list */
    {
        printf("Empty linked list\n");
    }
    else
    {
        pTemp = pHead;
        /* Searching for the deletion point */
        do
        {
            if (pTemp->data == key)
            {
                if (pTemp == pHead && pTemp->next == pHead)
                {   /* deleting the only node in the list */
                    free(pTemp);
                    pHead = NULL;
                    printf("node deleted\n");
                    return;
                }
                else if (pTemp == pHead && pTemp->next != pHead)
                {   /* deleting the head node */
                    while (pTemp->next != pHead)
                        pTemp = pTemp->next;
                    pTemp->next = pHead->next;
                    free(pHead);
                    pHead = pTemp->next;
                    printf("head node deleted\n");
                    return;
                }
                else
                {
                    pPre->next = pTemp->next;
                    free(pTemp);
                    printf("node deleted\n");
                    return;
                }
            }
            else
            {
                pPre = pTemp;
                pTemp = pTemp->next;
            }
        } while (pTemp != pHead);

        printf("The deletion point can't be found\n");
    }
    return;
}

Application of Circular Linked List: Josephus' Problem: This algorithm is named for a historian of the first century, Flavius Josephus, who survived the Jewish-Roman war due to his mathematical talents. Legend has it that he was one of 41 Jewish rebels trapped by the Romans. His companions preferred suicide to capture, so they decided to form a circle and kill every third person, proceeding around the circle until no one was left. Josephus was not keen on the idea of killing himself, so he calculated where he had to stand to survive the vicious circle. Algorithm: There are n people standing in a circle waiting to be executed. After the first man is executed, k-1 people are skipped and the k-th man is executed. Then again, k-1 people are skipped and the k-th man is executed. The elimination proceeds around the circle (which becomes smaller and smaller as the executed people are removed), until only the last man remains, who is given freedom. The task is to choose the place in the initial circle so that you survive (remain the last one), given n and k. Example: for skip value k = 4 and n = 8, the elimination order is 4, 8, 5, 2, 1, 3, 7, and person 6 survives.
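Besides simulating the circle with a circular linked list, as the program below does, the survivor's position can be computed directly with the standard Josephus recurrence J(1) = 0, J(n) = (J(n-1) + k) mod n (0-based); this compact cross-check is ours and is not part of the linked-list program:

```c
/* Returns the 1-based position of the survivor among n people
   when every k-th person is eliminated. */
int survivor(int n, int k)
{
    int i, j = 0;                 /* J(1) = 0 */
    for (i = 2; i <= n; i++)
        j = (j + k) % i;          /* J(i) = (J(i-1) + k) mod i */
    return j + 1;                 /* convert to a 1-based position */
}
```

For the example above, `survivor(8, 4)` gives 6, matching the hand trace; for the historical n = 41, k = 3 it gives position 31.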

Program to implement the Josephus Problem:
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
struct node
{
    int soilder;
    struct node *next;
};
struct node *head, *current;
int tot_soilders;
main()
{
    int ch,n;
    head = NULL;
    while(1)
    {
        printf("\n1. Soldier list creation");
        printf("\n2. Display soldier list");
        printf("\n3. Suicide");
        printf("\n0. Exit");
        scanf("%d",&ch);
        switch(ch)
        {
        case 1: printf("\nEnter the total no. of soldiers");
                scanf("%d",&n);
                create_list(n);
                break;
        case 2: display();
                getch();
                break;
        case 3: if (tot_soilders <= 1)
                    printf("There should be at least 2 soldiers in the list");
                else
                {
                    printf("\nEnter the number by which suicide is to be committed");
                    scanf("%d",&n);
                    if ( n < 1 )
                        printf("\nInvalid Number!");
                    else
                        printf("\nThe only soldier left after the suicide session is %d", left_after_sucide(n));
                }

                getch();
                break;
        case 0: return;
        default: printf("\nINVALID CHOICE");
                 getch();
        }
    }
}
create_list(int data)
{
    struct node *last,*New;
    int i;
    head=NULL;
    for(i=1;i<=data;i++)
    {
        New=(struct node *)malloc(sizeof(struct node));
        New->soilder=i;
        New->next=NULL;
        tot_soilders=tot_soilders+1;
        if(head==NULL)
        {
            head=New;
            last=New;
            New->next=head;
        }
        else
        {
            New->next=last->next;
            last->next=New;
            last = New;
        }
    }
}
display()
{
    if (head == NULL)
    {
        printf("\nNo Soldiers in the Queue");
        return;
    }
    printf("%d ",head->soilder);
    current = head->next;
    while( current != head)
    {
        printf("%d ",current->soilder);

        current = current->next;
    }
    return;
}
int left_after_sucide(int by_n)
{
    int i=1,j,dead_sol;
    struct node *save;
    current = head;
    for(i=1; i<tot_soilders; i++)
    {
        for (j=1;j< by_n;j++)
        {
            save = current;
            current = current->next;
        }
        save->next = current->next;
        if (current == head)
        {
            head = current->next;
        }
        dead_sol = current->soilder;
        free(current);
        display();
        printf("\n\n%d is Dead \n",dead_sol);
        current = save->next;
    }
    head = current;
    display();
    tot_soilders = 1;
    return(head->soilder);
}

Doubly Linked Lists A Doubly Linked List (DLL) is a type of linked list in which each data item points to the next and also to the previous data item. This "linking" is accomplished by keeping two address variables (pointers) together with each data item. These pointers store the address of the previous and the address of the next data items in the list. As with singly linked lists, the structure used to store one element of the list is called a node. A node has three parts: data, previous address and next address. The data part contains the necessary information about the items of the list, the previous address part contains the address of the previous node, and the next address part contains the address of the next node.

Basic double linked list fragment:


#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
struct node
{
    int no;
    struct node *pre,*next;
};
struct node *head=NULL,*tail=NULL;
void main()
{
    int ch;
    clrscr();
    do
    {
        printf("\n\n\n 1 Append");

57 printf("\n 2 Forward Traverse"); printf("\n 3 Insert after the given number"); printf("\n 4 Insert before the given number"); printf("\n 5 Insert By Position "); printf("\n 6 Delete By Value "); printf("\n 7 Delete by Position"); printf("\n 0 EXIT"); printf("\n Enter your choice"); scanf("%d",&ch); switch(ch) { case 1: Append(); break; case 2: ftraverse(); break; case 3: insertafter(); break; case 4: insertbefore(); break; case 5: insertbypos(); break; case 6: delByValue(); break; case 7: delByPos(); break; case 0: break; default: printf("\n Invalid Choice"); } }while(ch!=0); getch(); } // Appending a new node void Append() { struct node *pNew;

58 pNew =(struct node*)malloc(sizeof(struct node)); printf("\nEnter no"); scanf("%d",& pNew ->no); pNew ->next=NULL; if(head==NULL) { head= pNew; tail= pNew; head->pre=NULL; return; } tail->next= pNew; pNew ->pre=tail; tail= pNew; } //Displaying the contents of the list void ftraverse() { struct node *p=head; if(head==NULL) { printf("\n Empty Link List"); return; } while(p!=NULL) { printf(" %d",p->no); p=p->next; } } // Inserting a node after a specified number in the node // Insert by number void insertafter() { int val; struct node *p =head,* pNew; if(head==NULL) { printf("\nEmpty LINK lIST"); return; } printf("\n Enter number after which you want to INSERT"); scanf("%d",&val); if(val==tail->no)

59 { pNew =(struct node*)malloc(sizeof(struct node)); printf("\nENTER number"); scanf("%d",& pNew ->no); tail->next= pNew; pNew ->pre=tail; tail= pNew; tail->next=NULL; printf("\n NODE INSERTED"); return; } while(p->next!=NULL) { if(val==p->no) { pNew =(struct node*)malloc(sizeof(struct node)); printf("\nEnter number"); scanf("%d",& pNew ->no); pNew ->next=p->next; pNew ->pre=p; p->next= pNew; pNew ->next->pre= pNew; printf("\n Node Inserted"); return; } p=p->next; } printf("\n%d Number Not found",val); return; }

// Insert a node before a specified number void insertbefore() { int val; struct node *p =head,* pNew; if(head==NULL) { printf("\nEmpty LINK lIST"); return; } printf("\n Enter number BEFORE which you want to INSERT"); scanf("%d",&val); if(val==head->no)

{
    pNew = (struct node*)malloc(sizeof(struct node));
    printf("\nENTER number");
    scanf("%d",&pNew->no);
    head->pre=pNew;
    pNew->next=head;
    head=pNew;
    head->pre=NULL;
    printf("\n NODE INSERTED");
    return;
}
while(p!=NULL)
{
    if(val==p->no)
    {
        pNew = (struct node*)malloc(sizeof(struct node));
        printf("\nENTER number");
        scanf("%d",&pNew->no);
        pNew->pre=p->pre;
        pNew->next=p;
        p->pre->next=pNew;
        p->pre=pNew;
        printf("\n Node inserted");
        return;
    }
    p=p->next;
}
printf("\n%d Number not found",val);
return;
}

// Insert a new node after a specified position in the list void insertbypos() { int val; struct node *p =head,* pNew; int non=0,i,pos; if(head==NULL) { printf("\nEmpty LINK lIST"); return; }

printf("\nEnter the position");

scanf("%d",&pos);
while(p!=NULL)
{
    non++;
    p=p->next;
}
if (pos<1 || pos>non+1)
{
    printf("\n Invalid Position");
    return;
}
if(pos==1)
{
    pNew = (struct node*)malloc(sizeof(struct node));
    printf("\nENTER number");
    scanf("%d",&pNew->no);
    head->pre=pNew;
    pNew->next=head;
    head=pNew;
    head->pre=NULL;
    printf("\n Node inserted");
    return;
}
if(pos==non+1)
{
    pNew = (struct node*)malloc(sizeof(struct node));
    printf("\nEnter number");
    scanf("%d",&pNew->no);
    tail->next=pNew;
    pNew->pre=tail;
    tail=pNew;
    tail->next=NULL;
    printf("\n Node inserted");
    return;
}
p=head;
for(i=1;i<pos-1;i++)
{
    p=p->next;
}
pNew = (struct node*)malloc(sizeof(struct node));
printf("\nENTER number");
scanf("%d",&pNew->no);
pNew->next=p->next;

pNew->pre=p;
p->next=pNew;
pNew->next->pre=pNew;
printf("\n NODE Inserted");
return;
}

// Delete a node with specified number in the node void delByValue() { int val; struct node *p =head; int i,pos; if(head==NULL) { printf("\nEmpty LINK lIST"); return; }

printf("\nEnter the number to be deleted"); scanf("%d",&val); if(head==tail&&val==head->no) { free(p); head=tail=NULL; printf("\nNode Deleted"); return; } else if (val==head->no) { head=head->next; head->pre=NULL; free(p); printf("\nNode Deleted"); return; } else if(val==tail->no) { p=tail; tail=tail->pre; tail->next=NULL; free(p); printf("\nNode Deleted"); return;

}
while(p->next!=NULL)
{
    if(val==p->no)
    {
        p->pre->next=p->next;
        p->next->pre=p->pre;
        free(p);
        printf("\nNode Deleted");
        return;
    }
    p=p->next;
}
printf("\n Node not Found");
return;
}
// Delete a node by position
void delByPos()
{
    struct node *p=head;
    int non=0,i,pos;
    if(head==NULL)
    {
        printf("\nEmpty LINK lIST");
        return;
    }
    printf("\nEnter the POSITION which is to be deleted");
    scanf("%d",&pos);
    while(p!=NULL)
    {
        non++;
        p=p->next;
    }
    p=head;
    if(pos<1 || pos>non)
    {
        printf("\n Invalid Position");
        return;
    }
    if(head==tail && pos==1)
    {
        free(p);
        head=tail=NULL;
        printf("\nNODE Deleted");
        return;

}
else if (pos==1)
{
    head=head->next;
    head->pre=NULL;
    free(p);
    printf("\nNode Deleted");
    return;
}
else if(pos==non)
{
    p=tail;
    tail=tail->pre;
    tail->next=NULL;
    free(p);
    printf("\nNode Deleted");
    return;
}
for(i=0;i<pos-1;i++)
{
    p=p->next;
}
p->pre->next=p->next;
p->next->pre=p->pre;
free(p);
printf("\nNode Deleted");
return;
}
Application: Palindrome checking using doubly linked list
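A sketch of the palindrome application: keep both head and tail pointers and walk inward from each end, comparing values, until the two pointers meet or cross. The structure and helper names below are our own, not part of the program above:

```c
#include <stdlib.h>

struct dnode {
    int no;
    struct dnode *pre, *next;
};

/* Append at the tail; returns the (possibly new) head. */
struct dnode *dappend(struct dnode **tail, struct dnode *head, int val)
{
    struct dnode *p = (struct dnode *)malloc(sizeof(struct dnode));
    p->no = val;
    p->next = NULL;
    p->pre = *tail;
    if (*tail)
        (*tail)->next = p;
    *tail = p;
    return head ? head : p;
}

/* 1 if the sequence reads the same forwards and backwards. */
int is_palindrome(struct dnode *head, struct dnode *tail)
{
    while (head != tail && head->pre != tail) {
        if (head->no != tail->no)
            return 0;
        head = head->next;       /* move inward from the front */
        tail = tail->pre;        /* ...and from the back */
    }
    return 1;
}

/* Self-checks: 1 2 1 is a palindrome, 1 2 3 is not. */
int demo_pal(void)
{
    struct dnode *h = NULL, *t = NULL;
    h = dappend(&t, h, 1);
    h = dappend(&t, h, 2);
    h = dappend(&t, h, 1);
    return is_palindrome(h, t);
}

int demo_nonpal(void)
{
    struct dnode *h = NULL, *t = NULL;
    h = dappend(&t, h, 1);
    h = dappend(&t, h, 2);
    h = dappend(&t, h, 3);
    return is_palindrome(h, t);
}
```

The backward `pre` pointers are what make this O(n) with O(1) extra space; with a singly linked list one would instead have to reverse half the list or use a stack.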

Linked List problems: 1. Implement stack using linked list 2. Implement queue using linked list
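For problem 1, pushing and popping at the head of a singly linked list gives O(1) stack operations; a minimal sketch (the function names and the -1 empty-stack sentinel are our own choices):

```c
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

/* Push: insert at the head, which acts as the top of the stack. */
void push(struct node **top, int val)
{
    struct node *p = (struct node *)malloc(sizeof(struct node));
    p->data = val;
    p->next = *top;
    *top = p;
}

/* Pop: remove the head node and return its value (-1 if empty). */
int pop(struct node **top)
{
    struct node *p = *top;
    int val;
    if (p == NULL)
        return -1;               /* sentinel for an empty stack */
    val = p->data;
    *top = p->next;
    free(p);
    return val;
}

/* Self-check: push 1 then 2; the first pop must return 2 (LIFO). */
int demo_stack(void)
{
    struct node *top = NULL;
    push(&top, 1);
    push(&top, 2);
    return pop(&top);
}

int pop_empty(void)
{
    struct node *top = NULL;
    return pop(&top);
}
```

A queue (problem 2) follows the same pattern but keeps a second pointer to the tail, so that insertion at the rear and removal at the front are both O(1).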


TREES
Tree definitions A tree is a multi-linked data structure consisting of nodes, each with pointers to zero or more other nodes. The connecting pointers between nodes are typically called tree edges. A tree has a distinguished node called its root, to which no other node points. Recursively, a tree can be defined as either empty, or a root node with edges to zero or more subtrees.

Tree views There are a number of ways to depict trees in graphic form. The nodes are represented as labeled circles and edges are connecting lines; e.g.,

The linked list implementation of a tree is shown below

Array implementation of a tree: the nodes a to g are stored in consecutive slots A[0] to A[6]. For a complete binary tree stored this way, the children of the node at A[i] are found at A[2i+1] and A[2i+2], and its parent is at A[(i-1)/2].

Terminology 1. Root: The topmost node of the tree is the root; equivalently, a node that doesn't have a parent is a root node. 2. Leaf: A node that doesn't have children is called a leaf node. 3. Siblings: Nodes with a common parent are called siblings. 4. An upper node in the tree is the parent of the nodes immediately below it, and those lower nodes are its children. 5. A node is the grandparent of nodes two levels below, etc. 6. Path: A path is a sequence of nodes connected by edges. 7. Height: The height of a node is the length of the longest path from the node to a leaf. 8. Depth: The depth of a node is the length of the path from the root to that node. This terminology is illustrated in Figure 1.

Figure 1: Tree terminology. The following are important tree facts. 1. Every node except the root has exactly one parent. 2. The root has no parent. 3. There is a unique path from the root to any node. In the tree examples above, each node had two children; such trees are called binary. In general, a tree node may have any number of children, from 0 to n; such trees are called n-ary. Figure 2 is an example of an n-ary tree,


Figure 2: Example n-ary tree.

Examples of tree-structured hierarchies: 1. Directory Hierarchies: In computers, files are stored in directories that form a tree. The top level directory represents the root. It has many subdirectories and files. The subdirectories would have further set of subdirectories. 2. Organization charts: In a company a number of vice presidents report to a president. Each VP would have a set of general managers, each GM having his/her own set of specific managers and so on. 3. Biological classifications: Starting from living being at the root, such a tree can branch off to mammals, birds, marine life etc. 4. Game Trees: All games which require only mental effort would always have number of possible options at any position of the game. For each position, there would be number of counter moves. The repetitive pattern results in what is known a game tree.

Binary Trees Definition: A binary tree is a tree in which each node can have maximum two children. Thus each node can have no child, one child or two children. The pointers help us to identify whether it is a left child or a right child. Examples of binary trees:

The following are NOT binary trees:


The level of a node in a binary tree: The root of the tree has level 0 The level of any other node in the tree is one more than the level of its parent.

The maximum number of nodes at each level is: Level 0: 1 node; Level 1: 2 nodes; Level 2: 4 nodes; Level 3: 8 nodes (in general, level i can hold at most 2^i nodes).


Total number of nodes for this tree = 15. Height of the root node = 3. n = 2^(3+1) - 1 = 15. In general, the maximum number of nodes in a tree with height h is n = 2^(h+1) - 1. Conversely, a full tree with n nodes has height h = log2(n+1) - 1. Implementation A binary tree has a natural implementation in linked storage. A tree is referenced with a pointer to its root. Recursive definition of a binary tree: A binary tree is either - empty, or - a node (called the root) together with two binary trees (called the left subtree and the right subtree of the root). Each node of a binary tree has both left and right subtrees, which can be reached with pointers. Structure of a binary tree:
struct tree_node
{
    int data;
    struct tree_node *left_child;
    struct tree_node *right_child;
};
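The height relation above can be checked in code. A small sketch using the tree_node structure just defined; the recursive `height` follows the definition of height (longest path to a leaf), and the `mknode` constructor is our own convenience helper:

```c
#include <stdlib.h>

struct tree_node {
    int data;
    struct tree_node *left_child;
    struct tree_node *right_child;
};

/* Height of a tree: -1 for the empty tree, so a single node has height 0. */
int height(struct tree_node *p)
{
    int hl, hr;
    if (p == NULL)
        return -1;
    hl = height(p->left_child);
    hr = height(p->right_child);
    return 1 + (hl > hr ? hl : hr);   /* one edge plus the taller subtree */
}

/* Convenience constructor for building small test trees. */
struct tree_node *mknode(int v, struct tree_node *l, struct tree_node *r)
{
    struct tree_node *p = (struct tree_node *)malloc(sizeof(struct tree_node));
    p->data = v;
    p->left_child = l;
    p->right_child = r;
    return p;
}

/* A root whose left subtree has height 1 gives overall height 2. */
int demo_height(void)
{
    return height(mknode(1,
                         mknode(2, mknode(3, NULL, NULL), NULL),
                         mknode(4, NULL, NULL)));
}
```

Note the base case of -1: it makes a leaf come out at height 0, consistent with the path-length definition of height used above.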

Tree traversal: Tree traversal is the process of visiting each node in the tree exactly once. Types of tree traversal: ( Depth first tree traversing) 1. Preoder traversal: ( Root Visit the root node. Left Right )

Traverse the left subtree. Traverse the right subtree. 2. Inorder traversal: ( Left Root Right ) Traverse the left subtree. Visit the root node. Traverse the right subtree. 3. Postorder traversal: ( Left Right Root ) Traverse the left subtree. Traverse the right subtree. Visit the root node. 4. Level order traversal: (Breadth First Tree Traversal) Traverse the nodes level by level, from level 0 to level n. E.g. suppose there are only 3 nodes in the tree, arranged as follows:

In order: n2 n1 n3 Pre order: n1 n2 n3 Post order: n2 n3 n1

Algorithm for Preorder traversal of a binary tree: If the root node of the tree is null, the traversal is done. Otherwise, visit the root node. Then recursively traverse the left subtree. i-e If there is a left child we visit the left subtree (all the nodes) in pre-order fashion starting with that left child . Then recursively traverse the right subtree. i-e If there is a right child then we visit the right subtree in pre-order fashion starting with that right child.

void preorder(struct tree_node *p)
{
    if (p != NULL)
    {
        printf("%d\n", p->data);
        preorder(p->left_child);

        preorder(p->right_child);
    }
}
Example:

Preorder Traversal : a b c d f g e a(root) b(left) c d f g e (right) In-order traversal of a binary tree If the root node of the tree is null, the traversal is done. Traverse the left subtree recursively. i-e If there is a left child we visit the left subtree (all the nodes) in in-order fashion. Visit the root node. Then recursively traverse the right subtree. i-e If there is a right child then we visit the right subtree in in-order fashion.

void inorder(struct tree_node *p) { if (p !=NULL) { inorder(p->left_child); printf(%d\n, p->data); inorder(p->right_child); } }

Inorder: b a f d g c e - b (left), a (root), f d g c e (right). Algorithm for Postorder traversal of a binary tree: If the root node of the tree is null, the traversal is done. Recursively traverse the left subtree, i.e. if there is a left child, visit the left subtree (all its nodes) in post-order fashion starting with that left child. Then recursively traverse the right subtree, i.e. if there is a right child, visit the right subtree in post-order fashion starting with that right child. At last visit the root node.

void postorder(struct tree_node *p)
{
    if (p != NULL)
    {
        postorder(p->left_child);
        postorder(p->right_child);
        printf("%d\n", p->data);
    }
}
Example:

Post order : b f g d e c a b (left) f g d e c (Right) a ( Root ) Level order: abcdefg a (level 0) b c (level 1) d e (level 2) f g (level 3)
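Unlike the three depth-first orders, level order is usually implemented with a queue instead of recursion: dequeue a node, visit it, and enqueue its children. A sketch using an array as the queue (the bound QMAX and the helper names are our own assumptions):

```c
#include <stdlib.h>

struct tree_node {
    int data;
    struct tree_node *left_child;
    struct tree_node *right_child;
};

#define QMAX 100   /* assumed upper bound on nodes in the queue */

/* Visit nodes level by level; stores the visited values into out[]
   and returns how many nodes were visited. */
int levelorder(struct tree_node *root, int out[])
{
    struct tree_node *queue[QMAX];
    int front = 0, rear = 0, n = 0;
    if (root == NULL)
        return 0;
    queue[rear++] = root;                        /* enqueue the root */
    while (front < rear) {
        struct tree_node *p = queue[front++];    /* dequeue */
        out[n++] = p->data;                      /* visit */
        if (p->left_child)  queue[rear++] = p->left_child;
        if (p->right_child) queue[rear++] = p->right_child;
    }
    return n;
}

/* Helpers for a small self-check: root 1 with children 2 and 3. */
struct tree_node *mk(int v, struct tree_node *l, struct tree_node *r)
{
    struct tree_node *p = (struct tree_node *)malloc(sizeof(struct tree_node));
    p->data = v;
    p->left_child = l;
    p->right_child = r;
    return p;
}

int demo_level(int idx)
{
    int out[QMAX];
    struct tree_node *root = mk(1, mk(2, NULL, NULL), mk(3, NULL, NULL));
    levelorder(root, out);
    return out[idx];
}
```

Because a queue is first-in first-out, all nodes of level i are visited before any node of level i+1, which is exactly the breadth-first order described above.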

Finding sum of the values of all the nodes of a tree: To find the sum, add to the value of the current node the sum of the values of all nodes of the left subtree and the sum of the values of all nodes of the right subtree.

int sum(struct tree_node *p) {
    if (p != NULL)
        return (p->data + sum(p->left_child) + sum(p->right_child));
    else
        return 0;
}

Expression Trees

Expression tree for (a + b * c) + ((d * e + f) * g): The above figure shows an example of an expression tree. The leaves of an expression tree are operands, such as constants or variable names, and the other nodes contain operators. This particular tree is a binary tree, because all of the operations are binary, and although this is the simplest case, it is possible for nodes to have more than two children. It is also possible for a node to have only one child, as is the case with the unary minus operator. We can evaluate an expression tree, T, by applying the operator at the root to the values obtained by recursively evaluating the left and right subtrees. In our example, the left subtree evaluates to a + (b * c) and the right subtree evaluates to ((d * e) + f) * g. The entire tree therefore represents (a + (b * c)) + (((d * e) + f) * g).

Constructing an Expression Tree


Algorithm for converting a postfix expression into an expression tree: Read the expression one symbol at a time. If the symbol is an operand, create a one-node tree and push a pointer to it onto a stack. If the symbol is an operator, pop pointers to two trees T1 and T2 from the stack (T1 is popped first). Form a new tree whose root is the operator and whose left and right children point to T2 and T1 respectively. A pointer to this new tree is then pushed onto the stack. For example, suppose the input is a b + c d e + * *


1. The first two symbols are operands, so we create one-node trees and push pointers to them onto a stack.* *For convenience, we will have the stack grow from left to right in the diagrams.

2. Next, a '+' is read, so two pointers to trees are popped, a new tree is formed, and a pointer to it is pushed onto the stack.*

3. Next, c, d, and e are read, and for each a one-node tree is created and a pointer to the corresponding tree is pushed onto the stack.

4. Now a '+' is read, so two trees are merged.

Continuing, a '*' is read, so we pop two tree pointers and form a new tree with a '*' as root.


Finally, the last symbol is read, two trees are merged, and a pointer to the final tree is left on the stack.


Binary search trees


A binary search tree (BST) or ordered binary tree is a node-based binary tree data structure which has the following properties:

The left subtree of a node contains only nodes with keys less than the node's key. The right subtree of a node contains only nodes with keys greater than the node's key. Both the left and right subtrees must also be binary search trees. Left subtree < Root < Right subtree

E.g. a binary search tree. The most appealing property of a binary search tree is that searching for a particular node can be performed using the following algorithm:

If the value we're searching for is at the root, success. Otherwise, if the value is less than the root, search the left subtree. Otherwise, search the right subtree.

If the tree is reasonably well balanced, this algorithm can operate in logarithmic time. This is the case because at each step in the search, half of the tree is eliminated from consideration. This is the same kind of logarithmic behavior we saw in the binary search of a linear list.

Tree balancing: The logarithmic search behavior on a binary search tree only happens when the tree is reasonably well balanced. "Reasonably well balanced" means that about half of the nodes (N/2) are in each half of the tree. The binary search tree property by itself does not guarantee a balanced tree. For example, the figure below shows degenerate "left heavy" and "right heavy" binary search trees.


In both of these trees, search time is O(N), not O(log N). This is because the "eliminate half the tree" step in the search algorithm ends up eliminating nothing, so in the worst case all N nodes of the tree must be searched to find the value at a leaf node. The key to keeping a binary search tree useful is to maintain balance. This can be done by a global balancing algorithm that takes any binary search tree and redistributes all of its nodes into a maximally well balanced tree. The problem with global balancing is that it is as expensive timewise as a complete sort. A more sensible approach is to maintain balance incrementally, not allowing the tree ever to get too far out of balance. Each time a node is inserted into or removed from the tree, the tree balance is adjusted.

Traversal of a Binary Search Tree: The above BST can be traversed starting with the root node (Preorder traversal) to result in the sequence 52, 32, 25, 10, 28, 75, 68, 63, 58, 70, 96. However, an interesting feature is revealed with the Inorder Traversal, which yields 10, 25, 28, 32, 52, 58, 63, 68, 70, 75, 96. This is nothing but an ordered listing of the values of the BST nodes in increasing order, with the leftmost node being the smallest element and the rightmost node being the largest element.

Inserting a node in a BST: Insertion of a new node in a BST has to be done only at the appropriate place for it, so that the overall BST structure is still maintained, i.e. the value at any node should be greater than the value at its left child and less than the value at its right child. Inserting a new node into a BST always occurs at a NULL pointer. There is never a case when existing nodes need to be rearranged to accommodate the new node. As an example, consider inserting the new value 43 into the BST shown above.


Hint: Search for the node containing 43. Obviously you won't find it, but the search algorithm has taken you to the NULL pointer where it should be placed, which is the right pointer of 32.

Inserting a new value in a BST: The following function inserts a value d in its proper place in a Binary Search Tree (with distinct values in the nodes, no repeated values). It recursively searches for the proper place to insert the new value and returns a pointer to the subtree containing the new node, so the new node gets automatically linked to its parent node.

//Inserts a node with value d in a tree with root p
struct treenode {
    int data;
    struct treenode *left;
    struct treenode *right;
};

struct treenode* insert(int d, struct treenode *p) {
    //Inserting the value in a new node
    if (p == NULL) {
        p = (struct treenode*) malloc(sizeof(struct treenode));
        p->data = d;
        p->left = NULL;
        p->right = NULL;
    }
    //Inserting a value less than the root value, move left
    else if (d < p->data)
        p->left = insert(d, p->left);
    //Inserting a value greater than the root value, move right
    else if (d > p->data)
        p->right = insert(d, p->right);
    return p;
}

Creating a Binary Search Tree: To create a binary search tree, keep on inserting the nodes as and when they arrive. Note that the shape of the BST will depend on the order of insertion of the nodes. The earlier BST tree was created from the sequence 52, 32, 25, 75, 68, 96, 63, 10, 70, 28, 58. If the values arrived in the following sequence: 10, 25, 28, 32, 52, 58, 63, 68, 70, 75, 96 the BST would take the following shape

Finding Minimum value in a tree: To perform FindMin, start at the root and traverse the left subtree until a node with no left child is reached.

Recursive implementation of find_min for binary search trees:

struct treenode* find_min(struct treenode *T) {
    if (T == NULL)
        return NULL;
    else if (T->left == NULL)
        return T;
    else
        return find_min(T->left);
}

This function returns the tree node with the minimum value.

Nonrecursive implementation of find_min for binary search trees:

struct treenode* find_min(struct treenode *T) {
    if (T != NULL)
        while (T->left != NULL)
            T = T->left;
    return T;
}

Finding Maximum value in a tree: To perform FindMax, start at the root and traverse the right subtree until a node with no right child is reached.

// This function returns the tree node with the maximum value
struct treenode* find_max(struct treenode *T) {
    if (T == NULL)
        return NULL;
    else if (T->right == NULL)
        return T;
    else
        return find_max(T->right);
}

// Nonrecursive implementation of find_max for binary search trees:
struct treenode* find_max(struct treenode *T) {
    if (T != NULL)
        while (T->right != NULL)
            T = T->right;
    return T;
}

Searching for a target in the Binary Search Tree: The binary search tree definition allows you to quickly search for a particular value in the BST. Check the given value against the value in the root node. If it matches, return p; else if the given value is smaller than the root value, look into the left subtree, else look into the right subtree. If the subtree is null, return NULL.

struct treenode *treeSearch(struct treenode *p, int target) {
    if (p != NULL) {
        if (p->data == target)
            return p;
        else if (p->data > target)
            return treeSearch(p->left, target);
        else
            return treeSearch(p->right, target);
    }
    return NULL;
}

Apply this function to the binary search tree shown earlier to locate 63. It requires searching just four nodes to locate the value. If the tree is balanced, the time complexity of searching is O(log n), where n is the number of nodes in the tree. In the worst case, when the tree is skewed in one direction, it is O(n).

Creating a balanced tree from sorted data: The above tree is not a balanced tree but is skewed towards the right, as all the elements were inserted in perfectly sorted order. In such a case, the search and insertion complexity is O(n) instead of O(log n). If the sequence were entered in descending order, it would result in a left-skewed tree. For any other ordering of the sequence the complexity lies between O(n) and O(log n). To generate a balanced BST from an arbitrary sequence of values, store all the elements in an array and sort them in ascending order. Once sorted, the element at the midpoint of the array is chosen as the root of the BST. The array can now be viewed as consisting of two subarrays, one to the left of the midpoint and one to the right of the midpoint. The middle element in the left subarray becomes the left child of the root node and the middle element in the right subarray becomes the right child of the root.

This process continues with further subdivision of the original array until all the elements in the array have been positioned in the BST. We have to take care to completely generate the left subtree of the root before generating the right subtree of the root. If this is done, a simple recursive procedure can be used to generate a balanced BST.

void balance(int sequence[], int first, int last) {
    int mid;
    if (first <= last) {
        mid = (first + last) / 2;
        p = insert(sequence[mid], p);   /* p is the pointer to the tree root */
        balance(sequence, first, mid - 1);
        balance(sequence, mid + 1, last);
    }
}

Deleting a node from a Binary Search Tree:


Deletion of a node is not as straightforward as insertion. It depends on which particular node is being deleted. In fact, there are three separate cases, and each case needs to be handled somewhat differently. The cases are: (1) deletion of a leaf node, (2) deletion of an internal node with a single child (either a left or a right subtree), (3) deletion of an internal node with two children (having both a left subtree and a right subtree). We'll examine each case separately: a. Deletion of a leaf node: Since a leaf node has empty left and right subtrees, deleting a leaf node yields a tree with one less node which remains a BST. This is illustrated below:

b. Deletion of a Node with one child: When the node gets deleted, the parent of the node must point to the node's left child or right child. The parent's reference to the node is reset to refer to the deleted node's child. This has the effect of lifting the deleted node's child up by one level in the tree. An example is shown below.

BST after deletion

It makes no difference if the node to be deleted has only a left or a right child.

The previous example illustrated the case when the only child was a right child. The next example illustrates the case when the only child is a left child.

c. Deletion of a Node with two child nodes: There is no one-step operation that can be performed, since the parent's right or left reference cannot refer to both children at the same time. A deleted node with two children must be replaced by one of: the largest value in the deleted node's left subtree, or the smallest value in the deleted node's right subtree. This means that we need to be able to find either the immediate predecessor or the immediate successor of the node being deleted and replace the deleted node with that value. The general case of deletion of a node having two child nodes: In the following tree, we want to delete the node q. The node q has T1 as left subtree and

T2 as right subtree. Note that all nodes of T1 are going to be smaller than the nodes of T2. Note further that the rightmost node of T1 will have the largest value in that subtree; it will be the immediate predecessor of node q. All nodes in T2 will be successors of this node.

In order traversal: C D E G J K L N P R T U W Y

Now let us say we want to delete node N. Copy the immediate predecessor (the largest element from its left subtree); it would be L, the rightmost node of the left subtree. All elements in the left subtree are going to be smaller than this node, and all elements in the right subtree are going to be greater than it. L can simply replace N while keeping the structure of the BST undisturbed. So copy the node L into the node containing N.


The node N can also be removed by replacing it with its immediate successor, the smallest node (leftmost node) of the right subtree, and making the appropriate links. In this case node P becomes the right child of C.

The subtree G becomes the left child of P and the subtree T becomes the right child of P. In order traversal: C D E G J K L P R T U W Y.


Example 1: As an example, consider the following BST and suppose that we are deleting the value 18 from this tree.

Find the immediate successor of 18, which is the leftmost node in its right subtree, 25, and put this value into the place currently occupied by 18. Then delete the old copy of the node 25. The final tree is shown below.


Example 2: Delete node 2. Node 2 is replaced with the minimum value in its right subtree, i.e. node 3. Node 3 is then deleted from its current position.

Code to delete a node from a BST containing integer values. It returns the tree after deleting the node with value x:

struct treenode *delete_tree(int x, struct treenode *T) {
    struct treenode *temp, *child;
    if (T == NULL)                       /* Deleting from an empty subtree */
        printf("Element not found");
    else if (x < T->data)                /* Go left */
        T->left = delete_tree(x, T->left);
    else if (x > T->data)                /* Go right */
        T->right = delete_tree(x, T->right);
    else {                               /* Found element to be deleted */
        if (T->left && T->right) {       /* Two children */
            /* Replace with smallest in right subtree */
            temp = find_min(T->right);
            T->data = temp->data;
            T->right = delete_tree(T->data, T->right);
        }
        else {                           /* One child (or none) */
            temp = T;
            if (T->left == NULL)         /* Only a right child */
                child = T->right;
            else                         /* Only a left child */
                child = T->left;
            free(temp);
            return child;
        }
    }
    return T;
}


Height Balanced Trees (AVL Trees)


If a BST is balanced, one can do insertion, search and deletion in O(log n) time. But if it is skewed, these operations could take O(n) time. Thus when n is large, it is always of interest to keep the tree as balanced as possible. One of the first tree-balancing algorithms was suggested by two Russian scientists, Adelson-Velskii and Landis. Such balanced trees are known by their initials as AVL trees. Definition: (1) A node of a tree is balanced if the heights of its children differ by at most one. (2) An AVL tree is a binary search tree in which all nodes are balanced. Figure 1 shows two binary search trees; the one on the left is AVL, the one on the right is not.

Figure 1: AVL and non-AVL trees. o To maintain the AVL property, the tree must be rebalanced, if necessary, every time a node is inserted or deleted. o The key to the rebalancing is a rotation operation, that grabs an unbalanced tree by its proper root node and redistributes its out-of-balance children. o For example, suppose we have the following AVL tree to which we add a node with value c.

o The steps to add the node then rotate the tree into balance are the following:

The height of a node is the length of the longest path from that node down to a leaf. In the following tree the heights of the nodes are indicated below: Nodes 40, 75, 90: height 0. Nodes 60, 80: height 1. Node 70: height 2. A new term is introduced now, called the balance factor. The Balance Factor (B.F.) of any node of a binary search tree is the difference between the height of the right child subtree (HR) and the height of the left child subtree (HL): B.F. = HR - HL. A height-balanced tree (an AVL tree) is one which is either empty, or in which each node of the tree has a balance factor of 0, -1 or 1.

If the BF for any node is 2 or -2, we say that the tree is unbalanced. The tree can be balanced by appropriate rotation of subtrees around a particular node. If the BF of all nodes remains 0, -1 or 1 after a new node is inserted in the tree, nothing needs to be done. However, if the tree gets unbalanced, we have to carry out a rotation of subtrees. The rotation will depend on where the new node was inserted in the tree. After a node is inserted, we have to consider 4 separate cases of unbalancing:
Case 1: An insertion into the left subtree of the left child, LL (single rotation right for balancing)
Case 2: An insertion into the right subtree of the left child, RL (double rotation for balancing)
Case 3: An insertion into the left subtree of the right child, LR (double rotation for balancing)
Case 4: An insertion into the right subtree of the right child, RR (single rotation left for balancing)
1. Case LL (Node added to Left of Left): Here we consider the case where the tree gets unbalanced after a node is added to the left subtree of the left child. The node with BF 2 or -2 must be brought down to a lower level. General case: consider the general LL case, where k1 and k2 are nodes, and X, Y, Z are subtrees. Say insertion of a node in X causes the tree to become unbalanced. The tree can be balanced by a right rotation, moving k2 down and moving k1 up. The subtrees are reattached in the corresponding positions.


(i) Insert at left of subtree makes tree unbalance (ii) After Single rotation right

Example 1: Consider the following tree. It is an AVL tree, as each node has BF of 0, -1, or 1.

Let us now insert a new node 8 in this tree, which will be in the left subtree of the left child (25) of the root (56).

Balance factor BF for 12 and 25 is -1, but for 56 the BF is -2. So the tree is unbalanced. The balance has been disturbed by insertion of a node in the left subtree of the left child 25. A right rotation of the tree around the node 56, causes node 25 to move up and 56 to move down the tree. To maintain the Binary search tree property, the right subtree of 25 needs to be attached to node 56. The tree is now balanced and is shown below.

Example 2:

AVL property destroyed by an insertion, then fixed by a rotation

2. Case RR (Node added to Right of Right): General case: consider the general RR case, where k1 and k2 are nodes, and X, Y, Z are subtrees. Say insertion of a node in Z causes the tree to become unbalanced. The tree can be balanced by a left rotation, moving k1 down and moving k2 up. The subtrees are reattached in the corresponding positions.

(i) Insert at right of right subtree makes tree unbalance (ii) After Single rotation left

This case is similar to LL case, except that the tree gets unbalanced when a node is added to the Right subtree of the Right child of the root. Consider the following tree which became unbalanced after insertion of node 98. Node 98 was inserted in the right subtree of the right child 80.

The tree can be balanced by a left rotation which causes node 60 to move down and node 80 to move up, as shown


3. Case RL (node added to Right subtree of Left child): Let us now consider the general case, of a tree with nodes k1, k2 and k3 with 4 sub trees A, B, C, D. The tree gets unbalanced after insertion of a node in B or C . It needs a double rotation to balance the tree. Rotate k1 to the right and then k3 to the left

Example 1: Inserting 45 to the tree makes the tree unbalanced at node 56.

This is a case where the tree gets unbalanced after a node is added to the right subtree of the left child. Let us check the balance factors in the following tree after node 45 is attached to the right subtree of the left node 25. Here node 12 has zero BF, node 25 has a BF of 1, and node 56 has a BF of -2. Note that along that path the Balance Factors go 0, 1, -2; that is, they are of mixed signs. It needs a double rotation to balance the tree. A left rotation brings up the node 38 as shown below:


The tree is still unbalanced, but the nodes along the path to the root now have the Balance Factors -1, -1, and -2 (all of the same sign). This means a right rotation can balance the tree, as it is now similar to the LL case. The right rotation results in the following balanced tree:

4. Case LR: (Left of Right) This case is similar to the RL case. The tree gets unbalanced when a node is added to the Left subtree of the Right child. It also needs a double rotation to balance the tree.

The tree can be balanced by a double rotation. The final effect of the double rotation is shown below. Note that in this case the node R moves up, with nodes P and Q as its children and the subtrees are taken care of in the following way. Also note that the left to right ordering of subtrees T1 T2 T3 T4 is not disturbed by the balancing process.


Consider the following tree which became unbalanced after inserting the node 75 in the left subtree of right child 80. Here node 40 has zero BF, node 80 has BF 1, and node 60 has BF of -2. Note that along that path, the Balance Factors are going as 0, 1, -2, that is, first increasing and then decreasing. The balance factor of 60 and 80 are of different signs.

The tree is balanced by moving 70 up. Node 60 retains its left subtree, and node 80 retains its right subtree. The left and right subtrees of the new root node 70 are taken over by nodes 60 and 80 respectively.

Example 3: Creating an AVL tree

( applying both single rotation and double rotation)

Suppose we start with an initially empty AVL tree and insert the keys 1 through 7 in sequential order. The first problem occurs when it is time to insert key 3, because the AVL property is violated at the root. We perform a single rotation Left between the root and its right child to fix the problem. The tree is shown in the following figure, before and after the rotation:

Next, we insert the key 4, which causes no problems, but the insertion of 5 creates a violation at node 3, which is fixed by a single rotation Left. Besides the local change caused by the rotation, the programmer must remember that the rest of the tree must be informed of this change. Here, this means that 2's right child must be reset to point to 4 instead of 3. This is easy to forget to do and would destroy the tree (4 would be inaccessible).


Next, we insert 6. This causes a balance problem for the root, since its left subtree is of height 0, and its right subtree would be height 2. Therefore, we perform a single rotation Left at the root between 2 and 4. The rotation is performed by making 2 a child of 4 and making 4's original left subtree the new right subtree of 2. Every key in this subtree must lie between 2 and 4, so this transformation makes sense.

The next key we insert is 7, which causes another rotation.

Suppose we insert keys 8 through 15 in reverse order. Inserting 15 is easy, since it does not destroy the balance property, but inserting 14 causes a height imbalance at node 7. In our example, the double rotation is a right-left double rotation and involves 7, 15, and 14. Here, k3 is the node with key 7, k1 is the node with key 15, and k2 is the node with key 14. Subtrees A, B, C, and D are all empty.

General case ((Right-left) double rotation)

Next we insert 13, which requires a double rotation. Here the double rotation is again a right-left double rotation that will involve 6, 14, and 7 and will restore the tree. In this case, k3 is the node with key 6, k1 is the node with key 14, and k2 is the node with key 7. Subtree A is the tree rooted at the node with key 5, subtree B is the empty subtree that was originally the left child of the node with key 7, subtree C is the tree rooted at the node with key 13, and finally, subtree D is the tree rooted at the node with key 15.

If 12 is now inserted, there is an imbalance at the root. Since 12 is not between 4 and 7, we know that the single rotation Left will work.


Insertion of 11 will require a single rotation Right :

To insert 10, a single rotation needs to be performed, and the same is true for the subsequent insertion of 9. We insert 8 without a rotation, creating the almost perfectly balanced tree that follows.

Finally, we insert 8.5 to show the symmetric case of the double rotation. Notice that 8.5 causes the node containing 9 to become unbalanced. Since 8.5 is between 9 and 8 (which is 9's child on the path to 8.5), a left-right double rotation needs to be performed, yielding the following tree.

General case: (Left-right) double rotation

After inserting 8.5



After balancing:

struct avl_node {
    int element;
    struct avl_node *left;
    struct avl_node *right;
    int height;
};

int height(struct avl_node *p) {
    if (p == NULL)
        return -1;
    else
        return p->height;
}

Insertion into an AVL tree:

struct avl_node *insert(int x, struct avl_node *T) {
    return insert1(x, T, NULL);
}

struct avl_node *insert1(int x, struct avl_node *T, struct avl_node *parent) {
    struct avl_node *rotated_tree;
    if (T == NULL) {
        /* Create and return a one-node tree */
        T = (struct avl_node *) malloc(sizeof(struct avl_node));
        if (T == NULL)
            printf("Out of space!!!");
        else {
            T->element = x;
            T->height = 0;
            T->left = NULL;
            T->right = NULL;
        }
    }
    else if (x < T->element) {
        T->left = insert1(x, T->left, T);
        if ((height(T->left) - height(T->right)) == 2) {
            if (x < T->left->element)
                rotated_tree = s_rotate_left(T);
            else
                rotated_tree = d_rotate_left(T);
            if (parent == NULL)             /* rotation at the root */
                T = rotated_tree;
            else if (parent->left == T)
                parent->left = rotated_tree;
            else
                parent->right = rotated_tree;
        }
        else
            T->height = max(height(T->left), height(T->right)) + 1;
    }
    else
        /* Symmetric case for the right subtree; */
        /* else x is in the tree already and we do nothing */
        ;
    return T;
}

Routine to perform single rotation:

/* This function can be called only if k2 has a left child. */
/* Perform a rotation between a node (k2) and its left child. */
/* Update heights, then return the new root. */
struct avl_node *s_rotate_left(struct avl_node *k2) {
    struct avl_node *k1;
    k1 = k2->left;
    k2->left = k1->right;
    k1->right = k2;
    k2->height = max(height(k2->left), height(k2->right)) + 1;
    k1->height = max(height(k1->left), k2->height) + 1;
    return k1; /* New root */
}

Routine to perform double rotation:

/* This function can be called only if k3 has a left child */
/* and k3's left child has a right child. */
/* Do the left-right double rotation. Update heights. */
struct avl_node *d_rotate_left(struct avl_node *k3) {
    /* rotate between k1 and k2 */
    k3->left = s_rotate_right(k3->left);
    /* rotate between k3 and k2 */
    return s_rotate_left(k3);
}

Time complexity of AVL tree operations:

           Average     Worst case
Space      O(n)        O(n)
Search     O(log n)    O(log n)
Insert     O(log n)    O(log n)
Delete     O(log n)    O(log n)

Splay Trees

Motivation: Binary search trees (BSTs) are meant to solve the dynamic dictionary problem: n keys are stored in such a way that each access/insert/delete operation costs O(log n) time while still preserving the properties of a BST. Balanced BSTs can achieve this goal, but at a cost: they have to maintain extra balance information. On the other hand, splay trees can adjust their own structure effectively so that all standard search tree operations still have an amortized time bound of O(log n). By amortized time, we mean the time per operation averaged over a worst-case sequence of operations.

Splay trees are binary search trees which are self-adjusting in the following way: Every time we access a node of the tree, whether for retrieval or insertion or deletion, we perform radical surgery on the tree, resulting in the newly accessed node becoming the root of the modified tree. This surgery will ensure that nodes that are frequently accessed will never drift too far away from the root whereas inactive nodes will get pushed away farther from the root.

Amortized complexity o Splay trees can become highly unbalanced so that a single access to a node of the tree can be quite expensive. o However, over a long sequence of accesses, the few expensive cases are averaged in with many inexpensive cases to obtain good performance. Does not need heights or balance factors as in AVL trees and colours as in Red-Black trees. The surgery on the tree is done using rotations, also called as splaying steps. There are six different splaying steps. 1. Zig Rotation (Right Rotation) 2. Zag Rotation (Left Rotation) 3. Zig-Zag (Zig followed by Zag) 4. Zag-Zig (Zag followed by Zig) 5. Zig-Zig 6. Zag-Zag

Consider the path going from the root down to the accessed node. Each time we move left going down this path, we say we "zig", and each time we move right, we say we "zag".

I. Zig Rotation and Zag Rotation Note that a zig rotation is the same as a right rotation whereas the zag step is the left rotation.


Figure 1: Zig rotation and zag rotation II. Zig-Zag This is the same as a double rotation in an AVL tree. Note that the target element is lifted up by two levels.

Figure 2: Zig-zag rotation

III.

Zag-Zig This is also the same as a double rotation in an AVL tree. Here again, the target element is lifted up by two levels.


Figure 3: Zag-zig rotation IV. Zig-Zig and Zag-Zag The target element is lifted up by two levels in each case. Zig-Zig is different from two successive right rotations; zag-zag is different from two successive left rotations.

Figure 4: Zig-zig and zag-zag rotations

Figure 5: Two successive right rotations

V. See Figure 6 for an example of splaying


Figure 6: An example of splaying

The above scheme of splaying is called bottom-up splaying. In top-down splaying, we start from the root and, as we locate the target element and move down, we splay as we go. This is more efficient.

Search, Insert, Delete in Bottom-up Splaying

Search(i, t): If item i is in tree t, return a pointer to the node containing i; otherwise return a pointer to the null node.

Search down from the root of t, looking for i. If the search is successful and we reach a node x containing i, we complete the search by splaying at x and returning a pointer to x. If the search is unsuccessful, i.e., we reach the null node, we splay at the last non-null node reached during the search and return a pointer to null. If the tree is empty, we omit any splaying operation.

Example of an unsuccessful search: See Figure 7


Figure 7: An example of searching in splay trees

Insert (i, t)

Search for i. If the search is successful then splay at the node containing i. If the search is unsuccessful, replace the pointer to null reached during the search by a pointer to a new node x to contain i and splay the tree at x

Figure 8: An example of an insert in a splay tree

Delete (i, t)

Search for i. If the search is unsuccessful, splay at the last non-null node encountered during the search. If the search is successful, let x be the node containing i. Assume x is not the root and let y be the parent of x. Replace x by an appropriate descendant of y in the usual fashion and then splay at y.


Figure 9: An example of a delete in a splay tree

Example

Figure 10. Result of splaying at node 1


Figure 11. Result of splaying at node 1: a tree of all left children. An access on the node with key 2 will bring nodes to within n/4 of the root.

Figure 12. Result of splaying previous tree at node 2

Figure 13 Result of splaying previous tree at node 3

Figure 14 Result of splaying previous tree at node 4


Figure 15 Result of splaying previous tree at node 5

Figure 16 Result of splaying previous tree at node 6

Figure 17 Result of splaying previous tree at node 7

Figure 18 Result of splaying previous tree at node 8

Figure 19 Result of splaying previous tree at node 9

Creating a splay tree: insert the elements 1, 2, 3, 7, 5, 2.5, 6 in that order. Each element is inserted as in an ordinary BST and then splayed to the root: inserting 2 and 3 each need a single zag; inserting 7 needs a zag; inserting 5 and 2.5 each need a zag-zig rotation; inserting 6 needs a zig-zag followed by a zag. (The intermediate tree diagrams from the figures are not recoverable here.)

B-Trees

Definition: A B-tree is an m-way search tree in which: the root has at least one key; non-root nodes have at least ceil(m/2) subtrees (i.e., at least ceil(m/2) - 1 keys); and all the empty subtrees (i.e., external nodes) are at the same level.

B-trees are especially useful for trees stored on disks, since their height, and hence also the number of disk accesses, can be kept small. The growth and contraction of m-way search trees occur at the leaves. On the other hand, B-trees grow and contract at the root.

Insertions

Insert the key into a leaf. An overfilled node sends its middle key to the parent and splits into two at the location of the submitted key.
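The node capacity constraints above can be sketched in C. This is an illustrative layout and search routine only; the struct and names are my assumptions, not the text's code:

```c
#define M 5                          /* order of the B-tree (m = 5) */
#define MIN_KEYS ((M + 1) / 2 - 1)   /* non-root nodes hold >= ceil(M/2) - 1 keys */

#include <stddef.h>

typedef struct BTreeNode {
    int n_keys;                  /* number of keys currently stored */
    int keys[M - 1];             /* at most m - 1 keys, kept sorted */
    struct BTreeNode *child[M];  /* at most m subtrees; all NULL in a leaf */
} BTreeNode;

/* Search for key k: scan the sorted keys of the node, then descend into
   the subtree between the two keys that bracket k. */
BTreeNode *btree_search(BTreeNode *t, int k)
{
    int i = 0;
    if (t == NULL)
        return NULL;
    while (i < t->n_keys && k > t->keys[i])
        i++;
    if (i < t->n_keys && t->keys[i] == k)
        return t;
    return btree_search(t->child[i], k);
}
```

Because every key comparison happens inside one node, a disk-based implementation reads only one node (one block) per level of the tree.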


Deletions

A key that is to be removed from a node with non-empty subtrees is replaced with the largest key of the left subtree or the smallest key of the right subtree. (The replacement is guaranteed to come from a leaf.)


If a node becomes understaffed, it looks for a sibling with an extra key. If such a sibling exists, the node takes a key from the parent, and the parent gets the extra key from the sibling.


If a node becomes understaffed and it can't receive a key from a sibling, the node is merged with a sibling and a key from the parent is moved down to the node.


HASHING


Binary Heaps
A binary heap is a binary tree with the following properties:
1. Structure property: A binary heap is a complete binary tree. Each level is completely filled; the bottom level may be partially filled from left to right. The height of a complete binary tree with N elements is floor(log2 N).
2. Heap-order property: The data item stored in each node is less than or equal to the data items stored in its children.

Array implementation of heaps: for the element at position i in the array,
- its left child is at position 2i,
- its right child is at position 2i+1,
- its parent is at position floor(i/2).

Binary heaps can be either maximum heaps or minimum heaps. In a maximum heap, every node indexed by i, other than the root, has A[Parent(i)] >= A[i]. In a minimum heap, every node indexed by i, other than the root, has A[Parent(i)] <= A[i].

Note: This property makes a binary heap different from a binary search tree. In a binary search tree, left child < parent < right child. However, in a minimum heap, parent <= left child and parent <= right child. So a binary search tree can be viewed as sorted, while a heap cannot.
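The index arithmetic and the min-heap order property above can be checked directly. A minimal sketch assuming the 1-based array layout described in the text (h[0] unused or holding a sentinel; names are illustrative):

```c
/* 1-based heap indexing: children of i are 2i and 2i+1, parent is i/2. */
int left_child(int i)  { return 2 * i; }
int right_child(int i) { return 2 * i + 1; }
int parent(int i)      { return i / 2; }   /* integer division floors */

/* Verify the min-heap order property: every non-root node in h[1..n]
   is greater than or equal to its parent. Returns 1 if it holds. */
int is_min_heap(const int h[], int n)
{
    int i;
    for (i = 2; i <= n; i++)
        if (h[parent(i)] > h[i])
            return 0;
    return 1;
}
```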


Basic Heap Operations:


1. Insertion: The new element to be inserted into the binary heap is placed next to the last element of the array containing the heap. This is called the hole position. Next it is moved up in the tree, to find its rightful place such that the heap-order property is maintained. The reheapUp (percolateUp) algorithm to do this is as follows.

PercolateUp algorithm for a min-heap:

    While (hole is not at the root and element < hole's parent)
    {
        Move hole's parent down
        Move hole up
    }
    Place element into final hole.


Now the hole is in proper place, as the hole value is not less than the parent value. The insertion algorithm terminates here. The final tree is shown below:

It is easily seen that irrespective of the value to be inserted, the maximum hole movement is limited to the height of the heap tree, and thus the insertion takes O(log N) operations.


Example 2:

Example 3: Construct Max-Heap

2. Creation: The binary heap can be created by inserting each element into an empty tree following the above algorithm. The cost of the i-th insertion is bounded by O(log i), which sums to O(N log N) for N elements. (A bottom-up build that percolates down from the last internal node achieves O(N).)
3. Deletion: In a binary heap we are not interested in removing a random element of the tree; we want to delete the node with the minimum value, which sits at the root. Deleting the root node from a tree having N elements leaves a hole at the root. The tree must then be readjusted by moving the hole down. This is done by the percolateDown algorithm.

    Delete_min( )
        Put the last element in the root (that is, H[1] = H[N--]);
        PercolateDown( 1 );


PercolateDown algorithm: As long as the hole has a child within the current size, if the element being placed is greater than the smaller of the hole's two children, swap the hole with that smaller child. The worst-case running time of delete_min is O(log N), as the hole moves from the root to its rightful place in the tree and this movement is limited by the height of the tree.

Example 1: deletion in a max-heap

Example 2: Deletion in Min-Heap


5. Building a heap
    struct heap_struct
    {
        int max_heap_size;   /* Maximum # that can fit in the heap */
        int size;            /* Current # of elements in the heap */
        int *elements;
    };
    typedef struct heap_struct *PRIORITY_QUEUE;

    PRIORITY_QUEUE create_pq( int max_elements )
    {
        PRIORITY_QUEUE H;

        if( max_elements < MIN_PQ_SIZE )
            printf( "Priority queue size is too small" );

        H = (PRIORITY_QUEUE) malloc( sizeof (struct heap_struct) );
        if( H == NULL )
            printf( "Out of space!!!" );

        /* Allocate the array plus one extra slot for the sentinel */
        H->elements = (int *) malloc( (max_elements + 1) * sizeof (int) );
        if( H->elements == NULL )
            printf( "Out of space!!!" );

        H->max_heap_size = max_elements;
        H->size = 0;
        H->elements[0] = MIN_DATA;   /* sentinel */
        return H;
    }

Function to perform insert in a binary heap:

    /* H->elements[0] is a sentinel */
    void insert( int x, PRIORITY_QUEUE H )
    {
        int i;

        if( is_full( H ) )
            printf( "Priority queue is full" );
        else
        {
            i = ++H->size;
            while( H->elements[i/2] > x )
            {
                H->elements[i] = H->elements[i/2];
                i /= 2;
            }
            H->elements[i] = x;
        }
    }

Function to perform delete_min in a binary heap:

    int delete_min( PRIORITY_QUEUE H )
    {
        int i, child;
        int min_element, last_element;

        if( is_empty( H ) )
        {
            printf( "Priority queue is empty" );
            return H->elements[0];
        }
        min_element = H->elements[1];
        last_element = H->elements[H->size--];

        for( i = 1; i*2 <= H->size; i = child )
        {
            /* find smaller child */
            child = i*2;
            if( (child != H->size) && (H->elements[child+1] < H->elements[child]) )
                child++;

            /* percolate one level */
            if( last_element > H->elements[child] )
                H->elements[i] = H->elements[child];
            else
                break;
        }
        H->elements[i] = last_element;
        return min_element;
    }

Heap Practical Example : Priority Queue Many applications require that we process records with keys/priorities in order, but not necessarily in full sorted order. (e.g. CPU process scheduling). Items in a priority queue are not processed strictly in order of entry into the queue Items can be placed on the queue at any time, but processing always takes the item with the largest key/ priority Applications of priority queues include: Simulation Systems, where events need to be processed chronologically. Job scheduling in computer systems.

Bandwidth management

Priority queuing can be used to manage limited resources such as bandwidth on a transmission line from a network router. In the event of outgoing traffic queuing due to insufficient bandwidth, all other queues can be halted to send the traffic from the highest priority queue upon arrival. This ensures that the prioritized traffic (such as real-time traffic, e.g. an RTP stream of a VoIP connection) is forwarded with the least delay and the least likelihood of being rejected due to a queue reaching its maximum capacity. All other traffic can be handled when the highest priority queue is empty. Another approach used is to send disproportionately more traffic from higher priority queues. Many modern protocols for Local Area Networks also include the concept of Priority Queues at the Media Access Control (MAC) sub-layer to ensure that high-priority applications (such as VoIP or IPTV) experience lower latency than other applications which can be served with Best effort service. Examples include IEEE 802.11e (an amendment to IEEE 802.11 which provides Quality of Service) and ITU-T G.hn (a standard for high-speed Local area networks using existing home wiring: power lines, phone lines and coaxial cables). Usually a limitation (policer) is set to limit the bandwidth that traffic from the highest priority queue can take, in order to prevent high priority packets from choking off all other traffic. This limit is usually never reached due to high-level control instances such as the Cisco Callmanager, which can be programmed to inhibit calls which would exceed the programmed bandwidth limit.

Discrete event simulation


Another use of a priority queue is to manage the events in a discrete event simulation. The events are added to the queue with their simulation time used as the priority. The execution of the simulation proceeds by repeatedly pulling the top of the queue and executing the event thereon.

A* and SMA* search algorithms


The A* search algorithm finds the shortest path between two vertices of a weighted graph, trying out the most promising routes first. The priority queue (also known as the fringe) is used to keep track of unexplored routes; the one for which a lower bound on the total path length is smallest is given highest priority. If memory limitations make A* impractical, the SMA* algorithm can be used instead, with a double-ended priority queue to allow removal of low-priority items.

ROAM triangulation algorithm


The Real-time Optimally Adapting Meshes (ROAM) algorithm computes a dynamically changing triangulation of a terrain. It works by splitting triangles where more detail is needed and merging them where less detail is needed. The algorithm assigns each triangle in the terrain a priority, usually related to the error decrease if that triangle would be split. The algorithm uses two priority queues, one for triangles that can be split and another for triangles that can be merged. In each step the triangle from the split queue with the highest priority is split, or the triangle from the merge queue with the lowest priority is merged with its neighbours.


Leftist Heaps
The null path length of a node is the length of the shortest path from that node to a node without two children. A leftist heap is one in which:
 o both sub-trees are leftist heaps, and
 o the left sub-tree has a null path length greater than or equal to that of the right sub-tree.
The definition suggests a bias to the left; however, this should not suggest that the tree is necessarily left-heavy. The following are leftist trees (null path length in bold):

As there is no relation between the nodes in the sub-trees of a heap:
 o If both the left and right sub-trees are leftist heaps but the root does not form a leftist heap, we only need to swap the two sub-trees.
 o We can use this to merge two leftist heaps.
When inserting a new node into a tree, a new one-node tree is created and merged into the existing tree. To delete the minimum item, we remove the root and the left and right sub-trees are then merged. Both these operations take O(log n) time. For insertions, this is slower than binary heaps, which support insertion in amortized constant time, O(1), and O(log n) worst-case. Leftist trees are advantageous because of their ability to merge quickly, compared to binary heaps, which take Θ(n).

Merging: Given two leftist heaps, recursively merge the heap with the larger root value into the right sub-heap of the heap with the smaller root. Traversing back to the root, swap sub-trees as needed to maintain the leftist heap property. Dequeuing: remove the top node and merge the two sub-trees together.
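The merge just described can be sketched in C. The node struct with a stored null path length is my assumption, not the text's code:

```c
#include <stdlib.h>

typedef struct LNode {
    int key;
    int npl;                    /* null path length */
    struct LNode *left, *right;
} LNode;

static int npl(LNode *t) { return t ? t->npl : -1; }

/* Merge two leftist min-heaps and return the new root. */
LNode *lmerge(LNode *h1, LNode *h2)
{
    LNode *tmp;
    if (h1 == NULL) return h2;
    if (h2 == NULL) return h1;
    if (h2->key < h1->key) { tmp = h1; h1 = h2; h2 = tmp; }

    /* recursively merge the heap with the larger root into the right subtree */
    h1->right = lmerge(h1->right, h2);

    /* restore the leftist property by swapping children if needed */
    if (npl(h1->left) < npl(h1->right)) {
        tmp = h1->left; h1->left = h1->right; h1->right = tmp;
    }
    h1->npl = npl(h1->right) + 1;
    return h1;
}

LNode *lnew(int key)
{
    LNode *n = malloc(sizeof *n);
    n->key = key; n->npl = 0; n->left = n->right = NULL;
    return n;
}
```

Insertion is then lmerge(heap, lnew(x)), and delete-min is lmerge(root->left, root->right), exactly as the text says.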


The heaps are merged, but the result is not a leftist heap as 3 is unhappy. We must recurse to the root and swap sub-heaps where necessary. Find the unhappy nodes after updating the null path lengths.


Skew Heap
A skew heap is a heap-ordered binary tree without a balancing condition. With these, there is no guarantee that the depth of the tree is logarithmic, but the skew heap supports all operations in logarithmic amortized time. It is somewhat like a splay tree.

Merging
Many operations with heap-ordered trees can be done using merging. Insert: create a one-node tree containing x and merge that tree into the priority queue. Find minimum: return the item at the root of the priority queue. Delete minimum: delete the root and merge its left and right subtrees. Decrease the value of a node: assume that p points to the node in the priority queue; lower the value of p's key; detach p from its parent, which yields two priority queues; merge the two resulting priority queues. Thus, we need only see how merging priority queues is implemented.

Simplistic Merging of Heap-Ordered Trees: Assume we have two heap-ordered trees, H1 and H2, that need to be merged. If either tree is empty, the other tree is the merged tree. Otherwise, compare the roots. Recursively merge the tree with the larger root into the right subtree of the tree with the smaller root.

Figure 1 Simplistic merge of heap-ordered trees

The practical effect of the above operation is an arrangement consisting only of a single right path. Thus individual operations can take linear time.

The Skew Heap: A Simple Modification

We can make a simple modification to the merge operation and get better results. Prior to the completion of a merge, we swap the left and right children for every node in the resulting right path of the temporary tree.


Figure 2 Merging a skew heap

When a merge is performed in this way, the heap-ordered tree is also called a skew heap. Let's consider this operation from a recursive point of view. Let L be the tree with the smaller root and R be the other tree. If one tree is empty, the other is the merged result. Otherwise, let Temp be the right subtree of L. Make L's left subtree its new right subtree. Make the result of the recursive merge of Temp and R the new left subtree of L. The result of child swapping is that the length of the right path will not be unduly large all the time. The amortized time needed to merge two skew heaps is O(log n).
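The recursive description above maps almost line for line onto code. A sketch assuming a plain binary node (names are illustrative, not from the text):

```c
#include <stdlib.h>

typedef struct SNode {
    int key;
    struct SNode *left, *right;
} SNode;

/* Merge two skew min-heaps: like the simplistic merge, but the children
   of the smaller root are swapped unconditionally on the way down. */
SNode *skew_merge(SNode *h1, SNode *h2)
{
    SNode *tmp;
    if (h1 == NULL) return h2;
    if (h2 == NULL) return h1;
    if (h2->key < h1->key) { tmp = h1; h1 = h2; h2 = tmp; }  /* h1 = L */

    tmp = h1->right;                    /* Temp = right subtree of L */
    h1->right = h1->left;               /* L's left becomes its new right */
    h1->left = skew_merge(tmp, h2);     /* merge of Temp and R goes on the left */
    return h1;
}
```

Note there is no null path length field and no conditional swap: the unconditional child swap is the whole difference from the leftist heap merge.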

Example:


Binomial Queues
A binary heap provides O(log n) inserts and O(log n) deletes but suffers from O(n log n) merges. A binomial queue offers O(log n) inserts (constant time on average), O(log n) deletes, and O(log n) merges. A binomial queue is a collection of heap-ordered trees known as a forest. Each tree is a binomial tree. A recursive definition is:
1. A binomial tree of height 0 is a one-node tree.
2. A binomial tree Bk of height k is formed by attaching a binomial tree Bk-1 to the root of another binomial tree Bk-1.
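Since Bk holds exactly 2^k nodes, a binomial queue of n elements contains exactly one Bk for each 1-bit in the binary representation of n. A small illustrative sketch (not from the text):

```c
/* Report which binomial trees form a queue of n elements:
   bit k of n is set  <=>  the forest contains a B_k (2^k nodes).
   Writes the tree heights into trees[] and returns their count. */
int binomial_forest(int n, int trees[])
{
    int k, count = 0;
    for (k = 0; (1 << k) <= n; k++)
        if (n & (1 << k))
            trees[count++] = k;
    return count;
}
```

For example, n = 13 (binary 1101) gives the forest B0, B2, B3, whose sizes 1 + 4 + 8 account for all 13 elements.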


DATA SORTING

The efficiency of handling data can be substantially improved if the data is sorted according to some criterion of order. In a telephone directory we are able to locate a phone number only because the names are alphabetically ordered. The same holds true for listings of directories created by us on the computer. Retrieval of a data item would be very time consuming if we didn't follow some order to store book indexes, payrolls, bank accounts, customer records, or inventory records, especially when the number of records is pretty large. We want to keep information in a sensible order. It could be one of the following schemes:
- alphabetical order
- ascending/descending order
- order according to name, ID, year, department, etc.

The aim of sorting algorithms is to organize the available information in an ordered form. There are dozens of sorting algorithms. The more popular ones are listed below.

1. Internal Sorting
- Selection Sort: Find the largest element in the array and put it in its proper place. Repeat until the array is sorted. This is also slow.
- Bubble Sort: Exchange two adjacent elements if they are out of order. Repeat until the array is sorted. This is a slow algorithm.
- Insertion Sort: Scan successive elements for an out-of-order item, then insert the item in the proper place. Sorts small arrays fast, big arrays very slowly.
- Shell Sort: Sort every Nth element of the array using insertion sort. Repeat using smaller N values until N = 1. On average, Shellsort is fourth place in speed. Shellsort may sort some distributions slowly.
- Quick Sort: Partition the array into two segments: in the first segment all elements are less than or equal to the pivot value, and in the second segment all elements are greater than or equal to the pivot value. Sort the two segments recursively. Quicksort is fastest on average, but sometimes unbalanced partitions can lead to very slow sorting.
- Merge Sort: Start from two sorted runs of length 1 and merge into a single run of twice the length. Repeat until a single sorted run is left. Mergesort needs an N/2 extra buffer. Performance is second place on average, with quite good speed on a nearly sorted array. Mergesort is stable, in that two elements that are equally ranked in the array will not have their relative positions flipped.
- Heap Sort: Form a tree with the parent of the tree being larger than its children. Remove the parent from the tree successively. On average, Heapsort is third place in speed. Heapsort does not need an extra buffer, and performance is not sensitive to initial distributions.

2. External Sorting

We are interested in finding out which algorithms are best suited for a particular situation. The efficiency of a sorting algorithm can be worked out by counting the number of comparisons and the number of data movements involved in each of the algorithms. The order of magnitude can vary depending on the initial ordering of data. How much time does a computer spend on data ordering if the data is already ordered? We often compute the data movements and comparisons for the following three cases:
- best case (often, the data is already in order),
- worst case (sometimes, the data is in reverse order),
- average case (data in random order).

Some sorting methods perform the same operations regardless of the initial ordering of data. Why should we consider both comparisons and data movements? If simple keys are compared, such as integers or characters, then the comparisons are relatively fast and inexpensive. If strings or arrays of numbers are compared, then the cost of comparisons goes up substantially. If, on the other hand, the data items moved are large, such as structures, then the movement measure may stand out as the determining factor in efficiency considerations.


Insertion Sort
The key idea is to pick up a data element and insert it into its proper place in the partial data considered so far. An outline of the algorithm is as follows: The list is divided into two parts: sorted and unsorted. In each pass, the first element of the unsorted part is picked up, transferred to the sorted sublist, and inserted at the appropriate place. A list of n elements will take at most n-1 passes to sort the data.

Insertion Sort Example

Insertion Sort Algorithm:

    /* With each pass, the first element of the unsorted sublist is
       inserted into its place in the sorted sublist. */
    void insertionSort( int a[], int n )
    {
        int i, j, temp;

        for( i = 1; i < n; i++ )
        {
            temp = a[i];
            for( j = i; j > 0 && temp < a[j-1]; j-- )
                a[j] = a[j-1];
            a[j] = temp;
        }
    }

Algorithm Analysis: An advantage of this method is that it sorts the array only when it is really necessary. If the array is already in order, no moves are performed. However, it overlooks the fact that the elements may already be in their proper positions. When an item is to be inserted, all elements greater than it have to be moved. There may be a large number of redundant moves, as an element (properly located) may be moved, but later brought back to its position.

Best case: the data is already sorted. The inner loop is never executed, and the outer loop is executed n - 1 times, for a total complexity of O(n).
Worst case: data in reverse order. The inner loop is executed the maximum number of times. Thus the complexity of insertion sort in this worst possible case is quadratic, O(n^2).


Mergesort
The mergesort algorithm uses the divide-and-conquer strategy of solving problems, in which the original problem is split into two problems of about half the size of the original. The basic idea is as follows. Suppose you have a large number of integers to sort. Write each integer on a separate slip of paper and make two piles of the slips. The original problem has been reduced to sorting two piles of smaller size individually. Now reduce each pile to half of its existing size. There would now be 4 piles with smaller sets of integers to sort. Keep increasing the number of piles by halving their lengths every time. Continue with the process till you have piles with at most two slips each. Sort each pile with the smaller number on top. Now take adjacent piles and merge them such that the resulting pile is in sorted order. The piles keep growing in size, but this time they are in sorted order. Stop when all piles have been taken care of and a single pile remains. Thus mergesort can be thought of as a recursive process. Let us assume that the elements are stored in an array.

Mergesort Algorithm:
1. Divide Step: If the given array A has zero or one element, return A; it is already sorted. Otherwise, divide A into two arrays, A1 and A2, each containing about half of the elements of A.
2. Recursion Step: Recursively sort arrays A1 and A2.
3. Conquer Step: Combine the elements back in A by merging the sorted arrays A1 and A2 into a sorted sequence.

What would be the complexity of the process? Since this algorithm uses the divide-and-conquer strategy and employs the halving principle, the splitting creates O(log2 n) levels. The merging at each level involves movement of all n elements (linear time), and we shall show later that the overall complexity turns out to be O(N log2 N).

We can merge two input arrays A and B to result in a third array C. Let the index counters for the respective arrays be actr, bctr, and cctr, initially set to the position of the first element. The smaller of the two elements A[actr] and B[bctr] is stored in C[cctr], as shown below:

    if( A[actr] < B[bctr] )
    {
        C[cctr] = A[actr];
        cctr++;
        actr++;
    }
    else

    {
        C[cctr] = B[bctr];
        cctr++;
        bctr++;
    }

Let us take an example. Say at some point in the sorting process we have to merge the two lists 1, 3, 4, 7 and 2, 5, 9, 11. We store the first list in array A and the second list in array B. The merging goes in the following fashion:

Example: Linear Merge

A = [1 3 4 7], B = [2 5 9 11]. At each step the smaller of A[actr] and B[bctr] is copied into C[cctr] and the corresponding counters advance:

    C = [1]                     (1 < 2, advance actr)
    C = [1 2]                   (3 > 2, advance bctr)
    C = [1 2 3]                 (3 < 5, advance actr)
    C = [1 2 3 4]               (4 < 5, advance actr)
    C = [1 2 3 4 5]             (7 > 5, advance bctr)
    C = [1 2 3 4 5 7]           (7 < 9, advance actr)
    C = [1 2 3 4 5 7 9 11]      (A exhausted; the rest of B is copied over)

The array is recursively split into two halves and the mergesort function is applied to each half separately; the two sorted halves are then merged. The mergesort and merge functions are shown below:

    #include <stdio.h>

    void merge( int a[], int a1[], int n1, int a2[], int n2 )
    {
        int p = 0, p1 = 0, p2 = 0;

        while( p1 < n1 && p2 < n2 )
        {
            if( a1[p1] < a2[p2] )
                a[p++] = a1[p1++];
            else
                a[p++] = a2[p2++];
        }
        while( p1 < n1 )
            a[p++] = a1[p1++];
        while( p2 < n2 )
            a[p++] = a2[p2++];
    }

    void mergesort( int a[], int n )
    {
        int j, n1, n2, a1[10], a2[10];

        if( n > 1 )
        {
            n1 = n / 2;
            n2 = n - n1;
            for( j = 0; j < n1; j++ )
                a1[j] = a[j];
            for( j = 0; j < n2; j++ )
                a2[j] = a[j + n1];
            mergesort( a1, n1 );
            mergesort( a2, n2 );
            merge( a, a1, n1, a2, n2 );
        }
    }

    int main( void )
    {
        int i, n, a[10];

        scanf( "%d", &n );
        for( i = 0; i < n; i++ )
            scanf( "%d", &a[i] );
        mergesort( a, n );
        printf( "sorted list\n" );
        for( i = 0; i < n; i++ )
            printf( "%d\n", a[i] );
        return 0;
    }

Model output (merge trace for the input below):

    After merging [31] & [45]                    merged array is [31 45]
    After merging [24] & [15]                    merged array is [15 24]
    After merging [31 45] & [15 24]              merged array is [15 24 31 45]
    After merging [23] & [92]                    merged array is [23 92]
    After merging [30] & [77]                    merged array is [30 77]
    After merging [23 92] & [30 77]              merged array is [23 30 77 92]
    After merging [15 24 31 45] & [23 30 77 92]  merged array is [15 23 24 30 31 45 77 92]

To sort an array A of size n, the call from the main program would be mergesort(A, n). Here is a typical run of the algorithm, for an array of size 8.

Unsorted array A = [31 45 24 15 23 92 30 77]

The original array keeps splitting into two halves till it reduces to arrays of one element; these are then merged back:

    Splitting:
    [31 45 24 15 23 92 30 77]
    [31 45 24 15]           [23 92 30 77]
    [31 45]   [24 15]       [23 92]   [30 77]
    [31] [45] [24] [15]     [23] [92] [30] [77]

    Merging:
    [31 45]   [15 24]       [23 92]   [30 77]
    [15 24 31 45]           [23 30 77 92]
    [15 23 24 30 31 45 77 92]
Computational Complexity: Intuitively we can see that as the mergesort routine reduces the problem to half its size every time (done twice), it can be viewed as creating a tree of calls, where each level of recursion is a level in the tree. Effectively, all n elements are processed by the merge routine as many times as there are levels in the recursion tree. Since the number of elements is divided in half each time, the tree is a balanced binary tree. The height of such a tree is log n. The merge routine steps along the elements in both halves, comparing the elements. For n elements, this operation performs n assignments, using at most n - 1 comparisons, and hence it is O(n). So we may conclude that the algorithm performs log n merges of O(n) work each.

The same conclusion can be drawn more formally using the method of recurrence relations. Let us assume that n is a power of 2, so that we always split into even halves. The time to mergesort n numbers is equal to the time to do two recursive mergesorts of size n/2, plus the time to merge, which is linear. For n = 1, the time to mergesort is constant. We can express the number of operations involved using the following recurrence relations:

    T(1) = 1
    T(n) = 2 T(n/2) + n

Using the same logic one level further down:

    T(n/2) = 2 T(n/4) + n/2

Substituting for T(n/2) in the equation for T(n), we get

    T(n) = 2 [ 2 T(n/4) + n/2 ] + n = 4 T(n/4) + 2n

Again, by rewriting T(n/4) in terms of T(n/8), we have

    T(n) = 4 [ 2 T(n/8) + n/4 ] + 2n = 8 T(n/8) + 3n = 2^3 T(n/2^3) + 3n

The next substitution would lead us to

    T(n) = 2^4 T(n/2^4) + 4n


Continuing in this manner, we can write for any k,

    T(n) = 2^k T(n/2^k) + k n

This should be valid for any value of k. Suppose we choose k = log n, i.e. 2^k = n. Then we get a very neat solution:

    T(n) = n T(1) + n log n = n log n + n

Thus T(n) = O(n log n). This analysis can be refined to handle cases when n is not a power of 2; the answer turns out to be almost identical. Although mergesort's running time is very attractive, it is not preferred for sorting data in main memory. The main problem is that merging two sorted lists uses linear extra memory (as you need to copy the original array into two arrays of half the size), and the additional work spent copying to the temporary array and back, throughout the algorithm, slows down the sort considerably. The copying can be avoided by judiciously switching the roles of the list and temp arrays at alternate levels of the recursion. For serious internal sorting applications, the algorithm of choice is quicksort, which we shall study next.

Quicksort

As the name implies, quicksort is the fastest known sorting algorithm in practice. It has the best average-time performance. Like mergesort, quicksort is based on the divide-and-conquer paradigm, but it uses the technique in a somewhat opposite manner: all the hard work is done before the recursive calls. It works by partitioning an array into two parts, then sorting the parts independently, and finally combining the sorted subsequences by a simple concatenation.

In particular, the quicksort algorithm consists of the following three steps:
1. Choosing a pivot: To partition the list, we first choose some element from the list which is expected to divide the list evenly into two sublists. This element is called a pivot.
2. Partitioning: Then we partition the elements so that all those with values less than the pivot are placed in one sublist and all those with greater values are placed in the other sublist.
3. Recur: Recursively sort the sublists separately. Repeat the partition process for both sublists: choose a pivot for each of the two sublists, making 4 sublists now. Keep partitioning till only one-cell arrays remain, which do not need to be sorted at all.

By dividing the task of sorting a large array into two simpler tasks, and then dividing those tasks into even simpler tasks, it turns out that in the process of getting prepared to sort, the data has already been sorted. This is the core of quicksort. The steps involved are explained through an example.

Example array (lh marks the left scan index, rh the right):

    [56  25  37  58  95  19  73  30]
         lh                      rh

1. Choose the first element, 56, as the pivot.
2. Move the rh index to the left until it coincides with lh or points to a value smaller than the pivot. Here it already points to a value smaller than the pivot (30).
3. Move the lh index to the right until it coincides with rh or points to a value equal to or greater than the pivot:

    [56  25  37  58  95  19  73  30]
                 lh              rh

4. If lh and rh are not pointing to the same element, exchange the elements:

    [56  25  37  30  95  19  73  58]
                 lh              rh

5. Repeat steps 2 to 4 until lh and rh coincide. Move rh to the left till it finds a value smaller than 56:

    [56  25  37  30  95  19  73  58]
                 lh      rh

   Move lh to the right till it finds an element larger than 56:

    [56  25  37  30  95  19  73  58]
                     lh  rh

   Now exchange:

    [56  25  37  30  19  95  73  58]
                     lh  rh

   Move rh to the left:

    [56  25  37  30  19  95  73  58]
                    lh,rh

   No exchange now, as both lh and rh point to the same element.
6. This is the last value smaller than the pivot. Exchange it with the pivot:

    [19  25  37  30  56  95  73  58]

7. This results in two subarrays, one to the left of the pivot and one to its right:

    [19  25  37  30]  56  [95  73  58]

8. Repeat steps 1 to 6 for the two sub arrays recursively.


Implementation of Quicksort:
    void quicksort( int a[], int low, int high )
    {
        int pivot;

        /* Termination condition! */
        if( low < high )
        {
            pivot = partition( a, low, high );
            quicksort( a, low, pivot-1 );
            quicksort( a, pivot+1, high );
        }
    }

    int partition( int a[], int low, int high )
    {
        int left, right, temp;
        int pivot_item;

        pivot_item = a[low];
        left = low + 1;
        right = high;

        while( left < right )
        {
            /* Move left while item <= pivot */
            while( left <= high && a[left] <= pivot_item )
                left++;
            /* Move right while item > pivot */
            while( a[right] > pivot_item )
                right--;
            if( left < right )
            {
                temp = a[left];
                a[left] = a[right];
                a[right] = temp;
            }
        }
        /* right is the final position for the pivot */
        a[low] = a[right];
        a[right] = pivot_item;
        return right;
    }

Exercise: sort the following numbers: 23, 12, 15, 38, 42, 18, 36, 29, 27

Picking the Pivot:


- first element: bad if the input is sorted or in reverse sorted order; bad if the input is nearly sorted
- variation: a particular element (e.g. the middle element)
- random element: even a malicious agent cannot arrange a bad input
- median of three elements: choose the median of the left, right, and center elements
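The median-of-three choice can be sketched as follows. The convention of sorting the three probe positions in place (so that the median lands in the center) follows common practice and is not taken from the text:

```c
static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Sort a[low], a[mid], a[high] among themselves so that the median of
   the three probes ends up in a[mid]; return that median value. */
int median_of_three(int a[], int low, int high)
{
    int mid = (low + high) / 2;
    if (a[mid]  < a[low])  swap(&a[mid],  &a[low]);
    if (a[high] < a[low])  swap(&a[high], &a[low]);
    if (a[high] < a[mid])  swap(&a[high], &a[mid]);
    return a[mid];          /* now a[low] <= a[mid] <= a[high] */
}
```

A side benefit is that a[low] and a[high] then act as sentinels for the partition scans, since they already sit on the correct sides of the pivot.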

Analysis of Quicksort: To do the analysis, let us use the recurrence-type relation used for analyzing mergesort. We can drop the steps involved in finding the pivot, as that takes only constant time. We take T(0) = T(1) = 1. The running time of quicksort is equal to the running time of the two recursive calls, plus the linear time spent in the partition. This gives the basic quicksort recurrence: T(N), the time for quicksort on an array of N elements, is

T(N) = T(j) + T(N - j - 1) + N,

where j is the number of elements in the first sublist.

Best case analysis: The best case assumes that the pivot always lands in the middle. To simplify the math, we assume that the two sublists are each exactly half the size of the original. Then we can follow the same analysis as in merge sort and show that

T(N) = T(N/2) + T(N/2) + N

leads to T(N) = O(N log N).

Worst Case Analysis: The partitions are very lopsided, meaning that at each recursive step either the left partition has |L| = 0 and the right has |R| = N - 1, or vice versa. Suppose that Left contains no elements and Right contains all of the elements except the pivot (this means the pivot is always chosen to be the smallest element in the partition), that 1 time unit is required to sort 0 or 1 elements, and that N time units are required to partition a set containing N elements. Then for N > 1 we have:

T(N) = T(N-1) + N

This means that the time required to quicksort N elements is equal to the time required to recursively sort the N-1 elements in the Right subset plus the time required to partition the N elements. By telescoping the above equation we have:

T(N)   = T(N-1) + N
T(N-1) = T(N-2) + (N-1)
T(N-2) = T(N-3) + (N-2)
...
T(2)   = T(1) + 2

Adding these gives

T(N) = T(1) + 2 + 3 + 4 + ... + N = N(N+1)/2 = O(N^2)

Thus, whenever a bad pivot is repeatedly selected, the resulting unbalanced partitioning leads to quadratic running time.


Heap Sort
The heap sort is the slowest of the O(n log n) sorting algorithms, but unlike the merge and quick sorts it doesn't require massive recursion or multiple arrays to work. This makes it the most attractive option for very large data sets of millions of items.

Algorithm:
1. Build a heap out of the data set.
2. Remove the largest item and place it at the end of the sorted array.
3. After removing the largest item, reconstruct the heap, remove the largest remaining item, and place it in the next open position from the end of the sorted array.
4. Repeat steps 2 and 3 until there are no items left in the heap and the sorted array is full.

Elementary implementations require two arrays - one to hold the heap and the other to hold the sorted elements. To do an in-place sort and save the space the second array would require, the algorithm below "cheats" by using the same array to store both the heap and the sorted array. Whenever an item is removed from the heap, it frees up a space at the end of the array that the removed item can be placed in.

Pros: In-place and non-recursive, making it a good choice for extremely large data sets.
Cons: Slower than the merge and quick sorts.

As mentioned above, the heap sort is slower than the merge and quick sorts but doesn't use multiple arrays or massive recursion like they do. This makes it a good choice for really large sets, but most modern computers have enough memory and processing power to handle the faster sorts unless over a million items are being sorted. The "million item rule" is just a rule of thumb for common applications - high-end servers and workstations can probably safely handle sorting tens of millions of items with the quick or merge sorts.

Heap sort: The siftDown() function builds and reconstructs the heap.
void siftDown(int numbers[], int root, int bottom);

void heapSort(int numbers[], int array_size)
{
    int i, temp;
    for (i = array_size / 2 - 1; i >= 0; i--)    /* build the initial heap */
        siftDown(numbers, i, array_size - 1);
    for (i = array_size - 1; i >= 1; i--)
    {
        temp = numbers[0];                       /* move the largest item */
        numbers[0] = numbers[i];                 /* to the end of the array */
        numbers[i] = temp;
        siftDown(numbers, 0, i - 1);             /* reconstruct the heap */
    }
}

void siftDown(int numbers[], int root, int bottom)
{
    int maxChild, temp;
    /* children of node root are at 2*root+1 and 2*root+2 (0-indexed array) */
    while (root * 2 + 1 <= bottom)
    {
        maxChild = root * 2 + 1;
        if (maxChild + 1 <= bottom && numbers[maxChild + 1] > numbers[maxChild])
            maxChild = maxChild + 1;             /* take the larger child */
        if (numbers[root] < numbers[maxChild])
        {
            temp = numbers[root];
            numbers[root] = numbers[maxChild];
            numbers[maxChild] = temp;
            root = maxChild;
        }
        else
            break;                               /* heap property holds */
    }
}

Running time analysis:
1. Building a heap tree of N elements: repeatedly inserting N elements takes O(N log N) time.
2. Perform N deleteMax operations: each deleteMax operation requires O(log N) time, so deleting N elements takes O(N log N).
3. Recording these elements in a second array and copying it back takes O(N).
Total time complexity: O(N log N)
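The heap-building phase (step 1 of the algorithm) can be sketched in isolation as follows. The names `build_max_heap` and `sift` are ours; the routine turns an arbitrary array into a max-heap by sifting down from the last internal node to the root:

```c
#include <assert.h>

/* Bottom-up max-heap construction on a 0-indexed array: children of
   node i live at 2*i+1 and 2*i+2. After the call, a[0] is the maximum
   and every parent is >= both of its children. */
static void sift(int a[], int root, int last)
{
    while (2 * root + 1 <= last) {
        int child = 2 * root + 1;
        if (child + 1 <= last && a[child + 1] > a[child])
            child++;                   /* pick the larger child */
        if (a[root] >= a[child])
            break;                     /* heap property already holds here */
        int t = a[root]; a[root] = a[child]; a[child] = t;
        root = child;
    }
}

void build_max_heap(int a[], int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)   /* last internal node down to root */
        sift(a, i, n - 1);
}
```

Because only the internal nodes (about n/2 of them) are sifted, and most sit near the bottom of the tree, this bottom-up build runs in O(N) time, slightly better than the O(N log N) bound obtained by repeated insertion.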

SHELL SORT

This sort algorithm is also called diminishing increment sort; that name is actually more descriptive than "Shell sort". The algorithm sorts sub-lists of the original list based on an increment value, or sequence number, k. Common sequence numbers are 5, 3, 1, though there is no proof that these are the best sequence numbers. Each sub-list contains every kth element of the original list.

Definition
For example, if k = 5 the sub-lists will be as follows:

Sublist 1: s[0] s[5] s[10] ...
Sublist 2: s[1] s[6] s[11] ...
Sublist 3: s[2] s[7] s[12] ...
Sublist 4: s[3] s[8] s[13] ...
Sublist 5: s[4] s[9] s[14] ...

This means that there are 5 sub-lists and each contains 1/5 of the original list.

If k = 3 then there will be three sub-lists and so on.
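A single pass over one increment value can be sketched as follows (the function name `gap_pass` is ours). It insertion-sorts each of the k interleaved sub-lists in one left-to-right sweep:

```c
#include <assert.h>

/* One Shellsort pass with increment k: insertion-sort every sub-list
   s[i], s[i+k], s[i+2k], ... in a single sweep over the array. */
void gap_pass(int s[], int n, int k)
{
    for (int i = k; i < n; i++) {
        int tmp = s[i];
        int j = i;
        while (j >= k && s[j - k] > tmp) {
            s[j] = s[j - k];    /* shift the larger element forward by k */
            j -= k;
        }
        s[j] = tmp;
    }
}
```

Calling `gap_pass` with k = 5, then k = 3, then k = 1 reproduces the worked example that follows; the final k = 1 pass is an ordinary insertion sort over nearly sorted data.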


Shell Sort Process


1. Create the sub-lists based on the increment number (sequence number)
2. Sort the lists
3. Combine the lists

Let's see this algorithm in action.

Shell Sort Process


Let's sort the following list, given the sequence numbers 5, 3, 1:

30 62 53 42 17 97 91 38


Shell Sort Process k=5


 30  62  53  42  17  97  91  38
[0] [1] [2] [3] [4] [5] [6] [7]

Step 1: Create the sub-lists, k = 5: (S[0], S[5]) (S[1], S[6]) (S[2], S[7]) (S[3]) (S[4])

Step 2-3: Sort the sub-lists & combine:
S[0] < S[5] (30 < 97)  This is OK
S[1] < S[6] (62 < 91)  This is OK
S[2] > S[7] (53 > 38)  This is not OK. Swap them.

 30  62  38  42  17  97  91  53
[0] [1] [2] [3] [4] [5] [6] [7]

Shell Sort Process k=3


 30  62  38  42  17  97  91  53
[0] [1] [2] [3] [4] [5] [6] [7]

Step 1: Create the sub-lists, k = 3: (S[0], S[3], S[6]) (S[1], S[4], S[7]) (S[2], S[5])

Step 2-3: Sort the sub-lists & combine:
S[0], S[3], S[6] = 30, 42, 91  OK
S[1], S[4], S[7] = 62, 17, 53  not OK; sorted: 17, 53, 62
S[2], S[5] = 38, 97  OK

 30  17  38  42  53  97  91  62
[0] [1] [2] [3] [4] [5] [6] [7]


Shell Sort Process k=1


 30  17  38  42  53  97  91  62
[0] [1] [2] [3] [4] [5] [6] [7]

Step 1: Create the sub-list, k = 1: S[0] S[1] S[2] S[3] S[4] S[5] S[6] S[7]

Step 2-3: Sort the sub-list & combine (sorting with k = 1 works like an insertion sort):

 17  30  38  42  53  62  91  97
[0] [1] [2] [3] [4] [5] [6] [7]

DONE

Shellsort Analysis
The running time of Shellsort depends heavily on the choice of increment sequence. Shell suggested starting the gap at N/2 and halving it until it reaches 1. Shellsort, despite its three nested loops, represents a substantial improvement over the insertion sort.


Shellsort Analysis
When Shell's increments are used, the worst case can be O(N^2). When N is an exact power of two, one can prove that the average running time is O(N^(3/2)). A minor change to the increment sequence can prevent the quadratic worst case from occurring: whenever halving the gap produces an even number, add one to make it odd. We can then prove that the worst case is not quadratic but only O(N^(3/2)).
Ref: Mark Allen Weiss

Shell sort is a relatively fast algorithm and is easy to code. It attempts to roughly sort the data first, moving large elements towards one end and small elements towards the other. It performs several passes over the data, each finer than the last. After the final pass, the data is fully sorted. Shell sort quickly arranges data by sorting every nth element, where n can be any number less than half the number of data items. Once the initial sort is performed, n is reduced, and the data is sorted again, until n equals 1. It is vital that the data is finally sorted with n = 1; otherwise there may be out-of-order elements remaining.

Selecting good values for n


Choosing n is not as difficult as it might seem. The only sequence you have to avoid is one constructed from the powers of 2. Do not choose (for example) 16 as your first n and then keep dividing by 2 until you reach 1: it has been mathematically proven that using only numbers from the power series {1, 2, 4, 8, 16, 32, ...} produces the worst sorting times. The fastest times are (on average) obtained by choosing an initial n somewhere close to the maximum allowed and continually dividing by 2.2 until you reach 1 or less. Remember to always sort the data with n = 1 as the last step.

Function for Shell sort:

#include <stdio.h>

int main()
{
    int i, j, k, n, tmp, a[10];
    scanf("%d", &n);
    for (i = 0; i < n; i++)
        scanf("%d", &a[i]);
    for (k = n/2; k > 0; k /= 2)
        for (i = k; i < n; i++)
        {
            tmp = a[i];
            for (j = i; j >= k; j = j - k)
            {
                if (tmp < a[j-k])
                    a[j] = a[j-k];
                else
                    break;
            }
            a[j] = tmp;
        }
    printf("sorted list\n");
    for (i = 0; i < n; i++)
        printf("%d\n", a[i]);
    return 0;
}

Example: The data in the table needs to be sorted into ascending order. For simplicity, we have chosen the sequence {3, 2, 1} for our n. The top line of each table shows the data before we performed that step, the bottom shows the data afterwards. In this example, we will assume a selection sort is being used to do the actual sorting.

8 4 1 5 7 6 9 3 2   The unsorted data

1. As mentioned above, we will be using an initial value of 3 for n. This is less than the maximum (which in this case is 4, the largest number less than half the number of elements we have). We pretend that the only elements in the data set are the elements containing 5, 8 and 9 (highlighted in bold). Notice that this is every 3rd element (n is 3). After sorting these, we look at the elements to the right of the ones we just looked at, and repeat the sort-and-shift routine until all of the elements have been looked at once. So if n = 3, you will need to repeat this step 3 times to sort every element. Note that only the highlighted elements are ever changed; the ones with a white background are totally ignored.

Put the 8, 5 and 9 into ascending order (ignoring the other elements):
before: 8 4 1 5 7 6 9 3 2
after:  5 4 1 8 7 6 9 3 2

Do the same for 4, 7 and 3:
before: 5 4 1 8 7 6 9 3 2
after:  5 3 1 8 4 6 9 7 2

As well as 1, 6 and 2:
before: 5 3 1 8 4 6 9 7 2
after:  5 3 1 8 4 2 9 7 6

2. Now that all of the elements have been sorted once with n = 3, we repeat the process with the next value of n (2). If you look carefully, there is a general congregation of large numbers at the right hand side, and the smaller numbers are at the left. There are still quite a few misplaced numbers (most notably 8, 2, 5 and 6), but it is better sorted than it was.

Place the elements in odd-numbered positions (1st, 3rd, 5th, ...) into ascending order:
before: 5 3 1 8 4 2 9 7 6
after:  1 3 4 8 5 2 6 7 9

And the same for the elements in even-numbered positions:
before: 1 3 4 8 5 2 6 7 9
after:  1 2 4 3 5 7 6 8 9

3. You can see now that the data is almost completely sorted - after just 2 more steps! All that remains is to sort it again with n = 1 to fix up any elements that are still out of order. When n = 1, we are just performing a normal sort, making sure every element in the dataset is in its correct place. You may wonder why we don't just skip to this step. Doing that would work, but simple sorts such as the insertion and bubble sorts are fastest when the data is already sorted (or close to it). The shell sort method orders the data in fewer steps than would be required for either of those methods alone.

The few elements which are still out of order are fixed:
before: 1 2 4 3 5 7 6 8 9
after:  1 2 3 4 5 6 7 8 9

All sorted!


External Sorting
Performing sorting operations on amounts of data that are too large to fit into main memory.

Replacement Selection
1. Choose as large a priority queue as possible, say of M elements.
2. Sort Step
   - Initialization Step: Read M records into the priority queue.
   - Replacement Step (creating a single run):
     1. Delete the smallest record from the priority queue and write it out.
     2. Read a record from the input file. If the new element is smaller than the last one output, it cannot become part of the current run. Mark it as belonging to the next run and treat it as greater than all the unmarked elements in the queue.
     3. Terminate the run when a marked element reaches the top of the queue.
3. Merge Step: Same as before.
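Under the assumption of a small in-memory capacity M, the run-generation loop above can be sketched as follows. The min-heap helpers and all names are ours; instead of flagging "marked" records inside the heap, this sketch holds them in a pending buffer and refills the heap from it when a run ends:

```c
#include <assert.h>

#define M 4                      /* assumed priority-queue capacity */

static int heap[M], heap_n;

static void heap_push(int v)     /* insert into the min-heap */
{
    int i = heap_n++;
    heap[i] = v;
    while (i > 0 && heap[(i - 1) / 2] > heap[i]) {
        int p = (i - 1) / 2, t = heap[p];
        heap[p] = heap[i]; heap[i] = t;
        i = p;
    }
}

static int heap_pop(void)        /* delete and return the smallest record */
{
    int top = heap[0], i = 0;
    heap[0] = heap[--heap_n];
    for (;;) {
        int c = 2 * i + 1;
        if (c >= heap_n) break;
        if (c + 1 < heap_n && heap[c + 1] < heap[c]) c++;
        if (heap[i] <= heap[c]) break;
        int t = heap[i]; heap[i] = heap[c]; heap[c] = t;
        i = c;
    }
    return top;
}

/* Writes every run's records into out[], recording the end index of
   each run in run_end[]; returns the number of runs generated. */
int replacement_selection(const int in[], int n_in, int out[], int run_end[])
{
    int pending[M], n_pending = 0;
    int read = 0, written = 0, runs = 0;

    while (read < n_in && heap_n < M)       /* initialization step */
        heap_push(in[read++]);

    while (heap_n > 0) {
        int last = heap_pop();              /* smallest unmarked record */
        out[written++] = last;
        if (read < n_in) {
            if (in[read] >= last)
                heap_push(in[read]);        /* can join the current run */
            else
                pending[n_pending++] = in[read];  /* marked: next run */
            read++;
        }
        if (heap_n == 0) {                  /* current run terminates */
            run_end[runs++] = written;
            while (n_pending > 0)           /* start the next run */
                heap_push(pending[--n_pending]);
        }
    }
    return runs;
}
```

On the input {5, 3, 8, 1, 9, 2, 7} with M = 4, this produces two runs: 1 3 5 7 8 9 and then 2, illustrating how replacement selection typically yields runs longer than the memory capacity M.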
