
CS301 Data Structures

Lecture No. 43
___________________________________________________________________

Reading Material
Data Structures and Algorithm Analysis in C++

Chapters 5 and 7
Sections 5.4, 5.5, 5.6, 7.1

Summary

Hashing Animation
Applications of Hashing
When is Hashing Suitable?
Sorting
Elementary Sorting Algorithms
Selection Sort

Hashing Animation
In the previous lecture, we discussed collision resolution strategies in hashing. We studied
three solutions: linear probing, quadratic probing and linked list chaining. Hashing is a
vast research field that covers hash functions, storage and collision issues, among others.
At the moment, we are looking at hashing as an implementation of the table ADT. With hashing,
the insert, delete and find operations run in constant time. Constant time means that the
time does not grow as the volume of data grows. However, once collisions start happening,
the time no longer remains constant. With linear probing in particular, we had to insert the
data by searching the array sequentially for a free location. The case was similar with
quadratic probing. With linked list chaining, we start constructing linked lists, which takes
time and memory. Later we will see some situations where hashing is very useful.
Today, we will study these three collision strategies using animations. The animations will
be provided to you as a Java program. It is worth mentioning that all the data structures
and algorithms we have studied so far can be implemented in C, C++ or Java. Choosing the
programming language is nevertheless an important decision, because every language has
areas in which it is strongest. Java has become very popular because of its facilities for
the Internet. As you already know C++, Java will be easy for you to learn. The syntax is
quite similar; if we showed you Java code, you might well take it for C++.
Let's see the hashing animation. It is shown in the browser, as it is an applet written in
Java. We will see linear probing, quadratic probing and linked list chaining in it. It is an
example of how we resolve collisions.
The array is displayed in four columns. The size of the array is 100 and the index runs
from 0 to 99. Each element of the array has two parts, so we can store 200 numbers in it.
On the first collision at a location, the program uses the second part of that location.
On the second collision, the data is stored using linear probing. At the top right corner
we have the hash function of x, defined as x mod 100. That is, when a number is passed to
it, it takes the number mod 100 and returns the result, which is used as the index into the
array.
In this example we are using numbers only and are not dealing with characters. We also
have a statistics table on the right side. The program generates 100 random numbers and
uses the hash function to store them in the array at different locations. At the bottom
left we have the collision resolution algorithms. We have chosen linear probing, so the
program will resolve collisions using linear probing. Press the run button to start it.
It selects numbers randomly and divides each by 100; the remainder is the hash value, which
is used as the array index.
Here we have the number 506. It is divided by 100 and the remainder is 6, so this number
is stored at location 6 of the array. Next we have the number 206, whose remainder is also
6. As location 6 is already occupied, we store 206 in the second part of location 6. Now
we have the number 806. Its remainder is also 6, but both parts of location 6 are occupied.
Using linear probing, we store it in the next array location, i.e. 7. If another number
with remainder 6 arrives, we store it in the second part of location 7. Suppose we then
get the number 807, whose remainder is 7. Location 7 is already full because of linear
probing, so 807 is stored, again by linear probing, in location 8. Now you can understand
how the numbers are stored in the array. You can also see a clustering effect. Look at
location 63: all the numbers with remainder 63 are clustered around it.
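The walk-through above can be sketched in C++ as follows. This is only a minimal model of what the animation appears to do, not the applet's actual code: the class name TwoPartLinearTable and its member names are hypothetical, keys are assumed to be non-negative, and deletion is not supported.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the animation's table: every location has two parts.
class TwoPartLinearTable {
public:
    explicit TwoPartLinearTable(int size) : size_(size), parts_(size) {
        assert(size > 0);
    }

    // Stores a non-negative key and returns the location used, or -1 if the table is full.
    int insert(int key) {
        assert(key >= 0);
        const int home = key % size_;               // hash function: x mod size
        for (int step = 0; step < size_; ++step) {
            const int loc = (home + step) % size_;  // linear probing: try the next location
            if (parts_[loc].size() < 2) {
                parts_[loc].push_back(key);
                return loc;
            }
        }
        return -1;
    }

    // Follows the same probe path that insert used.
    bool contains(int key) const {
        assert(key >= 0);
        const int home = key % size_;
        for (int step = 0; step < size_; ++step) {
            const int loc = (home + step) % size_;
            for (int stored : parts_[loc]) {
                if (stored == key) return true;
            }
            // No deletions are supported, so a location with a free part ends the search.
            if (parts_[loc].size() < 2) return false;
        }
        return false;
    }

private:
    int size_;
    std::vector<std::vector<int>> parts_;  // parts_[i] holds at most two keys
};
```

With a table of size 100, inserting 506, 206, 806, a fourth key with remainder 6 (say 906) and then 807 returns the locations 6, 6, 7, 7 and 8, matching the sequence described in the lecture.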
Let's change the collision resolution algorithm to quadratic probing and run the animation
again. Now the array size is 75 and the array is shown in three columns. Each location of
the array can again store two numbers. In quadratic probing, on successive collisions we
add the square of one (i.e. 1) to the home location first, then the square of two, and so
on. Here a different hash function is used: we take the number mod 75. When both parts of
an array location are filled, quadratic probing is used to place the next number. Analyze
the numbers and see where the collisions have happened.
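To make the probe order concrete, here is a small sketch that lists the locations quadratic probing would examine for a key. The function name quadraticProbeSequence and its parameters are assumptions made for illustration, and wrapping around the end of the array with mod is assumed as well.

```cpp
#include <cassert>
#include <vector>

// Locations examined for a key: home, home + 1*1, home + 2*2, ... (all mod tableSize).
std::vector<int> quadraticProbeSequence(int key, int tableSize, int attempts) {
    assert(tableSize > 0 && key >= 0);
    std::vector<int> locations;
    const int home = key % tableSize;  // hash function: x mod tableSize
    for (int i = 0; i < attempts; ++i) {
        locations.push_back((home + i * i) % tableSize);
    }
    return locations;
}
```

For example, with a table size of 75 the key 81 has home location 6, so the first four locations tried are 6, 7, 10 and 15. Compare this with linear probing, which would try 6, 7, 8 and 9.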
Let's see the animation using linked list chaining. Now the hash function takes the number
mod 50. At first no pointers are shown. When both parts of a location are filled, we see a
linked list appearing. We have four numbers with remainder 0. The first two are stored in
the array and the next two are stored in the linked list attached to location 0.
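A rough sketch of this arrangement is given below: two parts per array location, with a linked list taking the overflow. The names TwoPartChainedTable, Location and chainLength are hypothetical, keys are assumed to be non-negative, and the applet itself may be organized differently.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <vector>

class TwoPartChainedTable {
public:
    explicit TwoPartChainedTable(int size) : size_(size), table_(size) {
        assert(size > 0);
    }

    void insert(int key) {
        assert(key >= 0);
        Location& loc = table_[key % size_];  // hash function: x mod size
        if (loc.inArray.size() < 2) {
            loc.inArray.push_back(key);       // first two keys stay in the array
        } else {
            loc.chain.push_front(key);        // later keys go to the linked list
        }
    }

    bool contains(int key) const {
        assert(key >= 0);
        const Location& loc = table_[key % size_];
        for (int stored : loc.inArray) {
            if (stored == key) return true;
        }
        for (int stored : loc.chain) {
            if (stored == key) return true;
        }
        return false;
    }

    // Number of keys held in the linked list attached to one location.
    int chainLength(int index) const {
        const std::forward_list<int>& chain = table_[index].chain;
        return static_cast<int>(std::distance(chain.begin(), chain.end()));
    }

private:
    struct Location {
        std::vector<int> inArray;      // at most two keys
        std::forward_list<int> chain;  // overflow for this location
    };

    int size_;
    std::vector<Location> table_;
};
```

Inserting 0, 50, 100 and 150 into a table of size 50 leaves two keys in the array at location 0 and two in its linked list, as in the animation.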
We are not covering hashing in much depth here, as that is done in the algorithms and
analysis of algorithms domain, which is not part of this course. For the time being, we
will look at the uses of hashing. In certain situations the table ADT can be used, which
internally would use hashing.

Applications of Hashing
Let's see a few examples of applications where hashing is highly useful. Hashing can be
applied through the table ADT, or you can apply it directly to your own array to store and
retrieve data.
Compilers use hash tables to keep track of declared variables (symbol table).
A symbol table is an important part of the compilation process. The compiler puts variables
into the symbol table during this process and has to keep track of their different
attributes. The name of the variable, its type, its scope, the name of the function in
which it is declared and so on are put into the symbol table. The operations on a symbol
table are insertion of variable information, search for variable information and deletion
of a variable. Of these, insert and find are the most frequently used. You might have
understood already that the variable name is the parameter (the key) for these operations.
There is one slight problem: if you declare a variable named x outside a code block and
declare another variable of the same name inside that block, the name alone cannot be the
key, because scope is the only differentiating factor. Supposing that all the variables in
the program have unique names, the variable name can be used as the key. The compiler
inserts the variable information by calling the insert function and passing in that
information. It retrieves the information by passing in the variable name. This exercise
relates to your Compiler Construction course, where you will construct a compiler for your
own language.
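As an illustration only, a symbol table keyed on the variable name might look like the sketch below. It uses the standard std::unordered_map, which is a hash table. The names SymbolTable and VariableInfo and the choice of fields are assumptions, and the sketch relies on the simplification made above that every variable name in the program is unique.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical record of the attributes a compiler might keep per variable.
struct VariableInfo {
    std::string type;      // e.g. "int"
    std::string function;  // function in which the variable is declared
    int scopeLevel;        // nesting depth of the declaring block
};

class SymbolTable {
public:
    // Returns false if a variable with this name is already present.
    bool insert(const std::string& name, const VariableInfo& info) {
        return table_.emplace(name, info).second;
    }

    // Returns the stored information, or nullptr if the name is unknown.
    const VariableInfo* find(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, VariableInfo> table_;  // key: variable name
};
```

To handle the same name in different scopes, the key would have to combine the name with the scope, which this sketch does not attempt.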
Another use of hashing is given below:
A hash table can be used for on-line spelling checkers. If misspelling detection
(rather than correction) is important, an entire dictionary can be hashed and words
checked in constant time.
You must have used the spell checker in a word processing program like MS Word. The
spell checker finds mistakes, offers correct alternatives and prompts you to choose one
of the suggestions. You can also set it to correct words automatically.
Hashing can be used to find spelling mistakes. First you take all the words from an
English dictionary and construct a hash table of them. To find spelling mistakes, you take
each word of the text being checked in turn and look it up in the hash table. If the word
is not found in the hash table, there is a high probability that it is misspelled, although
there is a small probability that the word is correct but missing from the dictionary.
Based on the high probability, the application can tell the user that the word is wrong.
MS Word does the same. The automatic correction feature relies on
another algorithm, which we are not going to discuss here.
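The detection step can be sketched with the standard std::unordered_set, which is a hash table of keys. The function name findUnknownWords is hypothetical, and the sketch assumes that words are separated by whitespace and that case and punctuation have already been normalized.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Returns every word of text that is not present in the dictionary.
std::vector<std::string> findUnknownWords(const std::unordered_set<std::string>& dictionary,
                                          const std::string& text) {
    std::vector<std::string> unknown;
    std::istringstream words(text);
    std::string word;
    while (words >> word) {                 // take the words one by one
        if (dictionary.count(word) == 0) {  // one hash lookup per word
            unknown.push_back(word);
        }
    }
    return unknown;
}
```

Each word costs one lookup on average, however large the dictionary is. A word reported here is only probably misspelled, since it may simply be missing from the dictionary.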
Let's see a few more examples in this connection.
Game playing programs use hash tables to store seen positions, thereby saving
computation time if the position is encountered again.
Computer games are normally graphical, and positions are chosen either by the computer or
by the player. Consider a game of chess in which a player has reached the same position
again. Here we can use the arrangement of the pieces on the 64 squares at that time as the
key to store in the hash table. If our program wants to find out whether a player has
encountered the same situation before, it can pass the positions of the pieces to the
find function. Inside the function, the positions are hashed again and the previously
stored entry is found, which shows that the same situation has been encountered before.
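A minimal sketch of this idea follows. It assumes, purely for illustration, that a board is encoded as a string of 64 characters, one per square, and the class name SeenPositions is hypothetical. Real game programs usually store an evaluation alongside each position, which is omitted here.

```cpp
#include <string>
#include <unordered_set>

class SeenPositions {
public:
    // Returns true if this board was recorded earlier.
    // Otherwise it records the board and returns false.
    bool seenBefore(const std::string& board) {
        return !seen_.insert(board).second;  // insert reports whether the key was new
    }

private:
    std::unordered_set<std::string> seen_;   // key: the 64-character board encoding
};
```

The first call with a given board returns false and every later call with the same board returns true, so the program can skip recomputing a position it has already analyzed.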
See another example below:
Hash functions can be used to quickly check for inequality: if two elements hash to
different values, they must be different.
Sometimes in your applications you do not want to know which value is smaller or bigger;
you are only interested in whether the two are equal or not. For this, we can use
hashing. If the hash values of two data items are different, the items themselves must be
unequal. If the hash values are the same, the items may still differ (a collision), so a
full comparison is needed to confirm equality.
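The one-sided nature of this check can be seen in a short sketch. The hash function simpleHash below is a deliberately weak, hypothetical one (the sum of the characters mod 101), chosen only so that a collision is easy to demonstrate.

```cpp
#include <string>

// Hypothetical toy hash: sum of the character codes, mod 101.
unsigned simpleHash(const std::string& s) {
    unsigned sum = 0;
    for (unsigned char c : s) {
        sum += c;
    }
    return sum % 101;
}

// true  : the hash values differ, so a and b are certainly different.
// false : nothing is proven; a and b may be equal or may merely collide.
bool definitelyDifferent(const std::string& a, const std::string& b) {
    return simpleHash(a) != simpleHash(b);
}
```

Here "ab" and "ac" hash to different values, so they are reported as different without comparing them character by character. "ab" and "ba" hash to the same value although they differ, which is why equal hash values must still be followed by a full comparison.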
Those were situations in which hashing is useful. You may also like to know in what
circumstances hashing is not a good solution.

When is Hashing Suitable?


Hash tables are very good if there is a need for many searches in a reasonably stable
table.
We have just seen an excellent example of a reasonably stable hash table in the English
dictionary. We constructed a hash table of all the words in the dictionary and then looked
up different words in it. Lookups were frequent, while insertions were very rare.
Hash tables are not so good if there are many insertions and deletions, or if table
traversals are needed. In this case, AVL trees are better.
Some applications need to read and write data frequently. For these applications a hash
table might not be a good solution, and an AVL tree might be a better option. But bear in
mind that there are no hard and fast rules for when to choose a hash table and when an
AVL tree. You have to be a good software engineer to choose the relevant data structure.
Also, hashing is very slow for any operations which require the entries to be sorted
o e.g. Find the minimum key
At times, you perform the insert, delete and find operations but additionally require
the data in sorted order, or the minimum or maximum value. As we have discussed many
times, data is inserted into the array of a hash table without any sort order and is
scattered through the array in such a fashion that there are holes in the array. In these
circumstances, the hash table is not useful. You may remember from the animation earlier
in this lecture that the array was not filled in any real sequence. Some clusters formed
because of collisions, but there was no order as such. So hashing is not really useful
when sorted data is required.
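The difference shows up clearly when finding the minimum key. The sketch below contrasts the standard hash-based std::unordered_set with the ordered std::set. Note that std::set is typically implemented as a balanced search tree but not necessarily an AVL tree, so it serves here only as a stand-in for one. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <unordered_set>

// Hash table: keys are in no useful order, so every key must be examined.
int minimumOfHashTable(const std::unordered_set<int>& keys) {
    assert(!keys.empty());
    return *std::min_element(keys.begin(), keys.end());
}

// Ordered tree: keys are kept sorted, so the minimum is simply the first one.
int minimumOfOrderedSet(const std::set<int>& keys) {
    assert(!keys.empty());
    return *keys.begin();
}
```

Both functions return the same answer, but the first takes time proportional to the number of keys, while the second does not need to scan the data at all.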
This finishes our discussion of hashing. The important thing is how we thought about one
data structure and implemented it internally in six different ways. You will remember
that as long as the interface of a data structure remains the same, its internal
implementation does not really matter from the client's perspective. Occasionally,
somebody might want to know the internal implementation of your data structure because
that matters for how they use it.
Let's move on to the next topic, sorting. It is a very vast topic and cannot be covered
thoroughly in this course.

Sorting
Sorting means putting data in a certain order or sequence. We have touched on sorting
under various topics scattered through this course, but it has not so far been discussed
as a separate topic. You will remember that when we traverse a binary search tree
in-order, the data comes out sorted. Similarly, we saw other data structures in which we
kept data in sorted order. In the case of a min-heap, if we keep removing elements one
by one, we get the data in sorted order.
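The min-heap observation can be tried out with the standard std::priority_queue. This is only a sketch: the function name sortedByMinHeap is hypothetical, and std::greater is supplied because std::priority_queue is a max-heap by default.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Builds a min-heap from the data, then removes the minimum repeatedly.
std::vector<int> sortedByMinHeap(const std::vector<int>& data) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap(data.begin(), data.end());
    std::vector<int> sorted;
    while (!heap.empty()) {
        sorted.push_back(heap.top());  // the smallest remaining element
        heap.pop();
    }
    return sorted;
}
```

Calling it with the values 20, 10, 30 and 5 yields 5, 10, 20, 30.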
Sorting is so useful that it is present, in one form or another, in 80-90% of computer
applications. Normally, sorting and searching go together. A lot of research has been done
on sorting, and you can find plenty of material on it from different sources. Very
efficient algorithms have already been developed for it. Moreover, extensive mathematical
analysis of these algorithms has been performed. If you want to learn how such analyses
are carried out and which mathematical tools and procedures they employ, sorting is a
very useful topic for you.

Sorting Integers
How to sort integers in this array?

Fig 43.1: an array of integers to be sorted (figure not reproduced)
We want to sort the numbers given in the above array. This operation may seem very
simple at first. But if you are given a very large volume of data (maybe millions of
numbers), you will realize that there has to be an efficient mechanism to perform it.
Firstly, let's put the problem in words:
We have a very large array of numbers. We want to sort the numbers in the array in
ascending order, so that the smallest number is the first element of the array and the
largest number is the last.
Let's go to the algorithms of sorting. Note that we are going to study sorting
algorithms; we are not talking about data structures. By now you may have realized that
algorithms go along with data structures. We use a data structure to hold data and we
use algorithms to perform certain operations or actions on that data.

Elementary Sorting Algorithms


Selection Sort
Insertion Sort
Bubble Sort
These algorithms are classed as elementary because they are very simple. They will act
as our baseline, and we will compare other algorithms against them in order to find a
better algorithm.

Selection Sort
Main idea:
o find the smallest element
o put it in the first position
o find the next smallest element
o put it in the second position

And so on, until you get to the end of the list

This technique is so simple that you might have discovered it yourself already. You
search the whole array and find the smallest number. The smallest number is put in the
first position of the array, while the element previously in that position is moved
elsewhere (typically it is swapped into the position the smallest number came from). Then
you find the second smallest number and put it in the second position of the array; again
the number previously in that position is shifted elsewhere. We keep performing this
activity and eventually the array is sorted. The technique is called selection sort
because we select elements for their sorted positions.
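The procedure described above can be sketched in C++ as follows. This is one common way of writing selection sort, given here as an illustration rather than as the course's reference code. It assumes the displaced element is handled by swapping, and the function name selectionSort is our own.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts the array in ascending order by repeatedly selecting the smallest remaining element.
void selectionSort(std::vector<int>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t smallest = i;  // index of the smallest element seen in a[i..end]
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            if (a[j] < a[smallest]) {
                smallest = j;
            }
        }
        if (smallest != i) {
            std::swap(a[i], a[smallest]);  // the displaced element moves to the vacated slot
        }
    }
}
```

After the first pass the smallest number is in position 0, after the second pass the second smallest is in position 1, and so on. Each pass scans the whole unsorted part, so the number of comparisons grows with the square of the array size.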
In the next lecture, we will see how we can optimize this sorting operation. Meanwhile,
read about sorting in your textbooks and on the Internet.
