
Data structures

Array and string
Binary heap
Binary search tree (BST)
Dynamic array
Graph
Hash table
Singly-linked list

Abstract data types


Dictionary ADT
Priority queue ADT
Stack ADT (impl.: array-based stack implementation)

Array and string


Array is a very basic data structure representing a group of similar elements, accessed by index. The array data structure can be stored efficiently inside the computer and provides fast access to all of its elements. Let us look at the advantages and drawbacks of arrays.
Advantages

No overhead per element. Any element of an array can be accessed in O(1) time by its index.

Drawbacks

The array data structure is not completely dynamic. Many programming languages provide a way to allocate arrays of arbitrary size (dynamically allocated arrays), but when this space is used up, a new array of greater size must be allocated and the old data copied to it. Insertion and deletion of an element in an array requires shifting O(n) elements on average, where n is the size of the array.

Static and dynamically-allocated arrays

There are two types of arrays, which differ in the method of allocation. A static array has constant size and exists for the whole time the application is being executed. A dynamically allocated array is created during the program run and may be deleted when it is no longer needed. Dynamically allocated arrays can be quite large, even bigger than the amount of physical memory. Still, a dynamically allocated array cannot be resized. But you can expand an array as noted below:
1. Create a new array of bigger size.
2. Copy data from the old array to the new one.
3. Free the memory occupied by the old array.
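As a rough illustration of these three steps in Java (the array names and the growth factor here are arbitrary, just for the sketch; in Java the old storage is reclaimed by the garbage collector rather than freed explicitly):

int[] oldArr = {1, 2, 3};                               // the array that ran out of space
int[] newArr = new int[oldArr.length * 2];              // 1. create a new array of bigger size
System.arraycopy(oldArr, 0, newArr, 0, oldArr.length);  // 2. copy data from the old array to the new one
oldArr = newArr;                                        // 3. the old array becomes unreachable and is garbage-collected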

Fixed-size and dynamic arrays


As mentioned above, arrays can't be resized; such an array is called a fixed-size array. But we can use a simple trick to construct a dynamic array, which can be resized. The idea is simple. Let us allocate some space for the dynamic array and imaginarily divide it into two parts. One part contains the data and the other one is free space. When a new element is added, the free space is reduced, and vice versa. This approach results in some overhead for the free space, but we keep all the advantages of arrays and gain the ability to change the size dynamically. We present some definitions for this kind of array below. A dynamic array has a capacity, which is the maximum number of elements it can contain. It also has a logical size, which indicates how many elements it actually contains. For instance, suppose we would like to find the minimum of the values the user enters. We allocate space to store 15 elements, but the user enters only 5 numbers. In this example, the capacity of the array is 15 elements, but the logical size is 5 elements. When a dynamic array becomes full, it must be expanded by creating a new, larger array and copying elements from the old array to the new one. Notice that copying arrays is supported by the hardware and can be done very efficiently.

Example. Dynamic array with capacity 10 and logical size 5 (the first five cells hold the data, the rest is unused free space):

1 5 7 -8 4 | 0 -49 15 86 46

NB. Dynamic arrays are most often also dynamically allocated, so that they can be expanded. More information about dynamic arrays and their implementation can be found here: Dynamic Array.

Connection with strings


We consider null-terminated strings here. Strings are similar to dynamic arrays, but their logical size is indicated by the null character. Therefore, the capacity is always one element more than the maximum logical size. The logical size of a string is called its length. Example. The ASCII string "Hello!", as represented inside the computer.

H    e    l    l    o    !    \0
72   101  108  108  111  33   0

Code snippets
The sample program finds the minimal value among the entered values. Note that Java allows only dynamically allocated arrays.
Java
import java.util.Scanner;

public class Arrays {
    public static void main(String[] args) {
        Scanner keyboard = new Scanner(System.in);
        // dynamically allocated array
        int arr[] = new int[15];
        int n = 0;
        int value = 0;
        System.out.println("Enter values. Type \"-1\" to stop: ");
        while (n < 15 && value != -1) {
            value = keyboard.nextInt();
            keyboard.nextLine();
            if (value != -1) {
                arr[n] = value;
                n++;
            }
        }
        if (n == 0) {
            System.out.println("You have entered no values, bye!");
        } else {
            int minimum = arr[0];
            for (int i = 1; i < n; i++) {
                if (arr[i] < minimum)
                    minimum = arr[i];
            }
            System.out.print("The minimal value is " + minimum);
        }
    }
}

C++
#include <iostream>

using namespace std;

int main() {
    // static array
    int arr[15];
    int n = 0;
    int value = 0;
    cout << "Enter values. Type \"-1\" to stop: ";
    while (n < 15 && value != -1) {
        cin >> value;
        if (value != -1) {
            arr[n] = value;
            n++;
        }
    }
    if (n == 0) {
        cout << "You have entered no values, bye!";
    } else {
        int minimum = arr[0];
        for (int i = 1; i < n; i++) {
            if (arr[i] < minimum)
                minimum = arr[i];
        }
        cout << "The minimal value is " << minimum;
    }
    return 0;
}

Binary heap
There are several types of heaps, but in the current article we are going to discuss the binary heap. For short, let's call it just "heap". It is used to implement the priority queue ADT and in the heapsort algorithm. A heap is a complete binary tree which satisfies the heap property.

Complete binary tree

A binary tree is said to be complete if all its levels, except possibly the deepest, are full. Moreover, an incomplete bottom level can't have "holes": it has to be filled from the very left node up to some node in the middle. See the illustrations below.
Correct example of a complete binary tree

Incorrect case, middle level is incomplete

Incorrect case, bottom level has a "hole"

Height of a complete binary tree is O(log n).

Heap property

There are two possible types of binary heaps: max heap and min heap. The difference is that the root of a min heap contains the minimal element and vice versa. Priority queues often deal with min heaps, whereas the heapsort algorithm, when sorting in ascending order, uses a max heap.
Heap property for min heap

For every node in the heap, the node's value is less than or equal to the values of its children.

Heap property for max heap

For every node in the heap, the node's value is greater than or equal to the values of its children.

To keep things simple, in the articles below we consider the min-heap only.
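Heap operations themselves are not covered on this page, but the following minimal Java sketch (an illustration only, not the implementation referenced elsewhere) shows how the min-heap property is kept while inserting into an array-backed heap:

import java.util.Arrays;

public class MinHeap {
    private int[] storage = new int[16]; // element i has its parent at (i - 1) / 2
    private int size = 0;

    public void insert(int value) {
        if (size == storage.length)
            storage = Arrays.copyOf(storage, size * 2); // grow like a dynamic array
        storage[size] = value;
        int i = size++;
        // sift the new value up while it is smaller than its parent (min-heap property)
        while (i > 0 && storage[i] < storage[(i - 1) / 2]) {
            int parent = (i - 1) / 2;
            int tmp = storage[i];
            storage[i] = storage[parent];
            storage[parent] = tmp;
            i = parent;
        }
    }

    public int minimum() { // the root of a min heap is the minimal element
        if (size == 0)
            throw new RuntimeException("Heap is empty");
        return storage[0];
    }
}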

Binary search tree


First of all, a binary search tree (BST) is a dynamic data structure, which means that its size is limited only by the amount of free memory in the operating system, and the number of elements may vary during the program run. The main advantage of binary search trees is rapid search, while addition is quite cheap. Let us see a more formal definition of a BST. A binary search tree is a data structure that meets the following requirements:

it is a binary tree;
each node contains a value;
a total order is defined on these values (every two values can be compared with each other);
the left subtree of a node contains only values less than the node's value;
the right subtree of a node contains only values greater than the node's value.

Notice that the definition above doesn't allow duplicates.

Example of a binary search tree

What are binary search trees used for?


Binary search trees are used to construct the map data structure. In practice, data can often be associated with some unique key. For instance, in a phone book such a key is the telephone number. Storing such data in a binary search tree allows looking up a record by key faster than if it were stored in an unordered list. Also, a BST can be utilized to construct the set data structure, which stores an unordered collection of unique values and supports operations on such collections. Performance of a binary search tree depends on its height. In order to keep a tree balanced and minimize its height, the idea of binary search trees was advanced into balanced search trees (AVL trees, Red-Black trees, Splay trees). Here we will discuss the basic ideas lying at the foundation of binary search trees.

Implementation
See how a binary search tree is represented inside the computer.

Operations on a BST

Add a new value
Search for a value
Remove a value
Get values from a BST in order

Binary search tree. Adding a value


Adding a value to a BST can be divided into two stages:

search for a place to put the new element;
insert the new element into this place.

Let us see these stages in more detail.

Search for a place

At this stage the algorithm follows the binary search tree property. If the new value is less than the current node's value, go to the left subtree, otherwise go to the right subtree. Following this simple rule, the algorithm reaches a node which has no left or right subtree in the needed direction. By the moment a place for insertion is found, we can say for sure that the new value has no duplicate in the tree. Initially, the new node has no children, so it is a leaf. Let us see it in the picture. Gray circles indicate possible places for a new node.

Now, let's go down to the algorithm itself. Here, and in almost every operation on a BST, recursion is utilized. Starting from the root:

1. Check whether the value in the current node and the new value are equal. If so, a duplicate is found. Otherwise,
2. if the new value is less than the node's value:
   - if the current node has no left child, the place for insertion has been found;
   - otherwise, handle the left child with the same algorithm.
3. If the new value is greater than the node's value:
   - if the current node has no right child, the place for insertion has been found;
   - otherwise, handle the right child with the same algorithm.

Just before the code snippets, let us have a look at an example demonstrating insertion into a binary search tree.

Example

Insert 4 to the tree, shown above.

Code snippets
The only difference between the algorithm above and the real routine is that first we should check if the root exists. If not, just create it and don't run the common algorithm for this special case. This can be done in the BinarySearchTree class. The principal algorithm is implemented in the BSTNode class.
Java
public class BinarySearchTree {
    public boolean add(int value) {
        if (root == null) {
            root = new BSTNode(value);
            return true;
        } else
            return root.add(value);
    }
}

public class BSTNode {
    public boolean add(int value) {
        if (value == this.value)
            return false;
        else if (value < this.value) {
            if (left == null) {
                left = new BSTNode(value);
                return true;
            } else
                return left.add(value);
        } else if (value > this.value) {
            if (right == null) {
                right = new BSTNode(value);
                return true;
            } else
                return right.add(value);
        }
        return false;
    }
}

C++
bool BinarySearchTree::add(int value) {
    if (root == NULL) {
        root = new BSTNode(value);
        return true;
    } else
        return root->add(value);
}

bool BSTNode::add(int value) {
    if (value == this->value)
        return false;
    else if (value < this->value) {
        if (left == NULL) {
            left = new BSTNode(value);
            return true;
        } else
            return left->add(value);
    } else if (value > this->value) {
        if (right == NULL) {
            right = new BSTNode(value);
            return true;
        } else
            return right->add(value);
    }
    return false;
}


Binary search tree. Lookup operation


Searching for a value in a BST is very similar to the add operation. The search algorithm traverses the tree "in depth", choosing the appropriate way to go following the binary search tree property, and compares the value of each visited node with the one we are looking for. The algorithm stops in two cases:

a node with the necessary value is found;
the algorithm has no way to go.

Search algorithm in detail

Now, let's see a more detailed description of the search algorithm. Like the add operation, and almost every operation on a BST, the search algorithm utilizes recursion. Starting from the root:

1. Check whether the value in the current node and the searched value are equal. If so, the value is found. Otherwise,
2. if the searched value is less than the node's value:
   - if the current node has no left child, the searched value doesn't exist in the BST;
   - otherwise, handle the left child with the same algorithm.
3. If the searched value is greater than the node's value:
   - if the current node has no right child, the searched value doesn't exist in the BST;
   - otherwise, handle the right child with the same algorithm.

Just before the code snippets, let us have a look at an example demonstrating a search for a value in the binary search tree.

Example

Search for 3 in the tree, shown above.

Code snippets
As in the add operation, first check if the root exists. If not, the tree is empty and, therefore, the searched value doesn't exist in the tree. This check can be done in the BinarySearchTree class. The principal algorithm is implemented in the BSTNode class.

Java
public class BinarySearchTree {
    public boolean search(int value) {
        if (root == null)
            return false;
        else
            return root.search(value);
    }
}

public class BSTNode {
    public boolean search(int value) {
        if (value == this.value)
            return true;
        else if (value < this.value) {
            if (left == null)
                return false;
            else
                return left.search(value);
        } else if (value > this.value) {
            if (right == null)
                return false;
            else
                return right.search(value);
        }
        return false;
    }
}

C++
bool BinarySearchTree::search(int value) {
    if (root == NULL)
        return false;
    else
        return root->search(value);
}

bool BSTNode::search(int value) {
    if (value == this->value)
        return true;
    else if (value < this->value) {
        if (left == NULL)
            return false;
        else
            return left->search(value);
    } else if (value > this->value) {
        if (right == NULL)
            return false;
        else
            return right->search(value);
    }
    return false;
}


Binary search tree. Removing a node


The remove operation on a binary search tree is more complicated than add and search. Basically, it can be divided into two stages:

search for a node to remove; if the node is found, run remove algorithm.

Remove algorithm in detail

Now, let's see a more detailed description of the remove algorithm. The first stage is identical to the algorithm for lookup, except that we should also track the parent of the current node. The second part is more tricky. There are three cases, which are described below.
1. Node to be removed has no children.

This case is quite simple. The algorithm sets the corresponding link of the parent to NULL and disposes of the node. Example. Remove -4 from the BST.

2. Node to be removed has one child.

In this case, the node is cut from the tree and the algorithm links its single child (with its subtree) directly to the parent of the removed node. Example. Remove 18 from the BST.

3. Node to be removed has two children.

This is the most complex case. To solve it, let us see one useful BST property first. We are going to use the idea that the same set of values may be represented as different binary search trees. For example, these BSTs:

contain the same values {5, 19, 21, 25}. To transform the first tree into the second one, we can do the following:

- choose the minimum element from the right subtree (19 in the example);
- replace 5 by 19;
- hang 5 as a left child.

The same approach can be utilized to remove a node which has two children:

- find the minimum value in the right subtree;
- replace the value of the node to be removed with the found minimum. Now the right subtree contains a duplicate!
- apply remove to the right subtree to remove the duplicate.

Notice that the node with the minimum value has no left child and, therefore, its removal may result in the first or second case only. Example. Remove 12 from the BST.

Find the minimum element in the right subtree of the node to be removed. In the current example it is 19.

Replace 12 with 19. Notice that only values are replaced, not nodes. Now we have two nodes with the same value.

Remove 19 from the right subtree.

Code snippets
First, check if the root exists. If not, the tree is empty and, therefore, the value that should be removed doesn't exist in the tree. Then check whether the root value is the one to be removed. This is a special case and there are several approaches to solve it. We propose the dummy root method: a dummy root node is created and the real root is hung onto it as a left child. When the removal is done, the root link is set to the left child of the dummy root.

In languages without automatic garbage collection (e.g., C++) the removed node must be disposed of. For this purpose, the remove method in the BSTNode class returns not a boolean value but a link to the removed node, and the memory is freed in the BinarySearchTree class.
Java
public class BinarySearchTree {
    public boolean remove(int value) {
        if (root == null)
            return false;
        else {
            if (root.getValue() == value) {
                BSTNode auxRoot = new BSTNode(0);
                auxRoot.setLeftChild(root);
                boolean result = root.remove(value, auxRoot);
                root = auxRoot.getLeft();
                return result;
            } else {
                return root.remove(value, null);
            }
        }
    }
}

public class BSTNode {
    public boolean remove(int value, BSTNode parent) {
        if (value < this.value) {
            if (left != null)
                return left.remove(value, this);
            else
                return false;
        } else if (value > this.value) {
            if (right != null)
                return right.remove(value, this);
            else
                return false;
        } else {
            if (left != null && right != null) {
                this.value = right.minValue();
                right.remove(this.value, this);
            } else if (parent.left == this) {
                parent.left = (left != null) ? left : right;
            } else if (parent.right == this) {
                parent.right = (left != null) ? left : right;
            }
            return true;
        }
    }

    public int minValue() {
        if (left == null)
            return value;
        else
            return left.minValue();
    }
}

C++
bool BinarySearchTree::remove(int value) {
    if (root == NULL)
        return false;
    else {
        if (root->getValue() == value) {
            BSTNode auxRoot(0);
            auxRoot.setLeftChild(root);
            BSTNode* removedNode = root->remove(value, &auxRoot);
            root = auxRoot.getLeft();
            if (removedNode != NULL) {
                delete removedNode;
                return true;
            } else
                return false;
        } else {
            BSTNode* removedNode = root->remove(value, NULL);
            if (removedNode != NULL) {
                delete removedNode;
                return true;
            } else
                return false;
        }
    }
}

BSTNode* BSTNode::remove(int value, BSTNode *parent) {
    if (value < this->value) {
        if (left != NULL)
            return left->remove(value, this);
        else
            return NULL;
    } else if (value > this->value) {
        if (right != NULL)
            return right->remove(value, this);
        else
            return NULL;
    } else {
        if (left != NULL && right != NULL) {
            this->value = right->minValue();
            return right->remove(this->value, this);
        } else if (parent->left == this) {
            parent->left = (left != NULL) ? left : right;
            return this;
        } else if (parent->right == this) {
            parent->right = (left != NULL) ? left : right;
            return this;
        }
        return NULL; // unreachable, but keeps every path returning a value
    }
}

int BSTNode::minValue() {
    if (left == NULL)
        return value;
    else
        return left->minValue();
}

Binary search tree. List values in order


To construct an algorithm listing a BST's values in order, let us recall the binary search tree property:

the left subtree of a node contains only values less than the node's value;
the right subtree of a node contains only values greater than the node's value.

The algorithm looks as follows:

1. get values in order from the left subtree;
2. get values in order from the right subtree;
3. the result for the current node is (result for left subtree) join (current node's value) join (result for right subtree).

Running this algorithm recursively, starting from the root, we'll get the result for the whole tree. Let us see an example of the algorithm described above.

Example
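This page gives no code snippet for the in-order listing, so here is a minimal Java sketch, assuming the BSTNode class (with value, left and right fields) from the snippets above; printing stands in for joining the value to the result:

public class BSTNode {
    public void printInOrder() {
        if (left != null)
            left.printInOrder();       // values from the left subtree come first
        System.out.print(value + " "); // then the current node's value
        if (right != null)
            right.printInOrder();      // then values from the right subtree
    }
}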

Dynamic arrays
One of the problems occurring when working with the array data structure is that its size cannot be changed during the program run. There is no straightforward solution, but we can encapsulate capacity management.

Internal representation

The idea is simple. The application allocates some amount of memory and logically divides it into two parts. One part contains the data and the other one is free space. Initially all allocated space is free. As the data structure operates, the boundary between the used and free parts changes. If there is no more free space to use, the storage is expanded by creating a new array of larger size and copying the old contents to the new location. The dynamic array data structure has the following fields:

storage: dynamically allocated space to store the data;
capacity: the size of the storage;
size: the size of the real data.

Capacity management: Ensure Capacity, Pack


Capacity management mechanism should be developed first, before we can add or remove values. The mechanism consists of two functions: ensure capacity and pack.

Ensure capacity
Before a value (or several values) is added, we should ensure that we have enough capacity to store it. Do the following steps:

- check if the current capacity is enough to store the new items; if not,
- calculate the new capacity by the formula: newCapacity = (oldCapacity * 3) / 2 + 1. The algorithm makes a reserve of free space in order not to resize the storage too often;
- check if the new capacity is enough to store all new items and, if not, increase it to store the exact amount of items;
- allocate new storage and copy the contents of the old one into it;
- deallocate the old storage (in C++);
- change the capacity value.

The enlargement coefficient can be chosen arbitrarily (but it should be greater than one). The proposed value is 1.5, and it is optimal on average.

Example. capacity = 6, size = 6, want to add 1 new item. The new capacity is (6 * 3) / 2 + 1 = 10.

Pack
When items are removed, the amount of free space increases. If there are too few values in the dynamic array, the unused storage becomes just a waste of space. To save space, we develop a mechanism to reduce the capacity when it is excessive.

- check if the size is less than or equal to half of the capacity; if so,
- calculate the new capacity by the formula: newCapacity = (size * 3) / 2 + 1. The algorithm leaves exactly the amount of space as if the storage capacity had been trimmed to the size and then the ensure capacity method had been called;
- allocate new storage and copy the contents of the old one into it;
- deallocate the old storage (in C++);
- change the capacity value.

Example. capacity = 12, size = 6, do packing. The new capacity is (6 * 3) / 2 + 1 = 10.

The lower boundary for the size, after which packing is done, may vary. In the current example it is 0.5 of the capacity value. Commonly, pack is a private method which is called after removal. Also, the dynamic array interface provides a trim method, which reduces the capacity to fit the exact amount of items in the array. It is called from outside the implementation, when you are sure that no more values will be added (for instance, the input from the user is over).

Code snippets
Both Java and C++ provide efficient tools to copy memory, which are used in the implementations below.
Java
import java.util.Arrays;

public class DynamicArray {
    public void ensureCapacity(int minCapacity) {
        int capacity = storage.length;
        if (minCapacity > capacity) {
            int newCapacity = (capacity * 3) / 2 + 1;
            if (newCapacity < minCapacity)
                newCapacity = minCapacity;
            storage = Arrays.copyOf(storage, newCapacity);
        }
    }

    private void pack() {
        int capacity = storage.length;
        if (size <= capacity / 2) {
            int newCapacity = (size * 3) / 2 + 1;
            storage = Arrays.copyOf(storage, newCapacity);
        }
    }

    public void trim() {
        int newCapacity = size;
        storage = Arrays.copyOf(storage, newCapacity);
    }
}

C++
#include <cstring>

void DynamicArray::setCapacity(int newCapacity) {
    int *newStorage = new int[newCapacity];
    memcpy(newStorage, storage, sizeof(int) * size);
    capacity = newCapacity;
    delete[] storage;
    storage = newStorage;
}

void DynamicArray::ensureCapacity(int minCapacity) {
    if (minCapacity > capacity) {
        int newCapacity = (capacity * 3) / 2 + 1;
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;
        setCapacity(newCapacity);
    }
}

void DynamicArray::pack() {
    if (size <= capacity / 2) {
        int newCapacity = (size * 3) / 2 + 1;
        setCapacity(newCapacity);
    }
}

void DynamicArray::trim() {
    int newCapacity = size;
    setCapacity(newCapacity);
}

Data access functions: Set, Get, InsertAt, RemoveAt


The dynamic array data structure encapsulates the underlying storage, but the interface must provide access functions to work with it. We can also add a range check to the access functions.

Range check
There is not much to say about the range check. The algorithm checks whether the index is inside the 0..size-1 range and, if not, throws an exception.

Get and set


After we have ensured that the index is inside the proper range, we write a value to the storage or read a value from it.

InsertAt
This operation may require expanding the array, so the algorithm invokes the ensure capacity method first, which should ensure a minimal capacity of size + 1. Then shift all elements from i to size - 1, where i is the insertion position, one element to the right. Note that if the new element is inserted after the last element in the array, no shifting is required. After shifting, put the value into the i-th element and increase the size by one.

RemoveAt
Shift all elements from i to size - 1, where i is the removal position, one element to the left. Then decrease the size by 1 and invoke the pack operation. Packing is done if there are too few elements left after the removal.

Code snippets
Java
public class DynamicArray {
    private void rangeCheck(int index) {
        if (index < 0 || index >= size)
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
    }

    public void set(int index, int value) {
        rangeCheck(index);
        storage[index] = value;
    }

    public int get(int index) {
        rangeCheck(index);
        return storage[index];
    }

    public void removeAt(int index) {
        rangeCheck(index);
        int moveCount = size - index - 1;
        if (moveCount > 0)
            System.arraycopy(storage, index + 1, storage, index, moveCount);
        size--;
        pack();
    }

    public void insertAt(int index, int value) {
        if (index < 0 || index > size)
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        ensureCapacity(size + 1);
        int moveCount = size - index;
        if (moveCount > 0)
            System.arraycopy(storage, index, storage, index + 1, moveCount);
        storage[index] = value;
        size++;
    }
}

C++
#include <cstring>
#include <exception>

void DynamicArray::rangeCheck(int index) {
    if (index < 0 || index >= size)
        throw "Index out of bounds!";
}

void DynamicArray::set(int index, int value) {
    rangeCheck(index);
    storage[index] = value;
}

int DynamicArray::get(int index) {
    rangeCheck(index);
    return storage[index];
}

void DynamicArray::removeAt(int index) {
    rangeCheck(index);
    int moveCount = size - index - 1;
    if (moveCount > 0)
        memmove(storage + index, storage + (index + 1), sizeof(int) * moveCount);
    size--;
    pack();
}

void DynamicArray::insertAt(int index, int value) {
    if (index < 0 || index > size)
        throw "Index out of bounds!";
    ensureCapacity(size + 1);
    int moveCount = size - index;
    if (moveCount != 0)
        memmove(storage + index + 1, storage + index, sizeof(int) * moveCount);
    storage[index] = value;
    size++;
}

Introduction to graphs
Graphs are a widely used structure in computer science and in many computer applications. We don't say "data structure" here, and there is a difference. Graphs are meant to store and analyze metadata: the connections present in data. For instance, consider the cities in your country. The road network which connects them can be represented as a graph and then analyzed. We can examine whether one city can be reached from another, or find the shortest route between two cities. First of all, we introduce some definitions on graphs. Next, we are going to show how graphs are represented inside a computer. Then you can turn to basic graph algorithms. There are two important sets of objects which specify a graph and its structure. The first set is V, which is called the vertex set. In the example with the road network, cities are vertices. Each vertex can be drawn as a circle with the vertex's number inside.

vertices

The next important set is E, which is called the edge set. E is a subset of V x V. Simply speaking, each edge connects two vertices, including the case when a vertex is connected to itself (such an edge is called a loop). All graphs are divided into two big groups: directed and undirected graphs. The difference is that edges in directed graphs, called arcs, have a direction. These kinds of graphs have much in common with each other, but there are also significant differences. We will point out which kind of graph is considered in each particular algorithm description. An edge can be drawn as a line. If a graph is directed, each line has an arrow.

undirected graph

directed graph

Now, we present some basic graph definitions.

A sequence of vertices, such that there is an edge from each vertex to the next one in the sequence, is called a path. The first vertex in the path is called the start vertex; the last vertex in the path is called the end vertex. If the start and end vertices are the same, the path is called a cycle. A path is called simple if it includes every vertex only once. A cycle is called simple if it includes every vertex, except the start (end) one, only once. Let's see examples of a path and a cycle.

path (simple)

cycle (simple)

The last definition we give here is a weighted graph. A graph is called weighted if every edge is associated with a real number, called the edge weight. For instance, in the road network example, the weight of each road may be its length or the minimal time needed to drive along it.

weighted graph

Undirected graphs

Internal representation
Depth-first search (DFS)

Undirected graphs representation


There are several possible ways to represent a graph inside the computer. We will discuss two of them: adjacency matrix and adjacency list.

Adjacency matrix
Each cell aij of the adjacency matrix contains 1 if there is an edge between the i-th and j-th vertices, and 0 otherwise. Before discussing the advantages and disadvantages of this kind of representation, let us see an example.

Graph

Adjacency matrix

Edge (2, 5)

Cells for the edge (2, 5)

Edge (1, 3)

Cells for the edge (1, 3)

The graph presented in the example is undirected. This means that its adjacency matrix is symmetric. Indeed, in an undirected graph, if there is an edge (2, 5) then there is also an edge (5, 2). This is also the reason why there are two cells for every edge in the sample. Loops, if they are allowed in a graph, correspond to the diagonal elements of the adjacency matrix. Advantages. The adjacency matrix is very convenient to work with. Adding (removing) an edge can be done in O(1) time, and the same time is required to check if there is an edge between two vertices. It is also very simple to program, and in all our graph tutorials we are going to work with this kind of representation. Disadvantages.

The adjacency matrix consumes a huge amount of memory for storing big graphs. All graphs can be divided into two categories, sparse and dense graphs. Sparse ones contain not many edges (the number of edges is much less than the square of the number of vertices, |E| << |V|²). On the other hand, dense graphs contain a number of edges comparable to the square of the number of vertices. The adjacency matrix is optimal for dense graphs, but for sparse ones it is superfluous. The next drawback of the adjacency matrix is that in many algorithms you need to know the edges adjacent to the current vertex. To draw such information out of the adjacency matrix you have to scan over the corresponding row, which results in O(|V|) complexity. For algorithms like DFS, or those based on it, use of the adjacency matrix results in an overall complexity of O(|V|²), while it can be reduced to O(|V| + |E|) when using an adjacency list. The last disadvantage we want to draw your attention to is that the adjacency matrix requires huge efforts for adding/removing a vertex. If a graph is used for analysis only, this is not a problem, but if you want to construct a fully dynamic structure, using an adjacency matrix makes it quite slow for big graphs.

To sum up, the adjacency matrix is a good solution for dense graphs, which implies having a constant number of vertices.

Adjacency list
This kind of graph representation is one of the alternatives to the adjacency matrix. It requires less memory and, in particular situations, can even outperform the adjacency matrix. For every vertex, the adjacency list stores a list of the vertices adjacent to the current one. Let us see an example.

Graph

Adjacency list

Vertices, adjacent to {2}

Row in the adjacency list

Advantages. The adjacency list allows us to store a graph in a more compact form than the adjacency matrix, but the difference decreases as the graph becomes denser. The next advantage is that the adjacency list allows getting the list of adjacent vertices in O(1) time, which is a big advantage for some algorithms.

Disadvantages.

Adding/removing an edge to/from the adjacency list is not as easy as for the adjacency matrix. It requires, on average, O(|E| / |V|) time, which may result in cubic complexity for dense graphs when adding all edges. Checking if there is an edge between two vertices can be done in O(|E| / |V|) when the list of adjacent vertices is unordered, or O(log2(|E| / |V|)) when it is sorted. This operation stays quite cheap. The adjacency list doesn't allow us to make an efficient implementation if a dynamically changing number of vertices is required. Adding a new vertex can be done in O(|V|), but removal results in O(|E|) complexity.

To sum up, the adjacency list is a good solution for sparse graphs and lets us change the number of vertices more efficiently than when using an adjacency matrix. But still there are better solutions to store fully dynamic graphs.
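The graph tutorials below stick to the adjacency matrix, but for illustration, a minimal adjacency-list sketch in Java (class and method names here are assumptions, not part of the tutorials) could look like this:

import java.util.ArrayList;
import java.util.List;

public class AdjacencyListGraph {
    private List<List<Integer>> adjacencyList = new ArrayList<>();

    public AdjacencyListGraph(int vertexCount) {
        for (int i = 0; i < vertexCount; i++)
            adjacencyList.add(new ArrayList<Integer>()); // one list of neighbours per vertex
    }

    public void addEdge(int i, int j) {
        adjacencyList.get(i).add(j); // undirected: store the edge in both lists
        adjacencyList.get(j).add(i);
    }

    public List<Integer> neighbours(int i) {
        return adjacencyList.get(i); // adjacent vertices are available directly
    }
}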

Code snippets
For reasons of simplicity, we show here code snippets only for the adjacency matrix, which is used throughout our graph tutorials. Notice that it is an implementation for undirected graphs.
Java
public class Graph {
    private boolean adjacencyMatrix[][];
    private int vertexCount;

    public Graph(int vertexCount) {
        this.vertexCount = vertexCount;
        adjacencyMatrix = new boolean[vertexCount][vertexCount];
    }

    public void addEdge(int i, int j) {
        if (i >= 0 && i < vertexCount && j >= 0 && j < vertexCount) {
            adjacencyMatrix[i][j] = true;
            adjacencyMatrix[j][i] = true;
        }
    }

    public void removeEdge(int i, int j) {
        if (i >= 0 && i < vertexCount && j >= 0 && j < vertexCount) {
            adjacencyMatrix[i][j] = false;
            adjacencyMatrix[j][i] = false;
        }
    }

    public boolean isEdge(int i, int j) {
        if (i >= 0 && i < vertexCount && j >= 0 && j < vertexCount)
            return adjacencyMatrix[i][j];
        else
            return false;
    }
}

C++
class Graph {
private:
    bool** adjacencyMatrix;
    int vertexCount;

public:
    Graph(int vertexCount) {
        this->vertexCount = vertexCount;
        adjacencyMatrix = new bool*[vertexCount];
        for (int i = 0; i < vertexCount; i++) {
            adjacencyMatrix[i] = new bool[vertexCount];
            for (int j = 0; j < vertexCount; j++)
                adjacencyMatrix[i][j] = false;
        }
    }

    void addEdge(int i, int j) {
        if (i >= 0 && i < vertexCount && j >= 0 && j < vertexCount) {
            adjacencyMatrix[i][j] = true;
            adjacencyMatrix[j][i] = true;
        }
    }

    void removeEdge(int i, int j) {
        if (i >= 0 && i < vertexCount && j >= 0 && j < vertexCount) {
            adjacencyMatrix[i][j] = false;
            adjacencyMatrix[j][i] = false;
        }
    }

    bool isEdge(int i, int j) {
        if (i >= 0 && i < vertexCount && j >= 0 && j < vertexCount)
            return adjacencyMatrix[i][j];
        else
            return false;
    }

    ~Graph() {
        for (int i = 0; i < vertexCount; i++)
            delete[] adjacencyMatrix[i];
        delete[] adjacencyMatrix;
    }
};

Depth-first search (DFS) for undirected graphs


Depth-first search, or DFS, is a way to traverse the graph. By itself it only visits the vertices of the graph, but there are hundreds of graph algorithms which are based on DFS. Therefore, understanding the principles of depth-first search is quite important for moving ahead into graph theory. The principle of the algorithm is quite simple: go forward (in depth) while there is such a possibility, otherwise backtrack.

Algorithm
In DFS, each vertex has three possible colors representing its state:

white: vertex is unvisited;
gray: vertex is in progress;
black: DFS has finished processing the vertex.

NB. For most algorithms the boolean classification unvisited / visited is quite enough, but we show the general case here.

Initially all vertices are white (unvisited). DFS starts in an arbitrary vertex and runs as follows:

1. Mark vertex u as gray (visited).
2. For each edge (u, v), where v is white, run depth-first search for v recursively.
3. Mark vertex u as black and backtrack to the parent.

Example. Traverse a graph shown below, using DFS. Start from a vertex with number 1.

Source graph.

Mark a vertex 1 as gray.

There is an edge (1, 4) and a vertex 4 is unvisited. Go there.

Mark the vertex 4 as gray.

There is an edge (4, 2) and vertex 2 is unvisited. Go there.

Mark the vertex 2 as gray.

There is an edge (2, 5) and a vertex 5 is unvisited. Go there.

Mark the vertex 5 as gray.

There is an edge (5, 3) and a vertex 3 is unvisited. Go there.

Mark the vertex 3 as gray.

There are no ways to go from the vertex 3. Mark it as black and backtrack to the vertex 5.

There is an edge (5, 4), but the vertex 4 is gray.

There are no ways to go from the vertex 5. Mark it as black and backtrack to the vertex 2.

There are no more edges, adjacent to vertex 2. Mark it as black and backtrack to the vertex 4.

There is an edge (4, 5), but the vertex 5 is black.

There are no more edges, adjacent to the vertex 4. Mark it as black and backtrack to the vertex 1.

There are no more edges, adjacent to the vertex 1. Mark it as black. DFS is over.

As you can see from the example, DFS doesn't go through all edges. The vertices and edges which depth-first search has visited form a tree. This tree contains all vertices of the graph (if it is connected) and is called the spanning tree of the graph. This tree exactly corresponds to the recursive calls of DFS.

If a graph is disconnected, DFS won't visit all of its vertices. For details, see finding connected components algorithm.

Complexity analysis
Assume that the graph is connected. Depth-first search visits every vertex in the graph and checks every edge once. Therefore, DFS complexity is O(|V| + |E|). As mentioned before, if an adjacency matrix is used for the graph representation, then all edges adjacent to a vertex can't be found efficiently, which results in O(|V|²) complexity. You can find a rigorous proof of the DFS complexity claims in [1].

Code snippets
Truth be told, the implementation below yields nothing useful on its own. You will find actual uses of DFS in the further tutorials.
Java
public class Graph {
    enum VertexState {
        White, Gray, Black
    }

    public void DFS() {
        VertexState state[] = new VertexState[vertexCount];
        for (int i = 0; i < vertexCount; i++)
            state[i] = VertexState.White;
        runDFS(0, state);
    }

    public void runDFS(int u, VertexState[] state) {
        state[u] = VertexState.Gray;
        for (int v = 0; v < vertexCount; v++)
            if (isEdge(u, v) && state[v] == VertexState.White)
                runDFS(v, state);
        state[u] = VertexState.Black;
    }
}

C++
enum VertexState { White, Gray, Black };

void Graph::DFS() {
    VertexState *state = new VertexState[vertexCount];
    for (int i = 0; i < vertexCount; i++)
        state[i] = White;
    runDFS(0, state);
    delete[] state;
}

void Graph::runDFS(int u, VertexState state[]) {
    state[u] = Gray;
    for (int v = 0; v < vertexCount; v++)
        if (isEdge(u, v) && state[v] == White)
            runDFS(v, state);
    state[u] = Black;
}
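Assuming the Graph class shown earlier (with 0-based vertex numbers, so the walkthrough's vertices 1..5 become 0..4), a DFS run over that example graph might be set up like this:

public class Main {
    public static void main(String[] args) {
        Graph graph = new Graph(5);
        graph.addEdge(0, 3); // edge (1, 4) from the walkthrough
        graph.addEdge(3, 1); // edge (4, 2)
        graph.addEdge(1, 4); // edge (2, 5)
        graph.addEdge(4, 2); // edge (5, 3)
        graph.addEdge(4, 3); // edge (5, 4)
        graph.DFS();         // visits every vertex, starting from vertex 0
    }
}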

Hash table
A hash table (or hash map) is one of the possible implementations of the dictionary ADT. Hence, basically, it maps unique keys to associated values. From the implementation point of view, a hash table is an array-based data structure which uses a hash function to convert the key into the index of an array element, where the associated value is to be sought.

Hash function
The hash function is a very important part of hash table design. A hash function is considered good if it provides a uniform distribution of hash values. Other properties of hash functions required for quality hashing will be examined in detail later. The reason why the hash function is the principal concern is that poor hash functions cause collisions and other unwanted effects, which badly affect the overall performance of a hash table.

Hash table and load factor


The basic underlying data structure used to store a hash table is an array. The load factor is the ratio between the number of stored items and the array's size. A hash table can either be of constant size or dynamically resized when the load factor exceeds some threshold. Resizing is done before the table becomes full, to keep the number of collisions under a certain amount and prevent performance degradation.

Collisions
What happens if the hash function returns the same hash value for different keys? This effect is called a collision. Collisions are practically unavoidable and should be considered when one implements a hash table. Due to collisions, keys are also stored in the table, so one can distinguish between key-value pairs having the same hash. There are various ways of collision resolution. Basically, there are two different strategies:

Closed addressing (open hashing). Each slot of the hash table contains a link to another data structure (e.g. a linked list), which stores key-value pairs with the same hash. When a collision occurs, this data structure is searched for the key-value pair which matches the key.

Open addressing (closed hashing). Each slot actually contains a key-value pair. When a collision occurs, the open addressing algorithm calculates another location (e.g. the next one) to locate a free slot. Hash tables based on the open addressing strategy experience a drastic performance decrease when the table is tightly filled (load factor is 0.7 or more).
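As an illustration of the closed addressing (open hashing) strategy, here is a minimal Java sketch of a chained hash table for int keys; the class, the trivial hash function and the fixed table size are assumptions made just for this example:

import java.util.LinkedList;

public class ChainedHashTable {
    private static class Entry {
        int key;
        int value;
        Entry(int key, int value) { this.key = key; this.value = value; }
    }

    private LinkedList<Entry>[] slots;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int capacity) {
        slots = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++)
            slots[i] = new LinkedList<Entry>();
    }

    private int hash(int key) {
        return Math.abs(key) % slots.length; // trivial hash function, for the sketch only
    }

    public void put(int key, int value) {
        for (Entry e : slots[hash(key)])
            if (e.key == key) { e.value = value; return; } // key already present: replace the value
        slots[hash(key)].add(new Entry(key, value));       // otherwise append to the chain for this slot
    }

    public Integer get(int key) {
        for (Entry e : slots[hash(key)]) // scan the chain of pairs sharing this hash
            if (e.key == key)
                return e.value;
        return null;                     // no such key
    }
}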

Singly-linked list
A linked list is a very important dynamic data structure. Basically, there are two types of linked lists: the singly-linked list and the doubly-linked list. In a singly-linked list every element contains some data and a link to the next element, which keeps the structure connected. In a doubly-linked list, every node also contains a link to the previous node. A linked list can be the underlying data structure used to implement a stack, a queue or a sorted list.

Example
Schematically, a singly-linked list can be shown like this:

Each cell is called a node of the singly-linked list. The first node is called the head, and it is a dedicated node: by knowing it, we can access every other node in the list. Sometimes the last node, called the tail, is also stored in order to speed up the add operation.
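The node class itself is not listed in the snippets below, so here is a minimal Java sketch of it (field names match the ones used in the snippets; the list class is shown with its head and tail fields only):

public class SinglyLinkedListNode {
    public int value;                 // the data stored in the node
    public SinglyLinkedListNode next; // link to the next node, null for the tail

    public SinglyLinkedListNode(int value) {
        this.value = value;
        this.next = null;
    }
}

public class SinglyLinkedList {
    public SinglyLinkedListNode head; // first node, null when the list is empty
    public SinglyLinkedListNode tail; // last node, kept to speed up the add operation
}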

Operations on a singly-linked list


The concrete implementation of operations on a singly-linked list depends on the purpose it is used for. Following the links below, you can find descriptions of the common concepts, valid for every implementation.

Singly-linked list traversal
Adding a node
Removing a node

See how singly-linked list is represented inside the computer.

Visualizers
1. Linked List in Java Applets Centre

Singly-linked list. Traversal.


Assume that we have a list with some nodes. Traversal is the very basic operation which is present as a part of almost every operation on a singly-linked list. For instance, an algorithm may traverse a singly-linked list to find a value, to find a position for insertion, etc. For a singly-linked list, only forward traversal is possible.

Traversal algorithm
Beginning from the head:

1. check if the end of the list hasn't been reached yet;
2. do some actions with the current node, which are specific to the particular algorithm;
3. the current node becomes the previous one and the next node becomes the current one. Go to step 1.

Example
As an example, let us sum up the values in a singly-linked list.

For some algorithms tracking the previous node is essential, but for some, like this example, it is unnecessary. We show the common case here, and a concrete algorithm can be adjusted to meet its individual requirements.

Code snippets
Although we have two classes for the singly-linked list, the SinglyLinkedListNode class is used as storage only. The whole algorithm is implemented in the SinglyLinkedList class.
Java implementation
public class SinglyLinkedList {
    public int traverse() {
        int sum = 0;
        SinglyLinkedListNode current = head;
        SinglyLinkedListNode previous = null;
        while (current != null) {
            sum += current.value;
            previous = current;
            current = current.next;
        }
        return sum;
    }
}

C++ implementation
int SinglyLinkedList::traverse() {
    int sum = 0;
    SinglyLinkedListNode *current = head;
    SinglyLinkedListNode *previous = NULL;
    while (current != NULL) {
        sum += current->value;
        previous = current;
        current = current->next;
    }
    return sum;
}

Singly-linked list. Addition (insertion) operation.


Insertion into a singly-linked list has two special cases: insertion of a new node before the head (at the very beginning of the list) and after the tail (at the very end of the list). In any other case, the new node is inserted in the middle of the list and therefore has a predecessor and a successor in the list. All these cases are described below.

Empty list case

When the list is empty, which is indicated by the (head == NULL) condition, the insertion is quite simple. The algorithm sets both head and tail to point to the new node.

Add first
In this case, new node is inserted right before the current head node.

It can be done in two steps: 1. Update the next link of a new node, to point to the current head node.

2. Update head link to point to the new node.

Add last
In this case, new node is inserted right after the current tail node.

It can be done in two steps:


1. Update the next link of the current tail node, to point to the new node.

2. Update tail link to point to the new node.

General case
In general case, new node is always inserted between two nodes, which are already in the list. Head and tail links are not updated in this case.

Such an insert can be done in two steps:


1. Update link of the "previous" node, to point to the new node.

2. Update link of the new node, to point to the "next" node.

Code snippets
All cases shown above can be implemented in one function with two arguments: the node to insert after and the new node. For the add first operation, the arguments are (NULL, newNode). For the add last operation, the arguments are (tail, newNode). Though, these specific operations (add first and add last) can be implemented separately, in order to avoid unnecessary checks.
Java implementation
public class SinglyLinkedList {
    public void addLast(SinglyLinkedListNode newNode) {
        if (newNode == null)
            return;
        else {
            newNode.next = null;
            if (head == null) {
                head = newNode;
                tail = newNode;
            } else {
                tail.next = newNode;
                tail = newNode;
            }
        }
    }

    public void addFirst(SinglyLinkedListNode newNode) {
        if (newNode == null)
            return;
        else {
            if (head == null) {
                newNode.next = null;
                head = newNode;
                tail = newNode;
            } else {
                newNode.next = head;
                head = newNode;
            }
        }
    }

    public void insertAfter(SinglyLinkedListNode previous, SinglyLinkedListNode newNode) {
        if (newNode == null)
            return;
        else {
            if (previous == null)
                addFirst(newNode);
            else if (previous == tail)
                addLast(newNode);
            else {
                SinglyLinkedListNode next = previous.next;
                previous.next = newNode;
                newNode.next = next;
            }
        }
    }
}

C++ implementation
void SinglyLinkedList::addLast(SinglyLinkedListNode *newNode) {
    if (newNode == NULL)
        return;
    else {
        newNode->next = NULL;
        if (head == NULL) {
            head = newNode;
            tail = newNode;
        } else {
            tail->next = newNode;
            tail = newNode;
        }
    }
}

void SinglyLinkedList::addFirst(SinglyLinkedListNode *newNode) {
    if (newNode == NULL)
        return;
    else {
        if (head == NULL) {
            newNode->next = NULL;
            head = newNode;
            tail = newNode;
        } else {
            newNode->next = head;
            head = newNode;
        }
    }
}

void SinglyLinkedList::insertAfter(SinglyLinkedListNode *previous, SinglyLinkedListNode *newNode) {
    if (newNode == NULL)
        return;
    else {
        if (previous == NULL)
            addFirst(newNode);
        else if (previous == tail)
            addLast(newNode);
        else {
            SinglyLinkedListNode *next = previous->next;
            previous->next = newNode;
            newNode->next = next;
        }
    }
}

Singly-linked list. Removal (deletion) operation.


There are four cases which can occur while removing a node. These cases are similar to the cases of the add operation. We have the same four situations, but the order of the algorithm's actions is the opposite. Notice that the removal algorithm includes disposal of the deleted node, which may be unnecessary in languages with automatic garbage collection (e.g., Java).

List has only one node


When the list has only one node, which is indicated by the condition that the head points to the same node as the tail, the removal is quite simple. The algorithm disposes of the node pointed to by head (or tail) and sets both head and tail to NULL.

Remove first
In this case, first node (current head node) is removed from the list.

It can be done in two steps: 1. Update head link to point to the node, next to the head.

2. Dispose removed node.

Remove last
In this case, the last node (the current tail node) is removed from the list. This operation is a bit more tricky than removing the first node, because the algorithm should first find the node which precedes the tail.

It can be done in three steps:


1. Update tail link to point to the node, before the tail. In order to find it, list should be traversed first, beginning from the head.

2. Set next link of the new tail to NULL.

3. Dispose removed node.

General case
In general case, node to be removed is always located between two list nodes. Head and tail links are not updated in this case.

Such a removal can be done in two steps:


1. Update next link of the previous node, to point to the next node, relative to the removed node.

2. Dispose removed node.

Code snippets
All cases shown above can be implemented in one function with a single argument: the node previous to the node to be removed. For the remove first operation, the argument is NULL. For the remove last operation, the argument is the node previous to the tail. Though, it's better to implement these special cases (remove first and remove last) in separate functions. Notice that removing the first and the last node have different complexity, because remove last needs to traverse the whole list.
Java implementation
public class SinglyLinkedList {
    public void removeFirst() {
        if (head == null)
            return;
        else {
            if (head == tail) {
                head = null;
                tail = null;
            } else {
                head = head.next;
            }
        }
    }

    public void removeLast() {
        if (tail == null)
            return;
        else {
            if (head == tail) {
                head = null;
                tail = null;
            } else {
                SinglyLinkedListNode previousToTail = head;
                while (previousToTail.next != tail)
                    previousToTail = previousToTail.next;
                tail = previousToTail;
                tail.next = null;
            }
        }
    }

    public void removeNext(SinglyLinkedListNode previous) {
        if (previous == null)
            removeFirst();
        else if (previous.next == tail) {
            tail = previous;
            tail.next = null;
        } else if (previous == tail)
            return;
        else {
            previous.next = previous.next.next;
        }
    }
}

C++ implementation
void SinglyLinkedList::removeFirst() {
    if (head == NULL)
        return;
    else {
        SinglyLinkedListNode *removedNode = head;
        if (head == tail) {
            head = NULL;
            tail = NULL;
        } else {
            head = head->next;
        }
        delete removedNode;
    }
}

void SinglyLinkedList::removeLast() {
    if (tail == NULL)
        return;
    else {
        SinglyLinkedListNode *removedNode = tail;
        if (head == tail) {
            head = NULL;
            tail = NULL;
        } else {
            SinglyLinkedListNode *previousToTail = head;
            while (previousToTail->next != tail)
                previousToTail = previousToTail->next;
            tail = previousToTail;
            tail->next = NULL;
        }
        delete removedNode;
    }
}

void SinglyLinkedList::removeNext(SinglyLinkedListNode *previous) {
    if (previous == NULL)
        removeFirst();
    else if (previous->next == tail) {
        SinglyLinkedListNode *removedNode = previous->next;
        tail = previous;
        tail->next = NULL;
        delete removedNode;
    } else if (previous == tail)
        return;
    else {
        SinglyLinkedListNode *removedNode = previous->next;
        previous->next = removedNode->next;
        delete removedNode;
    }
}

Dictionary ADT
Dictionary (map, association list) is a data structure which is generally an association of unique keys with some values. One may bind a value to a key, delete a key (and naturally the associated value) and look up a value by its key. Values are not required to be unique. A simple usage example is an explanatory dictionary. In the example, words are keys and explanations are values.

Dictionary ADT
Operations

Dictionary create()
creates an empty dictionary

boolean isEmpty(Dictionary d)
tells whether the dictionary d is empty

put(Dictionary d, Key k, Value v)
associates key k with a value v; if key k is already present in the dictionary, the old value is replaced by v

Value get(Dictionary d, Key k)
returns the value associated with key k, or null if the dictionary contains no such key

remove(Dictionary d, Key k)
removes key k and the associated value

destroy(Dictionary d)
destroys the dictionary d
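Expressed as a Java interface (a sketch only; the generic types and method signatures are an assumption, while create and destroy map onto the constructor and garbage collection), the operations above could look like this:

public interface Dictionary<K, V> {
    boolean isEmpty();        // tells whether the dictionary is empty
    void put(K key, V value); // associates key with value, replacing the old value if the key is present
    V get(K key);             // returns the value associated with key, or null if there is no such key
    void remove(K key);       // removes the key and the associated value
}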

Implementations

binary search tree (BST)
hash map

Priority queue ADT


In practice we often deal with priorities. For instance, in a to-do list for a day, each task has an associated significance. It is absolutely necessary to collect the car from the repair shop (highest priority), and you may possibly watch a new film (lowest priority). Besides real-life examples, many computer tasks work with priorities. A frequently cited instance is Dijkstra's shortest path algorithm. The priority queue ADT lets us work with objects that have an associated priority.

In the application we have a pair (priority, item), where an item is some auxiliary data the priority is associated with. To maintain simplicity, we omit priorities and consider that for items e1, e2: e1 < e2 means e1 has higher priority than e2.
Operations

PriorityQueue create()
creates an empty priority queue

boolean isEmpty(PriorityQueue pq)
tells whether the priority queue pq is empty

insert(PriorityQueue pq, Item e)
inserts item e into the priority queue pq

Item minimum(PriorityQueue pq)
returns the minimal item in the priority queue pq
Precondition: pq is not empty

removeMin(PriorityQueue pq)
removes the minimum item from the priority queue pq
Precondition: pq is not empty

destroy(PriorityQueue pq)
destroys the priority queue pq
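For instance, Java's standard library already ships a binary-heap based priority queue; a small usage sketch of the operations above (create, isEmpty, insert, minimum, removeMin) might be:

import java.util.PriorityQueue;

public class PriorityQueueDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(); // create: an empty min-oriented priority queue
        pq.add(5);                                         // insert
        pq.add(2);
        pq.add(7);
        System.out.println(pq.isEmpty()); // isEmpty: false
        System.out.println(pq.peek());    // minimum: 2
        pq.poll();                        // removeMin: removes 2
        System.out.println(pq.peek());    // minimum is now 5
    }
}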

Implementations

binary heap implementation;

Array-based stack implementation


Here we present the idea of a stack implementation based on arrays. We assume in the current article that the stack's capacity is limited to a certain value and that overfilling the stack will cause an error. Though, using the ideas from the dynamic array implementation, this limitation can easily be avoided (see capacity management for dynamic arrays). In spite of the capacity limitation, the array-based implementation is widely applied in practice. In a number of cases, the required stack capacity is known in advance and the allocated space exactly satisfies the requirements of a particular task. In other cases, the stack's capacity is just intended to be "big enough". A striking example of the latter is an application's call stack. Its capacity is quite large, but too deep recursion still may result in stack overflow.

Implementation
Implementation of an array-based stack is very simple. It uses a top variable to point to the topmost stack element in the array.

1. Initially top = -1;
2. the push operation increases top by one and writes the pushed element to storage[top];
3. the pop operation checks that top is not equal to -1 and decreases the top variable by 1;
4. the peek operation checks that top is not equal to -1 and returns storage[top];
5. isEmpty returns the boolean (top == -1).

Code snippets
Java implementation
public class Stack {
    private int top;
    private int[] storage;

    Stack(int capacity) {
        if (capacity <= 0)
            throw new IllegalArgumentException("Stack's capacity must be positive");
        storage = new int[capacity];
        top = -1;
    }

    void push(int value) {
        if (top == storage.length - 1)
            throw new StackException("Stack's underlying storage is overflow");
        top++;
        storage[top] = value;
    }

    int peek() {
        if (top == -1)
            throw new StackException("Stack is empty");
        return storage[top];
    }

    void pop() {
        if (top == -1)
            throw new StackException("Stack is empty");
        top--;
    }

    boolean isEmpty() {
        return (top == -1);
    }
}

public class StackException extends RuntimeException {
    public StackException(String message) {
        super(message);
    }
}
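A short usage sketch of the Java class above (assuming Stack and StackException are compiled together in the same package, since the methods are package-private):

public class StackDemo {
    public static void main(String[] args) {
        Stack stack = new Stack(3); // capacity is fixed at construction time
        stack.push(1);
        stack.push(2);
        System.out.println(stack.peek());    // 2, the topmost element
        stack.pop();
        System.out.println(stack.peek());    // 1
        System.out.println(stack.isEmpty()); // false
    }
}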

C++ implementation
#include <string>

using namespace std;

class Stack {
private:
    int top;
    int capacity;
    int *storage;

public:
    Stack(int capacity) {
        if (capacity <= 0)
            throw string("Stack's capacity must be positive");
        storage = new int[capacity];
        this->capacity = capacity;
        top = -1;
    }

    void push(int value) {
        if (top == capacity - 1)
            throw string("Stack's underlying storage is overflow");
        top++;
        storage[top] = value;
    }

    int peek() {
        if (top == -1)
            throw string("Stack is empty");
        return storage[top];
    }

    void pop() {
        if (top == -1)
            throw string("Stack is empty");
        top--;
    }

    bool isEmpty() {
        return (top == -1);
    }

    ~Stack() {
        delete[] storage;
    }
};
