
Database Management Systems

MI0025
Name: Tigist Awoke

Roll Number: 530911222

Learning Centre: SRI SAI

Subject: Database Management Systems – MI0025

Date of Submission at the Learning Centre: 04/08/2010

Assignment No.: Set I


Qn. 1.
I. What do you understand by DBMS? What are the various procedures carried out in a DBMS? Give a comparison between traditional file systems and modern database management systems.
II. What do you understand by data independence? Describe the different types of data independence. What do you understand by DDL and DML? Describe the two main types of DML.
Answer: 1.1
A DBMS is a complex set of software programs that controls the organization, storage and retrieval of data in a database. It is a system software package that facilitates the use of an integrated collection of data records and files known as a database, and it allows different user application programs to access the same database easily. DBMSs may use any of a variety of database models, such as the network model or the relational model. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way: instead of having to write computer programs to extract information, users can ask simple questions in a query language.
Various Procedures Carried Out in a DBMS
1. Defining the database: the process of specifying the data types, structures and constraints.
2. Constructing the database: the process of storing the data on some storage medium.
3. Manipulating the database: the retrieval (finding) of required data and modifying it depending on the requirement.

E.g. Employee Database

1. Defining the database

Entity      Attribute    Data type       Constraints (limitation)
Employee    Emp_Name     Char(40)        Alphabets only
            Emp_id       Num(6)          Val > 0
            Emp_add      Char(100)       -
            Emp_desig    Char(15)        -
            Emp_sal      Number(10,2)    Val > 0

2. Constructing the database

Emp_Name   Emp_id   Emp_add                                 Emp_desig           Emp_sal
Aster      100      Addis Ababa, Bole Road, K18             Project Leader      40000
Usha       101      #165, 4th Main, Chamrajpet, Bangalore   Software Engineer   10000
Henok      102      K18, H14, H.No. 999, Addis Ababa        Lecturer            30000
Peter      103      Syndicate House, Manipal                IT Executive        15000

3. Manipulating the database

E.g. some queries (implemented in the sketch that follows):
a. List all employees whose salaries are greater than 200000
b. List all employees whose names start with 'P'
c. Delete the records whose Emp_Name is 'Aster'
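
A minimal sketch of the three procedures using Python's built-in sqlite3 module. The table layout follows the EMPLOYEE example above; the CHECK constraints are one possible rendering of the "Val > 0" limitations, and the salary threshold is lowered to 20000 so the sample rows produce output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Defining the database: data types, structures and constraints
conn.execute("""
    CREATE TABLE employee (
        emp_name  VARCHAR(40),
        emp_id    INTEGER PRIMARY KEY CHECK (emp_id > 0),
        emp_add   VARCHAR(100),
        emp_desig VARCHAR(15),
        emp_sal   NUMERIC(10, 2) CHECK (emp_sal > 0)
    )
""")

# 2. Constructing the database: storing the data
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [("Aster", 100, "Addis Ababa, Bole Road, K18", "Project Leader", 40000),
     ("Usha", 101, "#165, 4th Main, Chamrajpet, Bangalore", "Software Engineer", 10000),
     ("Henok", 102, "K18, H14, H.No. 999, Addis Ababa", "Lecturer", 30000),
     ("Peter", 103, "Syndicate House, Manipal", "IT Executive", 15000)])

# 3. Manipulating the database: retrieval and modification
print(conn.execute("SELECT emp_name FROM employee "
                   "WHERE emp_sal > 20000").fetchall())        # query (a)
print(conn.execute("SELECT emp_name FROM employee "
                   "WHERE emp_name LIKE 'P%'").fetchall())     # query (b)
conn.execute("DELETE FROM employee WHERE emp_name = 'Aster'")  # query (c)
conn.commit()
```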

Traditional File Systems Vs Modern Database Management Systems


Traditional file system: the system that was followed before the advent of the DBMS, i.e. the older way.
Modern DBMS: the modern approach, which has replaced the older concept of the file system.

Traditional file system: data definition is part of the application program, and the data works only with that specific application.
Modern DBMS: data definition is part of the DBMS, and the data is independent of any single application, so it can be used with any application.

Traditional file system: file systems are design driven; they require design/coding changes when a new kind of data occurs. E.g. an employee master file has Emp_Name, Emp_id, Emp_add, Emp_dept and Emp_sal; if we want to insert one more column, 'Emp_mob', a complete restructuring of the file or a redesign of the application code is required, even though basically all the data except that in one column is the same.
Modern DBMS: one extra column (attribute) can be added without any difficulty; at most, minor coding changes in the application program may be required.

Traditional file system: keeps redundant (duplicate) information in many locations, which might result in the loss of data consistency. E.g. employee names might exist in separate files, such as a Payroll Master File and an Employee Benefit Master File; if an employee changes his or her last name, the name might be changed in the Payroll Master File but not in the Employee Benefit Master File, leaving the data inconsistent.
Modern DBMS: redundancy is eliminated to the maximum extent, if the database is properly defined.

Traditional file system: data is scattered in various files, and each of these files may be in a different format, making it difficult to write new application programs to retrieve the appropriate data.
Modern DBMS: this problem is completely solved, since all the data is held in a single integrated database.

Traditional file system: security features have to be coded in the application program itself.
Modern DBMS: coding for security requirements is not required, as most of it is taken care of by the DBMS.

Hence, a database management system is the software that manages a database and is responsible for the storage, security, integrity, concurrency, recovery and accessibility of the data. The DBMS has a data dictionary, also referred to as the system catalog, which stores data about everything it holds, such as names, structures, locations and types. This data is also referred to as metadata.

Answer: 1.2

Data independence is defined as the ability to modify a schema definition at one level without affecting a schema definition at a higher level. There are two kinds:

1. Physical data independence: the ability to modify the physical schema without causing application programs to be rewritten. Modifications at this level are usually made to improve performance.

2. Logical data independence: the ability to modify the conceptual schema without causing application programs to be rewritten. This is usually done when the logical structure of the database is altered. Logical data independence is harder to achieve, as application programs are usually heavily dependent on the logical structure of the data. An analogy is made to abstract data types in programming languages.
As a database supports a number of user groups, a DBMS must have languages and interfaces that support each of these user groups.
DBMS Languages
DDL: the data definition language, used by the DBA and database designers to define the

conceptual and internal schemas.
• The DBMS has a DDL compiler to process DDL statements, in order to identify the schema constructs and to store their description in the catalogue.
• In databases where there is a separation between the conceptual and internal schemas, DDL is used to specify the conceptual schema, and SDL, the storage definition language, is used to specify the internal schema.
• For a true three-schema architecture, VDL, the view definition language, is used to specify the user views and their mappings to the conceptual schema. In most DBMSs, however, the DDL is used to specify both the conceptual schema and the external schemas.
Data Manipulation Languages (DMLs): a DML is a family of computer languages used by computer programs or database users to retrieve, insert, delete and update data in a database.
• Currently, the most popular data manipulation language is SQL, which is used to retrieve and manipulate data in a relational database.
• Other forms of DML are those used by IMS/DL1, CODASYL databases (such as IDMS), and others.
• Data manipulation languages were initially used only by computer programs, but (with the advent of SQL) have come to be used by people as well.
• Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are "select", "insert", "update" and "delete".
• DML tends to have many different "flavors" and capabilities between database vendors.
• A standard for SQL has been established by ANSI, but vendors still "exceed" the standard and provide their own extensions.
There are two main types of DML:
High level/ Non- Procedural:
• Can be used on its own to specify complex database operations

• DBMSs allow DML statements to be entered interactively from a terminal, or to be embedded in a programming language.
• If the commands are embedded in a general-purpose programming language, the statements must be identified so they can be extracted by a pre-compiler and processed by the DBMS.
• High-level DMLs such as SQL can specify and retrieve many records in a single DML statement, and are called 'set-at-a-time' or 'set-oriented' DMLs.
• High-level languages are often called declarative, because the DML specifies what to retrieve rather than how to retrieve it.
Low Level/Procedural
• Must be embedded in a general-purpose programming language.
• Typically retrieves individual records or objects from the database and processes each separately; it therefore needs to use programming-language constructs such as loops.
• Low-level DMLs are also called 'record-at-a-time' DMLs. (A sketch contrasting the two styles follows.)
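
A minimal sketch of the contrast using Python's sqlite3 module (table name and columns are illustrative): the first query is set-at-a-time, retrieving the whole qualifying set in one declarative statement; the second is record-at-a-time, using a host-language loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, emp_sal REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("Aster", 40000), ("Usha", 10000), ("Henok", 30000)])

# High-level (set-at-a-time): one declarative statement returns the whole set
high_paid = conn.execute(
    "SELECT emp_name FROM employee WHERE emp_sal > 20000").fetchall()
print(high_paid)

# Low-level (record-at-a-time): a loop fetches and processes one record per pass
cur = conn.execute("SELECT emp_name, emp_sal FROM employee")
while True:
    row = cur.fetchone()              # one record at a time
    if row is None:
        break
    print(row[0], row[1])
```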

Qn. 2
(i) Describe the process of file allocation on the disk. Describe briefly the different hashing techniques.
(ii) Describe the concept of variable length records. Describe the
characteristics of magnetic disk and magnetic tape storage devices.

Answer: 2.1
Placing File Records on Disk

Record Types: Data is usually stored in the form of records. Each record consists of a
collection of related data values. Records usually describe entities and their attributes. For
example, an EMPLOYEE record represents an employee entity, and each field value in
the record specifies some attribute of that employee, such as NAME, BIRTHDATE,
SALARY or SUPERVISOR. A collection of field names and their corresponding data
types constitutes a record type or record format definition.

Files, Fixed-length Records, and Variable-length Records: A file is a sequence of records.

Fixed length records:

· All records in a file are of the same record type. If every record in the file has exactly
the same size [in bytes], the file is said to be made up of fixed length records.

Variable length records:

If different records in the file have different sizes, the file is said to be made up of variable-length records. A variable-length field is a field whose maximum allowed length is specified; when the actual length of the value is less than the maximum, the field takes only the required space. With fixed-length fields, even if the actual value is shorter than the specified length, the remaining length is filled with spaces or null values.

A file may have variable-length records for several reasons:

· Records having variable length fields:

The file records are of the same record type, but one or more of the fields are of varying
size. For example, the NAME field of EMPLOYEE can be a variable-length field.

· Records having repeating fields:

The file records are of the same record type, but one or more of the fields may have multiple values for individual records; such a group of values for the field is called a repeating group. For example, in a BOOK file where each record lists the book's authors, the record length varies depending on the number of authors.

· Records having optional fields:

The file records are of the same record type, but one or more of the fields are optional; that is, some of the fields will not have values in all the records. For example, if there are 25 fields in a record and 10 of them are optional, reserving space for every field in every record would waste memory, so only the values that are actually present in each record are stored. (A sketch of the two record layouts follows.)
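
A minimal sketch in Python of the two layouts, assuming an illustrative two-field record (name, salary): the fixed-length format always occupies the same number of bytes, while the variable-length format stores a length prefix and only the bytes actually needed.

```python
import struct

# Fixed-length record: the name is padded to 40 bytes, salary is a 4-byte int
def pack_fixed(name: str, salary: int) -> bytes:
    return struct.pack("40sI", name.encode("ascii"), salary)

# Variable-length record: a 1-byte length prefix, then only the bytes needed
def pack_variable(name: str, salary: int) -> bytes:
    data = name.encode("ascii")
    return struct.pack("B", len(data)) + data + struct.pack("I", salary)

print(len(pack_fixed("Aster", 40000)))     # 44 bytes, whatever the name length
print(len(pack_variable("Aster", 40000)))  # 10 bytes: 1 + 5 + 4
```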

Record Blocking and Spanned versus Unspanned Records:

The records of a file must be allocated to disk blocks. If the block size is larger than the record size, each block will contain numerous records. Some files may have unusually large record sizes that cannot fit in one block. Suppose that the block size is B bytes. For a file of fixed-length records of size R bytes, with B ≥ R, we can fit bfr = ⌊B/R⌋ records per block. The value bfr is called the blocking factor for the file. In general, R may not divide B exactly, so we have some unused space in each block equal to
B − (bfr * R) bytes.

To utilize this unused space, we can store part of the record on one block and the rest on
another block. A pointer at the end of the first block points to the block containing the
remainder of the record. This organization is called spanned, because records can span
more than one block. Whenever a record is larger than a block, we must use a spanned
organization. If records are not allowed to cross block boundaries, the organization is
called unspanned. This is used with fixed-length records having B ≥ R.

We can use bfr to calculate the number of blocks b needed for a file of r records: b = ⌈r / bfr⌉ blocks (the ceiling of r divided by bfr).
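
A small worked example under assumed values (B = 512, R = 100, r = 1000):

```python
import math

B, R, r = 512, 100, 1000   # block size, record size, number of records (assumed)
bfr = B // R               # blocking factor: floor(512/100) = 5 records per block
unused = B - bfr * R       # 512 - 500 = 12 bytes left unused in each block
b = math.ceil(r / bfr)     # blocks needed: ceil(1000/5) = 200
print(bfr, unused, b)      # 5 12 200
```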

Allocating File Blocks on Disk:

There are several standard techniques for allocating the blocks of a file on disk. In
contiguous (sequential) allocation the file blocks are allocated to consecutive disk blocks.
This makes reading the whole file very fast, using double buffering, but it makes
expanding the file difficult. In linked allocation each file block contains a pointer to the

next file block. A combination of the two allocates clusters of consecutive disk blocks,
and the clusters are linked together. Clusters are sometimes called segments or extents.

File Headers: A file header or file descriptor contains information about a file that is needed by the programs that access the file records. The header includes information to determine the disk addresses of the file blocks, as well as record format descriptions, which may include field lengths and the order of fields within a record (for fixed-length unspanned records), field type codes, and separator characters.

To search for a record on disk, one or more blocks are copied into main memory buffers.
Programs then search for the desired record utilizing the information in the file header. If
the address of the block that contains the desired record is not known, the search programs must do a linear search through the file blocks. Each file block is copied into a buffer and searched until either the record is located or all the file blocks have been searched unsuccessfully. This can be very time consuming for a large file.

Hashing Techniques

One disadvantage of sequential file organization is that we must use linear search or binary search to locate the desired record, which results in more I/O operations and a number of unnecessary comparisons. In the hashing technique, or direct file organization, the key value is converted into an address by performing some arithmetic manipulation on the key value, which provides very fast access to records.

Let us consider a hash function h that maps the key value k to the value h(k). The value h(k) is used as an address.

The basic terms associated with the hashing techniques are:

1) Hash table: simply an array that holds the addresses of records.

2) Hash function: the transformation of a key into the corresponding location or address in the hash table (it can be defined as a function that takes a key as input and transforms it into a hash table index).

3) Hash key: the value that a record's key hashes to is called its hash key.

Internal Hashing

For internal files, the hash table is an array of records, with indices ranging from 0 to M−1. Let us consider a hash function H(K) = key mod M, which produces a remainder between 0 and M−1 depending on the value of the key. This value is then used as the record address. The problem with most hash functions is that they do not guarantee that distinct values will hash to distinct addresses; a collision occurs when two non-identical keys are hashed to the same location.

For example, let us assume that there are two non-identical keys k1 = 342 and k2 = 352, and that we have some mechanism to convert key values to addresses. Then a simple hashing function is:

h(k) = k mod 10

Here h(k) produces a bucket address.

To insert a record with key value k, we must hash its key first. E.g., h(k1) = 342 mod 10 gives 2 as the hash value, so the record with key value 342 is placed at location 2. Another record, with 352 as its key value, produces the same hash address, i.e. h(k1) = h(k2). When we try to place this record at the location where the record with key k1 is already stored, a collision occurs. The process of finding another position is called collision resolution. There are numerous methods for collision resolution:

1) Open addressing: with open addressing we resolve the hash clash by inserting the record in the next available free (empty) location in the table.

2) Chaining: various overflow locations are kept; a pointer field is added to each record, and the pointer is set to the address of the overflow location.

(A minimal code sketch of both methods follows.)
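
This sketch assumes the mod-10 hash table of the example above (M = 10):

```python
M = 10

def h(k: int) -> int:
    return k % M                      # simple division-remainder hash function

# 1) Open addressing: probe forward to the next free slot (linear probing)
table = [None] * M
def insert_open(k: int) -> None:
    i = h(k)
    while table[i] is not None:       # collision: slot already occupied
        i = (i + 1) % M
    table[i] = k

# 2) Chaining: each location keeps a chain (list) of the keys that collide there
chains = [[] for _ in range(M)]
def insert_chain(k: int) -> None:
    chains[h(k)].append(k)

for k in (342, 352):                  # both keys hash to location 2
    insert_open(k)
    insert_chain(k)
print(table[2], table[3])             # 342 352 -- 352 moved to the next slot
print(chains[2])                      # [342, 352] -- both kept in one chain
```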

External Hashing for Disk Files

Figure 3.5: Matching bucket numbers to disk block addresses


Hashing for disk files is called external hashing. Disk storage is divided into buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks.

The hashing function maps a key into a relative bucket number. A table maintained in the
file header converts the bucket number into the corresponding disk block address.

Figure 3.6: Handling overflow for buckets by chaining

The collision problem is less severe with buckets, because many records can fit in the same bucket. When a bucket is filled to capacity and we try to insert a new record into it, a collision occurs. However, we can maintain a pointer in each bucket to a chain of overflow records.

The hashing scheme described so far is called static hashing, because a fixed number of buckets M is allocated. This can be a serious drawback for dynamic files. Suppose M is the number of buckets and m is the maximum number of records that fit in one bucket; then at most m*M records will fit in the allocated space. If the records are substantially fewer than m*M, a lot of space is left unused; if they grow to substantially more than m*M, numerous collisions will occur and retrieval will be slowed down.

Dynamic Hashing Technique

A major drawback of static hashing is that the address space is fixed. Hence it is difficult to expand or shrink the file dynamically.

In dynamic hashing, the access structure is built on the binary representation of the hash value. The number of buckets is not fixed [as in regular hashing] but grows or diminishes as needed. The file can start with a single bucket; once that bucket is full and a new record is inserted, the bucket overflows and is split into two buckets. The records are distributed among the two buckets based on the value of the first [leftmost] bit of their hash values: records whose hash values start with a 0 bit are stored in one bucket, and those whose hash values start with a 1 bit in the other. At this point, a binary tree structure called a directory is built. The directory has two types of nodes.

1. Internal nodes: guide the search; each has a left pointer corresponding to a 0 bit and a right pointer corresponding to a 1 bit.

2. Leaf nodes: each holds a pointer to a bucket (a bucket address).

Each leaf node holds a bucket address. If a bucket overflows, it is split. For example, if a new record inserted into the bucket for records whose hash values start with 10 causes an overflow, then all records whose hash value starts with 100 are placed in the first split bucket, and the second bucket contains those whose hash value starts with 101. The levels of the binary tree can be expanded dynamically.

Extendible Hashing: In extendible hashing the stored file has a directory, or index table (hash table), associated with it. The index table consists of a header containing a value d, called the global depth of the table, and a list of 2^d pointers [pointers to data blocks]. Here d is the number of leftmost bits currently being used to address the index table: the leftmost d bits of a key, when interpreted as a number, give the index-table entry holding the address of the bucket in which the desired records are stored.

Each bucket also has a header giving the local depth d1 of that bucket, which specifies the number of bits on which the bucket's contents are based. Suppose d = 3 and the first pointer in the table [the 000 pointer] points to a bucket whose local depth d1 is 2; a local depth of 2 means that the bucket contains all the records whose search keys start with 00 [both the 000 and 001 pointers lead to it, because only the first two bits are used]. To insert a record with search value k, if there is room in the addressed bucket we insert the record into it; if the bucket is full, we must split the bucket and redistribute the current records plus the new one.

For example, to illustrate the operation of insertion, consider an account file; we assume that a bucket can hold only two records.

Sample account file

Hash function for branch name

Let us insert the record (Bangalore, 100). The hash table (address table) contains a pointer to the single bucket, and the record is inserted there. The second record is also placed in the same bucket (the bucket size is 2).

When we attempt to insert the next record (Downtown, 300), the bucket is full. We need to increase the number of bits that we use from the hash value: with d = 1 there are 2^1 = 2 buckets, which increases the number of entries in the hash address table. Now the hash table contains two entries, i.e. it points to two buckets. The first bucket contains the records whose search key has a hash value beginning with 0, and the second bucket contains those whose search key has a hash value beginning with 1. The local depth of each bucket is now 1.

Next we insert (Mianus, 400). Since the first bit of h(Mianus) is 1, the new record should be placed in the second bucket, but we find that this bucket is full. We increase the number of bits used from the hash value to 2 (d = 2), which increases the number of entries in the hash table to 2^2 = 4. The records of the full bucket are distributed between two buckets. Since the bucket that has prefix 0 was not split, hash prefixes 00 and 01 both point to this bucket.

Figure 3.7: Extendable Hash Structure

Next, the record (Perryridge, 500) is inserted into the same bucket as Mianus. The next insertion, (Perryridge, 600), results in a bucket overflow, causing an increase in the number of bits (the global depth d grows by 1, i.e. d = 3) and thus in the number of hash table entries (the hash table now has 2^3 = 8 entries).

Figure 3.8: Extendable Hash Structure

The records of the overflowing bucket are distributed between two buckets: the first contains all records whose hash values start with 110, and the second all those whose hash values start with 111.
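
A compact sketch of the directory-doubling mechanics in Python (an illustrative toy under assumed parameters, not the textbook's exact algorithm): records are hashed to 8 bits, the leftmost global-depth bits index the directory, and a full bucket is split, with the directory doubling only when the bucket's local depth already equals the global depth.

```python
import zlib

BUCKET_SIZE = 2          # as in the account-file example above
HASH_BITS = 8            # keys are hashed to 8 bits for illustration

def h(key: str) -> int:
    return zlib.crc32(key.encode()) & 0xFF   # deterministic 8-bit hash value

class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.records = []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]   # indexed by leftmost bits

    def _index(self, hv: int) -> int:
        # the leftmost global_depth bits of the hash value address the directory
        return hv >> (HASH_BITS - self.global_depth)

    def insert(self, key: str, value) -> None:
        bucket = self.directory[self._index(h(key))]
        if len(bucket.records) < BUCKET_SIZE:
            bucket.records.append((key, value))
            return
        # Overflow: double the directory only if the full bucket's local depth
        # already equals the global depth
        if bucket.local_depth == self.global_depth:
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        # Split the full bucket; directory entries whose newly significant bit
        # is 1 are re-pointed to the new bucket
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        shift = self.global_depth - bucket.local_depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> shift) & 1:
                self.directory[i] = new_bucket
        # Redistribute the old records, then retry the new one (may split again)
        old, bucket.records = bucket.records, []
        for k, v in old:
            self.insert(k, v)
        self.insert(key, value)

eh = ExtendibleHash()
for branch, acct in [("Bangalore", 100), ("Bangalore", 200),
                     ("Downtown", 300), ("Mianus", 400), ("Perryridge", 500)]:
    eh.insert(branch, acct)
print(eh.global_depth)   # grows as buckets overflow and split
```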

Advantages of dynamic hashing:

1. The main advantage is that splitting causes minor reorganization, since only the
records in one bucket are redistributed to the two new buckets.

2. The space overhead of the directory table is negligible.

3. The main advantage of extendable hashing is that performance does not degrade as the
file grows. The main space saving of hashing is that no buckets need to be reserved for
future growth; rather buckets can be allocated dynamically.

Disadvantages:

1. The index table may grow rapidly and become too large to fit in main memory. When part of the index table is stored on secondary storage, extra accesses are required.

2. The directory must be searched before accessing the bucket itself, resulting in two block accesses instead of the one needed in static hashing.

3. A disadvantage of extendable hashing is that it involves an additional level of
indirection.

Qn. 3
(i) Discuss the criteria for bad relational schemas. Discuss the attribute semantics as an information measure of goodness of a relation schema.
(ii) Discuss the transaction processing concepts. List and explain the desirable properties of transactions.

Answer: 3.1
Criteria for good and bad relation schemas:

· Semantics of attributes

· Reducing the Redundant values in tuples

· Reducing the null values in tuples

· Disallowing spurious tuples.

Bad relational schemas and their problems

An example of a bad design is a table in which two concepts are mixed: lectures and the professors who teach the lectures. There are two groups of problems with such a design: data redundancy and the so-called update anomalies. (An anomaly means a special, strange situation.) There are three types of update anomalies: insert, delete and modification anomalies.

Update anomalies: Update anomalies are those problems which arise from the data
redundancy of the un-normalized database table.

The following are the Update anomalies.

· Insertion Anomalies:

It is difficult to insert a new department that has no employees as yet into the Emp_dept relation, because Emp_no is the primary key of Emp_dept and would have no value. This problem does not occur in the design of fig. (B), because a department is entered in the DEPARTMENT relation whether or not any employee works for it.

· Deletion Anomalies:

If we delete the last employee of a department from the Emp_dept relation, then the whole of the information about that department will be lost. This problem does not occur in the database of fig. (B), because DEPARTMENT tuples are stored separately.

· Modification Anomalies:

In Emp_dept, if we change the value of one of the attributes of a particular department, say the location of department 5, we must update the tuples of all employees who work in that department; otherwise the DB will become inconsistent.

For the schema to be a good one:

• Each concept should be modeled in its own relation (for example, Employee is one relation and Project is another relation).
• Data redundancy should be avoided (do not store the same information more than once).
• Avoid attributes whose values are frequently NULL (if possible).

A minimal sketch of such a decomposition follows.
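
This sketch uses Python's sqlite3 module; table and column names (and the sample values) are illustrative, following the Emp_dept discussion above. Department facts are stored exactly once, which removes the redundancy behind all three anomalies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each concept in its own relation: a department can exist with no employees
# (no insertion anomaly), deleting the last employee keeps the department
# (no deletion anomaly), and a location change touches exactly one row
# (no modification anomaly).
conn.executescript("""
    CREATE TABLE department (
        dept_no   INTEGER PRIMARY KEY,
        dept_name TEXT,
        location  TEXT
    );
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_no INTEGER REFERENCES department(dept_no)
    );
""")
conn.execute("INSERT INTO department VALUES (5, 'Research', 'Old City')")
conn.execute("UPDATE department SET location = 'New City' WHERE dept_no = 5")
conn.commit()
```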

Definition: A transaction is an atomic unit comprised of one or more SQL statements. A transaction begins with the first executable statement and ends when it is committed or rolled back.

Desirable Properties of Transactions

To ensure data integrity, the database management system should maintain the following transaction properties, often called the ACID properties.

1. Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety (completely) or not performed at all.

2. Consistency: A correct execution of a transaction must take the database from one consistent state to another. The basic idea behind ensuring atomicity and consistency is as follows: the database system keeps track of the old values of any data on which a transaction performs a write, and if the transaction does not complete its execution, the old values are restored to make it appear as though the transaction never executed.

For example: Let Ti be a transaction that transfers Rs.50 from account A to account B. This transaction can be defined as:

Ti: read(A);
    A := A − 50;
    write(A);
    read(B);
    B := B + 50;
    write(B);

Suppose that before the execution of transaction Ti the values of accounts A and B are Rs.1000 and Rs.2000 respectively. Now suppose that, during the execution of Ti, a failure occurs after the write(A) operation but before write(B) is executed. Then the values of A and B in the database are Rs.950 and Rs.2000: Rs.50 has been lost, and the database is in an inconsistent state even though the statements were executed in a sequential fashion.

3. Isolation: Even when transactions execute concurrently, each transaction should appear to run in isolation, unaffected by the others.

4. Durability: Once a transaction changes the database and the changes are committed, these changes must never be lost because of subsequent failures. The users need not worry about incomplete transactions: partially executed transactions can be rolled back to the original state. Ensuring durability is the responsibility of the recovery-management component of the DBMS.
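
A minimal sketch of the Rs.50 transfer with Python's sqlite3 module (the raised exception simulates a crash between write(A) and write(B)): the rollback restores the old values, so the failed transfer leaves no trace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 1000), ("B", 2000)])
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    # a failure at this point would leave A debited without crediting B ...
    raise RuntimeError("simulated failure between write(A) and write(B)")
except RuntimeError:
    conn.rollback()                    # ... so the old values are restored

print(conn.execute("SELECT * FROM account").fetchall())
# [('A', 1000), ('B', 2000)] -- the failed transfer left no trace
```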

=//=

