
Chapter 1

Introduction to DBMS

1.0 OBJECTIVES

In this chapter you will learn:

1. Database definition
2. Concept behind data processing
3. Flat file and its disadvantages
4. Database and database system
5. Advantages and disadvantages of database systems
6. Data independence
7. Architecture of a database system
8. Databases and their management
9. Objectives of DBMS
10. Components of DBMS
11. Types of databases
12. Database models

BSIT 24 Basics of DBMS

1.1 INTRODUCTION

Chapter 1 - Introduction to DBMS

A database is a collection of operational data organized so that its contents can easily be accessed, managed and updated. A database holds aggregations of data records or files, such as sales transactions, purchase details, product information, inventory records and customer details. A database system is the combination of a database, a DBMS, hardware and users. A Database Management System (DBMS) is application software, supplied by vendors, that helps in managing the database. Database management is an important aspect of data processing: several data models have evolved into different DBMS software packages, and using these packages effectively in data processing applications demands knowledge of certain disciplines and procedures.

1.2 DATA PROCESSING AN IMPORTANT ASPECT OF ANY BUSINESS


Business organizations, big and small, generate a lot of data through the activities they perform. Even individuals handle a lot of data in their day-to-day life. A simple example is the address book that we all maintain, in which we keep the name, address and phone number of everyone with whom we interact. Without this book it would be almost impossible to carry on the day-to-day activity of contacting and communicating with friends, relatives and business associates. As a business organization grows, the amount of data it generates grows rapidly, and so does the need to store and use that data. Modern businesses recognize this need and treat data as a vital resource for conducting business profitably. Two terms, data and information, are used in this connection. Let us understand their scope and the difference between them.

1.2.1 Data and Information


Data are raw facts or observations, typically about physical phenomena or business transactions. More specifically, data are objective measurements of the attributes (or characteristics) of entities such as people, places, things and events. Examples: 1. The sale of an automobile may generate a lot of data, such as the type of vehicle, model, price, date of purchase, buyer's name and address, and seller's name and address. 2. A meteorological satellite may continuously collect and send data about atmospheric pressure, wind velocity and direction, cloud density, humidity, temperature etc. at regular intervals.


The observed data is usually represented by symbols such as numbers, words and codes (composed of a mixture of numeric, alphabetic and other characters). It can also take other forms, such as voice, images, pictures and drawings. When the observed or collected data is converted into a useful and meaningful form, it becomes information. Data is usually subjected to a value-added process called data processing or information processing, in which
1. its form is aggregated, manipulated and organized,
2. its content is analyzed and evaluated, and
3. it is presented in a context meaningful to a human user.
Thus information is processed data, placed in a context that gives it value for specific end users.

Difference between Data and Information

Sl. No. | Data | Information
1 | Data is raw facts and figures. Ex.: 90 is data. | When data is stored in a meaningful form, like marks: 90, it becomes information.
2 | Data is not significant to a business. | Information is significant to a business.
3 | Data are the atomic-level pieces of information. | Information is a collection of data.
4 | Data does not help in decision-making. | Information helps in decision-making.
5 | Data is generally in unorganized form. | Information is in organized form.
6 | Data is collected directly from the source and hence does not depend on information. | Information depends on the data that is gathered.

Information may be processed further to form knowledge; information containing wisdom is known as knowledge. So we can say that:

Data → (processing) → Information → (processing) → Knowledge
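The step from data to information can be made concrete with a small sketch. Assuming a hypothetical list of raw sales transactions (the product names and prices are made up), the code aggregates the raw facts into total revenue per product, which is information a manager could act on.

```python
from collections import defaultdict

# Raw data: individual sales transactions (hypothetical sample values).
transactions = [
    ("pen", 2, 10.0),       # (product, quantity, unit price)
    ("notebook", 1, 45.0),
    ("pen", 3, 10.0),
]

def summarize(rows):
    """Aggregate raw transactions into total revenue per product."""
    totals = defaultdict(float)
    for product, qty, price in rows:
        totals[product] += qty * price
    return dict(totals)

info = summarize(transactions)
for product, revenue in sorted(info.items()):
    print(f"{product}: Rs. {revenue:.2f}")
```

Each tuple on its own says little; the per-product totals are the "processed data placed in a context" that the text calls information.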

1.2.2 Data / Information Processing and Databases


Information systems vary with the needs of different types of businesses. There are also different kinds of information systems, such as transaction processing systems, decision support systems and expert systems, which are used by different levels of management in a business organization. In spite of these variations, all information systems have some things in common:
1. They all use some kind of computerized technique to store the data and information generated in the system.
2. They all access the stored information in different ways for further processing or presentation.
Thus data storage and retrieval is one of the central activities in information processing. Such a collection and organization of information is called a data bank.

In the early days of business, data banks existed in the minds of key personnel. As volume and complexity increased, tools such as books, records, manuals and drawings were devised as data banks, and manual procedures and skills evolved to retrieve information from them when needed. However, these techniques were neither reliable nor fast enough when the information involved was huge and complex, so business decisions could not be accurate and timely. To overcome this shortcoming, information systems were computerized. The speed and accuracy of computers greatly improved the reliability and timeliness of the information generated. This, however, required techniques and tools to handle data banks on computers, namely tools to store and retrieve information. The development of such techniques and tools resulted in what are known as DBMS packages.

Integrated data banks stored in computer systems are called databases. The computer software packages (sets of tools and utilities) that facilitate the creation, use and management of databases are called DBMS (Database Management Systems). A DBMS provides the computational capacity to store, retrieve, edit and sort data, and to perform computations, including statistics, on the data it extracts from storage. The tasks handled by DBMS packages can be classified as:
a. Database development: define and organize the content, relationships and structure of the data needed to build a database.
b. Database interrogation: access the data in a database to display information in various formats. Users can selectively retrieve and display information and produce forms, reports and other documents.
c. Database maintenance: add, delete, update and correct the data in a database.
d. Application development: develop prototypes of queries, presentation forms and reports for a proposed business application.
We will study these tasks in detail later. First, let us begin a detailed study of data.

Database + DBMS Software = Database system
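Three of the DBMS tasks listed above can be seen in a few lines of code. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS (the text does not name any product); the product table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# a. Database development: define the structure of the data.
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL, qty INTEGER)")

# c. Database maintenance: add, update and delete data.
conn.executemany("INSERT INTO product (id, name, qty) VALUES (?, ?, ?)",
                 [(1, "pen", 100), (2, "notebook", 40), (3, "eraser", 0)])
conn.execute("UPDATE product SET qty = qty - 10 WHERE name = 'pen'")
conn.execute("DELETE FROM product WHERE qty = 0")
conn.commit()

# b. Database interrogation: selectively retrieve data for a report.
def stock_report(connection):
    return connection.execute("SELECT name, qty FROM product ORDER BY name").fetchall()

for name, qty in stock_report(conn):
    print(f"{name:10} {qty:5}")
```

Task d (application development) would build forms and reports on top of queries like `stock_report`.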


1.2.3 Data Types and Properties


All data items have certain fundamental properties, and it is important to know them before creating data banks. The first and foremost property of data is its form. Every data element has a form, and data items are classified into different data types based on their form. The form decides the way an item is stored in the computer.

1.2.3.1 Data Types


Based on its form, data can be classified as textual, picture or voice data. Picture and voice are special forms of data and are used less frequently; textual data is by far the largest and most used category, so let us focus on it first. Textual data can be numeric or alphanumeric (a combination of numeric and alphabetic characters).

Numeric data consists of numbers. Examples: the number of students in a class is 50; the marks obtained in a subject are 78; the price of an item is Rs. 48.56. As these examples show, purely numeric data items can be further classified into two types. The first is the whole number (like the number of students in a class or the number of vehicles in a city); these are called integers. The second is numeric data that includes a fractional part (like a price of 48.56 or a maximum temperature of 28.32 today); these data items are called real numbers. The distinction between integers and real numbers matters because they are represented and manipulated differently in a computer.

The next data type is alphabetic or alphanumeric. This type of data is made up of alphabetic and numeric characters (e.g.: the name of a person is HARI; the registration number of a vehicle is KA 09 F-1234). Such data may contain digits along with letters, but the digits are not used as numeric data in any calculation. This data type is called a string of alphanumeric characters. How are these data represented inside a computer?
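The classification of textual data into integers, real numbers and alphanumeric strings can be sketched as a small function. This is an illustrative assumption: it looks only at the characters of a value (optional sign, digits, one decimal point) and treats everything else as a string.

```python
import re

def classify(value: str) -> str:
    """Classify a textual data item as integer, real or alphanumeric string."""
    if re.fullmatch(r"[+-]?\d+", value):
        return "integer"        # whole number, e.g. number of students
    if re.fullmatch(r"[+-]?\d+\.\d+", value):
        return "real"           # number with a fractional part, e.g. a price
    return "alphanumeric"       # anything else is treated as a string

for item in ["50", "48.56", "HARI", "KA 09 F-1234"]:
    print(item, "->", classify(item))
```

Note that KA 09 F-1234 is classified as alphanumeric even though it contains digits, matching the point that those digits are never used in calculations.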

1.2.3.1.1: Data Representation


All data in a computer must be represented using only two symbols, 0 and 1. This system of representation is called binary representation. To represent all data types in computers using only 0 and 1, some kind of coding is needed. Integers are represented directly as binary numbers. Real numbers are represented using a technique called floating-point representation. Strings are represented through a coding scheme such as ASCII (American Standard Code for Information Interchange). ASCII is a 7-bit code, and each character is normally stored in one 8-bit byte (8 binary digits). Example: the letter A is 01000001 (decimal 65) and the letter B is 01000010 (decimal 66). (You will study the details of data representation in other modules.) Even pictures/images, voice/audio and video data are coded into a large number of 0s and 1s.
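The three codings described above can be inspected directly. This sketch shows the binary form of an integer, the 8-bit ASCII pattern of a character, and the 32-bit IEEE 754 single-precision pattern of a real number; IEEE 754 is assumed here as the floating-point format, since it is what most modern computers use.

```python
import struct

def int_bits(n: int) -> str:
    """Binary digits of a non-negative integer."""
    return format(n, "b")

def ascii_bits(ch: str) -> str:
    """8-bit pattern of an ASCII character (7-bit code padded with a leading 0)."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "08b")

def float32_bits(x: float) -> str:
    """32-bit IEEE 754 single-precision bit pattern of a real number."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return format(raw, "032b")

print(int_bits(50))       # 110010
print(ascii_bits("A"))    # 01000001
print(ascii_bits("B"))    # 01000010
print(float32_bits(1.0))  # 00111111100000000000000000000000
```

The integer 50 and the real number 1.0 end up as very different bit patterns, which is why the distinction between integer and real data types matters to a DBMS.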

1.2.3.2 Data Size


All data items have a size. Looking at the previous examples, the number of students in a class needs 2 digits of space, and the price of an item may need 4 digits (2 before the decimal point and 2 after; the decimal point itself need not be stored). A name string may need a maximum of 30 character positions; when stored inside a computer at 8 bits per character, it needs 30 × 8 = 240 bits. A picture may need several thousand bit positions. Size is of special importance because we must provide adequate space to store these items in the system. Further, DBMS packages should be able to distinguish these data types and provide the functions needed to manipulate them.
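The size arithmetic above extends naturally to a whole record. A minimal sketch, assuming (as the text's name example does) that every digit or letter occupies one 8-bit character; real systems often store numbers more compactly in binary.

```python
BITS_PER_CHAR = 8  # assumption: each digit or letter is stored as one 8-bit character

# Field widths in character positions, taken from the examples in the text.
field_widths = {
    "class_strength": 2,  # e.g. 50
    "price": 4,           # e.g. 48.56 stored as 4856; decimal point not stored
    "name": 30,           # maximum length of a name string
}

def field_bits(width: int) -> int:
    """Bits needed for one field of the given width."""
    return width * BITS_PER_CHAR

def record_bits(widths: dict) -> int:
    """Total bits needed for one record made of the given fields."""
    return sum(field_bits(w) for w in widths.values())

print(field_bits(field_widths["name"]))  # 240
print(record_bits(field_widths))         # 288
```

Multiplying `record_bits` by the expected number of records gives a first estimate of the storage a file will need.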

1.2.3.3 Relationship
Even though data items are individual entities, they never occur in isolation in the real world; they are always associated with other data items. Ex.: the data item price is related to the vehicle in question, the date of the transaction and the seller. There are three types of data relationships. Let us understand each of them.

The simplest is the one-to-one (1:1) relationship: for each value of a data item there is one and only one corresponding value of the other item. E.g.: student ID and student name; vehicle number and vehicle. Normally all such data items are grouped and kept together as a record.

The second type is the one-to-many (1:M) relationship. Here, for every value of one data item there are several values of the other data item; conversely, each of those values is related to one unique value of the first data item.


E.g.: 1. A book has several chapters, but each chapter belongs to one and only one book. 2. A person can own several vehicles, but each vehicle has only one owner. One-to-many relationships can be represented in computers using pointers and arrays (details later).

The third type is the many-to-many (N:M) relationship. Many real-world relationships are of this type. E.g.: 1. A student has several teachers, and a teacher may have several students. 2. A book can have several authors, and an author may have written several books. This type of relationship is difficult to represent and handle in computers. Hence, as far as possible, we reduce it to two one-to-many relationships (1:M and N:1) and drop any direction that is irrelevant to the user. The database must maintain all the data and their relationships and allow users to access data based on these relationships. E.g.: "Get me all vehicles owned by a person." "Get me the subjects taught by a teacher."
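Here is one common way to store these relationships, sketched with Python's sqlite3 module as a stand-in DBMS. The 1:M person-vehicle link is held by an owner column on each vehicle, and the N:M student-teacher relationship is reduced to two 1:M links through a junction table (here called `teaches`, a hypothetical name). All names and rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1:M  one person owns many vehicles; each vehicle has exactly one owner
    CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE vehicle (reg_no TEXT PRIMARY KEY,
                          owner_id INTEGER REFERENCES person(id));

    -- N:M  students and teachers, reduced to two 1:M links via a junction table
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE teaches (student_id INTEGER REFERENCES student(id),
                          teacher_id INTEGER REFERENCES teacher(id),
                          PRIMARY KEY (student_id, teacher_id));

    INSERT INTO person  VALUES (1, 'Hari');
    INSERT INTO vehicle VALUES ('KA 09 F-1234', 1), ('KA 09 F-5678', 1);
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO teacher VALUES (1, 'Rao'), (2, 'Iyer');
    INSERT INTO teaches VALUES (1, 1), (1, 2), (2, 1);
""")

def vehicles_of(person_name):
    """Get me all vehicles owned by a person (follows the 1:M link)."""
    rows = conn.execute(
        "SELECT v.reg_no FROM vehicle v JOIN person p ON v.owner_id = p.id "
        "WHERE p.name = ? ORDER BY v.reg_no", (person_name,))
    return [r[0] for r in rows]

def students_of(teacher_name):
    """All students of a teacher (teacher -> teaches -> student)."""
    rows = conn.execute(
        "SELECT s.name FROM student s "
        "JOIN teaches t ON t.student_id = s.id "
        "JOIN teacher tr ON tr.id = t.teacher_id "
        "WHERE tr.name = ? ORDER BY s.name", (teacher_name,))
    return [r[0] for r in rows]

print(vehicles_of("Hari"))  # ['KA 09 F-1234', 'KA 09 F-5678']
print(students_of("Rao"))   # ['Asha', 'Ravi']
```

Each row of `teaches` links exactly one student to one teacher, so both sides of the junction table are ordinary one-to-many relationships.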

1.2.4. Data Organization and Grouping


As already mentioned, data occurs individually in the real world, but it is grouped and organized to help process it and generate information. The grouping of related data items from the user's point of view is called logical grouping; the grouping of data items from the point of view of their storage inside the computer is called physical grouping. Just as writing is organized into letters, words, sentences, paragraphs and chapters, data can be organized into characters, fields, records, files and databases.

1.2.4.1 Character
The character is the most basic logical data element; it consists of a single alphabetic, numeric or other symbol. E.g.: the grade obtained in a subject could be A, B, C, D or E; the sex of a person could be M or F; the subject taught during an hour in a timetable.

1.2.4.2 Field


A field is the next higher level of data; it consists of a grouping of characters. E.g.: 1. A person's name field is a grouping of alphabetic characters. 2. A sales amount field is a grouping of numeric characters. 3. The teacher teaching a subject for a class. A field represents an attribute of some entity (object, person, place or event). E.g.: an employee's salary is an attribute, a typical data field associated with the entity employee (in a 1:1 relationship).

1.2.4.3 Record
Related data fields are grouped to form a RECORD. A record is thus a collection of attributes that describe an entity. E.g.: 1. An employee record could consist of attributes like the employee's ID, name and salary. 2. The set of subjects taught to a class during each hour.

1.2.4.4 File
A group of related records is a data FILE. E.g.: 1. A group of employee records, one for each employee, could form an employee file. 2. The timetable for a class for a week, showing the subject taught each hour on each day of the week. Files are frequently classified by the application for which they are primarily used, such as a payroll file or an inventory file.

1.2.4.5 Database
A DATABASE is an integrated collection of logically related records or objects. It consolidates records stored in various files into a common pool of data records that provides data for several users. Fig. 1.1 shows databases, files, records and fields. E.g.: 1. The timetable for an entire school, showing the details of classes, subjects, rooms, teachers etc. 2. A personnel database that consolidates data files such as payroll files, personnel action files and employee skill files.

[Fig. 1.1: Database, Files, Records and Fields. A database contains a Payroll File (Employee Rec #1, Employee Rec #2, ..., each with fields Name, Id, Salary) and an Inventory File (Matl Rec #1, Matl Rec #2, ..., each with fields Matl Id, Desc, Qty).]
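The hierarchy of Fig. 1.1 maps neatly onto ordinary data structures. In this sketch, a Python dataclass stands for a record type, its attributes for fields, a list of records for a file, and a dictionary of files for the database; the employee and material values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:
    """A record: related fields that together describe one employee."""
    emp_id: str    # field: Id
    name: str      # field: Name (a group of characters)
    salary: float  # field: Salary

@dataclass
class MaterialRecord:
    """A record describing one material item in inventory."""
    matl_id: str   # field: Matl Id
    desc: str      # field: Desc
    qty: int       # field: Qty

# Files: groups of related records of the same kind.
payroll_file = [EmployeeRecord("E1", "HARI", 25000.0),
                EmployeeRecord("E2", "ASHA", 31000.0)]
inventory_file = [MaterialRecord("M1", "Bolt", 500),
                  MaterialRecord("M2", "Nut", 800)]

# Database: logically related files pooled together for several users.
database = {"payroll": payroll_file, "inventory": inventory_file}

# Walking down the hierarchy: database -> file -> record -> field -> character.
first_char = database["payroll"][0].name[0]
print(first_char)  # H
```

This is only a logical grouping; how a real DBMS lays these records out on disk (the physical grouping) is a separate matter.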

1.3 FLAT FILE


A flat file consists of only one file, with each entry in the form of a record containing all the required data. You can imagine a flat file as a cabinet containing only one folder with many pages, each page holding all the information for one specific entry. This arrangement makes it easy for the user to know where to find requested entries and all the data associated with them, and because all the data is in one place, retrieval is easy and fast. However, a flat file suffers from data redundancy, which also wastes storage space. For example, consider an order processing application whose data is kept in a flat file containing the customer name, customer address and customer phone number along with each order. If a customer orders more than one item, there is one entry per item and the customer information is repeated in each, which causes redundancy and wastes storage space. If the customer's address later changes, it must be changed in more than one place, which may lead to inconsistent data.
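The redundancy and inconsistency problem of the order processing example can be reproduced in a few lines. In this sketch the flat file is a Python list of dictionaries (the customer, address and phone values are hypothetical), and a careless update changes only one of the repeated copies.

```python
# A flat file for order processing: every order line repeats the customer's details.
flat_file = [
    {"customer": "Hari", "address": "12 MG Road", "phone": "98450", "item": "pen"},
    {"customer": "Hari", "address": "12 MG Road", "phone": "98450", "item": "ink"},
    {"customer": "Hari", "address": "12 MG Road", "phone": "98450", "item": "paper"},
]

def addresses_of(rows, customer):
    """Distinct addresses recorded for one customer."""
    return {r["address"] for r in rows if r["customer"] == customer}

# Redundancy: the same address is stored once per order line.
copies = sum(1 for r in flat_file if r["customer"] == "Hari")

# An update that touches only the first matching entry leaves the others stale.
flat_file[0]["address"] = "7 Park Street"

print(copies)                                    # 3
print(sorted(addresses_of(flat_file, "Hari")))   # ['12 MG Road', '7 Park Street']
```

After the partial update the file holds two different addresses for the same customer, which is exactly the inconsistency described above.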

1.4 DISADVANTAGES OF FILE ORIENTED APPROACH


In the file-oriented approach to information processing discussed above, each application has a separate master file and its own set of other files. For example, in COBOL we need to maintain different files for inventory, purchase, sales, payroll and financial accounting. When data is maintained like this, sharing is not possible. Another major limitation of this approach is that programs and data become dependent on each other: a program depends on the structure of its files, and the files depend on the programs that use them. The file-oriented approach to data processing suffers from the following significant disadvantages.
Data redundancy and inconsistency: the same data is stored in many places, so it becomes redundant and storage space is wasted. This redundancy may lead to data inconsistency, where various copies of the same data no longer agree.
Difficulty in accessing data: conventional file-processing environments do not allow needed data to be retrieved conveniently and efficiently; new programs must be written to access the data in each new form.
Data isolation: data is scattered across various files, and the files may be in different formats, so new application programs must be written to retrieve the required data.
Integrity problems: consistency constraints are embedded in application programs, so enforcing them, or adding new ones, means changing those programs.
Security problems: security constraints are difficult to enforce in the file-oriented approach, and unauthorized persons can easily access data stored in files.
These difficulties with file-oriented systems prompted the development of database management systems (DBMSs). The differences between file management and database management are tabulated below.

Sl. No. | File management | Database management
1 | These are small systems, such as a C++ or COBOL program. | These are large systems, such as Oracle or Sybase.
2 | They are relatively cheap. | They are relatively expensive.
3 | They have a simple structure. | They have a complex structure.
4 | They need very little preliminary design. | They need extensive preliminary design.
5 | They are not secure. | They are more secure.
6 | They are often single-user oriented. | They are multi-user oriented.
7 | They have isolated data. | They have shared data.
8 | They have simple, primitive backup/recovery mechanisms. | They have complex and sophisticated backup/recovery mechanisms.

Table: The difference between File management and Database Management.

1.5 DATABASE
A database is a collection of stored operational data used by the application systems of a particular organization. It is defined as a collection of logically interrelated data, together with a description of this data, designed to meet the information needs of an organization. Familiar examples are a dictionary, a telephone directory and a student record register; each stores data in a particular, organized form.

1.5.1 DATABASE SYSTEM


A database system is a computer-based system whose aim is to record and maintain information. It involves four components: data, software, hardware and users.

Data
Each database is a repository for data. The database is integrated and shared. Integrated means that all the data is available in one single place. Shared means that individual data items in the database can be shared among several users. A database is shared not only sequentially but also concurrently, that is, by several users at the same time. A database system supporting this form of sharing is called a multiuser system.

Software
The database management system (DBMS) is the software that lies between the physical database and the users of the system. All requests from users for data manipulation are handled by the DBMS. The DBMS shields database users from hardware-level details and supports their operations, for example by retrieving data for a query such as "Select all the employee records whose salary is more than 10,000 per month".
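The salary query above is a one-line request to a DBMS. This sketch uses Python's sqlite3 module as an illustrative DBMS with a hypothetical employee table; the user states which records are wanted and the DBMS decides how to locate them in storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Hari", 8500), (2, "Asha", 12000), (3, "Ravi", 15000)])

def well_paid(connection, threshold=10000):
    """Select all employee records whose monthly salary exceeds the threshold."""
    query = "SELECT id, name, salary FROM employee WHERE salary > ? ORDER BY id"
    return connection.execute(query, (threshold,)).fetchall()

for row in well_paid(conn):
    print(row)
```

The program never mentions files, disk blocks or record positions; those hardware-level details stay hidden behind the DBMS.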

Hardware
The hardware of a database system consists of two components:
a) processor and main memory;
b) secondary storage devices.
The processor and main memory are required to support the execution of the DBMS. Secondary storage devices, such as hard disks and CDs, are used to store the data.

Users
There are three types of users:
1. Database administrator
2. Application programmer
3. End user

[Fig: 1.1. Database users. End users are further divided into naive users and sophisticated users.]

The main reason for using a DBMS is to have central control over both the data and the programs that access the data. The person who has such central control over the database system is called the database administrator (DBA). Application programmers are the people who write application programs that use the database; they use programming languages to write programs that manipulate the database. End users are the people who interact with the database through application programs written by the application programmers. They know how to use the programs, but not how the programs have been written. End users are classified as follows:
Naive users: they are usually unaware of the DBMS and know nothing about the database or the DBMS.
Sophisticated users: they are familiar with the structure of the database and the facilities that the DBMS provides.
Figure 1.1 shows the database users.


1.5.2 Advantages of Database Systems


A database system has several advantages over traditional file-based systems, because control over the data is centralized and the data can be shared. The advantages are:
Redundancy can be minimized: if each department in a company maintains its own files of employee data, customer data, sales data etc., storage space is wasted and there is considerable redundancy (the same data stored in many places), which also leads to inconsistency when the data is updated. In a database, the DBA can integrate these files suitably, so redundancy can be reduced or eliminated.
Inconsistency can be avoided: if data is redundant, it may become inconsistent when it is updated in some places but not all. Inconsistency is removed if a given fact is represented by a single entry.
The data can be shared: data stored for one application can be used by other applications, including new ones.
Security can be enforced: the database administrator (DBA) has full control over the operational data. The DBA can define the access paths for the data stored in the database and can define authorization checks that are applied whenever anyone attempts to access sensitive data.
Integrity of data can be maintained: centralized control also ensures that adequate checks are built into the DBMS to provide data integrity. Various consistency constraints can be applied to maintain the integrity of the data, because all the data is available in one single place.
Standards can be enforced: while applications that store and manipulate the database are being developed, the DBA can ensure that international, national, industry and company standards are followed.
Backup and recovery can be provided: the DBMS must make recovery from hardware or software failure possible. For this purpose it has a backup and recovery subsystem, responsible both for backing up the data and for recovering it whenever a failure occurs.
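Centrally enforced integrity can be illustrated with a declared constraint. In this sketch, using sqlite3 as an illustrative DBMS and a hypothetical bank account table, the rule "balance may never go negative" is stated once in the database definition, and the DBMS rejects any update that breaks it, whichever program issues that update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint lives in the database definition, not in each application program.
conn.execute("""
    CREATE TABLE account (
        acc_no  INTEGER PRIMARY KEY,
        holder  TEXT NOT NULL,
        balance REAL NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO account VALUES (1, 'Hari', 500.0)")

def try_update(sql):
    """Run a statement; report whether the DBMS accepted it."""
    try:
        with conn:              # commits on success, rolls back on error
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_update("UPDATE account SET balance = balance - 200 WHERE acc_no = 1"))  # True
print(try_update("UPDATE account SET balance = balance - 900 WHERE acc_no = 1"))  # False
print(conn.execute("SELECT balance FROM account").fetchone()[0])                   # 300.0
```

In a file-oriented system the same check would have to be repeated, and kept in step, inside every program that updates the file.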

1.5.3 Disadvantages of a Database Systems


A significant disadvantage of a DBMS is its cost, which covers purchasing or developing the software and purchasing the required hardware. Centralizing the data reduces duplication, but the database has to be adequately backed up so that the data can be recovered if a failure occurs. The disadvantages are:
Problems associated with centralization of data: because all data is accessed from one single place, it has to be maintained carefully so that only authorized persons access the data in the database.
Cost of hardware and software: DBMS software usually costs more. Even after purchasing the DBMS software, we need to spend money developing the required application software. The hardware needed depends on the amount of data to be maintained and manipulated.
Cost of migration: migrating from one type of database to another is also expensive.
Complexity of backup and recovery: as the data in the database grows, backup and recovery after a failure become more complex.

1.5.4 Architecture of a Database System


The architecture of a database system consists of three views:
1. External or user view
2. Conceptual or global view
3. Internal view

Such an architecture was proposed by the American National Standards Committee on Computers and Information Processing. Most current DBMSs support, to varying extents, the separation of the physical database, the conceptual schema and the user views. The three-tier DBMS architecture is shown in figure 1.2.

The external or user view is at the highest level and concerns individual users or application programs. Each external view is described by a schema called an external schema. This level is concerned with the way the data is viewed by individual users. End users may have any degree of sophistication and different authorizations, so they may need to be given different views of the database. An individual view is just a subset of the conceptual schema.

The conceptual or global view represents the entire database and is defined by the conceptual schema. The description of data at this level is in a format independent of its physical representation, and it includes features that specify checks to maintain data consistency and integrity. It represents the entire information content of the database in a form more abstract than the way the data is physically stored. In other words, it is the logical description of the entire database: the overall logical view of the data and their relationships, as seen by database developers, system administrators and authorized users who require access to the entire database.

[Fig: 1.2. Three-tier DBMS architecture. End users' views (External view A, External view B) are described by External Schema A and External Schema B, which are linked to the conceptual view by External/Conceptual mappings A and B. The conceptual view is linked by a Conceptual/Internal mapping to the physical or internal schema of the stored database (internal view). The DBA builds the schemas and mappings, which the database management system (DBMS) uses.]

The internal view is the level closest to physical storage. It indicates how the data will be stored and describes the data structures and access methods to be used by the database. The internal view is defined by the internal schema and is concerned with the way data is physically stored. This aspect of the database is seen only by system programmers concerned with issues such as performance optimization; ordinary users are not involved at this level. The three schemas are only descriptions of data: the only place the data is actually stored is the physical level. Mappings between the three levels are provided by the DBMS.
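The three levels can be approximated in SQL, as sketched below with Python's sqlite3 module. The correspondence is an illustrative assumption, not something the architecture prescribes: a base table plays the conceptual schema, views play two external schemas (A and B), and an index is an internal-level access path that no view user ever sees. Table, view and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the whole logical description of employee data.
conn.execute("""CREATE TABLE employee (
    id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Hari", "Sales", 9000), (2, "Asha", "Accounts", 14000)])

# External level: each user group sees only the subset it needs.
conn.execute("CREATE VIEW staff_directory AS SELECT name, dept FROM employee")  # view A
conn.execute("CREATE VIEW payroll_view AS SELECT id, salary FROM employee")     # view B

# Internal level: a storage/access-path decision invisible to the views' users.
conn.execute("CREATE INDEX idx_employee_dept ON employee(dept)")

print(conn.execute("SELECT * FROM staff_directory ORDER BY name").fetchall())
# [('Asha', 'Accounts'), ('Hari', 'Sales')]
```

Users of `staff_directory` never see salaries, and the DBMS, not the user, maps each view query onto the stored table and its index.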

1.5.5 Data Independence


File-based applications are data dependent: how the data is stored on secondary storage devices and how it is accessed are dictated by the programs. A DBMS provides data independence, so applications written against it do not depend on these storage details. This is possible because of the architecture of the DBMS.


In a DBMS, data independence is defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. There are two kinds of data independence:
Physical data independence insulates applications from the underlying physical storage organization of the data; changes at the physical level do not affect the conceptual schema or the external schemas.
Logical data independence insulates applications from changes made to the logical organization of data; changes made to the conceptual schema should not unnecessarily affect the individual external views.
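Both kinds of independence can be demonstrated with one small "application" that only knows an external view. This sketch uses sqlite3 as an illustrative DBMS with hypothetical names: adding an index is a physical change, adding a column to the base table is a conceptual change, and the application's result is the same before and after both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", 78), (2, "Ravi", 90)])
conn.execute("CREATE VIEW toppers AS SELECT name FROM student WHERE marks >= 80")

def application_query():
    """An 'application' that only knows the external view."""
    return conn.execute("SELECT name FROM toppers").fetchall()

before = application_query()

# Physical change: add an access path. The application is untouched.
conn.execute("CREATE INDEX idx_student_marks ON student(marks)")

# Logical change: extend the conceptual schema with a new column.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

after = application_query()
print(before == after)  # True
```

A conceptual change that removed or renamed a column the view depends on would, of course, still require the view to be redefined; that is why the text says such changes should not affect views unnecessarily.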

1.6 DATABASES AND THEIR MANAGEMENT


Databases, as we have seen, involve techniques for storing, accessing and managing data. The DBMS is the software that provides these techniques and acts as the interface between the database and its users. A DBMS is a set of computer programs that controls the definition, construction, maintenance and use of databases, which serve as a central repository of all the data of an organization and its end users.
[Fig 1.2 A schematic of Database Management System: several users, each working through application programmes, access a common DATA BASE.]

Defining (creating) a database involves specifying the data types, structures and relationship constraints for the data to be stored in it. Constructing a database is the process of storing the data itself, by populating the database on a computer storage medium.


Maintaining a database includes functions such as updating and accessing the data to reflect changes in the real world. E.g.: consider a college environment in which we need to maintain data about class scheduling, such as a) courses and sections, b) subjects to be taught for each course, c) teachers teaching the subjects, d) rooms in which classes are held, and e) timings for teaching each subject. The basic entities in this example are subjects, courses, teachers, rooms, students etc., and there are associations or relationships linking these entities. E.g.: subject and teacher have an N:M association: a teacher may teach several subjects, and several teachers may teach a subject.

1.7 DATABASE MANAGEMENT SYSTEM


The DBMS is the software that handles all access to the database, in four steps:
1. A user of the database issues a request to access records stored in the database, using a data manipulation language (DML).
2. The DBMS accepts the request and interprets it.
3. The DBMS inspects the database.
4. The DBMS carries out the required operation on the stored database.
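The four steps above can be traced in a deliberately tiny toy. Everything here is hypothetical: the in-memory store, the `handle` function and the "GET table key" request format stand in for a real DML and a real DBMS, and serve only to show where each step happens.

```python
# A toy "DBMS" over an in-memory store; the request format is hypothetical.
store = {
    "employee": {1: {"name": "Hari", "salary": 9000},
                 2: {"name": "Asha", "salary": 14000}},
}

def handle(request: str):
    # Step 1: the user submits a request, e.g. "GET employee 2".
    # Step 2: interpret it into an operation, a table and a key.
    parts = request.split()
    if len(parts) != 3 or parts[0].upper() != "GET":
        raise ValueError(f"cannot interpret request: {request!r}")
    _, table, key = parts
    # Step 3: inspect the database: does the table exist and hold that key?
    if table not in store or int(key) not in store[table]:
        return None
    # Step 4: carry out the operation on the stored data.
    return store[table][int(key)]

print(handle("GET employee 2"))  # {'name': 'Asha', 'salary': 14000}
```

A real DBMS does far more in steps 2 and 3 (parsing a full language, checking authorization, choosing an access path), but the division of work is the same.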

1.7.1 Objectives of DBMS


The DBMS has been designed to serve the management of a business organization. Its objectives are:
1. Provide mass storage of relevant data.
2. Make access to the data easier for the user.
3. Provide prompt responses to users' requests for data.
4. Allow data to be modified in a consistent manner.
5. Eliminate or reduce redundant data.
6. Allow multiple users to be active at the same time.
7. Protect data from physical hardware failure and unauthorized access.

1.7.2 Components of DBMS


DBMS packages on personal computers allow end users to develop databases for their personal needs; these are called single-user databases. Large organizations with many users, however, usually place control of enterprise database development in the hands of DATABASE ADMINISTRATORS (DBAs) and other specialists, which improves the integrity and security of organizational databases. Database developers use a DATA DEFINITION LANGUAGE (DDL) to specify data structures and relationships, and to modify these structures when needed. The detailed information about these structures is called METADATA; it is stored in the DATA DICTIONARY component of the DBMS, which is maintained by the DBA. Users insert, modify, delete and retrieve data in the database according to their needs, using a DATA MANIPULATION LANGUAGE (DML). The DBA must also guard the database against media failures, accidental erasure and similar events. For this purpose the DBA creates copies of the database, and records the changes made to it, so that the data can be recovered after a failure; DATABASE UTILITIES handle these backup and recovery functions.

The other major components of the DBMS are:

DDL compiler: converts data definition statements into a set of tables containing metadata, based on what information is to be stored in the database.
DML compiler: converts DML statements embedded in an application program into normal procedure calls in the host language. The DML compiler interacts with another component, the query processor.
File manager: manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
Query processor: processes online users' queries, converts them into an efficient series of operations, executes them and provides the required information to the user.
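The division of labour between the DDL compiler, the data dictionary and DML operations can be illustrated with a toy Python sketch. The `CREATE TABLE` syntax accepted here, the two column types and all function names are hypothetical simplifications, not the behaviour of any particular DBMS: the "DDL compiler" turns a definition into metadata stored in a dictionary, and every "DML" insert is checked against that metadata before data is stored.

```python
# Data dictionary: table name -> ordered list of (column name, Python type) pairs.
data_dictionary: dict[str, list[tuple[str, type]]] = {}
# Stored data: table name -> list of rows.
tables: dict[str, list[tuple]] = {}

TYPES = {"INT": int, "TEXT": str}  # hypothetical type names for this toy DDL

def compile_ddl(stmt: str) -> None:
    """Toy DDL compiler for statements like 'CREATE TABLE emp (id INT, name TEXT)'."""
    head, _, body = stmt.partition("(")
    name = head.split()[2]
    columns = []
    for col in body.rstrip(")").split(","):
        col_name, col_type = col.split()
        columns.append((col_name, TYPES[col_type]))
    data_dictionary[name] = columns  # the metadata
    tables[name] = []

def insert(table: str, row: tuple) -> None:
    """Toy DML insert: the row is validated against the metadata, then stored."""
    columns = data_dictionary[table]
    if len(row) != len(columns):
        raise ValueError("wrong number of values")
    for value, (col_name, col_type) in zip(row, columns):
        if not isinstance(value, col_type):
            raise TypeError(f"{col_name} must be {col_type.__name__}")
    tables[table].append(row)

compile_ddl("CREATE TABLE emp (id INT, name TEXT)")
insert("emp", (1, "Asha"))
print(tables["emp"])  # [(1, 'Asha')]
```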


1.8 ROLE OF DATABASE ADMINISTRATOR (DBA)


A DBA is an individual or a group of persons with an overview of one or more databases, who controls the design and use of these databases. The DBA is usually among the most highly paid staff in an organization. The DBA provides the necessary technical support for implementing policy decisions about databases, and is the central controller of the database system, managing all of its resources, such as the database, the DBMS and related software. A DBA is supported by a team of system programmers and other technical assistants. The main functions of the DBA are as follows:

Schema definition: the DBA creates the original database schema by writing a set of definitions. These definitions are translated by the DDL compiler into a set of tables, which are stored permanently in the data dictionary.
Defining storage structures and access methods: the DBA specifies appropriate storage structures and access methods for the stored data.
Coordinating with all types of database users: the DBA interacts with all types of database users and ensures that the data they require is available.
Granting authorization to users: the DBA grants users authorization to access data in the database, so that unauthorized users are kept out.
Specifying integrity constraints: the DBA specifies the consistency constraints imposed to maintain data integrity.
Backup and recovery policies: the DBA decides when backups are taken, so that data can be recovered easily when a failure occurs.
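The "granting authorization" function can be pictured as a table of privileges that the DBMS consults before every access. The Python sketch below assumes a hypothetical authorization table keyed by (user, table); real systems typically express this through statements such as SQL `GRANT` and keep far richer privilege information.

```python
# Hypothetical authorization table maintained by the DBA: (user, table) -> privileges.
grants: dict[tuple[str, str], set[str]] = {}

def grant(user: str, table: str, privilege: str) -> None:
    """Record that the DBA has granted a privilege on a table to a user."""
    grants.setdefault((user, table), set()).add(privilege)

def is_authorized(user: str, table: str, privilege: str) -> bool:
    """Checked before an access is carried out; unknown users get nothing."""
    return privilege in grants.get((user, table), set())

grant("clerk", "salary", "SELECT")
print(is_authorized("clerk", "salary", "SELECT"))  # True
print(is_authorized("clerk", "salary", "UPDATE"))  # False
print(is_authorized("guest", "salary", "SELECT"))  # False
```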

1.9 DATABASE SOFTWARE


There are three categories of DBMS software:

Languages: to create, use and maintain the database.
Utilities: to provide support facilities such as report generation, graphical output, statistical operations and various interfaces.
Operational routines: for run-time management, such as backup and recovery, and for concurrency control.

DBMS Languages:
The categories of DBMS languages are shown in Fig. 1.3.

Languages of DBMS
   DDL
   DML
      Procedural DML (how?), e.g., PL/SQL
      Non-procedural DML
         Formal query languages: Relational algebra, Relational calculus
         Commercial query languages: SQL, QBE, QUEL

Fig: 1.3. Categories of DBMS language

Data definition language (DDL)


The DDL is used to specify the database schemas. Its storage definition language part specifies the internal schema. The schema DDL is a high-level notation for describing the record types and relationships existing in the database in terms of an underlying data model; because of data independence, physical details such as storage structures are not involved in the DDL at this level. The view definition language (sub-schema DDL) describes a view of the database and specifies its mapping to the conceptual schema.
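A view (sub-schema) can be thought of as a mapping from the conceptual-level data to the part a particular user group is allowed to see. In this Python sketch the employee data and the `make_view` helper are hypothetical; a view is modelled simply as a projection onto chosen columns, which is only one kind of view a real view definition language can express.

```python
# Conceptual-level relation (hypothetical data): every employee attribute.
employees = [
    {"emp_no": 101, "name": "Asha", "dept": "Sales", "salary": 42000},
    {"emp_no": 102, "name": "Ravi", "dept": "Accounts", "salary": 39000},
]

def make_view(columns):
    """Return a view: a function mapping the conceptual relation to selected columns."""
    def view(relation):
        return [{c: row[c] for c in columns} for row in relation]
    return view

# A sub-schema for users who must not see salaries.
staff_directory = make_view(["emp_no", "name", "dept"])
print(staff_directory(employees)[0])  # {'emp_no': 101, 'name': 'Asha', 'dept': 'Sales'}
```

Because the view is defined in terms of the conceptual relation, it keeps working when new rows are stored there.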

BSIT 24 Basics of DBMS

21

Data manipulation language (DML)


The DML is a language used by application programmers and some sophisticated end users to interact with the database. It offers facilities for insertion, deletion, modification and retrieval of data. Commercial DBMSs usually provide various types of DML, both procedural and non-procedural (i.e., declarative). Three possible forms of DML are:

An interactive command language (like a DOS/UNIX command language).
A library of predefined procedures that may be called by application programs written in a standard programming language.
A procedural programming language, unique to a particular DBMS, which may be based on a standard language with added facilities for data manipulation.
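The difference between procedural and non-procedural DML can be illustrated loosely in Python, with hypothetical records and function names. The first function spells out how to walk through the records; the second only states which records are wanted. Python itself is procedural, so this is an analogy: in SQL the declarative version would be a single statement such as `SELECT name FROM staff WHERE dept = 'Sales'` on a hypothetical `staff` table.

```python
# Hypothetical (name, department) records.
records = [("Asha", "Sales"), ("Ravi", "Accounts"), ("Meena", "Sales")]

def sales_staff_procedural(rows):
    """Procedural style: spell out HOW to reach the data, record by record."""
    result = []
    index = 0
    while index < len(rows):  # explicit navigation through the records
        name, dept = rows[index]
        if dept == "Sales":
            result.append(name)
        index += 1
    return result

def sales_staff_declarative(rows):
    """Non-procedural style: state WHAT is wanted; the iteration is implicit."""
    return [name for name, dept in rows if dept == "Sales"]

print(sales_staff_procedural(records))   # ['Asha', 'Meena']
print(sales_staff_declarative(records))  # ['Asha', 'Meena']
```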

Some DBMSs do not provide distinct DDLs and DMLs; instead, they offer an integrated language that combines the capabilities of both.

Query language (QL)


A query language is a very high level, non-procedural language provided by the DBMS to facilitate retrieval or simple updates when communicating with the database from a terminal or within an application program.

General features of a QL

QLs range in power and sophistication from semi-procedural interactive programming languages to very high level natural languages. High level QLs are convenient for naive end users. Most QLs lack the power of conventional languages to perform complex computations.

1.10 CLASSIFICATION OF DATABASE MANAGEMENT SYSTEMS


There are several criteria for classifying DBMSs. The first criterion is the data model on which the DBMS is based; the data models that can be used are hierarchical, network, relational, object and object-relational. The second criterion is the number of users supported by the system. Single-user systems support only one user at a time and are mostly used with personal computers. Multi-user systems support multiple users concurrently; the majority of DBMSs are of this type. A third criterion is the number of sites over which the database is distributed: databases can be centralized or distributed. If all the data is stored at a single site, the DBMS is called a centralized DBMS; in a distributed DBMS, the actual database and the DBMS software are distributed over many sites. A fourth criterion is the cost of the DBMS: multi-user DBMS packages cost more than single-user packages.

1.11 TYPES OF DATABASES


Developments in Information Technology have resulted in several major types of Databases.

Operational Databases
These databases store the detailed data needed to support an entire organization. They are also called subject area databases (SADB), transaction databases and production databases. They carry up-to-date information about business activities, and are used most frequently by business supervisors in charge of day-to-day operations.

Analytical Databases
These databases contain information extracted from operational databases. Managers use them to study the trends and patterns emerging in the business in order to make strategic decisions and policies. They are also known as data warehouses, information databases and decision support databases, and are generally used in query mode rather than update mode. Techniques such as online analytical processing (OLAP) and data mining are used on these databases to generate meaningful information for business analysis, market research and so on.

Distributed Databases
Many contemporary applications are geographically distributed. Networking technology has made it possible to distribute a database across several computers connected in a network. This improves local access to data and allows remote update without increasing the load on the network. Hence many organizations distribute copies or parts of databases to computer systems at different sites, linked to each other through networking; such databases over a network of computers are known as distributed databases. Ensuring that all of the data in an organization's distributed databases is consistently and concurrently updated is a major challenge of distributed database management.


Personal End User Databases


These databases consist of a variety of data files created by end users on their PCs for personal use. They are generally single-user databases with less emphasis on backup and recovery. The data in them may be generated with word processors, spreadsheets and other PC software packages.

Multimedia Databases
These databases include non-conventional data, such as pictures and voice tracks, along with conventional alphanumeric data. They tend to be huge, and access is done through specialized access language constructs. The data accessed must further be interpreted and displayed by additional front-end software such as browsers and media players. From the database management viewpoint, a set of interconnected multimedia data needs to be handled as specialized structures rather than as simple records.

Special Purpose Databases


These databases are developed and used for special purpose applications; spatial databases, temporal databases, biological databases and the like belong to this category. The data stored in them is of a different kind and must be interpreted according to the ground rules of the application, so special techniques are used for storing and accessing it.

1.12 DATABASE MODELS


Databases are distinguished by the conceptual model of data and the underlying relationships among the data. All models try to represent data and their relationships in a simple, elegant way. A data model is a set of concepts that can be used to describe the structure of a database, that is, the data types, relationships and constraints that should hold for the data. Since data in a database can be anything, it must be organised in a unified way. A data model provides a conceptual basis for design, a formal basis for unambiguously defining the data items and relationships to be stored, and a framework for implementation. Data models exist at three levels:

High-level or conceptual data models provide concepts that are close to the way most users perceive the data.
Low-level or physical data models provide concepts that describe the details of how data are stored at the physical level.
Representational or implementation data models lie between these two. They provide concepts that end users can understand, and also have a close link with the low-level data representations.

The important data models are:

Hierarchical data model
Network data model
Relational data model
Object-oriented data model

An early data model widely used in the 1970s was the HIERARCHICAL MODEL, which captures the intuitive hierarchy of data elements; users navigate through the data using tree-like hierarchies. IMS, an early-generation database from IBM, is based on this model. Hierarchical models cannot represent many-to-many relationships elegantly: such relationships result in cumbersome structures with a lot of duplicated data and slow access. To overcome these limitations, the CODASYL committee proposed the NETWORK MODEL in the 1970s and 1980s. IDMS from Cullinet and DMS 1100 from Unisys Corporation are typical representatives of this generation of databases. While the network model provided much more abstraction power and very good performance for large volumes of data, it lacked elegance: it required a high level of skill to use, and it was difficult to alter its structures dynamically. E. F. Codd of IBM later proposed the elegant and flexible RELATIONAL MODEL. Its elegance, simplicity and solid theoretical foundation made it the darling of database developers and users. Today it is the most popular database model, available on machines ranging from PCs to mainframes; IBM's DB2, ORACLE, INFORMIX, ACCESS, LOTUS and others are all based on it. DBMSs built on this model use SQL (Structured Query Language) to create and manipulate data; SQL is an elegant, simple yet powerful interface to all relational databases. Present-day RDBMSs also support several other tools and utilities that ease application development. The most common utilities are:

A screen designer, to generate user-friendly, fill-in-form interfaces for accessing and manipulating data, e.g., ORACLE FORMS.
A report generator, to access data and present it in a printed format suitable for the end user, e.g., ORACLE REPORT GENERATOR.
Utilities to load and extract bulk data, provided to speed up data loading and extraction, e.g., the Import and Export features of ORACLE.
DBA utilities, to manage security and limit access to data.

Current-generation DBMS packages provide most of these utilities, along with others, to manage databases effectively; in fact, they create a total environment in which users can comfortably handle all their information processing needs. A later development is the object-oriented data model, in which objects and their relationships represent a database.

Overall, there are four commonly used representational data models:

The hierarchical data model, in which a database is represented by tree-like structures. If the data are not naturally hierarchical, this model imposes quite severe restrictions on the database designer/developer.
The network data model, in which a database is represented by a directed graph whose nodes represent the data entities (record types) and whose arcs define the relationships among the entities.
The relational data model, based on the mathematical notion of a relation. In this model both the data entities and their relationships are represented by two-dimensional tables.
The object-oriented data model, in which a database is represented by objects and their relationships. Object-oriented database systems have their origins in object-oriented programming languages such as Smalltalk and C++. An object may be viewed as an information item that closely resembles the object in the real world. Novel concepts, such as class hierarchies and class composition hierarchies, are provided so that the object-oriented model is closer to the high-level data models, yet not far away from the physical-level data models. Concepts are also provided to enable an effective transformation between the object-oriented model and a low-level model.
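The contrast between the hierarchical and relational models can be seen by storing the same department and employee data both ways. In this Python sketch all data and function names are hypothetical: the hierarchical version nests employees under their department and is navigated from the root down, while the relational version keeps two flat tables linked by a shared department id and answers the question by matching rows (a join).

```python
# Hierarchical model: a tree; each employee record lives under its department.
hierarchical = {
    "Sales": ["Asha", "Meena"],
    "Accounts": ["Ravi"],
}

# Relational model: two flat tables; the relationship is carried by a shared value.
departments = [("D1", "Sales"), ("D2", "Accounts")]
employees = [("Asha", "D1"), ("Meena", "D1"), ("Ravi", "D2")]

def staff_of_hierarchical(dept_name):
    # Navigate from the parent (department) down to its children.
    return hierarchical[dept_name]

def staff_of_relational(dept_name):
    # Match rows of the two tables on the department id (a join).
    ids = {d_id for d_id, name in departments if name == dept_name}
    return [emp for emp, d_id in employees if d_id in ids]

print(staff_of_hierarchical("Sales"))  # ['Asha', 'Meena']
print(staff_of_relational("Sales"))    # ['Asha', 'Meena']
```

Note that in the tree an employee can sit under only one parent; giving an employee a second department would force duplicating the record, which is the many-to-many weakness of the hierarchical model described above.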

1.13 SCHEMAS AND INSTANCES


The schema is the skeleton of a database. A schema is defined as an outline or plan that describes the records existing at a particular level. The schema is also called metadata, and is stored separately from the instances in the physical database. Instances are created when data are actually stored in the database: the collection of information stored in the database at a particular moment is called an instance of the database.
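The distinction between a schema and its instances can be shown with a Python dataclass standing in for a record type. The `Student` structure and the sample records are hypothetical: the class plays the role of the schema, and each list of stored records is an instance at one moment.

```python
from dataclasses import dataclass, fields

# Schema (hypothetical): the structure of a Student record, fixed at design time.
@dataclass
class Student:
    roll_no: int
    name: str

# Instance: the records actually stored in the database at one moment.
instance_monday = [Student(1, "Asha"), Student(2, "Ravi")]

# Later the stored data changes, giving a new instance ...
instance_tuesday = instance_monday + [Student(3, "Meena")]

# ... but the schema is unchanged: it still describes the same two fields.
print([f.name for f in fields(Student)])            # ['roll_no', 'name']
print(len(instance_monday), len(instance_tuesday))  # 2 3
```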

1.14 SUMMARY
In this chapter we discussed many concepts related to databases, database systems and database management systems. We defined a database as a collection of related data, where data is nothing but recorded facts. We also said that a database is a collection of operational data organized so that its contents can easily be accessed, managed and updated. A database system is a collection of a database, a DBMS, hardware and users. A database management system is application software, supplied by vendors, that helps in managing the database. We discussed data and information, data types, data representation and data size. A flat file consists of only one file, with each entry in the form of a record containing all the required data. The file-oriented approach to data processing suffers from a number of significant disadvantages. We said that database systems have several advantages and disadvantages compared with traditional file-based systems. The advantages are:

Redundancy can be minimized.
Inconsistency can be avoided.
The data can be shared.
Security can be enforced.
Integrity of data can be kept intact.
Standards can be enforced.
Backup and recovery can be provided.

The disadvantages are:

Problems associated with centralization of data.
Cost of hardware and software.
Cost of migration.
Complexity associated with backup and recovery.

We discussed data independence, the most important concept in DBMS, and the architecture of DBMS, which contains three views: external, conceptual and internal. We then presented databases and their management, and the objectives and components of a DBMS. The role of the DBA is also very important in maintaining the database, and we presented the important functions of the DBA. We listed how DBMSs are classified based on criteria such as data models, number of users, number of sites and cost. We then discussed various types of databases: operational databases, analytical databases, distributed databases, personal end user databases, multimedia databases and special purpose databases. Finally, we briefly discussed various data models used for maintaining databases.


CHECK YOUR PROGRESS

1. A database is a collection of ____________ data.
2. A database management system is ______________ software.
3. Data are ________ facts.
4. DBMS expands to _________________.
5. All data in a computer must be represented using only two symbols, namely __ and __.
6. A field represents an ________ of some entity.
7. A record is a collection of ___________ that describe an entity.
8. A group of related records is a __________.
9. A __________ is an integrated collection of logically related records or objects.
10. A flat file consists of only _______ file.
11. The person who has central control over the database system is called the ___________.
12. Application programmers are persons who write ____________.
13. The external or user view is at the _________ level.
14. The ___________ view represents the entire database.
15. The internal view is closest to the _____________.
16. Metadata is stored in the ______________.
17. DDL expands to ___________.
18. DML expands to ______________.
19. Hierarchical, network, relational, object and object-relational are called _________.
20. Multi-user systems support multiple users ____________.
21. Supervisors in charge of day-to-day operations most frequently use ______________ databases.
22. Databases that include non-conventional data, such as pictures and voice tracks, along with conventional alphanumeric data are called _________ databases.
23. The three important data models are ________, ____________ and ____________.
24. The ___________ committee proposed the NETWORK MODEL.
25. __________ of IBM proposed an elegant and flexible RELATIONAL MODEL.

ANSWERS

1. Operational
2. Application
3. Raw
4. Database Management Systems
5. 0 and 1
6. Attribute
7. Attributes
8. Data file
9. Database
10. One
11. Database administrator
12. Application programs
13. Highest
14. Conceptual or global
15. Physical storage
16. Data dictionary
17. Data Definition Language
18. Data Manipulation Language
19. Data models
20. Concurrently
21. Operational
22. Multimedia
23. Hierarchical, network and relational
24. CODASYL
25. Mr. Codd


ANSWER THE FOLLOWING QUESTIONS


1. What is data? Give examples.
2. Explain briefly the tasks of DBMS packages.
3. Explain how data is classified.
4. Giving examples, explain how data is represented.
5. Explain different types of data relationships. Give examples.
6. How is data organized? Explain.
7. What is a record? Give an example.
8. What is a file? Explain. Give examples.
9. What is a flat file? Explain.
10. What are the disadvantages of the file-oriented approach? Explain.
11. What is a database? What are the components of a database system? Explain.
12. Explain briefly the different types of database users.
13. Explain clearly the advantages of database systems.
14. What are the disadvantages of a DBMS? Explain.
15. What is data independence? Explain.
16. Explain the architecture of a database system with a figure.
17. Explain the schematic of database management systems.
18. List the steps associated with accessing data from the database.
19. List the objectives of DBMS.
20. What are the components of DBMS? Explain.
21. Who is a DBA? List and explain the functions of the DBA.
22. Explain the classification of DBMS.
23. Explain briefly the major types of databases.
24. Explain multimedia and special purpose databases.
25. Explain briefly three important data models.
26. Write a note on DBMS languages, with a figure.


Chapter 2

Storage, Records and File Organisation

2.0 OBJECTIVES

In this chapter you will learn:

Storage of information
Record and record organization
Files and file organization
Sequential file organization
Index sequential file organization
Direct file organization

2.1 THE STORAGE OF INFORMATION


In a digital computer there are two types of memory units, namely operational units and storage units. Operational units are commonly called registers; a register is used for the temporary storage and manipulation of information. The storage-type memory unit is designed to store information of a more permanent nature.

2.1.1 Operational Unit


Some of the most important registers are contained in the central processing unit (CPU) of the computer.

Chapter 2 - Storage, Records and File Organisation


The CPU contains registers that hold the arguments (i.e., operands) of arithmetic computations. Besides storing operands and the results of arithmetic operations, registers are also used to temporarily store program instructions and control information about which instruction is to be executed next. Because of their highly specialized nature, registers have a great deal of combinational logic (i.e., circuitry) associated with them, which makes them expensive relative to storage-type memory units. Consequently, registers are used only to store information temporarily.

2.1.2 Storage Unit


The storage-type memory unit is designed to store information of a more permanent nature. For example, a particular storage unit, or set of storage units, is associated with a particular variable in a program (a named quantity whose value may vary during execution). However, before arithmetic computations involving a variable are performed, the value of the variable, as stored in the memory unit, must be transferred to a register. This transfer is necessary because memory units do not have the logic needed to execute arithmetic operations. If the result of a computation is to be assigned to a variable, the result must be transferred from an arithmetic register back to the memory unit associated with that variable. When a program is executed, its instructions and data generally reside in storage units. The entire set of storage units in the mainframe, or main part of the computer, is often called main memory. In some instances programs can also reside in storage units that do not belong to main memory; examples of such storage devices (often called secondary storage devices) are the magnetic disk and the magnetic drum. Data in the main (internal) memory of a computer can be accessed very quickly; a typical access time is less than 1 microsecond (10^-6 s). Main memory provides for the immediate storage requirements of the central processor during program execution. Its storage capacity is limited by two factors: the cost of memory, and the technical problems of developing large-capacity main memory. In virtually all computer systems, the storage requirements of programs and the data on which they operate exceed the capacity of main memory. It is therefore necessary to extend the storage capabilities of a computer by using devices external to main memory.
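The load, compute and store sequence described above can be modelled with a toy Python sketch in which a dictionary plays the role of main memory and local variables play the role of registers. This is a hypothetical illustration of the data movement only, not a model of any real instruction set.

```python
# Main memory (hypothetical): variable names mapped to their stored values.
memory = {"a": 7, "b": 5, "total": 0}

def add_variables(x: str, y: str, result: str) -> None:
    """Compute result = x + y by moving values through 'registers'."""
    register_1 = memory[x]                 # transfer x from memory to a register
    register_2 = memory[y]                 # transfer y from memory to a register
    register_1 = register_1 + register_2   # arithmetic happens only in registers
    memory[result] = register_1            # transfer the result back to memory

add_variables("a", "b", "total")
print(memory["total"])  # 12
```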

2.1.3 External Storage unit


An external storage device may be loosely defined as a device, other than main memory, on which information can be stored and from which it can be retrieved for processing at some later point in time. The storage and retrieval operations are referred to as writing and reading, respectively. External storage devices have a larger capacity and are less expensive per bit of information stored than main memory, but the time required to access the information is much greater. The primary uses of external storage devices include:

1. Backup of programs during execution.
2. Storage of programs and subprograms for future use.
3. Storage of information in files.

The most common external storage devices, in order of their initial development and use, are magnetic tape, magnetic drum and magnetic disk.

2.2 RECORD AND RECORD ORGANIZATION


A record is a collection of fields. In a file, records are organized in a logical sequence, and these records are mapped onto disk blocks. Files are provided as a basic construct in operating systems. Although blocks are of a fixed size determined by the physical properties of the disk and by the operating system, record sizes vary.

Fig: Records and fields. (The figure shows three records, Emp No 98643, 35679 and 34567, with one record expanded into its fields: Name: Raksha; Address: 23 B/29 Shivaji Nagar; City: Bangalore; Date of Birth: 23-July-1968; Blood Group: A; Doctor: Dr. Ram; Dept.: Cardiology.)


2.2.1 Definition and concepts


This section gives a comprehensive and consistent overview of the hierarchy of information structures associated with file processing. A record (sometimes called a group or segment) is a collection of information items about a particular entity; for example, a record may hold information about a passenger on an airplane flight, an article sold at a retail store, or a student. An item (sometimes called a field) of a record is a unit of meaningful information about an entity. The items of a passenger record may be the passenger's name, address, seat number, date and time. Generally an item is an integer, real or character-string data element. However, items may themselves be composed of aggregates of items, such as an array of items or a sub-collection of non-homogeneous items (items of mixed types, such as integer, real and character). The notion of a record, in its most general interpretation, can be loosely equated to a structure. For example, a possible structure for a passenger record is declared as follows:

Record Name : Passenger
   Name : Initials char (4), Surname char (20)
   Address char (30)
   Menu char (2)

A collection of records involving a set of entities with certain aspects in common, and organized for some particular purpose, is called a file. For example, the collection of all passengers on a particular flight constitutes a file. A record item that uniquely identifies a record in a file is called a key. In the passenger file, individual passenger records can be uniquely identified by the passenger's name, assuming duplicate names do not occur on a particular flight. The seat number item can also be used as a key, if desired, since seat numbers are uniquely assigned for a given flight. It is common practice to order the records in a file according to a key.
Therefore, if the passenger name is selected as the key item, the record for Adams appears before the record for Brown, which appears before the record for Camp, in alphabetical ordering by surname. Some files are ordered on a particular item, termed the sequence item, which may not be unique for each record. For example, in a file of monthly sales for a particular company, several sales records may appear for one customer; the file can be ordered by customer account number, with more than one occurrence of a customer sales record for a given account number. We have thus observed a hierarchy of information structures in which items are composed to form records, and records are composed to form a file. If a set of files is used by the application programs of some particular enterprise or application area, and these files exhibit certain associations or relationships between their records, then such a collection of files is often referred to as a database or data bank. The following figure shows the information structure hierarchy as it applies to a file processing application.

Database
   File   File   File
      Record   Record   Record
         Item   Item   Item

Figure: Information structure hierarchy for file processing.
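The passenger example (items grouped into records, records grouped into a file ordered on a key) can be sketched in Python. The passenger details below are hypothetical, and the `seat` item is included because the text mentions seat number as an alternative key; Python strings do not enforce the char(n) widths, which are noted only in comments.

```python
from dataclasses import dataclass

@dataclass
class Passenger:
    initials: str  # char (4)
    surname: str   # char (20)
    address: str   # char (30)
    menu: str      # char (2)
    seat: str      # seat number, unique for a given flight

# A file: the records of all passengers on one (hypothetical) flight.
flight_file = [
    Passenger("C", "Camp", "12 Hill Road", "VG", "14C"),
    Passenger("A", "Adams", "3 Lake View", "NV", "2A"),
    Passenger("B", "Brown", "9 MG Road", "VG", "7B"),
]

# Order the file on the key item (surname), as in the Adams/Brown/Camp example.
flight_file.sort(key=lambda p: p.surname)
print([p.surname for p in flight_file])  # ['Adams', 'Brown', 'Camp']

# Seat number is unique per flight, so it can also serve as a key for lookup.
by_seat = {p.seat: p for p in flight_file}
assert len(by_seat) == len(flight_file), "seat numbers must be unique"
print(by_seat["7B"].surname)  # Brown
```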

Let us examine some of the factors that affect the organization of a file. The prime factor that determines the organization of a file is the nature of the operations to be performed on it, as dictated by the applications. The operations normally performed are retrieval, addition, deletion and update. A particular operation involving a record or set of records is called a transaction. For example, "Delete Raja from the student list for the First Year" is a transaction, and so is "Add Joseph to the student list for the First Year".

2.2.2 Record Organization


In a relational database, records of distinct relations are generally of different sizes. One approach to mapping the database to files is to use several files and to store records of only one fixed length in any given file. An alternative is to structure files so that they can accommodate records of multiple lengths. Files of fixed-length records are easier to implement than files of variable-length records.

Fixed Length Record


If a record item has a fixed-length value and its domain is too large for an efficient encoding, a primitive data-structure format (i.e., integer, real or char) should be selected to represent the item. For example, it is unreasonable to bit-encode an item representing the net sales for the month. Instead, we can declare a record containing such an item in the programming language being used:

Record : Monthly_Report
   Month Char (10),
   Net_sales Fixed (7,2)

The net sales item can range in value from -99999.99 to 99999.99. It is unrealistic for the programmer to bit-encode such a wide range of values when the compiler provides an efficient binary encoding of the value in fixed decimal format. The Month item, on the other hand, can be significantly reduced in size if we use a fixed-length binary code, 0000B for January, 0001B for February, ..., 1011B for December, and declare the item to be of type BIT(4). Because both of these items may be considered fixed-length items, they can technically be called precoordinated; that is, a fixed-length item can take only a finite set of values, which can be enumerated a priori.
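A fixed-length version of the Monthly_Report record can be sketched with Python's standard `struct` module. The layout is an assumption for illustration: the month is stored as its code (0 for January up to 11, i.e. 1011B, for December) in one byte, because `struct` has no 4-bit type, and net sales are stored as a 4-byte signed integer counting hundredths, standing in for the FIXED(7,2) decimal format. Every record therefore occupies exactly 5 bytes.

```python
import struct
from decimal import Decimal

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# '<' = standard sizes, no padding; 'B' = 1-byte month code; 'i' = 4-byte signed
# integer holding net sales in hundredths (a fixed-point stand-in for FIXED(7,2)).
MONTHLY_REPORT = struct.Struct("<Bi")
MAX_HUNDREDTHS = 9_999_999  # 99999.99 expressed in hundredths

def pack_report(month: str, net_sales: str) -> bytes:
    """Encode one Monthly_Report record as a fixed-length byte string."""
    code = MONTHS.index(month)  # 0 (0000B) for January .. 11 (1011B) for December
    hundredths = int(Decimal(net_sales) * 100)  # assumes at most two decimal places
    if abs(hundredths) > MAX_HUNDREDTHS:
        raise ValueError("net sales outside the FIXED(7,2) range")
    return MONTHLY_REPORT.pack(code, hundredths)

def unpack_report(data: bytes) -> tuple[str, Decimal]:
    """Decode a record produced by pack_report."""
    code, hundredths = MONTHLY_REPORT.unpack(data)
    return MONTHS[code], Decimal(hundredths) / 100

record = pack_report("December", "-99999.99")
print(len(record), unpack_report(record))  # 5 ('December', Decimal('-99999.99'))
```

Using `Decimal` rather than floating point keeps the two decimal places exact, mirroring the fixed decimal format in the declaration.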

Variable Length Record


Many applications arise in which the value associated with a record item may be a list of entities. For example, the degrees held and the programming languages used at a computer installation are items that can assume multiple entities. In these instances, the item value may be B.Sc., M.Sc., Ph.D, or COBOL, C, Pascal, Fortran, respectively. The most popular method of handling such repeating fields is to create an item that can accommodate up to some maximum number of replications. If we set this maximum number to three, then the example items can accommodate such information as the three most recent degrees obtained and the three most often used programming languages.
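One way to picture the repeating-field approach is the following Python sketch, which assumes a maximum of three replications as in the text. The record type StaffRecord and the helper repeating_item are hypothetical names; unused slots are padded with None so that every record keeps the same number of slots.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_REPLICATIONS = 3  # maximum number of replications used in the text


def repeating_item(values: List[str], limit: int = MAX_REPLICATIONS) -> List[Optional[str]]:
    """Fit a list of values into exactly `limit` slots.

    Assumes the caller lists values in priority order (e.g. most recent degree first);
    values beyond `limit` are dropped and unused slots are padded with None.
    """
    kept = list(values[:limit])
    return kept + [None] * (limit - len(kept))


@dataclass
class StaffRecord:  # hypothetical record type for illustration
    name: str
    degrees: List[Optional[str]]    # always exactly MAX_REPLICATIONS slots
    languages: List[Optional[str]]  # always exactly MAX_REPLICATIONS slots


def make_staff_record(name: str, degrees: List[str], languages: List[str]) -> StaffRecord:
    return StaffRecord(name, repeating_item(degrees), repeating_item(languages))
```

Because each repeating item always occupies three slots, the record as a whole stays fixed in size; the price is wasted space when fewer than three values exist and lost information when there are more.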

2.3 FILES AND FILE ORGANIZATION


The technique used to represent and store the records on a file is called file organization. The three fundamental file organization techniques are sequential, index-sequential and direct. The presentation of each of these organizations begins with a description of its file structure. File organization techniques differ in two basic ways. First, the organization determines the file's record sequencing, which is the physical ordering of the records in storage. Second, the file organization determines the set of operations necessary to find a particular record. Individual records are typically identified by having particular values in search-key fields. Such a field may or may not have duplicate values in the file, and it can be a group or an elementary item. Some file organization techniques provide rapid access on a variety of search keys; other techniques support direct access only on the value of a single key. The most appropriate organization for a particular file is determined by the operational characteristics of the storage medium used and the nature of the operations to be performed on the data. Once the appropriate file organization technique has been determined, the most important characteristic of a storage device that influences its selection is whether the device allows direct access to particular record occurrences without accessing all physically prior record occurrences stored on the device, or allows only sequential access to record occurrences. Magnetic disks are examples of direct access storage devices; magnetic tapes are examples of sequential storage devices.

2.3.1 Structure of Sequential Files


In a sequential file, records are stored one after the other on the storage device. Because sequential allocation is conceptually simple, yet flexible enough to cope with many of the problems associated with handling large volumes of data, the sequential file has been the most popular basic file structure used in the data processing industry. All types of external storage devices support a sequential file organization, and some devices, by their physical nature, can only support sequential files. Information is stored on magnetic tape as a continuous series of records along the length of the tape, so accessing a particular record requires accessing all previous records in the file. Other devices that are strictly sequential in nature are tape cassettes and line printers. The operations that can be performed on a sequential file may differ slightly depending on the storage device used. For example, a file on magnetic tape can be either an input file or an output file, but not both at one time. A sequential file on a disk can be used strictly for input, strictly for output, or for update. Update means that, as records are read, the record most recently read can be rewritten on the same file.
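The update mode described above can be sketched in Python as follows. This is an illustration under stated assumptions: an in-memory io.BytesIO object stands in for a disk file, and each hypothetical employee record uses a fixed-length 14-byte layout (10-byte name, 4-byte wage), so that rewriting a record in place never disturbs its neighbours.

```python
import io
import struct
from typing import Callable, List, Tuple

# Hypothetical fixed-length employee record: 10-byte name (null padded), 4-byte signed wage.
RECORD = struct.Struct("<10si")


def create_file(employees: List[Tuple[str, int]]) -> io.BytesIO:
    """Write records one after the other, as on a sequential file."""
    f = io.BytesIO()  # stands in for a disk file opened for update
    for name, wage in employees:
        f.write(RECORD.pack(name.encode("ascii"), wage))
    return f


def read_all(f: io.BytesIO) -> List[Tuple[str, int]]:
    """Serially read every record from the beginning of the file."""
    f.seek(0)
    records = []
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        name, wage = RECORD.unpack(chunk)
        records.append((name.rstrip(b"\0").decode("ascii"), wage))
    return records


def update_in_place(f: io.BytesIO, change: Callable[[int], int]) -> None:
    """Update mode: rewrite the record most recently read at the same position."""
    f.seek(0)
    while True:
        pos = f.tell()
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        name, wage = RECORD.unpack(chunk)
        f.seek(pos)                              # step back to the record just read
        f.write(RECORD.pack(name, change(wage)))  # leaves the position at the next record
```

For example, `update_in_place(f, lambda w: w + 1000)` adds 1000 to every wage in a single serial pass, without creating a new file.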

2.3.2 Processing of Sequential File


Having discussed the physical layout of a sequential file and how records are transferred between the program area and the file, let us examine the types of processing for which sequential files are most suitable. Serial processing is the accessing of records, one after the other, according to the physical order in which they appear in the file. Obviously, it is an easy matter to process sequential files serially. Sequential processing is the accessing of records, one after the other, in ascending order by a key or index item of the record. For example, suppose a MASTER file of employee records is ordered by employee surname: AGARKER first, BAKER second, ..., ZIDANE last. Then sequentially processing the file by surname is equivalent to serially processing the file. Most sequential files are ordered by a key or index item, such as employee name or student identification number, when the file is created. The key or index item should be the item that is most often searched for when processing the file.

To show the importance of key selection, assume instead that the MASTER file of employees is ordered by employee identification number. Suppose we want to find the records of a number of employees given only their names. Finding the first employee's record, say AGARKER's, is simply a matter of serially processing the file until the record with the name item AGARKER appears. Now consider the processing of a second record, say BAKER's. Since the position of BAKER's record bears no relationship to the position of AGARKER's record, we have no alternative but to start serially processing once again at the beginning of the MASTER file.

There are occasions on which serial processing is all that is required on a file, irrespective of the key or index item on which the file is ordered. For example, if we are to add a pay increase of 1000 Rupees to the wage item of all employees, it is irrelevant whether the file is sequenced by name or by employee identification number.

In sequential processing, transaction records are usually grouped together and sorted on the same index item as the records in the file. Each successive record of the file is read, compared with an incoming transaction record and then processed in a manner that usually depends on whether the value of the record's index item is less than, equal to, or greater than the value of the index item of the transaction record. Sequential and serial processing are most effective when a high percentage of the records in the file must be processed. Since every record in the file must be scanned, a relatively large number of transactions should be grouped together for processing. If records are to be added to a file, it is necessary to create a new file unless the records are to be added at the end of the file.
Important points about the sequential processing of sequential files:
1. Sequential processing is most advantageous if a large number of transactions can be grouped to form a single run on the file.
2. A new file should be created if any additions or deletions are requested.
3. Quick response time should not be expected for a transaction or a batch of transactions.
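The less than / equal to / greater than comparison described above is the core of the classic sequential master-file update. The Python sketch below assumes hypothetical transaction codes ('A' add, 'U' update, 'D' delete), at most one transaction per key, and Python lists standing in for the old master file, the sorted transaction batch and the new master file. Because additions and deletions are involved, it writes a new file, in line with point 2.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]                       # (key, data); files are sorted by key
Transaction = Tuple[str, int, Optional[str]]   # (code, key, data)


def sequential_update(master: List[Record],
                      transactions: List[Transaction]) -> Tuple[List[Record], List[str]]:
    """Merge a sorted master file with a batch of transactions into a new master file.

    Codes (hypothetical): 'A' add, 'U' update, 'D' delete; at most one transaction per key.
    """
    new_master: List[Record] = []
    errors: List[str] = []
    i = 0
    for code, key, data in sorted(transactions, key=lambda t: t[1]):
        if code not in ("A", "U", "D"):
            raise ValueError(f"unknown transaction code {code!r}")
        # Master key < transaction key: no activity, copy the record to the new file.
        while i < len(master) and master[i][0] < key:
            new_master.append(master[i])
            i += 1
        matched = i < len(master) and master[i][0] == key   # master key == transaction key?
        if code == "A":
            if matched:
                errors.append(f"A {key}: duplicate key")
            else:                               # master key > transaction key: insert here
                new_master.append((key, data))
        elif matched and code == "U":
            new_master.append((key, data))      # write the updated record
            i += 1
        elif matched and code == "D":
            i += 1                              # deleted record is simply not copied
        else:
            errors.append(f"{code} {key}: no such record")
    new_master.extend(master[i:])               # copy the rest of the old master file
    return new_master, errors
```

Each master record is read exactly once, which is why the approach pays off only when many transactions are batched into a single run.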

2.3.3 Advantages of Sequential Organization


The advantages of sequential organization are:

Simplicity: The records are organized sequentially, so whenever a record is to be inserted we simply create it and insert it at the end of the file. All storage devices support sequential organization.

Less overhead: It is not necessary to maintain any additional information to access the records, since records are accessed in sequential fashion.


2.3.4 Problems with Sequential Organization


Sequential organization has a number of problems:

Difficulties associated with searching: If we want the 100th record in a pool of 500 records, we need to search sequentially, beginning with the 1st record, until we reach the required record. In technical terms, searching for information in a sequential file can be a very slow process. For any search operation, we need to start reading the sequential file from the beginning and continue until the end, or until the desired record is found, whichever is earlier. This is both time-consuming and cumbersome.

Lack of support for queries: Sequential files are not suitable for answering queries if the file contains a large number of records. In technical terms, even to find out whether something is available in the file or not, the entire file may have to be read (always, when the item is absent).

Problems associated with record deletion: It is difficult to delete records from a sequential file. For this purpose we need to perform file reorganization, in which the records are copied from file 1 to file 2, leaving out the records to be deleted. This process is time-consuming.
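A small Python sketch of the two costly operations just described, assuming a file modelled as a list of (key, data) records: a sequential search that counts how many records it has to read, and deletion by reorganization, which copies file 1 to a new file 2 while leaving out the deleted records. The function names are illustrative.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]


def sequential_search(records: List[Record], key: int) -> Tuple[Optional[Record], int]:
    """Read from the first record until the key is found or the file ends.

    Returns (record or None, number of records read).
    """
    reads = 0
    for record in records:
        reads += 1
        if record[0] == key:
            return record, reads
    return None, reads          # absent: every record had to be read


def delete_by_reorganization(file1: List[Record], keys: List[int]) -> List[Record]:
    """Copy file 1 to a new file 2, leaving out the records to be deleted."""
    to_delete = set(keys)
    return [record for record in file1 if record[0] not in to_delete]
```

Finding the 100th of 500 records costs 100 reads, and an unsuccessful search costs all 500; deleting even one record means copying the other 499.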

2.4 INDEX SEQUENTIAL FILE


The retrieval of a record from a sequential file is inefficient and time-consuming for large files. To improve the query response time of a sequential file, indexing techniques can be added. The most important aspect affecting the file structure is the type of physical medium on which the file resides. The capability of directly accessing a record based on a key can only be achieved if the external storage device used supports this type of access. In particular, devices such as magnetic tape and cassette tape units allow access to a particular record only after reading all the other records that physically appear before the desired record in the file. Hence direct access is impossible for these types of devices. Magnetic disk units are the type of external storage device that supports both sequential and direct access. The file structure concepts relating to indexed sequential files are best exemplified by considering a magnetic disk as the storage medium. In fact, because of their low price/performance ratio and large total storage capacity, disks are generally chosen when using indexed sequential files.

Indexing associates a set of orderable quantities (keys), which are usually fewer in number than the records, with the records so that searches are faster. The idea of indexing is to expedite the search process. Indexes created from a sequential (or sorted) set of primary keys are referred to as index sequential. We shall use the term index file to describe the indexes, data file to refer to the data records, and pointer for an address. A sequential file that is indexed is called an index sequential file. The index provides random access to records, while the sequential nature of the file provides easy access to subsequent records as well as sequential processing.

An index sequential file consists of three separate areas: the prime area, the index area and the overflow area. The overflow area is an additional feature of this file system: it provides additional space for record additions without necessitating the creation of a new file. The prime area is the area into which data records are written when the file is first created. The file is created sequentially, that is, by writing records into the prime area in a sequence dictated by the alphabetical ordering of the keys of the records. Writing begins on the second track of the first cylinder of the disk. When this cylinder is filled, writing continues on the second track of the next cylinder, and continues in this fashion until the creation of the file is completed. If the newly created file is accessed sequentially according to the key item, the records are processed in the order they were written.

2.4.1 Type of Indexes


The index access structure is similar to the index commonly used in textbooks. A textbook index lists important terms at the end of the book in alphabetical order. Along with each term, a list of page numbers where the term appears is given. We can search the index to find a list of addresses (page numbers, in this case) and use these addresses to locate the term in the textbook by searching the specified pages.

2.4.1.1 Primary Indexes


A primary index is an ordered file whose records are of fixed length with two fields. The first field is of the same data type as the ordering key field of the data file, and the second field is a pointer to a disk block (a block address). The ordering key field is called the primary key of the data file. There is one index entry in the index file for each block in the data file. Each index entry has the value of the primary key field for the first record in a block and a pointer to that block as its two field values. We will refer to the two field values of index entry i as K(i) and P(i).

Fields of each record: NAME, SSN, JOB, SALARY, SEX
Block 1: Aaron, Ed; Abbott, Diane; ...; Acosta, Marc
Block 2: Adams, John; Adams, Robin; ...; Akers, Jan
Block 3: Wright, Pam; Wyatt, Charles; ...; Zimmer, Byron
Figure 2.4: Some blocks on an ordered (sequential) file of Employee records with name as the ordering field

To create a primary index on the ordered file shown in figure 2.4, we use the Name field as the primary key, because that is the ordering key field of the file. Each entry in the index will have a Name value and a pointer. Figure 2.5 illustrates this primary index. The total number of entries in the index will be the same as the number of disk blocks in the ordered data file. The first record in each block of the data file is called the anchor record of the block, or simply the block anchor. (A scheme similar to the one described here can be used with the last record in each block, rather than the first, as the block anchor.) A primary index is an example of what is called a non-dense index, because it includes an entry for each disk block of the data file rather than for every record in the data file. A dense index, on the other hand, contains an entry for every record in the file. The index file for a primary index needs substantially fewer blocks than the data file, for two reasons. First, there are fewer index entries than there are records in the data file, because an entry exists for each whole block of the data file rather than for each record. Second, each index entry is typically smaller in size than a data record because it has only two fields, so more index entries than data records will fit in one block. A binary search on the index file will hence require fewer block accesses than a binary search on the data file.

Figure 2.5: Primary index on the ordering key field NAME, with one index entry per block of the DATA file
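The lookup procedure for a primary (non-dense) index can be sketched in Python under these assumptions: Python lists stand in for disk blocks, block numbers stand in for disk block addresses, and only the NAME values listed for each block in figure 2.4 are included. The index holds one (K(i), P(i)) entry per block; a binary search (via the standard bisect module) finds the last anchor not greater than the search key, and only that one data block is then searched.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# NAME values of the blocks shown in figure 2.4 (the elided records are left out).
BLOCKS: List[List[str]] = [
    ["Aaron, Ed", "Abbott, Diane", "Acosta, Marc"],
    ["Adams, John", "Adams, Robin", "Akers, Jan"],
    ["Wright, Pam", "Wyatt, Charles", "Zimmer, Byron"],
]


def build_primary_index(blocks: List[List[str]]) -> List[Tuple[str, int]]:
    """One entry per block: K(i) = key of the block anchor (first record), P(i) = block number."""
    return [(block[0], p) for p, block in enumerate(blocks)]


def lookup(index: List[Tuple[str, int]], blocks: List[List[str]],
           key: str) -> Optional[Tuple[int, int]]:
    """Binary-search the index, then read one block and search it in memory.

    Returns (block number, position within the block), or None if the key is absent.
    """
    anchors = [k for k, _ in index]
    i = bisect_right(anchors, key) - 1   # last entry with K(i) <= key
    if i < 0:
        return None                      # key sorts before the first block anchor
    p = index[i][1]
    block = blocks[p]                    # the single data-block access
    return (p, block.index(key)) if key in block else None
```

The index has three entries for three blocks, whatever the number of records per block, which is exactly what makes it non-dense.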

A major problem with a primary index, as with any ordered file, is the insertion and deletion of records. With a primary index the problem is compounded, because if we attempt to insert a record in its correct position in the data file, we not only have to move records but also change some index entries, since moving records will change the anchor records of some blocks. One way to ease this problem is to use an unordered overflow file. Another possibility is to use a linked list of overflow records for each block in the data file. We can keep the records within each block and its overflow linked list sorted to improve retrieval time. Record deletion can be handled using deletion markers.

Figure 2.6: A dense secondary index on a non-ordering key field of a file (index entries for key values 1 to 24 in ascending order; the data file stores the records in the order 9, 5, 13, 8, 6, 15, 3, 17, 21, 11, 16, 2, 24, 10, 20, 1, 4, 23, 18, 14, 12, 7, 19, 22)

2.4.1.2 Secondary Indexes


A secondary index is also an ordered file with two fields and, as in primary indexes, the second field is a pointer to a disk block. The first field is of the same data type as some non-ordering field of the data file. The field on which the secondary index is constructed is called an indexing field of the file, whether or not its values are distinct for every record. There can be many secondary indexes, and hence indexing fields, for the same file. We first consider a secondary index on a key field, that is, a field having a distinct value for every record in the data file. Such a field is sometimes called a secondary key for the file. In this case there is one index entry for each record in the file, which holds the value of the secondary key for the record and a pointer to the block in which the record is stored. A secondary index on a key field is a dense index because it contains one entry for each record in the data file.


We again refer to the two field values of index entry i as K(i) and P(i). The entries are ordered by the value of K(i), so we can use binary search on the index. Because the records of the data file are not physically ordered by the value of the secondary key field, we cannot use block anchors. That is why an index entry is created for each record in the data file rather than for each block, as in the case of a primary index. Figure 2.6 illustrates a secondary index on a key attribute of a data file. Notice that in figure 2.6 the pointers P(i) in the index entries are block pointers, not record pointers. Once the appropriate block is transferred to main memory, a search for the desired record within the block can be carried out. A secondary index usually needs substantially more storage space than a primary index because of its larger number of entries. However, the improvement in search time for an arbitrary record is much greater for a secondary index than for a primary index, because we would have to do a linear search on the data file if the secondary index did not exist. For a primary index, we could still use binary search on the main file even if the index did not exist, because the records are physically ordered by the primary key field.
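A dense secondary index can be sketched in Python in the same style. This example uses the record order shown in figure 2.6 as the secondary-key values of the data file and assumes three records per block (the figure does not state a blocking factor). The entries are sorted by K(i), and each P(i) is a block number rather than a record position.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

RECORDS_PER_BLOCK = 3  # assumption: the figure does not give a blocking factor

# Secondary-key values of the data file, in the physical order shown in figure 2.6.
FIGURE_2_6 = [9, 5, 13, 8, 6, 15, 3, 17, 21, 11, 16, 2,
              24, 10, 20, 1, 4, 23, 18, 14, 12, 7, 19, 22]


def make_blocks(values: List[int], per_block: int = RECORDS_PER_BLOCK) -> List[List[int]]:
    """Split the unordered data file into fixed-size blocks."""
    return [values[i:i + per_block] for i in range(0, len(values), per_block)]


def build_secondary_index(blocks: List[List[int]]) -> List[Tuple[int, int]]:
    """Dense index: one (K(i), P(i)) entry per record, sorted by K(i); P(i) is a block number."""
    return sorted((value, p) for p, block in enumerate(blocks) for value in block)


def lookup(index: List[Tuple[int, int]], blocks: List[List[int]], key: int) -> Optional[int]:
    """Binary-search the index; return the number of the block holding the key, or None."""
    keys = [k for k, _ in index]
    pos = bisect_left(keys, key)
    if pos == len(keys) or keys[pos] != key:
        return None
    p = index[pos][1]
    block = blocks[p]                    # transfer this block to main memory ...
    return p if key in block else None   # ... and search for the record within it
```

Here the index has 24 entries for 8 blocks, compared with one entry per block for a primary index, which illustrates the extra storage a secondary index needs.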

2.4.2 Structure of Index Sequential File


An index sequential file consists of the data and one or more levels of indexes. When inserting a record, we have to maintain the sequence of records, and this may necessitate shifting subsequent records. For a large file this is a costly and inefficient process. Instead, the records that overflow their logical area are shifted into a designated overflow area, and a pointer is provided in the logical area or in the associated index entry to point to the overflow location. This is illustrated in figure 2.7, where record 615 is inserted in the original logical block, causing a record to be moved to an overflow block.

Original logical block: 611 612 614 618 624
After inserting 615: logical block 611 612 614 615 618; overflow block 624
Figure 2.7: Overflow of record

Multiple records belonging to the same logical area may be chained to maintain logical sequencing. When records are forced into the overflow area as a result of insertion, the insertion process is simplified, but the search time is increased. Deletion of records from index-sequential files creates logical gaps: the records are not physically removed but only flagged as having been deleted. If there are a number of deletions, we may have a great amount of unused space.
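The overflow behaviour of figure 2.7, together with deletion flags, can be sketched in Python as follows. It assumes a logical block with a capacity of five records, a single overflow chain kept in key order, and a hypothetical LogicalBlock class; a real index-sequential file would hold these areas on disk and link them with pointers.

```python
from bisect import insort
from typing import List, Optional, Set


class LogicalBlock:
    """One logical area of an index-sequential file with a chained overflow area (sketch)."""

    def __init__(self, keys: List[int], capacity: int):
        if len(keys) > capacity:
            raise ValueError("initial load exceeds the block capacity")
        self.capacity = capacity
        self.keys = sorted(keys)         # records in the prime (logical) block
        self.overflow: List[int] = []    # overflow chain, kept in key order
        self.deleted: Set[int] = set()   # deletion flags: records stay where they are

    def insert(self, key: int) -> None:
        insort(self.keys, key)
        if len(self.keys) > self.capacity:
            # The block is full: the highest key is pushed into the overflow chain.
            insort(self.overflow, self.keys.pop())

    def delete(self, key: int) -> None:
        if key not in self.keys and key not in self.overflow:
            raise KeyError(key)
        self.deleted.add(key)            # flag only; no physical removal

    def search(self, key: int) -> Optional[str]:
        """Return 'prime' or 'overflow' for a live record, else None."""
        if key in self.deleted:
            return None
        if key in self.keys:
            return "prime"
        if key in self.overflow:
            return "overflow"            # found only after following the chain
        return None

    def live_records(self) -> List[int]:
        """Records in logical sequence: the prime block first, then its overflow chain."""
        return [k for k in self.keys + self.overflow if k not in self.deleted]
```

Starting from the block 611, 612, 614, 618, 624 and inserting 615 leaves 611, 612, 614, 615, 618 in the block and moves 624 to the overflow chain, as in figure 2.7. A deleted record such as 612 still occupies its slot, which is the unused space mentioned above.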


An index-sequential file has the following components:

1. A primary data storage area. In certain systems this area may have unused spaces embedded within it to permit the addition of records. It may also include records that have been marked as deleted.
2. Overflow areas. These permit the addition of records to the file. A number of schemes exist for incorporating the records in these areas into the expected logical sequence.
3. A hierarchy of indices. In a random inquiry or update, the physical location of the desired record is obtained by accessing these indices.

The primary data area contains the records written by the users' programs. The records are written in data blocks in ascending key sequence, and these data blocks are in turn stored in ascending sequence in the primary data area.

2.5 DIRECT FILE ORGANIZATION


In the index-sequential file organization considered in the previous sections, the mapping from a search-key value to a storage location is via index entries. In direct file organization, the key value is mapped directly to a storage location. The usual method of direct mapping is to perform some arithmetic manipulation of the key value; this process is called hashing. Consider a hashing function h that maps a key value k to the value h(k). The value h(k) is used as an address, and for our application we require that this value lie in some range. If our address area for the records lies between s1 and s2, the requirement for the hash function h(k) is that, for all values of k, it should generate values between s1 and s2. Clearly, a hash function that maps many different key values to a single address, or one that does not map the key values uniformly, is a bad hash function. A collision is said to occur when two distinct key values are mapped to the same storage location. Collisions are handled in a number of ways: the colliding records may be assigned to the next available space, or they may be assigned to an overflow area. We can immediately see that with hashing schemes there are no indexes to traverse. With well-designed hashing functions, where collisions are few, this is a great advantage.

Another problem that we have to solve is to decide what address is represented by h(k). Let the address generated by the hash function be the address of a bucket in which the (key, address) pairs of records are stored. The figure on bucket and block organization for hashing (section 2.5.1) shows buckets containing (key, address) pairs, which allow the actual data file to be reorganized and actual record addresses to change without affecting the hash function. A limited number of collisions can be handled automatically by using buckets of sufficient capacity. The space required for the buckets will, in general, be much smaller than that of the actual data file; consequently, their reorganization will not be that expensive. Once the bucket address is generated from the key by the hash function, a search in the bucket is also required to locate the address of the required record. However, since the bucket size is small, this overhead is small. The use of buckets reduces the problems associated with collisions. In spite of this, a bucket may become full, and the resulting overflow can be handled by providing overflow buckets and using a pointer from the normal bucket to an entry in the overflow bucket. All such overflow entries are linked. Multiple overflow entries from the same bucket result in a long list and slow down the retrieval of these records. In an alternative scheme, the address generated by the hash function is a bucket address, and the bucket is used to store the records directly instead of storing a pointer to the block containing the record.

Let s represent the number of buckets:

    s = (upper bucket address value) - (lower bucket address value) + 1

A simple hashing function is h(k) = k mod s, where k is the numeric representation of the key and h(k) produces a (relative) bucket address in the range 0 to s - 1. Some simple hashing functions are given below.

1) Use the low-order part of the key. For keys that are consecutive integers with few gaps, this method can be used to map the keys to the available range.
2) Square all or part of the key and take a part of the result. The whole key, or some defined part of it, is squared, and a number of digits are selected from the square as the hash result. A variation is the multiplication scheme, where one part of the key is multiplied by the remaining part and a number of digits are selected from the result.
3) End folding. For a long key, we identify start, middle and end regions such that the sum of the lengths of the start and end regions equals the length of the middle region. The start and end region digits are concatenated, and the resulting number is added to the middle region digits. This new number, mod s, where s is the upper limit of the hash function, gives the bucket address. For example, for the key 123456 123456789012 654321 (start, middle and end regions separated by spaces), end folding gives the two values to be added as 123456654321 and 123456789012, whose sum is 246913443333.
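The simple hashing functions above can be written as short Python functions. This is a sketch: the division method uses h(k) = k mod s as in the text, while the choice of which middle digits the mid-square method keeps, and the requirement that the end-folding key length be a multiple of four (so that the start and end regions together match the middle region), are assumptions made for illustration.

```python
def division_hash(k: int, s: int) -> int:
    """h(k) = k mod s: a relative bucket address in the range 0..s-1."""
    return k % s


def low_order_hash(k: int, digits: int) -> int:
    """Use the low-order part of the key: its last `digits` decimal digits."""
    return k % 10 ** digits


def mid_square_hash(k: int, digits: int) -> int:
    """Square the key and keep `digits` digits from the middle of the square (assumed choice)."""
    square = str(k * k)
    start = max((len(square) - digits) // 2, 0)
    return int(square[start:start + digits])


def end_fold(key: str) -> int:
    """End folding for a decimal key whose length is a multiple of 4 (assumption):
    start and end regions are each a quarter of the key, the middle region is the rest."""
    n = len(key)
    if n == 0 or n % 4:
        raise ValueError("sketch assumes a non-empty key whose length is a multiple of 4")
    q = n // 4
    start, middle, end = key[:q], key[q:n - q], key[n - q:]
    return int(start + end) + int(middle)   # concatenate start and end, add the middle


def end_folding_hash(key: str, s: int) -> int:
    """The folded number mod s gives the bucket address."""
    return end_fold(key) % s
```

With these definitions, the key from the end-folding example gives 123456654321 + 123456789012 = 246913443333, and with s = 1000 the bucket address would be 333.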

2.5.1 Blocks of records


Figure: Bucket and block organization for hashing (Bucket 1, Bucket 2, ..., Bucket n, each holding key/address pairs, with overflow buckets chained from full buckets)
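A bucket organization with overflow buckets can be sketched in Python as follows. Assumptions: buckets hold (key, address) pairs, the hash function is h(k) = k mod s, each bucket (primary or overflow) has a fixed hypothetical capacity, and a Python list of overflow buckets stands in for the pointer chain. The lookup also reports how many buckets it examined, to show how long overflow chains slow retrieval.

```python
from typing import List, Optional, Tuple

Pair = Tuple[int, int]  # (key, address of the record in the data file)


class BucketFile:
    """Hash file of s buckets with chained overflow buckets (sketch)."""

    def __init__(self, s: int, capacity: int):
        self.s = s                      # number of buckets
        self.capacity = capacity        # pairs per bucket (primary or overflow)
        self.buckets: List[List[Pair]] = [[] for _ in range(s)]
        # overflow[b] is the chain of overflow buckets linked from bucket b.
        self.overflow: List[List[List[Pair]]] = [[] for _ in range(s)]

    def h(self, key: int) -> int:
        return key % self.s             # relative bucket address 0..s-1

    def insert(self, key: int, address: int) -> None:
        b = self.h(key)
        if len(self.buckets[b]) < self.capacity:
            self.buckets[b].append((key, address))
            return
        chain = self.overflow[b]
        if not chain or len(chain[-1]) == self.capacity:
            chain.append([])            # link a new overflow bucket to the chain
        chain[-1].append((key, address))

    def lookup(self, key: int) -> Tuple[Optional[int], int]:
        """Return (address or None, number of buckets examined)."""
        b = self.h(key)
        examined = 0
        for bucket in [self.buckets[b]] + self.overflow[b]:
            examined += 1
            for k, address in bucket:
                if k == key:
                    return address, examined
        return None, examined
```

For example, with s = 5 and two pairs per bucket, the keys 976, 176 and 331 all hash to bucket 1, so 331 lands in an overflow bucket and is found only on the second bucket examined. Only the small (key, address) buckets need reorganizing if the data file moves, not the hash function.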


2.5.2 Advantages of Hashing


1) Key matches are extremely quick.
2) Hashing is very good for large keys, or keys with multiple columns, provided the complete key value is supplied in the query.
3) No disk space is used for an index structure, since there is no index to traverse.

2.5.3 Disadvantages of Hashing


1) It is difficult to predict overflow, because the working of the hashing algorithm is not visible to the database administrator.
2) No sorting of data occurs, either physically or logically, so sequential access is poor.
3) This organization usually takes a lot of disk space to ensure that no overflow occurs.

2.6 SUMMARY
All businesses need to process data. Processing the data is necessary to obtain useful information. As data volume increases, data processing becomes highly complex, and computers are used in this process. One important aspect of computerized data processing is the storage and retrieval of data. Databases provide this functionality, and DBMS packages are software tools to implement databases. Data as an entity has several important properties, such as form, size, organization and relationships. The forms of data, namely numeric, alphabetic, integer and real, represent the different types of data stored in databases. The size of the data plays a central role in determining the volume of the database and the techniques needed to store it. Organizing and grouping the data into characters, fields, records and files defines the basic building blocks of the database. Databases are classified into different types based on their usage. Different data models have resulted in different kinds of databases that provide the basic service of storage and retrieval of data.

In this chapter, we discussed the storage of information in registers, main memory and secondary memory. A register is used for the temporary storage and manipulation of information. Registers are also used to temporarily store program instructions and control information concerning which instruction is to be executed next. The storage type memory unit is designed to store information that is more permanent in nature. An external storage device may be defined as a device other than main memory on which information or data can be stored and from which the information can be retrieved for processing at some subsequent point in time. The storage and retrieval operations are referred to as writing and reading, respectively. External storage devices have a larger capacity and are less expensive per bit of information stored than main memory. The time required to access the information, however, is much greater with these devices. The primary uses of external storage devices include:
1. Backup of programs during execution.
2. Storage of programs and subprograms for future use.
3. Storage of information in files.

The most common external storage devices, in order of their initial development and use, are magnetic tape, magnetic drum and magnetic disk. A file is a collection of logically related information, and each file has an associated file name. A file contains many records, and one record consists of one or more fields. A field that identifies a record is called the record key. A primary key identifies a record uniquely; a secondary key may or may not identify a record uniquely. Records can be stored in two forms, fixed length and variable length. There are three fundamental file organization techniques: sequential, index-sequential and direct file organization. The selection of the appropriate organization for a file in an information system is important to the performance of that system. The fundamental factors that influence the selection process include the following:
1. Nature of the operations to be performed.
2. Characteristics of the storage media to be used.
3. Volume and frequency of the transactions to be processed.
4. Response time requirements.

Check your progress


1. A ________ is used for the temporary storage and manipulation of information.
2. The _______ type memory unit is used to store information, which is more permanent in nature.
3. The storage and retrieval operations are referred to as ______ and _________.
4. A record is a collection of ________.
5. It is common practice to order the records in a file according to a ________.
6. A particular operation involving a record or set of records is called a ___________.
7. The technique used to represent and store the records on a file is called ____________.
8. _________ are examples of direct access storage devices.
9. _________ are examples of sequential storage devices.
10. In a _______ file, records are stored one after the other on the storage device.
11. A sequential file that is indexed is called an _________________.
12. The index provides _________ access to records.
13. The usual method of direct mapping is by performing some ________ manipulation of the key value.
14. A _________ is said to occur when two distinct key values are mapped to the same storage location.
15. Hashing is very good for _______ keys.

Answers
1. Register
2. Storage
3. Writing and reading
4. Fields
5. Key
6. Transaction
7. File organization
8. Magnetic disks
9. Magnetic tapes
10. Sequential
11. Index sequential file
12. Random
13. Arithmetic
14. Collision
15. Large

Answer the following questions


1. What are the two different types of memory units? Explain.
2. List the primary uses of external storage devices.
3. What is a record? Give its structure with an example.
4. With a diagram, explain the information structure hierarchy for file processing.
5. What are the types of record organization? Explain.
6. What do you mean by file organization? Explain.
7. Explain clearly sequential file organization.
8. Explain briefly the advantages and problems associated with sequential organization.
9. What is an index sequential file? Explain.
10. Explain different types of indexes.
11. What are secondary indexes? Explain.
12. Explain the structure of an index sequential file.
13. Explain the concept of direct file organization.
14. Explain some hashing techniques.
15. Explain the advantages and disadvantages of hashing.


Chapter 3

Entity-Relationship Model

3.0 OBJECTIVES

In this chapter you will learn:

Entities and attributes
Attribute types
Keys
Relationship types and sets
Weak entity types
Nonbinary relationships
Entity-Relationship diagrams
Reducing ER diagrams into tables

3.1 INTRODUCTION
The database design consists of three components: conceptual design on the basis of user requirements, data modeling (use of E-R diagrams and normalization), and physical design and implementation. The Entity-Relationship (E-R) model is used as an information model to develop the conceptual structure. The E-R data model views the real world as consisting of a set of basic objects (entities), attributes of these entities, and relationships among these objects. The ER model describes data as entities, relationships and attributes, and an E-R diagram uses graphical notation to represent them. The design is documented as an entity-relationship diagram.

3.2 THE DATABASE SYSTEM DEVELOPMENT PROCESS


There are five phases of the life cycle:

1. Requirement analysis: to obtain a clear and concise description of the application area to be modelled and to derive information about the nature and volume of data to be stored and processed.
2. Data modelling: to develop a global design for the database with the ultimate objective of achieving an efficient implementation that satisfies the requirements.
3. Implementation: to transfer the design into a database system that operates under the control of a particular DBMS.
4. Testing: to discover any errors that have arisen during the modelling and implementation phases and to ascertain, in conjunction with the user community, whether the system satisfies the information demands of users and the requirements of application programs.
5. Maintenance: to correct errors discovered during testing, to modify the system due to changes in users' requirements, and to improve system performance and user interfaces.

The E-R modeling is associated with the second phase here.

3.3 ENTITIES AND ATTRIBUTES


The basic object that the ER model represents is an entity, which is a thing in the real world with an independent existence. An entity may have a physical existence, such as a particular person, car, book, house or employee. It may instead have a conceptual existence, such as a company, a job, a university course or an account. Each entity has attributes: the particular properties that describe it. For example, an employee entity may be described by the employee's name, age, address, qualification, salary and job. A particular entity has a value for each of its attributes. The attribute values that describe each entity become a major part of the data stored in the database. Several types of attributes occur in the ER model:

simple versus composite, single-valued versus multivalued, and stored versus derived.


We first define these attribute types and illustrate their use via examples. We then introduce the concept of a null value for an attribute.

Composite versus Simple (Atomic) Attributes


Composite attributes can be divided into smaller subparts, which represent more basic attributes, each with an independent meaning. For example, the Address attribute of the employee entity can be subdivided into StreetAddress, City, State and Pincode, with the values 24/17 SLane, Mysore, Karnataka and 570 008. Attributes that cannot be divided further are called simple or atomic attributes. Composite attributes can form a hierarchy. For example, StreetAddress can be further subdivided into three simple attributes: Number, Street and ApartmentNumber. The value of a composite attribute is the concatenation of the values of its constituent simple attributes.

Figure: Hierarchy of composite attributes (Address: StreetAddress, City, State, Pincode; StreetAddress: Number, Street, ApartmentNumber).

The above figure shows a hierarchy of composite attributes. Composite attributes are useful when a user sometimes refers to the composite attribute as a unit and at other times refers to its individual components. If the composite attribute is referenced only as a whole, there is no need to subdivide it into component attributes. For example, if there is no need to refer to the individual components of an address (pin code, street, apartment number and so on), the whole address can be designated as a simple attribute. Similarly, if there is no need to refer to the individual components of a name (first name, middle name, last name), the whole name can be designated as a simple attribute.
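The following is a minimal Python sketch of the Address hierarchy. It assumes dataclasses as the modelling vehicle. The class and field names are illustrative. Splitting "24/17 SLane" into a number and a street is also an assumption. Nested dataclasses stand for the composite attributes, and as_text() concatenates the simple components.

```python
from dataclasses import dataclass


@dataclass
class StreetAddress:
    # Composite attribute built from three simple (atomic) attributes
    number: str
    street: str
    apartment_number: str  # empty string when not applicable


@dataclass
class Address:
    # Composite attribute; street_address is itself composite
    street_address: StreetAddress
    city: str
    state: str
    pincode: str

    def as_text(self) -> str:
        # Concatenate the values of the constituent simple attributes,
        # skipping empty components
        s = self.street_address
        parts = [s.number, s.street, s.apartment_number,
                 self.city, self.state, self.pincode]
        return ", ".join(p for p in parts if p)


addr = Address(StreetAddress("24/17", "SLane", ""), "Mysore", "Karnataka", "570 008")
print(addr.as_text())  # 24/17, SLane, Mysore, Karnataka, 570 008
print(addr.city)       # a single component can still be referred to on its own
```

If the address is only ever used as a whole, the same value could simply be stored as one string. That corresponds to the simple-attribute choice described above.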

3.3.1 Single-Valued versus Multivalued Attributes


Most attributes have a single value for a particular entity; such attributes are called single-valued. For example, Age is a single-valued attribute of a person, and Roll Number is a single-valued attribute of a student. In some cases an attribute can have a set of values for the same entity. Examples are a Colors attribute for a car, a CollegeDegrees attribute for a person, or the names of a person's dependents. Cars with one color have a single value, whereas two-color cars have two values for Colors.


Similarly, one person may have no college degree, another may have one, and a third may have two or more. Different persons can therefore have different numbers of values for the CollegeDegrees attribute. Likewise, a person can have one or more dependents. Such attributes are called multivalued. A multivalued attribute may have lower and upper bounds that limit the number of values allowed for each individual entity. For example, the Colors attribute of a car may have between one and four values, if we assume that a car can have at most four colors. The CollegeDegrees attribute may have between one and three values, if we assume a person can have at most three degrees. Similarly, a Dependents attribute might allow between one and six values, one for each dependent's name.
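A multivalued attribute with bounds can be sketched in Python as a set-valued field checked on creation. The sketch assumes the one-to-four bound on Colors from the text. The registration field and the car values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Car:
    registration: str       # hypothetical single-valued identifying attribute
    colors: frozenset[str]  # multivalued attribute: a set of values per entity

    # Assumed bounds from the text: a car has between one and four colors.
    # Class attributes without annotations are not dataclass fields.
    MIN_COLORS = 1
    MAX_COLORS = 4

    def __post_init__(self):
        if not self.MIN_COLORS <= len(self.colors) <= self.MAX_COLORS:
            raise ValueError(
                f"Colors needs {self.MIN_COLORS} to {self.MAX_COLORS} values, "
                f"got {len(self.colors)}"
            )


one_color = Car("CAR-1", frozenset({"red"}))
two_color = Car("CAR-2", frozenset({"black", "white"}))

try:
    Car("CAR-3", frozenset())  # violates the lower bound
    rejected = False
except ValueError:
    rejected = True

print(len(one_color.colors), len(two_color.colors), rejected)  # 1 2 True
```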

3.3.2 Stored versus Derived Attributes


In some cases, two (or more) attribute values are related, for example the Age and BirthDate attributes of a person. For a particular person entity, the value of Age can be determined from the current (today's) date and that person's BirthDate. Age is therefore called a derived attribute and is said to be derivable from BirthDate, which is called a stored attribute. Some attribute values can be derived from related entities. For example, an attribute NumberOfEmployees of a department entity can be derived by counting the employees related to (working for) that department. In addition to these attribute types, an attribute may also hold a null value.
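The two kinds of derivation described above can be sketched in Python. Age is computed from BirthDate, and NumberOfEmployees is computed by counting related entities. The WORKS_FOR pairs are hypothetical. The current date is passed in explicitly so that the example gives the same result every time.

```python
from datetime import date


def age_on(birth_date: date, today: date) -> int:
    """Derived attribute Age, computed from the stored attribute BirthDate."""
    years = today.year - birth_date.year
    # One year less if this year's birthday has not happened yet
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Hypothetical WORKS_FOR pairs: (employee, department)
works_for = [("E1", "D1"), ("E2", "D1"), ("E3", "D2")]


def number_of_employees(dept: str) -> int:
    """Derived from related entities: count the employees working for dept."""
    return sum(1 for _, d in works_for if d == dept)


print(age_on(date(1990, 8, 15), date(2024, 8, 14)))  # 33
print(number_of_employees("D1"))                      # 2
```

Because Age is recomputed whenever it is needed, it never goes out of date. This is the usual reason for storing BirthDate rather than Age.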

3.3.3 Null Values


In some cases a particular entity may not have an applicable value for an attribute. For example, the ApartmentNumber attribute of an address applies only to addresses in apartment buildings. It does not apply to other types of residence, such as single-family homes. Similarly, a CollegeDegrees attribute applies only to persons with college degrees. For such situations, a special value called null is used. The address of a single-family home would have null for its ApartmentNumber attribute, and a person with no college degree would have null for CollegeDegrees.

Null can also be used when we do not know the value of an attribute for a particular entity, for example if we do not know Stephan's home phone number. The former kind of null means "not applicable", whereas the latter means "unknown". The unknown category of null can be further divided into two cases:

- The attribute value is known to exist but is missing, for example when a person's Weight attribute is listed as null.
- It is not known whether the attribute value exists, for example when a person's HomePhone attribute is null.
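The following small sketch uses Python's built-in sqlite3 module. The table name, column names and the second person are hypothetical. SQL provides only one NULL marker, so the difference between "not applicable" and "unknown" is recorded only in the designer's documentation, shown here as column comments. A NULL must be tested with IS NULL, because a comparison such as = NULL is never true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        name             TEXT PRIMARY KEY,
        apartment_number TEXT,  -- NULL here means 'not applicable'
        home_phone       TEXT   -- NULL here means 'unknown'
    )
""")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("Stephan", "12B", None),     # Python None is stored as SQL NULL
        ("Asha", None, "555-0100"),   # hypothetical single-family-home resident
    ],
)

no_apartment = [row[0] for row in conn.execute(
    "SELECT name FROM person WHERE apartment_number IS NULL")]
equals_null = conn.execute(
    "SELECT COUNT(*) FROM person WHERE apartment_number = NULL").fetchone()[0]

print(no_apartment)  # ['Asha']
print(equals_null)   # 0, because '= NULL' evaluates to NULL, not true
```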


3.4 ENTITY TYPES


An entity type defines a collection (or set) of entities that have the same attributes. Each entity type in the database is described by its name and attributes. For example, EMPLOYEE and COMPANY are two entity types, each with its own list of attributes.

In ER diagrams:

- An entity type is shown as a rectangular box enclosing the entity type name.
- Attribute names are enclosed in ovals and attached to their entity type by straight lines.
- Composite attributes are attached to their component attributes by straight lines.
- Multivalued attributes are shown in double ovals.

An entity type describes the schema, or intension, for a set of entities that share the same structure.

3.5 ENTITY SETS


The collection of all entities of a particular entity type in the database at any point in time is called an entity set. The entity set is usually referred to by the same name as the entity type. For example, EMPLOYEE refers both to a type of entity and to the current set of all employee entities in the database. The entity set is also called the extension of the entity type.
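As a loose analogy only (this is not how a DBMS stores data), intension and extension can be compared to a Python class and the set of its current instances. The class plays the role of the entity type, and the set plays the role of the entity set. The attribute names and employee values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Employee:
    """Entity type (intension): the name and attributes shared by every employee."""
    ssn: str
    name: str
    salary: int


# Entity set (extension): the employee entities in the database right now
employee_set = {
    Employee("111", "Ravi", 30000),
    Employee("222", "Meena", 42000),
}

intension = [f.name for f in fields(Employee)]
print(intension)          # ['ssn', 'name', 'salary']

# The extension changes as the database changes; the intension does not
employee_set.add(Employee("333", "Kiran", 35000))
print(len(employee_set))  # 3
```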

3.6 KEYS
Differences between entities in an entity set must be expressed in terms of attributes known as keys, which allow us to identify each entity in the set uniquely. Keys can be of various types:

Super key
Key
Composite key
Candidate key
Primary key

Super Key
A super key is a set of one or more attributes that, taken together, enable us to identify an entity in the entity set uniquely. For example, STU_NAME and ROLLNUM together form a super key for the entity set Student. STU_NAME alone cannot act as a super key, however, since two students could have the same name.

Key

A superkey is a set of attributes that uniquely identifies every entity in the entity set, while a key is a minimal set of such attributes. Minimal means that no attribute can be removed from a key while still identifying every entity uniquely. Consider, for example, an entity type Student with two attributes, STU_NAME and ROLLNUM. Both attributes together make up a superkey. This superkey is not a key, however, because it is not a minimal set of attributes. ROLLNUM, on the other hand, is a key, because it is a minimal set of attributes that can identify an entity uniquely in the entity set.

Composite Key
In some situations no single attribute can constitute a key, that is, no single attribute can uniquely identify every entity in the entity set. Two or more attributes are then needed together to identify every entity uniquely. A key consisting of two or more attributes is called a composite key. For example, consider an entity type SUPPLIER_PART with attributes SUPPLIER_ID, PART_ID and QUANTITY. Neither SUPPLIER_ID nor PART_ID can identify an entity in the entity set uniquely. The two together, however, can identify any entity uniquely. Hence {SUPPLIER_ID, PART_ID} is a composite key.
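The SUPPLIER_PART example can be tried with Python's sqlite3 module. This is a minimal sketch with invented rows. Declaring PRIMARY KEY (supplier_id, part_id) makes the DBMS reject a second row for the same supplier and part, even though each column repeats on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE supplier_part (
        supplier_id TEXT,
        part_id     TEXT,
        quantity    INTEGER,
        PRIMARY KEY (supplier_id, part_id)  -- composite key
    )
""")
# S1 and P1 each appear twice, but every (supplier_id, part_id) pair is distinct
conn.executemany("INSERT INTO supplier_part VALUES (?, ?, ?)",
                 [("S1", "P1", 300), ("S1", "P2", 200), ("S2", "P1", 400)])

try:
    conn.execute("INSERT INTO supplier_part VALUES ('S1', 'P1', 50)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```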

Candidate key
A superkey may contain extraneous attributes, and we are often interested in the smallest superkey. A superkey for which no proper subset is a superkey is called a candidate key. For example, in the entity type STUDENT with attributes STU_NAME and ROLLNUM, ROLLNUM is a candidate key, as it is minimal and uniquely identifies a Student entity.

As another example, consider the entity type SALESPERSONS with the attributes SNUM, SNAME, REGION, QUANTITYSOLD and COMMISSION. From the list of attributes it may appear that SNAME, as well as SNUM, could be a key. This holds only as long as salesperson names are unique; if we cannot make this assumption, SNAME cannot be a candidate key. Now suppose we add one more attribute, PASSPORT_NUM, to the salespersons entity. It can certainly be another candidate key, because a person can be identified uniquely by passport number. SNUM and PASSPORT_NUM are then the two candidate keys.
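Keys are a property of the schema: they must hold for every entity set that could ever exist. A sample of data can therefore rule out a key but never prove one. With that caveat, the hypothetical sketch below lists the minimal attribute sets that are unique on a small, invented sample of SALESPERSONS rows. QUANTITYSOLD and COMMISSION are omitted.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs (for this sample)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Superkeys (on this sample) with no proper subset that is also a superkey."""
    keys = []
    # Try smaller attribute sets first so that minimal sets are found first
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a key already found, so it is not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical sample: names and regions repeat, SNUM and PASSPORT_NUM do not
salespersons = [
    {"SNUM": 1, "SNAME": "Rao", "REGION": "South", "PASSPORT_NUM": "P100"},
    {"SNUM": 2, "SNAME": "Rao", "REGION": "North", "PASSPORT_NUM": "P200"},
    {"SNUM": 3, "SNAME": "Das", "REGION": "South", "PASSPORT_NUM": "P300"},
    {"SNUM": 4, "SNAME": "Das", "REGION": "South", "PASSPORT_NUM": "P400"},
]

found = candidate_keys(salespersons, ["SNUM", "SNAME", "REGION", "PASSPORT_NUM"])
print(found)  # [('SNUM',), ('PASSPORT_NUM',)]
```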

Primary Key
The primary key identifies every entity in the entity set uniquely. It is the candidate key (there may be more than one) chosen by the database designer for this purpose. When there are two or more candidate keys, the designer must decide which of them becomes the primary key. Examples of primary keys are ROLLNUM for the Student entity, and SNUM or PASSPORT_NUM for the Salespersons entity. The choice among candidate keys is based on how the data is used day to day and on how easily the key can be entered.
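In SQL, the chosen candidate key is declared as PRIMARY KEY. The remaining candidate keys are commonly declared UNIQUE (and NOT NULL), so the DBMS still enforces them. The following is a minimal sqlite3 sketch. QUANTITYSOLD and COMMISSION are left out, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE salespersons (
        snum          INTEGER PRIMARY KEY,  -- candidate key chosen as primary key
        sname         TEXT NOT NULL,        -- not a key: names may repeat
        region        TEXT,
        passport_num  TEXT NOT NULL UNIQUE  -- the other candidate key
    )
""")
conn.execute("INSERT INTO salespersons VALUES (1, 'Rao', 'South', 'P100')")
conn.execute("INSERT INTO salespersons VALUES (2, 'Rao', 'North', 'P200')")  # same name is fine

try:
    # A new SNUM but a passport number that is already used
    conn.execute("INSERT INTO salespersons VALUES (3, 'Das', 'South', 'P100')")
    passport_clash_rejected = False
except sqlite3.IntegrityError:
    passport_clash_rejected = True

print(passport_clash_rejected)  # True
```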

3.7 RELATIONSHIP TYPE AND SETS


A relationship type R among n entity types E1, E2, ..., En defines a set of associations, or a relationship set, among entities from these entity types. As with entity types and entity sets, a relationship type and its relationship set are customarily referred to by the same name, R. Informally, each relationship instance ri in R is an association that includes exactly one entity from each participating entity type. Each instance ri represents the fact that its participating entities are related in some way in the corresponding miniworld situation. For example, the relationship type WORKS_FOR between EMPLOYEE and DEPARTMENT associates each employee with the department for which the employee works. Each relationship instance in the relationship set WORKS_FOR associates one employee entity and one department entity.

In ER diagrams, relationship types are shown as diamond-shaped boxes containing the relationship name. Straight lines connect each diamond to the rectangular boxes of the participating entity types. Relationship types can also have attributes, like entity types. For example, to record the number of hours per week that an employee works on a particular project, we can include an attribute Hours for the WORKS_ON relationship type.

The degree of a relationship type is the number of participating entity types, so the WORKS_FOR relationship is of degree two. A relationship type of degree two is called binary, and one of degree three is called ternary. The cardinality ratio for a binary relationship specifies the maximum number of relationship instances that an entity can participate in.
For example, in the WORKS_FOR binary relationship type, DEPARTMENT:EMPLOYEE has cardinality ratio 1:N. Each department can be related to (that is, employs) any number of employees, but an employee can be related to (works for) only one department. The possible cardinality ratios for binary relationship types are 1:1, 1:N, N:1 and M:N.

An example of a 1:1 binary relationship is MANAGES, which relates a department entity to the employee who manages that department. It represents the miniworld constraints that, at any point in time, an employee can manage only one department and a department has only one manager. The relationship type WORKS_ON has cardinality ratio M:N, because the miniworld rule is that an employee can work on several projects and a project can have several employees. Cardinality ratios for binary relationships are represented on ER diagrams by displaying 1, M and N on the diamonds.
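A relationship set can be sketched in Python as a collection of entity tuples, with any relationship attribute stored alongside each tuple. The employee, project and department names below are hypothetical. The sketch shows the degree of WORKS_ON, its M:N behaviour, and a check of the 1:N rule for WORKS_FOR.

```python
from collections import Counter

# WORKS_ON relationship set: each instance pairs one EMPLOYEE with one PROJECT;
# the relationship attribute Hours is stored with the pair (hypothetical data)
works_on = {
    ("Smith", "ProductX"): 32.5,
    ("Smith", "ProductY"): 7.5,
    ("Wong", "ProductY"): 10.0,
}

# Degree = number of participating entity types in each instance
degree = len(next(iter(works_on)))

# M:N: an employee may work on several projects, and a project may have several employees
projects_per_employee = Counter(emp for emp, _ in works_on)
employees_per_project = Counter(proj for _, proj in works_on)

# WORKS_FOR relationship set: DEPARTMENT:EMPLOYEE is 1:N,
# so every employee may appear in at most one instance
works_for = [("Smith", "Research"), ("Wong", "Research"), ("Zelaya", "Administration")]
works_for_respects_1_to_n = all(
    count == 1 for count in Counter(emp for emp, _ in works_for).values()
)

print(degree)                             # 2
print(projects_per_employee["Smith"])     # 2
print(employees_per_project["ProductY"])  # 2
print(works_for_respects_1_to_n)          # True
```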

Figure: A binary relationship (EMPLOYEE, WORKS ON, PROJECT) and a ternary relationship (USES, among PERSON, SKILL and PROJECT).

3.8 STRONG AND WEAK ENTITY SETS


Let us first understand what a weak entity set is. An entity set that does not have enough attributes to form a primary key is called a weak entity set. An entity set that has a primary key is called a strong entity set. For example, consider an entity set PAYMENT with three attributes: payment-number, payment-date and payment-amount. Each payment entity is distinct, but payments for different loans may share the same payment number. This entity set therefore has no primary key and is a weak entity set. Each weak entity set must be part of a 1:M relationship set with its strong entity set. A member of a strong entity set is called a dominant entity, and a member of a weak entity set is called a subordinate entity. Although a weak entity set has no primary key, we still need a way to distinguish among the weak entities that depend on one particular strong entity. The discriminator of a weak entity set is the set of attributes that makes this distinction.

Fig: Relation between a strong and a weak entity set (strong entity LOAN with attributes Loan-no and Amount; weak entity PAYMENT with attributes Payment-no, Payment-date and Payment amount; linked by the identifying relationship Loan payment).

As shown in the figure, the attribute payment-number acts as the discriminator for the payment entity set. It is also called the partial key of the entity set. How, then, is the primary key of a weak entity set formed? The rule is: the primary key of a weak entity set is the primary key of the strong entity set on which it is existence dependent, plus the weak entity set's discriminator. In this example, (loan-no, payment-no) is the primary key for the payment entity set. The relationship between a weak entity set and its strong entity set is called an identifying relationship. In our example, loan-payment is the identifying relationship for the PAYMENT entity.
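The following is a minimal sqlite3 sketch of the LOAN and PAYMENT example. The column names and data are illustrative. The primary key of payment combines the owner's key, loan_no, with the discriminator, payment_no. Using ON DELETE CASCADE to express existence dependency is an assumed design choice; the E-R model itself does not prescribe it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE loan (
        loan_no  TEXT PRIMARY KEY,
        amount   REAL
    );
    -- Weak entity set PAYMENT: its key is the owner's key plus the discriminator
    CREATE TABLE payment (
        loan_no        TEXT NOT NULL REFERENCES loan(loan_no) ON DELETE CASCADE,
        payment_no     INTEGER NOT NULL,  -- discriminator (partial key)
        payment_date   TEXT,
        payment_amount REAL,
        PRIMARY KEY (loan_no, payment_no)
    );
""")

conn.executemany("INSERT INTO loan VALUES (?, ?)", [("L-11", 900.0), ("L-14", 1500.0)])
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", [
    ("L-11", 1, "2024-01-10", 300.0),
    ("L-14", 1, "2024-01-12", 500.0),  # same payment_no, different loan: allowed
    ("L-14", 2, "2024-02-12", 500.0),
])

# Existence dependency: deleting a loan also removes its payments
conn.execute("DELETE FROM loan WHERE loan_no = 'L-14'")
remaining = conn.execute("SELECT loan_no, payment_no FROM payment").fetchall()
print(remaining)  # [('L-11', 1)]
```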

3.9 THE ENTITY-RELATIONSHIP DIAGRAM


The overall logical structure of a database can be expressed graphically by an E-R diagram. The E-R diagram uses three basic concepts: entities, their attributes, and the relationships that exist between the entities. Graphical notations are used to represent them.

Notations used in ER diagrams

Sl.No.  Symbol                                              Meaning
1.      Rectangle                                           Entity type
2.      Double rectangle                                    Weak entity type
3.      Diamond                                             Relationship
4.      Double diamond                                      Identifying relationship
5.      Oval                                                Attribute
6.      Oval with the attribute name underlined             Primary key attribute
7.      Double oval                                         Multi-valued attribute
8.      Dashed oval                                         Derived attribute
9.      Oval connected to its component ovals               Composite attribute
10.     Double line from E2 to diamond R                    Total participation of E2 in R
11.     1 and N on the lines from E1 and E2 to R            Cardinality relation 1:N for E1:E2 in R

The rectangles, ovals, diamonds and lines are important graphical symbols used for representing the entities, attributes and relationships in an E-R diagram.

Rectangles: represent entity sets
Ellipses: represent attributes
Diamonds: represent relationships among entity sets
Lines: link attributes to entity sets and entity sets to relationships

3.10 EXAMPLE FOR AN E-R DIAGRAM


Let us consider two entities, CUSTOMER and ACCOUNT. CUSTOMER has the attributes CUSTNAME, ADDRESS, CITY and PHONENUM. ACCOUNT has the attributes ACC_NUM and BALANCE. The two entities are linked by a relationship called HAS, which has the attribute DATE. The complete E-R diagram for this example is shown below.

3.11 MAPPING CARDINALITIES


Mapping cardinality indicates the number of entities with which another entity can be associated via a relationship. It should not be confused with the degree of a relationship, which is the number of participating entity types; a binary relationship has degree two. For a binary relationship set between entity sets A and B, the mapping cardinality must be one of the following:

One-to-One (1:1)
One-to-Many (1:N)
Many-to-One (N:1)
Many-to-Many (M:N)


One-to-One: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A. Example: Each department has one manager, and each manager is associated with one department. Assume suitable attributes for both the DEPT and MANAGER entities.

One-to-Many: An entity in A is associated with any number of entities in B, but an entity in B is associated with at most one entity in A. Example: One department has many employees working for it, whereas each employee works for only one department. For one occurrence of department there are many occurrences of employee, so it is a one-to-many relationship.

Many-to-One: An entity in A is associated with at most one entity in B, and an entity in B is associated with any number of entities in A. Example:

Figure: Dept Has Employees.

Reversing the direction of the one-to-many association between department and employee gives a many-to-one relationship: many employees work for one department. For many occurrences of employee there is one occurrence of department, so it is a many-to-one relationship.

Many-to-Many: An entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A.

Figure: Supplier Supplies Parts.

A supplier supplies many parts, and a customer can buy the same part from many suppliers, so each part may be supplied by many suppliers. This association between the entities in Supplier and Parts is an example of a many-to-many relationship.
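The four mapping cardinalities can be checked against a concrete relationship set with a short Python sketch. The sketch only examines the (a, b) pairs that are present. A relationship set that looks 1:1 today may still be governed by a 1:N rule, so the designer's rule, not the data, decides. All relationship names and entity values are hypothetical.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship set, given as (a, b) pairs, for A:B.

    Only the pairs present are examined, so the answer describes this
    relationship set, not necessarily the rule the designer intends.
    """
    a_counts = Counter(a for a, _ in pairs)  # number of B entities per A entity
    b_counts = Counter(b for _, b in pairs)  # number of A entities per B entity
    a_many = any(c > 1 for c in a_counts.values())
    b_many = any(c > 1 for c in b_counts.values())
    if a_many and b_many:
        return "M:N"
    if a_many:
        return "1:N"  # each B is related to at most one A
    if b_many:
        return "N:1"  # each A is related to at most one B
    return "1:1"


manages = [("Research", "Wong"), ("Administration", "Wallace")]  # (dept, manager)
has = [("Dept1", "E1"), ("Dept1", "E2"), ("Dept2", "E3")]          # (dept, employee)
works_in = [(emp, dept) for dept, emp in has]                      # reversed direction
supplies = [("S1", "P1"), ("S1", "P2"), ("S2", "P1")]              # (supplier, part)

for name, rel in [("MANAGES", manages), ("HAS", has),
                  ("WORKS_IN", works_in), ("SUPPLIES", supplies)]:
    print(name, mapping_cardinality(rel))
# Expected lines: MANAGES 1:1, HAS 1:N, WORKS_IN N:1, SUPPLIES M:N
```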

3.12 WEAK ENTITY SETS IN E-R DIAGRAMS


A weak entity set is indicated by a double-outlined box. To illustrate, let us consider two entity sets, ACCOUNT and TRANSACTION. ACCOUNT is the set of all accounts created and maintained in the bank; its attributes are acc# and balance. TRANSACTION is the set of all account transactions executed in the bank; its attributes are transaction#, date and amount. TRANSACTION is the weak entity set here, because each transaction cannot be identified uniquely even by considering all of its attributes. The weak entity set TRANSACTION therefore depends on the strong entity set ACCOUNT via the relationship A_T. This is shown in the following figure.

When a weak entity set is linked to a strong entity set, the relationship set is normally many-to-one and has no descriptive attributes. The primary key of the weak entity set is the primary key of the strong entity set on which it is existence dependent, plus its discriminator. In the above figure, for example, TRANSACTION is the weak entity set and its existence depends on ACCOUNT. The primary key of TRANSACTION therefore contains both acc# and transaction#. A separate table for the relationship A_T would be redundant, since its attributes (acc# and transaction#) are already contained in the TRANSACTION table.

3.13 NONBINARY RELATIONSHIPS


More than one relationship can also exist between two entities. For example, consider two entities FACULTY and DEPARTMENT, each with its own attributes. A faculty member can be the head of a department, can teach in it, and might formerly have worked in it. All these can be represented using a nonbinary relationship, as shown in the figure below.

Figure: FACULTY and DEPARTMENT linked by three relationships: Former, Teach and Heads.
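Each relationship drawn between FACULTY and DEPARTMENT is a separate relationship set, so the same pair of entities can appear in more than one of them. The following is a small Python sketch with hypothetical faculty and department names.

```python
# Three separate relationship sets between FACULTY and DEPARTMENT
# (hypothetical names); each is a set of (faculty, department) pairs.
heads = {("Prof. Rao", "Physics")}
teach = {("Prof. Rao", "Physics"), ("Dr. Iyer", "Physics"), ("Dr. Iyer", "Chemistry")}
former = {("Dr. Iyer", "Mathematics")}

relationships = {"Heads": heads, "Teach": teach, "Former": former}


def roles(faculty, department):
    """Names of every relationship that links this faculty member to this department."""
    return sorted(name for name, pairs in relationships.items()
                  if (faculty, department) in pairs)


print(roles("Prof. Rao", "Physics"))     # ['Heads', 'Teach']
print(roles("Dr. Iyer", "Mathematics"))  # ['Former']
```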

3.14 TERNARY RELATIONSHIP


A relationship formed among three entities is called a ternary relationship. For example, consider three entities, CUSTOMER, ACCOUNT and BRANCH, each with its own attributes. An E-R diagram relating them states two facts: a customer may have several accounts, each opened in a specific bank branch, and an account may belong to several different customers. This relationship is also called a nonbinary relationship.
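A ternary relationship set can be sketched in Python as a set of triples. Each triple contains exactly one CUSTOMER, one ACCOUNT and one BRANCH. The customer names, account numbers and branch names are hypothetical.

```python
# Each instance of the ternary relationship links exactly one CUSTOMER,
# one ACCOUNT and one BRANCH (hypothetical data).
cust_acc_branch = {
    ("Hayes", "A-101", "Downtown"),
    ("Hayes", "A-215", "Mianus"),
    ("Jones", "A-101", "Downtown"),  # account A-101 belongs to two customers
}


def accounts_of(customer):
    """(account, branch) pairs for one customer."""
    return sorted((acc, br) for c, acc, br in cust_acc_branch if c == customer)


def holders_of(account):
    """Customers who share one account."""
    return sorted({c for c, acc, _ in cust_acc_branch if acc == account})


print(accounts_of("Hayes"))  # [('A-101', 'Downtown'), ('A-215', 'Mianus')]
print(holders_of("A-101"))   # ['Hayes', 'Jones']
```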


3.15 ADVANTAGES AND DISADVANTAGES OF ERM


Advantages
Exceptional conceptual simplicity: All the available database models give a much better logical view of the data than traditional file management systems. The ERM, however, gives a particularly easy-to-understand conceptual view of a database's main entities and their relationships. Complex database designs can therefore be created and managed easily using ERMs.
Effective communication tool: A database has many users. The ERM lets the database designer capture the different views of the data held by database designers, programmers, managers and end users, and so integrates those views effectively.
Visual representation: The data and its relationships can be represented easily with graphical symbols. This makes it easy for designers, programmers and end users to understand them.
Integrated with the relational model: The relational database model can be derived easily from an ERM, which makes relational database design a very structured process.

Disadvantages
Limited constraint representation: It is difficult to represent all constraints using this model. Constraints such as "a student's grade point average ranges between 0.0 and 4.0" and "a worker may not work more than 10 consecutive hours of duty time" cannot be represented.
Limited relationship representation: Relationships are represented within the diagram only as occurring between entities.
No data manipulation language: The ERM is not complete, because it lacks data manipulation commands.
Loss of information content: The diagrams tend to become crowded when attributes are represented. Database designers therefore usually avoid attribute mapping, which reduces the model's information content.

3.16 RELATIONAL DATABASE DESIGN


The usual method is to make an E-R design first and then map it into a relational design for implementation in some database product. An alternative, called normalization, is to start with arbitrary tables and rearrange their schemas in stages, producing new tables each time. Normalization can also be applied after E-R design and mapping, to test the design; usually only a few changes are required.


3.17 THE E-R TO RELATIONAL MAPPING ALGORITHM


The steps involved are as follows.

Step 1. Regular entity types: for each regular entity type in the ER schema, create a relation (table) containing the simple attributes of the entity, and underline the key. For example: EMPLOYEE (Fname, Lname, Ssn, Bdate, Address, Sex, Salary). If a key has several parts, more than one attribute is underlined.

Step 2. Weak entity types: put all simple attributes of each weak entity type W in a separate table R. Add to R, as foreign key attributes, the primary key(s) of the table(s) for the owner entity type(s). These foreign keys represent the identifying relationship type of W. For example, the table DEPENDENT corresponds to the weak entity type DEPENDENT. Besides its obvious attributes, it has the foreign key ESSN, which holds the primary key (SSN) of EMPLOYEE. Renaming the foreign key in this way is optional, since the same attribute name can be reused in different tables. The primary key of R is the foreign key attribute(s) plus the partial key of the weak entity, e.g. ESSN + DependentName.

Step 3. Binary 1:1 relationships: identify the tables S and T for the participating entities. Choose one of them, say S, and add to it as a foreign key the primary key of the other table. If possible, S should be the table whose entity type participates totally in the relationship. All simple attributes of the relationship are also added to S. For example, for MANAGES choose DEPARTMENT as S, since every department must have a manager. Its foreign key is MGRSSN, a renaming of EMPLOYEE's primary key SSN. An extra attribute, MGRSTARTDATE, represents the StartDate attribute of the relationship.

Step 4. Regular 1:N relationships: take the table S for the entity on the N side and add as a foreign key the primary key of the table T for the other participating entity. For WORKS_FOR, for example, DEPARTMENT's primary key is added to EMPLOYEE as the foreign key DNO. SUPERVISES is handled the same way, except that the table's own primary key is added to that same table as a foreign key. Any simple attributes of the relationship are treated as in Step 3.

Step 5. M:N relationship types: for each one, create a new table S whose foreign key attributes are the primary keys of the participating entities. Their combination is the primary key of S. This table cannot be avoided; it is a relationship (or link) table, corresponding directly to a relationship type. Any attributes of the relationship type also become attributes of S. For example, WORKS_ON becomes a table WORKS_ON holding the primary keys of EMPLOYEE and PROJECT, renamed ESSN and PNO. The Hours attribute becomes the column HOURS, and the primary key is the combination {ESSN, PNO}.

Note: 1:1 and 1:N relationships can also be handled this way, as a separate relationship table whose primary key is one of its foreign keys. For 1:N, that key is the primary key of the entity table on the N side. This is useful when the relationship type will have few instances, since the foreign-key approach of Steps 3 and 4 would then leave many null values.

Step 6. Multivalued attributes: for each multivalued attribute Ai of a table R, create a new table Si containing the primary key K of R plus an attribute for Ai. The combination of K and Ai is the primary key of Si. For example, the ER diagram gives DEPARTMENT the multivalued attribute Locations. We therefore create DEPT_LOCATIONS, containing DEPARTMENT's primary key as the foreign key DNUMBER and the single-valued attribute DLOCATION. Its primary key is {DNUMBER, DLOCATION}, so each location of a department gets its own row.

Step 7. N-ary relationship types: for each ternary or higher-order (n-ary) relationship type R, create a new table S. Add as foreign keys the primary key of each participating entity type. The primary key of S is made up of the combination of these foreign key attributes, as required.

Comparison of terms

E-R model                        Relational model
entity type                      entity table
1:1 or 1:N relationship          foreign key (or relationship table)
M:N relationship                 relationship table
n-ary relationship               relationship table and n foreign keys
simple attribute                 attribute
composite attribute              set of simple attributes
multivalued attribute            table and foreign key(s)
value set                        domain
key attribute                    primary key
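Steps 1 to 6 can be tried on the COMPANY example with Python's sqlite3 module. This is a sketch, not a complete schema. It assumes SSN is EMPLOYEE's key, as in Steps 3 and 5, and keeps only a few attributes per table. The column names are illustrative. Step 7 would follow the same pattern as WORKS_ON, with one foreign key per participating entity type. The primary_key helper reads SQLite's PRAGMA table_info output to confirm the key of each table.

```python
import sqlite3

SCHEMA = """
-- Step 1: regular entity types become tables with their simple attributes
CREATE TABLE employee (
    ssn       TEXT PRIMARY KEY,
    fname     TEXT,
    lname     TEXT,
    bdate     TEXT,
    address   TEXT,
    sex       TEXT,
    salary    REAL,
    -- Step 4: 1:N WORKS_FOR, key of DEPARTMENT added on the N side
    dno       INTEGER REFERENCES department(dnumber),
    -- Step 4: recursive 1:N SUPERVISES, a foreign key into the same table
    superssn  TEXT REFERENCES employee(ssn)
);
CREATE TABLE department (
    dnumber       INTEGER PRIMARY KEY,
    dname         TEXT,
    -- Step 3: 1:1 MANAGES stored on DEPARTMENT (the total-participation side)
    mgrssn        TEXT REFERENCES employee(ssn),
    mgrstartdate  TEXT
);
CREATE TABLE project (
    pnumber  INTEGER PRIMARY KEY,
    pname    TEXT
);
-- Step 2: weak entity DEPENDENT; key = owner's key + partial key
CREATE TABLE dependent (
    essn            TEXT REFERENCES employee(ssn),
    dependent_name  TEXT,
    PRIMARY KEY (essn, dependent_name)
);
-- Step 5: M:N WORKS_ON becomes a relationship table
CREATE TABLE works_on (
    essn   TEXT REFERENCES employee(ssn),
    pno    INTEGER REFERENCES project(pnumber),
    hours  REAL,
    PRIMARY KEY (essn, pno)
);
-- Step 6: multivalued attribute Locations of DEPARTMENT
CREATE TABLE dept_locations (
    dnumber    INTEGER REFERENCES department(dnumber),
    dlocation  TEXT,
    PRIMARY KEY (dnumber, dlocation)
);
"""


def primary_key(conn, table):
    """Return the primary-key columns of a table, in key order."""
    # table is one of our own fixed names, so formatting it in is safe here.
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [r[1] for r in sorted(rows, key=lambda r: r[5]) if r[5] > 0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for t in ["employee", "department", "project", "dependent", "works_on", "dept_locations"]:
    print(t, primary_key(conn, t))
```

The printed keys should match the steps: a single column for the regular entity tables, and a combination of columns for DEPENDENT, WORKS_ON and DEPT_LOCATIONS.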

3.18 SUMMARY
In this chapter we presented the modeling concepts of a high-level conceptual data model, the Entity-Relationship (ER) model. The E-R data model views the real world as a set of basic objects (entities), the attributes of these entities, and the relationships among these objects. We started by discussing the role that a high-level data model plays in the database design process.


The database system development life cycle has five phases: requirement analysis, data modeling, implementation, testing and maintenance. We then defined the basic ER model concepts of entities and their attributes. The basic object that the ER model represents is an entity, which is a thing in the real world with an independent existence. Each entity has attributes: the particular properties that describe it. Several types of attributes occur in the ER model: simple versus composite, single-valued versus multivalued, and stored versus derived. We also discussed null values.

We then discussed the various types of keys: super key, key, composite key, candidate key and primary key. We introduced relationship types and sets, as well as weak entity types, and then discussed cardinality ratios (1:1, 1:N and M:N for binary relationships). Entity-Relationship schemas can be represented diagrammatically as ER diagrams. Finally, we discussed how an ER diagram can be reduced to the relational model by converting it into a set of tables.

Check your progress


1. The basic object that the ER model represents is an _________.
2. Each entity has ___________.
3. _________ attributes can be divided into smaller subparts.
4. Attributes that are not divisible further are called ____________.
5. An attribute which has a single value is called a ________ attribute.
6. An attribute which has more than one value is called a _____ attribute.
7. An entity type is represented in ER diagrams as a _________ box enclosing the entity type name.
8. _________ attributes are displayed in double ovals.
9. _________ can be used if we do not know the value of an attribute for a particular entity.
10. The collection of all entities of a particular entity type in the database at any point in time is called an __________.
11. Differences between entities in an entity set must be expressed in terms of attributes known as ________.
12. A key consisting of two or more attributes is called a ____________.
13. A superkey for which no proper subset is a superkey is called a _____________.
14. The primary key identifies every entity in the entity set __________.
15. The partial key attribute is underlined with a _______________.
16. The _________ notations are used for representing E-R diagrams.
17. _________ represent entity sets.
18. __________ represent attributes.
19. _________ represent relationships among entity sets.
20. _________ link attributes to entity sets and entity sets to relationships.
21. The degree of a binary relationship is ________.
22. A weak entity set is indicated by a __________ box.
23. If a relationship is formed among three entities, then it is called a ______ relationship.
24. ________ indicates the number of entities with which another entity can be associated via a relationship.
25. More than one relationship between the same two entities is also called a ____________.

Answers
1. Entity
2. Attributes
3. Composite
4. Simple or atomic attributes
5. Single-valued
6. Multivalued
7. Rectangular
8. Multivalued
9. Null
10. Entity set
11. Keys
12. Composite key
13. Candidate key
14. Uniquely
15. Dashed or dotted line
16. Graphical
17. Rectangles
18. Ellipses
19. Diamonds
20. Lines
21. Two
22. Double-outlined
23. Ternary
24. Mapping cardinalities
25. Nonbinary relationship

Answer the following questions

1. Explain briefly the five phases of the database development life cycle.
2. What is an entity? Give examples.
3. What are attributes? Give examples of attributes.
4. Explain briefly the different types of attributes.
5. What is the difference between single-valued and multivalued attributes? Give examples.
6. What is the difference between simple and composite attributes? Explain.
7. Explain the concept behind stored versus derived attributes.
8. What are intension and extension? Explain.
9. Explain briefly the various types of keys.
10. Explain relationship types and sets, giving examples.
11. Explain the following terms: i) Binary relationship ii) Ternary relationship iii) Degree of a relationship iv) Cardinality ratio
12. Explain weak entity type with an example.
13. Explain the symbols used to draw E-R diagrams.
14. With the help of an example, explain the E-R diagram.
15. Explain, giving examples, the various mapping cardinalities.
16. Explain the following with the help of examples: i) Weak entity ii) Nonbinary relationship iii) Ternary relationship
17. Explain the steps of E-R to relational mapping.
18. Imagine and give your own examples for the following mapping cardinalities: i) One-to-One ii) One-to-Many iii) Many-to-One iv) Many-to-Many
19. Draw an E-R diagram for modelling the relationship between students who learn many subjects taught by many teachers. We need to maintain information such as the students' names, addresses and divisions, and the teachers' names and addresses.
20. What are the candidate keys in the following relationships?
i) Employee_number, Employee_name, Employee_DL_number, Employee_Passport_number, Address, Qualification, Phone_number
ii) Part_Id, Part_name, Part_description, Unit_price, Colour, Make_of_Part
iii) Student_Roll_Num, College_Id_Num, Student_Name, Address, Year_of_study, Class_obt
iv) Account_num, Customer_name, Customer_address, Phone_number
21. What are the primary keys in the following relationships?
i) Movie_ID, Movie_name, Actor_name, Actress_name, Director_name
ii) Part_ID, Part_desc, Unit_Price, Quantity, Colour
iii) Player_name, Runs_made, Match_number, Ground, Date
iv) Roll_number, Student_name, Subject, Marks, Grade
v) Emp_ID, Emp_name, age, salary
22. Is a 1:N relationship the same as an N:1 relationship? Why?
23. Here is the COMPANY ER schema mapped to a relational schema. Identify the primary key for each of these tables and underline it.
EMPLOYEE(FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX, SALARY, SUPERSSN, DNO)
DEPARTMENT(DNAME, DNUMBER, MGRSSN, MGRSTARTDATE)
DEPT_LOCATIONS(DNUMBER, DLOCATION)
PROJECT(PNUMBER, PNAME, PLOCATION, DNUM)
WORKS_ON(ESSN, PNO, HOURS)
DEPENDENT(ESSN, DEPENDENT_NAME, SEX, BDATE, HOW_RELATED)
24. Construct an E-R diagram for a car-insurance company that has a set of customers, each of whom owns one or more cars. Each car has associated with it zero or any number of recorded accidents.
25. Construct an E-R diagram for a hospital with a set of patients and a set of medical doctors. Associate with each patient a log of the various tests and examinations conducted.
26. Reduce the following E-R diagram into the relational data model. (Note: Some of the symbols in the diagram are not explained here; please go through the reference books for these.)

References

Bipin C. Desai, An Introduction to Database Systems, Galgotia Publications, New Delhi.
Elmasri & Navathe, Fundamentals of Database Systems, Addison-Wesley.
Rajesh Narang, Database Management Systems, Prentice-Hall, New Delhi.
Atul Kahate, Introduction to Database Management Systems.

Figure for question 26: E-R schema diagram for the COMPANY database, to be reduced into the relational data model.

[Figure: E-R schema of the COMPANY database. Entity types: EMPLOYEE (Name: Fname, Minit, Lname; Ssn; Bdate; Sex; Address; Salary), DEPARTMENT (Dname, Dnumber, Dlocation), PROJECT (Pname, Pnumber, Plocation) and DEPENDENT (Name, Sex, Birthdate, Relationship). Relationship types: WorksFor, Managers (with Startdate), Supervision (Supervisor and Supervisee roles), Workson (with Hours), CONTROLS and DEPENDENTS-OF.]

Chapter 4

Data Models

4.0 OBJECTIVES

In this chapter, you will learn about:

Data models
Relational data model
Network data model
Hierarchical data model

4.1 INTRODUCTION
The architecture of a database system consists of three different views: internal, conceptual and external. The information stored in the database appears to the user at the external level according to the approach used for storing the data in the database. There are three different approaches for storing the data: the relational, network and hierarchical approaches can be used in a database management system to represent information to the users. Definition of Data Model: A data model is an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization. In other words, a data model is a set of concepts used to describe the structure of a database and certain constraints that the database should obey. Data model operations are used to specify database retrievals and updates by referring to the concepts of the data model.

4.2 CATEGORIES OF DATA MODELS

Chapter 4 - Data Models

Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many users perceive data. (Also called entity-based or object-based data models.)
Physical (low-level, internal) data models: Provide concepts that describe details of how data is stored in the computer.
Implementation (record-oriented) data models: Provide concepts that fall between the above two, balancing user views with some computer storage details.

4.3 RELATIONAL MODEL


A relational database is based on the relational model developed by E.F. Codd. A relational database allows the definition of data structures, storage and retrieval operations, and integrity constraints. In such a database the data and the relations between them are organised in tables. Each table has a unique name. A table is a collection of records, and each record in a table contains the same fields. A substantial theory has been developed for relational databases. Consider a database consisting of three tables, namely Title, Publisher and Publisher_Title, with the data in relational form. The Title table has five columns, and the Publisher and Publisher_Title tables have three columns each. These columns represent the attributes of the respective entity types. The tables along with their columns are as follows:

Title(Title_ID, Title_name, T_Price, T_discount, T_category)
Publisher(Pub_ID, Pub_name, Pub_city)
Publisher_Title(Pub_ID, Title_ID, Num_copies_sold)

The contents of these tables are as follows.

Title table

Title_ID   Title_name   T_Price   T_discount   T_category
T1         Oracle       440       15%          RDBMS
T2         C++          320       10%          PL
T3         C            280       10%          PL
T4         DB2          380       10%          RDBMS
T5         JAVA         480       20%          PL
T6         UNIX         600       20%          OS

Note: It is this tabular view of data that enables the user to apply the powerful operations and expressions of relational algebra to data manipulation.

Publisher table

Pub_ID   Pub_name        Pub_city
1000     Tata_Mcgraw     Bangalore
1500     Prentice_Hall   New Delhi
2000     Pearson         Mumbai
2500     Mcgraw-Hill     Chennai

Publisher_Title table

Pub_ID   Title_ID   Num_copies_sold
1000     T1         600
1500     T2         800
2000     T1         400
1500     T4         700
1000     T5         600
1500     T6         900
2000     T4         500
1000     T4         300
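To make the example concrete, the three tables above can be loaded into SQLite and queried with joins. This is a minimal sketch using Python's standard sqlite3 module; the column types, the composite primary key (Pub_ID, Title_ID) on Publisher_Title, and storing T_discount as a whole-number percentage are assumptions, not part of the original design.

```python
import sqlite3

# In-memory database holding the three example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Title (
    Title_ID   TEXT PRIMARY KEY,
    Title_name TEXT,
    T_Price    INTEGER,
    T_discount INTEGER,              -- whole percentage, e.g. 15 means 15%
    T_category TEXT
);
CREATE TABLE Publisher (
    Pub_ID   INTEGER PRIMARY KEY,
    Pub_name TEXT,
    Pub_city TEXT
);
CREATE TABLE Publisher_Title (
    Pub_ID          INTEGER REFERENCES Publisher (Pub_ID),
    Title_ID        TEXT REFERENCES Title (Title_ID),
    Num_copies_sold INTEGER,
    PRIMARY KEY (Pub_ID, Title_ID)   -- assumed composite key
);
""")

conn.executemany("INSERT INTO Title VALUES (?, ?, ?, ?, ?)", [
    ("T1", "Oracle", 440, 15, "RDBMS"), ("T2", "C++", 320, 10, "PL"),
    ("T3", "C", 280, 10, "PL"), ("T4", "DB2", 380, 10, "RDBMS"),
    ("T5", "JAVA", 480, 20, "PL"), ("T6", "UNIX", 600, 20, "OS"),
])
conn.executemany("INSERT INTO Publisher VALUES (?, ?, ?)", [
    (1000, "Tata_Mcgraw", "Bangalore"), (1500, "Prentice_Hall", "New Delhi"),
    (2000, "Pearson", "Mumbai"), (2500, "Mcgraw-Hill", "Chennai"),
])
conn.executemany("INSERT INTO Publisher_Title VALUES (?, ?, ?)", [
    (1000, "T1", 600), (1500, "T2", 800), (2000, "T1", 400), (1500, "T4", 700),
    (1000, "T5", 600), (1500, "T6", 900), (2000, "T4", 500), (1000, "T4", 300),
])

# Total copies sold per title: join Title with Publisher_Title on Title_ID.
totals = conn.execute("""
    SELECT t.Title_name, SUM(pt.Num_copies_sold)
    FROM Title AS t
    JOIN Publisher_Title AS pt ON pt.Title_ID = t.Title_ID
    GROUP BY t.Title_ID, t.Title_name
    ORDER BY t.Title_ID
""").fetchall()

# Copies of DB2 sold by each publisher: a three-table join.
db2_sales = conn.execute("""
    SELECT p.Pub_name, pt.Num_copies_sold
    FROM Publisher AS p
    JOIN Publisher_Title AS pt ON pt.Pub_ID = p.Pub_ID
    JOIN Title AS t ON t.Title_ID = pt.Title_ID
    WHERE t.Title_name = 'DB2'
    ORDER BY p.Pub_ID
""").fetchall()

print(totals)     # [('Oracle', 1000), ('C++', 800), ('DB2', 1500), ('JAVA', 600), ('UNIX', 900)]
print(db2_sales)  # [('Tata_Mcgraw', 300), ('Prentice_Hall', 700), ('Pearson', 500)]
```

Title T3 (C) does not appear in the totals because it has no rows in Publisher_Title, and an inner join keeps only matching rows. SQLite records the REFERENCES clauses but only enforces them when foreign key support is switched on.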

4.3.1 Properties of Relational Tables


Values are atomic.
Each row is unique.
Column values are of the same kind.
The sequence of columns is insignificant.
The sequence of rows is insignificant.
Each column has a unique name.

Certain fields may be designated as keys, which means that searches for specific values of those fields can use indexing to speed them up. Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields. Often, but not always, the fields have the same name in both tables. For example, an orders table might contain (customer-ID, product-code) pairs and a products table might contain (product-code, price) pairs; to calculate a given customer's bill, you would sum the prices of all products ordered by that customer by joining the two tables on their product-code fields. This can be extended to joining multiple tables on multiple fields. Because these relationships are specified only at retrieval time, relational databases are classed as dynamic database management systems. The relational database model is based on relational algebra.
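The customer-bill example can be sketched without any DBMS to show what a join on product-code does. The sample rows below are hypothetical (the text gives only the field names), and the products relation is indexed in a dictionary, which is essentially a hash join; a real DBMS may choose a different join method.

```python
# Hypothetical sample rows: the text names the fields but gives no data.
orders = [            # (customer_id, product_code)
    ("C1", "P10"),
    ("C1", "P20"),
    ("C2", "P10"),
    ("C1", "P30"),
]
products = [          # (product_code, price)
    ("P10", 250),
    ("P20", 100),
    ("P30", 75),
]


def customer_bill(customer_id, orders, products):
    """Join orders and products on product code, then sum the matching prices."""
    # Index the products relation by its join field so each match is a direct lookup.
    price_of = {code: price for code, price in products}
    total = 0
    for cust, code in orders:
        # Join condition: equal product codes; selection: this customer only.
        if cust == customer_id and code in price_of:
            total += price_of[code]
    return total


print(customer_bill("C1", orders, products))  # 425 (250 + 100 + 75)
print(customer_bill("C2", orders, products))  # 250
```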

4.3.2 Relational Rules


There are three major rules that apply to the tables in a relational database.
Rule #1: Each table in the relational database contains only one type of record.
Rule #2: Every record in the table has the same number of fields, and field names are unique. No table can contain a variable number of fields or a set of repeating fields.
Rule #3: Each table has a unique identifier, which acts as the primary key. The primary key identifies each row in the table uniquely, and these primary key fields are used to link related data in different tables.
The relational model was proposed in 1970 by E.F. Codd (IBM), and the first commercial system appeared in 1981-82. It is now implemented in several commercial products (ORACLE, SYBASE, INFORMIX, CA-INGRES).
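Rules #2 and #3 (together with the "values are atomic" property) are structural, so they can be checked mechanically. A minimal sketch, assuming a table is given as a list of field names plus a list of tuples and the primary key as a list of field names; the function name check_relational_rules is hypothetical. Rule #1 (one record type per table) is about meaning, not structure, so it is not checked.

```python
def check_relational_rules(columns, rows, key):
    """Return a list of rule violations found in one table (empty list: none found).

    columns: list of field names; rows: list of tuples; key: field names of the primary key.
    """
    problems = []
    # Rule 2: field names must be unique within the table.
    if len(set(columns)) != len(columns):
        problems.append("duplicate field names")

    checkable = []  # rows well-formed enough to test the key rule
    for i, row in enumerate(rows):
        ok = True
        # Rule 2: every record has the same number of fields.
        if len(row) != len(columns):
            problems.append(f"row {i} has {len(row)} fields, expected {len(columns)}")
            ok = False
        # No repeating groups: each value must be a single (atomic) value.
        for value in row:
            if isinstance(value, (list, tuple, set, dict)):
                problems.append(f"row {i} has a non-atomic value {value!r}")
                ok = False
        if ok:
            checkable.append((i, row))

    # Rule 3: the primary key value must identify each row uniquely.
    positions = [columns.index(k) for k in key]
    seen = set()
    for i, row in checkable:
        key_value = tuple(row[p] for p in positions)
        if key_value in seen:
            problems.append(f"row {i} repeats primary key {key_value}")
        seen.add(key_value)
    return problems


publisher_columns = ["Pub_ID", "Pub_name", "Pub_city"]
publisher_rows = [(1000, "Tata_Mcgraw", "Bangalore"), (1500, "Prentice_Hall", "New Delhi"),
                  (2000, "Pearson", "Mumbai"), (2500, "Mcgraw-Hill", "Chennai")]
print(check_relational_rules(publisher_columns, publisher_rows, ["Pub_ID"]))  # []

bad_rows = [(1000, "A", "X"), (1000, "B", "Y"), (2000, "C"), (3000, "D", ["Mysore", "Pune"])]
print(check_relational_rules(publisher_columns, bad_rows, ["Pub_ID"]))
```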

4.3.3 Advantages and Disadvantages of Relational Model


The relational database is a single data repository in which data independence is maintained. The relational model has advantages and disadvantages.

Advantages

Structural independence: The relational database model achieves structural independence, which means it is possible to make changes in the database structure without affecting the DBMS's ability to access the data. Therefore, data access paths are irrelevant to relational database designers, programmers and end users.
Improved conceptual simplicity: The relational database model is much simpler at the conceptual level compared to the hierarchical and network data models. We can concentrate only on the logical view of the data, ignoring the physical data storage.
Easier database design, implementation, management and use: It is easier to design and manage a relational database because it achieves both data independence and structural independence.
Ad hoc query capability: SQL can be used as a query language to obtain the required information from the database by writing queries.
A powerful database management system: In a relational database, the system complexity is hidden from both the database designers and the end users.

Disadvantages

Substantial hardware and system software overhead: More powerful computers are required to perform RDBMS-assigned tasks, and substantial software is also needed, both in the form of the operating system and for applications.
Poor design and implementation is made easy: If a database is designed without giving much thought to what it should contain, the result is an improper design. Lack of proper design tends to slow the system down and to produce data anomalies.

4.4 NETWORK MODEL


The popularity of the network data model coincided with the popularity of the hierarchical data model. Some data were more naturally modeled with more than one parent per child, so the network model permitted the modeling of many-to-many relationships in data. In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network model. The basic data-modeling construct in the network model is the set construct. A set consists of an owner record type, a set name, and a member record type. A member record type can have that role in more than one set; hence the multiparent concept is supported. An owner record type can also be a member or owner in another set. The data model is a simple network, and link and intersection record types (called junction records by IDMS) may exist, as well as sets between them. Thus, the complete network of relationships is represented by several pairwise sets; in each set one record type is the owner (at the tail of the network arrow) and one or more record types are members (at the head of the relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. The CODASYL network model is based on mathematical set theory. The first network system was implemented by Honeywell in 1964-65 (the IDS system). The model was adopted heavily due to the support of CODASYL (the CODASYL DBTG report of 1971) and was later implemented in a large variety of systems: IDMS (Cullinet, now CA), DMS 1100 (Unisys), IMAGE (H.P.) and VAX-DBMS (Digital).

4.4.1 Basic structure

In many respects the network database model resembles the hierarchical database model. For example, as in the hierarchical model, the user perceives the network database as a collection of records in 1:M relationships. But unlike the hierarchical data model, the network model allows a record to have more than one parent. In network database terminology, a relationship is called a set. Each set is composed of at least two record types: an owner record that is equivalent to the hierarchical model's parent, and a member record that is equivalent to the hierarchical model's child. A set represents a 1:M relationship between the owner and the member. In the relational model, the data and the relationships among data are represented by a collection of tables. The network model differs from the relational model in that data are represented by collections of records, and relationships among data are represented by links.

4.4.2 Example for Network model


Consider a database representing a customer-account relationship in a banking system. There are two record types, customer and account. If we define the customer and account record types using Pascal-like notation, the customer record type is as follows:

type customer = record
    customer-name: string;
    customer-street: string;
    customer-city: string;
end;

The account record type can be defined as follows:

type account = record
    account-number: string;
    balance: integer;
end;

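The data structure diagram for this is:

[Figure: data structure diagram in which the customer record type (customer-name, customer-street, customer-city) is linked to the account record type (account-number, balance) through the depositor link.]

The sample database in the network model is:

[Figure: sample network database.]
Customer records:
Rajan   MG road   Mysore
Raman   MG road   Mysore
Raghu   IG road   Mysore
Account records: three accounts with balances 800, 1000 and 2000.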
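One way to picture the depositor set is to hold the links as object references: each customer (owner) keeps a list of its account records (members), and each account points back to its owner. This is only an illustrative sketch, not how a CODASYL system stores pointers on disk; the account numbers and which customer owns which account are hypothetical, because the original figure's values are not recoverable.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Account:
    account_number: str
    balance: int
    # Link to the owner record in the depositor set (kept out of repr/eq to avoid cycles).
    owner: Optional["Customer"] = field(default=None, repr=False, compare=False)


@dataclass
class Customer:
    customer_name: str
    customer_street: str
    customer_city: str
    # Member records of this customer's occurrence of the depositor set.
    accounts: List[Account] = field(default_factory=list, repr=False, compare=False)


def connect(owner: Customer, member: Account) -> None:
    """Insert a member record into the owner's occurrence of the depositor set."""
    # Within one set type, a member record belongs to at most one owner occurrence.
    if member.owner is not None:
        raise ValueError(f"{member.account_number} already belongs to {member.owner.customer_name}")
    owner.accounts.append(member)
    member.owner = owner


def balances_of(customer: Customer) -> List[int]:
    """Navigate from the owner to its members by following the set links."""
    return [acc.balance for acc in customer.accounts]


# Customer records from the sample database.
rajan = Customer("Rajan", "MG road", "Mysore")
raman = Customer("Raman", "MG road", "Mysore")
raghu = Customer("Raghu", "IG road", "Mysore")

# Hypothetical account numbers and links, for illustration only.
a1, a2, a3 = Account("A-1", 800), Account("A-2", 1000), Account("A-3", 2000)
connect(rajan, a1)
connect(raman, a2)
connect(raman, a3)

print(balances_of(raman))       # [1000, 2000]
print(a1.owner.customer_name)   # Rajan
```

Navigation runs in both directions (owner to members, member to owner), one record at a time, which is the record-at-a-time access the disadvantages below refer to.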

4.4.3 Advantages and Disadvantages of Network Model


Advantages

Conceptual simplicity: The conceptual view of the database is simple, which makes the design simple.
Handles more relationship types: M:N relationships are easier to implement in this model than in the hierarchical database model.
Data access flexibility: Data access is more flexible compared to the hierarchical database; an application can access an owner record and all the member records within a set.
Promotes database integrity: The user must define the owner record type and then the member record type; this enforces database integrity.
Data independence: Application programs can be isolated from complex physical storage details, which achieves data independence.
Conformance to standards: Standards in the form of DDL and DML can be imposed on this data model. This greatly facilitates database administration and portability.

Disadvantages

Not user-friendly: The network model was not designed as a user-friendly system; it is a highly skill-oriented system.
System complexity: Data are accessed one record at a time, which leads to system complexity.
Lack of structural independence: Even though the network data model achieves data independence, it does not achieve structural independence. When the structure of the database changes, all application programs that access the data must be revalidated before they can access the database.
Operational anomalies: Since the network model uses pointers for navigation, its implementation becomes quite complex.

4.5 Hierarchical model


The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. Data are stored in a series of records, each of which has a set of field values attached to it. The model collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses parent-child relationships. These are 1:N mappings between record types, implemented using trees, a structure borrowed from mathematics just as the relational model borrows set theory. For example, an organization might store information about an employee, such as name, employee number, department and salary. The organization might also store information about an employee's children, such as name and date of birth. The employee and children data form a hierarchy, where the employee data represents the parent segment and the children data represents the child segments. If an employee has three children, then there would be three child segments associated with one employee segment. In a hierarchical database the parent-child relationship is one-to-many, which restricts a child segment to having only one parent segment. Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM's Information Management System (IMS) DBMS, through the 1970s. The hierarchical data model was implemented in a joint effort by IBM and North American Rockwell around 1965, resulting in the IMS family of systems, and it became the most popular model. Another system based on this model is System 2K (SAS Inc.).
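The employee/children hierarchy can be sketched as parent and child segments held in a tree. The Segment class and all field values are hypothetical; the point is that each child segment has exactly one parent and is reached only through that parent.

```python
class Segment:
    """One record segment in a hierarchical (tree-structured) database."""

    def __init__(self, segment_type, **fields):
        self.segment_type = segment_type
        self.fields = fields
        self.parent = None    # at most one parent segment
        self.children = []    # zero or more child segments (1:N)

    def add_child(self, child):
        """Attach a child segment; a child may have only one parent."""
        if child.parent is not None:
            raise ValueError("a child segment can have only one parent segment")
        child.parent = self
        self.children.append(child)
        return child


# Hypothetical employee (parent segment) with three children (child segments).
employee = Segment("EMPLOYEE", name="Anil", emp_no=101, department="Sales", salary=30000)
for child_name, dob in [("Asha", "2010-05-01"), ("Ravi", "2012-08-15"), ("Meena", "2015-01-20")]:
    employee.add_child(Segment("CHILD", name=child_name, date_of_birth=dob))

# Child segments are reached only through their parent segment.
names = [c.fields["name"] for c in employee.children]
print(names)  # ['Asha', 'Ravi', 'Meena']
```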

4.5.1 Basic structure


In the network model, the data are represented by collections of records, and relationships between data are represented by links. This structure holds for the hierarchical model as well. The only difference is that, in the hierarchical model, records are organized as collections of trees, rather than arbitrary graphs.


Hierarchical data model is a model comprising records stored in a general tree structure with one root record type that has zero or more dependent record types.


In a general tree structure, there are two possible methods of accessing all the nodes (record types) within the tree:
Pre-order traversal: access the root first and then proceed down the tree, accessing the subtrees in order from left to right.
Post-order traversal: start access at the bottom and proceed upwards, accessing the subtrees in order from left to right and finishing with the root.
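The two traversal orders can be sketched on a small tree of record types. The DEPARTMENT/EMPLOYEE/PROJECT/CHILD tree is a hypothetical example, with each record type's children listed from left to right.

```python
# Hypothetical tree of record types: each key lists its child record types, left to right.
tree = {
    "DEPARTMENT": ["EMPLOYEE", "PROJECT"],
    "EMPLOYEE": ["CHILD"],
}


def pre_order(node, children):
    """Visit the root first, then each subtree from left to right."""
    order = [node]
    for child in children.get(node, []):
        order.extend(pre_order(child, children))
    return order


def post_order(node, children):
    """Visit each subtree from left to right, finishing with the root."""
    order = []
    for child in children.get(node, []):
        order.extend(post_order(child, children))
    order.append(node)
    return order


print(pre_order("DEPARTMENT", tree))   # ['DEPARTMENT', 'EMPLOYEE', 'CHILD', 'PROJECT']
print(post_order("DEPARTMENT", tree))  # ['CHILD', 'EMPLOYEE', 'PROJECT', 'DEPARTMENT']
```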

4.5.2 Advantages and Disadvantages of Hierarchical Model


Advantages

Conceptual simplicity: The hierarchical model makes it easy to view the database conceptually, which makes the design process simpler. The relationship between the various layers is logically simple.
Database security: Database security is provided and enforced by the DBMS, uniformly throughout the system.
Data independence: When a change in data type takes place, it is automatically cascaded throughout the database by the DBMS. This eliminates the need to change the program segments that reference the changed data type.
Database integrity: The hierarchical database promotes database integrity because a child segment is always linked to its parent. Given the parent/child relationship, there is always a link between the parent segment and its child segment(s).
Efficiency: The hierarchical model is very efficient when a database contains a large volume of data in 1:M relationships.

Disadvantages

Complex implementation: The database designer and programmer must have detailed knowledge of the physical data storage characteristics. Therefore, implementing a database design may be very complicated.
Difficult to manage: Any change in the database structure, such as relocation of segments, requires changes to all the application programs that access the database.
Lacks structural independence: Structural independence means that a change to the database structure does not affect the DBMS's ability to access the data. The hierarchical model does not support this: whenever the database structure changes, the corresponding application programs are affected.
Application programming and use complexity: Programmers must understand how the data is stored as well as how it is accessed, which restricts their choices and makes programming harder.
Implementation limitations: Even though 1:1 and 1:N relationships can be easily implemented, it is difficult to implement M:N relationships in this model.
Lack of standards: There is no precise set of implementation standards that everyone follows. As a result, portability was limited, and it was difficult to move from one hierarchical DBMS to another.

Comparison of the three models in tabular form:

1. Hierarchical data model: Relationship between records is of the parent-child type (trees).
   Network data model: Relationship between records is expressed in the form of pointers or links (graphs).
   Relational data model: Relationship between records is represented by a relation that contains a key for each record involved in the relationship.

2. Hierarchical data model: Many-to-many relationships cannot be expressed in this model.
   Network data model: Many-to-many relationships can also be implemented.
   Relational data model: Many-to-many relationships can be easily implemented.

3. Hierarchical data model: It is a simple, straightforward and natural method of implementing record relationships.
   Network data model: Record relationship implementation is very complex due to the use of pointers.
   Relational data model: Relationship implementation is very easy through the use of a key or composite key field(s).

4. Hierarchical data model: This type of model is useful only when there is some hierarchical character in the database.
   Network data model: The network model is useful for representing records which have many-to-many relationships.
   Relational data model: The relational model is useful for representing most real-world objects and the relationships among them.

5. Hierarchical data model: Pointers are used to represent links among records, so relations among records are physical.
   Network data model: In the network model also, the record relations are physical.
   Relational data model: The relational model does not maintain physical connections among records. Data is organized logically in the form of rows and columns and stored in tables.

6. Hierarchical data model: Searching for a record is very difficult, since one can retrieve a child only after going through its parent record.
   Network data model: Searching for a record is easy, since there are multiple access paths to a data element.
   Relational data model: A unique indexed key field is used to search for a data element.

4.6 OTHER IMPORTANT MODELS

In addition to the above three models, there are two other important models:

Object/Relational Model
Object-oriented Model

4.6.1 Object/Relational Model


Object/relational database management systems (ORDBMSs) add new object storage capabilities to the relational systems at the core of modern information systems. These new facilities integrate management of traditional fielded data, complex objects such as time-series and geospatial data, and diverse binary media such as audio, video, images and applets. By encapsulating methods with data structures, an ORDBMS server can execute complex analytical and data manipulation operations to search and transform multimedia and other complex objects. As an evolutionary technology, the object/relational (OR) approach has inherited the robust transaction- and performance-management features of its relational ancestor and the flexibility of its object-oriented cousin. Database designers can work with familiar tabular structures and data definition languages (DDLs) while assimilating new object-management possibilities. Query and procedural languages and call interfaces in ORDBMSs are familiar: SQL3, vendor procedural languages, and ODBC, JDBC and proprietary call interfaces are all extensions of RDBMS languages and interfaces. The leading vendors are well known: IBM, Informix and Oracle.

4.6.2 Object-oriented Model


Object DBMSs add database functionality to object programming languages. They bring much more than persistent storage of programming language objects. Object DBMSs extend the semantics of the C++, Smalltalk and Java object programming languages to provide full-featured database programming capability, while retaining native language compatibility. A major benefit of this approach is the unification of application and database development into a seamless data model and language environment. As a result, applications require less code, use more natural data modeling, and have code bases that are easier to maintain. Object developers can write complete database applications with a modest amount of additional effort. In contrast to a relational DBMS, where a complex data structure must be flattened out to fit into tables or joined together from those tables to form the in-memory structure, object DBMSs have no such mapping overhead when storing or retrieving a web or hierarchy of interrelated objects. This one-to-one mapping of programming language objects to database objects has two benefits over other storage approaches: it provides higher-performance management of objects, and it enables better management of the complex interrelationships between objects. This makes object DBMSs better suited to support applications such as financial portfolio risk analysis systems, telecommunications service applications, World Wide Web document structures, design and manufacturing systems, and hospital patient record systems, which have complex relationships between data.

4.7 SUMMARY
In this chapter we studied three important data models used to describe the structure of a database. A data model is a set of concepts used to describe the structure of a database and certain constraints that the database should obey; data model operations specify database retrievals and updates by referring to the concepts of the data model. There are three important categories of data models: conceptual (high-level, semantic) data models, physical (low-level, internal) data models, and implementation (record-oriented) data models. The data models we studied belong to the third category, and they are:

Relational data model
Network data model
Hierarchical data model

A relational database is based on the relational model developed by E.F. Codd. A relational database allows the definition of data structures, storage and retrieval operations, and integrity constraints. In such a database the data and the relations between them are organized in tables. Each table has a unique name. The network model differs from the relational model in that data are represented by collections of records, and relationships among data are represented by links. In the hierarchical model, records are organized as collections of trees rather than arbitrary graphs. We also discussed examples of each of these data models, along with the advantages and disadvantages associated with them. In addition, we discussed two other important data models: the object/relational model and the object-oriented model.

Check Your Progress


1. A data model is a set of concepts to describe the _________ of a database.
2. The relational model was developed by __________.
3. In a relational database the data and the relations between them are organized in ________.
4. Each table in a relational database contains only one type of _________.
5. A relationship in the network model is called a __________.
6. A set represents a _______ relationship between the owner and the member.
7. In the network model, relationships among data are represented by ________.
8. The hierarchical data model organizes data in a __________.

Answers
1. Structure
2. E.F. Codd
3. Tables
4. Record
5. Set
6. 1:M
7. Links
8. Tree structure

Answer the following questions


1. What is a data model? Explain.
2. Explain the three different categories of data models.
3. Explain the relational data model, giving an example.
4. List the properties of relational tables.
5. List and explain the rules of the relational model.
6. Explain the advantages and disadvantages of the relational model.
7. Explain the network data model with an example.
8. Explain the advantages and disadvantages of the network data model.
9. Explain the hierarchical data model with an example.
10. Explain the advantages and disadvantages of the hierarchical data model.
11. Give a comparison of the three data models.

Chapter 5

Database and File Organization

5.0 OBJECTIVES

In this chapter, you will learn about:

Hashing
Hashing techniques
Internal hashing
External hashing
Dynamic expansion using hashing techniques

5.1 INTRODUCTION
In chapter 2, we emphasized the fundamentals of records, file storage and structure. In chapter 4, we viewed the database, in the relational model, as a collection of tables. The logical model of the database is the correct level for database users to focus on. In chapter 2 we also described various methods of storing and organizing data, such as sequential and index-sequential file organization. In this chapter we describe hash file organization. One disadvantage of sequential file organization is that we must access an index structure to locate data, or at best use binary search, and that results in more input-output operations. File organization based on hashing allows us to avoid accessing an index structure. Hashing also provides a way of constructing indices. We study file organization and indices based on hashing in the following sections.

5.2 HASHING TECHNIQUES

Chapter 5 - Database and File Organization

Hash file organization provides very fast access to records under certain search conditions. The search condition must be an equality condition on a single field, called the hash field of the file. In many cases, the hash field is also a key field of the file, in which case it is called the hash key. The idea of hashing is to provide a function h, called a hash function, that is applied to the hash field value of a record and yields the address of the disk block in which the record is stored. A search for the record within that block can then be carried out in a main memory buffer. For most records, we need only a single block access to retrieve the record. Hashing is also used as an internal search structure within a program whenever a group of records is accessed exclusively by using the value of one field. We first describe the use of hashing for internal files and then show how it is modified to store external files on disk. We also discuss techniques for extending hashing to dynamically growing files.

5.3 INTERNAL HASHING


Internal hashing is implemented as a hash table through the use of an array of records. Suppose that the array index ranges from 0 to M-1. Figure 5.1 shows the array of M positions used in internal hashing; it consists of M slots whose addresses correspond to the array indexes. We choose a hash function that transforms the hash field value into an integer between 0 and M-1. One common hash function is h(K) = K mod M, which computes the remainder of an integer hash field value K after division by M; this value is then used as the record address.
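A minimal sketch of internal hashing with h(K) = K mod M, assuming a table of M = 7 slots and integer hash field values. In this sketch a collision is only detected, not resolved; the function names are illustrative.

```python
def h(key, m):
    """Hash function h(K) = K mod M: maps an integer hash field value to a slot 0..M-1."""
    return key % m


def insert(table, key):
    """Store key at its hash address; return the slot, or None if the slot is taken (a collision)."""
    slot = h(key, len(table))
    if table[slot] is not None:
        return None
    table[slot] = key
    return slot


M = 7                      # assumed table size for illustration
table = [None] * M         # array of M positions, indexed 0..M-1
print(insert(table, 15))   # 1  (15 mod 7)
print(insert(table, 40))   # 5  (40 mod 7)
print(insert(table, 22))   # None: 22 mod 7 is also 1, so this is a collision
```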

Figure 5.1 : Array of M positions for internal hashing.


A non-integer hash field value can be transformed into an integer before the mod function is applied. For character strings, the numeric (ASCII) codes associated with the characters can be used in the transformation. Other hashing functions can also be used. One technique, called folding, involves applying an arithmetic function such as addition, or a logical function such as exclusive or, to different portions of the hash field value to calculate the hash address. Another technique involves picking some digits of the hash field value; for example, the third, fifth and eighth digits form the hash address. The problem with most hashing functions is that they do not guarantee that distinct values will hash to distinct addresses, because the hash field space (the number of possible values a hash field can take) is usually much larger than the address space (the number of available addresses for records). The hashing function maps the hash field space to the address space. A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. In this situation we must insert the new record in some other position, since its hash address is occupied. The process of finding another position is called collision resolution. There are several methods of collision resolution, including the following:
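The transformations described above can be sketched as small functions. The base-256 string conversion, the three-digit part length for folding and the address space of 1000 are assumptions chosen for illustration; digit positions are counted from the left, starting at 1.

```python
import operator
from functools import reduce


def string_to_int(value):
    """Convert a string to an integer from its character codes (assumed base-256 scheme)."""
    total = 0
    for ch in value:
        total = total * 256 + ord(ch)
    return total


def fold(key, part_len=3, combine=operator.add, m=1000):
    """Folding: split the key's digits into parts and combine them (addition or XOR)."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return reduce(combine, parts) % m


def pick_digits(key, positions=(3, 5, 8)):
    """Digit picking: use selected digits (1-based, from the left) as the hash address."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))


print(string_to_int("AB"))                    # 16706 (65 * 256 + 66)
print(string_to_int("AB") % 7)                # 4, usable with h(K) = K mod 7
print(fold(123456789))                        # 368 (123 + 456 + 789 = 1368)
print(fold(123456789, combine=operator.xor))  # 678 (123 XOR 456 XOR 789)
print(pick_digits(123456789))                 # 358
```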

Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused position is found.

Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions, and a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location. A linked list of overflow records is thus maintained for each hash address, as shown in Figure 5.2.

Multiple hashing: The program applies a second hash function if the first function results in a collision. If another collision results, the program uses open addressing, or applies a third hash function and then uses open addressing if necessary.

Each collision resolution method requires its own algorithms for the insertion, retrieval and deletion of records. The algorithms for chaining are the simplest, whereas deletion algorithms for open addressing are rather tricky. The goal of a good hashing function is to distribute the records uniformly over the address space, so as to minimize collisions while not leaving many unused locations.
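Open addressing can be sketched in Python as follows. This minimal sketch assumes integer keys, the mod hash function, and linear probing, which checks positions p, p+1, p+2, ... and wraps around to 0. The class and method names are hypothetical.

```python
from typing import Optional

class OpenAddressingTable:
    """Hash table of M slots that resolves collisions by open addressing (linear probing)."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.slots = [None] * m  # None marks an unused position

    def insert(self, key: int) -> int:
        """Store key and return the slot used; raise OverflowError if every slot is taken."""
        start = key % self.m
        for step in range(self.m):
            pos = (start + step) % self.m  # check subsequent positions in order, wrapping around
            if self.slots[pos] is None:
                self.slots[pos] = key
                return pos
        raise OverflowError("no unused position left")

    def search(self, key: int) -> Optional[int]:
        """Return the slot holding key, or None if the key is absent."""
        start = key % self.m
        for step in range(self.m):
            pos = (start + step) % self.m
            if self.slots[pos] is None:
                return None  # an unused slot ends the probe sequence
            if self.slots[pos] == key:
                return pos
        return None
```

For example, with M = 7, inserting 15, 22 and 8 places them in slots 1, 2 and 3, because all three keys hash to 1. The search loop also shows why deletion is tricky: simply emptying slot 2 would make a later search for 8 stop at that empty slot and wrongly report that 8 is absent, so practical implementations usually mark deleted slots specially instead of emptying them.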


Figure 5.2 : Collision resolution by chaining records


In Figure 5.2, each overflow pointer refers to the position of the next record in the linked list; a null pointer is represented by -1.
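Chaining with an overflow area, as in Figure 5.2, can be sketched with two parallel arrays: one holding the keys and one holding the overflow pointers, with -1 as the null pointer. This is a minimal sketch assuming integer keys and the mod hash function; the array sizes and names are illustrative. Each new overflow record is linked directly after its home position, which matches setting the pointer of the occupied hash address location to the new overflow location.

```python
from typing import Optional

NULL = -1  # null pointer, as in Figure 5.2

class ChainedTable:
    """M hash-address locations plus an overflow area; each location has a key and a pointer."""

    def __init__(self, m: int, overflow: int) -> None:
        self.m = m
        # Locations 0..m-1 are hash addresses; m..m+overflow-1 are overflow positions.
        self.keys = [None] * (m + overflow)
        self.next = [NULL] * (m + overflow)
        self.free = m  # next unused overflow location

    def insert(self, key: int) -> int:
        """Store key and return the location used."""
        pos = key % self.m
        if self.keys[pos] is None:
            self.keys[pos] = key
            return pos
        if self.free >= len(self.keys):
            raise OverflowError("overflow area is full")
        new = self.free
        self.free += 1
        self.keys[new] = key
        # The occupied home location now points to the new overflow record,
        # which in turn points to the rest of the existing chain.
        self.next[new] = self.next[pos]
        self.next[pos] = new
        return new

    def search(self, key: int) -> Optional[int]:
        """Follow the chain from the hash address; return the location or None."""
        pos = key % self.m
        while pos != NULL and self.keys[pos] is not None:
            if self.keys[pos] == key:
                return pos
            pos = self.next[pos]
        return None
```

With M = 4 and three overflow positions, inserting 5, 9 and 13 (all hashing to 1) stores 5 at location 1 and places 9 and 13 in overflow locations 4 and 5, giving the chain 1 -> 5 -> 4 -> -1.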

5.4 EXTERNAL HASHING


Hashing for disk files is called external hashing. To suit the characteristics of disk storage, the target address space is made of buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks. The hashing function maps a key into a relative bucket number, rather than assigning an absolute block address to the bucket. A table maintained in the file header converts the bucket number into the corresponding disk block address, as illustrated in Figure 5.3. The collision problem is less severe with buckets, because as many records as will fit in a bucket can hash to the same bucket without causing problems. However, we must make provisions for the case where a bucket is filled to capacity and a new record being inserted hashes to that bucket. We can use a variation of chaining in which a pointer is maintained in each bucket to a linked list of overflow records for that bucket.


Figure 5.3 Matching Bucket number to disk block address
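External hashing can be simulated in Python with a file-header table that converts a relative bucket number into a disk block address, as in Figure 5.3. The block addresses, the bucket capacity of two records, and the use of a Python list for each bucket's overflow chain (in place of linked overflow blocks) are assumptions made only to keep the example small.

```python
BUCKET_CAPACITY = 2  # records per bucket (assumed; a real bucket holds a block's worth)

class ExternalHashFile:
    """Hashed disk file simulated in memory: key -> bucket number -> block address."""

    def __init__(self, block_addresses) -> None:
        # File-header table: relative bucket number -> disk block address.
        self.header = list(block_addresses)
        self.m = len(self.header)
        # Simulated disk: block address -> records stored in that bucket.
        self.disk = {addr: [] for addr in self.header}
        # Overflow records per bucket (stands in for a linked list of overflow records).
        self.overflow = {addr: [] for addr in self.header}

    def block_for(self, key: int) -> int:
        bucket = key % self.m       # hash to a relative bucket number
        return self.header[bucket]  # the header table converts it to a block address

    def insert(self, key: int) -> None:
        addr = self.block_for(key)
        if len(self.disk[addr]) < BUCKET_CAPACITY:
            self.disk[addr].append(key)      # collisions are harmless while the bucket has room
        else:
            self.overflow[addr].append(key)  # bucket full: chain the record into overflow

    def search(self, key: int) -> bool:
        addr = self.block_for(key)
        return key in self.disk[addr] or key in self.overflow[addr]
```

With three buckets at hypothetical block addresses 1200, 1350 and 1420, the keys 3, 6 and 9 all hash to bucket 0; the first two fit in block 1200 and the third goes to that bucket's overflow list.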

The pointers in the linked list should be record pointers, which include both a block address and a relative record position within the block. Hashing provides the fastest possible access for retrieving an arbitrary record given the value of its hash field. Although most good hashing functions do not maintain records in order of hash field values, some functions, called order preserving, do. A simple example of an order-preserving hash function is to take the leftmost three digits of an invoice number field as the hash address and keep the records sorted by invoice number within each bucket.

The hashing scheme described so far is called static hashing, because a fixed number of buckets M is allocated. This can be a serious drawback for dynamic files. Suppose that we allocate M buckets for the address space and let m be the maximum number of records that can fit in one bucket; then at most (m*M) records will fit in the allocated space. If the number of records is much less than (m*M), a lot of space is left unused. On the other hand, if the number of records increases to substantially more than (m*M), numerous collisions will occur and retrieval will slow down because of the long lists of overflow records. In either case, we may have to change the number of buckets M allocated and then use a new hashing function to redistribute the records. This reorganization can be quite time consuming for large files. Newer dynamic file organizations based on hashing allow the number of buckets to vary dynamically with only localized reorganization.

When using external hashing, searching for a record given a value of some field other than the hash field is as expensive as in the case of an unordered file. Record deletion can be implemented by removing the record from its bucket. If the bucket has an overflow chain, one of the overflow records can be moved into the bucket to replace the deleted record. If the record to be deleted is already in overflow, it is simply removed from the linked list. Removing an overflow record means that we must keep track of empty overflow positions; this is done easily by maintaining a linked list of unused overflow locations.
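The order-preserving hash function mentioned above (the leftmost three digits of an invoice number) can be sketched in Python as follows. The sketch assumes that invoice numbers are fixed-length strings of digits, so that string order equals numeric order; the sample invoice numbers are made up. Because the hash preserves order and each bucket is kept sorted, reading the buckets in address order yields the records in invoice-number order.

```python
from bisect import insort
from collections import defaultdict

def invoice_hash(invoice_no: str) -> int:
    """Order-preserving hash: the leftmost three digits of the invoice number."""
    return int(invoice_no[:3])

buckets = defaultdict(list)  # hash address -> invoice numbers in that bucket
for inv in ["2051234", "1009876", "2050001", "3141592", "1001111"]:
    insort(buckets[invoice_hash(inv)], inv)  # keep each bucket sorted by invoice number

# Visiting the buckets in address order returns all invoices in ascending order.
in_order = [inv for addr in sorted(buckets) for inv in buckets[addr]]
```

A real file would have a fixed range of bucket addresses rather than a dictionary that only contains non-empty buckets, but the ordering argument is the same.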


Modifying a record's field value depends on two factors: (1) the search condition used to locate the record and (2) the field to be modified. If the search condition is an equality comparison on the hash field, we can locate the record efficiently by using the hashing function; otherwise, we must do a linear search. A non-hash field can be modified by changing the record and rewriting it in the same bucket. Modifying the hash field means that the record may move to another bucket, which requires deletion of the old record followed by insertion of the modified record.
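The two cases can be sketched in Python as follows. The sketch assumes a small in-memory file of M = 3 buckets hashed on a hypothetical EMP_NO field. Changing a non-hash field rewrites the record in place, while changing the hash field deletes the record from its old bucket and re-inserts it at its new address.

```python
M = 3  # number of buckets (illustrative)

def bucket_of(emp_no: int) -> int:
    return emp_no % M

buckets = [[] for _ in range(M)]  # each bucket holds records as dicts

def insert(record: dict) -> None:
    buckets[bucket_of(record["EMP_NO"])].append(record)

def find(emp_no: int) -> dict:
    # Equality search on the hash field: only one bucket has to be examined.
    for rec in buckets[bucket_of(emp_no)]:
        if rec["EMP_NO"] == emp_no:
            return rec
    raise KeyError(emp_no)

def modify(emp_no: int, field: str, value) -> None:
    rec = find(emp_no)
    if field != "EMP_NO":
        rec[field] = value  # non-hash field: rewrite the record in the same bucket
    else:
        buckets[bucket_of(emp_no)].remove(rec)  # hash field: delete the old record...
        rec[field] = value
        insert(rec)                             # ...then insert it at its new address
```

For instance, a record with EMP_NO 10 lives in bucket 1; changing its name leaves it there, but changing EMP_NO to 11 moves it to bucket 2.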

5.5 DYNAMIC FILE EXPANSION USING HASHING TECHNIQUES


A major drawback of the static hashing schemes just discussed is that the hash address space is fixed. Hence it is difficult to expand or shrink the file dynamically. Two schemes that are dynamic in nature are extendible hashing and linear hashing. Extendible hashing stores an access structure in addition to the file, and hence it is similar to indexing. The main difference is that the access structure is based on the values that result after applying the hash function to the search field. The second technique, called linear hashing, does not require an additional access structure. It allows a hash file to expand and shrink its number of buckets dynamically without needing a directory. Suppose that the file starts with M buckets numbered 0, 1, ..., M-1 and uses the mod hash function h(K) = K mod M; this hash function is called the initial hash function h_i. Overflow because of collisions is still possible and can be handled by maintaining individual overflow chains for each bucket. Other data structures can also be used for primary file organizations. For example, if both the record size and the number of records in a file are small, some DBMSs offer the option of a B-tree data structure as the primary file organization.
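The paragraph above describes linear hashing only up to the initial hash function. The Python sketch below fills in the usual splitting rule as an assumption, not as the text's own algorithm. Whenever an insertion causes an overflow in any bucket, the bucket at the split pointer n (starting at 0) is split, its records are redistributed with h_{i+1}(K) = K mod 2M, and n advances. Once all M original buckets have been split, every bucket uses h_{i+1} and a new round begins. The bucket capacity of 2 and the class name are illustrative, and each Python list stands for a bucket together with its overflow chain.

```python
class LinearHashFile:
    """Linear hashing sketch: buckets split one at a time, in order, with no directory."""

    def __init__(self, m: int, capacity: int = 2) -> None:
        self.m = m                # initial number of buckets M
        self.capacity = capacity  # records per bucket before it overflows (assumed)
        self.level = 0            # j: current function is K mod (2**j * M)
        self.split = 0            # n: next bucket to split
        self.buckets = [[] for _ in range(m)]  # bucket plus its overflow chain

    def _address(self, key: int) -> int:
        addr = key % (2 ** self.level * self.m)
        if addr < self.split:  # bucket already split in this round: use the next function
            addr = key % (2 ** (self.level + 1) * self.m)
        return addr

    def insert(self, key: int) -> None:
        bucket = self.buckets[self._address(key)]
        bucket.append(key)
        if len(bucket) > self.capacity:  # an overflow anywhere triggers one split
            self._split_next()

    def _split_next(self) -> None:
        old = self.buckets[self.split]
        modulus = 2 ** (self.level + 1) * self.m
        self.buckets[self.split] = [k for k in old if k % modulus == self.split]
        # Records that move go to the new bucket numbered split + 2**level * M.
        self.buckets.append([k for k in old if k % modulus != self.split])
        self.split += 1
        if self.split == 2 ** self.level * self.m:  # all buckets of this round split
            self.level += 1
            self.split = 0

    def search(self, key: int) -> bool:
        return key in self.buckets[self._address(key)]
```

Starting with M = 2, inserting 0, 2 and 4 overflows bucket 0, which is split into buckets 0 and 2 ([0, 4] and [2]). Inserting 1, 3 and 5 then overflows bucket 1, which splits into buckets 1 and 3, after which the file has four buckets and uses K mod 4 throughout.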

5.6 SUMMARY
We described various methods of storing and organizing data, such as sequential and index-sequential file organization, in chapter 2. One disadvantage of sequential file organization is that we must access an index structure to locate data, or use binary search, and that results in more input-output operations. File organization based on hashing allows us to avoid accessing an index structure. Hashing also provides a way of constructing indices. In this chapter we discussed database and file organization. We began by discussing hashing functions to organize the file. Hashing provides very fast access to any arbitrary record of a file, given the value of its key. It is implemented as a hash table through the use of an array of records. Many hashing functions can be used for this purpose. A collision occurs when the hash field value of a record being inserted hashes to an address that already contains a different record. There are several methods of collision resolution: open addressing, chaining and multiple hashing. Hashing for disk files is called external hashing. The most suitable method for external hashing is the bucket technique, with one or more contiguous blocks corresponding to each bucket. Collisions causing bucket overflow are handled by chaining. Access on a non-hash field is slow, and so is ordered access of the records on any field. We then discussed two hashing techniques for files that grow and shrink in the number of records dynamically, namely extendible and linear hashing.

Check Your Progress


1. Hash file organization provides very _______ access to records.
2. A search for the record within a block can be carried out in a ________ buffer.
3. Hashing is implemented as a _______ through the use of an array of records.
4. The hashing function maps the hash field space to the __________.
5. The process of finding another position when a collision occurs is called _______.
6. Hashing for disk files is called ____________.
7. Each collision resolution method requires its own __________ for insertion, retrieval and deletion of records.
8. The goal of a good hashing function is to distribute the records uniformly over the ________.
9. A major drawback of static hashing schemes is that the hash address space is ______.
10. ________ hashing does not require an additional access structure.

Answers
1. fast
2. main memory
3. hash table
4. address space
5. collision resolution
6. external hashing
7. algorithms
8. address space
9. fixed
10. linear

Answer the following questions
1. What is the difference between static files and dynamic files?
2. What is hashing? Describe different hashing functions with suitable examples.
3. Explain the hashing technique called folding with an example.
4. What is a collision? How can it be resolved?
5. What is meant by collision resolution? Explain.
6. Explain the methods available for collision resolution.
7. Briefly explain collision resolution in hashing.
8. What is a bucket? Explain.
9. What are the factors on which modifying a record field depends?
10. Discuss the techniques for allowing a hash file to expand and shrink dynamically. What are the advantages and disadvantages of each?

Chapter 6

Database Security

6.0 OBJECTIVES

In this chapter, you will learn:

Security
Security types
Issues of security

6.1 INTRODUCTION
This chapter covers the techniques used to protect a database, in part or as a whole, against persons who are not authorized to access it. Security issues and an overview of the topics are covered first. The methods used to grant and cancel privileges in relational database systems, referred to as discretionary access control, are then discussed, followed by mandatory access control, which is used to enforce multilevel security. Security problems of statistical databases are also presented.

6.2 SECURITY TYPES


Database security is a broad area that addresses many issues, including the following:

Legal and ethical issues regarding the right to access certain information. Some information may be private and cannot legally be accessed by unauthorized persons.

Policy issues at the governmental, institutional or corporate level regarding what kinds of information should not be made publicly available, for example, personal medical records and credit card ratings.

System-related issues, such as the system level at which various security functions should be enforced: for example, whether a security function should be handled at the physical hardware level, the operating system level or the DBMS level.

The need in some organizations to identify multiple security levels and to categorize the data and users based on these classifications, for example confidential, secret, top secret and unclassified. The security policy of the organization with respect to permitting access to the various classifications of data must be enforced.

The DBMS must provide techniques that enable certain users or groups of users to access selected portions of the database without gaining access to the rest of the database. This is most important when a large integrated database is used by many different users within the same organization. For example, sensitive information such as employee salaries or performance reviews should be kept confidential, so that only authorized users can access it. A DBMS typically includes a database security and authorization subsystem that is responsible for ensuring the security of portions of a database against unauthorized access. Two types of database security mechanisms are used:

Discretionary security mechanisms: These are used to grant privileges to users to read, write, delete and update specific data files, records or fields.

Mandatory security mechanisms: These are used to enforce multilevel security by classifying data and users into various security levels and implementing the appropriate security policy of the organization. For example, a typical security policy is to permit users at a certain classification level to see only the data items classified at the user's own (or a lower) classification level.

6.3 ISSUES OF SECURITY


a) A major security issue is preventing unauthorized persons from accessing the system itself, either to obtain information or to make illegal changes to a portion of the database. The security mechanism of a DBMS must include provisions for restricting access to the database as a whole. This function is called access control and is handled by creating user accounts and passwords to control the login process of the DBMS.


b) Another security problem associated with databases is controlling access to a statistical database, which is used to provide statistical information based on different criteria. For example, a database for population statistics may provide statistics based on age groups, income levels, household size, education levels and so on. Statistical database users, such as market research firms or government statisticians, are allowed to access the database to retrieve statistical information about a population, but not to access detailed confidential information about specific individuals. Security for statistical databases must ensure that information about individuals cannot be accessed. It is sometimes possible to deduce certain facts about individuals from queries that involve only summary statistics on groups; consequently, this must not be permitted either.

c) Data encryption is also an important security issue. It is used to protect sensitive data, such as credit card numbers, that is being transmitted via some type of communication network. Encryption can also be used to provide additional protection for sensitive portions of a database. The data is encoded using some coding algorithm. An unauthorized user who accesses encoded data will have difficulty deciphering it, but authorized users are given decoding or decryption algorithms (or keys) to decipher the data.

6.3.1 DBA and Security


The database administrator (DBA) is the central authority for managing a database system. The DBA is responsible for granting privileges to users and for classifying users and data in accordance with the policy of the organization. The DBA has an account in the DBMS, sometimes called a super user account, which provides powerful capabilities that are not made available to regular database accounts and users. DBA-privileged commands include commands for granting and cancelling privileges to individual accounts, users or user groups, and for performing the following types of actions:
1. Account creation: This action creates a new account and password for a user or a group of users to enable them to access the DBMS.
2. Privilege granting: This action permits the DBA to grant certain privileges to certain accounts.
3. Privilege revocation: This action permits the DBA to cancel certain privileges given to certain users.
4. Security level assignment: This action consists of assigning user accounts to the appropriate security classification levels.
The DBA is responsible for the overall security of the database system.

6.4 PROTECTION FOR DATABASE ACCESS

When a user needs to access a database system, he or she first applies for an account. The DBA then creates a new account number and password for the user if there is a legitimate need to access the database. The user must log in to the DBMS by entering the account number and password whenever database access is needed. The DBMS checks that the account number and password are valid; if they are, the user is permitted to use the DBMS and to access the database. Application programs can also be considered as users and can be required to supply passwords. It is straightforward to keep track of database users and their accounts and passwords by creating an encrypted table or file with two fields: account number and password. This table can easily be maintained by the DBMS. Whenever a new account is created, a new record is inserted into the table; when an account is cancelled, the corresponding record is deleted from the table. The database system must also keep track of all operations on the database that are applied by a certain user throughout each login session, which consists of the sequence of database interactions that a user performs from the time of logging in to the time of logging off. When a user logs in, the DBMS can record the user's account number and associate it with the terminal from which the user logged in. It is particularly important to keep track of the update operations applied to the database so that, if the database is tampered with, the DBA can find out which user did the tampering. To keep a record of all updates applied to the database and of the particular user who applied each update, the system log can be extended to record the account number with each operation. The system log includes an entry for each operation applied to the database that may be required for recovery from a transaction failure or system crash.
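A minimal Python sketch of the account table and login check described above. One deliberate substitution: instead of encrypting passwords, the sketch stores a salted PBKDF2 hash produced by Python's standard hashlib module, which is the more common practice today because stored passwords then cannot be decrypted at all. The in-memory dictionaries, function names, account numbers and terminal names are hypothetical.

```python
import hashlib
import hmac
import os

accounts = {}  # account number -> (salt, password hash); stands in for the account table
sessions = []  # (account number, terminal) recorded at each successful login

def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def create_account(account_no: str, password: str) -> None:
    salt = os.urandom(16)
    accounts[account_no] = (salt, _hash(password, salt))  # insert a new record

def cancel_account(account_no: str) -> None:
    accounts.pop(account_no, None)  # delete the corresponding record

def login(account_no: str, password: str, terminal: str) -> bool:
    record = accounts.get(account_no)
    if record is None:
        return False
    salt, stored = record
    if not hmac.compare_digest(stored, _hash(password, salt)):  # constant-time comparison
        return False
    sessions.append((account_no, terminal))  # associate the account with its terminal
    return True
```

After create_account("A101", "s3cret"), a login with the right password from terminal "T7" succeeds and is recorded in sessions, a wrong password fails, and once the account is cancelled every login attempt fails.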
If any tampering with the database is suspected, a database audit is performed, which consists of reviewing the log to examine all accesses and operations applied to the database during a certain time period. When an illegal or unauthorized operation is found, the DBA can determine the account number used to perform this operation.
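A database audit can be sketched as a filter over log entries, assuming that each entry records a time, an account number, an operation and a relation. The sample entries and the rule that updates to EMPLOYEE after 10 p.m. are unauthorized are invented purely for illustration; a real audit would apply the organization's own policy to the DBMS's actual log.

```python
from datetime import datetime

# Each log entry: (time, account number, operation, relation). Sample data only.
log = [
    (datetime(2024, 3, 1, 9, 15), "A101", "UPDATE", "EMPLOYEE"),
    (datetime(2024, 3, 1, 11, 40), "A205", "SELECT", "EMPLOYEE"),
    (datetime(2024, 3, 2, 23, 5), "A205", "UPDATE", "EMPLOYEE"),
]

def audit(entries, start, end, is_unauthorized):
    """Review entries in [start, end] and return the accounts behind unauthorized operations."""
    return sorted({entry[1] for entry in entries
                   if start <= entry[0] <= end and is_unauthorized(entry)})

def after_hours_update(entry) -> bool:
    """Hypothetical policy: EMPLOYEE may not be updated after 10 p.m."""
    when, _account, operation, relation = entry
    return operation == "UPDATE" and relation == "EMPLOYEE" and when.hour >= 22
```

Auditing the period from 1 to 3 March with this policy reports account A205; narrowing the period so that it ends at midnight on 2 March reports no accounts.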

6.5 DISCRETIONARY ACCESS CONTROL


Discretionary access control is based on the granting and revoking of privileges. The main idea is to include additional statements in the query language that allow the DBA and selected users to grant and cancel privileges.

Types of Discretionary Privileges


There are two levels for assigning privileges to use the database system:
1. The account level: At this level, the DBA specifies the particular privileges that each account holds independently of the relations in the database.
2. The relation (table) level: At this level, the DBA can control the privilege to access each individual relation or view in the database.
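In SQL, the additional statements for discretionary access control are GRANT and REVOKE. The Python sketch below only models the bookkeeping a DBMS might keep for the two privilege levels, assuming that privileges are plain strings such as "CREATE TABLE" or "SELECT". The data structures and function names are hypothetical.

```python
account_privileges = {}   # account -> set of account-level privileges
relation_privileges = {}  # (account, relation) -> set of relation-level privileges

def grant(account: str, privilege: str, relation: str = None) -> None:
    if relation is None:  # account level: independent of any relation
        account_privileges.setdefault(account, set()).add(privilege)
    else:                 # relation level: tied to one table or view
        relation_privileges.setdefault((account, relation), set()).add(privilege)

def revoke(account: str, privilege: str, relation: str = None) -> None:
    if relation is None:
        account_privileges.get(account, set()).discard(privilege)
    else:
        relation_privileges.get((account, relation), set()).discard(privilege)

def allowed(account: str, privilege: str, relation: str = None) -> bool:
    if relation is None:
        return privilege in account_privileges.get(account, set())
    return privilege in relation_privileges.get((account, relation), set())
```

For example, granting "CREATE TABLE" to account A1 works at the account level, while granting "SELECT" on EMPLOYEE to account A2 lets A2 read that relation only, and revoking it removes the privilege again.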


6.6 MANDATORY ACCESS CONTROL FOR MULTILEVEL SECURITY


Granting and revoking privileges on relations by discretionary access control techniques has traditionally been the main security mechanism for relational database systems. In many applications, however, an additional security policy is needed that classifies data and users based on security classes, and discretionary access control techniques cannot provide this. This approach is known as mandatory access control. It is important to note that most commercial DBMSs currently provide mechanisms only for discretionary access control; however, the need for multilevel security exists in government, military and intelligence applications.
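A minimal Python sketch of a mandatory read check follows. It assumes the four classification levels mentioned earlier in this chapter, ordered unclassified < confidential < secret < top secret, and it uses the common "no read up" rule of the Bell-LaPadula model, under which a user may read data only at or below his or her own clearance. The sample data items are made up, and real multilevel systems also restrict writing, which this sketch omits.

```python
from enum import IntEnum

class Level(IntEnum):
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3

def can_read(user_clearance: Level, data_class: Level) -> bool:
    # Read is allowed only if the item's class is at or below the user's clearance.
    return user_clearance >= data_class

# (data item, classification) pairs; sample data only.
rows = [("EMP_NO 101 salary", Level.CONFIDENTIAL), ("Project X budget", Level.TOP_SECRET)]

def visible_rows(user_clearance: Level):
    return [item for item, cls in rows if can_read(user_clearance, cls)]
```

A user cleared for SECRET sees only the confidential item, a TOP_SECRET user sees both, and an UNCLASSIFIED user sees neither. Unlike the discretionary sketch, no privilege is ever granted to an individual account here; access follows purely from the classifications.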

6.7 STATISTICAL DATABASE SECURITY


Statistical databases are used mainly to produce statistics on various populations. The database may contain confidential data on individuals, which should be protected from user access. However, users are permitted to retrieve statistical information on the populations, such as averages, sums, counts, maximums, minimums and standard deviations. Statistical queries involve applying statistical functions to a population of tuples. For example, we may want to retrieve the number of individuals in a population or the average income in the population. However, statistical users are not allowed to retrieve individual data, such as the income of a specific person. Statistical database security techniques must prohibit the retrieval of individual data. This can be achieved by prohibiting queries that retrieve attribute values and by allowing only queries that involve statistical aggregate functions such as COUNT, SUM, MIN, MAX, AVERAGE and STANDARD DEVIATION. Such queries are sometimes called statistical queries. In some cases, however, it is possible to infer the values of individual tuples from a sequence of statistical queries. This is particularly true when the query conditions result in a population consisting of a small number of tuples.
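The restrictions above can be sketched with Python's built-in sqlite3 module. Only whitelisted aggregate functions are accepted and, as one common additional safeguard, a query is refused when its condition selects fewer than a minimum number of tuples. The PERSON table, its rows and the threshold of 3 are hypothetical. SQLite spells AVERAGE as AVG and has no built-in standard deviation function, so the whitelist covers only COUNT, SUM, MIN, MAX and AVG.

```python
import sqlite3

MIN_QUERY_SET = 3  # assumed threshold: refuse statistics over fewer tuples than this

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSON (NAME TEXT, CITY TEXT, INCOME REAL)")
conn.executemany("INSERT INTO PERSON VALUES (?, ?, ?)", [
    ("Ravi", "Mysore", 30000), ("Meena", "Mysore", 42000),
    ("Arun", "Mysore", 36000), ("Latha", "Hubli", 51000),
])

AGGREGATES = {"COUNT", "SUM", "MIN", "MAX", "AVG"}

def statistical_query(func: str, city: str):
    """Return an aggregate over INCOME for one city, refusing non-aggregates and small sets."""
    func = func.upper()
    if func not in AGGREGATES:
        raise PermissionError("only aggregate functions are allowed")
    (size,) = conn.execute("SELECT COUNT(*) FROM PERSON WHERE CITY = ?", (city,)).fetchone()
    if size < MIN_QUERY_SET:
        raise PermissionError("query set too small; it could reveal an individual")
    sql = f"SELECT {func}(INCOME) FROM PERSON WHERE CITY = ?"  # func was whitelisted above
    (value,) = conn.execute(sql, (city,)).fetchone()
    return value
```

Here the average income for Mysore (36000) is returned, while any statistic for Hubli is refused because it would describe a single person. A minimum query-set size blocks the most direct attack, but it does not by itself prevent every inference: a sequence of allowed queries over overlapping groups can still isolate one individual, which is the problem described in the paragraph above.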

6.8 SUMMARY
In this chapter we discussed important techniques for enforcing security in database systems. Security enforcement deals with controlling access to the database system as a whole and controlling authorization to access specific portions of a database. The former is usually done by assigning accounts with passwords to users. The latter can be accomplished by a system of granting and revoking privileges to individual accounts for accessing specific parts of the database; this approach is generally referred to as discretionary access control. We then gave an overview of mandatory access control mechanisms that enforce multilevel security. Finally, we discussed the problem of controlling access to statistical databases to protect the privacy of individual information while concurrently providing statistical access to populations of records.

Check Your Progress


1. __________ is the central authority for managing a database system.
2. The DBA has an account in the DBMS, sometimes called a ____________.
3. ________ is responsible for the overall security of the database system.
4. The user must _______ to the DBMS by entering the account number and password whenever database access is needed.
5. Discretionary control is based on the granting and revoking of _________.
6. Statistical databases are used mainly to produce statistics on various ___________.

Answers
1. Database administrator (DBA)
2. super user account
3. DBA
4. Login
5. Privilege
6. populations

Answer the Following Questions


1. What do you mean by database security? Explain.
2. Explain the issues to be addressed in database security.
3. Describe the different types of security in a database security system.
4. Explain how the DBA is responsible for securing data in the database.
5. What are the methods used to grant and cancel privileges in a relational database system? Explain them.
6. Briefly explain the discretionary and mandatory access control methods.
7. What is the importance of statistical database security?

Chapter 7

Introduction to Microsoft Access

7.0 OBJECTIVES

In this chapter you will learn:

Accessing Microsoft Access
Opening a Database
Working with Access Database
Creating a Table
Data manipulation in DBMS
Creating and Customizing
Creating Reports

7.1 INTRODUCTION
This chapter introduces what an RDBMS is and how MS-Access, an RDBMS, differs from other packages. You will also learn to open an existing database and see all the objects present in an Access database. A database is a collection of data related to a particular topic. A database table typically consists of a heading that describes the type of information it contains, and each row contains some information. In database terminology, the columns are called fields and the rows are called records. This kind of organization in a database is called a table. A DBMS is a system that stores and retrieves information in a database. Data management involves creating, modifying, deleting and adding data in files, and using this data to generate reports or answer ad hoc queries. The software that allows us to perform these functions easily is called a DBMS.

7.1.1 Microsoft Access database


Microsoft Access is a relational DBMS and, like any other DBMS, it is used to manage databases. Why should one choose MS-Access rather than another package such as FoxBASE or dBASE? In MS-Access, unlike those packages, it is possible to display an image on screen along with all the other details; that is, you can store pictures in Access. As an example, let us introduce the personnel information system of a company. The company has many departments, and many employees work in the organization. The company wants to maintain a database that stores the details of all its employees. The details to be stored are Employee number, Employee name, Date of joining, Sex, Basic salary, Qualification and Department.

7.1.2 Tables and Queries


A Table - Data
A Microsoft Access database is a collection of database files, which are also known as tables. Each table is a collection of records, and each record is a collection of fields. If the company wants to store employee details, it must create a table, which will be part of some database. The information about one employee makes up one record of that table, and the information is stored under fields such as Employee number, Employee name and others.


Each record in a table contains the same set of fields and each field contains the same type of information for each record.

A Query - A question and an answer


In MS-Access, a query is a question you ask about the data in your database. The answer can come from a single table or from several tables; the query brings the data together. For example, suppose that in the personnel information system the manager of the company wants to know the total basic salary of all the employees. Keeping track of this by hand for a large number of employees is difficult, but a query can compute it directly. Depending on the question, the answer to a query may be as simple as Yes or No, or it may be a value such as this total.

You create a query that describes the set of records you want. When you use the query to access the data, you automatically get current data from the table/s.
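Access builds queries through its own design tools, so the following is not Access itself. It is a small sketch that uses SQL through Python's built-in sqlite3 module to show the same idea. The EMPLOYEE rows are made up, and only a subset of the fields from the personnel example is used. The manager's question becomes a SUM over BASIC_SALARY, and because a query reads the table each time it runs, a newly added record is reflected in the next answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EMP_NO INTEGER, EMP_NAME TEXT, BASIC_SALARY REAL, DEPT_CODE TEXT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [(101, "Asha", 12000, "D01"), (102, "Kiran", 15000, "D02"), (103, "Nisha", 13500, "D01")],
)

# The question "what is the total basic salary of all employees?" as a query.
(total,) = conn.execute("SELECT SUM(BASIC_SALARY) FROM EMPLOYEE").fetchone()

# Running the same query again after an insert returns the current data.
conn.execute("INSERT INTO EMPLOYEE VALUES (104, 'Ravi', 11000, 'D02')")
(new_total,) = conn.execute("SELECT SUM(BASIC_SALARY) FROM EMPLOYEE").fetchone()
```

With these sample rows, total is 40500 and new_total is 51500.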

7.1.3 Forms and reports


A Form - Information on the screen
There are two ways in which you can view data that is stored in a table.

First way: (figure not reproduced)

Second way: (figure not reproduced)

The second way of viewing data is preferable. A query output can be viewed in the first way, but it can also be viewed in the second way by using forms. A form is a customized way of viewing, entering and editing records in the database. You can specify how data is to be displayed when you design the form. Forms can be designed to closely resemble the paper form on which the data would otherwise be entered, so that the user feels familiar with the operation.


A Report - Required results in print


Forms and queries present data on screen, whereas reports are used to present data on printed paper. A report provides a way to retrieve and present data as meaningful information, which might include totals and subtotals across a set of records.

7.1.4 Accessing Microsoft Access


As with any other Windows-based application, you can start and quit Microsoft Access in the usual way.

To start Microsoft Access


1. Open the program group that contains the Microsoft Access icon.

2. Double-click the Microsoft Access icon. Microsoft Access starts and displays the Microsoft Access window, where you can create or open a database.

To quit Microsoft Access


Choose Exit from the File menu.

7.1.5 Opening a database


A Microsoft Access database is a collection of objects. A database file contains the tables, queries, forms and reports that help you to use information in the database.

To open a database
1. Choose Open Database from the File menu. The Open Database window is displayed.
2. Select the directory that contains the database file from the Directories list.
3. Select the database from the File Name list box.
4. Click Open to display the Microsoft Access database window. The database window displays a list of the tables created in the database.

7.1.6 Database window


When a database is opened, Microsoft Access displays its database window inside the Microsoft Access window. From the Access window you can create and use any object in your database and use the other features of Microsoft Access.

Title bar: located at the top of the screen, it displays the name of the program.

Menu bar: located below the title bar, it lists the various options.

Toolbars: generally located below the menu bar, they provide quick access to the most frequently used commands and utilities. Toolbars can be customized and dragged by the user to convenient positions.

Status bar: a horizontal bar at the bottom of the screen that displays information about commands, toolbar buttons and other options.

7.1.7 Objects of the Access database


Tables, queries, forms, reports, macros and modules are the objects of an Access database. The object buttons in the database window provide direct access to every object in the database. For example, to view all the tables you have created, click the Table button in the database window; Microsoft Access displays the list of tables stored in the database.

Similarly, all other objects in the database window can be viewed by clicking the appropriate object buttons. To close a database, select Close Database from the File menu.

7.2 WORKING WITH ACCESS DATABASE

7.2.1 Introduction
Now, we are familiar with opening an existing database and all the objects in the database. Let us learn to create a new database and objects in the database. A table is a collection of data stored about a particular subject. The data in a table is presented in columns and rows. We will also learn to create the basic structure of a table, to add rows (records) and to edit them.

7.2.2 Creating a Microsoft Access database


When a Microsoft Access database is created, a single file is created that contains all the tables in the database as well as the queries, forms, reports and other objects that help us use the information.


To create a Microsoft Access database


Select New Database from the File menu. In the dialog box that is displayed, select Blank Database and click OK.

The File New Database dialog box is displayed.

1. Select the directory in which you want to create the database, and enter a database name in the File Name box. The name can contain up to 8 characters but no spaces. There is no need to give an extension, because Microsoft Access automatically adds one to the database name.
2. Click Create to create an empty database file.

7.2.3 Creating objects


A database contains different types of objects. Now that we know how to create a database, the next step is to create objects in it. Tables are the first objects to be created in the database, and the number of tables depends on the user's requirements. To get the desired information from the database, the next step is to create queries, forms, reports and other objects.

Create / modify an object
To create a Microsoft Access object
1. Select the object type to create from the database window.
2. Click the New button.

To modify the design of an object
1. Select the object type to modify from the database window.
2. Select the name of the object to modify from the list.
3. Click the Design button to display the object window in Design view.

Note: Objects can be created either by yourself or with the help of an Access wizard. An Access wizard acts like a database expert: it prompts you with questions about the object and then builds the object based on your answers. Creating objects with the help of wizards is covered later.

7.2.4 Customizing toolbars


Microsoft Access provides a wide variety of graphical tools, which can be used to create and modify objects in the database. When you start, Microsoft Access displays tools only for opening and creating a database. After a database is opened, new toolbars are added to the existing ones. The toolbars gain or lose focus as you open any object (forms, tables, queries, reports, etc.) in Design, Open or New view. Initially, the toolbar appears at the top of the Microsoft Access window and the tools are arranged in a single row. We can move the toolbar to the side, the bottom or the middle of the window and change its shape.

To customize toolbars
1. Select Toolbars from the View menu to display the Toolbars dialog box. Its options allow the toolbars to be customized.
2. In the Toolbars dialog box, click Large Buttons to enlarge the buttons or return them to their original size, and select Show ToolTips to display ToolTips.
3. Click the Close button to close the dialog box.

7.2.5 Fields and datatypes


The first step in designing a database is to define the table structure. Each table in the database represents a single subject, for example employee information or invoices. Before designing a table, you should be clear about the data to be stored in it, because the table structure is based on that data. For example, an employee table needs the employee number, employee name, date of joining, sex, basic salary, qualification and department. In database terminology these details are called fields. Fields can be of different data types, such as number, character (text) or date. Microsoft Access uses the data type to decide how much storage to give a field and to ensure that the right kind of data is entered in it; for example, text cannot be entered in a numeric field. Choose the right data type for each field before entering data in the table. The data type of a field that already contains data can be changed, but if the old and new data types are not compatible, data may be lost.

Example: structure of an EMPLOYEE table (N = number, C = character/text, D = date)

| Field name | Field type | Size | Decimal |
|---|---|---|---|
| EMP_NO | N | 5 | |
| EMP_NAME | C | 20 | |
| DOJ | D | | |
| SEX | C | 1 | |
| BASIC_SALARY | N | 7 | |
| QUALIFICATION | C | 10 | |
| DEPT_CODE | C | 5 | |

We are storing the following details of an employee: employee number (EMP_NO), employee name (EMP_NAME), date of joining (DOJ), sex (SEX), basic salary (BASIC_SALARY), qualification (QUALIFICATION) and department (DEPT_CODE).
1. EMP_NO and BASIC_SALARY hold numeric data, so they can be of type Number.
2. EMP_NAME, SEX, QUALIFICATION and DEPT_CODE hold character data, so they can be of type Text.
3. DOJ stores a date, so it can be of type Date/Time.
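As a rough illustration of the same structure outside Access, here is a minimal sketch using Python's built-in `sqlite3` module, assuming SQLite as a stand-in DBMS. SQLite has no separate date type and does not enforce declared types or sizes by itself, so the `CHECK` clauses below imitate the type and size checks that Access performs. The names `EMPLOYEE_DDL` and `make_employee_db` and the sample row are hypothetical.

```python
import sqlite3

# Hypothetical SQLite version of the EMPLOYEE structure (not Access itself).
# CHECK clauses stand in for Access's data-type and field-size enforcement.
EMPLOYEE_DDL = """
CREATE TABLE EMPLOYEE (
    EMP_NO        INTEGER CHECK (typeof(EMP_NO) = 'integer' AND EMP_NO <= 99999),
    EMP_NAME      TEXT    CHECK (length(EMP_NAME) <= 20),
    DOJ           TEXT,   -- no date type in SQLite: store 'YYYY-MM-DD'
    SEX           TEXT    CHECK (length(SEX) = 1),
    BASIC_SALARY  REAL    CHECK (typeof(BASIC_SALARY) IN ('integer', 'real')),
    QUALIFICATION TEXT    CHECK (length(QUALIFICATION) <= 10),
    DEPT_CODE     TEXT    CHECK (length(DEPT_CODE) <= 5)
)
"""


def make_employee_db() -> sqlite3.Connection:
    """Create an in-memory database containing an empty EMPLOYEE table."""
    con = sqlite3.connect(":memory:")
    con.execute(EMPLOYEE_DDL)
    return con


if __name__ == "__main__":
    con = make_employee_db()
    # A valid row (sample values are made up).
    con.execute("INSERT INTO EMPLOYEE VALUES "
                "(1234, 'Asha Rao', '2020-06-01', 'F', 25000, 'B.Com', 'D01')")
    # Text in the numeric EMP_NO field is rejected, as it would be in Access.
    try:
        con.execute("INSERT INTO EMPLOYEE (EMP_NO) VALUES ('abc')")
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

The choice of `CHECK` constraints keeps the rules inside the table definition, close to how Access attaches the data type to the field itself.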

7.2.6 Creating a table


A newly created table is an empty container for data, designed to hold a specific type of data. To create a table:
1. Click the Table button in the database window.
2. Click the New button to display the New Table dialog box.
3. Select the option to open a new table in Design view.
4. Click OK to display the table structure in Design view.

We now have a window where we can specify the fields of the table and the kind of data they will store; the table structure is created here. The window below depicts the table in Design view. The table window has two portions. The upper portion holds the field name, data type and description of each field. The lower portion holds field properties such as size and format.

To create the structure:
a. Enter the first field name, EMP_NO, in the Field Name column. A field name can contain up to 64 characters.
b. Press the Tab key to move to the Data Type column and select a data type, for example Number.
c. Press the Tab key to move to the Description column and type a description, for example Employee number. This description appears in the status bar when data is being entered in the field. Press the Tab key to move to the next field.
d. Repeat steps a, b and c to add the other fields.

To set a field property:
1. Select the field in the upper portion of the table window in Design view.
2. Set the field properties in the lower portion of the table window.

7.2.7 Field properties


You can control the appearance of data, specify default values and speed up searching and sorting by setting field properties in the table's Design view.

Field size: limits the length of a text field or the range of allowable values of a number field. For example, if EMP_NAME should not exceed 20 characters, set its field size to 20.

Format: specifies how number or date values are displayed. You can choose any of the formats available in the Format property box.

Decimal places: the number of places displayed after the decimal point when a format is used for a number or currency field.

Default value: the value taken for a field when the user does not enter one. For example, if the user does not enter DOJ, the current date should be taken. Setting this default value (the expression =Date()) makes Microsoft Access fill the current date in the DOJ field of every new record automatically.

Indexed: indexes the data on this field, which speeds up searching and sorting on it (the default is No).
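The following sketch shows three of these properties in SQLite terms, assuming Python's `sqlite3` module as a stand-in for Access: a length `CHECK` for field size, `DEFAULT CURRENT_DATE` in place of `=Date()`, and a separate index for the Indexed property. The index name `idx_emp_name` and the sample row are hypothetical. One difference worth noting: SQLite's `CURRENT_DATE` is the UTC date, while Access's `Date()` uses the local date.

```python
import sqlite3

# SQLite analogy for three Access field properties (illustrative only):
#   Field size    -> CHECK (length(EMP_NAME) <= 20)
#   Default value -> DEFAULT CURRENT_DATE  (Access would use =Date())
#   Indexed       -> a separate CREATE INDEX statement
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE EMPLOYEE (
    EMP_NO   INTEGER,
    EMP_NAME TEXT CHECK (length(EMP_NAME) <= 20),
    DOJ      TEXT DEFAULT CURRENT_DATE
)
""")
con.execute("CREATE INDEX idx_emp_name ON EMPLOYEE (EMP_NAME)")

# DOJ is left out, so the default fills in today's (UTC) date as 'YYYY-MM-DD'.
con.execute("INSERT INTO EMPLOYEE (EMP_NO, EMP_NAME) VALUES (1234, 'Asha Rao')")
doj = con.execute("SELECT DOJ FROM EMPLOYEE WHERE EMP_NO = 1234").fetchone()[0]
print("DOJ filled in automatically:", doj)
```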

7.2.8 Save and close a table


Save the table design before adding any records. To save and name a table:
1. Select Save from the File menu.
2. If you are saving the table for the first time, type a name for the table and click OK. A table name can be up to 64 characters long.

To close a table, select Close from the File menu.

7.2.9 Add and save records


After designing a table, you can add records to it. To add records:
1. Select the table from the database window.
2. Click the Open button in the database window to open the table in Datasheet view.
3. Enter a value in each field, pressing the Tab key to move to the next field.
4. After you fill in all the fields, press the Tab key to move to a new blank record. When you move to the next record, Microsoft Access saves the record you added to the datasheet. When you finish adding records, close the datasheet; you don't have to save your work.
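For comparison, here is a minimal sketch of adding records in code rather than typing them into a datasheet, assuming Python's `sqlite3` module as the DBMS. Access saves a record when you move to the next one; with `sqlite3`, inserted rows become permanent when `commit()` is called. The sample rows are hypothetical.

```python
import sqlite3

# Sketch: adding records in code instead of typing them into a datasheet.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMP_NO INTEGER, EMP_NAME TEXT, DEPT_CODE TEXT)")

new_records = [            # hypothetical sample rows
    (1234, "Asha Rao", "D01"),
    (1235, "Ravi Kumar", "D02"),
]
# '?' placeholders let sqlite3 fill in each value, field by field.
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", new_records)
con.commit()               # make the new records permanent

print(con.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0], "records saved")
```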

7.2.10 Edit records and close a table


To edit or change the value in a field: when you open a datasheet, the first field of the first record is selected. Use the mouse to select the contents of the field you want to change, then type the new value. To cancel all editing changes to a field, press the Esc key. To close the table, select Close from the File menu.
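The idea of editing a value and cancelling an edit can be sketched with SQL, assuming Python's `sqlite3` module: `UPDATE` changes a field, `commit()` keeps the change, and `rollback()` discards uncommitted changes, which is loosely comparable to pressing Esc. This is an analogy, not how Access implements Esc; the names and values are hypothetical.

```python
import sqlite3

# Sketch: editing a field value, then cancelling a later edit.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMP_NO INTEGER, EMP_NAME TEXT)")
con.execute("INSERT INTO EMPLOYEE VALUES (1234, 'Asha Rao')")
con.commit()

# Change the value and keep it.
con.execute("UPDATE EMPLOYEE SET EMP_NAME = ? WHERE EMP_NO = ?", ("Asha R. Rao", 1234))
con.commit()

# Change it again, then cancel: rollback() discards changes since the last commit().
con.execute("UPDATE EMPLOYEE SET EMP_NAME = ? WHERE EMP_NO = ?", ("Typo Name", 1234))
con.rollback()

print(con.execute("SELECT EMP_NAME FROM EMPLOYEE WHERE EMP_NO = 1234").fetchone()[0])
# Asha R. Rao
```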

7.2.11 Modify fields in a table


If the fields in a table need to be modified, you can rearrange, edit or delete them, or insert new fields.

To edit a field:
1. Select the field to edit.
2. Edit the name, data type or description of the field in the upper portion of the table window.
3. Modify the field properties in the lower portion of the table window.
4. Save and close the table.

To move a field:
1. Select the field by clicking the field selector to the left of the field name.
2. Click the field selector again, hold down the mouse button and drag the field to its new location.
3. Save and close the table.

To delete a field:
1. Select the field by clicking the field selector to the left of the field name.
2. Press the DEL key or select Delete Row from the Edit menu.
3. Save and close the table.

To insert a field:
1. Select the field before which the new field should appear.
2. Select Insert Row from the Edit menu. An empty row is inserted before the current row.
3. Enter the field name and other information in the empty row.
4. Set the field properties in the lower portion of the table window.
5. Save and close the table.
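Inserting a field can also be done with SQL. Here is a minimal sketch, assuming Python's `sqlite3` module; unlike Access Design view, SQLite's `ALTER TABLE ... ADD COLUMN` always appends the new field at the end, and existing records get an empty (NULL) value in it. The table contents are hypothetical.

```python
import sqlite3

# Sketch of "insert a field" with SQLite's ALTER TABLE (an analogy only).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMP_NO INTEGER, EMP_NAME TEXT)")
con.execute("INSERT INTO EMPLOYEE VALUES (1234, 'Asha Rao')")

# Add a new field; the existing record gets NULL (empty) in it.
con.execute("ALTER TABLE EMPLOYEE ADD COLUMN QUALIFICATION TEXT")

# PRAGMA table_info returns one row per field: (cid, name, type, notnull, default, pk).
fields = [row[1] for row in con.execute("PRAGMA table_info(EMPLOYEE)")]
print(fields)  # ['EMP_NO', 'EMP_NAME', 'QUALIFICATION']
```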

7.2.12 Modify columns and rows in datasheet


If the columns in a datasheet are not the right size for the field values they display, you can change the width of each column and the height of the rows. You can also rearrange the datasheet columns.

To change the width of a column:
1. Position the mouse pointer on the right border of the field selector of the column to be resized. When the pointer changes shape, you can resize the column.
2. Drag the column border to the desired width, or select Column Width from the Format menu and click Best Fit to fit the data the column displays.

To change the row height:
1. Position the pointer between two record selectors at the left side of the datasheet. When the pointer changes shape, you can change the row height.
2. Drag the row border to the desired height. All rows in the datasheet change to the new height.

To move a column:
1. Select the column you want to move by clicking its field selector.
2. Click the field selector again and drag the column to its new position. While you drag, a solid bar between columns marks its destination.

To save and close the datasheet layout, select Save from the File menu to save the datasheet, and select Close from the File menu to close it.

7.2.13 Validation rule to a field


Microsoft Access automatically validates values based on a field's data type; for example, text cannot be entered in a number field. You can set more specific rules for data with the Validation Rule property of a field, which specifies the requirements for data entered in that field. For example, a validation rule can specify that the employee name must not be left blank. The Validation Text property holds the message to be displayed when an entry breaks the validation rule.

Examples:

| Validation rule | Validation text |
|---|---|
| M or F | Enter M or F |
| <> 0 | Enter a non zero value |
| > 200 | Value must be greater than 200 |

To set a validation rule:
1. Open the table in Design view.
2. Select the field to which the validation rule is to be attached.
3. In the lower portion of the table window, set the Validation Rule and Validation Text field properties.
4. Save and close the table.
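To make the pairing of rule and message concrete, here is a hypothetical Python sketch, not Access code: each field has a rule (a predicate) and a validation text returned when an entry breaks that rule. The texts for SEX, QTY and AMOUNT follow the examples above; the field names QTY and AMOUNT, the blank-name message, and the function `validate` are made up for illustration.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical sketch of Access-style field validation: each field has a
# rule (a predicate) and a validation text shown when the rule is broken.
RULES: dict[str, tuple[Callable[[Any], bool], str]] = {
    "EMP_NAME": (lambda v: v is not None and str(v).strip() != "",
                 "Employee name must not be blank"),
    "SEX": (lambda v: v in ("M", "F"), "Enter M or F"),
    "QTY": (lambda v: v != 0, "Enter a non zero value"),
    "AMOUNT": (lambda v: v > 200, "Value must be greater than 200"),
}


def validate(record: dict[str, Any]) -> list[str]:
    """Return the validation text of every rule that the record breaks."""
    return [text for field, (rule, text) in RULES.items()
            if field in record and not rule(record[field])]


print(validate({"EMP_NAME": "Asha", "SEX": "X", "QTY": 0, "AMOUNT": 150}))
# ['Enter M or F', 'Enter a non zero value', 'Value must be greater than 200']
```

A real DBMS would enforce such rules in the table itself (for example with SQL `CHECK` constraints); a separate function is used here only so that each broken rule can be reported with its own message.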

Check your progress


Create a table STUDENT to store the marks scored by each student:

| Field | Type | Width | Constraint |
|---|---|---|---|
| Student_ID | Numeric | 5 | Unique |
| Name | Text | 20 | |
| Class | Numeric | 2 | |
| English | Numeric | 3 | |
| Hindi | Numeric | 3 | |
| Maths | Numeric | 3 | |
| Science | Numeric | 3 | |
| Social_science | Numeric | 3 | |

Create a table TRANSACTION with the following fields:

| Field | Type | Width | Constraint |
|---|---|---|---|
| Trans_No | Numeric | 5 | Unique |
| Item_No | Numeric | 5 | |
| Item_name | Text | 25 | |
| Trans_date | Date | | |
| Quantity | Numeric | 5 | |

After creating the tables, do the following:
1. Set the field properties of each field.
2. Modify the fields in the table.
3. Modify the table STUDENT to include the following fields:

| Field | Type | Width | Constraint |
|---|---|---|---|
| Aggregate | Numeric | 4 | |
| Average | Numeric | 5 | 2 decimal places |

4. Apply the necessary validation rules to each field.
5. Add records.

7.3 DATA MANIPULATION IN DBMS

7.3.1 Introduction


Tables are used to store data, and stored data can be retrieved whenever required. Data stored in a table can be viewed in many ways, based on some criteria. In this section we use Find, Filter, Query and Sort to view data.

7.3.2 Find a value


Suppose you require the details of the employee whose employee number is 1234. One way to get them is to open the table and browse through the records one by one. The other way is to use the Find option, which takes you directly to a record containing a specific value. You can also use Find to navigate through the records, finding one matching record after another. To find a specific value in a field:
1. Select the field you want to search.
2. Select Find from the Edit menu.
3. In the Find What box, type the value you want to find.
4. Click the Find First button to move to the first record containing the value, if it exists.
5. Click the Find Next button to find the next occurrence of the value.
6. When you have finished, click the Close button to close the dialog box.
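The same search can be expressed as a query. Here is a minimal sketch, assuming Python's `sqlite3` module and hypothetical sample rows: `fetchone()` returns the first matching record (or `None` if none exists), loosely like Find First, and iterating over the cursor steps through further matches, loosely like Find Next.

```python
import sqlite3

# Sketch: "Find" as a query instead of browsing every record.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMP_NO INTEGER, EMP_NAME TEXT, DEPT_CODE TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [   # hypothetical data
    (1233, "Ravi Kumar", "D02"),
    (1234, "Asha Rao", "D01"),
    (1240, "Meena Shah", "D01"),
])

# Like Find First: the matching record, or None if it does not exist.
first = con.execute("SELECT * FROM EMPLOYEE WHERE EMP_NO = ?", (1234,)).fetchone()
print(first)  # (1234, 'Asha Rao', 'D01')

# Like repeated Find Next: step through every record with DEPT_CODE 'D01'.
matches = con.execute(
    "SELECT * FROM EMPLOYEE WHERE DEPT_CODE = ? ORDER BY EMP_NO", ("D01",)
).fetchall()
for row in matches:
    print(row)
```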

7.3.3 Find and replace a value


The same change may need to be made in several places in the data. Microsoft Access lets you find occurrences of specific text and replace them with different text by using the Replace command. Replacements can be made either one at a time or all at once. To find and replace occurrences of specified text:
1. With the table open, select the field in which you want to search and replace.
2. Select Replace from the Edit menu. The Replace dialog box is shown below:
3. Type the text to find in the Find What box.
4. Type the replacement text in the Replace With box.
5. Click the Replace All button to replace all occurrences of the text, or click Find Next (and then Replace) to replace occurrences one at a time.
6. When you finish replacing, click the Close button to close the dialog box.
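A "Replace All" on one field can be sketched as a single SQL `UPDATE`, assuming Python's `sqlite3` module. SQLite's core `replace(X, Y, Z)` function substitutes every occurrence of Y in X with Z, and `instr(X, Y)` limits the update to records that actually contain Y (both are case-sensitive). The QUALIFICATION values are hypothetical.

```python
import sqlite3

# Sketch: "Replace All" on the QUALIFICATION field (sample data is made up).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMP_NO INTEGER, QUALIFICATION TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                [(1, "B.Sc"), (2, "BSc"), (3, "M.Sc")])

find_what, replace_with = "BSc", "B.Sc"
cur = con.execute(
    "UPDATE EMPLOYEE SET QUALIFICATION = replace(QUALIFICATION, ?, ?) "
    "WHERE instr(QUALIFICATION, ?) > 0",
    (find_what, replace_with, find_what),
)
print(cur.rowcount, "record(s) changed")  # 1 record(s) changed
```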

7.3.4 Create and apply a filter


Microsoft Access provides two ways to create a customized view of the data in tables: a query or a filter. A filter is like a simple query except that it applies only to an open table, so it is best for temporarily changing the set of records being viewed. In Microsoft Access, a filter shows a subset of the records in a table according to the criteria and sort order specified in the filter window. To filter records:
1. Open the table in Datasheet view.
2. Select Filter from the Records menu.
3. Select the required option to filter the records.
4. Select Apply Filter/Sort from the Records menu to display only the filtered records.
5. To remove a filter, select Remove Filter/Sort from the Records menu.
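A filter only changes which records are shown, not the table. Here is a minimal sketch of that idea, assuming Python's `sqlite3` module and hypothetical data: a `SELECT ... WHERE ... ORDER BY` shows the subset, while the stored table keeps all its records, so "removing the filter" simply means selecting all records again.

```python
import sqlite3

# Sketch: a filter as a temporary subset of records; the table is untouched.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMP_NO INTEGER, EMP_NAME TEXT, SEX TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [   # hypothetical data
    (1234, "Asha Rao", "F"),
    (1235, "Ravi Kumar", "M"),
    (1240, "Meena Shah", "F"),
])

# Criteria (SEX = 'F') plus a sort order, as in the filter window.
filtered = con.execute(
    "SELECT EMP_NO, EMP_NAME FROM EMPLOYEE WHERE SEX = ? ORDER BY EMP_NAME", ("F",)
).fetchall()
print(filtered)  # [(1234, 'Asha Rao'), (1240, 'Meena Shah')]

# "Remove Filter/Sort": go back to viewing all the records.
all_rows = con.execute("SELECT EMP_NO, EMP_NAME FROM EMPLOYEE").fetchall()
```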

7.3.5 Sort records


The Sort command displays the records of a table in a different order from the usual one, in either ascending or descending order. To sort the records in a table:
1. Select the column in the datasheet to sort on.
2. Select Sort from the Records menu and then select Ascending or Descending.

The records of the datasheet above, sorted by Emp_name, are shown below.
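In SQL, the Ascending and Descending sort commands correspond to `ORDER BY ... ASC` and `ORDER BY ... DESC`. A minimal sketch, assuming Python's `sqlite3` module and hypothetical names:

```python
import sqlite3

# Sketch: Sort Ascending / Sort Descending as ORDER BY.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMP_NO INTEGER, EMP_NAME TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                [(1235, "Ravi Kumar"), (1234, "Asha Rao"), (1240, "Meena Shah")])

ascending = [r[0] for r in con.execute("SELECT EMP_NAME FROM EMPLOYEE ORDER BY EMP_NAME ASC")]
descending = [r[0] for r in con.execute("SELECT EMP_NAME FROM EMPLOYEE ORDER BY EMP_NAME DESC")]
print(ascending)   # ['Asha Rao', 'Meena Shah', 'Ravi Kumar']
print(descending)  # ['Ravi Kumar', 'Meena Shah', 'Asha Rao']
```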

7.3.6 Create a query


A query is a question about the data stored in the tables; it tells Microsoft Access exactly how the data is to be retrieved. Microsoft Access gives you a great deal of flexibility in designing queries. Queries help you to:
1. Choose fields.
2. Choose records, that is, specify criteria.
3. Sort records, that is, specify an order.
4. Look for data in several tables.
5. Perform calculations.
6. Make changes to the data in tables.

To create a query:
1. Click the Queries button in the database window.
2. Click the New button to display the New Query dialog box.
3. Click the OK button. This opens a select query window and displays the Show Table dialog box, which lists the tables and queries in the database.
4. Select a table and click Add to display its field list. Repeat for each table you need.
5. Click the Close button.

7.3.7 Query window


As soon as you close the Show Table dialog box, you see the Query window. The Query window has three views: Design view, Datasheet view and SQL view.

Design view: use this view to create a query, or to change the design of an existing query, with graphical query tools.

Datasheet view: use this view to see the data retrieved by the query.

SQL view: use this view to enter SQL (Structured Query Language) statements to create or change a query.

The tool used to create a query in Design view is called graphical QBE (Query by Example). With graphical QBE, a query is created by dragging fields from the field lists in the upper portion of the query window to the QBE grid in the lower portion of the window.

In the QBE grid, each column contains information about one field included in the query.

7.3.8 Join tables


To create a query from more than one table, you add the tables you want and make sure that they are joined to each other. You can join the tables by drawing join lines between them, although in many cases Microsoft Access creates the join lines automatically. With a join line, Microsoft Access selects the records from both tables that have the same values in the joined fields. This is called an inner join, and the fields joined in this way are called join fields.

Example: Suppose you have two tables, EMPLOYEE and DEPARTMENT. The EMPLOYEE table contains EMP_NO, EMP_NAME, DOJ, SEX, BASIC_SALARY, QUALIFICATION and DEPT_CODE. The DEPARTMENT table contains DEPT_CODE and DEPT_NAME. If you want a query that shows each employee's department name (DEPT_NAME), you have to join the two tables on DEPT_CODE.

To join two tables in the query window, select a field in one table and drag it to the equivalent field in the other table. A join line is drawn from one table to the other.

To delete a join between two tables in the query window, select the join line and press the DEL key.
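In SQL terms, the join line between the two DEPT_CODE fields becomes the `ON` clause of an `INNER JOIN`. A minimal sketch, assuming Python's `sqlite3` module and hypothetical data; note that employee 1236, whose DEPT_CODE has no match in DEPARTMENT, is left out, because an inner join keeps only records with matching values in both tables.

```python
import sqlite3

# Sketch of the EMPLOYEE / DEPARTMENT inner join (sample data is made up).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMP_NO INTEGER, EMP_NAME TEXT, DEPT_CODE TEXT)")
con.execute("CREATE TABLE DEPARTMENT (DEPT_CODE TEXT, DEPT_NAME TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    (1234, "Asha Rao", "D01"),
    (1235, "Ravi Kumar", "D02"),
    (1236, "John Paul", "D09"),   # no matching department
])
con.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                [("D01", "Accounts"), ("D02", "Sales")])

# The ON clause plays the role of the join line between the DEPT_CODE fields.
rows = con.execute("""
    SELECT E.EMP_NO, E.EMP_NAME, D.DEPT_NAME
    FROM EMPLOYEE AS E
    INNER JOIN DEPARTMENT AS D ON E.DEPT_CODE = D.DEPT_CODE
    ORDER BY E.EMP_NO
""").fetchall()
print(rows)  # [(1234, 'Asha Rao', 'Accounts'), (1235, 'Ravi Kumar', 'Sales')]
```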

7.3.9 Select fields


After adding tables to the query, you can include their fields in it. The fields selected determine the output of the query in Datasheet view. If you add more than one table, a field list is shown for each table. To add a field to a query:
1. Drag the field from the field list to a cell in the Field row of the QBE grid.
2. Repeat until all the fields of the query are shown in the QBE grid.

7.3.10 Specify criteria


To limit the query's dynaset (the records displayed as output) to certain records, specify criteria. An expression is used for this: it tells Microsoft Access which records to include in the query's dynaset. To specify criteria for a field:
1. Select the Criteria cell of the field in the QBE grid.
2. Type the expression in the Criteria cell.
3. To check the results, select Datasheet from the View menu.
4. To view the data in sorted order, use the Sort cell of the field in the QBE grid.
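Criteria typed in the QBE grid correspond to a `WHERE` clause; criteria on different fields in the same Criteria row must all be met, so they combine with AND. A minimal sketch, assuming Python's `sqlite3` module, hypothetical data, and the example criteria BASIC_SALARY > 20000 and DEPT_CODE = "D01":

```python
import sqlite3

# Sketch: QBE criteria as a WHERE clause (sample data is made up).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE "
            "(EMP_NO INTEGER, EMP_NAME TEXT, BASIC_SALARY REAL, DEPT_CODE TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    (1234, "Asha Rao", 25000, "D01"),
    (1235, "Ravi Kumar", 18000, "D01"),
    (1240, "Meena Shah", 30000, "D02"),
])

# Two criteria in the same row: BASIC_SALARY > 20000 AND DEPT_CODE = "D01".
dynaset = con.execute(
    "SELECT EMP_NAME FROM EMPLOYEE WHERE BASIC_SALARY > ? AND DEPT_CODE = ?",
    (20000, "D01"),
).fetchall()
print(dynaset)  # [('Asha Rao',)]
```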

7.3.11 Calculate totals


To calculate totals:
1. Select Totals from the View menu to display the Total row in the QBE grid. Microsoft Access automatically fills Group By in each cell of this row.
2. Select the field to total on.
3. In that field's Total cell, select Sum from the list.
4. Select Datasheet from the View menu to see the results.
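A totals query with one field set to Group By and another set to Sum corresponds to SQL's `GROUP BY` with `SUM()`. A minimal sketch, assuming Python's `sqlite3` module and hypothetical data, totalling BASIC_SALARY for each DEPT_CODE:

```python
import sqlite3

# Sketch: the Total row as GROUP BY. DEPT_CODE is "Group By", BASIC_SALARY is "Sum".
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMP_NAME TEXT, BASIC_SALARY REAL, DEPT_CODE TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [   # hypothetical data
    ("Asha Rao", 25000, "D01"),
    ("Ravi Kumar", 18000, "D01"),
    ("Meena Shah", 30000, "D02"),
])

totals = con.execute(
    "SELECT DEPT_CODE, SUM(BASIC_SALARY) FROM EMPLOYEE "
    "GROUP BY DEPT_CODE ORDER BY DEPT_CODE"
).fetchall()
print(totals)  # [('D01', 43000.0), ('D02', 30000.0)]
```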

7.3.12 Modify and save a query


You can easily move or delete columns in a query.

To move a column in a query:
1. In Design view, click the field selector (column heading) of the column.
2. Click the field selector again, hold down the mouse button and drag the column to its new location.

To delete a column in a query:
1. In Design view, click the field selector (column heading) of the column.
2. Press the DEL key.

To exclude a field from the query's datasheet, clear the field's Show box by clicking it.

To save a query:
1. Select Save from the File menu to display the Save As dialog box (when saving for the first time).
2. Type a name in the Query Name box.
3. Click OK to save the query in the database.
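The closest SQLite equivalent of a saved query is a view: the `SELECT` is stored under a name and runs against the current data each time it is used. A minimal sketch, assuming Python's `sqlite3` module; the view name HIGH_SALARY and the data are hypothetical, and leaving EMP_NO out of the view's field list mirrors clearing its Show box.

```python
import sqlite3

# Sketch: saving a query under a name, using an SQLite view as the analogy.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (EMP_NO INTEGER, EMP_NAME TEXT, BASIC_SALARY REAL)")
con.execute("INSERT INTO EMPLOYEE VALUES (1234, 'Asha Rao', 25000)")

# HIGH_SALARY is a hypothetical query name; EMP_NO is not shown.
con.execute("""
    CREATE VIEW HIGH_SALARY AS
    SELECT EMP_NAME, BASIC_SALARY FROM EMPLOYEE WHERE BASIC_SALARY > 20000
""")

# A record added later is picked up the next time the saved query runs.
con.execute("INSERT INTO EMPLOYEE VALUES (1240, 'Meena Shah', 30000)")
print(con.execute("SELECT EMP_NAME FROM HIGH_SALARY ORDER BY EMP_NAME").fetchall())
# [('Asha Rao',), ('Meena Shah',)]
```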

Check your progress


For the tables created in the previous section:
1. Apply filters to list students with marks greater than 70.
2. Apply filters to get the transactions for a date.
3. Sort students by name.
4. Sort transactions by date.
5. Create queries to list students with marks greater than 70, and the total transaction quantity for a date.

7.4 CREATING AND CUSTOMIZING FORMS

7.4.1 Introduction


A query or a filter shows the records of a table in raw form. To view the data in a customized way, we use forms. A form provides an easy way to view all the values of one record, and switching to the form's Datasheet view shows all its records. A form offers the most convenient layout for entering, changing and viewing the records in the database. The form design tools in Microsoft Access help you design forms that present data in an attractive format with special fonts and other effects.

7.4.2 Creation with Form Wizard


Forms can be created with or without the aid of Form Wizards; the wizards speed up the process of creating forms. When you use a form to enter or display data, connect the form to the table or query that is its source of data. If all the data is in one table, base the form on that table. If the data is in more than one table, base the form on a query. To create a form by using a Form Wizard:
1. Click the Forms button in the database window.
2. Click the New button to display the New Form dialog box.
3. Select a table or query in the list box.
4. Click OK, then choose the required fields (double-click each required field), a format (for example, tabular) and a title for the form.
5. Click the Finish button to save and open the form. The form displays the first record in the table.

7.4.3 View, Add, Delete and Save records


The form above can be used to view, change, add and delete records in the table. The objects on a form are called controls, and they are used to view and change the data. The controls are:
1. A label, which displays text.
2. A text box, which provides space to display or type the value, corresponding to the label, that is stored in the database.

To switch to Datasheet view, select Datasheet from the View menu; the form's data is displayed in Datasheet view. To switch back to Form view, select Form from the View menu. To move from record to record in Form view, use the navigation buttons to go to the first, last, next or previous record.

To add a new record:
1. Select New Record from the Insert menu. A new blank record is displayed.
2. Type the value in the first text box.
3. Press the Tab key to move to the next field.
4. Repeat to enter all the other information.
5. After all the fields are entered, press the Tab key to move to the next record; Microsoft Access saves the record in the table.

7.4.4 Close a Form


To close a form, select Close from the File menu.

7.4.5 Change Form Design


To change the design of a form, open the form in Design view from the database window. Change this form ...

... to look like this.

To open a form in Design view:
1. Click the Forms button in the database window.
2. Select the form from the list of forms.
3. Click the Design button to open the form in Design view.

In Design view, Microsoft Access presents the form in three sections:
1. The Form Header contains the heading label of the form and appears at the top of the window.
2. The Detail section contains the fields from the table used to view data. It repeats for each record.
3. The Form Footer appears at the bottom of the window.

All forms have a detail section, but they may or may not have a form header and footer. A form in design view:

To add a form header and footer, select Form Header/Footer from the View menu.

7.4.6 Select, Resize, Move and Delete controls


The controls on a form are labels and text boxes. In Design view, these controls can be selected, resized, moved and deleted.

To select a control, click the text box; sizing and move handles are displayed around it. Then:
1. Drag the handles on the top and bottom to size the text box vertically.
2. Drag the handles on the left and right sides to size the text box horizontally.
3. Drag the handles in the corners to size the text box both vertically and horizontally.

To resize a control:
1. Position the pointer at a corner of the text box.
2. Drag the border to resize the control.

Every text box control has an attached label control; the two can be moved together or separately.

To move a control:
1. Select the control to move.
2. Position the pointer anywhere on the control and hold down the mouse button.
3. Drag the control; the text box and its label move together.
4. Release the mouse button when the control is at the desired place.

To move the attached label separately:
1. Select the control.
2. Position the pointer on the top left corner of the label and hold down the mouse button.
3. Move the label around.
4. Release the mouse button when the label is at the desired place.

To delete a control:
1. Select the control to delete.
2. Press the DEL key. This deletes the text box and its attached label.

7.4.7 Change fonts, size and color of Text


Microsoft Access provides choices for the appearance of controls on forms: you can change the size, font and color of the text.
1. Select the label.
2. Click the Bold or Italic button to change the style of the text.
3. Select a font from the font list to change the appearance of the text.
4. Select a size from the size list to resize the text.
5. Click the palette button on the toolbar to display the palette.
6. Select Fore Color, Back Color or Border Color to change the text color, fill color or border color respectively.
7. Click the palette button again to close the palette.

7.4.8 Showing data from more than one table


A form can show data from more than one table, either through a subform or by basing the form on a query. A subform is a form within a form. When a subform is used, a relationship is made between records from two or more tables: the main form and the subform are linked so that the subform displays only the records related to the record in the main form. When you create a form/subform using the wizard, the data in the subform can be viewed in either Datasheet view or Form view.

To use a query to include fields from more than one table: a form can be based on a query, and a query can display limited or sorted information from one or more tables.

To create the query:
1. Click the Queries button, then click the New button to open a new query window.
2. Add the two tables whose data the form is to display.
3. Connect the tables with a join line.
4. Drag the fields from the field lists to the QBE grid.
5. Save and close the query.

To base a form on the query:
1. Click the Forms button in the database window.
2. Click the New button to display the New Form dialog box.
3. Select the query just created from the list box.
4. Click OK, then choose the required fields, a format and a title for the form.
5. Click the Finish button to save and open the form.

Check Your Progress

Using the tables created previously:
1. Create forms to view data.
2. Add, delete and save records through the forms created.
3. Change the structure of the form in Design view.

7.5 CREATING REPORTS

7.5.1 Introduction


Reports are used to present data on paper. A report is information organized and formatted to fit some specification; examples are employee details and department details. In Microsoft Access, design elements such as text, data, pictures, lines, boxes and graphs are used to create reports. A report design can be saved and used again and again; each time, the data current at that moment is printed. You can create reports that:
1. Organize and present data in groups.
2. Calculate running totals, group totals, grand totals and percentages of totals.
3. Include subreports and graphs.
4. Present data in an attractive format with pictures, lines and special fonts.

7.5.2 Create a Report


1. Click the Reports button in the database window.
2. Click the New button to display the New Report dialog box.
3. Choose Report Wizard from the dialog box and click OK.
4. Make the following choices through the wizard's dialog boxes:
   i. Choose the fields you want on the report. Fields can come from more than one table or query; for example, Emp_no, Emp_name and Basic_salary from the Employee table and Dept_name from the Department table.
   ii. Choose how you want to view the data, for example by department.
   iii. Add grouping levels.
   iv. Select the sort order and summary options for the detail records. For example, choose ascending order of Emp_no and Emp_name, descending order of Basic_salary, and the summary options Sum, Min and Max.
   v. Choose a layout for the report.
   vi. Select a style for the report.
   vii. Give the report a title and click the Finish button to create the report and open it in Print Preview.

Report in Print Preview:

7.5.3 Preview, Print and Save a report


After the wizard creates the report, Microsoft Access displays it as it would appear in print. To see a whole page of the report, position the pointer over the report in Print Preview and click; click the report again to zoom back in and view the data. To scroll within a page, use the horizontal and vertical scroll bars; to move to other pages, click the page buttons. To print the report, select Print from the File menu, choose the appropriate options in the Print dialog box and click OK. To close the report, select Close from the File menu.

7.5.4 Report in Design View


The design of a report can be modified in Design view. To open a report in Design view:
1. Click the Reports button in the database window.
2. Select the report to modify.
3. Click the Design button to open the report in Design view.

In Design view, the report is divided into sections: report header and footer, page header and footer, group header and footer, and the detail section. The report header and footer print their information once in the report. The page header and footer print on every page. The group header and footer print each time the group (the field by which the report is grouped) changes. The detail section prints once for each record.
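The order in which report sections print can be illustrated with a small hypothetical Python sketch (not Access code): the report header prints once, a group header and group footer print each time the department changes, the detail line prints once per record, and the report footer prints once with the grand total. Page headers and footers are omitted because the output is not split into pages; the function `build_report` and the data are made up.

```python
from itertools import groupby

# Hypothetical sketch of how report sections print (not Access itself).
records = [  # (DEPT_NAME, EMP_NAME, BASIC_SALARY), made-up sample data
    ("Accounts", "Asha Rao", 25000),
    ("Accounts", "Ravi Kumar", 18000),
    ("Sales", "Meena Shah", 30000),
]


def build_report(rows):
    lines = ["EMPLOYEE SALARY REPORT"]                      # report header
    grand_total = 0
    # groupby only groups adjacent rows, so sort by the grouping field first.
    for dept, group in groupby(sorted(rows), key=lambda r: r[0]):
        lines.append(f"Department: {dept}")                # group header
        group_total = 0
        for _, name, salary in group:
            lines.append(f"  {name:<12}{salary:>8}")        # detail section
            group_total += salary
        lines.append(f"  Total for {dept}: {group_total}")  # group footer
        grand_total += group_total
    lines.append(f"Grand total: {grand_total}")             # report footer
    return lines


print("\n".join(build_report(records)))
```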

Check Your Progress


Using the tables created previously and/or related queries, generate the following reports:
1. List of students with marks greater than 60 in English.
2. List of students whose average is greater than 80.
3. List of items for a transaction date.
4. Day-wise transactions for each month, under a heading for each month, showing the total transactions at the end.

7.6 SUMMARY
A database is a collection of data related to a particular topic. Database, typically consists of a heading that describes the type of information it contains, and each row contains some information. In database terminology, the columns are called fields and the rows are called records. This kind of organization in a database is called a Table. A DBMS is a system that stores and retrieves information in a database. Data management involves creating, modifying, deleting and adding data in files and using this data to generate reports or answer adhoc queries. The software that allows us to perform these functions easily is called a DBMS. In this chapter we have introduced MS-Access. Microsoft Access is a relational DBMS. Microsoft Access is also a database like any other database. In MS-Access unlike other databases it is possible to display an image on screen apart from all the other details, that is you can store pictures in Access but not in other databases. A Microsoft Access database is a collection of database files, which are also known as tables. And each table is a collection of records, and a record is a collection of fields. Each record in a table contains the same set of fields and each field contains the same type of information for each record. In MS-Access, a Query is a question you ask about the data in your database. The answer to the question can be from a single table or several tables; the query brings the data together. A form is a customized way of viewing, entering and editing records in the database. You can specify how data is to be displayed when you design the form. Forms can be created to resemble more closely the way data would be entered on paper form so that the user feels familiar with the operation Forms and queries present the data on screen. Reports are used to present data on printed paper. A Microsoft Access database is a collection of objects. 
A database file contains the tables, queries, forms and reports that help you to use the information in the database. When a database is opened, Microsoft Access displays its database window inside the Microsoft Access window. From this window you can create and use any object in your database, as well as the other features of Microsoft Access. Tables, queries, forms, reports, macros and modules are the objects of an Access database, and the object buttons in the database window provide direct access to every object in the database.

A table is a collection of data stored about a particular subject, presented in columns and rows. We discussed how to create a Microsoft Access database; create, save and close a table; add, edit and save records; modify fields in a table; modify columns and rows in a datasheet; and apply a validation rule to a field.

A table is used to store data, and the stored data can be retrieved whenever required. Data stored in a table can be viewed in many ways based on some criteria; we learnt how to find, filter, query and sort data, as well as how to view it. A query or a filter is used to view the records of a table in raw form. To view the data in a customized way we use forms. A form provides an easy way to view all the values for one record; switch to the datasheet view of the form to see all the records for that form. A form offers the most convenient layout for entering, changing and viewing the records in the database, and the form design tools in Microsoft Access help you design forms that present data in an attractive format with special fonts and other effects. We learnt how all of this is done.

Reports are used to present data on paper. A report is information organized and formatted to fit some specification, for example employee details or department details. In Microsoft Access, design elements such as text, data, pictures, lines, boxes and graphs are used to create reports. A report design can be saved and used again and again; each time it is printed, the data current at that time is printed. We also discussed all these aspects of Microsoft Access.
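The filter, sort and validation-rule operations summarised above can be imitated in a few lines. This is a minimal sketch that assumes Python's built-in sqlite3 module in place of Access, and it treats a SQL CHECK constraint as an approximate analogue of an Access validation rule. The student table, its fields and its records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK constraint plays a role similar to a validation rule:
# the DBMS refuses any record whose marks fall outside 0-100.
conn.execute(
    "CREATE TABLE student ("
    " roll_num INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " marks INTEGER CHECK (marks BETWEEN 0 AND 100))"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Kiran", 72), (102, "Anil", 45), (103, "Divya", 88)],
)

# Filter: keep only the records that meet a criterion (marks greater than 50).
passed = conn.execute(
    "SELECT name, marks FROM student WHERE marks > 50 ORDER BY roll_num"
).fetchall()

# Sort: view every record, ordered by name.
by_name = [row[0] for row in conn.execute("SELECT name FROM student ORDER BY name")]

# A record that breaks the validation rule is rejected by the DBMS.
try:
    conn.execute("INSERT INTO student VALUES (104, 'Ravi', 120)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(passed)
print(by_name)
print(rejected)
```

The design choice here is to let the DBMS enforce the rule at insertion time, which mirrors how Access refuses a value that fails a field's validation rule instead of leaving the check to whoever enters the data.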

Check Your Progress


1. Microsoft Access is a relational ___________.
2. A Microsoft Access database is a collection of database files, which are also known as __________.
3. A _________ is a question you ask about the data in your database.
4. __________ can be used to view the data.
5. Forms and queries present the data on ____________.
6. Reports are used to present data on __________.
7. To quit Microsoft Access, choose ________ from the File menu.
8. The ______ button in the database window displays the list of tables stored in the database.
9. The data in a _________ is presented in columns and rows.
10. A database name can be up to ________ characters long.
11. The number of tables to be created is based on the user's __________.
12. A table name can be up to ________ characters long.
13. Click on ________ to create an empty database file.
14. To save and name a table, select ________ from the File menu.
15. To close a table, select _______ from the File menu.
16. Reports are used to present data on _________.


Answers
1. DBMS
2. Tables
3. Query
4. Forms
5. Screen
6. Printed paper
7. Exit
8. Table
9. Table
10. Eight
11. Requirements
12. 64
13. Create
14. Save
15. Close
16. Paper

Answer the Following Questions


1. Why should one choose MS-Access over other DBMSs? Justify.
2. Explain how data is stored in an MS-Access database.
3. Explain the following terms: i) query ii) form iii) report
4. Explain how you start Microsoft Access.
5. Explain how a database is opened and closed.
6. Explain the different types of bars available when the database window is opened.
7. Explain the steps involved in creating a Microsoft Access database.
8. What are the data types available in MS-Access? Explain.
9. Explain how a table is created, saved and closed.
10. What are the field properties available? Explain.
11. Explain how records can be added, edited and saved.
12. Explain how to modify fields in a table.
13. Explain briefly the following operations: i) Find a value ii) Find and replace a value iii) Create and apply a filter iv) Sort records v) Create a query
14. Explain how forms can be used to view, add, delete and save records.
15. What is a report in Microsoft Access? Explain.
16. How is a report created? Explain.

MORE EXERCISES FOR HANDS-ON SESSIONS


Note: Try these exercises to become more familiar with MS-Access.
1. Create a table BSIT_STUDENT to store the details of marks scored by a student.

   Field        Type      Width   Constraint
   Roll_num     Numeric   5       Unique
   Name         Text      20
   Sem          Numeric   2
   OS           Numeric   3
   DSC          Numeric   3
   DC           Numeric   3
   BDBMS        Numeric   3
   DS_Lab       Numeric   3
   BDBMS_Lab    Numeric   3

   After creating the table, do the following:
   1. Set the field properties of each field.
   2. Modify the table BSIT_STUDENT to include the following fields:

      Field            Type      Width   Constraint
      Aggregate        Numeric   4
      Average          Numeric   5       2 decimal places
      Class_obtained   Text      10

   3. Apply the necessary validation rules to each field.
   4. Add 25 records.

   For the table created above:
   1. Apply filters to list students with marks greater than 50.
   2. Sort students by name.
   3. Create queries to list students with marks greater than 50.

   Using the table created above:
   1. Create forms to view data.
   2. Add, delete and save records through the forms created.
   3. Change the structure of the form in design view.

   Using the table created above, generate the following reports:
   1. List of students with marks greater than 50 in OS.
   2. List of students whose average is greater than 70.
   3. List of students who got first class with distinction.
   4. List of students who have failed.

2. Create a table DAILY_TRANSACTION with the following fields.

   Field        Type      Width   Constraint
   Trans_No     Numeric   5       Unique
   Item_No      Numeric   5
   Item_name    Text      25
   Trans_date   Date
   Quantity     Numeric   5

   After creating the table, do the following:
   1. Set the field properties of each field.
   2. Apply the necessary validation rules to each field.
   3. Add 30 records with different transaction numbers, item numbers and dates.

   For the table created above:
   1. Apply filters to list items sold with quantity more than 100.
   2. Sort all transactions in increasing order of date.
   3. Create queries to list items sold with quantity less than 60.

   Using the table created above:
   1. Create forms to view data.
   2. Add, delete and save records through the forms created.
   3. Change the structure of the form in design view.

   Using the table created above, generate the following reports:
   1. List of items sold on a specific date.
   2. List of items sold with quantity more than 200.
   3. List of a particular item sold on different dates.
   4. List of items sold more than once on a particular day.

3. Consider the car insurance database given below. The primary keys for each table are underlined. Create all the tables with the fields, field types and sizes given for each table.

   PERSON

   Field       Type      Width   Constraint
   Driver_ID   Text      20      Unique
   D_name      Text      30
   D_address   Text      30

   CAR

   Field       Type      Width   Constraint
   Regnum      Text      20      Unique
   Model       Text      15
   Year        Numeric   4

   ACCIDENT

   Field           Type      Width   Constraint
   Report_number   Numeric   4       Unique
   A_Date          Date
   A_Location      Text      30

   OWNS

   Field           Type      Width   Constraint
   Driver_ID       Text      20      Unique
   Regnum          Text      20

   PARTICIPATED

   Field           Type      Width   Constraint
   Driver_ID       Text      20      Unique
   Regnum          Text      20
   Report_number   Numeric   4
   Damage_amount   Numeric   6
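The instruction to specify primary keys and foreign keys properly can be illustrated for three of the tables above. The sketch below assumes Python's built-in sqlite3 module in place of Access, maps the Text and Numeric types to SQLite's TEXT and INTEGER, and assumes that OWNS is keyed on the pair (Driver_ID, Regnum); the sample rows are hypothetical. It shows the DBMS refusing a duplicate key and a reference to a driver who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE person (driver_id TEXT PRIMARY KEY, d_name TEXT, d_address TEXT)")
conn.execute("CREATE TABLE car (regnum TEXT PRIMARY KEY, model TEXT, year INTEGER)")
# OWNS links a driver to a car. Assumed key: the pair of fields.
# Each field must refer to an existing row in PERSON or CAR (foreign keys).
conn.execute(
    "CREATE TABLE owns ("
    " driver_id TEXT REFERENCES person(driver_id),"
    " regnum TEXT REFERENCES car(regnum),"
    " PRIMARY KEY (driver_id, regnum))"
)

# Hypothetical sample rows.
conn.execute("INSERT INTO person VALUES ('D1', 'Suresh', 'Mysore')")
conn.execute("INSERT INTO car VALUES ('KA09-1234', 'Indica', 2004)")
conn.execute("INSERT INTO owns VALUES ('D1', 'KA09-1234')")


def try_insert(sql):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


dup = try_insert("INSERT INTO owns VALUES ('D1', 'KA09-1234')")     # repeats the primary key
orphan = try_insert("INSERT INTO owns VALUES ('D9', 'KA09-1234')")  # driver D9 is not in PERSON
print(dup, orphan)
```

Both attempts are rejected, so OWNS keeps only the single valid ownership record. The same pattern extends to ACCIDENT and PARTICIPATED.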

   After creating all the above tables, properly specifying the primary keys and foreign keys:
   i) Enter at least 10 tuples for each relation, using data of your choice.
   ii) Write a query to count the number of drivers.
   iii) Write a query to count the number of cars of each model.
   iv) Write a query to find the total number of accidents.
   v) Write a query to find the total number of accidents on a specified date.
   vi) Write queries to obtain:
       a. Regnum of cars that participated in more than one accident
       b. Driver_ID of drivers involved in more than one accident
       c. Sum of the damage amount
       d. Average of the damage amount
       e. Maximum damage amount paid
       f. Minimum damage amount paid
   vii) Find the total number of people who owned cars that were involved in accidents in 2005.
   viii) Find the number of accidents in which cars belonging to a specific model were involved.
   ix) Update the damage amount to 25000 for the car with a specific Regnum in the accident with report number 12.
   x) Add a new accident to the database.
   xi) Generate some suitable reports of your choice.
   xii) Create forms to view data.
   xiii) Add, delete and save the data through these forms.

4. Consider the following database of student enrolment in courses and the books adopted for each course. The primary keys for each table are underlined. Create all the tables with the fields, field types and sizes given for each table.

   STUDENT

   Field        Type      Width   Constraint
   Regno        Text      10      Unique
   Name         Text      30
   Major        Text      20
   B_date       Date

   COURSE

   Field        Type      Width   Constraint
   Course_num   Numeric   4       Unique
   C_name       Text      15
   Dept         Text      20

   ENROLL

   Field        Type      Width   Constraint
   Regno        Text      10
   Course_num   Numeric   4
   Sem          Numeric   2
   Marks        Numeric   3

   BOOK_ADOPTION

   Field        Type      Width   Constraint
   Course_num   Numeric   4
   Sem          Numeric   2
   Book_ISBN    Numeric   6

   TEXT

   Field        Type      Width   Constraint
   Book_ISBN    Numeric   6
   Book_title   Text      40
   Publisher    Text      25
   Author       Text      30

   After creating all the above tables, properly specifying the primary keys and foreign keys:
   i) Enter at least 10 tuples for each relation, using data of your choice.
   ii) Write a query to count the number of text books.
   iii) Write a query to count the number of books by each publisher.
   iv) Write a query to find the total number of students.
   v) Write a query to find the total number of students enrolled.
   vi) Demonstrate how you add a new book to the database and have it adopted by some department.
   vii) Produce a list of text books, in alphabetical order, for courses offered by the CS department that use more than two books.
   viii) List any department that has all its adopted books published by a specific publisher.
   ix) Add a new record to the database.
   x) Generate some suitable reports of your choice.
   xi) Create forms to view data.
   xii) Add, delete and save the data through these forms.

5. Define a database in Microsoft Access with the following schema:

   Students (Name, Rollnum)
   Courses (Title, Code)
   Classes (Code, Rollnum)

   a. Provide data of your choice for each table.
   b. Write a query to list Name, Rollnum, Title and Code.
   c. Prepare a report showing, for each student name, the titles of the courses taken by the student.

CHECK YOUR OVERALL PROGRESS


Say True or False
1. A database system is a collection of a database, a DBMS, hardware and users.
2. Information is processed data.
3. Integrated databanks stored in computer systems are called databases.
4. ASCII means All Standard Code for Information Interchange.
5. A field consists of a grouping of characters.
6. A group of related records is a data FILE.
7. A DATABASE is an integral collection of logically related records or objects.
8. The person who has central control over the database system is called the Database Manager.
9. In database systems, redundancy can be minimized.
10. A significant disadvantage of the DBMS is its cost.
11. The internal view is closest to the user.
12. File-based applications are data dependent.
13. In the relational model, data is stored in the form of tables.
14. The schema is the skeleton of a database.
15. A record is a collection of fields.
16. Sequential files are more suitable for answering queries.
17. An index sequential file consists of the data and one or more levels of indexes.
18. The ER model describes data as entities, relationships and attributes.
19. Multivalued attributes are displayed in double ovals.
20. A key consisting of two or more attributes is called a composite key.
21. In ER diagrams, relationship types are displayed as rectangular boxes.
22. In the relational model, each column has a unique name.
23. In the relational model, two or more tables can have the same name.
24. A set in the network model represents a 1:M relationship between the owner and the member.
25. Hash file organization provides very fast access to records on certain search conditions.
26. The DBA is responsible for the overall security of the database system.
27. The first step in designing the database is to make the table structure.
28. A table, when first created, is an empty container for data.
29. Reports are used to present data on paper.
30. The report header and footer print information once in the report.

Answers to Say True or False


1. T    2. T    3. T    4. F    5. T
6. T    7. T    8. F    9. T    10. T
11. F   12. T   13. T   14. T   15. T
16. F   17. T   18. T   19. T   20. T
21. F   22. T   23. F   24. T   25. T
26. T   27. T   28. T   29. T   30. T

Fill in the Blanks


1. Information is processed _________.
2. ASCII stands for _________.
3. A _______ consists of a grouping of characters.
4. Related data fields are grouped to form a ___________.
5. A group of related records is a _____________.
6. A _________ is an integral collection of logically related records or objects.
7. The person who has central control over the database system is called the _____________________.
8. In database systems, redundancy can be ___________.
9. A significant disadvantage of the DBMS is its _______.
10. The internal view is closest to the ______________.
11. The schema is the __________ of a database.
12. A record is a collection of ________.
13. A sequential file that is indexed is called an ___________.
14. Each entity has __________.
15. __________ attributes are displayed in double ovals.
16. In ER diagrams, relationship types are displayed as ________ boxes.
17. The relational data model is based on ________ algebra.
18. CODASYL expands to ____________.
19. The hierarchical data model organizes data in a _______ structure.
20. Hashing for _______ files is called external hashing.
21. The ______ is responsible for the overall security of the database system.
22. Reports are used to present data on __________.
23. The report header and footer print information ________ in the report.
24. The page header and footer print information on ________ page.
25. To close a form, select Close from the ________ menu.

Answers
1. Data
2. American Standard Code for Information Interchange
3. Field
4. Record
5. Data file
6. DATABASE
7. Database Administrator
8. Minimized
9. Cost
10. Physical storage
11. Skeleton
12. Fields
13. Index sequential file
14. Attributes
15. Multivalued
16. Diamond-shaped
17. Relational
18. Conference on Data Systems Languages
19. Tree
20. Disk
21. DBA
22. Paper
23. Once
24. Every
25. File

Answer the following questions


1. Explain the difference between data and information, giving examples.
2. List and explain the tasks handled by DBMS packages.
3. Explain how data is classified, giving examples.
4. Explain briefly: i) Data representation ii) Data size iii) Relationship
5. Explain briefly, giving examples: i) Character ii) Field iii) Record iv) File
6. Explain briefly the four components of a database system.
7. Explain briefly the disadvantages of file-oriented systems.
8. Explain briefly the advantages and disadvantages of file-oriented systems.
9. Explain the three-level architecture of DBMS.
10. What are the three schemas associated with DBMS? Explain.
11. What is data independence? Explain its types.
12. Explain briefly the objectives and components of DBMS.
13. Explain briefly how databases are classified.
14. Explain briefly the important data models.
15. Explain the primary use of external storage devices.
16. Explain briefly the different types of records.
17. What is meant by file organization? Explain its different types.
18. Explain briefly the types of indexes.
19. Explain the structure of an index sequential file.
20. Explain briefly the available hashing techniques.
21. Explain the five phases of the life cycle of the database system development process.
22. What is an entity? Give examples.
23. Explain the different types of attributes, giving examples.
24. What is a null value? Explain with an example.
25. Explain the following, giving examples: i) Super key ii) Candidate key iii) Primary key
26. Explain the ternary relationship, giving an example.
27. Explain the different types of mappings, giving examples.
28. Explain the relational model, giving an example.
29. Explain the relational rules.
30. Give the differences between the relational, network and hierarchical data models.
31. Explain the object/relational and object-oriented models.
32. What is a collision? When does it occur? Explain.
33. Explain external and internal hashing.
34. Explain briefly the issues of security.
35. Explain how the DBA is responsible for the overall security of the database.
36. Explain how a database is created in MS Access.
37. Explain how a form is created in MS Access.
38. Explain briefly how reports are created in MS Access.
