Introduction to DBMS
1.0 OBJECTIVES
Database definition
Concept behind data processing
Flat file and its disadvantages
Database and Database system
Advantages and disadvantages of database system
Data independence
Architecture of database system
Databases and Their management
Objectives of DBMS
Components of DBMS
Types of Databases
Database Models
1.1 INTRODUCTION
A database is a collection of operational data that is organized so that its contents can easily be accessed, managed, and updated. A database contains aggregations of data records or files, such as sales transactions, purchase details, product information, inventory records and customer details. A database system is a collection of a database, a DBMS, hardware and users. A Database Management System is application software, supplied by vendors, that helps in managing the database. Database management is an important aspect of data processing. It involves several data models, which have evolved into different DBMS software packages. These packages demand a certain knowledge of the discipline and its procedures in order to use them effectively in data processing applications.
The observed data is usually represented by symbols such as numbers, words and codes (composed of a mixture of numerical, alphabetical and other characters). It could even take other forms such as voice, images, pictures and drawings. If the observed or collected data is converted into a useful and meaningful form, it becomes Information. Data is usually subjected to a value-added process called Data Processing or Information Processing, where
1. its form is aggregated, manipulated and organized,
2. its content is analyzed and evaluated, and
3. it is presented in a context meaningful to a human user.
Thus we see that information is processed data, placed in a context that gives it value for specific end users.

Difference between Data and Information

Sl. No. | Data | Information
1. | Data is raw fact and figure. Ex.: 90 is data. | Data when stored in some form, like marks: 90, becomes information.
2. | Data is not significant to a business. | Information is significant to a business.
3. | Data are atomic level pieces of the Information. | Information is a collection of data.
4. | Data does not help in decision-making. | Information helps in decision-making.
5. | Data is generally in unorganized form. | Information is in organized form.
6. | Data is collected from the source directly and hence is not dependent on Information. | Information is dependent on the data that is gathered.
Information may further be processed to form knowledge. Information containing wisdom is known as knowledge. So, we can say that: Data, when processed, gives Information; Information, when processed, gives Knowledge.
systems, expert systems etc., which are used by different levels of management in a business organization. In spite of all these variations and differences, all Information systems have some things in common.
1. They all use some kind of computerized technique to store all the data and information generated in the system.
2. They all access the stored information in different ways for further processing or presentation.
Thus we see that data storage and retrieval is one of the central activities in Information processing. Such a collection and organization of information is called a Data bank. In the early days of business, Data banks existed in the minds of key personnel in the business. As the volume and complexity increased, several tools like books, records, manuals, drawings etc. were devised as Data banks, and manual procedures and skills evolved to retrieve information from these banks when needed. However, these techniques were not reliable and fast enough when the information involved was huge and complex. Hence business decisions could not be accurate and timely. To correct this lacuna, Information systems were computerized. The speed and accuracy of computers resulted in a tremendous improvement in the reliability and timeliness of the information generated. This process, however, involved the development of techniques and tools to handle data banks on computers, namely the tools to store and retrieve information in computers. The development of such techniques and tools resulted in what are known as DBMS packages.
Integrated databanks stored in Computer Systems are called Databases. The Computer Software Packages (sets of tools and utilities) that facilitate the creation, use and management of Databases are called DBMS (Data Base Management Systems). A DBMS provides the computational capacity to store, retrieve, edit, sort and perform computations, including statistics, upon data which it extracts from its storage. The tasks handled by DBMS packages can be classified as:
a. Database Development - Define and organize the content, relationships and structure of the data needed to build a database.
b. Database Interrogation - Access the data in a database to display information in various formats. Users can selectively retrieve and display information and produce forms, reports, and other documents.
c. Database Maintenance - Add, delete, update, and correct the data in a database.
d. Application Development - Develop prototypes of queries, presentation forms and reports for a proposed business application.
Let us try and understand these tasks in detail later. First let us start a detailed study of Data.
0 and 1, some kind of coding is needed. Integers are represented directly as binary numbers. Real numbers are represented using a technique called floating point representation. Strings are represented through a character coding scheme called ASCII (American Standard Code for Information Interchange). ASCII defines a 7-bit code for each character, which is normally stored in an 8-bit byte. Example: Letter A is 01000001; Letter B is 01000010, etc. (You will have the details of data representations in other modules.) Even pictorial/image, voice/audio and video data gets coded into a large number of 0s and 1s.
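The coding of characters, integers and real numbers into 0s and 1s can be illustrated with a small Python sketch. The helper names below (char_to_bits, int_to_bits, float_to_bits) are made up for this illustration, and the floating point example assumes the 32-bit IEEE 754 format, which is one common floating point representation rather than the only one.

```python
import struct


def char_to_bits(ch: str) -> str:
    """Return the 8-bit pattern that stores one ASCII character."""
    return format(ord(ch), "08b")


def int_to_bits(n: int, width: int = 8) -> str:
    """Return a non-negative integer as a fixed-width binary string."""
    return format(n, f"0{width}b")


def float_to_bits(x: float) -> str:
    """Return the 32-bit IEEE 754 single-precision pattern of x (assumed format)."""
    (packed,) = struct.unpack(">I", struct.pack(">f", x))
    return format(packed, "032b")


print("A   ->", char_to_bits("A"))
print("B   ->", char_to_bits("B"))
print("90  ->", int_to_bits(90))
print("1.0 ->", float_to_bits(1.0))
```

In ASCII the letter A has the code 65 and B has the code 66, so their bit patterns differ only in the last two bits.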
1.2.3.3 Relationship
Even though data items are individual entities, they never occur in isolation in the real world. They are always associated with other data items. Ex: The data item price is related to the vehicle in question, the date of transaction and the seller. There are 3 different types of data relationships. Let us understand each one of them.
Simplest of all is the 1 : 1 relationship. For each value of a data item there is one and only one corresponding value in the other item. E.g.: Student ID and the student name. E.g.: Vehicle number and vehicle. Normally all such data items are grouped and kept together as a record.
The second type of relationship is one to many (1 : M). Here, for every value of one data item there are several values of the other data item. In the reverse direction, however, each value of the other data item is related to a unique value of this data item. E.g.: 1. A book has several chapters, but each chapter belongs to one and only one book. 2. A person can own several vehicles; each vehicle has only one owner. One to many relationships can be represented in computers using pointers and arrays. (Details later)
The third type of relationship is called Many to Many (N : M). Most of the relationships in the real world are of this type. E.g.: 1. A student has several teachers; a teacher might have several students. 2. A book can have several authors; an author might have written several books. This type of relationship is difficult to represent and handle in computers. Hence, as far as possible, we try to reduce it to two one to many relations (1 : M and N : 1) and eliminate the one which is irrelevant to the user.
The Database must maintain all the data and their relationships and allow the user to access data based on these relations. E.g.: Get me all vehicles owned by a person. Get me the subjects taught by a teacher.
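The one to many and many to many relationships described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the tables (person, vehicle, teacher, subject, teaches), the sample rows and the helper functions are all hypothetical. The linking table teaches shows one common way of reducing an N : M relationship to two 1 : M relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (person_id INTEGER PRIMARY KEY, name TEXT);
-- 1:M, each vehicle row points at exactly one owner
CREATE TABLE vehicle (vehicle_no TEXT PRIMARY KEY,
                      owner_id   INTEGER REFERENCES person(person_id));
CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, title TEXT);
-- N:M reduced to two 1:M links through a linking table
CREATE TABLE teaches (teacher_id INTEGER REFERENCES teacher(teacher_id),
                      subject_id INTEGER REFERENCES subject(subject_id),
                      PRIMARY KEY (teacher_id, subject_id));
""")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO vehicle VALUES (?, ?)",
                 [("KA-01-1111", 1), ("KA-01-2222", 1), ("KA-02-3333", 2)])
conn.executemany("INSERT INTO teacher VALUES (?, ?)", [(1, "Meena"), (2, "John")])
conn.executemany("INSERT INTO subject VALUES (?, ?)", [(1, "DBMS"), (2, "Networks")])
conn.executemany("INSERT INTO teaches VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])


def vehicles_owned_by(owner_name):
    """'Get me all vehicles owned by a person.'"""
    rows = conn.execute(
        "SELECT v.vehicle_no FROM vehicle v "
        "JOIN person p ON p.person_id = v.owner_id "
        "WHERE p.name = ? ORDER BY v.vehicle_no", (owner_name,))
    return [r[0] for r in rows]


def subjects_taught_by(teacher_name):
    """'Get me the subjects taught by a teacher.'"""
    rows = conn.execute(
        "SELECT s.title FROM subject s "
        "JOIN teaches t ON t.subject_id = s.subject_id "
        "JOIN teacher th ON th.teacher_id = t.teacher_id "
        "WHERE th.name = ? ORDER BY s.title", (teacher_name,))
    return [r[0] for r in rows]


def teachers_of(subject_title):
    """The same linking table answers the question in the other direction."""
    rows = conn.execute(
        "SELECT th.name FROM teacher th "
        "JOIN teaches t ON t.teacher_id = th.teacher_id "
        "JOIN subject s ON s.subject_id = t.subject_id "
        "WHERE s.title = ? ORDER BY th.name", (subject_title,))
    return [r[0] for r in rows]


print(vehicles_owned_by("Asha"))
print(subjects_taught_by("Meena"))
print(teachers_of("DBMS"))
```

Because the linking table holds one row per (teacher, subject) pair, neither the teacher data nor the subject data needs to be duplicated.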
1.2.4.1 Character
Character is the most basic logical data element, which consists of a single alphabetic / numeric or other symbol. E.g.: The grade obtained in a subject could be A or B or C or D or E. Sex of a person could be M or F. Subject taught during hour.
1.2.4.2 Field
Field is the next higher level of data. A field consists of a grouping of characters. E.g.: 1. A person's name field will be a grouping of alphabetic characters. 2. A sales amount field will be a grouping of numeric characters. 3. Teacher teaching the subject for a class. A field represents an attribute of some entity (object, person, place, or event). E.g.: An employee's salary is an attribute that is a typical data field associated with the entity employee (in 1 : 1 relation).
1.2.4.3 Record
Related data fields are grouped to form a RECORD. A record is thus a collection of attributes that describe an entity. E.g.: 1. An employee record could consist of attributes like his ID, his name, the salary he draws etc. 2. Set of subjects taught for a class during each hour.
1.2.4.4 File
A group of related records is a data FILE. E.g.: 1. A group of all employee records, with one record for each employee, could be an employee file. 2. Timetable for a class for a week showing subjects taught each hour on each day of the week. Files are frequently classified by the application for which they are primarily used, such as payroll file, inventory file etc.
1.2.4.5 Database
A DATABASE is an integrated collection of logically related records or objects. It consolidates records stored in various files into a common pool of data records that provides data for several users. Fig. 1.1 shows the databases, files, records and fields. E.g.: 1. The timetable for an entire school showing the details of classes, subjects, rooms, teachers etc. 2. A Personnel database consolidates data files like payroll files, personnel action files, employee skill files etc.
Fig. 1.1 Database, Files, Records and Fields (labels in the figure: Payroll File, Inventory File, Employee Rec # 1, Malt Rec # 2)
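The logical hierarchy of character, field, record, file and database can be sketched in Python as nested structures. The class names (DataFile, Database), the field names and the sample values are assumptions made only for this illustration; a real DBMS stores these elements very differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# A record groups related fields: field name -> field value.
Record = Dict[str, object]


@dataclass
class DataFile:
    """A file is a group of related records."""
    name: str
    records: List[Record] = field(default_factory=list)


@dataclass
class Database:
    """A database consolidates several files into one common pool."""
    name: str
    files: Dict[str, DataFile] = field(default_factory=dict)

    def add_file(self, data_file: DataFile) -> None:
        self.files[data_file.name] = data_file


payroll = DataFile("payroll", [
    {"emp_id": 1, "name": "Asha", "salary": 12000},
    {"emp_id": 2, "name": "Ravi", "salary": 9500},
])
inventory = DataFile("inventory", [
    {"item_id": 101, "description": "Steel rod", "quantity": 40},
])

company = Database("company")
company.add_file(payroll)
company.add_file(inventory)

first = company.files["payroll"].records[0]
print(sorted(company.files))       # files in the database
print(list(first))                 # fields in one record
print(list(str(first["name"])))    # characters in one field
```

Reading the three print statements from top to bottom walks down the hierarchy: database to files, file to record fields, field to characters.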
as wastage of storage space. If the customer address is to be changed in future, it must be changed in more than one place; this may lead to inconsistency of the data.
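The inconsistency problem can be sketched in a few lines of Python. The two flat files (orders_file and billing_file), their field names and the sample addresses are hypothetical; each file keeps its own copy of the customer address, so an update applied to only one of them leaves the copies in disagreement.

```python
# Two separate flat files, each holding its own copy of the customer address.
orders_file = [
    {"order_id": 1, "customer": "Asha", "address": "12 MG Road"},
    {"order_id": 2, "customer": "Ravi", "address": "7 Park Street"},
]
billing_file = [
    {"invoice": 501, "customer": "Asha", "address": "12 MG Road"},
]


def update_address(flat_file, customer, new_address):
    """Change the address of a customer in ONE flat file only."""
    for record in flat_file:
        if record["customer"] == customer:
            record["address"] = new_address


def inconsistent_customers(*flat_files):
    """Return customers whose address differs between the given files."""
    seen = {}
    for flat_file in flat_files:
        for record in flat_file:
            seen.setdefault(record["customer"], set()).add(record["address"])
    return sorted(name for name, addresses in seen.items() if len(addresses) > 1)


# The address is changed in the orders file, but the billing file is forgotten.
update_address(orders_file, "Asha", "45 Brigade Road")
print(inconsistent_customers(orders_file, billing_file))
```

Storing the address once and sharing it avoids both the duplicated storage and the need to remember every copy when it changes.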
5) They are more secure. | 5) They are not secure.
6) They are multiple user oriented. | 6) They are often single user oriented.
7) They have shared data. | 7) They have isolated data.
8) They have complex and sophisticated backup/recovery. | 8) They have simple, primitive backup/recovery mechanism.
1.5 DATABASE
A database is a collection of stored operational data used by the application systems of some particular organization. It is defined as the collection of logically interrelated data and description of this data, designed to meet the information needs of an organisation. For example, a dictionary, a telephone directory, student record register etc. They all store data in some particular arranged form.
Data
Each database is a repository or storage of data. The database is integrated and shared. Integrated means that the whole data is available in one single place. Shared means that individual data items in the database can be shared among several users. A database is shared by users not just sequentially but also concurrently, that is, at the same time. A database system supporting this form of sharing is called a multiuser system.
Software
A database management system (DBMS) is software that lies between the physical database and the users of the system. All requests from users for data manipulation are handled by the DBMS. The DBMS shields database users from hardware-level details and supports user operations by retrieving data for a query like "Select all the employee records whose salary is more than 10,000 per month".
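A request of this kind can be sketched with Python's built-in sqlite3 module, which hides how and where the rows are stored. The employee table, its columns and the sample rows are assumptions made for this illustration only.

```python
import sqlite3


def employees_earning_more_than(conn, threshold):
    """Return (emp_id, name, salary) rows whose salary exceeds threshold."""
    cursor = conn.execute(
        "SELECT emp_id, name, salary FROM employee "
        "WHERE salary > ? ORDER BY emp_id",
        (threshold,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", 9500.0), (2, "Ravi", 12000.0), (3, "Meena", 15500.0)],
)

for row in employees_earning_more_than(conn, 10000):
    print(row)
```

The user states only which records are wanted; locating them on the storage device is left to the DBMS.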
Hardware
The hardware of a DBMS consists of two components, namely:
a) Processor and main memory
b) Secondary storage devices.
The processor and main memory are required to support the execution of the DBMS. Secondary storage devices like hard disks, CDs etc. are used to store data.
Users
There are three types of users:
Fig: 1.1. Database users. (End users are divided into naive users and sophisticated users.)
The main reason for using a DBMS is to have central control over both the data and the programs that access the data. The person who has such central control over the database system is called the Database administrator (DBA).
Application programmers are persons who write application programs that use the database. They use programming languages to write programs that manipulate the database.
End users are persons who interact with the database using application programs written by the application programmers. They know how to use the programs, but they do not know exactly how the programs have been written. End users are classified as follows:
Naive users: They are usually unaware of the DBMS. They do not know anything about the database or the DBMS.
Sophisticated users: Sophisticated end users are familiar with the structure of the database and the facilities that the DBMS provides.
Figure 1.1 shows Database users.
The disadvantages are:
Problems associated with centralization of data: Centralization means that the data is accessed from one single place, so the data has to be maintained properly so that only authorized persons can access the data present in the database.
Cost of hardware and software: DBMS software is usually costly. Even after purchasing the DBMS software, we need to spend on developing the required software for the application. The hardware needed depends on the amount of data to be maintained and manipulated.
Cost of migration: Migrating from one type of database to another is also costly.
Complexity associated with backup and recovery: As the data in the database increases, the complexity of backup and recovery during failure increases.
Such an architecture was proposed by the American National Standards Committee on Computers and Information Processing. Most current DBMSs support, to various extents, the separation of the physical database, the conceptual schema, and the user view. The three-tier DBMS architecture is shown in figure 1.2.
The external or user view is at the highest level, where it concerns a user or an application program. Each external view is described by a schema called an external schema. This level is concerned with the way the data is viewed by individual users. End users can be of any degree of sophistication and can have different authorisations. As a result, they may need to be given different views of the database. An individual view is just a subset of the conceptual schema.
The conceptual or global view represents the entire database. The conceptual schema defines this conceptual view. The description of data at this level is in a format independent of its physical representation. It also includes features that specify various checks to maintain data consistency and integrity. It is a representation of the entire information content of the database in a form that is more abstract than the way in which the data is physically stored, i.e. it is the logical description of the entire database, the overall logical view of the data and their relationships, as seen by database developers, by the system administrators, and by the authorized users who require access to the entire database.
Fig: 1.2. Three-tier DBMS architecture. (From top to bottom: end users' views, described by External Schema A / External view A and External Schema B / External view B; External/Conceptual mappings A and B; Conceptual view; Conceptual/Internal mapping; Physical or Internal schema; Stored database (Internal view).)
The internal view is closest to the physical storage. It indicates how data will be stored and describes the data structures and access methods to be used by the database. The internal view is defined by the internal schema. It is concerned with the way that data is physically stored. This is an aspect of the database seen only by system programmers concerned with issues such as performance optimisation. Ordinary users are not to be involved at this level. The three schemas are only descriptions of data. The only place that the data is actually stored is at the physical level. Mappings between the three levels are provided by the DBMS.
In the case of DBMS, data independence is defined as the capacity to change a schema at one level of a database system without having to change the schema at the next higher level. There are two levels of independence:
Physical data independence, which insulates applications from the underlying physical storage organisation of the data, i.e. changes at the physical level do not have to affect the conceptual schema or the external schema.
Logical data independence, which insulates applications from changes made to the logical organisation of data, i.e. changes made to the conceptual schema should not affect the individual views unnecessarily.
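Both kinds of independence can be sketched with Python's built-in sqlite3 module. This is a rough analogy rather than a full three-level implementation: the table staff stands in for the conceptual schema, the view staff_directory for an external view, and the index idx_staff_name for a change at the physical level. All of these names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- conceptual level: the logical description of the data
CREATE TABLE staff (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL);
-- external level: one user's view, a subset of the conceptual schema
CREATE VIEW staff_directory AS SELECT emp_id, name FROM staff;
INSERT INTO staff VALUES (1, 'Asha', 9500), (2, 'Ravi', 12000);
""")


def directory():
    """What an application using the external view sees."""
    return conn.execute(
        "SELECT emp_id, name FROM staff_directory ORDER BY emp_id"
    ).fetchall()


before = directory()

# Physical change: add an access path. The table and the view are unchanged.
conn.execute("CREATE INDEX idx_staff_name ON staff(name)")
after_physical_change = directory()

# Logical change: extend the conceptual schema with a new column.
conn.execute("ALTER TABLE staff ADD COLUMN dept TEXT")
after_logical_change = directory()

print(before)
print(after_physical_change)
print(after_logical_change)
```

The application that reads staff_directory runs the same query before and after both changes, which is the practical benefit of data independence.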
Creation of database involves specifying data types, structures and their relationship constraints for the data stored in database. Construction of a database is the process of storing the database, by populating data in it in the computer storage medium.
Maintenance of a database includes such functions as updating and accessing the data in the database to reflect changes in the real world. E.g.: Let us consider a college environment in which we need to maintain data about class scheduling, such as:
a) Courses and sections
b) Subjects to be taught for each course
c) Teachers teaching the subjects
d) Rooms in which classes are held
e) Timing for teaching the subject.
The basic entities in this example are subjects, courses, teachers, rooms, students etc.; there will be associations or relationships linking these entities. E.g.: Subject and Teacher have an N : M association. A teacher may teach several subjects. Several teachers may teach a subject.
A user of the database gives a request for accessing the record stored in the database using data manipulation language (DML).
DBMS takes the request from the user and interprets it.
DBMS inspects the database.
DBMS carries out the required operation on the stored database.
3. Provide prompt response to the user's request for data.
4. Allow for the modification of data in a consistent manner.
5. Eliminate or reduce the redundant data.
6. Allow multiple users to be active at a time.
Languages: to create, use, and maintain the database.
Utilities: to provide support facilities such as report generation, graphical output, statistical operations, and various interfaces.
Operational routines: for run-time management such as back-up and recovery, and for concurrency control.
DBMS Languages:
The categories of DBMS language are shown in the following figure 1.3.
Fig: 1.3. Languages of DBMS. (Labels in the figure: DDL; DML; Non-procedural DML; e.g., PL/SQL; Relational algebra; Relational calculus; SQL; QBE; QUEL.)
This is the storage definition language to specify the internal schema. The schema DDL is a high-level notation for describing the record types and relationships existing in the database in terms of an underlying data model. Because of data independence, physical details such as storage structure are not involved in the DDL at this level. The view definition language (sub-schema DDL) describes a view of the database and specifies its mapping to the conceptual schema.
An interactive command language (like a DOS/UNIX command language).
A library of pre-defined procedures that may be called by application programs written in a standard programming language.
A procedural programming language, unique to a particular DBMS, but which may be based on a standard language with added facilities for data manipulation.
Some DBMSs do not provide distinct DDLs and DMLs. Instead they offer an integrated language which combines the capabilities of both the DDL and DML.
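SQL is one widely used example of such an integrated language. The sketch below, written with Python's built-in sqlite3 module, issues a data definition statement and several data manipulation statements through the same interface. The course table and its rows are assumptions for illustration; sqlite_master is SQLite's own catalog table in which the definitions are recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describe the structure of the data.
conn.execute(
    "CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
)

# DML: add, change, remove and retrieve the data itself.
conn.execute("INSERT INTO course (course_id, title) VALUES (1, 'DBMS')")
conn.execute("INSERT INTO course (course_id, title) VALUES (2, 'Networks')")
conn.execute("UPDATE course SET title = 'Computer Networks' WHERE course_id = 2")
conn.execute("DELETE FROM course WHERE course_id = 1")
titles = [row[0] for row in
          conn.execute("SELECT title FROM course ORDER BY course_id")]

# The definition created by the DDL statement is kept in SQLite's catalog.
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

print(titles)
print(tables)
```

The CREATE statement changes what the database can hold, while the INSERT, UPDATE, DELETE and SELECT statements work only on its contents.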
General features of a QL
QLs range in power and sophistication from semi-procedural interactive programming languages to very high level natural languages. High level QLs are convenient for naive end users. Most QLs lack the power of conventional languages to perform complex computations.
system. Single-user systems support only one user at a time. These types of systems are mostly used with personal computers. Multi-user systems support multiple users concurrently. The majority of DBMSs are of this type. A third criterion is the number of sites over which the database is distributed. Databases can be centralized or distributed. If the whole data is stored at one single site, the DBMS is called a centralized DBMS. In the case of a distributed DBMS, the actual database and DBMS software are distributed over many sites. A fourth criterion is the cost of the DBMS. Multi-user DBMS packages cost more compared to single-user packages.
Operational Databases
These databases store detailed data needed to support an entire organization. They are also called subject area databases (SADB), transaction databases and production databases. These databases carry up-to-date information on business activities. Business supervisors in charge of day-to-day operation most frequently use them.
Analytical Databases
These databases contain information extracted from operational databases. They are used by managers to study the trends and patterns emerging in the business in order to make strategic decisions and set policy. They are also known as data warehouses, information databases and decision support databases. They are generally used in query mode rather than update mode. Techniques like Online Analytical Processing (OLAP) and Data Mining are used in these databases to generate meaningful information for business analysis, market research etc.
Distributed Databases
Many contemporary applications are geographically distributed. The advent of networking technology has made it possible to distribute the database across several computers connected in a network. This improves local access to data and allows remote update without increasing the load on networks. Hence many organizations distribute copies or parts of databases to computer systems at different sites, linked to each other through networking. Such databases over a network of computers are known as Distributed Databases. Ensuring that all of the data in an organization's distributed databases are consistently and concurrently updated is a major challenge of Distributed Database Management.
Multimedia Databases
These databases include non-conventional data like, pictures, voice tracks along with conventional alphanumeric data. These databases tend to be huge in size and access is done through specialized access language constructs. The data accessed further needs to be interpreted and displayed by additional frontend software like Browsers and media players. From database management viewpoint, the set of interconnected multimedia data needs to be handled as specialized structures rather than simple records.
High level or conceptual data models provide concepts that are close to the way in which most users perceive the data.
Low level or physical data models provide concepts that describe the details of how data are stored at the physical level.
Representational or implementation data models are between the above. They provide concepts that may be understood by end users, and also have a close link with the low level data representations.
The important data models are:
Hierarchical data model
Network data model
Relational data model
Object oriented data model
An early data model widely used in the 70s was the HIERARCHICAL model, which captures the intuitive hierarchy of data elements. The user is allowed to navigate through the data structures using tree-like hierarchies. The early generation database from IBM, namely IMS, is based on this model. Hierarchical models cannot represent many to many relationships in an elegant fashion. Such data relationships resulted in cumbersome structures with a lot of duplication of data and slow access.
To get over these limitations, the CODASYL committee proposed a NETWORK MODEL in the 70s and 80s. IDMS from Cullinet and DMS 1100 from Unisys Corporation are typical representatives of this generation of databases. While the network model provided much more abstraction power and very good performance for large volumes of data, it lacked elegance. It required a high level of skill to use these databases. Further, it was difficult to alter the structures dynamically.
E. F. Codd of IBM later proposed an elegant and flexible RELATIONAL MODEL. Its elegance, simplicity and solid theoretical foundation made it the darling of database developers and users. Today, this is the most popular database model, available on a range of machines from PCs to mainframes. DB2 of IBM, ORACLE, INFORMIX, ACCESS, LOTUS etc. are all based on this popular model. DBMSs built using this model use SQL (Structured Query Language) as the means to create and manipulate data. SQL is an elegant, simple yet powerful interface to all relational databases.
Present day RDBMSs provide support for several other tools and utilities to ease application development. The most common utilities are:
A screen designer to generate a user-friendly, fill-in-form type interface to access and manipulate data, e.g.: ORACLE FORMS.
A report generator to access data and present it in a printed format suitable for the end user, e.g.: ORACLE REPORT GENERATOR.
Utilities to load and extract bulk data from the database, provided to speed up data loading and extraction, e.g.: Import and Export features of ORACLE.
DBA utilities to manage security and limit access to data.
Current generation DBMS packages provide most of the above utilities along with some more to manage databases effectively. They in fact create a total environment under which the user can comfortably handle all his information processing needs.
The hierarchical data model, where a database is represented by tree-like structures. If the data are not naturally hierarchical, then this model imposes quite severe restrictions on the database designer/developer.
The network data model, where a database is represented by a directed graph, the nodes of which represent the data entities (or record types), and the arcs of which define the relationships among the entities.
The relational data model, based on the mathematical notion of a relation. In this model both the data entities and their relationships are represented by two dimensional tables.
The object-oriented data model, where a database is represented by objects and their relationships. Object-oriented database systems have their origins in object-oriented programming languages (such as Smalltalk and C++). An object may be viewed as an information item that closely resembles the object in the real world. Some novel concepts, such as class hierarchies and class composition hierarchies, are provided so that the object-oriented model is closer to the high level data models and yet not far away from the physical level data models. Concepts are also provided to enable an effective transformation between the object-oriented model and a low level model.
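The difference between the first three models can be sketched by holding the same small data set in three Python structures. This is only an analogy: the class and subject names are invented, and real hierarchical, network and relational systems use far more elaborate storage. The data has a many to many relationship (a subject can belong to several classes), which is where the models differ most.

```python
# Hierarchical: a tree. "Maths" must be duplicated under each parent.
hierarchical = {
    "School": {
        "Class 10": {"Maths": {}, "Science": {}},
        "Class 11": {"Maths": {}},
    }
}

# Network: a directed graph. Nodes are record types, arcs are relationships,
# so "Maths" exists once and is pointed to by two parents.
network = {
    "Class 10": ["Maths", "Science"],
    "Class 11": ["Maths"],
    "Maths": [],
    "Science": [],
}

# Relational: entities AND relationships are two dimensional tables.
classes = [(10, "Class 10"), (11, "Class 11")]
subjects = [(1, "Maths"), (2, "Science")]
class_subject = [(10, 1), (10, 2), (11, 1)]


def count_in_tree(tree, label):
    """Count how many times a label is stored in the hierarchy."""
    total = 0
    for key, subtree in tree.items():
        total += (key == label) + count_in_tree(subtree, label)
    return total


def parents_in_network(graph, label):
    """Follow the arcs backwards to find every node pointing at label."""
    return sorted(node for node, targets in graph.items() if label in targets)


def classes_for_subject(title):
    """Join the three tables on their key columns."""
    subject_ids = {sid for sid, name in subjects if name == title}
    class_ids = {cid for cid, sid in class_subject if sid in subject_ids}
    return sorted(name for cid, name in classes if cid in class_ids)


print(count_in_tree(hierarchical, "Maths"))
print(parents_in_network(network, "Maths"))
print(classes_for_subject("Maths"))
```

The tree stores the shared subject twice, the graph stores it once but needs explicit links, and the tables express the same relationship as rows that can be queried from either side.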
1.14 SUMMARY
In this chapter we discussed many concepts of databases, database systems and database management systems. We defined a database as a collection of related data, where data is nothing but recorded facts. We also said a database is a collection of operational data that is organized so that its contents can easily be accessed, managed, and updated. A database system is a collection of a database, a DBMS, hardware and users. A Database Management System is application software, supplied by vendors, that helps in managing the database. We discussed data and information, data types, data representation, and data size. A flat file consists of only one file, with each entry in the form of a record containing all the required data defined within it. The file oriented approach to data processing suffers from a number of significant disadvantages. We said a database system has several advantages and disadvantages over traditional file based systems. The advantages are:
Redundancy can be minimized
Inconsistency can be avoided
The data can be shared
Security can be enforced
Integrity of data can be kept intact
Standards can be enforced
Backup and recovery can be provided
The disadvantages are:
Problems associated with centralization of data
Cost of hardware and software
Cost of migration
Complexity associated with backup and recovery
We discussed data independence, the most important concept in DBMS, and the architecture of DBMS. The architecture contains three views: external, conceptual and internal. Then we presented databases and their management, and the objectives of DBMS as well as its components. The role of the DBA is also most important in maintaining the database. We presented important functions of the DBA. We listed how databases are classified based on several criteria like data models, number of users, number of sites, and cost. We then discussed various types of databases, like operational databases, analytical databases, distributed databases, personal end user databases, multimedia databases and special purpose databases. Finally, we discussed briefly various data models used for maintaining databases.
CHECK YOUR PROGRESS
1. A database is a collection of ____________ data.
2. Database Management system is ______________ software.
3. Data are ________ facts.
4. DBMS expands to _________________.
5. All data in computer must be represented using only 2 symbols, namely __ and __.
6. A field represents an ________ of some entity.
7. A record is a collection of ___________ that describe an entity.
8. A group of related records is a __________.
9. A __________ is an integrated collection of logically related records or objects.
10. A flat file consists of only _______ file.
11. The person who has central control over the database system is called the ___________.
12. Application programmers are persons who write ____________.
13. The external or user view is at the _________ level.
14. ___________ view represents the entire database.
15. Internal view is closest to the _____________.
16. Meta data is stored in ______________.
17. DDL expands to ___________.
18. DML expands to ______________.
19. Hierarchical, network, relational, object and object relational are called _________.
20. Multi-user systems support multiple users ____________.
21. Business supervisors in charge of day-to-day operation most frequently use ______________ databases.
22. The databases that include non-conventional data like pictures and voice tracks along with conventional alphanumeric data are called _________ databases.
23. The three important data models are ________, ____________ and ____________.
24. ___________ committee proposed the NETWORK MODEL.
25. __________ of IBM proposed an elegant and flexible RELATIONAL MODEL.
ANSWERS
1. Operational
2. Application
3. Raw
4. Database Management Systems
5. 0 and 1
6. attribute
7. attributes
8. data FILE
9. DATABASE
10. One
11. Database administrator
12. application programs
13. highest
14. Conceptual or global
15. physical storage
16. Data dictionary
17. Data Definition Language
18. Data Manipulation Language
19. Data Models
20. Concurrently
21. Operational
22. Multimedia
23. Hierarchical, network and relational
24. CODASYL
25. Mr. Codd
3. Explain how data is classified.
4. Giving examples, explain how data is represented.
5. Explain different types of data relationships. Give examples.
6. How is data organized? Explain.
7. What is a record? Give an example.
8. What is a file? Explain. Give examples.
9. What is a flat file? Explain.
10. What are the disadvantages of the file oriented approach? Explain.
11. What is a database? What are the components of a database system? Explain.
12. Explain briefly the different types of database users.
13. Explain clearly the advantages of database systems.
14. What are the disadvantages of a DBMS? Explain.
15. What is data independence? Explain.
16. Explain the architecture of a database system with a figure.
17. Explain the schematic of database management systems.
18. List the steps associated with accessing data from the database.
19. List the objectives of DBMS.
20. What are the components of DBMS? Explain.
21. Who is the DBA? List and explain the functions of the DBA.
22. Explain the classification of DBMS.
23. Explain briefly the major types of databases.
24. Explain multimedia and special purpose databases.
25. Explain briefly three important data models.
26. Write a note on DBMS languages along with a figure.
Chapter 2
2.0 OBJECTIVES
Storage of information
Record and Record Organization
Files and file Organization
Sequential file organization
Index Sequential file organization
Direct file organization
The CPU contains registers, which hold the arguments (i.e. operands or information) of the arithmetic computations. Besides storing operands and results from arithmetic operations, registers are also used to temporarily store program instructions and control information concerning which instruction is to be executed next. Because of their highly specialized nature, registers have a great deal of combinational logic (i.e. circuitry) associated with them. This makes them expensive relative to the storage-type memory units in a computer. Consequently, registers are used only to store information temporarily.
respectively. External storage devices have a larger capacity and are less expensive per bit of information stored than main memory. The time required to access the information, however, is much greater with these devices. The primary uses of external storage devices include:
1. Backup of programs during execution.
2. Storage of programs and subprograms for future use.
3. Storage of information in files.
The most common external storage devices, in order of their initial development and use, are magnetic tape, magnetic drum and magnetic disk.
Records: Emp No: 98643, Emp No: 35679, Emp No: 34567
One record, with its fields:
Name : Raksha
Address : 23 B/29 Shivaji Nagar
Date of Birth : 23-July-1968
Blood Group : A
Doctor : Dr. Ram
Dept. : Cardiology
City : Bangalore
Fig: Records and fields.
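The record and field idea in the figure can be sketched in Python. This is only an illustration: the class name EmployeeRecord and the field names are hypothetical, and pairing employee number 34567 with the record for Raksha is an assumption, since the figure does not make clear which number belongs to which record.

```python
from dataclasses import dataclass, fields


@dataclass
class EmployeeRecord:
    """One record: a collection of fields describing a single employee."""
    emp_no: int          # assumed to act as the record key
    name: str
    address: str
    date_of_birth: str
    blood_group: str
    doctor: str
    dept: str
    city: str


# Field values are taken from the figure; the emp_no pairing is an assumption.
record = EmployeeRecord(
    emp_no=34567,
    name="Raksha",
    address="23 B/29 Shivaji Nagar",
    date_of_birth="23-July-1968",
    blood_group="A",
    doctor="Dr. Ram",
    dept="Cardiology",
    city="Bangalore",
)

# The names of the fields that make up the record.
field_names = [f.name for f in fields(record)]
```

A file would then be a collection of such records, for example a list of EmployeeRecord objects.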
between the records of the files, then such a collection of files is often referred to as a database or data bank. The following figure shows the information structure hierarchy as it applies to a file processing application.
Figure: Information structure hierarchy. A database is made up of files, each file is made up of records, and each record is made up of items.
Let us examine some of the factors that affect the organization of a file. The prime factor that determines the organization of a file is the nature of the operations to be performed on the file, as dictated by the applications. The operations normally performed are retrieval, addition, deletion and updating. A particular operation involving a record or a set of records is called a transaction. For example, "Delete Raja from the student list for the First Year" is a transaction, and so is "Add Joseph to the student list for the First Year".
example it is unreasonable to bit-encode an item representing the net sales for the month. We can declare a record containing such an item in the programming language being used.
Record : Monthly_Report
    Month Char(10),
    Net_sales Fixed (7,2)
The net sales item can range in value from -99999.99 to 99999.99. It is unrealistic for the programmer to bit-encode such a wide range of item values when the compiler provides an efficient encoding of an item value in binary with a fixed decimal format. The record item represented by Month can be significantly reduced in size if we use a fixed-length binary code of 0000B for January, 0001B for February, ..., 1011B for December and declare the item to be of type BIT(4). Because both of these items may be considered fixed-length items, they can technically be called precoordinated. That is, a fixed-length item can only have a finite set of values, which can be enumerated a priori.
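The two encodings discussed above can be illustrated with a short Python sketch. The function names encode_month, decode_month and check_net_sales are hypothetical, and using Python's Decimal type to stand in for a Fixed (7,2) item is an assumption made only for illustration.

```python
from decimal import Decimal

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def encode_month(name: str) -> str:
    """Return the 4-bit code of a month: January -> 0000, ..., December -> 1011."""
    return format(MONTHS.index(name), "04b")


def decode_month(code: str) -> str:
    """Return the month name for a 4-bit code."""
    return MONTHS[int(code, 2)]


def check_net_sales(value: str) -> Decimal:
    """Accept a value only if it fits Fixed (7,2): -99999.99 to 99999.99."""
    amount = Decimal(value).quantize(Decimal("0.01"))
    if abs(amount) > Decimal("99999.99"):
        raise ValueError("value does not fit in Fixed (7,2)")
    return amount
```

Twelve month names need only 4 bits each, against 10 characters for Char(10), which is the size reduction the text describes.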
characteristics of storage device that influence the selection of a storage device, once the appropriate file organization techniques have been determined. One such characteristic is whether the device allows direct access to a particular record occurrence without accessing all physically prior record occurrences stored on the device, or allows only sequential access to record occurrences. Magnetic disks are examples of direct access storage devices; magnetic tapes are examples of sequential storage devices.
files are ordered by a key or index item, such as employee name or student identification number, when the file is created. The key or index item should be the item most often searched for when processing the file.
To show the importance of key selection, assume that the MASTER file of employees is ordered by employee identification number. Suppose we want to find the records of a number of employees given only their names. Finding the first employee's record, say AGARKER, is simply a matter of serially processing the file until the record with a name item of AGARKER appears. Now consider the processing of a second record, say for BAKER. Since the position of BAKER's record bears no relationship to the position of AGARKER's record, we have no alternative but to start serially processing again from the beginning of the MASTER file.
There are occasions in which serial processing is all that is required on a file, irrespective of the key or index item on which the file is ordered. For example, if we are to add a pay increase of 1000 Rupees to the wage item of all employees, it is irrelevant whether the file is sequenced by name or by employee identification number.
In sequential processing, transaction records are usually grouped together and sorted according to the same index item as the records in the file. Each successive record of the file is read, compared with an incoming transaction record, and then processed in a manner that usually depends on whether the value of the record's index item is less than, equal to, or greater than the value of the index item of the transaction record.
Sequential and serial processing are most effective when a high percentage of the records in the file must be processed. Since every record in the file must be scanned, a relatively large number of transactions should be grouped together for processing. If records are to be added to a file, it is necessary to create a new file unless the records are to be added to the end of the file.
Important points about the sequential processing of sequential files:
1. Sequential processing is most advantageous if a large number of transactions can be grouped to form a single run on the file.
2. A new file should be created if any additions or deletions are requested.
3. Quick response time should not be expected for a transaction or a batch of transactions.
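The batch style of sequential processing described above can be sketched as a single merge pass that reads the old master file and a sorted batch of transactions and writes a new master file. This is a minimal sketch under stated assumptions: the function name sequential_update, the tuple layouts, and the rule of at most one transaction per key are all hypothetical choices made for illustration.

```python
def sequential_update(master, transactions):
    """Merge a sorted batch of transactions into a sorted master file.

    master: list of (key, data) tuples sorted by key.
    transactions: list of (key, action, data) tuples sorted by the same key,
        with action one of "add", "delete" or "update".
        Assumption: at most one transaction per key.
    Returns the new master file as a new list; the old one is not modified.
    """
    new_master = []
    i = 0
    for key, action, data in transactions:
        # Copy master records whose key is less than the transaction key.
        while i < len(master) and master[i][0] < key:
            new_master.append(master[i])
            i += 1
        match = i < len(master) and master[i][0] == key
        if action == "add" and not match:
            new_master.append((key, data))
        elif action == "delete" and match:
            i += 1  # skip the old record so it is not copied
        elif action == "update" and match:
            new_master.append((key, data))
            i += 1
        else:
            raise ValueError(f"cannot apply {action!r} for key {key!r}")
    # Copy the remaining master records unchanged.
    new_master.extend(master[i:])
    return new_master


master = [("AGARKER", 5000), ("BAKER", 6000), ("RAJA", 4000)]
batch = [("BAKER", "update", 7000), ("JOSEPH", "add", 4500), ("RAJA", "delete", None)]
new_master = sequential_update(master, batch)
```

Because both inputs are sorted on the same key, every record is read once, which is why the approach pays off only when many transactions are grouped into one run.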
well as sequential processing. An index sequential file consists of three separate areas: the prime area, the index area and the overflow area. The overflow area is an additional feature of this file system: it provides additional space for record addition without necessitating the creation of a new file. The prime area is the area into which data records are written when the file is first created. The file is created sequentially, that is, by writing records in the prime area in a sequence dictated by the alphabetical ordering of the keys of the records. The cylinder of a disk. When this cylinder is filled, writing continues on the second track of the next cylinder, and continues in this fashion until the file's creation is completed. If the newly created file is accessed sequentially according to the key item, the records are processed in the order in which they were written.
Record fields shown: SSN, JOB, SALARY, SEX
Preceding block: ... ; Acosta, Marc
Block 2: Adams, John; Adams, Robin; ... ; Akers, Jan
Block 3: Wright, Pam; Wyatt, Charles; ... ; Zimmer, Byron
Figure 2.4: Some blocks on an ordered (sequential) file of Employee records with name as the ordering field
To create a primary index on the ordered file shown in figure 2.4, we use the Name field as the primary key, because that is the ordering key field of the file. Each entry in the index has a Name value and a pointer. Figure 2.5 illustrates this primary index. The total number of entries in the index is the same as the number of disk blocks in the ordered data file. The first record in each block of the data file is called the anchor record of the block, or simply the block anchor. (A scheme similar to the one described here can be used with the last record in each block, rather than the first, as the block anchor.) A primary index is an example of what is called a non-dense index, because it includes an entry for each disk block of the data file rather than for every record in the data file. A dense index, on the other hand, contains an entry for every record in the file.
The index file for a primary index needs substantially fewer blocks than the data file, for two reasons. First, there are fewer index entries than there are records in the data file, because an entry exists for each whole block of the data file rather than for each record. Second, each index entry is typically smaller in size than a data record because it has only two fields, so more index entries than data records will fit in one block. A binary search on the index file will hence require fewer block accesses than a binary search on the data file.
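A primary (non-dense) index of the kind described above can be sketched in Python as one entry per block, holding the block anchor and a block number. The block contents below are a simplified, partly assumed version of figure 2.4 (the first block is cut down to a single record), and the names blocks, index and find are hypothetical.

```python
from bisect import bisect_right

# Ordered data file, stored as blocks of records (simplified from figure 2.4).
blocks = [
    ["Acosta, Marc"],
    ["Adams, John", "Adams, Robin", "Akers, Jan"],
    ["Wright, Pam", "Wyatt, Charles", "Zimmer, Byron"],
]

# One index entry per block: (block anchor, block number).
index = [(block[0], block_no) for block_no, block in enumerate(blocks)]
anchors = [anchor for anchor, _ in index]


def find(name):
    """Return the number of the block holding `name`, or None if absent."""
    # Binary search for the last anchor that is <= name.
    pos = bisect_right(anchors, name) - 1
    if pos < 0:
        return None  # name sorts before the first block anchor
    block_no = index[pos][1]
    # Only this one block has to be read and searched.
    return block_no if name in blocks[block_no] else None
```

The index has as many entries as there are blocks, not records, which is what makes it non-dense.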
Figure 2.5: Primary index on the ordering key field of the data file shown in figure 2.4
A major problem with a primary index, as with any ordered file, is the insertion and deletion of records. With a primary index the problem is compounded: if we attempt to insert a record in its correct position in the data file, we not only have to move records but also have to change some index entries, because moving records will change the anchor records of some blocks. One way to reduce this problem is to use an unordered overflow file. Another possibility is to use a linked list of overflow records for each block in the data file. We can keep the records within each block and its overflow linked list sorted to improve retrieval time. Record deletion can be handled using deletion markers.
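Index field values, in index order: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Indexing field values of the data file records, in storage order: 9 5 13 8 6 15 3 17 21 11 16 2 24 10 20 1 4 23 18 14 12 7 19 22
Figure 2.6: A dense secondary index on a non-ordering key field of a file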
We again refer to the two field values of index entry i as K(i), P(i). The entries are ordered by the value of K(i), so we can use binary search on the index. Because the records of the data file are not physically ordered by the value of the secondary key field, we cannot use block anchors. That is why an index entry is created for each record in the data file, rather than for each block as in the case of a primary index. Figure 2.6 illustrates a secondary index on a key attribute of a data file. Notice that in figure 2.6 the pointers P(i) in the index entries are block pointers, not record pointers. Once the appropriate block is transferred to main memory, a search for the desired record within the block can be carried out.
A secondary index will usually need substantially more storage space than a primary index because of its larger number of entries. However, the improvement in search time for an arbitrary record is much greater for a secondary index than it is for a primary index, because we would have to do a linear search on the data file if the secondary index did not exist. For a primary index, we could still use binary search on the main file even if the index did not exist, because the records are physically ordered by the primary key field.
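A dense secondary index with block pointers can be sketched as follows. The key values are those listed with figure 2.6; grouping them four records to a block is an assumption made for the sketch, and the names index, index_keys and lookup are hypothetical.

```python
from bisect import bisect_left

# Key field values of the data file records, in physical (unordered) order.
data_file_keys = [9, 5, 13, 8, 6, 15, 3, 17, 21, 11, 16, 2,
                  24, 10, 20, 1, 4, 23, 18, 14, 12, 7, 19, 22]

RECORDS_PER_BLOCK = 4  # assumption: four records fit in one block

blocks = [data_file_keys[i:i + RECORDS_PER_BLOCK]
          for i in range(0, len(data_file_keys), RECORDS_PER_BLOCK)]

# Dense index: one entry (K(i), P(i)) per record, ordered by K(i).
# P(i) is a block pointer (here a block number), not a record pointer.
index = sorted((key, block_no)
               for block_no, block in enumerate(blocks)
               for key in block)
index_keys = [key for key, _ in index]


def lookup(key):
    """Binary-search the index; return the block number for key, or None."""
    pos = bisect_left(index_keys, key)
    if pos == len(index_keys) or index_keys[pos] != key:
        return None
    return index[pos][1]
```

After lookup returns a block number, the record itself is found by searching inside that one block, as the text describes.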
Multiple records belonging to the same logical area may be chained to maintain logical sequencing. When records are forced into the overflow area as a result of insertion, the insertion process is simplified, but the search time is increased. Deletions of records from index-sequential files create logical gaps; the records are not physically removed but only flagged as having been deleted. If there have been a number of deletions, there may be a great amount of unused space.
reorganization will not be that expensive. Once the bucket address is generated from the key by the hash function, a search in the bucket is also required to locate the address of the required record. However, since the bucket size is small, this overhead is small. The use of buckets reduces the problems associated with collisions. In spite of this, a bucket may become full, and the resulting overflow can be handled by providing overflow buckets and using a pointer from the normal bucket to an entry in the overflow bucket. All such overflow entries are linked. Multiple overflow entries from the same bucket result in a long list and slow down the retrieval of these records.
In an alternative scheme, the address generated by the hash function is a bucket address, and the bucket is used to store the records directly instead of a pointer to the block containing the record.
Let s represent the value:
s = upper bucket address value - lower bucket address value + 1
Here s gives the number of buckets. A simple hashing function is h(k) = k mod s, where k is the numeric representation of the key and h(k) produces a bucket address.
Some simple hashing functions are given below.
1) Use the low-order part of the key. For keys that are consecutive integers with few gaps, this method can be used to map the keys to the available range.
2) Square all or part of the key and take a part of the result. The whole key, or some defined part of it, is squared, and a number of digits are selected from the square as the hash result. A variation is the multiplication scheme, where one part of the key is multiplied by the remaining part and a number of digits are selected from the result.
3) End folding. For a long key, we identify start, middle and end regions such that the sum of the lengths of the start and end regions equals the length of the middle region. The start and end regions are concatenated, and the resulting string of digits is added to the middle region digits. This new number, mod s, where s is the upper limit of the hash function, gives the bucket address. For example, for the key
123456 123456789012 654321
end folding gives the two values to be added as 123456654321 and 123456789012.
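The three kinds of hashing function listed above can be sketched in Python. The function names and the choice of s = 1000 buckets in the usage lines are assumptions made for illustration; the end folding sketch also assumes a key whose length is a multiple of four, so that the start and end regions together are as long as the middle region.

```python
def mod_hash(k: int, s: int) -> int:
    """h(k) = k mod s: keeps the low-order part of the key."""
    return k % s


def mid_square_hash(k: int, s: int, digits: int = 3) -> int:
    """Square the key and take `digits` digits from the middle of the square."""
    square = str(k * k)
    start = max((len(square) - digits) // 2, 0)
    return int(square[start:start + digits]) % s


def end_fold_hash(key: str, s: int) -> int:
    """End folding: concatenate the start and end regions, add the middle."""
    quarter = len(key) // 4
    start = key[:quarter]
    middle = key[quarter:len(key) - quarter]
    end = key[len(key) - quarter:]
    return (int(start + end) + int(middle)) % s


# The key from the text, written without spaces.
key = "123456123456789012654321"
folded_sum = int("123456654321") + int("123456789012")
bucket = end_fold_hash(key, 1000)
```

For this key the two values added are 123456654321 and 123456789012, whose sum is 246913443333, so with the assumed s = 1000 the bucket address is 333.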
2.6 SUMMARY
All businesses need to process data. Processing the data is necessary to obtain useful information. As data volume increases, data processing becomes highly complex, and computers are used in this process. One important aspect of computerized data processing is the storage and retrieval of data. Databases provide this functionality, and DBMS packages are the software tools used to implement databases. Data as an entity has several important properties, such as form, size, organization and relationships. The form of data, namely numeric, alphabetic, integer and real, represents the different types of data stored in databases. The size of the data plays a central role in determining the volume of the database and the techniques needed to store it. Organizing and grouping the data into characters, fields, records and files defines the basic building blocks of the database. Databases are classified into different types based on their usage. Different data models have resulted in different kinds of databases that provide the basic service of storage and retrieval of data.
In this chapter, we discussed the storage of information in registers, main memory and secondary memory. A register is used for the temporary storage and manipulation of information. Registers are also used to temporarily store program instructions and control information concerning which instruction is to be executed next. The storage-type memory unit is designed to store information that is more permanent in nature. An external storage device may be defined as a device other than main memory on which information or data can be stored and from which the information can be retrieved for processing at some subsequent point in time. The storage and retrieval operations are referred to as writing and reading, respectively. External storage devices have a larger capacity and are less expensive per bit of information stored than main memory. The time required to access the information, however, is much greater with these devices. The primary uses of external storage devices include:
1. Backup of programs during execution.
2. Storage of programs and subprograms for future use.
3. Storage of information in files.
The most common external storage devices, in order of their initial development and use, are magnetic tape, magnetic drum and magnetic disk.
A file is a collection of logical information. Each file has an associated file name. A file contains many records. One record consists of one or more fields. A field that identifies a record is called the record key. A primary key identifies a record uniquely. A secondary key may or may not identify a record uniquely. Records can be stored in two forms, fixed length and variable length. There are three fundamental file organization techniques: sequential, index-sequential and direct file organization. The selection of the appropriate organization for a file in an information system is important to the performance of that system. The fundamental factors that influence the selection process include the following:
1. Nature of the operations to be performed.
2. Characteristics of the storage media to be used.
3. Volume and frequency of the transactions to be processed.
4. Response time requirements.
10. In a _______ file, records are stored one after the other on a storage device.
11. A sequential file that is indexed is called an _________________.
12. The index provides the _________ access to records.
13. The usual method of direct mapping is by performing some ________ manipulation of the key value.
14. A _________ is said to occur when two distinct key values are mapped to the same storage location.
15. Hashing is very good for _______ keys.
Answers
1. Register
2. Storage
3. Writing and Reading
4. Fields
5. Key
6. Transaction
7. File Organization
8. Magnetic disks
9. Magnetic tapes
10. Sequential
11. Index sequential file
12. Random
13. Arithmetic
14. Collision
15. Large
6. What do you mean by file organization? Explain.
7. Explain clearly sequential file organization.
8. Explain briefly the advantages and problems associated with sequential organization.
9. What is an index sequential file? Explain.
10. Explain different types of indexes.
11. What are secondary indexes? Explain.
12. Explain the structure of an index sequential file.
13. Explain the concept of direct file organization.
14. Explain some hashing techniques.
15. Explain the advantages and disadvantages of hashing.
Chapter 3
Entity-Relationship Model
3.0 OBJECTIVES
Entities and Attributes
Attribute types
Keys
Relationship type and sets
Weak entity types
Nonbinary relationship
Entity-Relationship Diagrams
Reducing ER Diagrams into Tables
3.1 INTRODUCTION
The database design consists of three components: conceptual design on the basis of user requirements, data modeling (use of E-R diagrams and normalization), and physical design and implementation. The Entity-Relationship (E-R) model is used as an information model to develop the conceptual structure. The E-R data model considers the real world as consisting of a set of basic objects (entities), attributes for these entities and relationships among these objects. The E-R model describes data as entities, relationships and attributes. An E-R diagram uses graphical notations to represent them. The diagram is documented as an Entity-Relationship diagram.
Requirement analysis: to obtain a clear and concise description of the application area to be modelled and to derive information about the nature and volume of data to be stored and processed.
Data modelling: to develop a global design for the database with the ultimate objective of achieving an efficient implementation which satisfies the requirements.
Implementation: to transfer the design into a database system which operates under the control of a particular DBMS.
Testing: to discover any errors that have arisen during the modelling and implementation phases and to ascertain, in conjunction with the user community, whether the system satisfies the information demands of users and the requirements of application programs.
Maintenance: to correct errors discovered during testing, to modify the system in response to changes in users' requirements, and to improve system performance and user interfaces.
simple versus composite, single-valued versus multivalued, and stored versus derived.
We first define these attribute types and illustrate their use via examples. We then introduce the concept of a null value for an attribute.
The above figure shows a hierarchy of composite attributes. Composite attributes are useful for modelling situations in which a user sometimes refers to the composite attribute as a unit but at other times refers specifically to its components. If the composite attribute is referenced only as a whole, there is no need to subdivide it further into component attributes. For example, if there is no need to refer to the individual components of an address (pin code, street, apartment number and so on), then the whole address can be designated as a simple attribute. Similarly, if there is no need to refer to the individual components of a name (first name, middle name, last name), then the whole name can be designated as a simple attribute.
Similarly, one person may not have a college degree, another person may have one, and a third person may have two or more degrees; therefore, different persons can have different numbers of values for the CollegeDegrees attribute. Likewise, a person can have one or more dependents. Such attributes are called multivalued. A multivalued attribute may have lower and upper bounds to constrain the number of values allowed for each individual entity. For example, the Colors attribute of a car may have between one and four values, if we assume that a car can have at most four colors. The CollegeDegrees attribute may have between one and three values, if we assume a person can have at most three degrees. Similarly, the Dependents attribute can have from one to six values, depending on the number of dependents.
3.6 KEYS
Differences between entities in an entity set must be expressed in terms of attributes known as keys. These enable us to identify each entity in a set uniquely. Keys can be of various types, as described below.
Super Key
It is a set of one or more attributes which, put together, enable us to identify uniquely an entity in the entity set. For example, STU_NAME and ROLLNUM together form a super key for the entity set Student. But STU_NAME alone cannot act as a super key, since two students could have the same name.
Key
A superkey is a set of attributes that uniquely identifies every entity in the entity set, while a key is a minimal set of such attributes. The word minimal means that we cannot exclude any attribute from a key and still identify an entity uniquely. To understand this concept, consider an entity type STUDENT with two attributes, STU_NAME and ROLLNUM. Both attributes put together make up a superkey. However, this combination is not a key, because it is not a minimal set of attributes. ROLLNUM alone, on the other hand, is a key, because it is a minimal set of attributes that can identify an entity uniquely in the entity set.
Composite Key
Situations exist where a single attribute cannot constitute a key, that is, a single attribute cannot uniquely identify every entity in the entity set. In such situations we need two or more attributes together in order to identify every entity in the entity set uniquely. A key consisting of two or more attributes is called a composite key. For example, consider an entity type SUPPLIER_PART with attributes SUPPLIER_ID, PART_ID and QUANTITY. Neither SUPPLIER_ID nor PART_ID alone can identify an entity in the entity set uniquely. However, the two of them together can identify any entity in the entity set uniquely. Hence (SUPPLIER_ID, PART_ID) is a composite key.
Candidate key
A superkey may contain extraneous attributes, and we are often interested in the smallest superkey. A superkey for which no proper subset is a superkey is called a candidate key. For example, in the entity type STUDENT with attributes STU_NAME and ROLLNUM, ROLLNUM is a candidate key, as it is minimal and uniquely identifies a Student entity. As another example, consider the entity type SALESPERSONS containing the attributes SNUM, SNAME, REGION, QUANTITYSOLD and COMMISSION. From the list of attributes it may appear that, apart from SNUM, SNAME can also be a key. This is correct only as long as salesperson names are unique. If we cannot make this assumption, SNAME cannot be a candidate key. Now if we add one more attribute, PASSPORT_NUM, to the SALESPERSONS entity, this can certainly be another candidate key, because we can identify a person uniquely based on the passport number. With this, SNUM and PASSPORT_NUM are the two candidate keys.
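The difference between a superkey and a candidate key can be explored with a small Python sketch over sample data. The rows below are invented for illustration, and the function names is_superkey and candidate_keys are hypothetical. Note the limitation: uniqueness in one sample only shows that an attribute set could be a key. A real key is a constraint that must hold for every valid state of the entity set, which is a decision of the database designer.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all the attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Return the minimal attribute sets that are superkeys of the sample."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Invented sample of the SALESPERSONS entity set (two people share a name).
rows = [
    {"SNUM": 1, "SNAME": "Ravi", "REGION": "South", "PASSPORT_NUM": "P100"},
    {"SNUM": 2, "SNAME": "Ravi", "REGION": "North", "PASSPORT_NUM": "P200"},
    {"SNUM": 3, "SNAME": "Asha", "REGION": "South", "PASSPORT_NUM": "P300"},
    {"SNUM": 4, "SNAME": "Ravi", "REGION": "South", "PASSPORT_NUM": "P400"},
]
attributes = ["SNUM", "SNAME", "REGION", "PASSPORT_NUM"]
keys = candidate_keys(rows, attributes)
```

In this sample the set (SNUM, SNAME) is a superkey but not a candidate key, because SNUM alone already identifies each row.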
Primary Key
The primary key identifies every entity in the entity set uniquely. It is the candidate key (there may be more than one candidate key) chosen by the database designer to identify entities in the entity set uniquely. When we have two or more candidate keys, we have to decide which of them becomes the primary key. Examples of primary keys are ROLLNUM from the Student entity, and SNUM or PASSPORT_NUM from the Salespersons entity. The criterion for choosing a primary key from the set of candidate keys is based on day-to-day working as well as data entry.
Figure: A binary relationship, WORKS ON, between EMPLOYEE and PROJECT.
Figure: A ternary relationship, USES, among SKILL, PROJECT and PERSON (cardinality labels M, N and P).
Figure: E-R diagram of the Loan payment relationship. Attributes shown: Loan-no, Amount, Payment-no, Payment-date, Payment amount.
As shown in the figure, the attribute payment-number acts as a discriminator for the payment entity set. It is also called the partial key of the entity set. How, then, is the primary key of a weak entity set formed? The rule is: the primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set's discriminator. In this example, (loan-no, payment-no) acts as the primary key for the payment entity set. The relationship between the weak entity set and the strong entity set is called an identifying relationship. In our example above, loan-payment is the identifying relationship for the PAYMENT entity.
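The rule for the primary key of a weak entity set can be sketched in Python. The class names, the loan numbers and the payment values below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loan:
    """Strong entity: identified by its own primary key, loan_no."""
    loan_no: str
    amount: float


@dataclass(frozen=True)
class Payment:
    """Weak entity: exists only in connection with a loan."""
    loan_no: str          # primary key of the owning strong entity
    payment_no: int       # discriminator (partial key)
    payment_date: str
    payment_amount: float

    def primary_key(self):
        # Primary key = strong entity's primary key + discriminator.
        return (self.loan_no, self.payment_no)


loans = {"L-17": Loan("L-17", 1000.0), "L-23": Loan("L-23", 2000.0)}
payments = [
    Payment("L-17", 1, "2024-01-10", 500.0),
    Payment("L-17", 2, "2024-02-10", 500.0),
    Payment("L-23", 1, "2024-01-15", 750.0),
]

discriminators = [p.payment_no for p in payments]
keys = [p.primary_key() for p in payments]
```

The payment numbers repeat across loans, so the discriminator alone does not identify a payment, while the pair (loan_no, payment_no) does.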
Notations used in ER-diagrams
Sl.No.   Symbol   Meaning
1.                Entity type
2.
3.                Relationship
4.                Identifying relationship
5.                Attribute
6.
7.                Multi-valued attribute
8.                Derived attribute
9.                Composite attribute
10.               Total participation of E2 in R
11.
The rectangles, ovals, diamonds and lines are important graphical symbols used for representing the entities, attributes and relationships in an E-R diagram.
Diamonds : Represent relationships among entity sets
Lines : Link attributes to entity sets and entity sets to relationships
One-to-One: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A. Example: Each department has one manager and each manager is associated with one department. Assume the attributes for both DEPT and MANAGER entities.
One-to-Many: An entity in A is associated with any number of entities in B. An entity in B is associated with at most one entity in A. Example: One department has many employees working for it, whereas one employee has only one department. That is, for one occurrence of department there are many occurrences of employees, and so it is a one-to-many relationship.
Many-to-One: An entity in A is associated with at most one entity in B. An entity in B is associated with any number of entities in A. Example:
Figure: Dept - Has - Employees
In the case of a one-to-many relationship, if we reverse the association, from employee to dept, then it becomes a many-to-one relationship. That is, many employees work for one department. For many occurrences of employees there is one occurrence of department, and so it is a many-to-one relationship. Many-to-Many: Entities in A and B are associated with any number of entities from each other.
Figure: Supplier - Supplies - Parts
A supplier supplies many parts to the same customer. A customer can buy the same part from many suppliers. This association between the entities in Supplier and Parts is an example for many-to-many relationship.
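The four mapping cardinalities can be illustrated by classifying a set of observed associations between two entity sets A and B. This is a sketch only: the function name cardinality and the sample pairs are invented, and a classification based on sample data describes that sample, not the rule the designer intends for all data.

```python
def cardinality(pairs):
    """Classify (a, b) associations as one-to-one, one-to-many,
    many-to-one or many-to-many, reading A as the first element."""
    a_to_b = {}
    b_to_a = {}
    for a, b in pairs:
        a_to_b.setdefault(a, set()).add(b)
        b_to_a.setdefault(b, set()).add(a)
    # Does some entity in A relate to several entities in B, and vice versa?
    a_has_many = any(len(bs) > 1 for bs in a_to_b.values())
    b_has_many = any(len(as_) > 1 for as_ in b_to_a.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"


dept_manager = [("Sales", "Meera"), ("HR", "John")]
dept_employees = [("Sales", "Ann"), ("Sales", "Bob"), ("HR", "Cid")]
employee_dept = [(e, d) for d, e in dept_employees]
supplier_parts = [("S1", "P1"), ("S1", "P2"), ("S2", "P1")]
```

Reversing the direction of the Dept and Employees pairs turns the one-to-many result into many-to-one, matching the discussion in the text.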
When we link a weak entity set to a strong entity set, the relationship set is normally many-to-one and has no descriptive attributes. The primary key of the weak entity set is formed from the primary key of the strong entity set on which it is existence dependent, plus its discriminator. For example, in the above figure transaction is the weak entity and its existence depends on account; hence the primary key of transaction contains both acc# and transaction#. In this case the relationship A_T becomes redundant, since the attributes of A_T are the same as those of transaction.
Figure: E-R diagram (labels recovered: Faculty, Department, Teach, Heads, Former)
Disadvantages
Limited constraint representation: It is difficult to represent all constraints using this model. Constraints such as "the student grade point average ranges between 0.0 and 4.0" and "a worker may not be allowed to work more than 10 consecutive hours of duty time" cannot be represented.
Limited relationship representation: Relationships are represented within the diagram as occurring between entities.
No data manipulation language: The ERM is not complete, because of the lack of data manipulation commands.
Loss of information content: The models/diagrams tend to become crowded when attributes are represented. Therefore, database designers usually avoid attribute mapping, thus decreasing the model's information content.
- this will be the primary key of the entity table on the N side in the case of. This is useful if there will be few instances of the relationship type and thus there would be many null values in foreign keys.)
Step 6. For each multi-valued attribute Ai: Let R be the table that has Ai as a multi-valued attribute. Create a new table Si which includes the primary key (K) of R plus an attribute corresponding to Ai. The combination of K and Ai provides the primary key of Si. For example, we have the multi-valued attribute Locations of DEPARTMENT in the ER diagram. To deal with this we create DEPT_LOCATIONS, which includes the primary key of DEPARTMENT as a foreign key (DNUMBER), as well as the attribute DLOCATION, which is of course single valued, to represent the multi-valued attribute Locations. The primary key of DEPT_LOCATIONS will be the combination {DNUMBER, DLOCATION}. Thus there will be a separate row in the table for each location of a department.
Step 7. For each ternary or higher order (n-ary) relationship type R, we create a new table S to represent it. We include as foreign keys the primary key of each participating entity type. The primary key of the new table is made up (as required) of the combination of these foreign key attributes.
Comparison of terms:
E.R. Model                    Relational Model
entity type                   entity table
simple attribute              attribute
composite attribute           set of simple attributes
multivalued attribute         table and foreign key(s)
1:1 or 1:N relationship       foreign key (or relationship table)
N:M relationship              relationship table
n-ary relationship            relationship table and n foreign keys
value set                     domain
key attribute                 primary key
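Step 6 above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the DNAME column, the sample department and the location values are invented, and SQLite is used only because it ships with Python, not because the text prescribes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Table for the DEPARTMENT entity type (DNAME is an assumed extra column).
conn.execute(
    "CREATE TABLE DEPARTMENT ("
    " DNUMBER INTEGER PRIMARY KEY,"
    " DNAME TEXT NOT NULL)"
)

# Step 6: a separate table for the multi-valued attribute Locations.
conn.execute(
    "CREATE TABLE DEPT_LOCATIONS ("
    " DNUMBER INTEGER NOT NULL REFERENCES DEPARTMENT(DNUMBER),"
    " DLOCATION TEXT NOT NULL,"
    " PRIMARY KEY (DNUMBER, DLOCATION))"
)

conn.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")
conn.executemany(
    "INSERT INTO DEPT_LOCATIONS VALUES (?, ?)",
    [(5, "Bangalore"), (5, "Mysore")],
)

# One row per location of department 5.
locations = [
    row[0]
    for row in conn.execute(
        "SELECT DLOCATION FROM DEPT_LOCATIONS WHERE DNUMBER = 5 ORDER BY DLOCATION"
    )
]
```

The composite primary key {DNUMBER, DLOCATION} prevents the same location from being stored twice for one department, and the foreign key prevents locations for a department that does not exist.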
3.18 SUMMARY
In this chapter we presented the modeling concepts of a high-level conceptual data model, the Entity-Relationship (ER) model. The E-R data model considers the real world as consisting of a set of basic objects (entities), attributes for these entities and relationships among these objects. We started by discussing the role that a high-level data model plays in the database design process.
There are five phases of the life cycle The database system development process, they are, requirement analysis, data modeling, implementation, testing, and maintenance. We then defined the basic ER model concepts of entities and their attributes. The basic object that the ER model represents is an entity, which is a thing in the real world with an independent existence. Each entity has attributesthe particular properties that describe it. Several types of attributes occur in the ER model: simple versus composite, single-valued versus multivalued, and Stored versus derived. We also discussed null values. We then discussed various type of keys, like, super key, key, composite key, candidate key and primary key. We gave the concept of relationship types and sets, and also weak entity types. We then discussed the ER model concepts. Cardinality ratios (1:1, 1:N, M:N for binary relationships) Entity-Relationship schemas can be represented diagrammatically as ER diagrams. We also discussed how ER diagram could be reduced into relational model, by converting it into set of tables.
__________ represent attributes.
_________ represent relationships among entity sets.
_________ link attributes to entity sets and entity sets to relationships.
For a binary relationship the cardinality ratio is ________.
A weak entity set is indicated by a __________ box.
If a relationship is formed between three entities, then it is called a ______ relationship.
________ indicates the number of entities with which another entity can be associated via a relationship.
More than one relationship is also called ____________.
Answers
1. Entity
2. Attributes
3. Composite
4. Simple or atomic attributes
5. Single-valued
6. Multivalued
7. rectangular
8. Multivalued
9. Null
10. Entity set
11. Keys
12. Composite key
13. Candidate key
14. Uniquely
15. dashed or dotted line
16. graphical
17. Rectangles
18. Ellipses
19. Diamonds
20. Lines
21. two
22. double-outlined
23. ternary
24. Mapping cardinalities
25. Nonbinary relationship
Explain weak entity type with an example.
Explain the symbols used to draw E-R diagrams.
With the help of an example explain the E-R diagram.
Explain, giving examples, the various mapping cardinalities.
Explain the following with the help of examples: i) Weak entity ii) Nonbinary relationship iii) Ternary relationship
17. Explain the steps of E-R to Relational mapping.
18. Imagine and give your own examples for the following mapping cardinalities: i) One-to-One ii) One-to-Many iii) Many-to-One iv) Many-to-Many
19. Draw an E-R diagram for modelling the relationship between students who learn many subjects taught by many teachers. We need to maintain information such as the students' names, addresses and divisions, and the teachers' names and addresses.
20. What are the candidate keys in the following relationships?
    i) Employee_number, Employee_name, Employee_DL_number, Employee_Passport_number, Address, Qualification, Phone_number
    ii) Part_Id, Part_name, Part_description, Unit_price, Colour, Make_of_Part
    iii) Student_Roll_Num, College_Id_Num, Student_Name, Address, Year_of_study, Class_obt
    iv) Account_num, Customer_name, Customer_address, Phone_number
21. What are the primary keys in the following relationships?
    i) Movie_ID, Movie_name, Actor_name, Actress_name, Director_name
    ii) Part_ID, Part_desc, Unit_Price, Quantity, Colour
    iii) Player_name, Runs_made, Match_number, Ground, Date
    iv) Roll_number, Student_name, Subject, Marks, Grade
    v) Emp_ID, Emp_name, age, salary
22. Is a 1:N relationship the same as an N:1 relationship? Why?
23. Here is the Company ER schema mapped to a relational schema. Identify the primary key for each of these tables and underline them.
    EMPLOYEE(FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX, SALARY, SUPERSSN, DNO)
    DEPARTMENT(DNAME, DNUMBER, MGRSSN, MGRSTARTDATE)
    DEPT_LOCATIONS(DNUMBER, DLOCATION)
    PROJECT(PNUMBER, PNAME, PLOCATION, DNUM)
    WORKS_ON(ESSN, PNO, HOURS)
    DEPENDENT(ESSN, DEPENDENT_NAME, SEX, BDATE, HOW_RELATED)
24. Construct an E-R diagram for a car-insurance company that has a set of customers, each of whom owns one or more cars. Each car has associated with it zero or any number of recorded accidents.
25. Construct an E-R diagram for a hospital with a set of patients and a set of medical doctors. Associate with each patient a log of the various tests and examinations conducted.
26. Reduce the following E-R diagram into the relational data model. (Note: Some of the symbols in the diagram are not explained here; please go through the reference books for these.)
References
Bipin C. Desai, An Introduction to Database System, Galgotia Publications, New Delhi.
Elmasri & Navathe, Fundamentals of Database Systems, Addison-Wesley.
Rajesh Narang, Database Management Systems, Prentice-Hall, New Delhi.
Atul Kahate, Introduction to Database Management Systems.
26. The following figure shows an E-R schema diagram for the company database. Reduce this into the relational data model.
[Figure: E-R schema diagram for the company database. Entity and relationship labels: EMPLOYEE, DEPARTMENT, Project, WorksFor, Managers, Supervisor, Workson, CONTROLS, DEPENDENTS-OF. Attribute labels: Name (Fname, Minit, Lname), Ssn, Bdate, Sex, Address, Salary, Dname, Dnumber, Dlocation, Startdate, Hours, Pname, Plocation, Birthdate, Relationship.]
Chapter 4
Data Models
4.0 OBJECTIVES
Data models
Relational data model
Network data model
Hierarchical data model
4.1 INTRODUCTION
The architecture of database systems consists of three different views: internal, conceptual and external. How the information stored in the database appears to the user at the external level depends on the approach used for storing the data in the database. There are three such approaches: the relational, network and hierarchical approaches can be used in a database management system to represent information to the users.
Definition of Data Model: It is defined as an integrated collection of concepts for describing and manipulating data, relationships between data and constraints on the data in an organization. A data model is a set of concepts to describe the structure of a database, and certain constraints that the database should obey. Data model operations are operations for specifying database retrievals and updates by referring to the concepts of the data model.
Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many users perceive data. (Also called entity-based or object-based data models.)
Physical (low-level, internal) data models: Provide concepts that describe details of how data is stored in the computer.
Implementation (record-oriented) data models: Provide concepts that fall between the above two, balancing user views with some computer storage details.
Note: It is the view of data that enables the user to apply the powerful operations and expressions of relational algebra to data manipulations.
Publisher Table (columns: Pub_ID, Pub_name, Pub_city)
Values are atomic
Each row is unique
Column values are of the same kind
The sequence of columns is insignificant
The sequence of rows is insignificant
Each column has a unique name
Certain fields may be designated as keys, which means that searches for specific values of that field will use indexing to speed them up. Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields. Often, but not always, the fields will have the same name in both tables. For example, an orders table might contain (customer-ID, product-code) pairs and a products table might contain (product-code, price) pairs, so to calculate a given customer's bill you would sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables. This can be extended to joining multiple tables on multiple fields. Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management systems. The relational database model is based on relational algebra.
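The orders/products example can be sketched with Python's built-in sqlite3 module. The table and column names follow the text; the customer IDs, product codes and prices are made-up sample values, and the helper name customer_bill is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_code TEXT PRIMARY KEY, price REAL);
CREATE TABLE orders   (customer_id TEXT, product_code TEXT);
""")

# Made-up sample data.
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("P1", 10.0), ("P2", 2.5)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("C1", "P1"), ("C1", "P2"), ("C1", "P2"), ("C2", "P1")])


def customer_bill(customer_id):
    """Sum the prices of all products ordered by one customer.

    The relationship between the two tables is stated only here,
    at retrieval time, by matching the product_code fields.
    """
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON o.product_code = p.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]
```

Nothing in the stored tables links an order row to a product row in advance; the join condition in the query is what relates them.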
Advantages
Structural independence: The relational database model achieves structural independence, which means it is possible to make changes in the database structure without affecting the DBMS's ability to access the data. Therefore data access paths are irrelevant to relational database designers, programmers and end users.
Improved conceptual simplicity: The relational database model is much simpler at the conceptual level compared to the hierarchical and network data models. We can concentrate only on the logical view of the data, ignoring the physical data storage.
Easier database design, implementation, management and use: It is easier to design and manage a relational database because it achieves both data independence and structural independence.
Ad hoc query capability: SQL can be used as a query language to obtain the required information from the database by writing queries.
A powerful database management system: The system complexity is hidden from both the database designers and the end users in the case of a relational database.
Disadvantages
Substantial hardware and system software overhead: More powerful computers are required to perform RDBMS-assigned tasks. Software is also needed, in the form of an operating system as well as applications.
Poor design and implementation is made easy: If a database is designed without giving much thought to what it should contain, the result is an improper design. Lack of proper design tends to slow the system down and to produce data anomalies.
In many respects the network database model resembles the hierarchical database model. For example, as in the hierarchical model, the user perceives the network database as a collection of records in 1:M relationships. But unlike the hierarchical data model, the network model allows a record to have more than one parent. Using network database terminology, a relationship is called a set. Each set is composed of at least two record types: an owner record that is equivalent to the hierarchical model's parent, and a member record that is equivalent to the hierarchical model's child. A set represents a 1:M relationship between the owner and the member. In the relational model, the data and the relationships among data are represented by a collection of tables. The network model differs from the relational model in that data are represented by collections of records, and relationships among data are represented by links.
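The owner/member idea can be sketched in a few lines of Python. This is an illustration only, not how any particular network DBMS stores its records: the class Record, the functions connect and members, the set name branch_account, the branch record type and all field values are hypothetical.

```python
class Record:
    """A record in a toy network database."""

    def __init__(self, record_type, **fields):
        self.record_type = record_type
        self.fields = fields
        self.owner_of = {}   # set name -> list of member records (links)
        self.member_of = {}  # set name -> the single owner record (link)


def connect(set_name, owner, member):
    """Link a member to an owner; within one set the relationship is 1:M."""
    if set_name in member.member_of:
        raise ValueError("member already has an owner in set " + set_name)
    owner.owner_of.setdefault(set_name, []).append(member)
    member.member_of[set_name] = owner


def members(owner, set_name):
    """Follow the links from an owner to all its members in one set."""
    return list(owner.owner_of.get(set_name, []))


# Hypothetical sample data.
customer = Record("customer", customer_street="Main", customer_city="Mysore")
branch = Record("branch", branch_name="Central")
account_1 = Record("account", balance=800)
account_2 = Record("account", balance=1000)

# One owner, many members: a 1:M set.
connect("depositor", customer, account_1)
connect("depositor", customer, account_2)
# The same member record gets a second parent through another set,
# which the hierarchical model would not allow.
connect("branch_account", branch, account_1)
```

Here account_1 has two parents, one per set, while each set on its own remains a 1:M relationship between its owner and its members.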
[Figure: data structure diagram in which the record types customer (fields include Customer-street and Customer-city) and account (fields include balance) are joined by the link depositor, followed by a sample database in the network model.]
Conceptual simplicity: The conceptual view of the database is simple. This provides design simplicity.
Handles more relationship types: M:N relationships are easier to implement in this model than in the hierarchical database model.
Data access flexibility: Data access is more flexible compared to a hierarchical database; an application can access an owner record and all the member records within a set.
Promotes database integrity: The user must define the owner record type and then the member; this enforces database integrity.
Data independence: The application programs written can be isolated from complex physical storage details. This achieves data independence.
Conformance to standards: Standards in the form of DDL and DML can be imposed on this data model. This greatly facilitates database administration and portability.
Disadvantages
Not user-friendly: The network model was not designed to produce a user-friendly system and is highly skill-oriented.
System complexity: The data are accessed one record at a time, which leads to system complexity.
Lack of structural independence: Even though the network data model achieves data independence, it does not achieve structural independence. When the structure of the database changes, all application programs that access the data must be revalidated before they can access the database.
Operational anomalies: Because the network model uses pointers for navigation, its implementation becomes quite complex.
Hierarchical data model is a model comprising records stored in a general tree structure with one root record type that has zero or more dependent record types.
In a general tree structure, there are two possible methods of accessing all the nodes (record types) within the tree.
Pre-order traversal: access the root first and then proceed down the tree, accessing the subtrees in order from left to right.
Post-order traversal: start at the bottom and proceed upwards, accessing the subtrees in order from left to right and finishing with the root.
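The two traversals can be sketched as follows. The Node class and the five-node sample tree are assumptions made for illustration; a real hierarchical DBMS works with record types and segments rather than Python objects.

```python
class Node:
    """A node (record type) in a general tree with ordered children."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def preorder(node):
    """Visit the root first, then each subtree from left to right."""
    visited = [node.name]
    for child in node.children:
        visited.extend(preorder(child))
    return visited


def postorder(node):
    """Visit each subtree from left to right, then finish with the root."""
    visited = []
    for child in node.children:
        visited.extend(postorder(child))
    visited.append(node.name)
    return visited


# Sample tree:      A
#                 /   \
#                B     C
#               / \
#              D   E
tree = Node("A", [Node("B", [Node("D"), Node("E")]), Node("C")])

print(preorder(tree))   # ['A', 'B', 'D', 'E', 'C']
print(postorder(tree))  # ['D', 'E', 'B', 'C', 'A']
```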
Conceptual simplicity: The hierarchical model makes it easier to view the database conceptually, and hence makes its design process simpler. The relationship between the various layers is logically simple.
Database security: Database security is provided and enforced by the DBMS. Security is also enforced uniformly throughout the system.
Data independence: When a change in data type takes place, it is automatically cascaded throughout the database by the DBMS. This eliminates the need to make changes in the program segments that reference the changed data type.
Database integrity: The hierarchical database promotes database integrity, because the child segment is always referenced to its parent. Given the parent/child relationship, there is always a link between the parent segment and its child segment(s).
Efficiency: The hierarchical model is very efficient when a database contains a large volume of data in 1:M relationships.
Disadvantages
Complex implementation: The database designer and programmer must have detailed knowledge of the physical data storage characteristics. Therefore the implementation of a database design may be very complicated.
Difficult to manage: Any changes in the database structure, such as relocation of segments, require changes to be made to all the application programs that access the database.
Lacks structural independence: Structural independence means that when a change to the database structure occurs it should not affect the DBMS's ability to access the data. The hierarchical model does not support this, because any change to the database structure affects the corresponding application programs.
Application programming and use complexity: The programmers must understand how the data is stored as well as how it is accessed, which restricts the programmer's choices and makes programming harder.
Implementation limitations: Even though 1:1 and 1:N relationships can be easily implemented, it is difficult to implement M:N relationships in this model.
Lack of standards: There is no precise set of standards which all can follow, as far as implementation is concerned. Due to this, portability was limited, and it was difficult to move from one hierarchical DBMS to another.
Comparison of the three models in tabular form:

Sl. No. 1
  Hierarchical data model: Relationship between records is of the parent-child type (trees).
  Network data model: Relationship between records is expressed in the form of pointers or links (shapes).
  Relational data model: Relationship between records is represented by a relation that contains a key for each record involved in the relationship.

Sl. No. 2
  Hierarchical data model: Many-to-many relationship cannot be expressed in this model.
  Network data model: Many-to-many relationship can also be implemented.
  Relational data model: Many-to-many relationship can be easily implemented.

Sl. No. 3
  Hierarchical data model: It is a simple, straightforward and natural method of implementing record relationships.
  Network data model: Record relationship implementation is very complex due to the use of pointers.
  Relational data model: Relationship implementation is very easy through the use of a key or composite key field(s).

Sl. No. 4
  Hierarchical data model: This type of model is useful only when there is some hierarchical character in the database.
  Network data model: Network model is useful for representing records which have many-to-many relationships.
  Relational data model: Relational model is useful for representing most of the real world objects and the relationships among them.

Sl. No. 5
  Hierarchical data model: In order to represent links among records, pointers are used. Thus relations among records are physical.
  Network data model: In the network model also the record relations are physical.
  Relational data model: Relational model does not maintain physical connections among records. Data is organized logically in the form of rows and columns and stored in tables.

Sl. No. 6
  Hierarchical data model: Searching for a record is very difficult since one can retrieve a child only after going through its parent record.
  Network data model: Searching for a record is easy since there are multiple access paths to a data element.
  Relational data model: A unique indexed key field is used to search for a data element.
In addition to the above three models, two other important models can be added: the object/relational model and the object-oriented model.
it provides higher performance management of objects, and it enables better management of the complex interrelationships between objects. This makes object DBMSs better suited to support applications such as financial portfolio risk analysis systems, telecommunications service applications, world wide web document structures, design and manufacturing systems, and hospital patient record systems, which have complex relationships between data.
4.7 SUMMARY
In this chapter we studied three important data models used to describe the structure of a database. A data model is a set of concepts to describe the structure of a database, and certain constraints that the database should obey. Data model operations are operations for specifying database retrievals and updates by referring to the concepts of the data model. There are three important categories of data models: conceptual (high-level, semantic) data models, physical (low-level, internal) data models, and implementation (record-oriented) data models. The data models we studied are from the third category. They are as follows.
Relational model: A relational database is a database based on the relational model developed by E.F. Codd. A relational database allows the definition of data structures, storage and retrieval operations and integrity constraints. In such a database the data and the relations between them are organized in tables. Each table has a unique name.
Network model: The network model differs from the relational model in that data are represented by collections of records, and relationships among data are represented by links.
Hierarchical model: In the hierarchical model, records are organized as collections of trees, rather than arbitrary graphs.
We also discussed examples for each of these data models and the advantages and disadvantages associated with them. In addition, we discussed two other important data models: the object/relational model and the object-oriented model.
3. In a relational database the data and relations between them are organized in ________
4. Each table in the relational database contains only one type of _________
5. A relationship in the network model is called a __________
6. A set represents a _______ relationship between the owner and the member.
7. In the network model relationships among data are represented by ________
8. The hierarchical data model organizes data in a __________
Answers
1. Structure
2. E.F. Codd
3. Tables
4. Record
5. Set
6. 1:M
7. Links
8. Tree structure
Chapter 5
5.0 OBJECTIVES
Hashing
Hashing Techniques
Internal Hashing
External Hashing
Dynamic Expansion Using Hashing Techniques
5.1 INTRODUCTION
In chapter 2, we emphasized the fundamentals of records, file storage and structure. In chapter 4, we viewed the database, in the relational model, as a collection of tables. The logical model of the database is the correct level for database users to focus on. We described various methods of storing and organizing data, such as sequential and index sequential file organization, in chapter 2. In this chapter we describe hash file organization. One of the disadvantages of sequential file organization is that we must access an index structure to locate data, or else use binary search, and that results in more input-output operations. File organization based on hashing techniques allows us to avoid accessing an index structure. Hashing also provides a way of constructing indices. We are going to study file organization and indices based on hashing in the following sections.
Hash file organization provides very fast access to records on certain search conditions. The search condition must be an equality condition on a single field, called the hash field of the file. In many cases, the hash field is also a key field of the file, and it is then called a hash key. The concept of hashing is to provide a function h, called a hash function, that is applied to the hash field value of a record and yields the address of the disk block in which the record is stored. A search for the record within a block can be carried out in a main memory buffer. For most records we need only a single block access to retrieve the record. Hashing is also used as an internal search structure within a program, whenever a group of records is accessed exclusively by using the value of one field. We describe the use of hashing for internal files, and then show how it is modified to store external files on disk. We also discuss techniques for extending hashing to dynamically growing files.
Non-integer hash field values can be transformed into integers before the mod function is applied. For character strings, the numeric (ASCII) codes associated with the characters can be used in the transformation. Other hashing functions can also be used. One technique, called folding, involves applying an arithmetic function such as addition, or a logical function such as exclusive or, to different portions of the hash field value to calculate the hash address. Another technique involves picking some digits of the hash field value, for example the third, fifth and eighth digits, to form the hash address. The problem with most hashing functions is that they do not guarantee that distinct values will hash to distinct addresses, because the hash field space (the number of possible values a hash field can take) is usually much larger than the address space (the number of available addresses for records). The hashing function maps the hash field space to the address space. A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. In this situation we must insert the new record in some other position, since its hash address is occupied. The process of finding another position is called collision resolution. There are several methods of collision resolution, including the following.
Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused position is found.
Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location. A linked list of overflow records for each hash address is maintained, as shown in figure 5.2.
Multiple hashing: The program applies a second hash function if the first function results in a collision. If another collision results, the program uses open addressing, or applies a third hash function and then uses open addressing if necessary.
Each collision resolution method requires its own algorithms for insertion, retrieval, and deletion of records. The algorithm for chaining is the simplest. Deletion algorithms for open addressing are rather tricky. The goal of a good hashing function is to distribute the records uniformly over the address space, so as to minimize collisions while not leaving many locations unused.
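The following Python sketch illustrates mod hashing, folding, open addressing and chaining on a small in-memory table. It makes several assumptions that are not in the text: the table size M = 7, strings are converted to integers by summing character codes, folding works on two-digit parts, and new overflow records are linked at the end of the chain. The names to_int, h, fold, OpenAddressingTable and ChainedTable are hypothetical, and deletion is left out.

```python
M = 7  # number of hash addresses; a small value chosen only for this sketch


def to_int(key):
    """Turn a non-integer key into an integer by summing character codes."""
    if isinstance(key, int):
        return key
    return sum(ord(ch) for ch in str(key))


def h(key, m=M):
    """Mod hashing: map the hash field value to an address in 0..m-1."""
    return to_int(key) % m


def fold(key, part_len=2, m=100):
    """Folding: split the digits into parts, add the parts, then take mod m."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % m


class OpenAddressingTable:
    def __init__(self, m=M):
        self.slots = [None] * m  # each slot holds (key, value) or None

    def insert(self, key, value):
        m = len(self.slots)
        for step in range(m):  # check subsequent positions in order
            pos = (h(key, m) + step) % m
            if self.slots[pos] is None or self.slots[pos][0] == key:
                self.slots[pos] = (key, value)
                return pos
        raise OverflowError("no unused position left")

    def search(self, key):
        m = len(self.slots)
        for step in range(m):
            pos = (h(key, m) + step) % m
            if self.slots[pos] is None:
                return None
            if self.slots[pos][0] == key:
                return self.slots[pos][1]
        return None


class ChainedTable:
    NULL = -1  # null pointer marking the end of an overflow chain

    def __init__(self, m=M):
        self.m = m
        self.cells = [None] * m  # cells m, m+1, ... form the overflow area

    def insert(self, key, value):
        # Note: this sketch does not check for duplicate keys.
        pos = h(key, self.m)
        if self.cells[pos] is None:
            self.cells[pos] = [key, value, self.NULL]
            return pos
        while self.cells[pos][2] != self.NULL:  # walk to the end of the chain
            pos = self.cells[pos][2]
        self.cells.append([key, value, self.NULL])
        self.cells[pos][2] = len(self.cells) - 1
        return len(self.cells) - 1

    def search(self, key):
        pos = h(key, self.m)
        if self.cells[pos] is None:
            return None
        while pos != self.NULL:
            if self.cells[pos][0] == key:
                return self.cells[pos][1]
            pos = self.cells[pos][2]
        return None
```

With M = 7 the keys 10 and 17 both hash to address 3. Open addressing places 17 in position 4, the next unused position, while chaining places it in the first overflow cell (index 7) and points cell 3 at it.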
Null pointer = -1. An overflow pointer refers to the position of the next record in the linked list.
The pointers in the linked list should be record pointers, which include both a block address and a relative record position within the block. Hashing provides the fastest possible access for retrieving an arbitrary record given the value of its hash field. Although most good hashing functions do not maintain records in order of hash field values, some functions, called order preserving, do. A simple example of an order preserving hash function is to take the leftmost 3 digits of an invoice number field as the hash address and keep the records sorted by invoice number within each bucket. The hashing scheme described is called static hashing because a fixed number of buckets M is allocated. This can be a serious drawback for dynamic files. Suppose that we allocate M buckets for the address space and let m be the maximum number of records that can fit in one bucket; then at most (m*M) records fit in the allocated space. If the number of records is substantially less than (m*M), a lot of space is unused. On the other hand, if the number of records increases to substantially more than (m*M), numerous collisions will occur and retrieval will slow down because of the long lists of overflow records. In either case, we may have to change the number of buckets M allocated and then use a new hashing function to redistribute the records. This reorganization can be quite time consuming for large files. Newer dynamic file organizations based on hashing allow the number of buckets to vary dynamically with only localized reorganization. When using external hashing, searching for a record given a value of some field other than the hash field is as expensive as in the case of an unordered file. Record deletion can be implemented by removing the record from its bucket. If the bucket has an overflow chain, one of the overflow records can be moved into the bucket to replace the deleted record. If the record to be deleted is already in overflow, it is simply removed from the linked list. Removing an overflow record means that empty positions in overflow must be tracked; this is done easily by maintaining a linked list of unused overflow locations.
Modifying a record field value depends on two factors: (1) the search condition to locate the record and (2) the field to be modified. If the search condition is an equality comparison on the hash field, we can locate the record efficiently by using the hashing function; otherwise we must do a linear search. A non-hash field can be modified by changing the record and rewriting it in the same bucket. Modifying the hash field means that the record can move to another bucket, which requires deletion of the old record followed by insertion of the modified record.
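Static external hashing with buckets, overflow, deletion and modification of the hash field can be sketched as follows. This is a simplified in-memory model under stated assumptions: integer keys, the hash function key mod M, and a Python list standing in for each bucket's overflow chain. The names Bucket, StaticHashFile and change_key are hypothetical.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity  # m: records that fit in the bucket
        self.records = []         # (key, data) pairs stored in the bucket
        self.overflow = []        # stands in for the bucket's overflow chain


class StaticHashFile:
    def __init__(self, num_buckets, bucket_capacity):
        # M buckets are allocated once and never change (static hashing).
        self.buckets = [Bucket(bucket_capacity) for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, data):
        b = self._bucket(key)
        target = b.records if len(b.records) < b.capacity else b.overflow
        target.append((key, data))

    def search(self, key):
        b = self._bucket(key)
        for k, data in b.records + b.overflow:
            if k == key:
                return data
        return None

    def delete(self, key):
        b = self._bucket(key)
        for area in (b.records, b.overflow):
            for i, (k, _) in enumerate(area):
                if k == key:
                    del area[i]
                    # Refill the bucket from its overflow chain if possible.
                    if area is b.records and b.overflow:
                        b.records.append(b.overflow.pop(0))
                    return True
        return False

    def change_key(self, old_key, new_key):
        """Modify the hash field: delete the old record, insert the new one."""
        data = self.search(old_key)
        if data is None:
            return False
        self.delete(old_key)
        self.insert(new_key, data)
        return True
```

For example, with M = 3 buckets of capacity m = 2, the keys 0, 3 and 6 all hash to bucket 0, so the third record goes to overflow; deleting key 0 then moves the overflow record into the bucket.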
5.6 SUMMARY
We described various methods of storing and organizing data, such as sequential and index sequential file organization, in chapter 2. One of the disadvantages of sequential file organization is that we must access an index structure to locate data, or else use binary search, and that results in more input-output operations. File organization based on hashing techniques allows us to avoid accessing an index structure. Hashing also provides a way of constructing indices. In this chapter we discussed database and file organization. We began by discussing hashing functions used to organize a file. Hashing provides very fast access to any arbitrary record of a file, given the value of its key. Hashing is implemented as a hash table through the use of an array of records. Many hashing functions can be used for this purpose. A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. There are several methods of collision resolution: open addressing, chaining and multiple hashing. Hashing for disk files is called external hashing. The most suitable method for external hashing is the bucket technique, with one or more contiguous blocks corresponding to each bucket. Collisions causing bucket overflow are handled by chaining. Access on a non-hash field is slow, and so is ordered access to the records on any field. We then discussed two hashing techniques for files that grow and shrink dynamically in the number of records, namely extendible and linear hashing.
Answers
1. fast
2. main memory
3. hash table
4. address space
5. collision resolution
6. external hashing
7. algorithms
8. address space
9. fixed
10. linear
Answer the following questions
1. What is the difference between static files and dynamic files?
2. What is hashing? Describe different hashing techniques with suitable examples.
3. Explain the hashing technique called folding with an example.
4. What is a collision? How is it resolved?
5. What is meant by collision resolution? Explain.
6. Explain the methods available for collision resolution.
7. Briefly explain collision resolution in hashing.
8. What is a bucket? Explain.
9. What are the factors on which modifying a record field depends?
10. Discuss the techniques for allowing a hash file to expand and shrink dynamically. What are the advantages and disadvantages of each?
Chapter 6
Database Security
6.0 OBJECTIVES
6.1 INTRODUCTION
Techniques are needed to protect the database against persons who are not authorized to access part or all of it. This chapter gives an overview of security issues. Methods used to grant and revoke privileges in relational database systems are discussed; these are referred to as discretionary access control. Mandatory access control, which is mainly concerned with enforcing multiple security levels, is also covered. The security problems of statistical databases are presented as well.
Legal and ethical issues regarding the right to access certain information. Some information may be purely private and cannot be accessed legally by unauthorized persons.
Policy issues at the governmental, institutional or corporate level concerning which kinds of information should not be made publicly available, for example personal medical records and credit card ratings.
System related issues: the system levels at which various security functions should be enforced, for example whether a security function should be handled at the physical hardware level, the operating system level or the DBMS level.
In some organizations it is necessary to identify multiple security levels and to categorize the data and users based on these classifications, for example top secret, secret, confidential and unclassified. The security policy of the organization with respect to permitting access to the various classifications of data must be enforced.
A DBMS must provide techniques to enable certain users or groups of users to access selected portions of the database without gaining access to the rest of the database. This is most important when a large integrated database is to be used by many different users within the same organization. For example, sensitive information such as employee salaries or performance reviews should be kept confidential, so that only restricted users can access it. A DBMS typically includes a database security and authorization subsystem that is responsible for ensuring the security of portions of a database against unauthorized access. Two types of database security mechanisms are commonly distinguished.
Discretionary security mechanisms: These are used to grant privileges to users to read, write, delete and update specific data files, records or fields.
Mandatory security mechanisms: These are used to enforce multi-level security by classifying data and users into various security levels and implementing the appropriate security policy of the organization. For example, a typical security policy is to permit users at a certain classification level to see only the data items at the user's own classification level.
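The difference between the two mechanisms can be sketched in Python. This is a toy model, not a DBMS feature: the class DiscretionaryControl, the function mandatory_read_allowed, the account and object names are hypothetical, and the ordering of the four levels is an assumption. The allow_lower option is a labelled variant that also lets users read data classified below their own level; the default follows the policy stated in the text.

```python
# Security levels from lowest to highest (assumed ordering).
LEVELS = ["unclassified", "confidential", "secret", "top secret"]


class DiscretionaryControl:
    """Privileges are granted to, and revoked from, individual accounts."""

    def __init__(self):
        self._granted = {}  # (account, obj) -> set of privilege names

    def grant(self, account, obj, privilege):
        self._granted.setdefault((account, obj), set()).add(privilege)

    def revoke(self, account, obj, privilege):
        self._granted.get((account, obj), set()).discard(privilege)

    def allowed(self, account, obj, privilege):
        return privilege in self._granted.get((account, obj), set())


def mandatory_read_allowed(user_level, data_level, allow_lower=False):
    """Decide access from the classification levels alone.

    Default: a user sees only data at the user's own level.
    allow_lower=True (a variant, not from the text): data at lower
    levels is readable as well.
    """
    user = LEVELS.index(user_level)
    data = LEVELS.index(data_level)
    return user >= data if allow_lower else user == data


# Hypothetical usage.
dac = DiscretionaryControl()
dac.grant("acct_17", "EMPLOYEE.SALARY", "read")
can_read_before = dac.allowed("acct_17", "EMPLOYEE.SALARY", "read")
dac.revoke("acct_17", "EMPLOYEE.SALARY", "read")
can_read_after = dac.allowed("acct_17", "EMPLOYEE.SALARY", "read")
```

In the discretionary case the decision depends on what has been granted to a particular account; in the mandatory case it depends only on how the user and the data item are classified.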
b) Another security problem is controlling access to a statistical database, which is used for providing statistical information based on different criteria. For example, a database for population statistics may provide statistics based on age groups, income levels, size of household, education levels etc. Statistical database users such as market research firms or government statisticians are allowed to access the database to retrieve statistical information about the population, but not to access the detailed confidential information on specific individuals. Security for a statistical database must ensure that information about individuals cannot be accessed. It is sometimes possible to deduce certain facts concerning individuals from queries that involve only summary statistics on groups. Consequently this must not be permitted either.
c) Data encryption: This is also an important security measure, used to protect sensitive data, such as card numbers, that is being transmitted via some type of communication network. Encryption can also be used to provide additional protection for sensitive portions of a database. The data is encoded by using some coding algorithm. An unauthorized user who accesses encoded data will have difficulty deciphering it, but authorized users are given decoding or decrypting algorithms to decipher the data.
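The inference problem in b) can be illustrated with a toy example. The records, field names and the functions count, total_income and guarded_total_income are all hypothetical. The guard shown, refusing queries over very small groups, is one simple countermeasure and is not sufficient on its own, since results from several permitted queries can still be combined.

```python
# Made-up population records; only summary queries are meant to be allowed.
PEOPLE = [
    {"name": "P1", "age": 34, "city": "Mysore", "income": 52000},
    {"name": "P2", "age": 41, "city": "Mysore", "income": 61000},
    {"name": "P3", "age": 67, "city": "Mysore", "income": 30000},
    {"name": "P4", "age": 29, "city": "Hassan", "income": 45000},
    {"name": "P5", "age": 52, "city": "Hassan", "income": 58000},
]


def count(predicate):
    """Summary query: how many individuals satisfy the condition?"""
    return sum(1 for person in PEOPLE if predicate(person))


def total_income(predicate):
    """Summary query: total income of the selected group."""
    return sum(person["income"] for person in PEOPLE if predicate(person))


def guarded_total_income(predicate, min_group=3):
    """Same query, but refused when the group is too small."""
    if count(predicate) < min_group:
        raise PermissionError("query group too small")
    return total_income(predicate)


def over_60_in_mysore(person):
    return person["city"] == "Mysore" and person["age"] > 60


def lives_in_mysore(person):
    return person["city"] == "Mysore"


# The condition selects exactly one person, so the "summary" statistic
# is that individual's income.
group_size = count(over_60_in_mysore)
leaked_income = total_income(over_60_in_mysore)
```

Here count returns 1, so total_income reveals one person's income even though only a summary statistic was requested; guarded_total_income refuses the same query but still answers the query over all three Mysore residents.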
When a user needs to access a database system, he first applies for an account. Thus DBA will create a new account number and password for the user if there is legitimate need for accessing the database. The user must login to the DBMS by entering the account number and password whenever database access is needed. The DBMS checks that the account number and password is valid, if they are, the user is permitted to use the DBMS and to access the database application programs can also be considered as users and can be required to supply passwords. It is straightforward to keep track of database users and their accounts and passwords by creating an encrypted table or file with two fields account number and password. These tables can be easily maintained by the DBMS. Whenever a new account is created, a new record is inserted into the table. When an account is cancelled the corresponding record must be deleted from the table. The database system must also keep track of all operations on the database that are applied by a certain user throughout each login session, which consists of the sequence of database interaction that a user performs from the time of logging in to the time of logging off. When a user logs in the DBMS can record the users account number and associate with it the terminal from which the user logged in. It is particularly important to keep track of the update operations that are applied to the database so that, if the database is tampered with, the DBA can find out which user did the tampering. To keep a record of all updates applied to the database and of the particular user who applied each database. The system log includes an entry for each operation applied to the database that may be required for recovery from truncation failure or system crash. 
If any tampering with the database is suspected, a database audit is performed, which consists of reviewing the log to examine all accesses and operations applied to the database during a certain time period. When an illegal or unauthorized operation is found, the DBA can determine the account number used to perform this operation.
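A database audit of this kind can be sketched as a query over a log table. The system_log table, its columns and the sample entries below are hypothetical; a real DBMS keeps its log in its own internal format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical log layout: when, which account, which terminal, what operation.
conn.execute(
    "CREATE TABLE system_log "
    "(logged_at TEXT, account_no TEXT, terminal TEXT, operation TEXT)"
)
conn.executemany(
    "INSERT INTO system_log VALUES (?, ?, ?, ?)",
    [
        ("2005-03-01 09:15", "A100", "T1", "UPDATE employee SET basic_salary = 9000"),
        ("2005-03-02 14:40", "B200", "T7", "DELETE FROM employee WHERE emp_no = 3"),
        ("2005-03-03 10:05", "A100", "T1", "INSERT INTO employee VALUES (9, 'Ravi')"),
    ],
)

def audit(start, end):
    """Return every logged operation in the period, oldest first."""
    return conn.execute(
        "SELECT logged_at, account_no, terminal, operation FROM system_log "
        "WHERE logged_at BETWEEN ? AND ? ORDER BY logged_at",
        (start, end),
    ).fetchall()

# Review one suspicious day and read off the account numbers involved.
suspects = audit("2005-03-02 00:00", "2005-03-02 23:59")
accounts = [row[1] for row in suspects]
```

Because each log entry carries the account number and terminal, the DBA can go from a suspicious operation straight to the account that performed it.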
6.8 SUMMARY
In this chapter we discussed important techniques for enforcing security in database systems. Security enforcement deals with controlling access to the database system as a whole and controlling authorization to access specific portions of a database. The former is usually done by assigning accounts with passwords to users. The latter can be accomplished by a system of granting and revoking privileges to individual accounts for accessing specific parts of the database. This approach is generally referred to as discretionary access control. Then we gave an overview of mandatory access control mechanisms that enforce multilevel security. Finally, we discussed the problem of controlling access to statistical databases to protect the privacy of individual information while concurrently providing statistical access to populations of records.
Answers
1. Database administrator (DBA)
2. super user account
3. DBA
4. Login
5. Privilege
6. populations
Chapter 7
Introduction to Microsoft Access
7.0 OBJECTIVES
Accessing Microsoft Access
Opening a Database
Working with Access Database
Creating a Table
Data manipulation in DBMS
Creating and Customizing
Creating Reports
7.1 INTRODUCTION
This chapter introduces what an RDBMS is and how MS-Access, an RDBMS, differs from other packages. You will also learn to open an existing database and see all the objects present in an Access database. A database is a collection of data related to a particular topic. A database typically consists of a heading that describes the type of information it contains, and each row contains some information. In database terminology, the columns are called fields and the rows are called records. This kind of organization in a database is called a table. A DBMS is a system that stores and retrieves information in a database. Data management involves creating, modifying, deleting and adding data in files and using this data to generate reports or answer ad hoc queries. The software that allows us to perform these functions easily is called a DBMS.
Each record in a table contains the same set of fields and each field contains the same type of information for each record.
You create a query that describes the set of records you want. When you use the query to access the data, you automatically get current data from the table/s.
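The point that a query always returns current data can be sketched outside Access with Python's built-in sqlite3 module. The employee table and its rows are invented for illustration. The same stored query text is run twice, before and after a record is added to the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER, emp_name TEXT, dept_code TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'D01')")

# The query only describes the set of records wanted; it stores no data itself.
QUERY = "SELECT emp_name FROM employee WHERE dept_code = 'D01' ORDER BY emp_no"

def run_query():
    return [row[0] for row in conn.execute(QUERY)]

before = run_query()

# A new record is added to the underlying table ...
conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 'D01')")

# ... and the unchanged query now includes it.
after = run_query()
```

The query definition never changes, yet the second run includes the new employee, because the data is read from the table each time the query is used.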
Second way
The second way of viewing data is often preferable. A query output can be viewed in the first way, but it can also be viewed in the second way by using forms. A form is a customized way of viewing, entering and editing records in the database. You can specify how data is to be displayed when you design the form. Forms can be created to resemble closely the way data would be entered on a paper form, so that the user feels familiar with the operation.
2. Double click the Microsoft Access icon. Microsoft Access starts and displays Microsoft Access window, where you can create or open a database.
To open a database
1. Choose Open database from the file menu. It will show the following Open database window
1. Select the directory from directories list that contains the database file. 2. Select database from file name list box 3. Click on Open to display Microsoft Access Database window. As soon as you click on Open, a database window will be displayed as shown below. The database window displays a list of the tables created in the database.
The title bar is located at the top of the screen and displays the name of the program. The menu bar is located below the title bar and lists the various options. Toolbars, generally located below the menu bar, provide quick access to the most frequently used commands and utilities. The user can customize the toolbars by dragging them to convenient positions.
Status bar is a horizontal bar at the bottom of the screen that displays information about commands, toolbar buttons and other options.
Similarly all other objects in the database window can be viewed by clicking on the appropriate object buttons. To close a database Select Close database from the File menu.
7.2.1 Introduction
Now, we are familiar with opening an existing database and all the objects in the database. Let us learn to create a new database and objects in the database. A table is a collection of data stored about a particular subject. The data in a table is presented in columns and rows. We will also learn to create the basic structure of a table, to add rows (records) and to edit them.
Select New database from the File menu. The following dialog box is displayed. Select Blank Database and Click Ok.
1. Select the directory in which you want to create the database. In the file name box, enter a database name, which can contain up to 8 characters but no spaces. There is no need to give an extension because Microsoft Access automatically adds an extension to the database name.
2. Click on Create to create an empty database file.
To modify the design of an object 1. Select the object type to modify from the database window.
2. Select the object name from the list to modify.
3. Click the Design button to display the object window in Design view.
Note: There is an option to create objects yourself or through the use of an Access wizard. An Access wizard is like a database expert, which prompts you with questions about the object and then builds the object based on your answers. Creation of objects with the help of wizards will be covered later.
To create and modify objects in the database. When you start, Microsoft Access displays tools only for opening and creating a database. After a database is opened, new toolbars get added to the existing ones. The toolbars gain or lose focus as and when you open any object (forms, tables, queries, reports, etc.) in Design, Open or New view. Initially, the toolbar appears at the top of the Microsoft Access window and the tools are arranged in a single row. We can move the toolbar to a vertical side of the window, the bottom of the window or the middle of the window, and change its shape.
To customize toolbars
1. Select Toolbars from the View menu to display the Toolbars dialog box. The toolbar customize window is displayed. Its different options allow the toolbars to be customized.
2. In toolbars dialog box we can:
Click Large buttons to enlarge the buttons or return them to the original size.
Show ToolTips.
Click on the Close button to close the dialog box.
Field name      Field type   Size   Decimal
EMP_NO          N            5
EMP_NAME        C            20
DOJ             D
SEX             C            1
BASIC_SALARY    N            7
QUALIFICATION   C            10
DEPT_CODE       C            5
We are trying to store the following details of an employee:
Employee number (EMP_NO)
Employee name (EMP_NAME)
Date of joining (DOJ)
Sex (SEX)
Basic salary (BASIC_SALARY)
Qualification (QUALIFICATION)
Department (DEPT_CODE)
The EMP_NO and BASIC_SALARY fields will hold numeric data and so can be of type number. EMP_NAME, SEX, QUALIFICATION and DEPT_CODE store character data and hence can be of type text. DOJ stores a date and so can be of type date.
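For comparison with the Design view steps that follow, the same employee structure can be sketched as an SQL table definition run through Python's built-in sqlite3 module. The field names come from the list above; the column types are SQLite's, chosen here as rough equivalents of number, text and date. SQLite has no separate date type, so DOJ is held as text in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_no        INTEGER,  -- number
        emp_name      TEXT,     -- text
        doj           TEXT,     -- date, stored as 'YYYY-MM-DD' text in SQLite
        sex           TEXT,     -- text
        basic_salary  REAL,     -- number
        qualification TEXT,     -- text
        dept_code     TEXT      -- text
    )
    """
)

# Read the structure back: each row of table_info describes one field.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

# Add one record (sample values are invented).
conn.execute(
    "INSERT INTO employee VALUES (1, 'Asha', '2005-06-01', 'F', 12000.0, 'B.Sc', 'D01')"
)
record_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

As in Access, the structure is defined first and the table starts out empty; records are added afterwards.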
3. Click the New table button to open table window in Design view. 4. Click Ok to display the table structure in Design view.
We now have a window where we can specify the fields in our table and what kind of data they will store. The creation of the table structure begins from here. The window below depicts the table in Design view. The table window has two portions. The upper portion has the field name, data type and description of each field. The lower portion has field properties like size, format, etc.
For creating the structure:
a. Enter the first field name, EMP_NO, in the field name box. A field name can consist of up to 64 characters.
b. Press the Tab key to go to the data type box and select a data type, for example Number.
c. Press the Tab key to go to the Description box and type a description, for example Employee number. This description appears in the status bar when data is being entered in the field. Press the Tab key to go to the next field.
d. Repeat steps a, b, and c to add other fields.
To set a field property
1. Select the field in the upper portion of the table window in Design view.
2. Set field properties in the lower portion of the table window.
Decimal places: Displays a certain number of places after the decimal point when using a format for a number or currency field.
Default value: If the user does not enter a value for a field, the default value is used for that field. For example, if DOJ is not entered by the user, the current date should be taken as DOJ. Setting the current date as the default value will automatically fill the DOJ field in new records.
Indexed: Data is indexed on this field (the default is No).
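The Default value and Indexed properties have close counterparts in SQL, which can be sketched with Python's built-in sqlite3 module. This is an analogy, not the Access mechanism itself; the index name is hypothetical, and SQLite's CURRENT_DATE is evaluated in UTC.

```python
import datetime
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_no   INTEGER,
        emp_name TEXT,
        doj      TEXT DEFAULT CURRENT_DATE  -- default value: today's date
    )
    """
)
# Counterpart of setting Indexed to Yes on EMP_NO.
conn.execute("CREATE INDEX idx_employee_emp_no ON employee (emp_no)")

# DOJ is not supplied, so the default fills it in for the new record.
conn.execute("INSERT INTO employee (emp_no, emp_name) VALUES (1, 'Asha')")
doj = conn.execute("SELECT doj FROM employee WHERE emp_no = 1").fetchone()[0]

# List the indexes defined on the table (column 1 of index_list is the name).
indexes = [row[1] for row in conn.execute("PRAGMA index_list(employee)")]
```

The inserted record receives a date in YYYY-MM-DD form even though the user supplied none, which mirrors the DOJ example above.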
4. After you fill in all the fields, press the Tab key to move to the new blank record. When you move to the next record, Microsoft Access saves the record added to the datasheet. When you finish adding records, close the datasheet; you don't have to save your work.
Select the field by clicking the field selector to the left of the field name.
Click the field selector again and hold the mouse button and drag it to the new location.
3. Save it and close the table.
To delete a field
a) Select the field by clicking the field selector to the left of the field name.
b) Press the DEL key or select Delete row from the Edit menu.
c) Save it and close the table.
To insert a field
b. Select Insert row from the Edit menu.
c. It inserts an empty row before the current row.
d. Enter the field name and other information in the empty row.
e. Set field properties in the lower portion of the table window.
f. Save it and close the table.
Change the row height.
1. Position the pointer between two record selectors at the left side of the datasheet. When the mouse pointer changes shape, you can change the height of the row.
2. Drag the row to the desired size. All rows in the datasheet change to the new row height.
Move a column.
3. Select the column you want to move by clicking the field selector.
4. Click the field selector again and drag the column to its new position. As you drag the column, a solid bar between columns indicates its destination.
Save and close datasheet layout
Select Save from the file menu to save data sheet. Select Close from the file menu to close data sheet.
2. Select field to attach validation rule.
3. Set the rule to the validation rule and validation text of the field properties in the lower portion of the table window. 4. Save and close the table.
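A validation rule with its validation text behaves much like a CHECK constraint plus an error message. The sketch below uses Python's built-in sqlite3 module; the two rules (SEX must be M or F, BASIC_SALARY must be positive) and the message are hypothetical examples, not rules given in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_no       INTEGER,
        sex          TEXT CHECK (sex IN ('M', 'F')),  -- hypothetical rule
        basic_salary REAL CHECK (basic_salary > 0)    -- hypothetical rule
    )
    """
)

# Plays the role of the validation text shown when a rule is broken.
VALIDATION_TEXT = "Sex must be M or F and basic salary must be greater than 0"

def add_employee(emp_no, sex, basic_salary):
    """Insert a record; return 'saved' or the validation text on failure."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES (?, ?, ?)", (emp_no, sex, basic_salary)
        )
        return "saved"
    except sqlite3.IntegrityError:
        return VALIDATION_TEXT

ok = add_employee(1, "F", 5000)
rejected = add_employee(2, "X", 5000)
saved_rows = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The second record breaks the rule on SEX, so it is not stored and the caller sees the validation text instead.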
Width 5 20 2 3 3 3 3 3
Constraint Unique
Width 5 5 25 5
Constraint Unique
After creating the tables, do the following: 1 Set field properties of each field. 2 Modify fields in the table. 3. Modify the table STUDENT to include the following fields:
Width 4 5
3. In the find what box, type the value you want to find 4. Click the Find first button to move to the record if it exists. 5. Click the Find next button to find the next occurrence of the specified value 6. At the end click the Close button to close the dialog box.
to find occurrences of specific text and to replace them with different text by using the Replace command. Replacements can be made either individually or globally.
To find and replace occurrences of specified text
1. Select the field where you want to search and replace in the open view.
2. Select Replace from the Edit menu. The Replace dialog box is shown below:
3. Type the text in the find what box. 4. Type replace text in the replace with box. 5. Now, click the Replace All button to replace all occurrences of the specified text or click the Find Next button to replace occurrences of the specified text one at a time. 6. When you finish replacing, click the Close button to close the dialog box.
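The find and Replace All operations above correspond roughly to a SELECT and an UPDATE on one field. The sketch uses Python's built-in sqlite3 module with invented data, and it matches the whole field value only, which is a simplification of what the dialog box offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER, qualification TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, "B.Sc"), (2, "M.Sc"), (3, "B.Sc")],
)

def find(value):
    """Find: return the EMP_NO of every record whose field equals value."""
    cur = conn.execute(
        "SELECT emp_no FROM employee WHERE qualification = ? ORDER BY emp_no",
        (value,),
    )
    return [row[0] for row in cur]

def replace_all(find_what, replace_with):
    """Replace All: change every matching field; return how many changed."""
    cur = conn.execute(
        "UPDATE employee SET qualification = ? WHERE qualification = ?",
        (replace_with, find_what),
    )
    return cur.rowcount

found_before = find("B.Sc")
changed = replace_all("B.Sc", "BSc")
found_after = find("BSc")
```

Replacing one occurrence at a time would add a condition on EMP_NO to the UPDATE, so that only the record currently found is changed.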
Open table in the data sheet view. Select filter from the Records Menu.
Select the required option to filter the records Select Apply filter / Sort from the Records menu to display some filtered records in the table.
5. To remove a filter, select Remove filter / Sort from the Records menu.
Select the column in a data sheet to Sort. Select Sort from the Records menu and then select Ascending or Descending. The sorted records by Emp_name for the above datasheet view is as shown below.
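Filtering and sorting can be sketched in SQL terms as a WHERE clause and an ORDER BY clause. The example uses Python's built-in sqlite3 module; the employee rows and the salary threshold are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER, emp_name TEXT, basic_salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ravi", 8000), (2, "Asha", 12000), (3, "Meena", 9500)],
)

def names(sql):
    return [row[0] for row in conn.execute(sql)]

# Filter: show only the records that satisfy a condition.
filtered = names(
    "SELECT emp_name FROM employee WHERE basic_salary > 9000 ORDER BY emp_no"
)

# Sort: the same records, ordered by Emp_name ascending or descending.
ascending = names("SELECT emp_name FROM employee ORDER BY emp_name ASC")
descending = names("SELECT emp_name FROM employee ORDER BY emp_name DESC")

# Removing the filter simply means reading the table without the condition.
unfiltered = names("SELECT emp_name FROM employee ORDER BY emp_no")
```

Neither operation changes the stored data; both only change which records are shown and in what order.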
2. Choose records, that is specify criteria. 3. Sort records, that is specify order. 4. Look for data in several tables. 5. Perform calculations. 6. Make changes to data in tables. To create a Query
2. Click the New button to display the new query dialog box.
Click the OK button to open a select query window and display the Show Table dialog box, which lists the tables and the queries in the database.
b. Select the table and click on Add to display a field list for each table.
Datasheet view Use this option to see the data retrieved by query.
SQL view Use this option to enter SQL (Structured Query Language) statements to create or change a query.
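As a rough idea of what an SQL statement for a query looks like, the sketch below chooses records by a criterion, sorts them and performs a calculation. It is run with Python's built-in sqlite3 module rather than in Access, so minor syntax details may differ from what SQL view shows; the data and the annual_salary calculation are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee "
    "(emp_no INTEGER, emp_name TEXT, dept_code TEXT, basic_salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Ravi", "D01", 8000),
        (2, "Asha", "D01", 12000),
        (3, "Meena", "D02", 9500),
    ],
)

SQL = """
SELECT emp_name,
       basic_salary * 12 AS annual_salary   -- perform a calculation
FROM employee
WHERE dept_code = 'D01'                      -- choose records (criteria)
ORDER BY emp_name                            -- sort records
"""

result = conn.execute(SQL).fetchall()
```

Each clause of the statement corresponds to one part of the query design: the fields to show, the criteria and the sort order.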
The tool used to create a query in Design view is called Graphical QBE (Query by Example). With Graphical QBE, queries can be created by dragging fields from the field list in the upper portion of the query window to the QBE grid in the lower portion of the window.
In the QBE grid, each column contains information about a field included in the query.
To delete a join between two tables in the query window Select the join line and press DEL key
i) Click the field selector (column heading) of the column in design view.
ii) Click the field selector again, hold down the mouse button and drag the column to its new location.
To delete a column in a query
1. Click the field selector (column heading) of the column in Design view.
2. Press the DEL key.
To exclude a field from the query's datasheet
2 Clear the field's Show box by clicking it.
To save a query
2 Select Save from the File menu to display the Save As dialog box (if saving for the first time).
3 Type a name in the query name box.
4 Click OK to save the query in the database.
3. Sort students by name.
4. Sort transactions by date.
5. Create queries to list students with marks > 70, and the total transaction quantity for a date.
5 Click OK to create the form by choosing the required fields (double click on the required fields), a format (say, tabular) and a title for the form. At the end, click on the Finish button to save and open the form. The form displays the first record in the table.
To switch to datasheet view, select Datasheet from the View menu to display the form's data in datasheet view. To switch to form view, select Form from the View menu to display records in form view. To move from record to record in form view, use the navigation buttons to go to the first, last, next or previous record.
To add a new record, 2 Select New Record from the Insert menu. 3. A new blank record is displayed.
4 Type the value in the first text box. 5 Press Tab key to move to the next field. 6 Repeat to enter all other information.
7 After all the fields are entered and Tab key is pressed to move to the next record, Microsoft Access saves the record in the table.
To open a form in Design view 2 Click the form button in the database window. 3 Select form from the forms list
4 Click the Design button to open the form in Design view.
Microsoft Access presents the form in three sections in Design view:
2 Form header contains the heading label of the form. It appears at the top of the window.
3 Detail section contains the fields from the table to view data. It repeats for each record.
4 Form footer appears at the bottom of the window.
All forms have a detail section but may or may not have a form header and footer. A form in Design view:
To add form header and footer, select Form Header / Form Footer from the list box.
2 Drag the border to resize the control.
All the text box controls have attached label controls. They can be moved together or separately. To move a control 1 Select the control to move. 2 Position the pointer anywhere on the control and hold down the mouse button. 3 Drag the control (text box and label together move) 4 Release the mouse button when the control is placed at the desired place. To move the attached label separately 4) Select the control 5) Position the pointer at the left top corner of the label and hold down the mouse button. 6) Move the label around 7) Release the mouse button when the label is positioned at the desired place. To delete a control 4) Select the control to delete 5) Press DEL key. It deletes the text box and its attached label.
To create a query 2 Click the query button; click the new button to open the new query window. 3 Add the two tables, to display data in the form. 4 Connect the tables with join line. 5 Drag the fields from the field list to the QBE grid. 6 Save and close the query. To base a form on a query 2 Click form button in the database window. 3 Click new button to display New form dialog box. 4 Select the query just created from the list box. 5 Click Ok to create the form by choosing the required fields, a format and a title for the form. At the end click on Finish button to save and open the form.
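The query that connects two tables with a join line can be sketched in SQL as an inner join. The example uses Python's built-in sqlite3 module; the department table layout and all rows are assumptions made for illustration, with DEPT_CODE as the field that links the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_code TEXT, dept_name TEXT)")
conn.execute("CREATE TABLE employee (emp_no INTEGER, emp_name TEXT, dept_code TEXT)")
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("D01", "Accounts"), ("D02", "Sales")],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ravi", "D01"), (2, "Asha", "D02"), (3, "Meena", "D01")],
)

# The ON clause plays the role of the join line drawn between the field lists.
JOIN_QUERY = """
SELECT e.emp_no, e.emp_name, d.dept_name
FROM employee AS e
INNER JOIN department AS d ON e.dept_code = d.dept_code
ORDER BY e.emp_no
"""

form_rows = conn.execute(JOIN_QUERY).fetchall()
```

A form based on such a query shows fields from both tables side by side, one joined row per record.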
Check Your Progress
Using the tables created previously 1. Create forms to view data.
2. Add, delete and save records through the forms created. Change the structure of the form in design view.
3. Choose Report Wizard from the dialog box and Click OK.
4 Make the following choices through the dialog box. i) Choose the fields you want on the report. Fields can be from more than one table or query. For example Emp_no, Emp_name, Basic_salary from Employee table Dept_name from Department table.
iv. Select the sort order and summary options for the detail records.
For example
Choose ascending order of Emp_no, Emp_name and descending order of Basic_salary and Summary options Sum, Min, Max. v) Choose a layout for the report.
vii) Give a title for the report and click on Finish button to create and open the report in Print Preview.
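The sort order and the Sum, Min and Max summary options chosen in the wizard can be sketched as two SQL queries whose results are laid out as plain text. This uses Python's built-in sqlite3 module with invented data; the column widths and the report title are arbitrary choices, and the output is only a stand-in for the Print Preview that Access produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER, emp_name TEXT, basic_salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ravi", 8000), (2, "Asha", 12000), (3, "Meena", 9500)],
)

# Detail records in the chosen sort order.
detail = conn.execute(
    "SELECT emp_no, emp_name, basic_salary FROM employee "
    "ORDER BY emp_no ASC, emp_name ASC, basic_salary DESC"
).fetchall()

# Summary options: Sum, Min and Max of Basic_salary.
total, lowest, highest = conn.execute(
    "SELECT SUM(basic_salary), MIN(basic_salary), MAX(basic_salary) FROM employee"
).fetchone()

# Lay the report out as fixed-width text lines.
report_lines = ["Employee salary report"]
for emp_no, emp_name, basic_salary in detail:
    report_lines.append(f"{emp_no:<6}{emp_name:<10}{basic_salary:>8}")
report_lines.append(f"Sum {total}  Min {lowest}  Max {highest}")

report = "\n".join(report_lines)
```

Printing the report variable shows the title, one line per employee and a final summary line.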
Report in print Preview:
2. List of students whose average is greater than 80.
3. List of items for a transaction date.
4. Day-wise transactions for each month under the month's heading, showing the total transactions at the end.
7.6 SUMMARY
A database is a collection of data related to a particular topic. A database typically consists of a heading that describes the type of information it contains, and each row contains some information. In database terminology, the columns are called fields and the rows are called records. This kind of organization in a database is called a table. A DBMS is a system that stores and retrieves information in a database. Data management involves creating, modifying, deleting and adding data in files and using this data to generate reports or answer ad hoc queries. The software that allows us to perform these functions easily is called a DBMS. In this chapter we have introduced MS-Access. Microsoft Access is a relational DBMS. In addition to text and numbers, MS-Access can store pictures in a table and display them on screen along with the other details. A Microsoft Access database is a collection of tables (called database files in some other packages). Each table is a collection of records, and a record is a collection of fields. Each record in a table contains the same set of fields and each field contains the same type of information for each record. In MS-Access, a query is a question you ask about the data in your database. The answer to the question can come from a single table or several tables; the query brings the data together. A form is a customized way of viewing, entering and editing records in the database. You can specify how data is to be displayed when you design the form. Forms can be created to resemble closely the way data would be entered on a paper form, so that the user feels familiar with the operation. Forms and queries present the data on screen. Reports are used to present data on printed paper. A Microsoft Access database is a collection of objects.
A database file contains the tables, queries, forms and reports that help you to use the information in the database. When a database is opened, Microsoft Access displays its database window in the Microsoft Access window. From the Access window you can create and use any object in your database and other features of Microsoft Access. Tables, queries, forms, reports, macros and modules are the objects of an Access database. The object buttons in the database window provide direct access to every object in the database. A table is a collection of data stored about a particular subject. The data in a table is presented in columns and rows. We discussed how to create a Microsoft Access database; create, save and close a table; add, edit and save records; modify fields in a table; modify columns and rows in a datasheet; and attach a validation rule to a field. A table is used to store data. Stored data can be retrieved whenever required. There are many ways in which data stored in a table can be viewed based on some criteria. We learnt how to find, filter, query and sort as well as to view data. A query or a filter is used to view the records in raw form from a table. To view the data in a customized way we use forms. A form provides an easy way to view data and all the values for one record. Switch to the datasheet view of the form to see all the records for that form. A form offers the most convenient layout for entering, changing and viewing the records in the database. The form design tools in Microsoft Access help to design forms that present data in an attractive format with special fonts and other effects. We learnt how all these are possible. Reports are used to present data on paper. A report is information organized and formatted to fit some specification. Examples are employee details, department details, etc. With Microsoft Access, different design elements such as text, data, pictures, lines, boxes and graphs are used to create reports. You can create a design for a report and save it so that it can be used again and again; each time, the data current at that time is printed. We also discussed all these aspects provided by Microsoft Access.
16. Reports are used to present data on _________
Answers
1. DBMS
2. Tables
3. Query
4. Forms
5. Screen
6. Printed paper
7. Exit
8. Table
9. Table
10. Eight
11. Requirements
12. 64
13. create
14. save
15. close
16. paper
Explain how you start Microsoft Access.
Explain how a database is opened and closed.
Explain the different types of bars available when the database window is opened.
Explain the steps involved in creating a Microsoft Access database.
What are the data types available in MS-Access? Explain.
Explain how a table is created, saved and closed.
What are the field properties available? Explain.
Explain how records can be added, edited and saved.
Explain how to modify fields in a table.
Explain briefly the following operations: i) Find a value ii) Find and replace a value iii) Create and apply a filter iv) Sort records v) Create a query
Explain how forms can be used to view, add, delete and save records.
What is a report in Microsoft Access? Explain.
How is a report created? Explain.
Type Numeric Text Numeric Numeric Numeric Numeric Numeric Numeric Numeric
Width 5 20 2 3 3 3 3 3 3
Constraint Unique
After creating the table, do the following: 3 Set field properties of each field. 4 Modify the table STUDENT to include the following fields:
Field            Type      Width   Constraint
Aggregate        Numeric   4
Average          Numeric   5
Class_obtained   Text      10
Exercises
2 decimal places
5 Apply necessary validation rules to each field. 6 Add 25 records. For the table created above 1. Apply filters to list students with marks greater than 50. 2. Sort students by name. 3. Create queries to list students with marks > 50 Using the table created above 1. Create forms to view data. 2. Add, delete and save records through the forms created. Change the structure of the form in design view. Using the table created above, generate the following reports: 3. List of students with marks greater than 50 in OS 4. List of students whose average is greater than 70. 5. List of students who got first class with distinction 6. List of students who have failed. 2. Create a table DAILY_TRANSACTION to have the following fields.
Width 5 5 25 5
Constraint Unique
After creating the table, do the following: 1. Set field properties of each field. 2 Apply necessary validation rules to each field.
3 Add 30 records, with different transaction numbers and item numbers, and with different dates.
For the table created above
1. Apply filters to list items sold with quantity more than 100.
2. Sort all transactions by increasing order of date.
3. Create queries to list items sold with quantity less than 60.
Using the table created above
1. Create forms to view data.
2. Add, delete and save records through the forms created. Change the structure of the form in design view.
Using the table created above, generate the following reports:
1. List of items sold on a specific date.
2. List of items sold with quantity more than 200.
3. List of a particular item sold on different dates.
4. List of items sold more than once on a particular day.
3. Consider the car insurance database given below. The primary keys for each table are underlined. Create all the tables, along with the fields given for each table and their field type and size.
PERSON
Width 20 30 30
Constraint Unique
Width 20 15 04
Constraint Unique
ACCIDENT
Width 04 30
Constraint Unique
Width 20 20
Constraint Unique
Width 20 20 04 06
Constraint Unique
After creating all the above tables by properly specifying primary keys and the foreign keys i) Enter at least 10 tuples for each relation, provide data of your liking ii) Write a query to count number of drivers iii) Write a query to count number of cars of each model iv) Write a query to find total number of accidents v) Write a query to find total number of accidents on a specified date. vi) Write queries to obtain a. Regnum of cars participated in more than one accident b. Driver_ID of drivers involved in more than one accident c. Sum of the damage amount d. Average of the damage amount e. Maximum damage amount paid
f. Minimum damage amount paid
vii) Find the total number of people who owned the cars that were involved in accidents in 2005.
viii) Find the number of accidents in which cars belonging to a specific model were involved.
ix) Update the damage amount for the car with a specific Regnum in the accident with report number 12 to 25000.
x) Add a new accident to the database.
xi) Generate some suitable reports of your choice.
xii) Create forms to view data.
xiii) Add, delete and save the data through these forms.
4. Consider the following database of student enrolment in courses and books adopted for each course. The primary keys for each table are underlined. Create all the tables, along with the fields given for each table and their field type and size.
STUDENT
Width 10 30 20
Constraint Unique
Width 04 15 20
Constraint Unique
Width 10 04 02 03
Constraint
BOOK_ADOPTION
Width 04 02 06
Constraint
Width 06 40 25 30
Constraint
After creating all the above tables by properly specifying primary keys and the foreign keys i) Enter at least 10 tuples for each relation, provide data of your liking ii) Write a query to count number of text books iii) Write a query to count number of books by each publisher iv) Write a query to find total number of students v) Write a query to find total number of students enrolled. vi) Demonstrate how you add a new book to the database and make this book be adopted by some department. vii) Produce a list of text books in the alphabetical order for courses offered by the CS department that use more than two books. viii) List any department that has all adopted books published by a specific publisher.
ix) Add a new accident to the database x) Generate some suitable reports of your choice. xi) Create forms to view data xii) Add, delete and save the data through these forms
Students (Name, Rollnum) Courses (Title, Code) Classes (Code, Rollnum) a. Provide data of your liking to each table. b. Write a query to list Name, Rollnum, Title, and Code. c. Prepare a report showing for each student name the titles of the courses taken by the student
19. Multivalued attributes are displayed in double ovals.
20. A key consisting of two or more attributes is called a composite key.
21. In ER diagrams, relationship types are displayed as rectangular boxes.
22. In the relational model each column has a unique name.
23. In the relational model two or more tables can have the same name.
24. A set in the network model represents a 1:M relationship between the owner and the member.
25. Hash file organization provides very fast access to records on certain search conditions.
26. The DBA is responsible for the overall security of the database system.
27. The first step in designing the database is to make the table structure.
28. A table first created is an empty container for data.
29. Reports are used to present data on paper.
30. Report header and footer print information once in the report.
17. T  18. T  19. T  20. T  21. F  22. T  23. F
24. T  25. T  26. T  27. T  28. T  29. T  30. T
15. __________ attributes are displayed in double ovals.
16. In ER diagrams, relationship types are displayed as ________ boxes.
17. The relational data model is based on ________ algebra.
18. CODASYL expands to ____________.
19. The hierarchical data model organizes data in a _______ structure.
20. Hashing for _______ files is called external hashing.
21. The ______ is responsible for the overall security of the database system.
22. Reports are used to present data on __________.
23. Report header and footer print information ________ in the report.
24. Page header and footer print the information on ________ page.
25. To close a form, select Close from the ________ menu.
Answers
1. data
2. American Standard Code for Information Interchange
3. Field
4. Record
5. Datafile
6. DATABASE
7. Database Administrator
8. Minimized
9. Cost
10. Physical storage
11. Skeleton
12. Fields
13. index sequential file
14. Attributes
15. Multivalued
16. diamond-shaped
17. relational
18. Conference on Data Systems Languages
19. Tree
20. Disk
21. DBA
22. Paper
23. Once
24. Every
25. File
14. Explain briefly the important data models.
15. Explain the primary use of external storage devices.
16. Explain briefly different types of records.
17. What is meant by file organization? Explain its different types.
18. Explain briefly the types of indexes.
19. Explain the structure of an index sequential file.
20. Explain briefly the available hashing techniques.
21. Explain the five phases of the life cycle of the database system development process.
22. What is an entity? Give examples.
23. Explain different types of attributes, giving examples.
24. What is a null value? Explain by giving an example.
25. Explain the following by giving examples: i) Super key ii) Candidate key iii) Primary key
26. Explain the ternary relationship, giving an example.
27. Explain different types of mappings, giving examples.
28. Explain the relational model, giving an example.
29. Explain the relational rules.
30. Give the differences between relational, network and hierarchical data models.
31. Explain object/relational and object oriented models.
32. What is a collision? When does it occur? Explain.
33. Explain external and internal hashing.
34. Explain briefly the issues of security.
35. Explain how the DBA is responsible for the overall security of the database.
36. Explain how a database is created in MS Access.
37. Explain how a form is created in MS Access.
38. Explain briefly how reports are created in MS Access.