DBMS Notes

Database Management System
Data and information

Data are raw facts that are stored in a computer in the form of bytes. Information is data that has been processed into a meaningful form.
Database

The word database is made up of two words, data and base. A database is a collection of interrelated data. It holds all the data an organization needs, which typically means a large volume of data that must be stored for the long term and accessed by a large number of users.

Example: the employee details of an organization, such as employee number, employee name, salary and date of joining, can be called a database.

Flat file: a flat file is a file containing records that have no structured interrelationship.
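To make the contrast between a flat file and a database concrete, here is a minimal sketch using Python's standard csv and sqlite3 modules. The employee rows, the table name employee and its column names are invented for illustration; they are not part of these notes.

```python
import csv
import io
import sqlite3

# Hypothetical employee details: number, name, salary, date of joining.
EMPLOYEES = [
    (101, "Ahmed", 42000.0, "2019-04-01"),
    (102, "Beena", 55000.0, "2021-09-15"),
]

# Flat file: plain records with no structure beyond lines and commas.
flat_file = io.StringIO()
csv.writer(flat_file).writerows(EMPLOYEES)
flat_file.seek(0)
# Every program has to re-parse the text and convert types by itself.
well_paid_flat = [row[1] for row in csv.reader(flat_file) if float(row[2]) > 50000]

# Database: the same data, described once and then queried declaratively.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT, "
    "salary REAL, joined TEXT)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", EMPLOYEES)
well_paid_db = [
    name for (name,) in conn.execute("SELECT name FROM employee WHERE salary > 50000")
]

print(well_paid_flat, well_paid_db)
```

Both approaches return the same names here. The difference is where the knowledge about the record layout lives: in every program that reads the flat file, or once in the database definition.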
DBMS: A database management system consists of a collection of interrelated data and a set of programs to access that data. It is software that helps in maintaining and using the data.

Database Applications

o Banking: all transactions
o Airlines: reservations, schedules
o Universities: registration, grades
o Sales: customers, products, purchases
o Manufacturing: production, inventory, orders, supply chain
o Human resources: employee records, salaries, tax deductions

Purpose of DBMS

Drawbacks of using file systems to store data:
o Data redundancy and inconsistency
    o Multiple file formats, duplication of information in different files
o Difficulty in accessing data
    o Need to write a new program to carry out each new task
o Data isolation: multiple files and formats
o Integrity problems
    o Integrity constraints (e.g. account balance > 0) become part of program code
    o Hard to add new constraints or change existing ones
o Atomicity of updates
    o Failures may leave the database in an inconsistent state with partial updates carried out
    o E.g. a transfer of funds from one account to another should either complete or not happen at all
o Concurrent access by multiple users
    o Concurrent access is needed for performance
    o Uncontrolled concurrent access can lead to inconsistencies, e.g. two people reading a balance and updating it at the same time
o Security problems

Database systems offer solutions to all the above problems.

Advantages of DBMS

Now that we know what a DBMS is, we can list its advantages. Some of these advantages are:

Reduced Data Redundancy

In early data processing systems, each program had its own data file. This led to duplication of data across the organization. Data redundancy is greatly reduced in a DBMS because the database consists of an integrated set of files. In a typical database, employee information such as employee number, name, address, joining date and salary is stored only once but can be accessed by different programs, departments or users at any time. Also, since the data is stored at a central location, everyone uses the data in the same format.

Inconsistency is Avoided

Data redundancy in an organization leads to inconsistency of data. For example, suppose the employee detail DESIGNATION is needed by both the Payroll and the Employee Training departments, and each department keeps its own data. If the designation of an employee changes but only the Payroll department updates its data file, the data becomes inconsistent, because the same employee is shown with two designations. Since data redundancy is reduced in a DBMS, inconsistency too is avoided.

Reduced Programming Effort

During the early days of data processing, a program had to be written for even a small task like retrieving an employee number and name.
Most such requirements can be handled in a DBMS without any extra programming effort, because a DBMS has built-in functions for such basic tasks. Complex queries still require programming, but many DBMSs come with a built-in query language that is English-like and hence easy to learn and use, so writing a program in such a language does not require much effort.

Data Independence
In the early days of data processing, data and programs were tightly bound to each other, so any change to the data structure forced changes to all the programs that accessed it. In a DBMS, data is independent of the application programs using it. The DBMS relieves users of the responsibility of knowing the physical details of the data; the user needs to know only its logical content. Therefore, a change to the data structure does not imply that all the application programs accessing the data must change.

Cost Reduction

Many DBMS packages prove quite costly in the beginning because of the high initial investment, but in the long run they reduce the cost of the entire business operation. This reduction in cost results from all the factors discussed above. For example, because data can be accessed faster and less programming is required to retrieve it, the overall operation becomes more efficient, and there is an accompanying cost benefit.

Security

Security and data integrity are vital to the success of a database. Data must be safeguarded against any type of damage, whether from system failure, human error, natural calamity or intentional mischief. Many DBMSs provide a facility for taking backups. Logging is another technique, in which every change made to the database is recorded in a separate log. This log is maintained on a storage device separate from the one holding the database, so that in case of failure the data can be recovered from the log.

Many DBMSs require a username and a password to connect to the database. Then, depending on the user's rights, restricted access to the database is provided.

Multi-user Support and Distributed Processing

Almost all DBMSs today support multiple users. This means that more than one user might attempt to change the same data at the same time.
In such a situation the DBMS must apply concurrency control. Some database management systems allow users to retrieve information from remotely located databases. The connection in such a case is made via ordinary telephone lines or satellite links. In many applications, such as railway or airline reservation systems, distributed processing is indispensable.

Disadvantages of DBMS

Though a DBMS has many advantages, it has some disadvantages too. One of the most apparent is cost: a DBMS does pay off in the long run, but its initial investment is quite high. The hardware has to be upgraded so that the programs can be executed and the extensive data stored. Redundancy is reduced to a great extent in a DBMS, but the lack of duplication requires that the database be adequately backed up, so that in case of failure the backup data can be used by the system. Also, in a DBMS recovery and backup procedures are fairly
complex. Despite these disadvantages, a DBMS has a number of advantages that make it extremely popular.

DBMS Architecture

Database Management Systems are very complex, sophisticated software applications that provide reliable management of large amounts of data. To better understand general database concepts and the structure and capabilities of a DBMS, it is useful to examine the architecture of a typical database management system. There are two different ways to look at the architecture of a DBMS: the logical DBMS architecture and the physical DBMS architecture. The logical architecture deals with the way data is stored and presented to users, while the physical architecture is concerned with the software components that make up a DBMS.

Logical DBMS Architecture

The logical architecture describes how data in the database is perceived by users. It is not concerned with how the data is handled and processed by the DBMS, but only with how it looks. Users are shielded from the way data is stored on the underlying file system, and can manipulate the data without worrying about where it is located or how it is actually stored. This results in the database having different levels of abstraction. The ANSI/SPARC architecture divides the system into three levels of abstraction: the internal or physical level, the conceptual level, and the external or view level. The diagram below shows the logical architecture for a typical DBMS.
Figure 3-1 Logical DBMS Architecture

The Internal or Physical Level

The collection of files permanently stored on secondary storage devices is known as the physical database. The physical or internal level is the one closest to physical storage. It provides a low-level description of the physical database and an interface between the operating system's file system and the record structures used at higher levels of abstraction. It is at this level that record types and methods of storage are defined, as well as how stored fields are represented, what physical sequence the stored records are in, and what other physical structures exist.

The Conceptual Level

The conceptual level presents a logical view of the entire database as a unified whole, which allows you to bring all the data in the database together and see it in a consistent manner. The first stage in the design of a database is to define the conceptual view, and a DBMS provides a data definition language for this purpose. It is the conceptual level that allows a DBMS to provide data independence. The data definition language used to create the conceptual level must not specify any physical storage considerations; those should be handled by the physical level. It should not provide any storage or access details, but should define the information content only.

The External or View Level
The external or view level provides a window on the conceptual view which allows the user to see only the data of interest to them. The user can be either an application program or an end user. Any number of external schemas can be defined, and they can overlap each other. The System Administrator and the Database Administrator are special cases. Because they are responsible for the design and maintenance of the database, they at times need to be able to see the entire database. The external and the conceptual views are functionally equivalent for these two users.

Mappings between Levels

The three levels of abstraction in the database do not exist independently of each other. There must be some correspondence, or mapping, between the levels. There are two mappings: the conceptual/internal mapping and the external/conceptual mapping.

The conceptual/internal mapping lies between the conceptual and internal levels, and defines the correspondence between the records and fields of the conceptual view and the files and data structures of the internal view. If the structure of the stored database is changed, then the conceptual/internal mapping must also be changed accordingly so that the view from the conceptual level remains constant. It is this mapping that provides physical data independence for the database.

The external/conceptual mapping lies between the external and conceptual levels, and defines the correspondence between a particular external view and the conceptual view. Although these two levels are similar, some elements found in a particular external view may be different from the conceptual view. For example, several fields can be combined into a single (virtual) field, which can also have a different name from the original fields.
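As a rough illustration of an external view with a virtual field, the sketch below uses an SQL view in SQLite through Python's standard sqlite3 module. The table employee, the view employee_directory and all column names are assumptions made for the example; the view definition plays the role of the external/conceptual mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full logical description of an employee.
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Ahmed', 'Khan', 42000.0)")

# External level: a view that exposes only the data of interest and combines
# two stored fields into one virtual field under a new name.
conn.execute(
    "CREATE VIEW employee_directory AS "
    "SELECT emp_no, first_name || ' ' || last_name AS full_name FROM employee"
)

cursor = conn.execute("SELECT * FROM employee_directory")
columns = [description[0] for description in cursor.description]
directory = cursor.fetchall()
print(columns, directory)
```

A user of employee_directory never sees the salary column, and does not need to know that full_name is assembled from two stored fields.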
If the structure of the database at the conceptual level is changed, then the external/conceptual mapping must change accordingly so that the view from the external level remains constant. It is this mapping that provides logical data independence for the database.

It is also possible to have another mapping, where one external view is expressed in terms of other external views (this could be called an external/external mapping). This is useful if several external views are closely related to one another, as it allows you to avoid mapping each of the similar external views directly to the conceptual level.

Physical DBMS Architecture

The physical architecture describes the software components used to enter and process data, and how these software components are related and interconnected. Although it is not possible to generalize the component structure of a DBMS, it is possible to identify a number of key functions
which are common to most database management systems. The components that normally implement these functions are shown in the diagram on the following page, which depicts the physical architecture for a typical DBMS. At its most basic level the physical DBMS architecture can be broken down into two parts: the back end and the front end. The back end is responsible for managing the physical database and providing the necessary support and mappings for the internal, conceptual, and external levels described earlier. Other benefits of a DBMS, such as security, integrity, and access control, are also the responsibility of the back end. The front end is really just any application that runs on top of the DBMS. These may be applications provided by the DBMS vendor, the user, or a third party. The user interacts with the front end, and may not even be aware that the back end exists.
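The division of labour between back end and front end can be sketched with the funds-transfer case mentioned earlier in these notes. In the hedged example below (Python's standard sqlite3 module; the account table, the non-negative balance rule and the transfer function are all hypothetical), the integrity rule is declared once in the back end, and the front-end routine only issues statements inside a transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Back end: the integrity rule lives in the schema, not in application code.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Front-end routine: move funds; return True if the transfer committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The credit to dst was rolled back together with the failed debit.
        return False


ok = transfer(conn, 1, 2, 30.0)         # within the available balance
rejected = transfer(conn, 1, 2, 500.0)  # would overdraw account 1
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, rejected, balances)
```

The second transfer leaves both balances untouched, even though its first update had already been applied when the constraint failed. Any other front end connected to the same database would be held to the same rule.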
Figure 3-2 Physical DBMS Architecture

Both the back end and front end can be further broken down into the software components that are common to most types of DBMS. These components are examined in detail in the following sections.

Applications and Utilities
Applications and utilities are the main interface to the DBMS for most users. There are three main sources of applications and utilities for a DBMS: the vendor, the user, and third parties.

Vendor applications and utilities are provided for working with or maintaining the database, and usually allow users to create and manipulate a database without the need to write custom applications. However, these are usually general-purpose applications and are not the best tools for specific, repetitive tasks.

User applications are generally custom-made application programs written for a specific purpose using a conventional programming language. This programming language is coupled to the DBMS query language through the application program interface (API). This allows the user to combine the power of the DBMS query language with the flexibility of a custom application.

Third-party applications may be similar to those provided by the vendor, but with enhancements, or they may fill a perceived need for which the vendor has not created an application. They can also be similar to user applications, written for a specific purpose that the third party expects a large number of users to have.

The most common applications and utilities used with a database can be divided into several well-defined categories. These are:
o Command Line Interfaces: these are character-based, interactive interfaces that let you use the full power and functionality of the DBMS query language directly. They allow you to manipulate the database, perform ad-hoc queries and see the results immediately. They are often the only way to exploit the full power of the database without writing programs in a conventional programming language.
o Graphical User Interface (GUI) tools: these are graphical, interactive interfaces that hide the complexity of the DBMS and query language behind an intuitive, easy-to-understand and convenient interface. This gives casual users access to the database without having to learn the query language, and it allows advanced users to manage and manipulate the database quickly without the trouble of entering formal commands in the query language. However, graphical interfaces usually do not provide the same level of functionality as a command line interface, because it is not always possible to implement all commands or options in a graphical interface.
o Backup/Restore Utilities: these are designed to minimize the effects of a database failure and ensure that a database is restored to a consistent state if a failure does occur. Manual backup/restore utilities require the user to initiate the backup, while automatic utilities back up the database at regular intervals without any intervention from the user. Proper use of a backup/restore utility allows a DBMS to recover from a system failure correctly and reliably.
o Load/Unload Utilities: these allow the user to unload a database or parts of a database and reload the data on the same machine, or on another machine in a different location. This can be useful in several situations, such as creating backup copies of a database at a specific point in time, or loading data into a new version of the database or into a completely different database. Load/unload utilities may also be used to rearrange the data in the database to improve performance, for example by clustering data together in a particular way or reclaiming space occupied by data that has become obsolete.
o Reporting/Analysis Utilities: these are used to analyze and report on the data contained in the database. This may include analyzing trends in data, computing values from data, or selecting data that meets some specified criteria, and then displaying or printing a report containing this information.
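As one small illustration of the unload and reload cycle described in the list above, the sketch below uses the iterdump method of Python's standard sqlite3 module, which serialises a database as SQL text. The employee table and its rows are invented for the example, and real load/unload utilities differ from product to product.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO employee VALUES (?, ?)", [(1, "Ahmed"), (2, "Beena")]
)
source.commit()

# Unload: serialise schema and rows as SQL text that can be moved elsewhere.
dump_text = "\n".join(source.iterdump())

# Load: replay the text into a different, initially empty database.
target = sqlite3.connect(":memory:")
target.executescript(dump_text)

reloaded = target.execute(
    "SELECT emp_no, name FROM employee ORDER BY emp_no"
).fetchall()
print(reloaded)
```

Because the unloaded form is plain text, it could equally be written to a file and loaded on another machine or at a later point in time.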
The Application Program Interface

The application program interface (API) is a library of low-level routines which operate directly on the database engine. The API is usually used when creating software applications with a general-purpose programming language such as C++ or Visual Basic. This allows you to write custom software applications to suit the needs of your business, without having to develop the storage architecture as well. The storage of the data is handled by the database engine, while the input and any special analysis or reporting functions are handled by the custom application.

An API is specific to each DBMS, and a program written using the API of one DBMS cannot be used with another DBMS. This is because each API usually has its own unique function calls that are tied very tightly to the operation of the database. Even if two databases offer the same function, they may use different parameters and behave in different ways, depending on how the database designer decided to implement the function in each database. One exception to this is the Microsoft Open Database Connectivity API, which is designed to work with any DBMS that supports it.

The Query Language Processor

The query language processor is responsible for receiving query language statements and changing them from the English-like syntax of the query language to a form the DBMS can understand. The query language processor usually consists of two separate parts: the parser and the query optimizer.

The parser receives query language statements from application programs or command-line utilities and examines the syntax of the statements to ensure they are correct. To do this, the parser breaks a statement down into basic units of syntax (tokens) and examines them to make sure each statement consists of the proper component parts. If the statements follow the syntax rules, the tokens are passed to the query optimizer.
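The following sketch shows a program reaching a database engine through an API and the parser turning away a malformed statement. It uses Python's standard sqlite3 module as one example of such an API; the employee table and the helper function run are hypothetical, and other DBMSs expose different calls and error types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # API call: open a connection to the engine
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (?, ?)", (1, "Ahmed"))


def run(conn, statement):
    """Submit a statement; return the rows, or the error text if it is rejected."""
    try:
        return conn.execute(statement).fetchall()
    except sqlite3.OperationalError as exc:
        return f"rejected: {exc}"


good = run(conn, "SELECT emp_no, name FROM employee")
bad = run(conn, "SELEC emp_no, name FROM employee")  # misspelt keyword
print(good)
print(bad)
```

The misspelt statement never reaches the stored data: it fails the syntax check and the application receives an error instead of rows.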
The query optimizer examines the query language statement and tries to choose the most efficient way of executing the query. To do this, the query optimizer generates several query plans in which operations are performed in different orders, and then tries to estimate which plan will execute most efficiently. When making this estimate, the query optimizer may examine factors such as CPU time, disk time, network time, sorting methods, and scanning methods.

The DBMS Engine

The DBMS engine is the heart of the DBMS, and it is responsible for all of the data management in the DBMS. The DBMS engine usually consists of two separate parts: the transaction manager and the file manager.

The transaction manager maintains tables of authorization and concurrency control information. The DBMS may use authorization tables to allow the transaction manager to ensure the user has permission to execute the query language statement on the database. The authorization tables can only be modified by properly authorized user commands, which are themselves checked against the authorization tables. In addition, a database may also support concurrency control tables to prevent conflicts when simultaneous, conflicting commands are executed. The DBMS checks the concurrency control tables before executing a query language statement to ensure that the data it needs is not locked by another statement.

The file manager is the component responsible for all physical input/output operations on the database. It is concerned with the physical address of the data on the disk, and is responsible for any interaction (reads or writes) with the host operating system.

Types of Databases
Components of database

A Database Management System (DBMS) is a software package containing computer programs that control the creation, maintenance and use of a database. There are five major components of any database management system: a DBMS engine, a Data Definition Subsystem, a Data Manipulation Subsystem, an Application Generation Subsystem and a Data Administration Subsystem.

o The DBMS engine accepts logical requests from the other DBMS subsystems, converts them into their physical equivalents, and accesses the database and its data dictionary as they exist on a storage device.
o The Data Definition Subsystem helps the user create and maintain the data dictionary and define the structure of the files in the database.
o The Data Manipulation Subsystem helps the user add, change and delete data in the database and query it for valuable information. Its software tools are usually the primary interface between the user and the information in the database, and they allow the user to be specific about their information requirements.
o The Application Generation Subsystem contains facilities that help the user develop transaction-intensive applications. Such applications usually require the user to perform complex and detailed tasks to complete a transaction, so the subsystem provides easy-to-use data entry screens, programming languages and interfaces.
o The Data Administration Subsystem helps users manage the overall database environment by providing facilities for backup and recovery of lost data, security management, query optimization, concurrency control and change management.
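The roles of the data definition and data manipulation subsystems, and of the data dictionary they rely on, can be sketched with a few SQL statements. The example below is a minimal illustration using SQLite through Python's standard sqlite3 module; the employee table is invented, and sqlite_master and PRAGMA table_info are SQLite's own catalog mechanisms, which other DBMSs name differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: describe the structure; the DBMS records it in its catalog.
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, salary REAL)"
)

# Data manipulation: add, change, delete and query the data itself.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ahmed", 42000.0), (2, "Beena", 55000.0)],
)
conn.execute("UPDATE employee SET salary = salary + 1000 WHERE emp_no = 1")
conn.execute("DELETE FROM employee WHERE emp_no = 2")
rows = conn.execute("SELECT emp_no, name, salary FROM employee").fetchall()

# Data dictionary: metadata about tables and columns, itself open to queries.
tables = [
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
columns = [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(employee)")]
print(rows, tables, columns)
```

The first statement changes what the database can hold, the next three change what it does hold, and the last two read back the description that the first one stored.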
Database Management System

A DBMS is a set of application programs used to access, update and manage data; this is the part that constitutes database management. A DBMS is general-purpose software; examples include Oracle and other SQL-based systems. It is a complex system that allows users to do many things with data: input it, share it, edit it, manipulate it and display it.
Figure: operations a DBMS provides on data (input, edit, update, select, manipulate, share)
Types of user interaction

There are various ways to interact with the database, as shown in the following diagram.
Figure: types of user interaction

In the figure, user 1 works with the database directly and interactively; by working directly we mean that the user issues commands to query the database. User 2 works with the database indirectly, with the help of an application program written by a programmer. Program P interacts with the database without the presence of any user; programs like this are known as batch programs and are used for purposes such as generating periodic reports.
User 1, user 2 and program P could all be interacting with the database at the same time, that is, concurrently. This type of environment is a multiuser environment.

Database (collection of files)
    File (collection of records)
        Record (collection of fields)
            Field (e.g. Ahmed)
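The containment hierarchy above can be written down as a few nested data types. The sketch below uses Python dataclasses; the class names mirror the terms in the notes, while the attribute names and the sample employee record are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record is a collection of named fields."""
    fields: dict


@dataclass
class File:
    """A file is a collection of records of the same kind."""
    name: str
    records: list = field(default_factory=list)


@dataclass
class Database:
    """A database is a collection of files."""
    files: dict = field(default_factory=dict)


employees = File("employee", [Record({"emp_no": 1, "name": "Ahmed"})])
db = Database({"employee": employees})

# Walk down the hierarchy: database -> file -> record -> field.
name_field = db.files["employee"].records[0].fields["name"]
print(name_field)
```

Each level only knows about the level directly beneath it, which is the same layering the notes describe.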
Components and interfaces of DBMS

A database management system involves five major components:
1. Data
2. Hardware
3. Software
4. Procedures
5. Users
Data: data is the building block of a database management system and is considered its heart.
Hardware: the particular hardware depends on the requirements of the organization and of the database management system. Some DBMSs run only on a particular operating system, while others run on a wide variety of operating systems. A DBMS requires a minimum amount of main memory and disk space to run, but this minimum configuration may not give acceptable performance.
Software: the software includes the DBMS software and the application programs, together with the operating system. The application programs are written in programming languages such as C or COBOL.
Procedures: the rules that govern the design and use of the database. The procedures may contain information on how to log on to the DBMS, how to start and stop the DBMS, how to recover the database, how to change the structure of a table and how to improve performance.

Users

User
    Database Administrator
    Database Designer
    Database Manager
    Database User
        Application Programmer
        End User

Database Administrator
The Database Administrator is a person having central control over the data and the programs accessing that data. The database administrator is a manager whose responsibilities focus on the technical aspects of the database system. The objectives of the database administrator are as follows:
o To control the database environment
o To standardize the use of the database and associated software
o To support the development and maintenance of database applications

Responsibilities of the database administrator: The database administrator is responsible for maintaining the integrity, security and availability of data. A database must be protected from accidents, such as input or programming errors. The responsibilities are summarized as follows:
o Authorizing access to the database
o Coordinating and monitoring its use
o Acquiring hardware and software resources as needed
o Backup and recovery
Database Designer

A database designer can be either a logical database designer or a physical database designer. The logical database designer is concerned with identifying the data and the relationships between the data, and must have a thorough understanding of the organization's data and its business rules. The physical database designer takes a logical data model and decides how it can be physically implemented.

Database Manager

The database manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. The database manager translates DML (Data Manipulation Language) statements into low-level file system commands for storing, retrieving and updating data in the database.

Database Users

Database users are the people who need information from the database to carry out their business responsibilities.
Database users can be classified into two categories:
1. Application programmers
2. End users

Application Programmers

Application programmers are the people who write application programs that interact with the database.

End Users

End users are the people who interact with the system without writing programs. They form requests by writing queries in a database query language.
Data Dictionary

The data dictionary, also known as the system catalog, is a centralized store of information about the database. It contains information about the tables, fields, data types, primary keys, indexes, joins and so on. The information stored in the data dictionary is called metadata; thus the data dictionary can be considered a file that stores metadata. The data dictionary is a tool for recording and processing information about the data that an organization uses.

Metadata: the information about the data in the database is called metadata. The metadata is itself available for query and manipulation.

Data Abstraction

Need for abstraction: the main objective of a DBMS is to store and retrieve information efficiently, and all users should be able to access the same data. Designers use complex data structures to represent the data so that it can be stored and retrieved efficiently. The developers hide this complexity from users through several levels of abstraction.

DBMS Architecture

Database Management Systems are very complex, sophisticated software applications that provide reliable management of large amounts of data. To better understand general database concepts and
the structure and capabilities of a DBMS, it is useful to examine the architecture of a typical database management system. There are two different ways to look at the architecture of a DBMS: the logical DBMS architecture and the physical DBMS architecture. The logical architecture deals with the way data is stored and presented to users, while the physical architecture is concerned with the software components that make up a DBMS.

Logical DBMS Architecture

The logical architecture describes how data in the database is perceived by users. It is not concerned with how the data is handled and processed by the DBMS, but only with how it looks. Users are shielded from the way data is stored on the underlying file system, and can manipulate the data without worrying about where it is located or how it is actually stored. This results in the database having different levels of abstraction. The majority of commercial Database Management Systems available today are based on the ANSI/SPARC generalized DBMS architecture, as proposed by the ANSI/SPARC Study Group on Data Base Management Systems. The ANSI/SPARC architecture divides the system into three levels of abstraction: the internal or physical level, the conceptual level, and the external or view level. The diagram below shows the logical architecture for a typical DBMS.
Figure 3-1 Logical DBMS Architecture

The Internal or Physical Level

The collection of files permanently stored on secondary storage devices is known as the physical database. The physical or internal level is the one closest to physical storage. It provides a low-level description of the physical database and an interface between the operating system's file system and the record structures used at higher levels of abstraction. It is at this level that record types and methods of storage are defined, as well as how stored fields are represented, what physical sequence the stored records are in, and what other physical structures exist.

The Conceptual Level

The conceptual level presents a logical view of the entire database as a unified whole, which allows you to bring all the data in the database together and see it in a consistent manner. The first stage in the design of a database is to define the conceptual view, and a DBMS provides a data definition language for this purpose. It is the conceptual level that allows a DBMS to provide data independence. The data definition language used to create the conceptual level must not specify any physical storage considerations; those should be handled by the physical level. It should not provide any storage or access details, but should define the information content only.

The External or View Level

The external or view level provides a window on the conceptual view which allows the user to see only the data of interest to them. The user can be either an application program or an end user. Any number of external schemas can be defined, and they can overlap each other. The System Administrator and the Database Administrator are special cases. Because they are responsible for the design and maintenance of the database, they at times need to be able to see the entire database. The external and the conceptual views are functionally equivalent for these two users.
Mappings Between Levels

Obviously, the three levels of abstraction in the database do not exist independently of each other. There must be some correspondence, or mapping, between the levels. There are actually two mappings: the conceptual/internal mapping and the external/conceptual mapping.

The conceptual/internal mapping lies between the conceptual and internal levels, and defines the correspondence between the records and the fields of the conceptual view and the files and data structures of the internal view. If the structure of the stored database is changed, then the conceptual/internal mapping must also be changed accordingly so that the view from the
conceptual level remains constant. It is this mapping that provides physical data independence for the database.

The external/conceptual mapping lies between the external and conceptual levels, and defines the correspondence between a particular external view and the conceptual view. Although these two levels are similar, some elements found in a particular external view may be different from the conceptual view. For example, several fields can be combined into a single (virtual) field, which can also have a different name from the original fields. If the structure of the database at the conceptual level is changed, then the external/conceptual mapping must change accordingly so the view from the external level remains constant. It is this mapping that provides logical data independence for the database.

It is also possible to have another mapping, where one external view is expressed in terms of other external views (this could be called an external/external mapping). This is useful if several external views are closely related to one another, as it allows you to avoid mapping each of the similar external views directly to the conceptual level.

Physical DBMS Architecture

The physical architecture describes the software components used to enter and process data, and how these software components are related and interconnected. Although it is not possible to generalize the component structure of a DBMS, it is possible to identify a number of key functions which are common to most database management systems. The components that normally implement these functions are shown in Figure 3-2, which depicts the physical architecture for a typical DBMS.

At its most basic level the physical DBMS architecture can be broken down into two parts: the back end and the front end. The back end is responsible for managing the physical database and providing the necessary support and mappings for the internal, conceptual, and external levels described earlier.
Other benefits of a DBMS, such as security, integrity, and access control, are also the responsibility of the back end. The front end is really just any application that runs on top of the DBMS. These may be applications provided by the DBMS vendor, the user, or a third party. The user interacts with the front end, and may not even be aware that the back end exists.
Figure 3-2 Physical DBMS Architecture

Both the back end and front end can be further broken down into the software components that are common to most types of DBMS. These components are examined in detail in the following sections.

Applications and Utilities

Applications and utilities are the main interface to the DBMS for most users. There are three main sources of applications and utilities for a DBMS: the vendor, the user, and third parties.

Vendor applications and utilities are provided for working with or maintaining the database, and usually allow users to create and manipulate a database without the need to write custom applications. However, these are usually general-purpose applications and are not the best tools to use for doing specific, repetitive tasks.

User applications are generally custom-made application programs written for a specific purpose using a conventional programming language. This programming language is coupled to the DBMS query language through the application program interface (API). This allows the user to utilize the power of the DBMS query language with the flexibility of a custom application.
Third party applications may be similar to those provided by the vendor, but with enhancements, or they may fill a perceived need that the vendor hasn't created an application for. They can also be similar to user applications, being written for a specific purpose that the third party thinks a large majority of users will need.

The most common applications and utilities used with a database can be divided into several well-defined categories. These are:
Command Line Interfaces: these are character-based, interactive interfaces that let you use the full power and functionality of the DBMS query language directly. They allow you to manipulate the database and perform ad-hoc queries and see the results immediately. They are often the only method of exploiting the full power of the database without creating programs using a conventional programming language.

Graphical User Interface (GUI) tools: these are graphical, interactive interfaces that hide the complexity of the DBMS and query language behind an intuitive, easy to understand, and convenient interface. This allows casual users to access the database without having to learn the query language, and it allows advanced users to quickly manage and manipulate the database without the trouble of entering formal commands using the query language. However, graphical interfaces usually do not provide the same level of functionality as a command line interface because it is not always possible to implement all commands or options using a graphical interface.

Backup/Restore Utilities: these are designed to minimize the effects of a database failure and ensure a database is restored to a consistent state if a failure does occur. Manual backup/restore utilities require the user to initiate the backup, while automatic utilities will back up the database at regular intervals without any intervention from the user. Proper use of a backup/restore utility allows a DBMS to recover from a system failure correctly and reliably.

Load/Unload Utilities: these allow the user to unload a database or parts of a database and reload the data on the same machine, or on another machine in a different location. This can be useful in several situations, such as for creating backup copies of a database at a specific point in time, or for loading data into a new version of the database or into a completely different database. These load/unload utilities may also be used for rearranging the data in the database to improve performance, such as clustering data together in a particular way or reclaiming space occupied by data that has become obsolete.

Reporting/Analysis Utilities: these are used to analyze and report on the data contained in the database. This may include analyzing trends in data, computing values from data, or displaying data that meets some specified criteria, and then displaying or printing a report containing this information.
The Application Program Interface

The application program interface (API) is a library of low-level routines which operate directly on the database engine. The API is usually used when creating software applications with a general-purpose programming language such as C++ or Visual Basic. This allows you to write custom software applications to suit the needs of your business, without having to develop the storage architecture as well. The storage of the data is handled by the database engine, while the input and any special analysis or reporting functions are handled by the custom application.

An API is specific to each DBMS, and a program written using the API of one DBMS cannot be used with another DBMS. This is because each API usually has its own unique function calls that are tied very tightly to the operation of the database. Even if two databases have the same function, they may use different parameters and function in different ways, depending on how the database designer decided to implement the function in each database. One exception to this is the Microsoft Open Database Connectivity API, which is designed to work with any DBMS that supports it.

The Query Language Processor

The query language processor is responsible for receiving query language statements and changing them from the English-like syntax of the query language to a form the DBMS can understand. The query language processor usually consists of two separate parts: the parser and the query optimizer.

The parser receives query language statements from application programs or command-line utilities and examines the syntax of the statements to ensure they are correct. To do this, the parser breaks a statement down into basic units of syntax (tokens) and examines them to make sure each statement consists of the proper component parts. If the statements follow the syntax rules, the tokens are passed to the query optimizer.
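The parser's syntax check can be illustrated with a small sketch. It assumes Python's standard sqlite3 module as a stand-in DBMS (the notes do not name a product), and the helper name check_statement is hypothetical. Prefixing a statement with EXPLAIN asks SQLite to compile it without running it, so a malformed statement is rejected at this stage.

```python
import sqlite3

def check_statement(conn, statement):
    """Return None if SQLite can compile the statement, else the error text."""
    try:
        # EXPLAIN compiles the statement and reports its plan without running it.
        conn.execute("EXPLAIN " + statement)
        return None
    except sqlite3.Error as exc:
        return str(exc)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (custid INTEGER, name TEXT, city TEXT)")

# A well-formed statement passes; a misspelled keyword is rejected.
ok = check_statement(conn, "SELECT name FROM customer WHERE custid = 1001")
bad = check_statement(conn, "SELEC name FROM customer")
print(ok)
print(bad)
```

Note that SQLite reports other compile-time problems (such as an unknown table name) through the same error path, so this sketch does not separate pure syntax errors from name errors.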
The query optimizer examines the query language statement, and tries to choose the best and most efficient way of executing the query. To do this, the query optimizer will generate several query plans in which operations are performed in different orders, and then try to estimate which plan will execute most efficiently. When making this estimate, the query optimizer may examine factors such as CPU time, disk time, network time, sorting methods, and scanning methods.

The DBMS Engine

The DBMS engine is the heart of the DBMS, and it is responsible for all of the data management in the DBMS. The DBMS engine usually consists of two separate parts: the transaction manager and the file manager.
The transaction manager maintains tables of authorization and concurrency control information. The DBMS may use authorization tables to allow the transaction manager to ensure the user has permission to execute the query language statement on the database. The authorization tables can only be modified by properly authorized user commands, which are themselves checked against the authorization tables. In addition, a database may also support concurrency control tables to prevent conflicts when simultaneous, conflicting commands are executed. The DBMS checks the concurrency control tables before executing a query language statement to ensure that the data it needs is not locked by another statement.

The file manager is the component responsible for all physical input/output operations on the database. It is concerned with the physical address of the data on the disk, and is responsible for any interaction (reads or writes) with the host operating system.

Different levels of data abstraction

The levels of data abstraction are:
1. Physical level
2. Logical level
3. View level
1. Physical level: It is concerned with the physical storage of the information. It provides the internal view of the actual physical storage of data.
2. Logical level: The logical level describes what data are stored in the database and what relationships exist among those data. It describes the entire database in terms of a small number of simple structures. The database administrator uses the logical level of abstraction.
3. View level: The view level is the highest level of abstraction. It is the view that the individual user of the database has. There can be many view level abstractions of the same data.
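The relation between the logical level and the view level can be sketched with an SQL view. This is only an illustration, assuming Python's standard sqlite3 module; the table employee, the view employee_directory and their columns are hypothetical names. The view hides the salary column and combines two fields into one virtual field, as an external view may do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full description of the employee data.
conn.execute(
    "CREATE TABLE employee ("
    "emp_no INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Ali', 'Hassan', 5000.0)")

# View level: one user's window on the data. Salary is hidden and the two
# name fields are presented as a single virtual field called full_name.
conn.execute(
    "CREATE VIEW employee_directory AS "
    "SELECT emp_no, first_name || ' ' || last_name AS full_name FROM employee"
)

cursor = conn.execute("SELECT * FROM employee_directory")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```

The physical level (how SQLite lays the rows out in pages on disk) never appears in the code, which is the point of the abstraction.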
Database instances
Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database.

Database schema
The overall design of the database is called the database schema. The schema provides a logical classification of the objects in the database.
Types of database schema
Sub schema: describes the different views of the database.
Logical schema: describes the database design at the logical level.
Physical schema: describes the database design at the physical level.

Advantages of Database Management System
1. Data independence
2. Data redundancy
3. Data inconsistency
4. Centralizing the data
1. Data Independence: Data independence means the programs are isolated from changes in the way the data is structured and stored. In a database system the database management system provides the interface between the application programs and the data. Physical data independence means the applications need not worry about how the data are physically structured and stored. Applications should work with the logical data model and a declarative query language. Only if major changes were to be made to the data might the application programs need to be rewritten.
2. Data redundancy: Data redundancy means duplication of data. Redundant data occupies more space, hence it is not desirable.
3. Data inconsistency: Data inconsistency means different copies of the same data have different values.
4. Centralizing the data: Centralizing data means data can easily be shared between the users. But the main concern is data security.

Data redundancy         Reduced in DBMS
Data independence       Activated in DBMS
Data inconsistency      Avoided in DBMS
Centralizing the data   Achieved in DBMS
Data integrity          Necessary for efficient transaction

Entity Relationship Diagram
The E/R diagram is a graphical modeling tool to standardize E/R modeling. The modeling can be carried out with the help of a pictorial representation of entities, attributes and relationships. The basic building blocks of an entity relationship diagram are the entity, the attribute and the relationship.

Entity: An entity is an object that exists and is distinguishable from other objects. In other words, the entity can be uniquely identified. Eg: a particular person, a particular department, a particular place.

Attributes: Attributes are properties of an entity type. In other words, entities are described in a database by a set of attributes. Eg: Student ID, Name and Batch no are the attributes of a student.

Relationship: A relationship is a meaningful association between entity types, where each association includes one entity from each participating entity type.
Eg:
- Buying is a relationship between vendor and customer.
- Treatment is the relationship between doctor and patient.
- Teaching is the relationship between teacher and student.
Classification of Entity Type
The entities can be classified into:
1. Strong entity
2. Weak entity

Strong Entity: A strong entity is one whose existence does not depend on another entity.
Eg: Consider the following example:

Student - Takes - Course

In this example Student is the strong entity, while Course is considered a weak entity, because if there are no students to take a particular course then that course cannot be offered. The Course entity depends on the Student entity.

28-02-2010

Weak Entity: A weak entity is one whose existence depends on another entity.
Example: A customer borrows a loan. Here Loan is a weak entity: for every loan there should be at least one customer. Hence the entity Loan depends on the entity Customer.

Customer - Borrows - Loan
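The dependence of Loan on Customer can be sketched in SQL with a foreign key. This is a minimal sketch, assuming Python's standard sqlite3 module; the column names and the ON DELETE CASCADE choice are illustrative assumptions, not something the notes prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE loan ("
    "loan_no INTEGER PRIMARY KEY, "
    "amount REAL NOT NULL, "
    # Every loan must point at an existing customer.
    "cust_id INTEGER NOT NULL REFERENCES customer(cust_id) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ali')")
conn.execute("INSERT INTO loan VALUES (500, 10000.0, 1)")

# A loan for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO loan VALUES (501, 2000.0, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the customer removes the dependent loan as well.
conn.execute("DELETE FROM customer WHERE cust_id = 1")
remaining_loans = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]
print(orphan_rejected, remaining_loans)
```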
Attribute classification
Attributes are used to describe the properties of an entity. Attributes can be classified into the following:
1. Single value attribute
2. Multi value attribute
3. Derived attribute
4. Composite attribute
5. Null attribute

1. Single Value Attribute: A single value attribute has only one value associated with it.
Example: Age of a person, Student ID, Employee ID.

2. Multi Value Attribute: In the case of a multi value attribute, more than one value is associated with the attribute.
Example: Consider an entity employee; an employee can have many skills. Hence the skills associated with an employee are a multi value attribute.
(Figure: Employee entity with the attributes Employee Age, Employee Name and the multi valued attribute Employee Skill.)

3. Derived Attribute: The value of a derived attribute can be derived from the values of other related attributes or entities. A derived attribute is represented by a dotted ellipse.
(Figure: Employee entity with the attributes Employee Name and Age.)

4. Composite Attribute: A composite attribute is one which can be further subdivided into simple attributes.
Example: Consider the attribute Address, which can be further subdivided into street name, city and state.
(Figure: Employee entity with the attribute Employee Name and the address parts Street No, City and State.)

5. Null Attribute: A particular entity may not have any applicable value for an attribute. For such situations a special value called null is used.
Example: For the attribute phone number, if a person does not have a phone then a null value is entered in that column.

Relationship degree
Relationship degree refers to the number of associated entities. The relationship degree can be classified into the following types:
1. Unary Relationship (one)
2. Binary Relationship (two)
3. Ternary Relationship (three)
4. Quaternary Relationship (four)
Unary Relationship: The unary relationship is otherwise known as a recursive relationship. In a unary relationship the number of associated entities is one.
(Figure: Player - Captain of - Player)

Binary Relationship: In a binary relationship two entities are involved.
(Figure: Staff - Is Assigned - Department)

Ternary Relationship: In a ternary relationship three entities are simultaneously involved. Ternary relationships are required when binary relationships are not sufficient.
(Figure: Staff, Department and Location joined by the relationship Is Assigned)

9-03-2010

Quaternary Relationship: A quaternary relationship involves four entities.
(Figure: Lecturer, Student, Course and Slides joined by the relationship Teaches)

Relationship Classification
A relationship is an association among one or more entities. Relationships can be classified into the following:
1. One to one relationship
2. One to many relationship
3. Many to one relationship
4. Many to many relationship
One to one relationship: A one to one relationship is a special case of a one to many relationship. A true one to one relationship is rare. An example of a one to one relationship is the relationship between a president and a country.

One to many relationship: A relationship that associates one entity with more than one entity is called a one to many relationship. An example is a country having states: for one country there can be more than one state.

Many to one relationship: The relationship between employee and department is an example of a many to one relationship. There may be many employees working in one department.
Many to many relationship: The relationship between employee and project is an example of a many to many relationship. Many employees can work on many projects.

Relationship Type   Example
One-to-One          President - Country
One-to-Many         Department - Employee
Many-to-One         Employees - Department
Many-to-Many        Employee - Project
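A many to many relationship such as Employee - Project is commonly stored in a third table that pairs the two keys. The following is a minimal sketch of that idea, assuming Python's standard sqlite3 module; the table works_on and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: one row per (employee, project) pairing.
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Ali'), (2, 'Ahmed');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Inventory');
    INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")

# One employee works on many projects ...
projects_of_ali = [row[0] for row in conn.execute(
    "SELECT p.title FROM project p JOIN works_on w ON w.proj_id = p.proj_id "
    "WHERE w.emp_id = 1 ORDER BY p.title")]

# ... and one project has many employees.
staff_on_payroll = [row[0] for row in conn.execute(
    "SELECT e.name FROM employee e JOIN works_on w ON w.emp_id = e.emp_id "
    "WHERE w.proj_id = 10 ORDER BY e.name")]
print(projects_of_ali, staff_on_payroll)
```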
14-03-2010

EER Model (Enhanced Entity Relationship Model)
The basic concepts of ER modeling are not powerful enough for some complex applications. Hence some semantic modeling concepts are required. The EER model is an extension of the ER model with new modeling constructs. The new modeling constructs introduced in the EER model are the super type (super class) and subtype (sub class) relationships. The super type allows us to model a general entity type, whereas the subtype allows us to model specialized entity types.

EER Model = ER Model + hierarchical relationship

Super type (or super class): A super type or super class is a generic entity type that has a relationship with one or more subtypes.
Example: Player is a generic entity type which has a relationship with one or more subtypes such as cricket player and football player.

Sub type (or sub class): A sub type or sub class is a sub grouping of the entities in an entity type. A sub class entity type is a specialized entity type of the super class entity type. A subtype inherits the attributes and the relationships associated with its super type.

Generalization and Specialization
Generalization and specialization are the same relationship viewed from two opposite directions. Generalization is the bottom-up process of defining a generalized entity type from a set of one or more specialized entity types.
Specialization is a top-down process of defining one or more sub types of a super type. Generalization is a process of minimizing the differences between entities by identifying common features. Specialization is a process of identifying subsets of an entity that share distinguishing characteristics.
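Attribute inheritance from a super type to its sub types can be sketched with class inheritance. This is only an analogy in Python, using the student example from these notes; the attributes scholarship and employer are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Student:                      # super type: attributes common to all students
    student_id: int
    name: str

@dataclass
class FullTimeStudent(Student):     # sub type: adds its own attribute
    scholarship: float = 0.0

@dataclass
class PartTimeStudent(Student):     # sub type: adds a different attribute
    employer: str = ""

# The sub type inherits student_id and name from the super type.
s = PartTimeStudent(student_id=1002, name="Hussein", employer="Acme")
print(s.student_id, s.name, s.employer)
```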
(Figure: Student is the super class, also called super type or generic entity type. Full time student and Part time student are its sub classes, also called sub types or specialized entity types. Specialization runs from Student down to the sub types; generalization runs from the sub types up to Student.)

Database Architecture

Two Tier Architecture (Client Server Architecture)

First Tier: Client. Tasks: User Interface, Presentation Services, Application Services.
Second Tier: Database Server. Tasks: Application Services, Business Services, Data Services.
The two tier architecture is a client server architecture in which the client contains the presentation code and the SQL statements for data access. The database server processes the SQL statements and sends the query results back to the client. The client or first tier is primarily responsible for presentation of
data to the user, and the server or second tier is primarily responsible for supplying data services to the client.

Presentation Services: Presentation services refer to the portion of the application which presents data to the user. They also provide the mechanism through which the user interacts with the data.

Business Services: Business services encapsulate an organization's business processes and requirements. These rules are derived from the steps necessary to carry out day to day business. These rules can be validation rules.

Data Services: Data services provide access to the data independent of its location. Data services provide a standard interface for accessing data.

21-03-2010

Three Tier Architecture
First Tier: Client. Tasks: User Interface, Presentation Services.
Second Tier: Application Server (Business Objects). Tasks: Application Services, Business Services.
Third Tier: Data Server. Tasks: Data Services, Data Validation.
From the figure it is clear that, in order to improve performance, a second tier is included between the client and the server. Three tier architecture provides greater application scalability, lower maintenance and increased reuse of components. Through standard tiered interfaces, services are made available to the application. A single application can employ many different services which may reside on dissimilar platforms.
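The division of work between the three tiers can be sketched in a single Python file. This is a toy illustration only: in a real system each tier would run as a separate process or machine, and the class names, the account table and the withdrawal rule are all assumptions made for the example.

```python
import sqlite3

class DataService:
    """Third tier: the only code that touches the database."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL)")
        self.conn.execute("INSERT INTO account VALUES (1, 100.0)")

    def get_balance(self, acc_no):
        row = self.conn.execute(
            "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)).fetchone()
        return row[0]

    def set_balance(self, acc_no, balance):
        self.conn.execute(
            "UPDATE account SET balance = ? WHERE acc_no = ?", (balance, acc_no))

class BusinessService:
    """Second tier: business rules, no SQL and no presentation."""
    def __init__(self, data):
        self.data = data

    def withdraw(self, acc_no, amount):
        balance = self.data.get_balance(acc_no)
        if amount <= 0 or amount > balance:   # validation rule
            raise ValueError("withdrawal rejected by business rule")
        self.data.set_balance(acc_no, balance - amount)
        return balance - amount

def present(balance):
    """First tier: formats data for the user."""
    return f"New balance: {balance:.2f}"

service = BusinessService(DataService())
message = present(service.withdraw(1, 30.0))
print(message)
```

Because the client only calls present and withdraw, the data tier could be replaced without touching the presentation code.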
N Tier (Multi Tier Architecture)

(Figure: multi tier architecture with two application servers and a data server.)

The multi tier architecture is the most general client server architecture. It can be the most difficult to implement because of its generality. However, a good design and implementation of a multi tier architecture can provide the most benefits in terms of scalability and flexibility. From the diagram, the client application looks to application server 1 to supply the data from a mainframe-based application. Application server 1 has no direct access to the mainframe. But it does know, through the development of application services, that application server 2 provides a service to access the data from the mainframe application, which satisfies the client request.

Database model
In an object based database model everything is stored as an object. The object based database model can be defined as a collection of tools for:
1. Describing data
2. Describing data relationships
Record based Database Model
Record based database models describe the data structures and access techniques of a DBMS. They can be divided into four types:
1. File management system
2. Hierarchical database system
3. Network database system
4. Relational database system
1. File Management System
The file management system was the first method used to store data in a computerized database. The data items are stored sequentially in one large file. If a particular data item has to be located, the search starts from the beginning and items are checked sequentially until the required item is found.

2. Hierarchical Database System
In a hierarchical model data is organized in a tree like structure. It uses a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list. Hierarchical structures were widely used in the early mainframe database systems.

3. Network Database System
Data is represented by a collection of records, and relationships among data are represented by links. It is similar to the hierarchical model, but it also supports many to many relationships.

4. Relational Database System
Dr E. F. Codd first introduced the relational database model. The relational model allows data to be represented in a row and column format. Each data field is considered a column and each record is considered a row of the table. Different relationships between various tables can be achieved in a relational database model. We can easily insert data into the database and very easily delete data from the database.

The 12 Rules
1. The information rule: All information is explicitly and logically represented in tables as data values.
2. The rule of guaranteed access: Every item of data in the database must be accessible with the help of a table name, a primary key value and a column name.
Example:
Select * from employee (table name)
Select name, student Id from student (table name)
A primary key prevents the entry of duplicate values and null values.
3. The database description rule: The description of the database is maintained using the same logical structures with which the data was defined by the RDBMS.
4. Comprehensive data sub language rule: This rule requires a language that supports the following:
1. Data definition
2. View definition
3. Data manipulation
4. Integrity constraints
5. Authorization
6. Transaction management

5. The view updating rule: All views that are theoretically updatable must also be updatable by the system.
6. The insert and update rule: It allows data to be inserted into a database and it also allows data to be updated in a database.
7. The physical independence rule: Application programs and the database are independent of each other. Changes to the way the database is stored do not require any change or update to the application programs.
8. The logical data independence rule: Changes that are made to the database should not affect the user's ability to work with the data.
9. The integrity independence rule: Integrity constraints should be stored in the database as tables, so that changing them does not disturb the database or the application programs.
10. The distribution rule: The system must be able to access or manipulate data that is distributed across other systems.
11. The non-subversion rule: The non-subversion rule states that the different levels of the language cannot bypass the integrity rules and constraints. In simple words, if an RDBMS supports a low level language, then that language should not bypass any integrity constraint defined in the high level language.
12. The systematic treatment of null values: The RDBMS must be able to support null values (values that are different from zero and from spaces) to represent missing or inapplicable information.

Primary key
A primary key is used as a unique identifier for each record in a table, and it is essential when working with relational tables. A primary key cannot have duplicate entries. A primary key must be set on a field that generates a unique identifier. The retrieval of data from the database can be faster with the help of a primary key.
Example

Std id   Student name
1001     Mohamed
1002     Hussein
1003     Bassam

Types of primary key
1. Single field primary key
2. Auto number primary key
3. Multiple field primary key
Single field primary key
If we set a single field to be unique and not null then it is called a single field primary key.
Eg: student id, product id, employee id

Auto number primary key
If we set a field as auto number it will automatically generate sequential numbers. This is one of the default primary keys in MS Access. We cannot enter values in an auto number primary key field; the values are generated automatically.
Example: 1, 2, 3, 4, 5, 6, 7 etc.

Multiple field primary key
If we set more than one field in a table as the primary key then it is called a multiple field primary key.
Example: order id and bill no
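The behaviour of a single field primary key and of an automatically generated key can be tried out with a short sketch. It assumes Python's standard sqlite3 module rather than MS Access, so the auto numbering shown is SQLite's behaviour for an INTEGER PRIMARY KEY column; the extra student 'Ahmed' is a hypothetical row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (std_id INTEGER PRIMARY KEY, student_name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1001, "Mohamed"), (1002, "Hussein"), (1003, "Bassam")])

# A second row with an existing key value is rejected.
try:
    conn.execute("INSERT INTO student VALUES (1001, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# When the key is omitted, SQLite generates the next number itself.
cursor = conn.execute("INSERT INTO student (student_name) VALUES ('Ahmed')")
generated_id = cursor.lastrowid
print(duplicate_rejected, generated_id)
```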
Foreign key
A foreign key is a column or a set of columns in a table which has a corresponding relationship with, and a dependency on, another table. Specifically, the dependency is on the primary key of the other table, so you would have a one-to-many relationship.

Composite key
Sometimes more than one attribute is required to uniquely identify an entity. A primary key that is made up of more than one attribute is known as a composite key.

Emp id   Pros id   Hours worked
01       1008      80
02       1006      96
01       1008      120
02       1006      72
03       1009      14
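A composite key and a foreign key can be declared together, as in the following sketch. It assumes Python's standard sqlite3 module; the table work_log, the helper rejected and the sample rows are hypothetical and are not the rows of the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE work_log (
        emp_id       INTEGER REFERENCES employee(emp_id),  -- foreign key
        proj_id      INTEGER,
        hours_worked INTEGER,
        PRIMARY KEY (emp_id, proj_id)                      -- composite key
    );
    INSERT INTO employee VALUES (1, 'Ali'), (2, 'Ahmed');
    INSERT INTO work_log VALUES (1, 1008, 80), (2, 1008, 96), (1, 1006, 72);
""")

def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# The same (emp_id, proj_id) pair cannot appear twice.
same_pair = rejected("INSERT INTO work_log VALUES (1, 1008, 10)")
# An emp_id that is not in the employee table cannot be used.
unknown_emp = rejected("INSERT INTO work_log VALUES (9, 1006, 5)")
print(same_pair, unknown_emp)
```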
Super key
Any set of attributes that is able to identify every row in a table uniquely is known as a super key.

Normalization
Normalization allows us to establish relationships between tables. Normalization reduces redundancy. Normalization helps in simplifying the structure of tables. Normalization theory is built around the concept of normal forms. Normalization is a process in database design which groups data into various tables. Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying.

Advantages of Normalization
Large tables can be divided into any number of smaller tables. Breaking the database into numerous smaller tables and eliminating redundancies makes database management easier and enhances database efficiency.

Example: Unnormalized table

Custcode   Custname   Address   Products
1001       ali        Male      X, Y, z
1006       Ahmed      Male      A
1008       Mohamed    Male      m, W

Normalized table

Custcode   Custname   Address
1001       ali        Male
1006       Ahmed      Male
1008       Mohamed    Male

Product table

Custcode   Product Name
1001       x
1001       y
1001       z
1006       a
1008       m
1008       n

If we split the above table into two tables we can avoid unnecessary repetition of data. Based on the customer code we can establish a relationship between the customer table and the product table. There are four levels of normalization.
1. First normal form
2. Second normal form
3. Third normal form
4. Boyce Codd normal form
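The customer and product tables from the normalization example above can be built and joined back together with a short sketch. It assumes Python's standard sqlite3 module; the SQL column names are adapted from the table headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        custcode INTEGER PRIMARY KEY, custname TEXT, address TEXT);
    CREATE TABLE product (
        custcode INTEGER REFERENCES customer(custcode), product_name TEXT);
    INSERT INTO customer VALUES
        (1001, 'ali', 'Male'), (1006, 'Ahmed', 'Male'), (1008, 'Mohamed', 'Male');
    INSERT INTO product VALUES
        (1001, 'x'), (1001, 'y'), (1001, 'z'), (1006, 'a'), (1008, 'm'), (1008, 'n');
""")

# Each customer name is stored once; the join on custcode rebuilds the
# customer/product pairs whenever they are needed.
rows = conn.execute(
    "SELECT c.custname, p.product_name "
    "FROM customer c JOIN product p ON p.custcode = c.custcode "
    "WHERE c.custcode = 1001 ORDER BY p.product_name").fetchall()
print(rows)
```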
Purpose of normalization
To avoid redundancy by storing each fact within the database only once.
To put data into a form that can more accurately accommodate change.
To avoid unnecessary coding.

First normal form
A table is in first normal form if and only if all columns contain only atomic values, that is, there are no repeating groups within a row.

Unnormalized form to first normal form

Dept no   Dept name   Location
1         Nilgris     Coimbatore, Chennai
2         SAS         Chennai, Mumbai
3         KAD         Coimbatore
4         NKD         Calcutta

Consider the table department. It is not in first normal form because it has repeating groups. The intersection of a row with a column should hold only one value, but in the above table the location value is not atomic, because a department can be located in more than one place.

Dept no   Dept name   Location 1   Location 2
1         Nilgris     Coimbatore   Chennai
2         SAS         Chennai      Mumbai
3         KAD         Coimbatore
4         NKD         Calcutta

The Location column in the first table holds more than one value. One way to fix this is to divide the column into Location 1 and Location 2. The drawback of this form is that if a department is started in many places, then more columns are needed: Location 1, Location 2, ... Location n.

Second normal form
A table is said to be in second normal form if it is in first normal form and all of its non-key attributes depend on the whole key.

E_ID   E_Name   P_ID   P_Name   Total Time
E_ID: employee id
P_ID: project id
P_Name: project name
E_Name: employee name

Consider the above table employee project, which has the attributes E_ID, E_Name, P_ID, P_Name and Total Time. The table can be transformed to second normal form by breaking it into two smaller tables.

E_ID   E_Name
1001   ali

E_ID   P_ID   P_Name   Total Time
Second normal form to third normal form
A table is in third normal form if it is in second normal form and contains no transitive dependency. To understand transitive dependency, let us consider three attributes A, B and C connected in such a way that A -> B and B -> C. In other words, A -> C. This kind of functional dependency is known as a transitive dependency.

Let us consider a table hostel. The attributes of the table hostel are roll number, building name and fee.

Roll no   Building     Fee
100       Main         600
101       Additional   500
102       New          650
Roll number is the key for the table hostel since the other two columns depends on roll number; here it is in second normal form. Roll no 100 101 102 Building main additional new Building Main additional new fee 60 500 600
40
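The hostel decomposition can be sketched in the same way. The example below assumes Python's built-in sqlite3 module and uses the values of the original hostel table; the table names hostel and building_fee and the helper fee_for are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE building_fee (building TEXT PRIMARY KEY, fee INTEGER);
CREATE TABLE hostel (
    roll_no INTEGER PRIMARY KEY,
    building TEXT REFERENCES building_fee(building)
);
INSERT INTO building_fee VALUES ('main', 600), ('additional', 500), ('new', 650);
INSERT INTO hostel VALUES (100, 'main'), (101, 'additional'), (102, 'new');
""")


def fee_for(roll_no):
    """Look up a fee through the building: roll_no -> building -> fee."""
    row = conn.execute(
        "SELECT f.fee FROM hostel AS h"
        " JOIN building_fee AS f ON f.building = h.building"
        " WHERE h.roll_no = ?",
        (roll_no,),
    ).fetchone()
    return row[0] if row else None


print(fee_for(102))
```

Since each fee is stored once per building, changing the fee of a building is a single-row update, however many students live there.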
Boyce-Codd normal form
A relation is in Boyce-Codd normal form (BCNF) if, for every non-trivial functional dependency X -> A, X is a super key. Boyce-Codd normal form is a stronger form of normalization than third normal form. Let us consider a relation teaching, which has three attributes.

Student   Course   Instructor
r1        r2       r3

Teaching (student, course, instructor)
In the above relation teaching, student determines the course, which determines the instructor. Also, the instructor determines the course which he has to handle. The relation teaching can be transformed into BCNF by splitting it into two relations, R1 and R2.

R1 (r1, r2)
R2 (r2, r3)

By splitting the relation teaching into the two relations R1 and R2, we have transformed it into Boyce-Codd normal form.

Database language
DDL is classified in three categories:
1. Create table
2. Alter table
3. Drop table

Table
A table is a unit of storage which holds the data in the form of rows and columns.

Syntax to create a table
Create table <table name> (column definition1, column definition2, ...);
Example
Create table customer (custid number (5), name varchar (20), city varchar (20));

Syntax to alter a table
i) Alter table <table name> modify (column definition);
Example
Alter table customer modify (custid number (10));
ii) Alter table <table name> add (column definition, column definition);
Example
Alter table customer add (address varchar (20));
iii) Alter table <table name> rename column <old name> to <new name>;
Example
Alter table customer rename column name to c_name;

Syntax to drop a table
Drop table <table name>;
Example
Drop table customer;

Syntax to truncate a table
Truncate table <table name>;
Example
Truncate table customer;

Data manipulation language (DML)
DML commands are:
1) Insert
2) Select
3) Update
4) Delete

Syntax to insert
Insert into <table name> values (value1, value2, ...);
Example
Insert into customer (custid, name) values (1001, 'ali');

Syntax for select
Select <column list> from <table name>;
Example
Select * from customer;
Select name, custid from customer;

Syntax for update
Update <table name> set <column> = <value> where <condition>;
Example
Update customer set name = 'ali' where name = 'ahmed';

Distributed Database Management System
A distributed computing system consists of a number of processing elements that are interconnected by a computer network and that cooperate in performing certain application tasks. A distributed database is a collection of multiple logically interrelated databases distributed over a computer network. In other words, it is a collection of data which belongs logically to the same system but is spread over the sites of a computer network.

Distribution
The data are not resident at the same site. This is what distinguishes a distributed database from a single, centralized database.
[Figure: Computer 1, Computer 2 and Computer 3, each with its own database (Database 1, Database 2, Database 3), connected by a local network.]
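The idea of one logical database spread over several sites can be imitated on a single machine. The sketch below is only an illustration under stated assumptions: three separate in-memory SQLite databases (via Python's built-in sqlite3 module) stand in for Database 1 to Database 3, the site names and the helper all_customers are hypothetical, and the customer rows are made-up sample data. A real distributed DBMS would reach each site over the network and would also handle transactions and failures, which this sketch does not attempt.

```python
import sqlite3

# Three independent in-memory databases stand in for the three sites.
sites = {name: sqlite3.connect(":memory:") for name in ("site1", "site2", "site3")}

# Made-up sample rows: each site stores only its own part of the customer data.
fragments = {
    "site1": [(1001, "ali", "Coimbatore")],
    "site2": [(1002, "ahmed", "Chennai")],
    "site3": [(1003, "kumar", "Mumbai")],
}

for name, conn in sites.items():
    conn.execute("CREATE TABLE customer (custid INTEGER, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", fragments[name])


def all_customers():
    """Query every site and merge the results into one logical table."""
    merged = []
    for conn in sites.values():
        merged.extend(conn.execute("SELECT custid, name, city FROM customer"))
    return sorted(merged)


for row in all_customers():
    print(row)
```

The caller of all_customers sees one customer table even though no single site holds all the rows, which is the sense in which the data belong logically to the same system.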
Database, DBMS
1. Advantages and application of DBMS
2. Components of DBMS
3. RDBMS, rules for RDBMS
4. Database model
5. Database architecture
6. E/R diagram with example and diagram
7. EER, generalization and specialization
8. Distributed DBMS
9. Normalization with normal forms
10. Keys and their types with examples
11. DDL and DML with syntax and example
12. Relationship degree and relationship classification