Chapter 1: Introduction
Purpose of Database Systems
View of Data
Database Languages
Relational Databases
Database Design
Object-based and Semistructured Databases
Data Storage and Querying
Transaction Management
Database Architecture
Database Users and Administrators
Overall Structure
History of Database Systems
Introduction: Any organization, be it a bank, manufacturing company, hospital, university, conglomerate, or government department, requires a huge amount of data in one form or another. All such organizations need to collect data, process them, and store them for future use. They require data for a number of purposes, such as:
Preparing sales reports
Forecasts
Accounts payable and receivable
Medical histories, etc.
A database should represent some aspect of the real world (also called the mini world). Changes to the mini world are reflected in the database. A database is a logically interrelated collection of data with some inherent meaning. A random collection of data thus cannot be referred to as a database. A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.
Database System: It is the computerized management of interrelated data together with a set of programs to access those data. In other words, it is a computerized record-keeping system that provides an efficient and convenient environment to store and retrieve data. This computerized system is thus responsible for maintaining the information and making it available on demand. A complete database system involves four major components: Data, Hardware, Software (DBMS), and Users. We will discuss all of these components except hardware.
[Figure: a database system. Users / Programs reach the Stored Data through the DBMS Software.]
Database Management System (DBMS): A DBMS is a collection of programs that enables users to create and maintain a database. The DBMS is hence a general-purpose software system that facilitates the process of Defining, Constructing, and Manipulating databases for various applications. Application packages such as SQL Server, Oracle, MS-Access, and FoxPro are examples of commercially available DBMSs. The functions of a DBMS are:
Defining a database involves specifying the data types, structures, and constraints for the data to be stored in the database.
Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS.
Manipulating a database includes such functions as querying the database to retrieve specific data, updating the database to reflect changes in the real world, and generating reports from the data.
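The three functions named above (defining, constructing, manipulating) can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for a DBMS, and the table name, column names, and the class change are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database; all names below are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure, and constraints.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " class_name TEXT NOT NULL)"
)

# Constructing: store the data itself under the control of the DBMS.
conn.executemany(
    "INSERT INTO student (roll_no, name, class_name) VALUES (?, ?, ?)",
    [(17, "Amit", "B.Tech"), (8, "Reha", "MBA")],
)

# Manipulating: update the data to reflect a change, then query it.
conn.execute("UPDATE student SET class_name = 'M.Tech' WHERE roll_no = 17")
rows = conn.execute(
    "SELECT roll_no, name, class_name FROM student ORDER BY roll_no"
).fetchall()
print(rows)
```

The same three steps apply to any relational DBMS; only the connection call would differ.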
An Example of Database

STUDENT
Name      Roll No   Class    Major
Amit Kr   17        B.Tech   Comp. Sc
Reha      8         MBA      Marketing

COURSE
Course Name      Course No.   Hours   Department
Intro to IT      101          4       Comp Sc
Data Structure   102          4       Comp Sc
Mathematics      103          3       Math
DBMS             104          3       Comp Sc

SECTION
Section Identifier   Course No.   Semester   Year   Instructor
A                    103          I          91     Mrs. Sarika
B                    101          I          91     Mr. P Kumar
C                    102          II         92     Mr. A. Sinha
D                    103          I          92     Mrs. P Agg.
E                    101          I          92     Mr. P Kumar
F                    104          I          92     Mr. S Kumar

GRADE_REPORT
Student No.   Section Identifier   Grade
17            D
17            E
8             A
8             B
8             C
8             F

PREREQUISITE
Course No.   Prerequisite No.
             102
             103
             101
Database Applications:
Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized recommendations
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions
System programmers write the application programs to meet the needs of the bank. New application programs are added to the system as the need arises. For example, suppose that the bank decides to offer zero-balance savings accounts to some special customers. As a result, the bank creates new permanent files that contain information about all such customers, and it may have to write new application programs to deal with this situation. Thus, as time goes by, the system acquires more and more files and application programs and becomes bulky and unmanageable.
Data redundancy and inconsistency
Multiple file formats, duplication of information in different files
For example, a changed customer address may be reflected in savings-account records but not elsewhere in the system.
Difficulty in accessing data
Need to write a new program to carry out each new task
Ex. A bank officer needs to find out the names of all customers who live within a particular postal-code area. The officer has two choices: either obtain the list of all customers and extract the needed information manually, or ask a system programmer to write the necessary application program.
Ex. If the officer later needs to trim that list to include only those customers who have an account balance of $10,000 or more, again no existing program produces it.
Integrity problems
Integrity constraints (e.g. account balance > 0) become buried in program code rather than being stated explicitly
Hard to add new constraints or change existing ones
For example, the balance of a bank account may never fall below a prescribed amount (say, INR 500). The problem is compounded when constraints involve several data items from different files.
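By contrast, a DBMS lets such a rule be stated once, in the schema. The following is a minimal sketch using Python's sqlite3 module, assuming a hypothetical account table and the INR 500 minimum from the example above; the helper name withdraw is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The minimum balance is declared explicitly in the table definition,
# not buried inside each application program.
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 500))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 800)")
conn.commit()


def withdraw(account_number, amount):
    """Return True if the DBMS accepted the withdrawal, False otherwise."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE account SET balance = balance - ?"
                " WHERE account_number = ?",
                (amount, account_number),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the CHECK constraint would be violated.
        return False


ok_first = withdraw("A-101", 200)   # 800 -> 600, allowed
ok_second = withdraw("A-101", 200)  # 600 -> 400, rejected by the constraint
balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(ok_first, ok_second, balance)
```

Changing the rule later means changing one constraint rather than hunting through every program that updates balances.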
Atomicity of updates
Failures may leave the database in an inconsistent state with partial updates carried out
Example: A transfer of funds from one account to another should either complete or not happen at all
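The all-or-nothing behaviour can be sketched with a transaction in Python's sqlite3 module. The account names, the amounts, and the fail_midway flag that simulates a crash between the two updates are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 100)])
conn.commit()


def transfer(src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one atomic unit."""
    try:
        with conn:  # both updates commit together, or neither does
            conn.execute(
                "UPDATE account SET balance = balance - ?"
                " WHERE account_number = ?",
                (amount, src),
            )
            if fail_midway:
                # Hypothetical failure after the debit but before the credit.
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ?"
                " WHERE account_number = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances():
    return dict(conn.execute("SELECT account_number, balance FROM account"))


failed = transfer("A", "B", 50, fail_midway=True)
after_failure = balances()   # the partial debit was rolled back
succeeded = transfer("A", "B", 50)
after_success = balances()
print(after_failure, after_success)
```

After the simulated failure both balances are unchanged, so the database never shows money that has left one account without arriving in the other.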
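Concurrent access by multiple users
Concurrent access is needed for performance
Uncontrolled concurrent accesses can lead to inconsistencies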
Example: Two people reading a balance and updating it at the same time
Consider bank account A, containing $500. If two customers withdraw funds (say $50 and $100 respectively) from account A at about the same time, the result of the concurrent executions may leave the account in an incorrect (or inconsistent) state.
Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by the amount being withdrawn, and write the result back. If the two programs run concurrently, they may both read the value $500, and write back $450 and $400, respectively. Depending on which one writes the value last, the account may contain either $450 or $400, rather than the correct value of $350.
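The arithmetic of this lost update can be replayed deterministically. The sketch below does not use real threads; it simply fixes the interleaving described above (both programs read before either writes) and compares it with a serial execution. The function names are assumptions.

```python
def run_interleaved(balance, amounts):
    """Every program reads the old balance before any program writes."""
    reads = [balance for _ in amounts]
    for old_value, amount in zip(reads, amounts):
        # Each write is based on the stale read and overwrites the previous one.
        balance = old_value - amount
    return balance


def run_serial(balance, amounts):
    """Each program reads the balance left by the one before it."""
    for amount in amounts:
        balance = balance - amount
    return balance


last_writer_100 = run_interleaved(500, [50, 100])  # the $100 withdrawal writes last
last_writer_50 = run_interleaved(500, [100, 50])   # the $50 withdrawal writes last
serial = run_serial(500, [50, 100])
print(last_writer_100, last_writer_50, serial)
```

The two interleaved runs give 400 and 450, while the serial run gives the correct 350, matching the figures in the text.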
Security problems
Hard to provide user access to some, but not all, data
For example, in a banking system, payroll personnel need to see only that part of the database that has information about the various bank employees.
Solution: Database systems offer common solutions to all of the above problems.
Maintainability
Data Independence
Shareability
Availability
Avoidance of Inconsistency
Enforcement Of Standards
Maintenance Of Integrity
Database Security
Data Independence
Controlling Redundancy: Storing the same data multiple times leads to duplication of data and wasted storage space. The same data stored in different places can also end up in different formats; for example, a date of birth may be stored as Jan-19-1974 in one place and as January 19, 1974 in another.
Security (Restricting Unauthorized Access): When multiple users share a database, it is likely that some users will not be authorized to access all information in the database. E.g. financial data is often considered confidential, and hence only authorized persons are allowed to access such data. In addition, some users may be permitted only to retrieve data, whereas others are allowed both to retrieve and to update.
Providing Multi-User Interfaces: Because there are different types of users, with varying levels of technical knowledge regarding databases, a DBMS provides a variety of user interfaces. These include query languages for casual users, programming language interfaces for application programmers, forms and command codes for parametric users, and menu-driven interfaces and natural language interfaces for stand-alone users.
Representing Complex Relationships Among Data: A database may include different varieties of data that are interrelated in a number of ways. A DBMS has the capability to represent a variety of complex relationships among the data as well as to retrieve and update related data easily and efficiently.
Enforcing Integrity Constraints: A DBMS provides capabilities for defining and enforcing integrity constraints. The simplest type of integrity constraint involves specifying a data type for each data item. Ex. the student name must be a string of not more than 30 alphabetic characters.
Providing Back-up and Recovery: A DBMS provides facilities for recovering from hardware or software failures. The backup and recovery subsystem of the DBMS is responsible for recovery. For example, if the computer system fails in the middle of a complex update program, the recovery subsystem is responsible for making sure that the database is restored to the state it was in before the program started executing.
Levels of Abstraction
Physical level:
It describes how data are actually stored. This is concerned with the physical storage of the information. At this level, complex low-level data structures are described in detail.
Logical level:
This abstraction describes what data are stored in the database and what relationships exist among those data. It is used by database administrators, who must decide what information is to be kept in the database. For example, a customer record might be described as:
type customer = record
    customer_id : string;
    customer_name : string;
    customer_street : string;
    customer_city : string;
end;
View level:
This is the highest level of abstraction, which describes only part of the entire database.
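A view that exposes only part of a customer record can be sketched with Python's sqlite3 module. The view name customer_directory, the choice of visible columns, and the sample row are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full customer record.
conn.execute(
    "CREATE TABLE customer ("
    " customer_id TEXT PRIMARY KEY,"
    " customer_name TEXT,"
    " customer_street TEXT,"
    " customer_city TEXT)"
)
conn.execute("INSERT INTO customer VALUES ('C1', 'Anil', 'MG Road', 'Delhi')")

# View level: one group of users sees only the name and city.
conn.execute(
    "CREATE VIEW customer_directory AS"
    " SELECT customer_name, customer_city FROM customer"
)

cursor = conn.execute("SELECT * FROM customer_directory")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Users of the view never see the customer_id or customer_street columns, even though they are stored at the logical level.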
View of Data
An architecture for a database system
Schema: the logical structure of the database
Example: The database consists of information about a set of customers and accounts and the relationship between them
Analogous to type information of a variable in a program
Physical schema / Internal schema: database design at the physical level. The internal schema defines not only the various types of records but also how stored fields are represented and stored in the database system. It uses the physical data model and describes the complete details of data storage and access paths for the database.
Logical schema / Conceptual schema: database design at the logical level. The conceptual schema includes the definitions of the various types of conceptual data records of the database along with the relationships between different types of records.
External schema / Subschema: At the highest level we have the external schema/subschema of the database. Each external view is defined by an external schema, which consists of basic definitions of the various types of records found in that external view.
Instance: the actual content of the database at a particular point in time
Data Independence: The ability to modify a schema definition in one level without affecting a schema definition in the next higher level is called data independence.
Physical Data Independence: It is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance.
Logical Data Independence: It is the ability to modify the logical or conceptual view (schema) without causing application programs to be rewritten. Modifications at the logical level are necessary whenever the logical structure of a database is altered.
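Both kinds of independence can be illustrated with one unchanged query, using Python's sqlite3 module. The sample rows, the index name, and the added column are assumptions; adding an index stands in for a physical change and adding a column for a logical one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " customer_id TEXT PRIMARY KEY,"
    " customer_name TEXT,"
    " customer_city TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("C1", "Anil", "Delhi"), ("C2", "Ravi", "Pune"), ("C3", "Kapil", "Delhi")],
)

# The "application program": this query text never changes below.
QUERY = (
    "SELECT customer_name FROM customer"
    " WHERE customer_city = ? ORDER BY customer_name"
)

before = conn.execute(QUERY, ("Delhi",)).fetchall()

# Physical change: add an access path (an index) to improve performance.
conn.execute("CREATE INDEX idx_customer_city ON customer (customer_city)")
after_index = conn.execute(QUERY, ("Delhi",)).fetchall()

# Logical change: the schema gains a new attribute.
conn.execute("ALTER TABLE customer ADD COLUMN customer_street TEXT")
after_column = conn.execute(QUERY, ("Delhi",)).fetchall()

print(before, after_index, after_column)
```

The query returns the same rows at all three points, so the program issuing it did not have to be rewritten.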
Data Models
A data model describes the structure underlying a database. It is a collection of tools for describing:
Data
Data relationships
Data semantics
Data constraints
There are three different kinds of data models:
Object-based logical models
Record-based logical models
Physical models
Object-based logical models: Object-based logical models are used in describing data at the logical and view levels. There are many different models, such as:
The entity-relationship model
The object-oriented model
The semantic data model
The functional data model
Entity-Relationship Model: The entity-relationship (E-R) data model is based on a perception of the real world as a collection of basic objects, called entities, and of relationships among these objects. An entity is a "thing" or "object" in the real world that is distinguishable from other objects.
For example, a person is an entity, and bank accounts can be considered as entities. The set of all entities of the same type, and the set of all relationships of the same type, are termed an entity set and a relationship set respectively. The overall logical structure of a database can be expressed graphically by an E-R diagram, which is built up from the following components:
Rectangles: represent entity sets
Ellipses: represent attributes
Diamonds: represent relationships among entity sets
Lines: link attributes to entity sets and entity sets to relationships
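[Figure: E-R diagram. Entity set Customer (attributes Customer_Name, Customer_PAN, Customer_Address, Customer_DOB) is connected through the relationship set Depositor to entity set Account (attributes A/c No, Balance).]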
The Object-Oriented Model: Like the E-R model, the object-oriented model is based on a collection of objects. An object holds values, much as a variable does, and also contains methods that operate on those values. Ex. JODB, EyeDB, Durus, PostgreSQL (an object-relational system).
Relational Model: This model uses a collection of tables to represent both data and the relationships among them.
Network Model: Data in the network model are represented by collections of records, and relationships among data are represented by links, which can be viewed as pointers. The records in the database are organized as collections of arbitrary graphs.
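[Figure: sample network database. Customer records: Anil, Ravi, Kapil, Neeraj, Abhishek, Mukesh, with PANs ADYSZ 1971K, ADYSZ 1978K, ADYSZ 1973K, ADYSZ 1974K, ADYSZ 1975K, ADYSZ 1977K. Customers are linked to account records (number, balance): A-101, 8982; A-102, 1998; A-158, 2572; A-197, 1903; A-201, 21476; A-207, 4242; A-211, 11260.]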
Hierarchical Model: This model is similar to the network model in the sense that records and links represent data and the relationships among them. It differs from the network model in that the records are organized as collections of trees rather than arbitrary graphs.
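[Figure: sample hierarchical database. Customer records (Anil, ADYSZ 1978K; Kapil, ADYSZ 1973K; Neeraj) are the roots of trees whose children are account records (number, balance): A-101, 8982; A-201, 21476; A-201, 21476; A-102, 1998; A-207, 4242; A-158, 2572; A-197, 1903; A-211, 11260. Account A-201 appears twice in the tree.]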
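The difference between the two models can be sketched with plain Python objects. In this sketch the class names are assumptions, and so is the ownership of accounts (which customer holds A-101 and A-201), since the figures do not make the links recoverable. In the network version two customers point at the same account record; in the hierarchical version each child has a single parent, so a shared account has to be stored as two copies.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    number: str
    balance: int


@dataclass
class Customer:
    name: str
    accounts: list = field(default_factory=list)


# Network model: links behave like pointers, so a record can be shared.
shared = Account("A-201", 21476)
net_anil = Customer("Anil", [Account("A-101", 8982), shared])
net_kapil = Customer("Kapil", [shared])

# Hierarchical model: records form trees, so the shared account is duplicated.
tree_anil = Customer("Anil", [Account("A-101", 8982), Account("A-201", 21476)])
tree_kapil = Customer("Kapil", [Account("A-201", 21476)])

# One withdrawal of 1000 from A-201 in each representation.
shared.balance -= 1000                 # visible through both customers
tree_anil.accounts[1].balance -= 1000  # only this copy changes

print(net_kapil.accounts[0].balance, tree_kapil.accounts[0].balance)
```

The duplicated copy under the second customer is left stale, which is the kind of redundancy a tree organization has to manage.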
Data Manipulation Language (DML): also known as a query language. There are two classes of DML:
Procedural: the user specifies what data is required and how to get those data
Declarative (nonprocedural): the user specifies what data is required without specifying how to get those data
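The contrast can be sketched in Python. The first half spells out how to find the data with an explicit loop; the second half only states what is wanted and leaves the how to SQLite, via the sqlite3 module. The account numbers and balances reuse values from the sample figures, and the threshold of 2000 is an assumption.

```python
import sqlite3

accounts = [("A-101", 8982), ("A-102", 1998), ("A-158", 2572)]

# Procedural style: the program says how to get the data, step by step.
procedural = []
for number, balance in accounts:
    if balance > 2000:
        procedural.append(number)
procedural.sort()

# Declarative style: the query says only what data is required.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", accounts)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT account_number FROM account"
        " WHERE balance > 2000 ORDER BY account_number"
    )
]

print(procedural, declarative)
```

Both approaches return the same accounts; in the declarative case the DBMS is free to choose the access strategy.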
Example:
char(10), integer)
DDL compiler generates a set of tables stored in a data dictionary
Data dictionary contains metadata (i.e., data about data):
Database schema
Data storage and definition language: specifies the storage structure and access methods used
Integrity constraints: domain constraints, referential integrity (references constraint in SQL), assertions
Authorization
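How a DDL statement ends up as metadata can be sketched with Python's sqlite3 module. SQLite's catalog table sqlite_master and its table_info pragma play the role of the data dictionary here; the account table and its two columns are assumptions chosen to match the column types mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A DDL statement; the table and column names are illustrative assumptions.
conn.execute(
    "CREATE TABLE account ("
    " account_number CHAR(10) PRIMARY KEY,"
    " balance INTEGER)"
)

# The catalog records which tables exist (data about data).
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# table_info rows are (cid, name, type, notnull, dflt_value, pk).
column_info = conn.execute("PRAGMA table_info(account)").fetchall()
column_names = [row[1] for row in column_info]
primary_key = [row[1] for row in column_info if row[5] > 0]

print(catalog, column_names, primary_key)
```

No account data has been inserted at this point; everything printed is metadata produced by the DDL statement.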
SQL
SQL: a widely used non-procedural language
Example: Find the name of the customer with customer-id 192-83-7465
    select customer.customer_name
    from customer
    where customer.customer_id = '192-83-7465'
Example: Find the balances of all accounts held by the customer with customer-id 192-83-7465
    select account.balance
    from depositor, account
    where depositor.customer_id = '192-83-7465' and depositor.account_number = account.account_number
Application programs can reach the database through:
Language extensions to allow embedded SQL
Application program interface (e.g., ODBC/JDBC) which allows SQL queries to be sent to a database
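The two queries can be sent to a database from an application program. The sketch below uses Python's sqlite3 module in place of ODBC/JDBC; the sample rows (the customer name and the two balances) are made up for illustration. The customer-id is passed as a string parameter, because written without quotes the text 192-83-7465 would be evaluated as a subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT);
    CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
    -- Sample rows below are hypothetical.
    INSERT INTO customer VALUES ('192-83-7465', 'Johnson');
    INSERT INTO account VALUES ('A-101', 500);
    INSERT INTO account VALUES ('A-201', 900);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101');
    INSERT INTO depositor VALUES ('192-83-7465', 'A-201');
    """
)

customer_id = "192-83-7465"

# First query: the name of the customer with this customer-id.
name = conn.execute(
    "SELECT customer.customer_name FROM customer"
    " WHERE customer.customer_id = ?",
    (customer_id,),
).fetchone()[0]

# Second query: balances of all accounts held by that customer.
balances = [
    row[0]
    for row in conn.execute(
        "SELECT account.balance FROM depositor, account"
        " WHERE depositor.customer_id = ?"
        " AND depositor.account_number = account.account_number"
        " ORDER BY account.balance",
        (customer_id,),
    )
]

# Without quotes the id is arithmetic: 192 - 83 - 7465.
unquoted_value = conn.execute("SELECT 192-83-7465").fetchone()[0]

print(name, balances, unquoted_value)
```

Passing the value as a parameter rather than pasting it into the query text also keeps the quoting correct automatically.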
Types of Database Users
Database Designer: A database designer (DD) is the person who is responsible for designing the actual database. The DD should interact with all the potential groups of users and develop an external as well as a logical view of the database. The DD's responsibilities include:
a) Identifying the data to be stored in the database.
b) Choosing appropriate structures to represent and store this data. This is done before the database is actually implemented.
c) Communicating with all the prospective database users in order to understand their requirements.
d) Coming up with a design of the database that meets end-user requirements and is capable of performing all data processing functions.
d) Data access authorization granting: The granting of different types of authorization allows the database administrator (DBA) to regulate which parts of the database various users can access.
e) Integrity constraint specification: The data values stored in the database must satisfy certain constraints. The database administrator must specify such constraints explicitly. The integrity constraints are kept in a special system structure that is consulted by the database system whenever an update takes place.
f) Data appraisals: It is the job of the DBA to carry out, from time to time, appraisals of the data held in the database to ensure that it is complete and accurate and that data is not being duplicated. The DBA also organizes facilities for adding new data to the database and deleting data that is no longer required, in consultation with user departments and systems personnel.
g) Preparing data manuals: The DBA prepares manuals to help users make optimal use of the database facilities available.