
Database Management System

Database Management Systems


UNIT-1

1.0 Introduction and brief history to Database
1.1 Characteristics of database
1.2 Difference between File System & DBMS
1.3 Advantages of DBMS
1.4 Functions of DBMS
1.5 Role of Database Administrator
1.6 Simplified Database System Environment
1.7 Example of a Database
1.8 Architecture of DBMS
1.9 Data Independence
1.10 Types of database applications
1.11 Data Models
1.12 The database system environment
1.13 Centralized and Client-Server DBMS Architectures

VTU-EDUSAT


Introduction to Database
1.0 Introduction
A database is a collection of related data. A database management system (DBMS) is software designed to assist in maintaining and using large collections of data. The first general-purpose DBMS, the Integrated Data Store (IDS), was designed by Charles Bachman in the early 1960s. In the late 1960s IBM introduced IMS, the Information Management System. In 1970 Edgar Codd at IBM proposed the relational data model, which led to the relational DBMS (RDBMS). In the 1980s SQL (Structured Query Language) became the standard query language for relational systems. Between 1980 and 1990 there were further advances in DBMS products, e.g. DB2 and ORACLE.

Data
Data are raw facts or figures. When activities take place in an organization, the effects of these activities need to be recorded; what is recorded is known as data.

Information
Processed data is called information. The purpose of data processing is to generate the information required for carrying out business activities.

In general, data management consists of the following tasks:

Data capture: Gathering the data as and when they originate.
Data classification: Classifying the captured data based on its nature and intended usage.
Data storage: Storing the segregated data properly.
Data arranging: Arranging the data in a proper order.
Data retrieval: Data is required frequently for further processing, so it is important to create indexes so that the data can be retrieved easily.
Data maintenance: Keeping the data up to date.
Data verification: Checking the data for errors before it is stored.
Data coding: Coding the data for easy reference.
Data editing: Re-arranging or modifying the data for presentation.
Data transcription: Converting the data from one form into another.
Data transmission: Forwarding the data to the place where it will be used further.

Metadata (meta data, or sometimes meta information) is "data about data", of any sort in any media. An item of metadata may describe a collection of data including multiple content items and hierarchical levels, for example a database schema. In data processing, metadata is definitional data that provides information about, or documentation of, other data managed within an application or environment. The term should be used with caution, since all data is about something and could therefore be called metadata.

Database

In simple terms, a database is a collection of related data. A database can be of any size and of varying complexity. It may be generated and maintained manually, or it may be computerized.

Database Management System


A Database Management System (DBMS) is a collection of programs that enables users to create and maintain a database.

The DBMS is hence a general-purpose software system that facilitates the process of defining, constructing and manipulating databases for various applications.

1.1 Characteristics of DBMS
To incorporate the requirements of the organization, the system should be designed for easy maintenance.

Information systems should allow interactive access to data, so that new information can be obtained without writing fresh programs.

The system should be designed to correlate different data to meet new requirements. An independent central repository, which gives information about the available data and its meaning, is required.

An integrated database helps in understanding the inter-relationships between data stored in different applications.

The stored data should be available for access by different users simultaneously.

An automatic recovery feature has to be provided to overcome problems caused by failures of the processing system.

DBMS Utilities
A data loading utility: Allows easy loading of data from an external format without writing programs.
A backup utility: Makes copies of the database periodically to help in cases of crashes and disasters.
A recovery utility: Reconstructs the correct state of the database from the backup and the history of transactions.
Monitoring tools: Monitor performance so that the internal schema can be changed and database access can be optimized.
A file organization utility: Allows restructuring the data from one type of organization to another.

1.2 Difference between File system & DBMS


File System
1. A file system is a collection of data. For any management of the data, the user has to write the procedures.
2. A file system exposes the details of data representation and storage.
3. In a file system, storing and retrieving data cannot be done efficiently.
4. Concurrent access to data in a file system causes many problems, e.g. one user reading a file while another is deleting or updating information in it.
5. A file system does not provide a crash recovery mechanism. E.g. if the system crashes while we are entering data into a file, the content of the file may be lost.
6. Protecting a file under a file system is very difficult.

DBMS
1. A DBMS manages a collection of data, and the user is not required to write the procedures for managing the database.
2. A DBMS provides an abstract view of data that hides the details of representation and storage.
3. A DBMS is efficient to use, since it applies a wide variety of sophisticated techniques to store and retrieve the data.
4. A DBMS takes care of concurrent access using some form of locking.
5. A DBMS has a crash recovery mechanism; it protects the user from the effects of system failures.
6. A DBMS has a good protection mechanism.

DBMS = Database Management System
RDBMS = Relational Database Management System
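The crash recovery point (item 5) can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the table name, the account numbers and the injected failure are hypothetical, and the "crash" is only a simulated exception, not a real system failure. The idea shown is that a half-finished update is undone rather than left in the data.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE number = ?",
                (amount, src),
            )
            if amount > 500:  # hypothetical failure injected between the two updates
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE number = ?",
                (amount, dst),
            )
    except RuntimeError:
        pass  # the partial debit has already been rolled back by the DBMS

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (number TEXT PRIMARY KEY, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 1000.0), ("A-111", 3000.0)],
)
conn.commit()

transfer(conn, "A-101", "A-111", 200.0)  # completes normally
transfer(conn, "A-101", "A-111", 900.0)  # fails midway and is undone
balances = dict(conn.execute("SELECT number, balance FROM account"))
```

With plain files, the program itself would have to detect and repair the half-written state; here the application only marks where the transaction begins and ends.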



A database management system is a system used to manage databases. A relational database management system is a DBMS used to manage relational databases. A relational database is one in which data is stored in tables that can be related to one another through primary and foreign keys.

1.3 Advantages of DBMS.


Due to its centralized nature, a database system can overcome the disadvantages of a file-based system.

1. Data independence:
Application programs should not be exposed to the details of data representation and storage. The DBMS provides an abstract view that hides these details.

2. Efficient data access:
The DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently.

3. Data integrity and security:
Since data is accessed through the DBMS, it can enforce integrity constraints, e.g. checking the value before salary information for an employee is inserted.

4. Data administration:
When users share data, centralizing the administration of the data is an important task. Experienced professionals can minimize data redundancy and perform fine tuning that reduces retrieval time.

5. Concurrent access and crash recovery:
The DBMS schedules concurrent access to the data and protects users from the effects of system failure.

6. Reduced application development time:
The DBMS supports important functions that are common to many applications.
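As an illustration of point 3, the salary example can be sketched with a declared constraint. This is a minimal sketch assuming Python's built-in sqlite3 module; the employee table, its columns and the rule "salary must be positive" are assumptions made for the example, not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL NOT NULL CHECK (salary > 0)  -- assumed integrity rule
    )
    """
)

def insert_employee(conn, emp_id, name, salary):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, salary)
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = insert_employee(conn, 1, "Preethi", 25000.0)  # valid salary
rejected = insert_employee(conn, 2, "Arun", -100.0)      # violates the CHECK
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The rule is stated once in the schema, so every application that inserts salary data is checked in the same way.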


1.4 Functions of DBMS


Data Definition: The DBMS provides functions to define the structure of the data in the application. These include defining and modifying the record structure, the type and size of fields, and the various constraints to be satisfied by the data in each field.
Data Manipulation: Once the data structure is defined, data needs to be inserted, modified or deleted. The functions that perform these operations are part of the DBMS. They can handle both planned and unplanned data manipulation needs. Planned queries are those which form part of the application. Unplanned queries are ad-hoc queries which are performed on a need basis.
Data Security & Integrity: The DBMS contains modules which handle the security and integrity of data in the application.
Data Recovery and Concurrency: Recovery of the data after a system failure and concurrent access to records by multiple users are also handled by the DBMS.
Data Dictionary Maintenance: Maintaining the data dictionary, which contains the data definitions of the application, is also one of the functions of the DBMS.
Performance: Optimizing the performance of queries is one of the important functions of the DBMS.

1.5 Role of Database Administrator.


Typically there are three types of users for a DBMS:

1. The end user, who uses the application. Ultimately this is the person who puts the data in the system to use in business. This user need not know anything about the organization of data at the physical level.
2. The application programmer, who develops the application programs. He/she has more knowledge about the data and its structure and can manipulate the data using his/her programs, but need not have access to, or knowledge of, the complete data in the system.
3. The database administrator (DBA), who is like the super-user of the system.


The role of the DBA is very important and is defined by the following functions.

Defining the schema: The DBA defines the schema, which contains the structure of the data in the application. The DBA determines what data needs to be present in the system and how this data has to be represented and organized.
Liaising with users: The DBA needs to interact continuously with the users to understand the data in the system and its use.
Defining security & integrity checks: The DBA finds out which access restrictions are needed and defines security checks accordingly. Data integrity checks are also defined by the DBA.
Defining backup/recovery procedures: The DBA also defines procedures for backup and recovery. Defining a backup procedure includes specifying what data is to be backed up, how often backups are taken, and the medium and storage place for the backup data.
Monitoring performance: The DBA has to continuously monitor the performance of the queries and take measures to optimize the queries in the application.



1.6 Simplified Database System Environment

A database management system (DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the process of defining, constructing, manipulating and sharing databases among various users and applications. Defining a database involves specifying the data types, constraints and structures of the data to be stored in the database. This descriptive information is also stored in the database, in the form of a database catalog or dictionary; it is called meta-data. Manipulating the data includes querying the database to retrieve specific data. An application program accesses the database by sending queries or requests for data to the DBMS. Other important functions provided by the DBMS include protecting the database and maintaining it.
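The catalog idea can be tried out directly. The sketch below assumes Python's built-in sqlite3 module, where the catalog is exposed as the sqlite_master table and the table_info pragma; the student table and its columns are hypothetical. Other DBMSs store the same kind of meta-data under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_number INTEGER PRIMARY KEY, name TEXT NOT NULL, major TEXT)"
)

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]
```

The definitions were never written to a separate file by the application; the DBMS recorded them when the table was defined and can report them on request.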



1.7 Example of a Database (with a Conceptual Data Model)
Mini-world for the example: part of a UNIVERSITY environment.

Some mini-world entities:
  STUDENTs
  COURSEs
  SECTIONs (of COURSEs)
  (academic) DEPARTMENTs
  INSTRUCTORs

Some mini-world relationships:
  SECTIONs are of specific COURSEs
  STUDENTs take SECTIONs
  COURSEs have prerequisite COURSEs
  INSTRUCTORs teach SECTIONs
  COURSEs are offered by DEPARTMENTs
  STUDENTs major in DEPARTMENTs
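One possible way to write these entities and relationships down as relational tables is sketched below, assuming Python's built-in sqlite3 module. All table and column names are hypothetical and kept to the bare minimum; only the entities and relationships themselves come from the list above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE department (dept_code TEXT PRIMARY KEY);
CREATE TABLE instructor (instructor_id INTEGER PRIMARY KEY);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    major_dept TEXT REFERENCES department(dept_code)   -- STUDENTs major in DEPARTMENTs
);
CREATE TABLE course (
    course_no TEXT PRIMARY KEY,
    dept_code TEXT REFERENCES department(dept_code)    -- COURSEs are offered by DEPARTMENTs
);
CREATE TABLE prerequisite (                             -- COURSEs have prerequisite COURSEs
    course_no TEXT REFERENCES course(course_no),
    prereq_no TEXT REFERENCES course(course_no),
    PRIMARY KEY (course_no, prereq_no)
);
CREATE TABLE section (
    section_id    INTEGER PRIMARY KEY,
    course_no     TEXT NOT NULL REFERENCES course(course_no),   -- SECTIONs are of COURSEs
    instructor_id INTEGER REFERENCES instructor(instructor_id)  -- INSTRUCTORs teach SECTIONs
);
CREATE TABLE takes (                                    -- STUDENTs take SECTIONs
    student_id INTEGER REFERENCES student(student_id),
    section_id INTEGER REFERENCES section(section_id),
    PRIMARY KEY (student_id, section_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(SCHEMA)

table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)

# A SECTION of a COURSE that does not exist is rejected by the DBMS.
try:
    conn.execute("INSERT INTO section (section_id, course_no) VALUES (1, 'CS999')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Many-to-many relationships (takes, prerequisite) get a table of their own, while many-to-one relationships become a foreign key column in the "many" side.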

[Figure: Example of a simple database]

[Figure: Example of a Student file]

[Figure: Example of a simplified database catalog]

1.8 Architecture of DBMS



A commonly used views-of-data approach is the three-level architecture suggested by ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee). ANSI/SPARC produced an interim report in 1975 followed by a final report in 1977. The reports proposed an architectural framework for databases. Under this approach, a database is considered as containing data about an enterprise. The three levels of the architecture are three different views of the data:

External - individual user view
Conceptual - community user view
Internal - physical or storage view

The three-level database architecture allows a clear separation of the information meaning (conceptual view) from the external data representation and from the physical data structure layout. A database system that is able to separate the three different views of data is likely to be flexible and adaptable. This flexibility and adaptability is the data independence discussed earlier.

We now briefly discuss the three different views.

The external level is the view that the individual user of the database has. This view is often a restricted view of the database and the same database may provide a number of different views for different classes of users. In general, the end users and even the application programmers are only interested in a subset of the database. For example, a department head may only be interested in the departmental finances and student enrolments but not the library information. The librarian would not be expected to have any interest in the information about academic staff. The payroll office would have no interest in student enrolments.

The conceptual view is the information model of the enterprise and contains the view of the whole enterprise without any concern for the physical implementation. This view is normally more stable than the other two views. In a database, it may be desirable to change the internal view to improve performance while there has been no change in the conceptual view of the database. The conceptual view is the overall community view of the database and it includes all the information that is going to be represented in the database. The conceptual view is defined by the conceptual schema, which includes definitions of each of the various types of data.

The internal view is the view about the actual physical storage of data. It tells us what data is stored in the database and how. At least the following aspects are considered at this level:

Storage allocation, e.g. B-trees, hashing etc.
Access paths, e.g. specification of primary and secondary keys, indexes and pointers, and sequencing.
Miscellaneous, e.g. data compression and encryption techniques, optimization of the internal structures.

Efficiency considerations are the most important at this level and the data structures are chosen to provide an efficient database. The internal view does not deal with the physical devices directly. Instead it views a physical device as a collection of physical pages and allocates space in terms of logical pages.

The separation of the conceptual view from the internal view enables us to provide a logical description of the database without the need to specify physical structures. This is often called physical data independence. Separating the external views from the conceptual view enables us to change the conceptual view without affecting the external views. This separation is sometimes called logical data independence.

Assuming the three-level view of the database, a number of mappings are needed to enable users to work with one of the external views. For example, the payroll office may have an external view of the database that consists of the following information only:

Staff number, name and address.
Staff tax information, e.g. number of dependents.
Staff bank information where salary is deposited.
Staff employment status, salary level, leave information etc.

The conceptual view of the database may contain academic staff, general staff, casual staff etc. A mapping will need to be created in which all the staff in the different categories are combined into one category for the payroll office. The conceptual view would include information about each staff member's position, the date employment started, whether the employment is full-time or part-time, etc. This will need to be mapped to the salary level for the payroll office. Also, if there is some change in the conceptual view, the external view can stay the same if the mapping is changed.
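In a relational system, such an external-to-conceptual mapping is commonly written as a view definition. The following is a minimal sketch assuming Python's built-in sqlite3 module; the three staff tables, their columns and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conceptual level: staff are kept in separate categories (hypothetical tables).
    CREATE TABLE academic_staff (staff_no INTEGER PRIMARY KEY, name TEXT, salary_level INTEGER);
    CREATE TABLE general_staff  (staff_no INTEGER PRIMARY KEY, name TEXT, salary_level INTEGER);
    CREATE TABLE casual_staff   (staff_no INTEGER PRIMARY KEY, name TEXT, salary_level INTEGER);
    INSERT INTO academic_staff VALUES (1, 'Asha', 7);
    INSERT INTO general_staff  VALUES (2, 'Ravi', 4);
    INSERT INTO casual_staff   VALUES (3, 'Meena', 2);

    -- External level: the payroll office sees a single staff category.
    CREATE VIEW payroll_staff AS
        SELECT staff_no, name, salary_level FROM academic_staff
        UNION ALL
        SELECT staff_no, name, salary_level FROM general_staff
        UNION ALL
        SELECT staff_no, name, salary_level FROM casual_staff;
    """
)

payroll_rows = conn.execute(
    "SELECT staff_no, name, salary_level FROM payroll_staff ORDER BY staff_no"
).fetchall()
```

If the staff categories were later reorganized, only the view definition (the mapping) would need to be rewritten; queries against payroll_staff could stay as they are.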

1.9 Data Independence


Data independence can be defined as the capacity to change the schema at one level without changing the schema at the next higher level. There are two types of data independence:

1. Logical data independence is the capacity to change the conceptual schema without having to change the external schemas.
2. Physical data independence is the capacity to change the internal schema without having to change the conceptual schema.
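Both kinds of independence can be observed in a small experiment. The sketch assumes Python's built-in sqlite3 module, with a base table standing in for the conceptual schema, a view for an external schema, and an index for an internal-level change; the table, view, index and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, name TEXT, salary_level INTEGER);
    INSERT INTO staff VALUES (1, 'Asha', 7), (2, 'Ravi', 4);
    CREATE VIEW payroll AS SELECT staff_no, salary_level FROM staff;
    """
)

# The application's query is written once, against the external view.
QUERY = "SELECT staff_no, salary_level FROM payroll ORDER BY staff_no"
before = conn.execute(QUERY).fetchall()

# Physical data independence: a new access path changes how data is found,
# not what the query returns.
conn.execute("CREATE INDEX idx_staff_level ON staff (salary_level)")
after_physical = conn.execute(QUERY).fetchall()

# Logical data independence: the conceptual schema grows by one column,
# and the external view is unaffected.
conn.execute("ALTER TABLE staff ADD COLUMN start_date TEXT")
after_logical = conn.execute(QUERY).fetchall()
```

The query text never changes, which is the practical meaning of "without changing the schema at the next higher level".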

When not to use a DBMS


Main inhibitors (costs) of using a DBMS:
  High initial investment and possible need for additional hardware.
  Overhead for providing generality, security, concurrency control, recovery, and integrity functions.

When a DBMS may be unnecessary:
  If the database and applications are simple, well defined and not expected to change.
  If there are stringent real-time requirements that may not be met because of DBMS overhead.
  If access to data by multiple users is not required.

When no DBMS may suffice:
  If the database system is not able to handle the complexity of data because of modeling limitations.
  If the database users need special operations not supported by the DBMS.

1.10 Types of Databases and Database Applications


Traditional applications:
  Numeric and textual databases

More recent applications:
  Multimedia databases
  Geographic Information Systems (GIS)
  Data warehouses
  Real-time and active databases
  Many other applications

1.11 Data Model


A model is an abstraction that hides superfluous details. Data modeling is used for representing entities of interest and their relationships in the database.

Data model and different types of data model

A data model is a collection of concepts that can be used to describe the structure of a database, and so provides the necessary means to achieve this abstraction. By the structure of a database we mean the data types, relationships and constraints that hold for the data.

Types of Data Models


1. High-level (conceptual) data models
2. Low-level (physical) data models
3. Relational or representational data models
4. Object-oriented data models
5. Object-relational models

1. High-level or conceptual data model: The user-level data model is the high-level or conceptual model. It provides concepts that are close to the way many users perceive data.
2. Low-level or physical data model: Provides concepts that describe the details of how data is stored in the computer. Low-level data models are meant for computer specialists, not for end users.
3. Representational data model: Lies between the high-level and low-level data models. It provides concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer.

The most common data models are:

1. Relational Model
The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Below is a relational database comprising two tables.

Customer Table

Customer-Name  Security Number  Address       City       Account-Number
Preethi        111-222-3456     Yelhanka      Bangalore  A-101
Sharan         111-222-3457     Hebbal        Bangalore  A-125
Preethi        112-123-9878     Jaynagar      Bangalore  A-456
Arun           123-987-9909     MG road       Bangalore  A-987
Preethi        111-222-3456     Yelhanka      Bangalore  A-111
Rocky          222-232-0987     Sanjay Nagar  Bangalore  A-111

Account Table

Account-Number  Balance
A-101           1000.00
A-125           1200.00
A-456           5000.00
A-987           1234.00
A-111           3000.00

Customers Preethi and Rocky share the same account number, A-111.

Advantages:
1. The main advantage of this model is its ability to represent data in a simplified format.
2. The manipulation of records is simplified by the use of key attributes to retrieve data.
3. Different types of relationships can be represented with this model.
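The Customer and Account tables above can be loaded and queried as a quick check of how the shared account number relates the two tables. This is a sketch assuming Python's built-in sqlite3 module; the column names are adapted from the table headings, and the row values are read off the example data by position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance REAL)")
conn.execute(
    """
    CREATE TABLE customer (
        customer_name   TEXT,
        security_number TEXT,
        address         TEXT,
        city            TEXT,
        account_number  TEXT REFERENCES account(account_number)
    )
    """
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 1000.0), ("A-125", 1200.0), ("A-456", 5000.0),
     ("A-987", 1234.0), ("A-111", 3000.0)],
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [("Preethi", "111-222-3456", "Yelhanka", "Bangalore", "A-101"),
     ("Sharan", "111-222-3457", "Hebbal", "Bangalore", "A-125"),
     ("Preethi", "112-123-9878", "Jaynagar", "Bangalore", "A-456"),
     ("Arun", "123-987-9909", "MG road", "Bangalore", "A-987"),
     ("Preethi", "111-222-3456", "Yelhanka", "Bangalore", "A-111"),
     ("Rocky", "222-232-0987", "Sanjay Nagar", "Bangalore", "A-111")],
)

# Who holds account A-111? The relationship is found by matching key values.
holders = [
    row[0]
    for row in conn.execute(
        "SELECT c.customer_name FROM customer c "
        "JOIN account a ON a.account_number = c.account_number "
        "WHERE a.account_number = ? ORDER BY c.customer_name",
        ("A-111",),
    )
]

# Total balance over all accounts of the customer with this security number.
total = conn.execute(
    "SELECT SUM(a.balance) FROM customer c "
    "JOIN account a ON a.account_number = c.account_number "
    "WHERE c.security_number = ?",
    ("111-222-3456",),
).fetchone()[0]
```

No pointers are stored anywhere: the relationship between a customer and an account exists only through equal values in the account-number columns.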

2. Network Model
The data in the network model are represented by collections of records, and relationships among data are represented by links, which can be viewed as pointers.

[Figure: the customer record (Preethi, 111-222-3456, Yelhanka, Bangalore) linked to the account records (A-101, 1000.00) and (A-111, 3000.00)]

The records in the database are organized as collections of arbitrary graphs.

Advantages:
1. Relationships between entities are implemented using pointers, which allows the representation of arbitrary relationships.
2. Unlike the hierarchical model, it makes relationships other than one-to-many easy to represent.
3. Data manipulation can be done easily with this model.

3. Hierarchical Model
A hierarchical data model is a data model in which the data is organized into a tree-like structure. The structure allows repeating information using parent/child relationships: each parent can have many children, but each child has only one parent. All attributes of a specific record are listed under an entity type.

Advantages:
1. Records are represented using an ordered tree, which is a natural way of implementing one-to-many relationships.
2. Proper ordering of the tree results in easier and faster retrieval of records.
3. It allows the use of virtual records, which results in a stable database, especially when the database is modified.
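The parent/child rule can be made concrete with a small in-memory tree. This is only an illustrative sketch in Python, not how a hierarchical DBMS such as IMS is actually implemented; the Record class and the sample department, course and section names are hypothetical.

```python
class Record:
    """A record in a hierarchy: any number of children, at most one parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        child = Record(name, parent=self)  # the child's single parent is fixed here
        self.children.append(child)
        return child

    def path(self):
        """Walk up the parent links to produce the access path from the root."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


def preorder(record):
    """Visit a record, then its children left to right (ordered-tree traversal)."""
    names = [record.name]
    for child in record.children:
        names.extend(preorder(child))
    return names


root = Record("DEPARTMENT CS")
course = root.add_child("COURSE DBMS")
section_1 = course.add_child("SECTION 1")
section_2 = course.add_child("SECTION 2")

visit_order = preorder(root)
access_path = section_2.path()
```

A one-to-many relationship (one course, many sections) falls out naturally; a section that belonged to two courses could not be stored without duplicating it, which is where virtual records come in.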

4. Object-oriented Data Models


Several object-oriented data models have been proposed for implementation in database systems. One set comprises the models of persistent O-O programming languages such as C++ (e.g., in OBJECTSTORE or VERSANT) and Smalltalk (e.g., in GEMSTONE). Additionally, there are systems like O2, ORION (at MCC, then ITASCA) and IRIS (at H.P., used in Open OODB).

5. Object-Relational Models


Most recent trend; started with Informix Universal Server.
Relational systems incorporate concepts from object databases, leading to object-relational systems.
Object database standard: ODMG-93, ODMG version 2.0, ODMG version 3.0.
Exemplified in the latest versions of Oracle 10g, DB2, SQL Server and other DBMSs.
Standards included in SQL-99 and expected to be enhanced in future SQL standards.

Schemas versus Instances


Database Schema: The description of a database. Includes descriptions of the database structure, data types, and the constraints on the database.

Schema Diagram: An illustrative display of (most aspects of) a database schema.

Schema Construct: A component of the schema or an object within the schema, e.g., STUDENT, COURSE.

Database State: The actual data stored in a database at a particular moment in time. This includes the collection of all the data in the database. Also called database instance (or occurrence or snapshot). The term instance is also applied to individual database components, e.g. record instance, table instance, entity instance.

Database Schema vs. Database State


Database State: Refers to the content of a database at a moment in time.

Initial Database State: Refers to the database state when it is initially loaded into the system.

Valid State: A state that satisfies the structure and constraints of the database.

Distinction

The database schema changes very infrequently. The database state changes every time the database is updated.

Schema is also called intension.

State is also called extension.

[Figure: Example of a database schema]

[Figure: Example of a database state]

DBMS Languages
Data Definition Language (DDL)
Data Manipulation Language (DML)
  High-level or non-procedural languages: These include the relational language SQL. They may be used in a standalone way or may be embedded in a programming language.
  Low-level or procedural languages: These must be embedded in a programming language.

Data Definition Language (DDL)


Used by the DBA and database designers to specify the conceptual schema of a database.
In many DBMSs, the DDL is also used to define internal and external schemas (views).
In some DBMSs, a separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas. The SDL is typically realized via DBMS commands provided to the DBA and database designers.

Data Manipulation Language (DML)


Used to specify database retrievals and updates.
DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language), such as COBOL, C, C++, or Java.
A library of functions can also be provided to access the DBMS from a programming language.
Alternatively, stand-alone DML commands can be applied directly (called a query language).

Types of DML
High-level or non-procedural language: For example, the SQL relational language. Such languages are set-oriented and specify what data to retrieve rather than how to retrieve it. They are also called declarative languages.

Low-level or procedural language: Retrieves data one record at a time. Constructs such as looping are needed to retrieve multiple records, along with positioning pointers.
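The contrast between the two styles can be sketched as follows, assuming Python's built-in sqlite3 module and a hypothetical account table. The second half only imitates record-at-a-time processing with a cursor loop; the procedural DMLs of real network or hierarchical systems use their own navigation commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 1000.0), ("A-125", 1200.0), ("A-456", 5000.0),
     ("A-987", 1234.0), ("A-111", 3000.0)],
)

# Declarative style: one set-oriented statement says WHAT is wanted;
# the DBMS decides how to find it.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT account_number FROM account "
        "WHERE balance > 1200 ORDER BY account_number"
    )
]

# Record-at-a-time style: the program says HOW, by positioning on each
# record in turn, testing it, and collecting the matches itself.
procedural = []
cursor = conn.execute("SELECT account_number, balance FROM account")
while True:
    record = cursor.fetchone()  # advance the position by one record
    if record is None:          # no more records
        break
    if record[1] > 1200:
        procedural.append(record[0])
procedural.sort()
```

Both halves produce the same answer, but only the declarative form leaves the DBMS free to choose an access path, for example an index on balance.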

DBMS Interfaces
Stand-alone query language interfaces. Example: entering SQL queries at the DBMS interactive SQL interface (e.g. SQL*Plus in ORACLE).
Programmer interfaces for embedding DML in programming languages.
User-friendly interfaces: menu-based, forms-based, graphics-based, etc.

DBMS Programming Language Interfaces


Programmer interfaces for embedding DML in a programming language:
  Embedded approach: e.g. embedded SQL (for C, C++, etc.), SQLJ (for Java).
  Procedure call approach: e.g. JDBC for Java, ODBC for other programming languages.
  Database programming language approach: e.g. ORACLE has PL/SQL, a programming language based on SQL; the language incorporates SQL and its data types as integral components.

User-Friendly DBMS Interfaces


Menu-based, popular for browsing on the web.
Forms-based, designed for naive users.
Graphics-based (point and click, drag and drop, etc.).
Natural language: requests in written English.
Combinations of the above: for example, both menus and forms are used extensively in Web database interfaces.

Other DBMS Interfaces


Speech as input and output.
Web browser as an interface.
Parametric interfaces, e.g., bank tellers using function keys.
Interfaces for the DBA:
  Creating user accounts, granting authorizations
  Setting system parameters
  Changing schemas or access paths


1.12 The database system environment
The DBMS is a complex software system.

[Figure: Typical DBMS component modules]

The figure is divided into two halves. The top half refers to the various users of the database environment and their interfaces. The lower half shows the internals of the DBMS responsible for storage of data and processing of transactions. The database and the DBMS catalog are usually stored on disk. Access to the disk is primarily controlled by the operating system (OS), which includes disk input/output. A higher-level stored data manager module of the DBMS controls access to DBMS information that is stored on the disk.

The top half of the figure shows interfaces for DBA staff, casual users, application programmers and parametric users.

The DDL compiler processes schema definitions, specified in the DDL, and stores the description of the schema in the DBMS catalog. The catalog includes information such as the names and sizes of files, the data types of data items, storage details of each file, mapping information among schemas, and constraints.

Casual users and persons with an occasional need for information from the database interact using some form of interactive query interface. The queries are parsed and analysed for correctness of the operations for the model, the names of the data elements and so on by a query compiler, which compiles them into an internal form. The internal query is then subjected to query optimization. The query optimizer is concerned with the rearrangement and possible reordering of operations and the elimination of redundancies.

Application programmers write programs in host languages. The precompiler extracts DML commands from an application program.

1.13 Centralized and Client-Server DBMS Architectures

Centralized DBMS:

Combines everything into a single system, including DBMS software, hardware, application programs, and user interface processing software.
Users can still connect through a remote terminal; however, all processing is done at the centralized site.

[Figure: A physical centralized architecture]

Architectures for DBMSs have followed trends similar to those of general computer system architectures. Earlier architectures used mainframe computers to provide the main processing for all system functions, including user application programs and user interface programs as well as all DBMS functionality. The reason was that most users accessed such systems via computer terminals that did not have processing power and only provided display capabilities. Therefore all processing was performed remotely on the computer system, and only display information and controls were sent from the computer to the display terminals, which were connected to the central computer via various types of communication networks. As prices of hardware declined, most users replaced their terminals with PCs and workstations. At first, database systems used these computers in the same way as they had used display terminals, so that the DBMS itself was still a centralized DBMS in which all the DBMS functionality, application program execution and user interface processing were carried out on one machine.


Basic 2-tier Client-Server Architectures
Specialized servers with specialized functions:
  Print server
  File server
  DBMS server
  Web server
  Email server
Clients can access the specialized servers as needed.

[Figure: Logical two-tier client-server architecture]

Clients
Provide appropriate interfaces through a client software module to access and utilize the various server resources.
Clients may be diskless machines, or PCs or workstations with disks, with only the client software installed.
Connected to the servers via some form of network (LAN: local area network, wireless network, etc.).

DBMS Server
Provides database query and transaction services to the clients.
Relational DBMS servers are often called SQL servers, query servers, or transaction servers.
Applications running on clients utilize an Application Program Interface (API) to access server databases via a standard interface such as:
  ODBC: Open Database Connectivity standard
  JDBC: for Java programming access
Client and server must install appropriate client module and server module software for ODBC or JDBC.

Two Tier Client-Server Architecture


A client program may connect to several DBMSs, sometimes called the data sources. In general, data sources can be files or other non-DBMS software that manages data. Other variations of clients are possible: e.g., in some object DBMSs, more functionality is transferred to clients including data dictionary functions, optimization and recovery across multiple servers, etc.

Three Tier Client-Server Architecture


The three-tier architecture is common for Web applications. An intermediate layer, called the application server or Web server, stores the web connectivity software and the business logic part of the application used to access the corresponding data from the database server. It acts as a conduit for sending partially processed data between the database server and the client. A three-tier architecture can enhance security: the database server is accessible only via the middle tier, and clients cannot directly access the database server.


Classification of DBMSs
Based on the data model used:
Traditional: relational, network, hierarchical.
Emerging: object-oriented, object-relational.
Other classifications:
Single-user (typically used with personal computers) vs. multi-user (most DBMSs).
Centralized (uses a single computer with one database) vs. distributed (uses multiple computers, multiple databases).

Variations of Distributed DBMSs (DDBMSs)


Homogeneous DDBMS
Heterogeneous DDBMS
Federated or multidatabase systems
Distributed database systems have now come to be known as client-server based database systems because they do not support a totally distributed environment, but rather a set of database servers supporting a set of clients.

Cost considerations for DBMSs


Cost range: from free open-source systems to configurations costing millions of dollars.
Examples of free relational DBMSs: MySQL, PostgreSQL, others.



UNIT -2

Entity-Relationship Model
Introduction to ER Model
The ER model represents real-world situations using concepts that are commonly used by people. It allows a representation of the real world to be defined at the logical level. The ER model has no facilities to describe machine-related aspects. In the ER model, the logical structure of data is captured by indicating the grouping of data into entities. The ER model also supports a top-down approach by which details can be given in successive stages.
Entity: An entity is something that is described in the database by storing its data; it may be a concrete entity or a conceptual entity.
Entity set: An entity set is a collection of similar entities.
Attribute: An attribute describes a property associated with entities. An attribute has a name and a value for each entity.
Domain: A domain defines the set of permitted values for an attribute.



SYMBOLS IN E-R DIAGRAM
The ER model is represented using different symbols, as shown in Fig. a.



Overview of Database Design Process

Example COMPANY Database


We need to create a database schema design based on the following (simplified) requirements of the COMPANY Database:

The company is organized into DEPARTMENTs. Each department has a name, a number and an employee who manages the department. We keep track of the start date of the department manager. A department may have several locations.

Each department controls a number of PROJECTs. Each project has a unique name, a unique number and is located at a single location.

We store each EMPLOYEE's social security number, address, salary, sex, and birth date. Each employee works for one department but may work on several projects. We keep track of the number of hours per week that an employee currently works on each project. We also keep track of the direct supervisor of each employee.

Each employee may have a number of DEPENDENTs. For each dependent, we keep track of their name, sex, birth date, and relationship to the employee.

ER Model Concepts
Entities and Attributes
Entities are specific objects or things in the mini-world that are represented in the database. For example, the EMPLOYEE John Smith, the Research DEPARTMENT, the ProductX PROJECT.

Attributes are properties used to describe an entity. For example, an EMPLOYEE entity may have the attributes Name, SSN, Address, Sex, BirthDate.

A specific entity will have a value for each of its attributes. For example, a specific employee entity may have Name='John Smith', SSN='123456789', Address='731, Fondren, Houston, TX', Sex='M', BirthDate='09-JAN-55'.

Each attribute has a value set (or data type) associated with it, such as integer, string, subrange, or enumerated type.

Types of Attributes
There are three types of attributes:

Simple
Each entity has a single atomic value for the attribute. For example, SSN or Sex.

Composite
The attribute may be composed of several components. For example: Address(Apt#, House#, Street, City, State, ZipCode, Country), or Name(FirstName, MiddleName, LastName). Composition may form a hierarchy where some components are themselves composite.

Multi-valued
An entity may have multiple values for that attribute. For example, Color of a CAR or Previous Degrees of a STUDENT. Denoted as {Color} or {Previous Degrees}.

In general, composite and multi-valued attributes may be nested arbitrarily to any number of levels, although this is rare. For example, Previous Degrees of a STUDENT is a composite multi-valued attribute denoted by {Previous Degrees (College, Year, Degree, Field)}. Multiple Previous Degrees values can exist. Each has four subcomponent attributes: College, Year, Degree, Field.
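The nesting of composite and multi-valued attributes can be sketched in Python. This is only an illustration: the class names, the field names and the sample values are assumptions made for the example and are not part of the ER notation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PreviousDegree:
    # One value of the composite attribute (College, Year, Degree, Field).
    college: str
    year: int
    degree: str
    field_of_study: str


@dataclass
class Student:
    name: str  # simple attribute
    # Multi-valued composite attribute:
    # {Previous Degrees (College, Year, Degree, Field)}
    previous_degrees: List[PreviousDegree] = field(default_factory=list)


# Hypothetical sample data, for illustration only.
student = Student(name="Alice")
student.previous_degrees.append(PreviousDegree("College A", 2001, "BSc", "Physics"))
student.previous_degrees.append(PreviousDegree("College B", 2003, "MSc", "Computer Science"))

print(len(student.previous_degrees))
```

Each element of the list is one value of the multi-valued attribute, and each value carries the four subcomponents of the composite attribute.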

Example of a composite attribute



Entity Types and Key Attributes
Entities with the same basic attributes are grouped or typed into an entity type. For example, the entity types EMPLOYEE and PROJECT.

An attribute of an entity type for which each entity must have a unique value is called a key attribute of the entity type. For example, SSN of EMPLOYEE.

A key attribute may be composite. VehicleTagNumber is a key of the CAR entity type with components (Number, State).

An entity type may have more than one key. The CAR entity type may have two keys: VehicleIdentificationNumber (popularly called VIN) and VehicleTagNumber (Number, State), the license plate number. Each key is underlined.

Displaying an Entity type


In ER diagrams, an entity type is displayed in a rectangular box. Attributes are displayed in ovals. Each attribute is connected to its entity type. Components of a composite attribute are connected to the oval representing the composite attribute. Each key attribute is underlined. Multivalued attributes are displayed in double ovals.



Entity Type CAR with two keys and a corresponding Entity Set

Entity Set
Each entity type will have a collection of entities stored in the database, called the entity set. The above example shows three CAR entity instances in the entity set for CAR. The same name (CAR) is used to refer to both the entity type and the entity set. The entity set is the current state of the entities of that type that are stored in the database.

Initial Design of Entity Types for the COMPANY Database Schema


Based on the requirements, we can identify four initial entity types in the COMPANY database: DEPARTMENT, PROJECT, EMPLOYEE, and DEPENDENT. Their initial design is shown below. The initial attributes shown are derived from the requirements description.

Initial Design of Entity Types for the COMPANY Database Schema


Refining the initial design by introducing relationships


The initial design is typically not complete. Some aspects in the requirements will be represented as relationships.

The ER model has three main concepts:
Entities (and their entity types and entity sets)
Attributes (simple, composite, multi-valued)
Relationships (and their relationship types and relationship sets)

Relationships and Relationship Types


A relationship relates two or more distinct entities with a specific meaning. For example, EMPLOYEE John Smith works on the ProductX PROJECT, or EMPLOYEE Franklin Wong manages the Research DEPARTMENT.

Relationships of the same type are grouped or typed into a relationship type. For example, the WORKS_ON relationship type in which EMPLOYEEs and PROJECTs participate, or the MANAGES relationship type in which EMPLOYEEs and DEPARTMENTs participate.

The degree of a relationship type is the number of participating entity types. Both MANAGES and WORKS_ON are binary relationships.



Relationship instances of the WORKS_FOR N:1 relationship between EMPLOYEE and DEPARTMENT

Relationship instances of the M:N WORKS_ON relationship between EMPLOYEE and PROJECT



Relationship type vs. relationship set
Relationship type: The schema description of a relationship. It identifies the relationship name and the participating entity types, and also identifies certain relationship constraints.

Relationship set: The current set of relationship instances represented in the database; that is, the current state of a relationship type. The previous figures displayed the relationship sets. Each instance in the set relates individual participating entities, one from each participating entity type.

In ER diagrams, we represent the relationship type as follows: a diamond-shaped box is used to display a relationship type, connected to the participating entity types via straight lines.

Refining the COMPANY database schema by introducing relationships


By examining the requirements, six relationship types are identified. All are binary relationships (degree 2). They are listed below with their participating entity types:
WORKS_FOR (between EMPLOYEE, DEPARTMENT)
MANAGES (also between EMPLOYEE, DEPARTMENT)
CONTROLS (between DEPARTMENT, PROJECT)
WORKS_ON (between EMPLOYEE, PROJECT)
SUPERVISION (between EMPLOYEE (as subordinate), EMPLOYEE (as supervisor))
DEPENDENTS_OF (between EMPLOYEE, DEPENDENT)



ER DIAGRAM Relationship Types are: WORKS_FOR, MANAGES, WORKS_ON, CONTROLS, SUPERVISION, DEPENDENTS_OF

Relationship Types
In the refined design, some attributes from the initial entity types are refined into relationships:

Manager of DEPARTMENT -> MANAGES
Works_on of EMPLOYEE -> WORKS_ON
Department of EMPLOYEE -> WORKS_FOR
etc.

In general, more than one relationship type can exist between the same participating entity types. MANAGES and WORKS_FOR are distinct relationship types between EMPLOYEE and DEPARTMENT, with different meanings and different relationship instances.



Recursive Relationship Type
A recursive relationship type is a relationship type in which the same entity type participates more than once, in distinct roles. Example: in the SUPERVISION relationship, EMPLOYEE participates twice in two distinct roles: the supervisor (or boss) role and the supervisee (or subordinate) role. Each relationship instance relates two distinct EMPLOYEE entities: one employee in the supervisor role and one employee in the supervisee role.

Weak Entity Types


A weak entity type is an entity type that does not have a key attribute of its own. A weak entity must participate in an identifying relationship type with an owner or identifying entity type.

Entities of a weak entity type are identified by the combination of a partial key of the weak entity type and the particular entity they are related to in the identifying entity type.

Example:
A DEPENDENT entity is identified by the dependent's first name and the specific EMPLOYEE to whom the dependent is related. Name of DEPENDENT is the partial key. DEPENDENT is a weak entity type. EMPLOYEE is its identifying entity type via the identifying relationship type DEPENDENTS_OF.
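The identification rule for weak entities can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the dictionaries, the function name `add_dependent` and the sample SSNs and names are assumptions made for the example.

```python
# Owner entity set EMPLOYEE, keyed by its key attribute SSN (sample data).
employees = {"123456789": "John Smith", "333445555": "Franklin Wong"}

# Weak entity set DEPENDENT, keyed by (owner SSN, partial key Name).
dependents = {}


def add_dependent(owner_ssn, first_name, relationship):
    """Store a DEPENDENT identified by its owner plus its partial key."""
    if owner_ssn not in employees:
        # A weak entity cannot exist without its identifying entity.
        raise KeyError("identifying EMPLOYEE does not exist")
    key = (owner_ssn, first_name)
    if key in dependents:
        raise ValueError("partial key must be unique per owner")
    dependents[key] = {"relationship": relationship}


# The same first name is allowed for dependents of different employees.
add_dependent("123456789", "Alice", "daughter")
add_dependent("333445555", "Alice", "spouse")
print(sorted(dependents))
```

The first name alone does not identify a dependent; only the pair (owner, first name) does, which is why Name is called a partial key.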

Constraints on Relationships
Constraints on Relationship Types (also known as ratio constraints)

Cardinality ratio (specifies maximum participation):
One-to-one (1:1)
One-to-many (1:N) or many-to-one (N:1)
Many-to-many (M:N)

Existence dependency constraint (specifies minimum participation; also called participation constraint):
Zero (optional participation, not existence-dependent)
One or more (mandatory participation, existence-dependent)

Many-to-one (N:1) Relationship



Many-to-many (M:N) Relationship

Displaying a recursive relationship


In a recursive relationship type, both participations are by the same entity type in different roles. For example, SUPERVISION relationships between EMPLOYEE (in the role of supervisor or boss) and (another) EMPLOYEE (in the role of subordinate or worker). In the following figure, the first role participation is labeled with 1 and the second role participation is labeled with 2. In an ER diagram, role names need to be displayed to distinguish the participations.



A Recursive Relationship Supervision



Recursive Relationship Type is: SUPERVISION (participation role names are shown)

Attributes of Relationship types


A relationship type can have attributes, for example HoursPerWeek of WORKS_ON. Its value for each relationship instance describes the number of hours per week that an EMPLOYEE works on a PROJECT.

A value of HoursPerWeek depends on a particular (employee, project) combination. Most relationship attributes are used with M:N relationships. In 1:N relationships, they can be transferred to the entity type on the N-side of the relationship.
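Because HoursPerWeek belongs to an (employee, project) combination rather than to either entity alone, it can be sketched as a value attached to the pair. The SSNs, project numbers and hours below are hypothetical sample data, and `hours_for_employee` is a helper name introduced only for this illustration.

```python
# WORKS_ON relationship set: (employee SSN, project number) -> HoursPerWeek.
works_on = {
    ("123456789", 1): 32.5,
    ("123456789", 2): 7.5,
    ("333445555", 2): 10.0,
}


def hours_for_employee(ssn):
    """Total weekly hours of one employee across all of their projects."""
    return sum(hours for (emp, _proj), hours in works_on.items() if emp == ssn)


print(hours_for_employee("123456789"))
```

Storing the hours on EMPLOYEE or on PROJECT alone would lose information, since the same employee has a different value for each project.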



Example Attribute of a Relationship Type: Hours of WORKS_ON

Notation for Constraints on Relationships


Cardinality ratio (of a binary relationship): 1:1, 1:N, N:1, or M:N. Shown by placing appropriate numbers on the relationship edges.
Participation constraint (on each participating entity type): total (called existence dependency) or partial. Total participation is shown by a double line, partial participation by a single line.

Alternative (min, max) notation for relationship structural constraints:


Specified on each participation of an entity type E in a relationship type R.
Specifies that each entity e in E participates in at least min and at most max relationship instances in R.
Default (no constraint): min = 0, max = n (signifying no limit).
Must have min ≤ max, min ≥ 0, max ≥ 1.
Derived from the knowledge of mini-world constraints.

Examples:
A department has exactly one manager and an employee can manage at most one department.
Specify (0,1) for participation of EMPLOYEE in MANAGES.
Specify (1,1) for participation of DEPARTMENT in MANAGES.
An employee can work for exactly one department but a department can have any number of employees.
Specify (1,1) for participation of EMPLOYEE in WORKS_FOR.
Specify (0,n) for participation of DEPARTMENT in WORKS_FOR.
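The (min, max) constraints of the MANAGES example can be checked against a relationship set with a short sketch. The function name `check_participation`, its parameters and the entity identifiers are assumptions made for this illustration.

```python
from collections import Counter


def check_participation(entities, instances, position, min_count, max_count=None):
    """Return the entities that violate a (min, max) participation constraint.

    instances is a list of tuples (relationship instances); position selects
    which component of each tuple holds the entity being counted.
    max_count=None stands for 'n' (no upper limit).
    """
    counts = Counter(instance[position] for instance in instances)
    violations = []
    for entity in entities:
        count = counts[entity]
        if count < min_count or (max_count is not None and count > max_count):
            violations.append(entity)
    return violations


# Hypothetical sample state of the MANAGES relationship set.
employees = ["e1", "e2", "e3"]
departments = ["d1", "d2"]
manages = [("e1", "d1"), ("e2", "d2")]

# EMPLOYEE participates with (0,1), DEPARTMENT with (1,1).
emp_violations = check_participation(employees, manages, 0, 0, 1)
dept_violations = check_participation(departments, manages, 1, 1, 1)
print(emp_violations, dept_violations)
```

If a department had no manager, or an employee managed two departments, the corresponding entity would appear in the returned list.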

The (min , max) notation for relationship constraints



COMPANY ER Schema Diagram using (min , max) notation

n-ary relationships (n > 2)


In general, 3 binary relationships can represent different information than a single ternary relationship (see Figure 3.17a and b on next slide)

If needed, the binary and n-ary relationships can all be included in the schema design (see Figure 3.17a and b, where all relationships convey different meanings)

In some cases, a ternary relationship can be represented as a weak entity if the data model allows a weak entity type to have multiple identifying relationships (and hence multiple owner entity types) (see Fig 3.17c)



Example of a ternary relationship

If a particular binary relationship can be derived from a higher-degree relationship at all times, then it is redundant.

For example, the TAUGHT_DURING binary relationship in Figure 3.18 (see next slide) can be derived from the ternary relationship OFFERS (based on the meaning of the relationships)



Another example of a ternary relationship

Bank Database



There are three basic notions that the E-R model employs:
1. Entity sets.
2. Relationship sets.
3. Attributes.

2.2.2 Entities and Entity sets
An entity is any object of interest to an organization, or for representation in the database. Entities represent objects in the real world that are distinguishable from all other objects. For example, every person in a college is an entity, and every room in a college is an entity. Associated with an entity is a set of properties. These properties are used to distinguish one entity from another. For example:
1. The attributes of the entity Student are USN, Name, Address.
2. The attributes of the entity Vehicle are Vehicle no, Make, Capacity.
For the purpose of accessing and storing information, only certain attributes are used. The attributes that uniquely identify every instance of the entity are termed the primary key.

An entity that has a set of attributes which can uniquely identify all its instances is termed a strong entity. An entity whose own attributes cannot uniquely identify all its instances is termed a weak entity.

A collection of similar entities, which have certain properties in common, forms an entity set. For an organization such as a college, the objects of concern include Student, Teacher, Rooms, Subjects.

2.2.3 Attributes



An entity is represented by a set of properties called attributes. The attributes are useful in describing the properties of each entity in the entity set.

Types of attributes:
1. Simple attributes: Attributes that cannot be further divided into subparts. E.g. the University Seat Number (USN) of a student is unique and cannot be further divided.
2. Composite attributes: Attributes that can be further divided into parts. E.g. the attribute Name in the Student database can be further divided into First name, Middle name, Last name.
3. Single-valued attributes: The attribute contains only one specific value at any instant. E.g. the USN of a student.
4. Multivalued attributes: Certain attributes, for example the dependent name in the policy database, may have a set of values assigned to them. There may be more than one dependent for a single policy holder.
5. Stored and derived attributes: For a person entity, the value of Age can be determined from the current date and the value of that person's birth date. The Age attribute is hence a derived attribute and is said to be derivable from the BirthDate attribute, which is called a stored attribute.
6. NULL attributes: A NULL value is used when an attribute does not have any value.
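The relationship between a stored attribute and a derived attribute can be sketched in a few lines. The function name `age_on` is an assumption made for this illustration; the point is only that Age is computed from BirthDate and a reference date instead of being stored.

```python
from datetime import date


def age_on(birth_date, today):
    """Derive Age (in completed years) from the stored BirthDate attribute."""
    years = today.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in this year.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


birth_date = date(1955, 1, 9)  # stored attribute
print(age_on(birth_date, date(2000, 1, 8)))  # day before the birthday
print(age_on(birth_date, date(2000, 1, 9)))  # on the birthday
```

Deriving Age on demand avoids having to update a stored value every time a birthday passes.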

Data integrity
Data is accepted based on certain rules, and therefore the data is valid. Enforcing data integrity ensures that the data in the database is valid and correct. Keys play an important role in maintaining data integrity.

The various types of keys that have been identified are:
Candidate key
Primary key
Alternate key
Composite key
Foreign key



Candidate key

An attribute or set of attributes that uniquely identifies a row is called a Candidate key.
This attribute has values that are unique.
Example table: Vehicle

Primary Key

The Candidate key that you choose to identify each row uniquely is called the Primary key.
Alternate Key

A Candidate key that is not chosen as a Primary key is an Alternate key.


Composite Key

In certain tables, a single attribute cannot be used to identify rows uniquely and a combination of two or more attributes is used as a Primary key. Such keys are called Composite keys.

Purchase

Foreign Key

When a primary key of one table appears as an attribute in another table, it is called the Foreign key in the second table. A foreign key is used to relate two tables.
Weak entity:



A weak entity does not have a distinguishing attribute of its own. Weak entities are mostly dependent entities that are part of another entity. A weak entity will always be related to one or more strong entities. Weak entities can also be understood as multi-valued attributes.

Relationships
A relationship type is a meaningful association between entity types. A relationship is an association of entities where the association includes one entity from each participating entity type. Relationship types are represented on the ER diagram by a series of lines. As always, there are many notations in use today. In the original Chen notation, the relationship is placed inside a diamond, e.g. managers manage employees:

Figure : Chen's notation for relationships

For this module, we will use an alternative notation, where the relationship is a label on the line. The meaning is identical.

Figure : Relationships used in this document

2.3.1 Degree of a Relationship


The number of participating entities in a relationship is known as the degree of the relationship.



If there are two entity types involved it is a binary relationship type

Figure : Binary Relationships

If there are three entity types involved it is a ternary relationship type

Figure : Ternary relationship

It is possible to have an n-ary relationship (e.g. quaternary or unary). A unary relationship is also known as a recursive relationship.

Figure : Recursive relationship

It is a relationship where the same entity participates more than once in different roles. In the example above we are saying that employees are managed by employees. If we wanted more information about who manages whom, we could introduce a second entity type called manager.

2.3.2 Replacing ternary relationships



When ternary relationships occur in an ER model, they should always be removed before finishing the model. Sometimes a ternary relationship can be replaced by a series of binary relationships that link pairs of the entity types in the original ternary relationship.

Figure : A ternary relationship example

This can result in the loss of some information - It is no longer clear which sales assistant sold a customer a particular product. Try replacing the ternary relationship with an entity type and a set of binary relationships. Relationships are usually verbs, so name the new entity type by the relationship verb rewritten as a noun. The relationship sells can become the entity type sale.

Figure : Replacing a ternary relationship

So a sales assistant can be linked to a specific customer and both of them to the sale of a particular product. This process also works for higher order relationships.

Cardinality
Relationships are rarely one-to-one. For example, a manager usually manages more than one employee. This is described by the cardinality of the relationship, for which there are four possible categories:
One to one (1:1) relationship
One to many (1:m) relationship
Many to one (m:1) relationship
Many to many (m:n) relationship
On an ER diagram, if the end of a relationship is straight, it represents 1, while a "crow's foot" end represents many.

A one to one relationship - a man can only marry one woman, and a woman can only marry one man, so it is a one to one (1:1) relationship.

Figure : One to One relationship example

A one to many relationship - one manager manages many employees, but each employee only has one manager, so it is a one to many (1:m) relationship.

Figure : One to Many relationship example

A many to one relationship - many students study one course. They do not study more than one course, so it is a many to one (m:1) relationship

Figure : Many to One relationship example

A many to many relationship - One lecturer teaches many students and a student is taught by many lecturers, so it is a many to many (m:n) relationship

Figure : Many to Many relationship example


2.3.4 Optionality
A relationship can be optional or mandatory. If the relationship is mandatory, an entity at one end of the relationship must be related to an entity at the other end. The optionality can be different at each end of the relationship. For example, a student must be on a course. This is mandatory, so the relationship 'student studies course' is mandatory. But a course can exist before any students have enrolled. Thus the relationship 'course is_studied_by student' is optional. To show optionality, put a circle or 'O' at the 'optional end' of the relationship. As the optional relationship is 'course is_studied_by student', and the optional part of this is the student, the 'O' goes at the student end of the relationship connection.

Figure : Optionality example

It is important to know the optionality because you must ensure that whenever you create a new entity it has the required mandatory links.

2.4.1 Entities
Bus - The company owns buses and will hold information about them.
Route - Buses travel on routes, which will need to be described.
Town - Buses pass through towns, and we need to know about them.
Driver - The company employs drivers; personnel will hold their data.
Stage - Routes are made up of stages.
Garage - A garage houses buses, and we need to know where the garages are.

The relationships between these entities are:
A bus is allocated to a route and a route may have several buses. bus-route (m:1): is serviced by
A route comprises one or more stages. route-stage (1:m): comprises
One or more drivers are allocated to each stage. driver-stage (m:1): is allocated
A stage passes through some or all of the towns on a route. stage-town (m:n): passes-through
A route passes through some or all of the towns. route-town (m:n): passes-through
Some of the towns have a garage. garage-town (1:1): is situated
A garage keeps buses and each bus has one 'home' garage. garage-bus (1:m): is garaged



E-R Diagram

Figure : Bus Company

Attributes
Bus (reg-no, make, size, deck, no-pass)
Route (route-no, avg-pass)
Driver (emp-no, name, address, tel-no)
Town (name)
Stage (stage-no)
Garage (name, address)

Example: Entity and relationship sets for a hospital called General Hospital: Patients, Doctors, Beds, Examines, Bed_assigned, Accounts, has_account.
patients: entity set with attributes SSNo, LastName, FirstName, HomePhone, Sex, DateofBirth, Age, Street, City, State, Zip.
doctors: entity set with attributes SSNo, LastName, FirstName, OfficePhone, Pager, Specialty.
examines: relationship set with attributes Date, Time, Diagnosis, Fee.
beds: entity set with attributes RoomNumber, BedNumber, Type, Status, PricePerHour.
Bed_assigned: relationship set with attributes DateIn, TimeIn, DateOut, TimeOut, Amount.
accounts: weak entity set with attributes DateIn, DateOut, Amount.
has_account: relationship set with no attributes.

2.5 Constructing an ER model


Before beginning to draw the ER model, read the requirements specification carefully. Document any assumptions you need to make.
1. Identify entities - list all potential entity types. These are the objects of interest in the system. It is better to put too many entities in at this stage and then discard them later if necessary.



2. Remove duplicate entities - Ensure that they are really separate entity types and not just two names for the same thing. Also do not include the system as an entity type, e.g. if modelling a library, the entity types might be books, borrowers, etc. The library is the system, and thus should not be an entity type.

3. List the attributes of each entity (all properties to describe the entity which are relevant to the application). Ensure that the entity types are really needed: are any of them just attributes of another entity type? If so, keep them as attributes and cross them off the entity list. Do not have attributes of one entity as attributes of another entity!

4. Mark the primary keys. Which attributes uniquely identify instances of that entity type? This may not be possible for some weak entities.

5. Define the relationships. Examine each entity type to see its relationship to the others.

6. Describe the cardinality and optionality of the relationships. Examine the constraints between participating entities.

7. Remove redundant relationships. Examine the ER model for redundant relationships.

ER modelling is an iterative process, so draw several versions, refining each one until you are happy with it. Note that there is no one right answer to the problem, but some solutions are better than others!

Overview: construct an ER model; understand the problems associated with ER models; understand the modelling concepts of Enhanced ER modelling.



Types of Data Integrity
Data Integrity falls into the following categories
Entity integrity

Entity integrity ensures that each row can be uniquely identified by an attribute called the Primary key. The Primary key cannot have a NULL value.
Domain integrity

Domain integrity refers to the range of valid entries for a given column. It ensures that there are only valid entries in the column.
Referential integrity

Referential integrity ensures that for every value of a Foreign key, there is a matching value of the Primary key.
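The three kinds of integrity can be observed with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names, the sample rows and the helper `rejected` are invented for the illustration, and the salary check is just one possible domain rule. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys

conn.execute(
    "CREATE TABLE department ("
    " dnumber INTEGER NOT NULL PRIMARY KEY,"
    " dname TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE employee ("
    " ssn TEXT NOT NULL PRIMARY KEY,"           # entity integrity
    " salary REAL CHECK (salary >= 0),"         # domain integrity
    " dno INTEGER NOT NULL REFERENCES department(dnumber))"  # referential integrity
)
conn.execute("INSERT INTO department VALUES (5, 'Research')")
conn.execute("INSERT INTO employee VALUES ('123456789', 30000, 5)")


def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_key = rejected("INSERT INTO employee VALUES ('123456789', 1, 5)")
null_key = rejected("INSERT INTO employee VALUES (NULL, 10, 5)")
bad_domain = rejected("INSERT INTO employee VALUES ('999', -5, 5)")
orphan = rejected("INSERT INTO employee VALUES ('888', 10, 99)")
print(duplicate_key, null_key, bad_domain, orphan)
```

Each rejected statement corresponds to one rule: a repeated or NULL primary key violates entity integrity, a negative salary violates the domain rule, and a department number with no matching primary key violates referential integrity.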

2.7 Relational database


Relations can be represented as two-dimensional data tables with rows and columns. The rows of a relation are called tuples. The columns of a relation are called attributes. The attributes draw values from a domain (a legal pool of values). The number of tuples in a relation is called its cardinality, while the number of attributes in a relation is called its degree. A relation also consists of a schema and an instance. The schema defines the structure of a relation, which consists of a fixed set of attribute-domain pairs. An instance of a relation is a time-varying set of tuples, where each tuple consists of attribute-value pairs.


UNIT -3

The Relational Data Model and Relational Database


Relational Model Concepts
The relational Model of Data is based on the concept of a Relation. A Relation is a mathematical concept based on the ideas of sets. The strength of the relational approach to data management comes from the formal foundation provided by the theory of relations. The model was first proposed by Dr. E.F. Codd of IBM in 1970 in the following paper: "A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970.

Informal Definitions
RELATION:
A relation is a table of values. A relation may be thought of as a set of rows. A relation may alternatively be thought of as a set of columns. Each row represents a fact that corresponds to a real-world entity or relationship. Each row has a value of an item or set of items that uniquely identifies that row in the table. Sometimes row-ids or sequential numbers are assigned to identify the rows in the table. Each column is typically called by its column name, column header or attribute name.

Formal definitions
A relation may be defined in multiple ways. The schema of a relation: R(A1, A2, ..., An). Relation schema R is defined over attributes A1, A2, ..., An.
For example:

CUSTOMER (Cust-id, Cust-name, Address, Phone#)
Here, CUSTOMER is a relation defined over the four attributes Cust-id, Cust-name, Address, Phone#, each of which has a domain or a set of valid values. For example, the domain of Cust-id is 6-digit numbers.


A tuple is an ordered set of values.Each value is derived from an appropriate domain. Each row in the CUSTOMER table may be referred to as a tuple in the table and would consist of four values. <632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000"> is a tuple belonging to the CUSTOMER relation. A relation may be regarded as a set of tuples (rows). Columns in a table are also called attributes of the relation. A domain has a logical definition: e.g., USA_phone_numbers are the set of 10 digit phone numbers valid in the U.S. A domain may have a data-type or a format defined for it. The USA_phone_numbers may have a format: (ddd)-ddd-dddd where each d is a decimal digit. E.g., Dates have various formats such as monthname, date, year or yyyy-mm-dd, or dd mm,yyyy etc. An attribute designates the role played by the domain. E.g., the domain Date may be used to define attributes Invoice-date and Payment-date. The relation is formed over the cartesian product of the sets; each set has values from a domain; that domain is used in a specific role which is conveyed by the attribute name. For example, attribute Cust-name is defined over the domain of strings of 25 characters. The role these strings play in the CUSTOMER relation is that of the name of customers. Formally, Given R(A1, A2, .........., An) r(R) dom (A1) X dom (A2) X ....X dom(An) R: schema of the relation r of R: a specific "value" or population of R. R is also called the intension of a relation VTU-EDUSAT Page 2



r is also called the extension of a relation.
Let S1 = {0, 1}
Let S2 = {a, b, c}
Let R ⊆ S1 × S2
Then, for example, r(R) = {<0,a>, <0,b>, <1,c>} is one possible state (or population, or extension) r of the relation R, defined over domains S1 and S2. It has three tuples.
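The S1 and S2 example can be checked mechanically. The following is a minimal sketch in Python, assuming domains are modelled as sets and tuples as Python tuples; the variable names are illustrative only.

```python
from itertools import product

# Domains taken from the example: S1 = {0, 1}, S2 = {a, b, c}.
S1 = {0, 1}
S2 = {"a", "b", "c"}

# The Cartesian product S1 x S2 holds every tuple that could ever appear.
all_tuples = set(product(S1, S2))

# One possible relation state r(R): any subset of the product is allowed.
r = {(0, "a"), (0, "b"), (1, "c")}

print(len(all_tuples))   # 6 candidate tuples (2 * 3)
print(r <= all_tuples)   # True: r is a subset of S1 x S2
```

Any other subset of `all_tuples`, including the empty set, would be an equally valid state of R.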

Example

Characteristics of Relations


Ordering of tuples in a relation r(R): The tuples are not considered to be ordered, even though they appear to be in tabular form.
Ordering of attributes in a relation schema R (and of values within each tuple): We will consider the attributes in R(A1, A2, ..., An) and the values in t = <v1, v2, ..., vn> to be ordered. (However, a more general alternative definition of relation does not require this ordering.)
Values in a tuple: All values are considered atomic (indivisible). A special null value is used to represent values that are unknown or inapplicable to certain tuples.
Notation: We refer to component values of a tuple t by t[Ai] = vi (the value of attribute Ai for tuple t). Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of attributes Au, Av, ..., Aw, respectively.

Relational Integrity Constraints


Constraints are conditions that must hold on all valid relation instances. There are three main types of constraints:
1. Key constraints
2. Entity integrity constraints
3. Referential integrity constraints

Superkey of R: A set of attributes SK of R such that no two tuples in any valid relation instance r(R) will have the same value for SK. That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK].
Key of R: A "minimal" superkey; that is, a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey.
Example: The CAR relation schema CAR(State, Reg#, SerialNo, Make, Model, Year) has two keys, Key1 = {State, Reg#} and Key2 = {SerialNo}, which are also superkeys. {SerialNo, Make} is a superkey but not a key.
If a relation has several candidate keys, one is chosen arbitrarily to be the primary key. The primary key attributes are underlined.
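The superkey and key definitions can be tested against a single relation instance. The sketch below is only an illustration: a key is a property of every valid instance of the schema, so passing the check on one instance shows that the constraint is not violated there, not that it holds in general. The CAR rows and the attribute name `Reg` (standing in for Reg#) are made-up sample data.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key(rows, attrs):
    """A key is a superkey none of whose proper non-empty subsets is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical CAR instance (values invented for illustration).
cars = [
    {"State": "TX", "Reg": "ABC-739", "SerialNo": "A69352", "Make": "Ford"},
    {"State": "FL", "Reg": "ABC-739", "SerialNo": "B43696", "Make": "Oldsmobile"},
    {"State": "TX", "Reg": "TVP-347", "SerialNo": "X83554", "Make": "Ford"},
]

print(is_key(cars, ("State", "Reg")))            # True
print(is_key(cars, ("SerialNo",)))               # True
print(is_superkey(cars, ("SerialNo", "Make")))   # True
print(is_key(cars, ("SerialNo", "Make")))        # False: SerialNo alone suffices
```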


Entity Integrity
Relational Database Schema: A set S of relation schemas that belong to the same database. S is the name of the database. S = {R1, R2, ..., Rn}
Entity Integrity: The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R). This is because primary key values are used to identify the individual tuples: t[PK] ≠ null for any tuple t in r(R).
Note: Other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.

Referential Integrity
The initial design is typically not complete. Some aspects of the requirements will be represented as relationships. The ER model has three main concepts:
Entities (and their entity types and entity sets)
Attributes (simple, composite, multivalued)
Relationships (and their relationship types and relationship sets)

Referential Integrity Constraint


Statement of the constraint: The value in the foreign key column (or columns) FK of the referencing relation R1 can be either:
(1) a value of an existing primary key PK in the referenced relation R2, or
(2) a null.
In case (2), the FK in R1 should not be a part of its own primary key.
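The two permitted cases, and the rejection of anything else, can be observed in a real DBMS. The following is a minimal sketch using Python's built-in `sqlite3` module; the cut-down DEPARTMENT and EMPLOYEE tables and their rows are assumptions made for illustration, and SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Referenced relation R2 and referencing relation R1 (simplified sample schema).
conn.execute("CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DNAME TEXT)")
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    " SSN TEXT PRIMARY KEY,"
    " DNO INTEGER REFERENCES DEPARTMENT(DNUMBER))"
)
conn.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")

conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 5)")     # case (1): existing PK value
conn.execute("INSERT INTO EMPLOYEE VALUES ('333445555', NULL)")  # case (2): null

try:
    # Department 4 does not exist, so this violates referential integrity.
    conn.execute("INSERT INTO EMPLOYEE VALUES ('999887777', 4)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```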

Other Types of Constraints

Semantic Integrity Constraints:
These are based on application semantics and cannot be expressed by the model per se. E.g., "the maximum number of hours per employee for all projects he or she works on is 56 hours per week". A constraint specification language may have to be used to express these. SQL-99 allows triggers and ASSERTIONS to express some of these.



Update Operations on Relations
1. INSERT a tuple
2. DELETE a tuple
3. MODIFY a tuple

Integrity constraints should not be violated by the update operations. Several update operations may have to be grouped together. Updates may propagate to cause other updates automatically; this may be necessary to maintain integrity constraints. In case of an integrity violation, several actions can be taken:
1. Cancel the operation that causes the violation (REJECT option)
2. Perform the operation but inform the user of the violation
3. Trigger additional updates so the violation is corrected (CASCADE option, SET NULL option)
4. Execute a user-specified error-correction routine
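The CASCADE and SET NULL options in item 3 correspond to foreign key actions that many SQL systems support. The sketch below uses Python's `sqlite3` module with a simplified, assumed schema: deleting a department removes its DEPT_LOCATIONS rows (CASCADE) and sets the DNO of its employees to null (SET NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to apply FK actions

conn.execute("CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE DEPT_LOCATIONS ("
    " DNUMBER INTEGER REFERENCES DEPARTMENT(DNUMBER) ON DELETE CASCADE,"
    " DLOCATION TEXT)"
)
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    " SSN TEXT PRIMARY KEY,"
    " DNO INTEGER REFERENCES DEPARTMENT(DNUMBER) ON DELETE SET NULL)"
)
conn.execute("INSERT INTO DEPARTMENT VALUES (5)")
conn.execute("INSERT INTO DEPT_LOCATIONS VALUES (5, 'Houston')")
conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 5)")

# Deleting the referenced tuple triggers the additional updates automatically.
conn.execute("DELETE FROM DEPARTMENT WHERE DNUMBER = 5")

locations = conn.execute("SELECT COUNT(*) FROM DEPT_LOCATIONS").fetchone()[0]
dno = conn.execute("SELECT DNO FROM EMPLOYEE").fetchone()[0]
print(locations, dno)  # 0 None
```

Without an ON DELETE clause the same DELETE would be rejected, which corresponds to option 1.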

The Relational Algebra and Relational Calculus Introduction


Relational algebra is a procedural language used for manipulating relations. The relational model gives the structure for relations so that data can be stored in that format, while relational algebra enables us to retrieve information from relations. Some advanced SQL queries require explicit relational algebra operations, most commonly outer join. Relations are seen as sets of tuples, which means that no duplicates are allowed. SQL behaves differently in some cases: a query keeps duplicate rows unless the keyword DISTINCT is used. SQL is declarative, which means that you tell the DBMS what you want rather than how to compute it.



Set operations
Relations in relational algebra are seen as sets of tuples, so we can use basic set operations.
Review of concepts and operations from set theory:
Set
Element
No duplicate elements
No order among the elements
Subset
Proper subset (with fewer elements)
Superset
Union
Intersection
Set difference
Cartesian product

Relational Algebra
Relational algebra consists of several groups of operations:

Unary Relational Operations
SELECT (symbol: σ (sigma))
PROJECT (symbol: π (pi))
RENAME (symbol: ρ (rho))

Relational Algebra Operations From Set Theory
UNION (∪), INTERSECTION (∩), DIFFERENCE (or MINUS, −)
CARTESIAN PRODUCT (×)

Binary Relational Operations
JOIN (several variations of JOIN exist)
DIVISION

Additional Relational Operations
OUTER JOINS, OUTER UNION
AGGREGATE FUNCTIONS

Unary Relational Operations


SELECT (symbol: σ (sigma))
PROJECT (symbol: π (pi))
RENAME (symbol: ρ (rho))

SELECT
The SELECT operation (denoted by σ (sigma)) is used to select a subset of the tuples from a relation based on a selection condition. The selection condition acts as a filter and keeps only those tuples that satisfy the qualifying condition. Tuples satisfying the condition are selected, whereas the other tuples are discarded (filtered out).
Database State for COMPANY



Examples:
Select the EMPLOYEE tuples whose department number is 4:

σ DNO = 4 (EMPLOYEE)

Select the EMPLOYEE tuples whose salary is greater than $30,000:

σ SALARY > 30,000 (EMPLOYEE)

In general, the select operation is denoted by σ <selection condition>(R), where
the symbol σ (sigma) is used to denote the select operator
the selection condition is a Boolean (conditional) expression specified on the attributes of relation R
tuples that make the condition true are selected (appear in the result of the operation)
tuples that make the condition false are filtered out (discarded from the result of the operation)

The Boolean expression specified in <selection condition> is made up of a number of clauses of the form:
<attribute name> <comparison op> <constant value>
or
<attribute name> <comparison op> <attribute name>
where <attribute name> is the name of an attribute of R and <comparison op> is normally one of the operators {=, >, >=, <, <=, !=}. Clauses can be arbitrarily connected by the Boolean operators AND, OR, and NOT.

For example, to select the tuples for all employees who either work in department 4 and make over $25,000 per year, or work in department 5 and make over $30,000, the select operation should be:

σ (DNO=4 AND Salary>25000) OR (DNO=5 AND Salary>30000) (EMPLOYEE)
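The last selection can be sketched in Python by treating a relation as a list of dictionaries and the selection condition as a Boolean function. The `select` helper and the four EMPLOYEE rows are illustrative assumptions, not part of the COMPANY database shown in the figures.

```python
def select(relation, condition):
    """SELECT: keep only the tuples for which the condition is true."""
    return [t for t in relation if condition(t)]

# Hypothetical EMPLOYEE tuples (sample values only).
EMPLOYEE = [
    {"FNAME": "John", "DNO": 5, "SALARY": 30000},
    {"FNAME": "Franklin", "DNO": 5, "SALARY": 40000},
    {"FNAME": "Alicia", "DNO": 4, "SALARY": 25000},
    {"FNAME": "Jennifer", "DNO": 4, "SALARY": 43000},
]

# (DNO=4 AND Salary>25000) OR (DNO=5 AND Salary>30000)
result = select(
    EMPLOYEE,
    lambda t: (t["DNO"] == 4 and t["SALARY"] > 25000)
    or (t["DNO"] == 5 and t["SALARY"] > 30000),
)

print([t["FNAME"] for t in result])  # ['Franklin', 'Jennifer']
```

John and Alicia are filtered out because their salaries equal the thresholds and the comparison is strict.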



The following query results refer to this database

Examples of applying SELECT and PROJECT operations


SELECT Operation Properties
SELECT is commutative: σ <condition1>(σ <condition2>(R)) = σ <condition2>(σ <condition1>(R))
A cascade of SELECT operations may be replaced by a single selection with a conjunction of all the conditions: σ <cond1>(σ <cond2>(σ <cond3>(R))) = σ <cond1> AND <cond2> AND <cond3>(R)

PROJECT
The PROJECT operation is denoted by π (pi). If we are interested in only certain attributes of a relation, we use PROJECT. This operation keeps certain columns (attributes) from a relation and discards the other columns. PROJECT creates a vertical partitioning: the list of specified columns (attributes) is kept in each tuple, and the other attributes in each tuple are discarded.



Example: To list each employee's first and last name and salary, the following is used:
π LNAME, FNAME, SALARY (EMPLOYEE)
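A projection can be sketched in the same style. Because a relation is a set of tuples, the result of PROJECT contains no duplicates, so the hypothetical `project` helper below removes repeated rows; the sample tuples are invented for illustration.

```python
def project(relation, attrs):
    """PROJECT: keep only the listed attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        values = tuple(t[a] for a in attrs)
        if values not in seen:          # relations are sets: no duplicates
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

# Hypothetical EMPLOYEE tuples (sample values only).
EMPLOYEE = [
    {"LNAME": "Smith", "FNAME": "John", "SEX": "M", "SALARY": 30000},
    {"LNAME": "Wong", "FNAME": "Franklin", "SEX": "M", "SALARY": 40000},
    {"LNAME": "Zelaya", "FNAME": "Alicia", "SEX": "F", "SALARY": 25000},
    {"LNAME": "English", "FNAME": "Joyce", "SEX": "F", "SALARY": 25000},
]

names = project(EMPLOYEE, ("LNAME", "FNAME", "SALARY"))
pay_by_sex = project(EMPLOYEE, ("SEX", "SALARY"))

print(len(names))       # 4: every tuple is still distinct
print(len(pay_by_sex))  # 3: the two ('F', 25000) rows collapse into one
```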

Examples of applying SELECT and PROJECT operations

Single expression versus sequence of relational operations


We may want to apply several relational algebra operations one after the other. Either we can write the operations as a single relational algebra expression by nesting the operations, or we can apply one operation at a time and create intermediate result relations. In the latter case, we must give names to the relations that hold the intermediate results.
To retrieve the first name, last name, and salary of all employees who work in department number 5, we must apply a select and a project operation. We can write a single relational algebra expression as follows:

π FNAME, LNAME, SALARY (σ DNO=5 (EMPLOYEE))

or we can explicitly show the sequence of operations, giving a name to each intermediate relation:

DEP5_EMPS ← σ DNO=5 (EMPLOYEE)
RESULT ← π FNAME, LNAME, SALARY (DEP5_EMPS)

Example of applying multiple operations and RENAME

RENAME
The RENAME operator is denoted by ρ (rho). In some cases, we may want to rename the attributes of a relation, or the relation name, or both. This is useful when a query requires multiple operations, and necessary in some cases (see the JOIN operation later).



The general RENAME operation ρ can be expressed in any of the following forms:
ρ S (R) changes the relation name only, to S
ρ (B1, B2, ..., Bn) (R) changes the column (attribute) names only, to B1, B2, ..., Bn
ρ S (B1, B2, ..., Bn) (R) changes both the relation name, to S, and the column (attribute) names, to B1, B2, ..., Bn

Relational Algebra Operations from Set Theory


Union Intersection Minus Cartesian Product

UNION
UNION is a binary operation, denoted by ∪. The result of R ∪ S is a relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated.
The two operand relations R and S must be "type compatible" (or UNION compatible):
R and S must have the same number of attributes
Each pair of corresponding attributes must be type compatible (have the same or compatible domains)
Example: To retrieve the social security numbers of all employees who either work in department 5 (RESULT1 below) or directly supervise an employee who works in department 5 (RESULT2 below):



DEP5_EMPS ← σ DNO=5 (EMPLOYEE)
RESULT1 ← π SSN (DEP5_EMPS)
RESULT2 ← π SUPERSSN (DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2

The union operation produces the tuples that are in either RESULT1 or RESULT2 or both.
The following query results refer to this database state.
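The same sequence of steps can be followed in Python, where single-attribute relations map naturally onto sets. This is a sketch under the assumption of a four-row EMPLOYEE sample; the SSN values are illustrative.

```python
# Hypothetical EMPLOYEE tuples (sample values only).
EMPLOYEE = [
    {"SSN": "123456789", "SUPERSSN": "333445555", "DNO": 5},
    {"SSN": "333445555", "SUPERSSN": "888665555", "DNO": 5},
    {"SSN": "999887777", "SUPERSSN": "987654321", "DNO": 4},
    {"SSN": "888665555", "SUPERSSN": None, "DNO": 1},
]

dep5_emps = [t for t in EMPLOYEE if t["DNO"] == 5]   # SELECT DNO=5
result1 = {t["SSN"] for t in dep5_emps}              # PROJECT SSN
result2 = {t["SUPERSSN"] for t in dep5_emps}         # PROJECT SUPERSSN
result = result1 | result2                           # UNION

print(sorted(result))  # ['123456789', '333445555', '888665555']
```

The value 333445555 appears in both operands but only once in the result, since duplicate tuples are eliminated.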

Example of the result of a UNION operation


INTERSECTION
INTERSECTION is denoted by ∩. The result of the operation R ∩ S is a relation that includes all tuples that are in both R and S. The attribute names in the result will be the same as the attribute names in R. The two operand relations R and S must be "type compatible".

SET DIFFERENCE
SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by −. The result of R − S is a relation that includes all tuples that are in R but not in S. The attribute names in the result will be the same as the attribute names in R. The two operand relations R and S must be "type compatible".



Example to illustrate the result of UNION, INTERSECT, and DIFFERENCE

Some properties of UNION, INTERSECT, and DIFFERENCE


Notice that both union and intersection are commutative operations; that is,
R ∪ S = S ∪ R, and R ∩ S = S ∩ R
Both union and intersection can be treated as n-ary operations applicable to any number of relations, as both are associative operations; that is,
R ∪ (S ∪ T) = (R ∪ S) ∪ T
(R ∩ S) ∩ T = R ∩ (S ∩ T)
The minus operation is not commutative; that is, in general,
R − S ≠ S − R

CARTESIAN PRODUCT

This operation is used to combine tuples from two relations in a combinatorial fashion. It is denoted by R(A1, A2, ..., An) × S(B1, B2, ..., Bm). The result is a relation Q with degree n + m attributes: Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order. The resulting relation state has one tuple for each combination of tuples, one from R and one from S. Hence, if R has nR tuples (denoted as |R| = nR) and S has nS tuples, then R × S will have nR * nS tuples. The two operands do NOT have to be "type compatible".
Generally, CROSS PRODUCT is not a meaningful operation on its own; it can become meaningful when followed by other operations.
Example (not meaningful):
FEMALE_EMPS ← σ SEX='F' (EMPLOYEE)
EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
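The nR * nS size rule, and the way a later selection turns the product into something meaningful, can be seen in a short Python sketch. The two EMPNAMES tuples and three DEPENDENT tuples are assumed sample data.

```python
from itertools import product

# Hypothetical tuples (sample values only).
EMPNAMES = [
    {"FNAME": "Alicia", "SSN": "999887777"},
    {"FNAME": "Jennifer", "SSN": "987654321"},
]
DEPENDENT = [
    {"ESSN": "987654321", "DEPENDENT_NAME": "Abner"},
    {"ESSN": "333445555", "DEPENDENT_NAME": "Alice"},
    {"ESSN": "333445555", "DEPENDENT_NAME": "Joy"},
]

# CARTESIAN PRODUCT: every EMPNAMES tuple paired with every DEPENDENT tuple.
emp_dependents = [{**e, **d} for e, d in product(EMPNAMES, DEPENDENT)]
print(len(emp_dependents))  # 6, i.e. 2 * 3

# A following SELECT on SSN = ESSN keeps only the related pairs.
actual_deps = [t for t in emp_dependents if t["SSN"] == t["ESSN"]]
print([(t["FNAME"], t["DEPENDENT_NAME"]) for t in actual_deps])
# [('Jennifer', 'Abner')]
```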



The following query results refer to this database state



Example of applying CARTESIAN PRODUCT



To keep only combinations where the DEPENDENT is related to the EMPLOYEE, we add a SELECT operation as follows:

ACTUAL_DEPS ← σ SSN=ESSN (EMP_DEPENDENTS)
RESULT ← π FNAME, LNAME, DEPENDENT_NAME (ACTUAL_DEPS)

Binary Relational Operations
Division
Join


Division
Interpretation of the division operation A/B:
- Divide the attributes of A into 2 sets: A1 and A2.
- Divide the attributes of B into 2 sets: B2 and B3.
- Where the sets A2 and B2 have the same attributes.
- For each set of values in B2:
  - Search in A2 for the sets of rows (having the same A1 values) whose A2 values (taken together) form a set which contains every B2 value.
- For all the sets of rows in A which satisfy the above search, pick out their A1 values and put them in the answer.
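For the common case A(x, y) divided by B(y), the search described above can be sketched as follows. The `divide` function is a hypothetical helper that returns every x value paired in A with all of the y values in B; the employee and project numbers are invented sample data.

```python
def divide(a_rows, b_values):
    """A / B for A(x, y) and B(y): the x values related to every y in B."""
    required = set(b_values)
    ys_by_x = {}
    for x, y in a_rows:
        ys_by_x.setdefault(x, set()).add(y)   # group the A2 values by A1 value
    return {x for x, ys in ys_by_x.items() if required <= ys}

# Hypothetical WORKS_ON(employee, project) pairs and a set of project numbers.
works_on = [
    ("e1", 1), ("e1", 2),
    ("e2", 1),
    ("e3", 1), ("e3", 2), ("e3", 3),
]
projects = {1, 2}

print(sorted(divide(works_on, projects)))  # ['e1', 'e3']
```

Employee e2 is left out because project 2 is missing from its rows, while e3 qualifies even though it also works on project 3.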


JOIN
JOIN Operation (denoted by ⋈)

The sequence of CARTESIAN PRODUCT followed by SELECT is used quite commonly to identify and select related tuples from two relations. This operation is very important for any relational database with more than a single relation, because it allows us to combine related tuples from various relations. The general form of a join operation on two relations R(A1, A2, ..., An) and S(B1, B2, ..., Bm) is:

R ⋈ <join condition> S

where R and S can be any relations that result from general relational algebra expressions.
Example: Suppose that we want to retrieve the name of the manager of each department.



To get the manager's name, we need to combine each DEPARTMENT tuple with the EMPLOYEE tuple whose SSN value matches the MGRSSN value in the department tuple:
DEPT_MGR ← DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE

The following query results refer to this database state

Example of applying the JOIN operation DEPT_MGR ← DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE


The general case of the JOIN operation is called a theta-join: R ⋈ theta S
The join condition is called theta. Theta can be any general Boolean expression on the attributes of R and S; for example:
R.Ai < S.Bj AND (R.Ak = S.Bl OR R.Ap < S.Bq)

EQUIJOIN
The most common use of join involves join conditions with equality comparisons only. Such a join, where the only comparison operator used is =, is called an EQUIJOIN. The JOIN seen in the previous example was an EQUIJOIN.

NATURAL JOIN
Another variation of JOIN, called NATURAL JOIN and denoted by *, was created to get rid of the second (superfluous) attribute in an EQUIJOIN condition.
Example: Q ← R(A,B,C,D) * S(C,D,E)
The implicit join condition includes each pair of attributes with the same name, ANDed together: R.C = S.C AND R.D = S.D
The result keeps only one attribute of each such pair: Q(A,B,C,D,E)
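The R(A,B,C,D) * S(C,D,E) example can be sketched with a small nested-loop join. The `natural_join` helper and the sample tuples are assumptions for illustration; tuples are dictionaries keyed by attribute name, so the shared attributes are simply the common keys.

```python
def natural_join(r, s):
    """NATURAL JOIN: match on all same-named attributes, keep one copy of each."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()              # here: {'C', 'D'}
            if all(t[a] == u[a] for a in common):     # R.C = S.C AND R.D = S.D
                result.append({**t, **u})             # merged tuple over A,B,C,D,E
    return result

# Hypothetical tuples for R(A, B, C, D) and S(C, D, E).
R = [
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"A": 5, "B": 6, "C": 7, "D": 8},
]
S = [
    {"C": 3, "D": 4, "E": 9},
    {"C": 3, "D": 0, "E": 10},
]

Q = natural_join(R, S)
print(Q)  # [{'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 9}]
```

The second S tuple agrees with the first R tuple on C but not on D, so it does not join.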



Example: To apply a natural join on the DNUMBER attributes of DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write:
DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS
The only attribute with the same name is DNUMBER. An implicit join condition is created based on this attribute:
DEPARTMENT.DNUMBER = DEPT_LOCATIONS.DNUMBER
The following query results refer to this database state.



Example of NATURAL JOIN operation

Complete Set of Relational Operations


The set of operations including SELECT σ, PROJECT π, UNION ∪, DIFFERENCE −, RENAME ρ, and CARTESIAN PRODUCT × is called a complete set, because any other relational algebra expression can be expressed by a combination of these operations. For example:
R ∩ S = (R ∪ S) − ((R − S) ∪ (S − R))
R ⋈ <join condition> S = σ <join condition> (R × S)
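Both identities can be checked on small examples. The sketch below models relations as Python sets of tuples; the relation contents are made-up sample values, and attribute positions stand in for attribute names.

```python
from itertools import product

# Two type-compatible relations (sample values only).
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (3, "c"), (4, "d")}

# INTERSECTION built only from UNION and DIFFERENCE.
via_complete_set = (R | S) - ((R - S) | (S - R))
print(via_complete_set == (R & S))  # True

# JOIN built from CARTESIAN PRODUCT followed by SELECT.
# DEPARTMENT(DNAME, MGRSSN) and EMPLOYEE(SSN, FNAME), hypothetical rows.
DEPARTMENT = {("Research", "333445555"), ("Headquarters", "888665555")}
EMPLOYEE = {("333445555", "Franklin"), ("123456789", "John")}

dept_mgr = {d + e for d, e in product(DEPARTMENT, EMPLOYEE) if d[1] == e[0]}
print(dept_mgr)  # {('Research', '333445555', '333445555', 'Franklin')}
```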



Recap of Relational Algebra Operations



Aggregate Functions and Grouping
A type of request that cannot be expressed in the basic relational algebra is to specify mathematical aggregate functions on collections of values from the database. Examples of such functions include retrieving the average or total salary of all employees or the total number of employee tuples. Common functions applied to collections of numeric values include SUM, AVERAGE, MAXIMUM, and MINIMUM. The COUNT function is used for counting tuples or values.
Use of the aggregate function operation ℱ:
ℱ MAX Salary (EMPLOYEE) retrieves the maximum salary value from the EMPLOYEE relation
ℱ MIN Salary (EMPLOYEE) retrieves the minimum salary value from the EMPLOYEE relation
ℱ SUM Salary (EMPLOYEE) retrieves the sum of the salaries from the EMPLOYEE relation
ℱ COUNT SSN, AVERAGE Salary (EMPLOYEE) computes the count (number) of employees and their average salary
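The aggregate requests listed above can be sketched directly with Python's built-in functions. The EMPLOYEE tuples are assumed sample data of the form (SSN, DNO, Salary); the final part groups by DNO, which is one plausible reading of the "Grouping" in the section title rather than something the text spells out.

```python
# Hypothetical EMPLOYEE tuples: (SSN, DNO, Salary).
EMPLOYEE = [
    ("123456789", 5, 30000),
    ("333445555", 5, 40000),
    ("999887777", 4, 25000),
    ("987654321", 4, 43000),
]

salaries = [salary for _, _, salary in EMPLOYEE]

max_salary = max(salaries)                       # MAX Salary
min_salary = min(salaries)                       # MIN Salary
sum_salary = sum(salaries)                       # SUM Salary
count_ssn = len(EMPLOYEE)                        # COUNT SSN
average_salary = sum(salaries) / len(salaries)   # AVERAGE Salary

print(max_salary, min_salary, sum_salary, count_ssn, average_salary)
# 43000 25000 138000 4 34500.0

# Grouping: apply COUNT and AVERAGE separately for each DNO value.
groups = {}
for _, dno, salary in EMPLOYEE:
    groups.setdefault(dno, []).append(salary)
per_department = {dno: (len(v), sum(v) / len(v)) for dno, v in groups.items()}
print(per_department)  # {5: (2, 35000.0), 4: (2, 34000.0)}
```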

Additional Relational Operations Outer Join


The OUTER JOIN Operation



In NATURAL JOIN and EQUIJOIN, tuples without a matching (or related) tuple are eliminated from the join result. Tuples with null in the join attributes are also eliminated. This amounts to loss of information.
A set of operations, called OUTER joins, can be used when we want to keep all the tuples in R, or all those in S, or all those in both relations in the result of the join, regardless of whether or not they have matching tuples in the other relation.
The left outer join operation keeps every tuple in the first or left relation R in R ⟕ S; if no matching tuple is found in S, then the attributes of S in the join result are filled or padded with null values.
A similar operation, right outer join, keeps every tuple in the second or right relation S in the result of R ⟖ S.
A third operation, full outer join, denoted by ⟗, keeps all tuples in both the left and the right relations when no matching tuples are found, padding them with null values as needed.
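The padding behaviour of a left outer join can be sketched as follows. The `left_outer_join` helper, its parameters, and the sample tuples are illustrative assumptions; Python's `None` plays the role of the null value.

```python
def left_outer_join(r, s, condition, s_attrs):
    """Keep every tuple of r; pad the attributes of s with None when nothing matches."""
    result = []
    for t in r:
        matches = [u for u in s if condition(t, u)]
        if matches:
            result.extend({**t, **u} for u in matches)
        else:
            result.append({**t, **{a: None for a in s_attrs}})
    return result

# Hypothetical tuples (sample values only).
EMPLOYEE = [
    {"FNAME": "John", "SSN": "123456789"},
    {"FNAME": "Franklin", "SSN": "333445555"},
]
DEPARTMENT = [{"DNAME": "Research", "MGRSSN": "333445555"}]

rows = left_outer_join(
    EMPLOYEE,
    DEPARTMENT,
    lambda e, d: e["SSN"] == d["MGRSSN"],
    ("DNAME", "MGRSSN"),
)
print([(t["FNAME"], t["DNAME"]) for t in rows])
# [('John', None), ('Franklin', 'Research')]
```

An ordinary join would return only the Franklin row; the outer join keeps John, who manages no department, with null department attributes.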



Left Outer Join
E.g. List all employees and the department they manage, if they manage a department.

Outer join

Left outer, right outer, and full outer join


Examples of Queries in Relational Algebra

Q1: Retrieve the name and address of all employees who work for the 'Research' department.

RESEARCH_DEPT ← σ DNAME='Research' (DEPARTMENT)
RESEARCH_EMPS ← (RESEARCH_DEPT ⋈ DNUMBER=DNO EMPLOYEE)
RESULT ← π FNAME, LNAME, ADDRESS (RESEARCH_EMPS)

Q6: Retrieve the names of employees who have no dependents.

ALL_EMPS ← π SSN (EMPLOYEE)
EMPS_WITH_DEPS(SSN) ← π ESSN (DEPENDENT)
EMPS_WITHOUT_DEPS ← (ALL_EMPS − EMPS_WITH_DEPS)
RESULT ← π LNAME, FNAME (EMPS_WITHOUT_DEPS * EMPLOYEE)


SQL is a standard language for accessing and manipulating databases.

What is SQL?
SQL stands for Structured Query Language
SQL lets you access and manipulate databases
SQL is an ANSI (American National Standards Institute) standard

What Can SQL do?


SQL can execute queries against a database
SQL can retrieve data from a database
SQL can insert records in a database
SQL can update records in a database
SQL can delete records from a database
SQL can create new databases
SQL can create new tables in a database
SQL can create stored procedures in a database
SQL can create views in a database
SQL can set permissions on tables, procedures, and views

Database Tables
A database most often contains one or more tables. Each table is identified by a name (e.g. "Customers" or "Orders"). Tables contain records (rows) with data.
Below is an example of a table called "Persons":

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

The table above contains three records (one for each person) and five columns (P_Id, LastName, FirstName, Address, and City).

SQL Statements
Most of the actions you need to perform on a database are done with SQL statements. The following SQL statement will select all the records in the "Persons" table:

SELECT * FROM Persons


In this tutorial we will teach you all about the different SQL statements.

Keep in Mind That...


SQL keywords are not case sensitive

Semicolon after SQL Statements?


Some database systems require a semicolon at the end of each SQL statement. Semicolon is the standard way to separate each SQL statement in database systems that allow more than one SQL statement to be executed in the same call to the server. We are using MS Access and SQL Server 2000 and we do not have to put a semicolon after each SQL statement, but some database programs force you to use it.

SQL DML and DDL


SQL can be divided into two parts: The Data Manipulation Language (DML) and the Data Definition Language (DDL). The query and update commands form the DML part of SQL:

SELECT - extracts data from a database
UPDATE - updates data in a database
DELETE - deletes data from a database
INSERT INTO - inserts new data into a database

The DDL part of SQL permits database tables to be created or deleted. It also defines indexes (keys), specifies links between tables, and imposes constraints between tables. The most important DDL statements in SQL are:

CREATE DATABASE - creates a new database
ALTER DATABASE - modifies a database
CREATE TABLE - creates a new table
ALTER TABLE - modifies a table
DROP TABLE - deletes a table

The SQL SELECT Statement


The SELECT statement is used to select data from a database. The result is stored in a result table, called the result-set.

SQL SELECT Syntax

SELECT column_name(s) FROM table_name

and

SELECT * FROM table_name


Note: SQL is not case sensitive. SELECT is the same as select.

An SQL SELECT Example


The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to select the content of the columns named "LastName" and "FirstName" from the table above. We use the following SELECT statement:

SELECT LastName,FirstName FROM Persons


The result-set will look like this:

LastName  FirstName
Kumari    Mounitha
Kumar     Pranav
Gubbi     Sharan

SELECT * Example
Now we want to select all the columns from the "Persons" table. We use the following SELECT statement:

SELECT * FROM Persons


Tip: The asterisk (*) is a quick way of selecting all columns!
The result-set will look like this:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

SQL SELECT DISTINCT Statement


This chapter will explain the SELECT DISTINCT statement.

The SQL SELECT DISTINCT Statement


In a table, some of the columns may contain duplicate values. This is not a problem, however, sometimes you will want to list only the different (distinct) values in a table. The DISTINCT keyword can be used to return only distinct (different) values.

SQL SELECT DISTINCT Syntax

SELECT DISTINCT column_name(s) FROM table_name

SELECT DISTINCT Example


The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to select only the distinct values from the column named "City" from the table above. We use the following SELECT statement:

SELECT DISTINCT City FROM Persons


The result-set will look like this:

City
Bangalore
Tumkur

The WHERE clause is used to filter records.

The WHERE Clause


The WHERE clause is used to extract only those records that fulfill a specified criterion.

SQL WHERE Syntax

SELECT column_name(s) FROM table_name WHERE column_name operator value

WHERE Clause Example

The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to select only the persons living in the city "Bangalore" from the table above. We use the following SELECT statement:

SELECT * FROM Persons WHERE City='Bangalore'


The result-set will look like this:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
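The SELECT, SELECT DISTINCT, and WHERE statements can be tried without installing a database server. The following sketch uses Python's built-in `sqlite3` module with an in-memory database; the column types in CREATE TABLE are an assumption, and ORDER BY is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons "
    "(P_Id INTEGER, LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Kumari", "Mounitha", "VPura", "Bangalore"),
        (2, "Kumar", "Pranav", "Yelhanka", "Bangalore"),
        (3, "Gubbi", "Sharan", "Hebbal", "Tumkur"),
    ],
)

# WHERE keeps only the rows that satisfy the condition.
in_bangalore = conn.execute(
    "SELECT * FROM Persons WHERE City = 'Bangalore' ORDER BY P_Id"
).fetchall()
print(in_bangalore)
# [(1, 'Kumari', 'Mounitha', 'VPura', 'Bangalore'),
#  (2, 'Kumar', 'Pranav', 'Yelhanka', 'Bangalore')]

# DISTINCT removes repeated values from the result.
cities = conn.execute("SELECT DISTINCT City FROM Persons ORDER BY City").fetchall()
print(cities)  # [('Bangalore',), ('Tumkur',)]
```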

Quotes Around Text Fields


SQL uses single quotes around text values (most database systems will also accept double quotes). Numeric values, however, should not be enclosed in quotes.
For text values:

This is correct: SELECT * FROM Persons WHERE FirstName='Pranav'
This is wrong: SELECT * FROM Persons WHERE FirstName=Pranav

For numeric values:

This is correct: SELECT * FROM Persons WHERE Year=1965
This is wrong: SELECT * FROM Persons WHERE Year='1965'

Operators Allowed in the WHERE Clause


With the WHERE clause, the following operators can be used:

Operator  Description
=         Equal
<>        Not equal
>         Greater than
<         Less than
>=        Greater than or equal
<=        Less than or equal
BETWEEN   Between an inclusive range
LIKE      Search for a pattern
IN        If you know the exact value you want to return for at least one of the columns

Note: In some versions of SQL the <> operator may be written as !=

SQL AND & OR Operators


The AND & OR operators are used to filter records based on more than one condition.

The AND & OR Operators


The AND operator displays a record if both the first condition and the second condition are true. The OR operator displays a record if either the first condition or the second condition is true.

AND Operator Example


The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to select only the persons with the first name equal to "Pranav" AND the last name equal to "Kumar": We use the following SELECT statement:

SELECT * FROM Persons WHERE FirstName='Pranav' AND LastName='Kumar'


The result-set will look like this:

P_Id  LastName  FirstName  Address   City
2     Kumar     Pranav     Yelhanka  Bangalore

OR Operator Example
Now we want to select only the persons with the first name equal to "Pranav" OR the first name equal to "Mounitha":

We use the following SELECT statement:

SELECT * FROM Persons WHERE FirstName='Pranav' OR FirstName='Mounitha'


The result-set will look like this:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore

Combining AND & OR


You can also combine AND and OR (use parentheses to form complex expressions). Now we want to select only the persons with the last name equal to "Kumar" AND the first name equal to "Pranav" OR to "Mounitha". We use the following SELECT statement:

SELECT * FROM Persons WHERE LastName='Kumar' AND (FirstName='Pranav' OR FirstName='Mounitha')


The result-set will look like this:

P_Id  LastName  FirstName  Address   City
2     Kumar     Pranav     Yelhanka  Bangalore

SQL ORDER BY Keyword


The ORDER BY keyword is used to sort the result-set.

The ORDER BY Keyword


The ORDER BY keyword is used to sort the result-set by a specified column. The ORDER BY keyword sorts the records in ascending order by default. If you want to sort the records in descending order, you can use the DESC keyword.

SQL ORDER BY Syntax

SELECT column_name(s) FROM table_name ORDER BY column_name(s) ASC|DESC

ORDER BY Example
The "Persons" table:

P_Id  LastName  FirstName  Address    City
1     Kumari    Mounitha   VPura      Bangalore
2     Kumar     Pranav     Yelhanka   Bangalore
3     Gubbi     Sharan     Hebbal     Tumkur
4     Nilsen    Tom        Vingvn 23  Tumkur

Now we want to select all the persons from the table above, however, we want to sort the persons by their last name. We use the following SELECT statement:

SELECT * FROM Persons ORDER BY LastName


The result-set will look like this:

P_Id  LastName  FirstName  Address    City
3     Gubbi     Sharan     Hebbal     Tumkur
2     Kumar     Pranav     Yelhanka   Bangalore
1     Kumari    Mounitha   VPura      Bangalore
4     Nilsen    Tom        Vingvn 23  Tumkur

ORDER BY DESC Example


Now we want to select all the persons from the table above, however, we want to sort the persons descending by their last name. We use the following SELECT statement:

SELECT * FROM Persons ORDER BY LastName DESC


The result-set will look like this:

P_Id  LastName  FirstName  Address    City
4     Nilsen    Tom        Vingvn 23  Tumkur
1     Kumari    Mounitha   VPura      Bangalore
2     Kumar     Pranav     Yelhanka   Bangalore
3     Gubbi     Sharan     Hebbal     Tumkur
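The two sort orders can be confirmed by running the statements. The sketch below uses Python's `sqlite3` module with an in-memory copy of the four-row "Persons" table; the column types are an assumption, and only P_Id and LastName are selected to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons "
    "(P_Id INTEGER, LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Kumari", "Mounitha", "VPura", "Bangalore"),
        (2, "Kumar", "Pranav", "Yelhanka", "Bangalore"),
        (3, "Gubbi", "Sharan", "Hebbal", "Tumkur"),
        (4, "Nilsen", "Tom", "Vingvn 23", "Tumkur"),
    ],
)

ascending = conn.execute(
    "SELECT P_Id, LastName FROM Persons ORDER BY LastName"
).fetchall()
descending = conn.execute(
    "SELECT P_Id, LastName FROM Persons ORDER BY LastName DESC"
).fetchall()

print(ascending)   # [(3, 'Gubbi'), (2, 'Kumar'), (1, 'Kumari'), (4, 'Nilsen')]
print(descending)  # [(4, 'Nilsen'), (1, 'Kumari'), (2, 'Kumar'), (3, 'Gubbi')]
```

"Kumar" sorts before "Kumari" in ascending order because it is a prefix of the longer name; the descending result is the exact reverse.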

SQL INSERT INTO Statement

The INSERT INTO statement is used to insert new records in a table.

The INSERT INTO Statement


The INSERT INTO statement is used to insert a new row in a table.

SQL INSERT INTO Syntax


It is possible to write the INSERT INTO statement in two forms. The first form doesn't specify the column names where the data will be inserted, only their values:

INSERT INTO table_name VALUES (value1, value2, value3,...)


The second form specifies both the column names and the values to be inserted:

INSERT INTO table_name (column1, column2, column3,...) VALUES (value1, value2, value3,...)

SQL INSERT INTO Example


We have the following "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to insert a new row in the "Persons" table. We use the following SQL statement:

INSERT INTO Persons VALUES (4,'Nilsen', 'Johan', 'Bakken 2', 'Tumkur')


The "Persons" table will now look like this:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur
4     Nilsen    Johan      Bakken 2  Tumkur

Insert Data Only in Specified Columns


It is also possible to only add data in specific columns.

The following SQL statement will add a new row, but only add data in the "P_Id", "LastName" and the "FirstName" columns:

INSERT INTO Persons (P_Id, LastName, FirstName) VALUES (5, 'Tjessem', 'Jakob')

The "Persons" table will now look like this (the Address and City of the new row are left empty):

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur
4     Nilsen    Johan      Bakken 2  Tumkur
5     Tjessem   Jakob
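Both forms of INSERT INTO can be tried out with the sketch below, which assumes Python's built-in sqlite3 module and an in-memory database. It shows that columns left out of the column list end up as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, assumed for illustration
conn.execute(
    "CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)

# First form: values only, in the order the columns were declared.
conn.execute(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    (4, "Nilsen", "Johan", "Bakken 2", "Tumkur"),
)

# Second form: explicit column list; Address and City are not supplied.
conn.execute(
    "INSERT INTO Persons (P_Id, LastName, FirstName) VALUES (?, ?, ?)",
    (5, "Tjessem", "Jakob"),
)

rows = conn.execute("SELECT * FROM Persons ORDER BY P_Id").fetchall()
for row in rows:
    print(row)
# (4, 'Nilsen', 'Johan', 'Bakken 2', 'Tumkur')
# (5, 'Tjessem', 'Jakob', None, None)
```

The "?" placeholders are a feature of the sqlite3 module for passing values safely; they are not part of the SQL statements shown in these notes.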

SQL UPDATE Statement


The UPDATE statement is used to update records in a table.

The UPDATE Statement


The UPDATE statement is used to update existing records in a table.

SQL UPDATE Syntax

UPDATE table_name
SET column1=value, column2=value2,...
WHERE some_column=some_value

Note: Notice the WHERE clause in the UPDATE syntax. The WHERE clause specifies which record or records should be updated. If you omit the WHERE clause, all records will be updated!

SQL UPDATE Example


The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur
4     Nilsen    Johan      Bakken 2  Tumkur
5     Tjessem   Jakob

Now we want to update the person "Tjessem, Jakob" in the "Persons" table. We use the following SQL statement:

UPDATE Persons SET Address='Nissestien 67', City='Bangalore' WHERE LastName='Tjessem' AND FirstName='Jakob'

The "Persons" table will now look like this:

P_Id  LastName  FirstName  Address        City
1     Kumari    Mounitha   VPura          Bangalore
2     Kumar     Pranav     Yelhanka       Bangalore
3     Gubbi     Sharan     Hebbal         Tumkur
4     Nilsen    Johan      Bakken 2       Tumkur
5     Tjessem   Jakob      Nissestien 67  Bangalore

SQL UPDATE Warning


Be careful when updating records. If we had omitted the WHERE clause in the example above, like this:

UPDATE Persons SET Address='Nissestien 67', City='Bangalore'


The "Persons" table would have looked like this:

P_Id  LastName  FirstName  Address        City
1     Kumari    Mounitha   Nissestien 67  Bangalore
2     Kumar     Pranav     Nissestien 67  Bangalore
3     Gubbi     Sharan     Nissestien 67  Bangalore
4     Nilsen    Johan      Nissestien 67  Bangalore
5     Tjessem   Jakob      Nissestien 67  Bangalore
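The difference between an UPDATE with and without a WHERE clause can be made visible by counting the affected rows. The sketch below assumes Python's built-in sqlite3 module and an in-memory database; the rowcount attribute belongs to that module, not to SQL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute(
    "CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Kumari", "Mounitha", "VPura", "Bangalore"),
        (2, "Kumar", "Pranav", "Yelhanka", "Bangalore"),
        (3, "Gubbi", "Sharan", "Hebbal", "Tumkur"),
        (4, "Nilsen", "Johan", "Bakken 2", "Tumkur"),
        (5, "Tjessem", "Jakob", None, None),
    ],
)

# With a WHERE clause only the matching row is changed.
targeted_count = conn.execute(
    "UPDATE Persons SET Address = ?, City = ? WHERE LastName = ? AND FirstName = ?",
    ("Nissestien 67", "Bangalore", "Tjessem", "Jakob"),
).rowcount

# Without a WHERE clause every row is overwritten.
unrestricted_count = conn.execute(
    "UPDATE Persons SET Address = ?, City = ?",
    ("Nissestien 67", "Bangalore"),
).rowcount

cities = [row[0] for row in conn.execute("SELECT DISTINCT City FROM Persons")]

print(targeted_count)      # 1
print(unrestricted_count)  # 5
print(cities)              # ['Bangalore']
```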

SQL DELETE Statement


The DELETE statement is used to delete records in a table.

The DELETE Statement


The DELETE statement is used to delete rows in a table.

SQL DELETE Syntax

DELETE FROM table_name
WHERE some_column=some_value

Note: Notice the WHERE clause in the DELETE syntax. The WHERE clause specifies which record or records should be deleted. If you omit the WHERE clause, all records will be deleted!

SQL DELETE Example


The "Persons" table:

P_Id  LastName  FirstName  Address        City
1     Kumari    Mounitha   VPura          Bangalore
2     Kumar     Pranav     Yelhanka       Bangalore
3     Gubbi     Sharan     Hebbal         Tumkur
4     Nilsen    Johan      Bakken 2       Tumkur
5     Tjessem   Jakob      Nissestien 67  Bangalore

Now we want to delete the person "Tjessem, Jakob" in the "Persons" table. We use the following SQL statement:

DELETE FROM Persons WHERE LastName='Tjessem' AND FirstName='Jakob'


The "Persons" table will now look like this:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur
4     Nilsen    Johan      Bakken 2  Tumkur

Delete All Rows


It is possible to delete all rows in a table without deleting the table. This means that the table structure, attributes, and indexes will be intact:

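DELETE FROM table_name

MS Access also accepts the form DELETE * FROM table_name; most other database systems do not.

Note: Be very careful when deleting records. Once the deletion has been committed, you cannot undo this statement!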
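Whether a DELETE can still be undone depends on whether the surrounding transaction has been committed. The sketch below illustrates this with Python's built-in sqlite3 module, which (in its default mode, an assumption of this example) opens a transaction automatically before a DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute("CREATE TABLE Persons (P_Id INTEGER, LastName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [(1, "Kumari"), (2, "Kumar"), (3, "Gubbi")],
)
conn.commit()


def count_rows():
    """Return the current number of rows in Persons."""
    return conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]


# Delete all rows; the table itself still exists afterwards.
conn.execute("DELETE FROM Persons")
after_delete = count_rows()

# The deletion has not been committed yet, so it can be rolled back.
conn.rollback()
after_rollback = count_rows()

# After a commit the deletion is permanent.
conn.execute("DELETE FROM Persons")
conn.commit()
conn.rollback()  # nothing left to undo
after_commit = count_rows()

print(after_delete, after_rollback, after_commit)  # 0 3 0
```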

SQL TOP Clause


The TOP Clause
The TOP clause is used to specify the number of records to return. It can be very useful on large tables with thousands of records, because returning a large number of records can hurt performance.

Note: Not all database systems support the TOP clause.

SQL Server Syntax

SELECT TOP number|percent column_name(s) FROM table_name

SQL SELECT TOP Equivalent in MySQL and Oracle

MySQL Syntax

SELECT column_name(s) FROM table_name LIMIT number

Oracle Syntax

SELECT column_name(s) FROM table_name WHERE ROWNUM <= number

SQL TOP Example


The "Persons" table:

P_Id  LastName  FirstName  Address    City
1     Kumari    Mounitha   VPura      Bangalore
2     Kumar     Pranav     Yelhanka   Bangalore
3     Gubbi     Sharan     Hebbal     Tumkur
4     Nilsen    Tom        Vingvn 23  Tumkur

Now we want to select only the first two records in the table above. We use the following SELECT statement:

SELECT TOP 2 * FROM Persons


The result-set will look like this:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore

SQL TOP PERCENT Example


The "Persons" table:

P_Id  LastName  FirstName  Address    City
1     Kumari    Mounitha   VPura      Bangalore
2     Kumar     Pranav     Yelhanka   Bangalore
3     Gubbi     Sharan     Hebbal     Tumkur
4     Nilsen    Tom        Vingvn 23  Tumkur

Now we want to select only 50% of the records in the table above. We use the following SELECT statement:

SELECT TOP 50 PERCENT * FROM Persons


The result-set will look like this:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
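SQLite, like MySQL, uses LIMIT rather than TOP. The sketch below assumes Python's built-in sqlite3 module and emulates both examples; the percentage is computed in Python because LIMIT has no PERCENT option, and the ORDER BY is added here only to make the "first" rows well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute(
    "CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Kumari", "Mounitha", "VPura", "Bangalore"),
        (2, "Kumar", "Pranav", "Yelhanka", "Bangalore"),
        (3, "Gubbi", "Sharan", "Hebbal", "Tumkur"),
        (4, "Nilsen", "Tom", "Vingvn 23", "Tumkur"),
    ],
)

# Counterpart of: SELECT TOP 2 * FROM Persons
first_two = [row[0] for row in conn.execute("SELECT P_Id FROM Persons ORDER BY P_Id LIMIT 2")]

# Counterpart of: SELECT TOP 50 PERCENT * FROM Persons
total = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
half = [
    row[0]
    for row in conn.execute("SELECT P_Id FROM Persons ORDER BY P_Id LIMIT ?", (total // 2,))
]

print(first_two)  # [1, 2]
print(half)       # [1, 2]
```

Integer division rounds down, which makes no difference for the four rows used here but may differ from how a particular system rounds a percentage.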

SQL LIKE Operator


The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

The LIKE Operator


The LIKE operator is used to search for a specified pattern in a column.

SQL LIKE Syntax

SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern

LIKE Operator Example


The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to select the persons living in a city that starts with "B" from the table above. We use the following SELECT statement:

SELECT * FROM Persons WHERE City LIKE 'B%'


The "%" sign can be used to define wildcards (missing letters in the pattern) both before and after the pattern.

The result-set will look like this:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore

Next, we want to select the persons living in a city that ends with an "r" from the "Persons" table. We use the following SELECT statement:

SELECT * FROM Persons WHERE City LIKE '%r'


The result-set will look like this:

P_Id  LastName  FirstName  Address  City
3     Gubbi     Sharan     Hebbal   Tumkur

Next, we want to select the persons living in a city that contains the pattern "mk" from the "Persons" table. We use the following SELECT statement:

SELECT * FROM Persons WHERE City LIKE '%mk%'


The result-set will look like this:

P_Id  LastName  FirstName  Address  City
3     Gubbi     Sharan     Hebbal   Tumkur

It is also possible to select the persons living in a city that does NOT contain the pattern "mk" from the "Persons" table, by using the NOT keyword. We use the following SELECT statement:

SELECT * FROM Persons WHERE City NOT LIKE '%mk%'


The result-set will look like this:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
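The four LIKE patterns above can be run side by side. The sketch assumes Python's built-in sqlite3 module and an in-memory database, and collects only the P_Id of each matching row to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute(
    "CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Kumari", "Mounitha", "VPura", "Bangalore"),
        (2, "Kumar", "Pranav", "Yelhanka", "Bangalore"),
        (3, "Gubbi", "Sharan", "Hebbal", "Tumkur"),
    ],
)


def ids_where(condition):
    """Return the P_Id values of rows matching a fixed WHERE condition."""
    sql = "SELECT P_Id FROM Persons WHERE " + condition + " ORDER BY P_Id"
    return [row[0] for row in conn.execute(sql)]


starts_with_b = ids_where("City LIKE 'B%'")         # city starts with "B"
ends_with_r = ids_where("City LIKE '%r'")           # city ends with "r"
contains_mk = ids_where("City LIKE '%mk%'")         # city contains "mk"
without_mk = ids_where("City NOT LIKE '%mk%'")      # city does not contain "mk"

print(starts_with_b, ends_with_r, contains_mk, without_mk)
# [1, 2] [3] [3] [1, 2]
```

The helper ids_where is hypothetical and only concatenates the fixed conditions used in this example; it should not be used with untrusted input.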

SQL Wildcards
SQL wildcards can be used when searching for data in a database.

SQL Wildcards

SQL wildcards can substitute for one or more characters when searching for data in a database. SQL wildcards must be used with the SQL LIKE operator. With SQL, the following wildcards can be used:

Wildcard                    Description
%                           A substitute for zero or more characters
_                           A substitute for exactly one character
[charlist]                  Any single character in charlist
[^charlist] or [!charlist]  Any single character not in charlist

SQL Wildcard Examples


We have the following "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Using the % Wildcard


Now we want to select the persons living in a city that starts with "Ba" from the "Persons" table. We use the following SELECT statement:

SELECT * FROM Persons WHERE City LIKE 'Ba%'


The result-set will look like this:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore

Using the _ Wildcard


Now we want to select the persons with a last name that starts with any character, followed by "umari" from the "Persons" table. We use the following SELECT statement:

SELECT * FROM Persons WHERE LastName LIKE '_umari'


The result-set will look like this:

P_Id  LastName  FirstName  Address  City
1     Kumari    Mounitha   VPura    Bangalore

Next, we want to select the persons with a first name that starts with "P", followed by any character, followed by "an", followed by any character, followed by "v" from the "Persons" table. We use the following SELECT statement:

SELECT * FROM Persons WHERE FirstName LIKE 'P_an_v'


The result-set will look like this:

P_Id  LastName  FirstName  Address   City
2     Kumar     Pranav     Yelhanka  Bangalore

Using the [charlist] Wildcard


Now we want to select the persons with a first name that starts with "b" or "s" or "p" from the "Persons" table. We use the following SELECT statement:

SELECT * FROM Persons WHERE FirstName LIKE '[bsp]%'


The result-set will look like this:

P_Id  LastName  FirstName  Address   City
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Next, we want to select the persons with a first name that does not start with "b" or "s" or "p" from the "Persons" table. We use the following SELECT statement:

SELECT * FROM Persons WHERE FirstName LIKE '[!bsp]%'


The result-set will look like this:

P_Id  LastName  FirstName  Address  City
1     Kumari    Mounitha   VPura    Bangalore
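The [charlist] wildcards are not available in every database system. As a sketch under that caveat, the example below uses Python's built-in sqlite3 module: SQLite's LIKE understands only "%" and "_", so the character-list cases are written with SQLite's GLOB operator instead, which is case-sensitive and uses "*" where LIKE uses "%".

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute("CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Kumari", "Mounitha"), (2, "Kumar", "Pranav"), (3, "Gubbi", "Sharan")],
)


def first_names(condition):
    """Return the first names of rows matching a fixed WHERE condition."""
    sql = "SELECT FirstName FROM Persons WHERE " + condition + " ORDER BY P_Id"
    return [row[0] for row in conn.execute(sql)]


# "_" stands for exactly one character.
underscore = first_names("FirstName LIKE 'P_an_v'")

# SQLite-specific stand-ins for LIKE '[bsp]%' and LIKE '[!bsp]%'.
in_list = first_names("FirstName GLOB '[BSP]*'")
not_in_list = first_names("FirstName GLOB '[^BSP]*'")

print(underscore)   # ['Pranav']
print(in_list)      # ['Pranav', 'Sharan']
print(not_in_list)  # ['Mounitha']
```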

SQL IN Operator
The IN Operator
The IN operator allows you to specify multiple values in a WHERE clause.

SQL IN Syntax

SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1,value2,...)

IN Operator Example
The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to select the persons with a last name equal to "Kumari" or "Gubbi" from the table above. We use the following SELECT statement:

SELECT * FROM Persons WHERE LastName IN ('Kumari','Gubbi')


The result-set will look like this:

P_Id  LastName  FirstName  Address  City
1     Kumari    Mounitha   VPura    Bangalore
3     Gubbi     Sharan     Hebbal   Tumkur

SQL BETWEEN Operator


The BETWEEN operator is used in a WHERE clause to select a range of data between two values.

The BETWEEN Operator


The BETWEEN operator selects a range of data between two values. The values can be numbers, text, or dates.

SQL BETWEEN Syntax

SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2

BETWEEN Operator Example


The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to select the persons with a last name alphabetically between "Gubbi" and "Kumar" from the table above. The lower value must be given first; with the values in the opposite order, no row would qualify. We use the following SELECT statement:

SELECT * FROM Persons WHERE LastName BETWEEN 'Gubbi' AND 'Kumar'


The result-set will look like this:

P_Id  LastName  FirstName  Address   City
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Note: In standard SQL, BETWEEN includes both test values, so the condition above is equivalent to LastName >= 'Gubbi' AND LastName <= 'Kumar', and the result above assumes this behaviour. Some references report database systems that exclude one or both of the test values. Therefore: Check how your database treats the BETWEEN operator.

Example 2
To display the persons outside the range in the previous example, use NOT BETWEEN:

SELECT * FROM Persons WHERE LastName NOT BETWEEN 'Gubbi' AND 'Kumar'


The result-set will look like this:

P_Id  LastName  FirstName  Address  City
1     Kumari    Mounitha   VPura    Bangalore
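How a given system treats BETWEEN is easy to test directly. The sketch below does so for SQLite through Python's built-in sqlite3 module (an assumption of this example); other systems may need to be checked separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute("CREATE TABLE Persons (P_Id INTEGER, LastName TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [(1, "Kumari"), (2, "Kumar"), (3, "Gubbi")],
)


def last_names(condition):
    """Return the last names of rows matching a fixed WHERE condition."""
    sql = "SELECT LastName FROM Persons WHERE " + condition + " ORDER BY LastName"
    return [row[0] for row in conn.execute(sql)]


inclusive = last_names("LastName BETWEEN 'Gubbi' AND 'Kumar'")
spelled_out = last_names("LastName >= 'Gubbi' AND LastName <= 'Kumar'")
reversed_bounds = last_names("LastName BETWEEN 'Kumar' AND 'Gubbi'")
outside = last_names("LastName NOT BETWEEN 'Gubbi' AND 'Kumar'")

print(inclusive)        # ['Gubbi', 'Kumar']  (both end values are included)
print(spelled_out)      # ['Gubbi', 'Kumar']
print(reversed_bounds)  # []  (the lower value must come first)
print(outside)          # ['Kumari']
```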

SQL Alias
With SQL, an alias name can be given to a table or to a column.

SQL Alias
You can give a table or a column another name by using an alias. This can be a good thing to do if you have very long or complex table names or column names. An alias name could be anything, but usually it is short.

SQL Alias Syntax for Tables

SELECT column_name(s)
FROM table_name AS alias_name

SQL Alias Syntax for Columns

SELECT column_name AS alias_name
FROM table_name

Alias Example
Assume we have a table called "Persons" and another table called "Product_Orders". We will give the tables the aliases "p" and "po" respectively. Now we want to list all the orders that "Mounitha Kumari" is responsible for. We use the following SELECT statement:

SELECT po.OrderID, p.LastName, p.FirstName FROM Persons AS p, Product_Orders AS po WHERE p.LastName='Kumari' AND p.FirstName='Mounitha'

The same SELECT statement without aliases:

SELECT Product_Orders.OrderID, Persons.LastName, Persons.FirstName FROM Persons, Product_Orders WHERE Persons.LastName='Kumari' AND Persons.FirstName='Mounitha'

As the two SELECT statements above show, aliases can make queries easier both to write and to read. Note that in practice the WHERE clause would also need a condition linking the two tables; without one, every row of "Product_Orders" is paired with the selected person.

SQL Joins
SQL joins are used to query data from two or more tables, based on a relationship between certain columns in these tables.

SQL JOIN
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a relationship between certain columns in these tables.

Tables in a database are often related to each other with keys. A primary key is a column (or a combination of columns) with a unique value for each row. Each primary key value must be unique within the table. The purpose is to bind data together, across tables, without repeating all of the data in every table.

Look at the "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Note that the "P_Id" column is the primary key in the "Persons" table. This means that no two rows can have the same P_Id. The P_Id distinguishes two persons even if they have the same name.

Next, we have the "Orders" table:

O_Id  OrderNo  P_Id
1     77895    3
2     44678    3
3     22456    1
4     24562    1
5     34764    15

Note that the "O_Id" column is the primary key in the "Orders" table and that the "P_Id" column refers to the persons in the "Persons" table without using their names. Notice that the relationship between the two tables above is the "P_Id" column.

Different SQL JOINs

Before we continue with examples, we will list the types of JOIN you can use, and the differences between them.

JOIN: Return rows when there is at least one match in both tables
LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
FULL JOIN: Return all rows from both tables, whether or not they have a match in the other table

SQL INNER JOIN Keyword


The INNER JOIN keyword returns rows when there is at least one match in both tables.

SQL INNER JOIN Syntax

SELECT column_name(s)
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name

PS: INNER JOIN is the same as JOIN.

SQL INNER JOIN Example


The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

The "Orders" table:

O_Id  OrderNo  P_Id
1     77895    3
2     44678    3
3     22456    1
4     24562    1
5     34764    15

Now we want to list all the persons with any orders. We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo FROM Persons INNER JOIN Orders ON Persons.P_Id=Orders.P_Id ORDER BY Persons.LastName


The result-set will look like this:

LastName  FirstName  OrderNo
Gubbi     Sharan     77895
Gubbi     Sharan     44678
Kumari    Mounitha   22456
Kumari    Mounitha   24562

The INNER JOIN keyword returns rows when there is at least one match in both tables. If there are rows in "Persons" that do not have matches in "Orders", those rows will NOT be listed.
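The INNER JOIN example can be reproduced with the sketch below, which assumes Python's built-in sqlite3 module and an in-memory database. A second sort key on OrderNo is added here only so that the order of rows with the same last name is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)")
conn.execute("CREATE TABLE Orders (O_Id INTEGER PRIMARY KEY, OrderNo INTEGER, P_Id INTEGER)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Kumari", "Mounitha"), (2, "Kumar", "Pranav"), (3, "Gubbi", "Sharan")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 24562, 1), (5, 34764, 15)],
)

inner_rows = conn.execute(
    """
    SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
    FROM Persons
    INNER JOIN Orders ON Persons.P_Id = Orders.P_Id
    ORDER BY Persons.LastName, Orders.OrderNo
    """
).fetchall()

for row in inner_rows:
    print(row)
# ('Gubbi', 'Sharan', 44678)
# ('Gubbi', 'Sharan', 77895)
# ('Kumari', 'Mounitha', 22456)
# ('Kumari', 'Mounitha', 24562)
```

Pranav Kumar (no orders) and order 34764 (P_Id 15, no such person) are both absent from the result.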

SQL LEFT JOIN Keyword


The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there are no matches in the right table (table_name2).

SQL LEFT JOIN Syntax

SELECT column_name(s)
FROM table_name1
LEFT JOIN table_name2
ON table_name1.column_name=table_name2.column_name

PS: In some databases LEFT JOIN is called LEFT OUTER JOIN.

SQL LEFT JOIN Example


The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

The "Orders" table:

O_Id  OrderNo  P_Id
1     77895    3
2     44678    3
3     22456    1
4     24562    1
5     34764    15

Now we want to list all the persons and their orders, if any, from the tables above. We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo FROM Persons LEFT JOIN Orders ON Persons.P_Id=Orders.P_Id ORDER BY Persons.LastName

The result-set will look like this:

LastName  FirstName  OrderNo
Gubbi     Sharan     77895
Gubbi     Sharan     44678
Kumar     Pranav
Kumari    Mounitha   22456
Kumari    Mounitha   24562

The LEFT JOIN keyword returns all the rows from the left table (Persons), even if there are no matches in the right table (Orders).

SQL RIGHT JOIN Keyword


The RIGHT JOIN keyword returns all rows from the right table (table_name2), even if there are no matches in the left table (table_name1).

SQL RIGHT JOIN Syntax

SELECT column_name(s)
FROM table_name1
RIGHT JOIN table_name2
ON table_name1.column_name=table_name2.column_name

PS: In some databases RIGHT JOIN is called RIGHT OUTER JOIN.

SQL RIGHT JOIN Example


The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

The "Orders" table:

O_Id  OrderNo  P_Id
1     77895    3
2     44678    3
3     22456    1
4     24562    1
5     34764    15

Now we want to list all the orders with the persons they belong to, if any, from the tables above. We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo FROM Persons RIGHT JOIN Orders ON Persons.P_Id=Orders.P_Id ORDER BY Persons.LastName

The result-set will look like this (where the row without a person appears depends on how the database sorts NULL values):

LastName  FirstName  OrderNo
Gubbi     Sharan     77895
Gubbi     Sharan     44678
Kumari    Mounitha   22456
Kumari    Mounitha   24562
                     34764

The RIGHT JOIN keyword returns all the rows from the right table (Orders), even if there are no matches in the left table (Persons).

SQL FULL JOIN Keyword


The FULL JOIN keyword returns all rows from both tables, combining them where there is a match and leaving the other side empty where there is none.

SQL FULL JOIN Syntax

SELECT column_name(s)
FROM table_name1
FULL JOIN table_name2
ON table_name1.column_name=table_name2.column_name

SQL FULL JOIN Example


The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

The "Orders" table:

O_Id  OrderNo  P_Id
1     77895    3
2     44678    3
3     22456    1
4     24562    1
5     34764    15

Now we want to list all the persons and their orders, and all the orders with their persons. We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo FROM Persons FULL JOIN Orders ON Persons.P_Id=Orders.P_Id ORDER BY Persons.LastName

The result-set will look like this (where the row without a person appears depends on how the database sorts NULL values):

LastName  FirstName  OrderNo
Gubbi     Sharan     77895
Gubbi     Sharan     44678
Kumar     Pranav
Kumari    Mounitha   22456
Kumari    Mounitha   24562
                     34764

The FULL JOIN keyword returns all the rows from the left table (Persons), and all the rows from the right table (Orders). If there are rows in "Persons" that do not have matches in "Orders", or if there are rows in "Orders" that do not have matches in "Persons", those rows will be listed as well.
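Not every database system offers RIGHT JOIN or FULL JOIN. The sketch below, which assumes Python's built-in sqlite3 module, runs a LEFT JOIN and then builds the FULL JOIN result by adding the orders that have no matching person. This emulation is one possible technique, chosen here so that the example does not depend on a recent SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT)")
conn.execute("CREATE TABLE Orders (O_Id INTEGER PRIMARY KEY, OrderNo INTEGER, P_Id INTEGER)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [(1, "Kumari"), (2, "Kumar"), (3, "Gubbi")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 24562, 1), (5, 34764, 15)],
)

# All persons, with their orders if they have any.
left_sql = """
SELECT p.LastName, o.OrderNo
FROM Persons AS p
LEFT JOIN Orders AS o ON p.P_Id = o.P_Id
"""

# FULL JOIN emulation: the LEFT JOIN rows plus the orders without a person.
full_sql = left_sql + """
UNION ALL
SELECT NULL, o.OrderNo
FROM Orders AS o
WHERE o.P_Id NOT IN (SELECT P_Id FROM Persons)
"""

left_rows = set(conn.execute(left_sql).fetchall())
full_rows = set(conn.execute(full_sql).fetchall())

print(("Kumar", None) in left_rows)  # True: a person without orders is kept
print(full_rows - left_rows)         # {(None, 34764)}: the order without a person
```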

SQL UNION Operator


The SQL UNION operator combines two or more SELECT statements.

The SQL UNION Operator


The UNION operator is used to combine the result-set of two or more SELECT statements. Notice that each SELECT statement within the UNION must have the same number of columns. The columns must also have similar data types. Also, the columns in each SELECT statement must be in the same order.

SQL UNION Syntax

SELECT column_name(s) FROM table_name1
UNION
SELECT column_name(s) FROM table_name2

Note: The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL.

SQL UNION ALL Syntax

SELECT column_name(s) FROM table_name1
UNION ALL
SELECT column_name(s) FROM table_name2

PS: The column names in the result-set of a UNION are always equal to the column names in the first SELECT statement in the UNION.

SQL UNION Example


Look at the following tables:

"Employees_India":

E_ID  E_Name
01    Kumari, Mounitha
02    Kumar, Pranav
03    Kumar, Stephen
04    Gubbi, Sharan

"Employees_USA":

E_ID  E_Name
01    Turner, Sally
02    Kent, Clark
03    Kumar, Stephen
04    Scott, Stephen

Now we want to list all the different employees in India and USA. We use the following SELECT statement:

SELECT E_Name FROM Employees_India UNION SELECT E_Name FROM Employees_USA


The result-set will look like this:

E_Name
Kumari, Mounitha
Kumar, Pranav
Kumar, Stephen
Gubbi, Sharan
Turner, Sally
Kent, Clark
Scott, Stephen

Note: This command cannot be used to list all employees in India and USA. In the example above we have two employees with equal names, and only one of them will be listed. The UNION command selects only distinct values.

SQL UNION ALL Example


Now we want to list all employees in India and USA:

SELECT E_Name FROM Employees_India UNION ALL SELECT E_Name FROM Employees_USA

Result:

E_Name
Kumari, Mounitha
Kumar, Pranav
Kumar, Stephen
Gubbi, Sharan
Turner, Sally
Kent, Clark
Kumar, Stephen
Scott, Stephen
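The difference between UNION and UNION ALL shows up in the row counts. The sketch below assumes Python's built-in sqlite3 module and an in-memory database holding the two employee tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute("CREATE TABLE Employees_India (E_ID TEXT, E_Name TEXT)")
conn.execute("CREATE TABLE Employees_USA (E_ID TEXT, E_Name TEXT)")
conn.executemany(
    "INSERT INTO Employees_India VALUES (?, ?)",
    [("01", "Kumari, Mounitha"), ("02", "Kumar, Pranav"),
     ("03", "Kumar, Stephen"), ("04", "Gubbi, Sharan")],
)
conn.executemany(
    "INSERT INTO Employees_USA VALUES (?, ?)",
    [("01", "Turner, Sally"), ("02", "Kent, Clark"),
     ("03", "Kumar, Stephen"), ("04", "Scott, Stephen")],
)

# UNION removes duplicate rows from the combined result.
distinct_names = [row[0] for row in conn.execute(
    "SELECT E_Name FROM Employees_India UNION SELECT E_Name FROM Employees_USA"
)]

# UNION ALL keeps every row, including duplicates.
all_names = [row[0] for row in conn.execute(
    "SELECT E_Name FROM Employees_India UNION ALL SELECT E_Name FROM Employees_USA"
)]

print(len(distinct_names))                # 7
print(len(all_names))                     # 8
print(all_names.count("Kumar, Stephen"))  # 2
```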

SQL SELECT INTO Statement


The SQL SELECT INTO statement can be used to create backup copies of tables.

The SQL SELECT INTO Statement


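The SELECT INTO statement selects data from one table and inserts it into a different table. The SELECT INTO statement is most often used to create backup copies of tables. Note that this form is not available in every database system; MySQL, for example, uses a CREATE TABLE statement followed by a SELECT for the same purpose.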

SQL SELECT INTO Syntax


We can select all columns into the new table:

SELECT * INTO new_table_name [IN externaldatabase] FROM old_tablename


Or we can select only the columns we want into the new table:

SELECT column_name(s) INTO new_table_name [IN externaldatabase] FROM old_tablename

SQL SELECT INTO Example


Make a Backup Copy - Now we want to make an exact copy of the data in our "Persons" table. We use the following SQL statement:

SELECT * INTO Persons_Backup FROM Persons


We can also use the IN clause to copy the table into another database:

SELECT * INTO Persons_Backup IN 'Backup.mdb' FROM Persons


We can also copy only a few fields into the new table:

SELECT LastName,FirstName INTO Persons_Backup FROM Persons

SQL SELECT INTO - With a WHERE Clause


We can also add a WHERE clause. The following SQL statement creates a "Persons_Backup" table with only the persons who live in the city "Bangalore":

SELECT LastName,FirstName INTO Persons_Backup FROM Persons WHERE City='Bangalore'

SQL SELECT INTO - Joined Tables


Selecting data from more than one table is also possible. The following example creates a "Persons_Order_Backup" table that contains data from the two tables "Persons" and "Orders":

SELECT Persons.LastName,Orders.OrderNo INTO Persons_Order_Backup FROM Persons INNER JOIN Orders ON Persons.P_Id=Orders.P_Id
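SQLite does not accept SELECT ... INTO for creating a table; its counterpart is CREATE TABLE ... AS SELECT. The sketch below, which assumes Python's built-in sqlite3 module, uses that form to make the filtered backup copy described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute(
    "CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Kumari", "Mounitha", "VPura", "Bangalore"),
        (2, "Kumar", "Pranav", "Yelhanka", "Bangalore"),
        (3, "Gubbi", "Sharan", "Hebbal", "Tumkur"),
    ],
)

# SQLite counterpart of:
#   SELECT LastName,FirstName INTO Persons_Backup FROM Persons WHERE City='Bangalore'
conn.execute(
    """
    CREATE TABLE Persons_Backup AS
    SELECT LastName, FirstName FROM Persons WHERE City = 'Bangalore'
    """
)

backup = conn.execute(
    "SELECT LastName, FirstName FROM Persons_Backup ORDER BY LastName"
).fetchall()
print(backup)  # [('Kumar', 'Pranav'), ('Kumari', 'Mounitha')]
```

The copy is a snapshot: later changes to "Persons" do not appear in "Persons_Backup".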

SQL CREATE DATABASE Statement


The CREATE DATABASE Statement
The CREATE DATABASE statement is used to create a database.

SQL CREATE DATABASE Syntax

CREATE DATABASE database_name

CREATE DATABASE Example


Now we want to create a database called "my_db". We use the following CREATE DATABASE statement:

CREATE DATABASE my_db


Database tables can be added with the CREATE TABLE statement.

SQL CREATE TABLE Statement


The CREATE TABLE Statement
The CREATE TABLE statement is used to create a table in a database.

SQL CREATE TABLE Syntax

CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
column_name3 data_type,
....
)

The data type specifies what type of data the column can hold. The available data types differ between database systems such as MS Access, MySQL, and SQL Server.

CREATE TABLE Example


Now we want to create a table called "Persons" that contains five columns: P_Id, LastName, FirstName, Address, and City. We use the following CREATE TABLE statement:

CREATE TABLE Persons
(
P_Id int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)

The P_Id column is of type int and will hold a number. The LastName, FirstName, Address, and City columns are of type varchar with a maximum length of 255 characters.

The empty "Persons" table will now look like this:

P_Id  LastName  FirstName  Address  City

The empty table can be filled with data with the INSERT INTO statement.

SQL Constraints
Constraints are used to limit the type of data that can go into a table. Constraints can be specified when a table is created (with the CREATE TABLE statement) or after the table is created (with the ALTER TABLE statement). We will focus on the following constraints:

NOT NULL
UNIQUE
PRIMARY KEY
FOREIGN KEY
CHECK
DEFAULT

The following sections describe these constraints in detail.

SQL NOT NULL Constraint


By default, a table column can hold NULL values.

SQL NOT NULL Constraint


The NOT NULL constraint forces a column to NOT accept NULL values, so the field must always contain a value. This means that you cannot insert a new record, or update a record, without adding a value to this field.

The following SQL forces the "P_Id" column and the "LastName" column to not accept NULL values:

CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
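What happens when the constraint is violated can be seen in the sketch below. It assumes Python's built-in sqlite3 module, where a violated constraint is reported as the exception sqlite3.IntegrityError; other systems report the error in their own way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute(
    """
    CREATE TABLE Persons
    (
    P_Id int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255)
    )
    """
)

# A row that supplies both NOT NULL columns is accepted.
conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (?, ?)", (1, "Kumari"))

# A row without LastName is rejected.
try:
    conn.execute("INSERT INTO Persons (P_Id, FirstName) VALUES (?, ?)", (2, "Pranav"))
    rejected = False
except sqlite3.IntegrityError as error:
    rejected = True
    print("Rejected:", error)

row_count = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
print(rejected, row_count)  # True 1
```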

SQL UNIQUE Constraint


The UNIQUE constraint uniquely identifies each record in a database table. The UNIQUE and PRIMARY KEY constraints both guarantee uniqueness for a column or set of columns. A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it. Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table.

SQL UNIQUE Constraint on CREATE TABLE


The following SQL creates a UNIQUE constraint on the "P_Id" column when the "Persons" table is created:
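A minimal sketch of such a definition is given below, run through Python's built-in sqlite3 module. Writing UNIQUE directly after the column type is an assumption modelled on the NOT NULL example; some database systems expect a separate UNIQUE (P_Id) line instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute(
    """
    CREATE TABLE Persons
    (
    P_Id int NOT NULL UNIQUE,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255)
    )
    """
)

conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (?, ?)", (1, "Kumari"))

# A second row with the same P_Id violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO Persons (P_Id, LastName) VALUES (?, ?)", (1, "Kumar"))
    duplicate_rejected = False
except sqlite3.IntegrityError as error:
    duplicate_rejected = True
    print("Rejected:", error)

stored = [row[0] for row in conn.execute("SELECT LastName FROM Persons")]
print(duplicate_rejected, stored)  # True ['Kumari']
```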

SQL PRIMARY KEY Constraint


The PRIMARY KEY constraint uniquely identifies each record in a database table. Primary keys must contain unique values. A primary key column cannot contain NULL values. Each table should have a primary key, and each table can have only one primary key.

SQL PRIMARY KEY Constraint on CREATE TABLE


The following SQL creates a PRIMARY KEY on the "P_Id" column when the "Persons" table is created:

CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
PRIMARY KEY (P_Id)
)

To name a PRIMARY KEY constraint, or to define a PRIMARY KEY constraint on multiple columns, use a CONSTRAINT clause in place of the PRIMARY KEY line, for example CONSTRAINT pk_PersonID PRIMARY KEY (P_Id, LastName).

To DROP a PRIMARY KEY Constraint


To drop a PRIMARY KEY constraint, use the following SQL:

MySQL:

ALTER TABLE Persons DROP PRIMARY KEY


SQL Server / Oracle / MS Access:

ALTER TABLE Persons DROP CONSTRAINT pk_PersonID

SQL FOREIGN KEY Constraint


A FOREIGN KEY in one table points to a PRIMARY KEY in another table. Let's illustrate the foreign key with an example. Look at the following two tables:

The "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

The "Orders" table:

O_Id  OrderNo  P_Id
1     77895    3
2     44678    3
3     22456    2
4     24562    1

Note that the "P_Id" column in the "Orders" table points to the "P_Id" column in the "Persons" table. The "P_Id" column in the "Persons" table is the PRIMARY KEY in the "Persons" table. The "P_Id" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.

The FOREIGN KEY constraint is used to prevent actions that would destroy the link between tables. The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column, because the value has to be one of the values contained in the table it points to.

SQL FOREIGN KEY Constraint on CREATE TABLE


The following SQL creates a FOREIGN KEY on the "P_Id" column when the "Orders" table is created:

MySQL:

CREATE TABLE Orders
(
O_Id int NOT NULL,
OrderNo int NOT NULL,
P_Id int,
PRIMARY KEY (O_Id),
FOREIGN KEY (P_Id) REFERENCES Persons(P_Id)
)
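The effect of the foreign key can be tried out with the sketch below, which assumes Python's built-in sqlite3 module. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection; that statement is specific to SQLite and is not needed in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # assumed test database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement

conn.execute(
    "CREATE TABLE Persons (P_Id int NOT NULL, LastName varchar(255) NOT NULL, PRIMARY KEY (P_Id))"
)
conn.execute(
    """
    CREATE TABLE Orders
    (
    O_Id int NOT NULL,
    OrderNo int NOT NULL,
    P_Id int,
    PRIMARY KEY (O_Id),
    FOREIGN KEY (P_Id) REFERENCES Persons(P_Id)
    )
    """
)

conn.execute("INSERT INTO Persons VALUES (?, ?)", (3, "Gubbi"))

# P_Id 3 exists in Persons, so this order is accepted.
conn.execute("INSERT INTO Orders VALUES (?, ?, ?)", (1, 77895, 3))

# P_Id 15 does not exist in Persons, so this order is rejected.
try:
    conn.execute("INSERT INTO Orders VALUES (?, ?, ?)", (5, 34764, 15))
    invalid_rejected = False
except sqlite3.IntegrityError as error:
    invalid_rejected = True
    print("Rejected:", error)

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(invalid_rejected, order_count)  # True 1
```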
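To name a FOREIGN KEY constraint, or to define a FOREIGN KEY constraint on multiple columns, use a CONSTRAINT clause in place of the FOREIGN KEY line, for example CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id) REFERENCES Persons(P_Id).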

To DROP a FOREIGN KEY Constraint


To drop a FOREIGN KEY constraint, use the following SQL:

MySQL:

ALTER TABLE Orders DROP FOREIGN KEY fk_PerOrders


SQL Server / Oracle / MS Access:

ALTER TABLE Orders DROP CONSTRAINT fk_PerOrders

The TRUNCATE TABLE Statement


What if we only want to delete the data inside the table, and not the table itself? Then, use the TRUNCATE TABLE statement:

TRUNCATE TABLE table_name

SQL ALTER TABLE Statement


The ALTER TABLE Statement
The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.

SQL ALTER TABLE Syntax


To add a column in a table, use the following syntax:

ALTER TABLE table_name ADD column_name datatype


To delete a column in a table, use the following syntax (note that some database systems don't allow deleting a column):

ALTER TABLE table_name DROP COLUMN column_name


To change the data type of a column in a table, use the following syntax:

ALTER TABLE table_name ALTER COLUMN column_name datatype

SQL ALTER TABLE Example


Look at the "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to add a column named "DateOfBirth" in the "Persons" table. We use the following SQL statement:

ALTER TABLE Persons ADD DateOfBirth date


Notice that the new column, "DateOfBirth", is of type date and is going to hold a date. The data type specifies what type of data the column can hold.

The "Persons" table will now look like this:

P_Id  LastName  FirstName  Address   City       DateOfBirth
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Change Data Type Example


Now we want to change the data type of the column named "DateOfBirth" in the "Persons" table. We use the following SQL statement:

ALTER TABLE Persons ALTER COLUMN DateOfBirth year


Notice that the "DateOfBirth" column is now of type year and is going to hold a year in a two-digit or four-digit format.

DROP COLUMN Example


Next, we want to delete the column named "DateOfBirth" in the "Persons" table. We use the following SQL statement:

ALTER TABLE Persons DROP COLUMN DateOfBirth


The "Persons" table will now look like this:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

SQL Views
A view is a virtual table. This chapter shows how to create, update, and delete a view.

SQL CREATE VIEW Statement


In SQL, a view is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.

You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were coming from one single table.

SQL CREATE VIEW Syntax

CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

Note: A view always shows up-to-date data! The database engine recreates the data, using the view's SQL statement, every time a user queries a view.
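That a view reflects later changes to its underlying table can be checked with the sketch below. It assumes Python's built-in sqlite3 module; the view name Bangalore_Persons and the inserted fourth person are hypothetical and used only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumed test database
conn.execute("CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Kumari", "Bangalore"), (2, "Kumar", "Bangalore"), (3, "Gubbi", "Tumkur")],
)

# Hypothetical view: only the persons living in Bangalore.
conn.execute(
    """
    CREATE VIEW Bangalore_Persons AS
    SELECT P_Id, LastName
    FROM Persons
    WHERE City = 'Bangalore'
    """
)

before = [row[0] for row in conn.execute("SELECT P_Id FROM Bangalore_Persons ORDER BY P_Id")]

# Change the underlying table; the view is not touched.
conn.execute("INSERT INTO Persons VALUES (?, ?, ?)", (4, "Nilsen", "Bangalore"))

after = [row[0] for row in conn.execute("SELECT P_Id FROM Bangalore_Persons ORDER BY P_Id")]

print(before)  # [1, 2]
print(after)   # [1, 2, 4]
```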

SQL CREATE VIEW Examples


If you have the Northwind database you can see that it has several views installed by default. The view "Current Product List" lists all active products (products that are not discontinued) from the "Products" table. The view is created with the following SQL:

CREATE VIEW [Current Product List] AS SELECT ProductID,ProductName FROM Products WHERE Discontinued=No
We can query the view above as follows:

SELECT * FROM [Current Product List]


Another view in the Northwind sample database selects every product in the "Products" table with a unit price higher than the average unit price:

CREATE VIEW [Products Above Average Price] AS SELECT ProductName,UnitPrice FROM Products WHERE UnitPrice>(SELECT AVG(UnitPrice) FROM Products)
We can query the view above as follows:

SELECT * FROM [Products Above Average Price]


Another view in the Northwind database calculates the total sale for each category in 1997. Note that this view selects its data from another view called "Product Sales for 1997":

CREATE VIEW [Category Sales For 1997] AS SELECT DISTINCT CategoryName,Sum(ProductSales) AS CategorySales FROM [Product Sales for 1997] GROUP BY CategoryName
We can query the view above as follows:

SELECT * FROM [Category Sales For 1997]

We can also add a condition to the query. Now we want to see the total sale only for the category "Beverages":

SELECT * FROM [Category Sales For 1997] WHERE CategoryName='Beverages'

SQL Updating a View


You can update a view by using the following syntax:

SQL CREATE OR REPLACE VIEW Syntax

CREATE OR REPLACE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

Now we want to add the "Category" column to the "Current Product List" view. We will update the view with the following SQL:

CREATE OR REPLACE VIEW [Current Product List] AS SELECT ProductID,ProductName,Category FROM Products WHERE Discontinued=No

SQL Dropping a View


You can delete a view with the DROP VIEW command.

SQL DROP VIEW Syntax

DROP VIEW view_name

SQL Date Functions


SQL Dates
The most difficult part of working with dates is making sure that the format of the date you are trying to insert matches the format of the date column in the database. As long as your data contains only the date portion, your queries will work as expected. However, if a time portion is involved, it gets complicated.

SQL has many built-in functions for performing calculations on data.

SQL Aggregate Functions


SQL aggregate functions return a single value, calculated from values in a column. Useful aggregate functions:

AVG() - Returns the average value
COUNT() - Returns the number of rows
FIRST() - Returns the first value
LAST() - Returns the last value
MAX() - Returns the largest value
MIN() - Returns the smallest value
SUM() - Returns the sum

SQL Scalar functions


SQL scalar functions return a single value, based on the input value. Useful scalar functions:

UCASE() - Converts a field to upper case
LCASE() - Converts a field to lower case
MID() - Extracts characters from a text field
LEN() - Returns the length of a text field
ROUND() - Rounds a numeric field to the number of decimals specified
NOW() - Returns the current system date and time
FORMAT() - Formats how a field is to be displayed

Tip: The aggregate functions and the scalar functions are explained in detail in the following sections.

SQL AVG() Function


The AVG() Function
The AVG() function returns the average value of a numeric column.

SQL AVG() Syntax

SELECT AVG(column_name) FROM table_name

SQL AVG() Example

We have the following "Orders" table:

O_Id  OrderDate   OrderPrice  Customer
1     2008/11/12  1000        Kumari
2     2008/10/23  1600        Nilsen
3     2008/09/02  700         Kumari
4     2008/09/03  300         Kumari
5     2008/08/30  2000        Jensen
6     2008/10/04  100         Nilsen

Now we want to find the average value of the "OrderPrice" fields. We use the following SQL statement:

SELECT AVG(OrderPrice) AS OrderAverage FROM Orders

The result-set will look like this:

OrderAverage
950

Now we want to find the customers that have an OrderPrice value higher than the average OrderPrice value. We use the following SQL statement:

SELECT Customer FROM Orders WHERE OrderPrice>(SELECT AVG(OrderPrice) FROM Orders)

The result-set will look like this:

Customer
Kumari
Nilsen
Jensen

SQL COUNT() Function


The COUNT() function returns the number of rows that match the specified criteria.

SQL COUNT(column_name) Syntax

The COUNT(column_name) function returns the number of values (NULL values will not be counted) of the specified column:

SELECT COUNT(column_name) FROM table_name

SQL COUNT(*) Syntax

The COUNT(*) function returns the number of records in a table:

SELECT COUNT(*) FROM table_name

SQL COUNT(DISTINCT column_name) Syntax

The COUNT(DISTINCT column_name) function returns the number of distinct values of the specified column:

SELECT COUNT(DISTINCT column_name) FROM table_name


Note: COUNT(DISTINCT) works with ORACLE and Microsoft SQL Server, but not with Microsoft Access.

SQL COUNT(column_name) Example


We have the following "Orders" table:

O_Id  OrderDate   OrderPrice  Customer
1     2008/11/12  1000        Kumari
2     2008/10/23  1600        Nilsen
3     2008/09/02  700         Kumari
4     2008/09/03  300         Kumari
5     2008/08/30  2000        Jensen
6     2008/10/04  100         Nilsen

Now we want to count the number of orders from "Customer Nilsen". We use the following SQL statement:

SELECT COUNT(Customer) AS CustomerNilsen FROM Orders WHERE Customer='Nilsen'

The result of the SQL statement above will be 2, because the customer Nilsen has made 2 orders in total:

CustomerNilsen
2

SQL COUNT(*) Example

If we omit the WHERE clause, like this:

SELECT COUNT(*) AS NumberOfOrders FROM Orders

The result-set will look like this:

NumberOfOrders
6

which is the total number of rows in the table.

SQL COUNT(DISTINCT column_name) Example

Now we want to count the number of unique customers in the "Orders" table. We use the following SQL statement:

SELECT COUNT(DISTINCT Customer) AS NumberOfCustomers FROM Orders

The result-set will look like this:

NumberOfCustomers
3

which is the number of unique customers (Kumari, Nilsen, and Jensen) in the "Orders" table.

SQL MAX() Function


The MAX() Function
The MAX() function returns the largest value of the selected column.

SQL MAX() Syntax

SELECT MAX(column_name) FROM table_name

SQL MAX() Example

We have the following "Orders" table:

O_Id  OrderDate   OrderPrice  Customer
1     2008/11/12  1000        Kumari
2     2008/10/23  1600        Nilsen
3     2008/09/02  700         Kumari
4     2008/09/03  300         Kumari
5     2008/08/30  2000        Jensen
6     2008/10/04  100         Nilsen

Now we want to find the largest value of the "OrderPrice" column. We use the following SQL statement:

SELECT MAX(OrderPrice) AS LargestOrderPrice FROM Orders

The result-set will look like this:

LargestOrderPrice
2000

SQL MIN() Function


The MIN() Function
The MIN() function returns the smallest value of the selected column.

SQL MIN() Syntax

SELECT MIN(column_name) FROM table_name

SQL MIN() Example

We have the following "Orders" table:

O_Id  OrderDate   OrderPrice  Customer
1     2008/11/12  1000        Kumari
2     2008/10/23  1600        Nilsen
3     2008/09/02  700         Kumari
4     2008/09/03  300         Kumari
5     2008/08/30  2000        Jensen
6     2008/10/04  100         Nilsen

Now we want to find the smallest value of the "OrderPrice" column. We use the following SQL statement:

SELECT MIN(OrderPrice) AS SmallestOrderPrice FROM Orders

The result-set will look like this:

SmallestOrderPrice
100

SQL SUM() Function


The SUM() Function
The SUM() function returns the total sum of a numeric column.

SQL SUM() Syntax

SELECT SUM(column_name) FROM table_name

SQL SUM() Example

We have the following "Orders" table:

O_Id  OrderDate   OrderPrice  Customer
1     2008/11/12  1000        Kumari
2     2008/10/23  1600        Nilsen
3     2008/09/02  700         Kumari
4     2008/09/03  300         Kumari
5     2008/08/30  2000        Jensen
6     2008/10/04  100         Nilsen

Now we want to find the sum of all "OrderPrice" fields. We use the following SQL statement:

SELECT SUM(OrderPrice) AS OrderTotal FROM Orders

The result-set will look like this:

OrderTotal
5700
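The aggregate results quoted in the sections above can be reproduced in one query. This is a minimal sketch, assuming Python's built-in sqlite3 module as the database engine (the tutorial itself targets MS Access, MySQL and SQL Server); the "Orders" rows are the ones from the text, and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (O_Id INTEGER, OrderDate TEXT, OrderPrice INTEGER, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [
        (1, "2008/11/12", 1000, "Kumari"),
        (2, "2008/10/23", 1600, "Nilsen"),
        (3, "2008/09/02", 700, "Kumari"),
        (4, "2008/09/03", 300, "Kumari"),
        (5, "2008/08/30", 2000, "Jensen"),
        (6, "2008/10/04", 100, "Nilsen"),
    ],
)

# All aggregates in one pass over the table; each returns a single value.
summary = conn.execute(
    "SELECT AVG(OrderPrice), COUNT(*), COUNT(DISTINCT Customer), "
    "MAX(OrderPrice), MIN(OrderPrice), SUM(OrderPrice) FROM Orders"
).fetchone()

# Customers with at least one order above the average price (subquery in WHERE).
above_average = conn.execute(
    "SELECT Customer FROM Orders "
    "WHERE OrderPrice > (SELECT AVG(OrderPrice) FROM Orders) ORDER BY O_Id"
).fetchall()

print(summary)
print(above_average)
```

In SQLite, AVG() returns a floating-point value, so the average is reported as 950.0 rather than 950.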

SQL GROUP BY Statement


Aggregate functions often need an added GROUP BY statement.

The GROUP BY Statement


The GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.

SQL GROUP BY Syntax

SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name

SQL GROUP BY Example

We have the following "Orders" table:

O_Id  OrderDate   OrderPrice  Customer
1     2008/11/12  1000        Kumari
2     2008/10/23  1600        Nilsen
3     2008/09/02  700         Kumari
4     2008/09/03  300         Kumari
5     2008/08/30  2000        Jensen
6     2008/10/04  100         Nilsen

Now we want to find the total sum (total order) of each customer. We will have to use the GROUP BY statement to group the customers. We use the following SQL statement:

SELECT Customer,SUM(OrderPrice) FROM Orders GROUP BY Customer

The result-set will look like this:

Customer  SUM(OrderPrice)
Kumari    2000
Nilsen    1700
Jensen    2000

Let's see what happens if we omit the GROUP BY statement:

SELECT Customer,SUM(OrderPrice) FROM Orders

This statement does not give the result we wanted. The SELECT statement has two columns specified (Customer and SUM(OrderPrice)). SUM(OrderPrice) returns a single value (the total sum of the "OrderPrice" column, 5700), while "Customer" has 6 values (one value for each row in the "Orders" table). Most database systems therefore reject the statement with an error; systems that do accept it return the overall total 5700 next to a Customer value, which says nothing about each customer's own total. The GROUP BY statement solves this problem.

GROUP BY More Than One Column


We can also use the GROUP BY statement on more than one column, like this:

SELECT Customer,OrderDate,SUM(OrderPrice) FROM Orders GROUP BY Customer,OrderDate

SQL HAVING Clause


The HAVING Clause
The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions.

SQL HAVING Syntax

SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value

SQL HAVING Example

We have the following "Orders" table:

O_Id  OrderDate   OrderPrice  Customer
1     2008/11/12  1000        Kumari
2     2008/10/23  1600        Nilsen
3     2008/09/02  700         Kumari
4     2008/09/03  300         Kumari
5     2008/08/30  2000        Jensen
6     2008/10/04  100         Nilsen

Now we want to find if any of the customers have a total order of less than 2000. We use the following SQL statement:

SELECT Customer,SUM(OrderPrice) FROM Orders GROUP BY Customer HAVING SUM(OrderPrice)<2000

The result-set will look like this:

Customer  SUM(OrderPrice)
Nilsen    1700

Now we want to find if the customers "Kumari" or "Jensen" have a total order of more than 1500. We add an ordinary WHERE clause to the SQL statement:

SELECT Customer,SUM(OrderPrice) FROM Orders WHERE Customer='Kumari' OR Customer='Jensen' GROUP BY Customer HAVING SUM(OrderPrice)>1500

The result-set will look like this:

Customer  SUM(OrderPrice)
Kumari    2000
Jensen    2000
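The difference between WHERE (filters rows before grouping) and HAVING (filters groups after aggregation) can be seen by running the three statements above. A minimal sketch, again assuming Python's built-in sqlite3 module and the "Orders" rows from the text; the ORDER BY clause is added here only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (O_Id INTEGER, OrderDate TEXT, OrderPrice INTEGER, Customer TEXT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [
        (1, "2008/11/12", 1000, "Kumari"),
        (2, "2008/10/23", 1600, "Nilsen"),
        (3, "2008/09/02", 700, "Kumari"),
        (4, "2008/09/03", 300, "Kumari"),
        (5, "2008/08/30", 2000, "Jensen"),
        (6, "2008/10/04", 100, "Nilsen"),
    ],
)

# One output row per customer: GROUP BY collapses each customer's orders.
totals = dict(
    conn.execute("SELECT Customer, SUM(OrderPrice) FROM Orders GROUP BY Customer")
)

# HAVING filters on the aggregated value, which WHERE cannot see.
below_2000 = conn.execute(
    "SELECT Customer, SUM(OrderPrice) FROM Orders "
    "GROUP BY Customer HAVING SUM(OrderPrice) < 2000"
).fetchall()

# WHERE removes rows first, then the remaining groups are filtered by HAVING.
selected = conn.execute(
    "SELECT Customer, SUM(OrderPrice) FROM Orders "
    "WHERE Customer = 'Kumari' OR Customer = 'Jensen' "
    "GROUP BY Customer HAVING SUM(OrderPrice) > 1500 ORDER BY Customer"
).fetchall()

print(totals)
print(below_2000)
print(selected)
```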

SQL UCASE() Function


The UCASE() Function
The UCASE() function converts the value of a field to uppercase.

SQL UCASE() Syntax

SELECT UCASE(column_name) FROM table_name

SQL UCASE() Example

We have the following "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to select the content of the "LastName" and "FirstName" columns above, and convert the "LastName" column to uppercase. We use the following SELECT statement:

SELECT UCASE(LastName) as LastName,FirstName FROM Persons

The result-set will look like this:

LastName  FirstName
KUMARI    Mounitha
KUMAR     Pranav
GUBBI     Sharan

SQL LCASE() Function


The LCASE() Function
The LCASE() function converts the value of a field to lowercase.

SQL LCASE() Syntax

SELECT LCASE(column_name) FROM table_name

SQL LCASE() Example

We have the following "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to select the content of the "LastName" and "FirstName" columns above, and convert the "LastName" column to lowercase. We use the following SELECT statement:

SELECT LCASE(LastName) as LastName,FirstName FROM Persons

The result-set will look like this:

LastName  FirstName
kumari    Mounitha
kumar     Pranav
gubbi     Sharan

SQL MID() Function


The MID() Function
The MID() function is used to extract characters from a text field.

SQL MID() Syntax

SELECT MID(column_name,start[,length]) FROM table_name

Parameter    Description
column_name  Required. The field to extract characters from.
start        Required. Specifies the starting position (starts at 1).
length       Optional. The number of characters to return. If omitted, the MID() function returns the rest of the text.

SQL MID() Example

We have the following "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to extract the first four characters of the "City" column above. We use the following SELECT statement:

SELECT MID(City,1,4) as SmallCity FROM Persons

The result-set will look like this:

SmallCity
Bang
Bang
Tumk

SQL LEN() Function


The LEN() Function
The LEN() function returns the length of the value in a text field.

SQL LEN() Syntax

SELECT LEN(column_name) FROM table_name

SQL LEN() Example

We have the following "Persons" table:

P_Id  LastName  FirstName  Address   City
1     Kumari    Mounitha   VPura     Bangalore
2     Kumar     Pranav     Yelhanka  Bangalore
3     Gubbi     Sharan     Hebbal    Tumkur

Now we want to select the length of the values in the "Address" column above. We use the following SELECT statement:

SELECT LEN(Address) as LengthOfAddress FROM Persons

The result-set will look like this:

LengthOfAddress
5
8
6

SQL ROUND() Function


The ROUND() Function

The ROUND() function is used to round a numeric field to the number of decimals specified.

SQL ROUND() Syntax

SELECT ROUND(column_name,decimals) FROM table_name

Parameter    Description
column_name  Required. The field to round.
decimals     Required. Specifies the number of decimals to be returned.

SQL ROUND() Example

We have the following "Products" table:

Prod_Id  ProductName  Unit    UnitPrice
1        Jarlsberg    1000 g  10.45
2        Mascarpone   1000 g  32.56
3        Gorgonzola   1000 g  15.67

Now we want to display the product name and the price rounded to the nearest integer. We use the following SELECT statement:

SELECT ProductName, ROUND(UnitPrice,0) as UnitPrice FROM Products

The result-set will look like this:

ProductName  UnitPrice
Jarlsberg    10
Mascarpone   33
Gorgonzola   16
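The scalar function names used in this tutorial (UCASE, LCASE, MID, LEN) are not available under those names in every database system. As a hedged illustration only, the sketch below uses Python's built-in sqlite3 module, where the corresponding functions are UPPER, LOWER, SUBSTR and LENGTH; the mapping is an assumption about SQLite, not something the tutorial states, and the "Persons" rows are the ones from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, FirstName TEXT, "
    "Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Kumari", "Mounitha", "VPura", "Bangalore"),
        (2, "Kumar", "Pranav", "Yelhanka", "Bangalore"),
        (3, "Gubbi", "Sharan", "Hebbal", "Tumkur"),
    ],
)

# SQLite spellings of UCASE, LCASE, MID and LEN, applied row by row.
scalar_rows = conn.execute(
    "SELECT UPPER(LastName), LOWER(LastName), SUBSTR(City, 1, 4), LENGTH(Address) "
    "FROM Persons ORDER BY P_Id"
).fetchall()

# ROUND(value, 0) rounds to the nearest integer; SQLite returns a float.
rounded = conn.execute(
    "SELECT ROUND(10.45, 0), ROUND(32.56, 0), ROUND(15.67, 0)"
).fetchone()

print(scalar_rows)
print(rounded)
```

Unlike the aggregate functions, each scalar function produces one output value per input row, which is why the first query returns three rows.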

SQL NOW() Function


The NOW() Function
The NOW() function returns the current system date and time.

SQL NOW() Syntax

SELECT NOW() FROM table_name

SQL NOW() Example

We have the following "Products" table:

Prod_Id  ProductName  Unit    UnitPrice
1        Jarlsberg    1000 g  10.45
2        Mascarpone   1000 g  32.56
3        Gorgonzola   1000 g  15.67

Now we want to display the products and prices per today's date. We use the following SELECT statement:

SELECT ProductName, UnitPrice, Now() as PerDate FROM Products

The result-set will look like this:

ProductName  UnitPrice  PerDate
Jarlsberg    10.45      30/09/2012
Mascarpone   32.56      30/09/2012
Gorgonzola   15.67      30/09/2012

SQL FORMAT() Function


The FORMAT() Function
The FORMAT() function is used to format how a field is to be displayed.

SQL FORMAT() Syntax

SELECT FORMAT(column_name,format) FROM table_name

Parameter    Description
column_name  Required. The field to be formatted.
format       Required. Specifies the format.

SQL FORMAT() Example

We have the following "Products" table:

Prod_Id  ProductName  Unit    UnitPrice
1        Jarlsberg    1000 g  10.45
2        Mascarpone   1000 g  32.56
3        Gorgonzola   1000 g  15.67

Now we want to display the products and prices per today's date (with today's date displayed in the following format "YYYY-MM-DD"). We use the following SELECT statement:

SELECT ProductName, UnitPrice, FORMAT(Now(),'YYYY-MM-DD') as PerDate FROM Products

The result-set will look like this:

ProductName  UnitPrice  PerDate
Jarlsberg    10.45      2012-09-30
Mascarpone   32.56      2012-09-30
Gorgonzola   15.67      2012-09-30

SQL Quick Reference


Each SQL statement is listed below, followed by its syntax.

AND / OR
  SELECT column_name(s) FROM table_name WHERE condition AND|OR condition

ALTER TABLE
  ALTER TABLE table_name ADD column_name datatype
  or
  ALTER TABLE table_name DROP COLUMN column_name

AS (alias)
  SELECT column_name AS column_alias FROM table_name
  or
  SELECT column_name FROM table_name AS table_alias

BETWEEN
  SELECT column_name(s) FROM table_name WHERE column_name BETWEEN value1 AND value2

CREATE DATABASE
  CREATE DATABASE database_name

CREATE TABLE
  CREATE TABLE table_name ( column_name1 data_type, column_name2 data_type, column_name3 data_type, ... )

CREATE INDEX
  CREATE INDEX index_name ON table_name (column_name)
  or
  CREATE UNIQUE INDEX index_name ON table_name (column_name)

CREATE VIEW
  CREATE VIEW view_name AS SELECT column_name(s) FROM table_name WHERE condition

DELETE
  DELETE FROM table_name WHERE some_column=some_value
  or
  DELETE FROM table_name (Note: Deletes all rows in the table!)
  DELETE * FROM table_name (Note: Deletes all rows in the table!)

DROP DATABASE
  DROP DATABASE database_name

DROP INDEX
  DROP INDEX table_name.index_name (SQL Server)
  DROP INDEX index_name ON table_name (MS Access)
  DROP INDEX index_name (DB2/Oracle)
  ALTER TABLE table_name DROP INDEX index_name (MySQL)

DROP TABLE
  DROP TABLE table_name

GROUP BY
  SELECT column_name, aggregate_function(column_name) FROM table_name WHERE column_name operator value GROUP BY column_name

HAVING
  SELECT column_name, aggregate_function(column_name) FROM table_name WHERE column_name operator value GROUP BY column_name HAVING aggregate_function(column_name) operator value

IN
  SELECT column_name(s) FROM table_name WHERE column_name IN (value1,value2,..)

INSERT INTO
  INSERT INTO table_name VALUES (value1, value2, value3,....)
  or
  INSERT INTO table_name (column1, column2, column3,...) VALUES (value1, value2, value3,....)

INNER JOIN
  SELECT column_name(s) FROM table_name1 INNER JOIN table_name2 ON table_name1.column_name=table_name2.column_name

LEFT JOIN
  SELECT column_name(s) FROM table_name1 LEFT JOIN table_name2 ON table_name1.column_name=table_name2.column_name

RIGHT JOIN
  SELECT column_name(s) FROM table_name1 RIGHT JOIN table_name2 ON table_name1.column_name=table_name2.column_name

FULL JOIN
  SELECT column_name(s) FROM table_name1 FULL JOIN table_name2 ON table_name1.column_name=table_name2.column_name

LIKE
  SELECT column_name(s) FROM table_name WHERE column_name LIKE pattern

ORDER BY
  SELECT column_name(s) FROM table_name ORDER BY column_name [ASC|DESC]

SELECT
  SELECT column_name(s) FROM table_name

SELECT *
  SELECT * FROM table_name

SELECT DISTINCT
  SELECT DISTINCT column_name(s) FROM table_name

SELECT INTO
  SELECT * INTO new_table_name [IN externaldatabase] FROM old_table_name
  or
  SELECT column_name(s) INTO new_table_name [IN externaldatabase] FROM old_table_name

SELECT TOP
  SELECT TOP number|percent column_name(s) FROM table_name

TRUNCATE TABLE
  TRUNCATE TABLE table_name

UNION
  SELECT column_name(s) FROM table_name1 UNION SELECT column_name(s) FROM table_name2

UNION ALL
  SELECT column_name(s) FROM table_name1 UNION ALL SELECT column_name(s) FROM table_name2

UPDATE
  UPDATE table_name SET column1=value, column2=value,... WHERE some_column=some_value

WHERE
  SELECT column_name(s) FROM table_name WHERE column_name operator value

Database Management System


UNIT -6 Database Design
Informal Design Guidelines for Relation Schemas; Functional Dependencies; Normal Forms Based on Primary Keys; General Definitions of Second and Third Normal Forms; Boyce-Codd Normal Form; Fourth Normal Form; and Fifth Normal Form.

INFORMAL DESIGN GUIDELINES FOR RELATIONAL SCHEMA
1. Semantics of the Attributes
2. Reducing the Redundant Value in Tuples
3. Reducing Null values in Tuples
4. Disallowing spurious Tuples

1. Semantics of the Attributes
Whenever we form a relational schema, the attributes grouped in it should have a clear meaning. This meaning is called the semantics; it relates one attribute to another.
Eg: USN No, Student name, Sem

2. Reducing the Redundant Value in Tuples
Mixing attributes of multiple entities may cause problems:
- Information is stored redundantly, wasting storage.
- Update anomalies occur: insertion anomalies, deletion anomalies, and modification anomalies.


One goal of schema design is to minimize the storage space that the base relations occupy. The way attributes are grouped into relations has a significant effect on storage space.
Eg: USN No, Student name, Sem
Eg: Dept No, Dept Name

Suppose we integrate these two and use them as a single table, i.e. Student Table: USN No, Student name, Sem, Dept No, Dept Name

Here there may be N students in one department, so the Dept No and Dept Name values are repeated N times, which leads to data redundancy. Another problem is update anomalies:
- Insertion anomaly: a new department that has no students yet cannot be recorded unless the student attributes are left NULL.
- Deletion anomaly: if we delete the last student of a department, then all information about that department is deleted as well.
- Modification anomaly: if we change the value of one of the attributes of a particular department, we must update the tuples of all the students belonging to that department, else the database will become inconsistent.

Note: Design the schema in such a way that no insertion, deletion, or modification anomalies occur.

3. Reducing Null values in Tuples
Note: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are NULL frequently could be placed in separate relations (with the primary key).
Reasons for nulls:
- attribute not applicable or invalid
- attribute value unknown (may exist)
- value known to exist, but unavailable

4. Disallowing spurious Tuples
Bad designs for a relational database may result in erroneous results for certain JOIN operations. The "lossless join" property is used to guarantee meaningful results for join operations.
Note: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations.
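To make the idea of spurious tuples concrete, here is a minimal Python sketch. The relation, its attribute names (Emp, Project, Location) and its rows are hypothetical, and the helper functions project and natural_join are written for this illustration only. It decomposes one relation in two ways and joins the pieces back together.

```python
def project(relation, attrs):
    """Projection: keep only attrs and drop duplicate rows."""
    unique = {tuple(row[a] for a in attrs) for row in relation}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    joined = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                joined.append({**l, **r})
    return joined


def as_set(relation):
    """Order-independent form of a relation, used for comparisons."""
    return {frozenset(row.items()) for row in relation}


# Hypothetical relation: who works on which project, and where the project is.
works = [
    {"Emp": "Smith", "Project": "P1", "Location": "Houston"},
    {"Emp": "Wong", "Project": "P2", "Location": "Houston"},
]

# Bad decomposition: the only shared attribute, Location, identifies nothing.
bad = natural_join(
    project(works, ["Emp", "Location"]), project(works, ["Project", "Location"])
)
spurious = as_set(bad) - as_set(works)

# Better decomposition: the shared attribute Project determines Location.
good = natural_join(
    project(works, ["Emp", "Project"]), project(works, ["Project", "Location"])
)

print(len(bad), len(spurious))
print(as_set(good) == as_set(works))
```

The bad decomposition joins back to four tuples, two of which (Smith on P2, Wong on P1) were never in the original relation; the second decomposition reproduces the original exactly.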

Functional dependency
1. Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs.
2. FDs and keys are used to define normal forms for relations.
3. FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
4. X -> Y: A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
5. X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y.
6. For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then t1[Y]=t2[Y].
7. X -> Y in R specifies a constraint on all relation instances r(R).
8. Written as X -> Y; can be displayed graphically on a relation schema (denoted by an arrow).
9. FDs are derived from the real-world constraints on the attributes.
10. Social security number determines employee name: SSN -> ENAME
11. Project number determines project name and location: PNUMBER -> {PNAME, PLOCATION}
12. Employee SSN and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Inference rules for FDs:


The inference rules, also known as Armstrong's Axioms, were published by Armstrong. The first three rules are the axioms proper; the remaining three can be derived from them. These properties are as given below:
1. Reflexivity property: X -> Y is true if Y is a subset of X.
2. Augmentation property: If X -> Y is true, then XZ -> YZ is also true.
3. Transitivity property: If X -> Y and Y -> Z, then X -> Z is implied.
4. Union property: If X -> Y and X -> Z are true, then X -> YZ is also true. FDs with the same left-hand side can be combined into a single FD.
5. Decomposition property: If X -> Y is implied and Z is a subset of Y, then X -> Z is implied. This property indicates that if the right-hand side of an FD contains many attributes, then an FD exists for each of them. It is the reverse of the union property.
6. Pseudotransitivity property: If X -> Y and WY -> Z are given, then XW -> Z is true.

Normalization: Normalizing the database ensures the following things:
- Dependencies between data are identified.
- Redundant data is minimized.
- The data model is flexible and easier to maintain.

Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form.
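Applying the inference rules by hand quickly becomes tedious. A common way to decide whether an FD follows from a given set is to compute the closure of its left-hand side. The sketch below is a minimal Python illustration of that idea, not an algorithm taken from these notes; the function names closure and implies are assumptions, and the FDs are the three examples listed earlier (SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS).

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(sorted(closure({"SSN"}, fds)))
print(sorted(closure({"SSN", "PNUMBER"}, fds)))
print(implies(fds, {"SSN", "PNUMBER"}, {"PLOCATION"}))
print(implies(fds, {"SSN"}, {"HOURS"}))
```

Since the closure of {SSN, PNUMBER} contains every attribute, that pair is a superkey of the relation, while SSN alone determines only ENAME.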



FIRST NORMAL FORM
The purpose of first normal form (1NF) is to eliminate repeating groups of attributes in an entity. It disallows composite attributes, multivalued attributes, and nested relations, i.e. attributes whose values for an individual tuple are non-atomic. Consider the following table:

SECOND NORMAL FORM

The purpose of second normal form (2NF) is to eliminate partial key dependencies. Each attribute in an entity must depend on the whole key, not just a part of it.

THIRD NORMAL FORM

Third normal form (3NF) also helps to eliminate redundant information by eliminating interdependencies between non-key attributes. A relation is in 3NF if:
- it is already in 2NF, and
- there are no non-key attributes that depend on another non-key attribute.


General Normal Form Definitions (For Multiple Keys)
The above definitions consider the primary key only. The following more general definitions take into account relations with multiple candidate keys.

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R.

Definition: Superkey of relation schema R - a set of attributes S of R that contains a key of R.

A relation schema R is in third normal form (3NF) if whenever an FD X -> A holds in R, then either:
- X is a superkey of R, or
- A is a prime attribute of R.

Example 1
CUSTOMER
CustomerID  Firstname  Surname    City      PostCode
12123       Harry      Enfield    London    SW7 2AP
12443       Leona      Lewis      London    WC2H 7JY
354         Sarah      Brightman  Coventry  CV4 7AL



This is not in strict 3NF, as the City could be obtained from the PostCode attribute. If you create a table containing postcodes, then the city can be derived from it.

CUSTOMER
CustomerID  Firstname  Surname    PostCode*
12123       Harry      Enfield    SW7 2AP
12443       Leona      Lewis      WC2H 7JY
354         Sarah      Brightman  CV4 7AL

POSTCODES
PostCode  City
SW7 2AP   London
WC2H 7JY  London
CV4 7AL   Coventry

Example 2
VideoID  Title   Certificate  Description
12123    Saw IV  18           Eighteen and over
12443    Igor    PG           Parental Guidance
354      Bambi   U            Universal Classification

The Description of what the certificate means could be obtained from the Certificate attribute; it does not need to refer to the primary key VideoID. So split it out into its own table and use the primary key / foreign key approach.

Example 3
CLIENT
ClientID  CinemaID*  CinemaAddress
12123     LON23      1 Leicester Square. London
12443     COV2       34 Bramby St, Coventry
354       MAN4       56 Croydon Rd, Manchester

CINEMAS
CinemaID  CinemaAddress
LON23     1 Leicester Square. London
COV2      34 Bramby St, Coventry
MAN4      56 Croydon Rd, Manchester

In this case the database is almost in 3NF. For some reason the Cinema Address is being repeated in the Client table, even though it can be obtained from the Cinemas table. So simply remove the column from the Client table.

BOYCE-CODD NORMAL FORM (BCNF)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X -> A holds in R, then X is a superkey of R.

Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF
- Every 3NF relation is in 2NF
- Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF. The goal is to have each relation in BCNF (or 3NF).
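The general 3NF and BCNF definitions can be turned into a mechanical check over a list of FDs. The sketch below is an illustration under stated assumptions, not a procedure from these notes: the function names are hypothetical, candidate keys are found by brute force (fine only for small schemas), and only the non-trivial listed FDs are tested. The first schema is the CUSTOMER table of Example 1; the second (Student, Course, Instructor) is a made-up relation of the kind that is in 3NF but not in BCNF.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(key <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(schema):
                keys.append(frozenset(combo))
    return keys


def violations(schema, fds, form):
    """List (lhs, attribute) pairs that break the given form ('3NF' or 'BCNF')."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(schema):
            continue  # lhs is a superkey: allowed in both forms
        for attr in sorted(set(rhs) - set(lhs)):  # ignore trivial parts
            if form == "3NF" and attr in prime:
                continue  # 3NF also allows prime attributes on the right
            bad.append((frozenset(lhs), attr))
    return bad


customer = {"CustomerID", "Firstname", "Surname", "City", "PostCode"}
customer_fds = [
    ({"CustomerID"}, {"Firstname", "Surname", "City", "PostCode"}),
    ({"PostCode"}, {"City"}),
]

teaching = {"Student", "Course", "Instructor"}  # hypothetical relation
teaching_fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

print(violations(customer, customer_fds, "3NF"))
print(violations(teaching, teaching_fds, "3NF"))
print(violations(teaching, teaching_fds, "BCNF"))
```

For CUSTOMER the check reports PostCode -> City as a 3NF violation, matching Example 1. For the hypothetical teaching relation every attribute is prime, so 3NF is satisfied, but Instructor -> Course violates BCNF because Instructor is not a superkey.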

Definition: A multivalued dependency (MVD) X ->> Y specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R: If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in r with the following properties, where we use Z to denote (R - (X ∪ Y)):
- t3[X] = t4[X] = t1[X] = t2[X].
- t3[Y] = t1[Y] and t4[Y] = t2[Y].
- t3[Z] = t2[Z] and t4[Z] = t1[Z].

An MVD X ->> Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R.
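The MVD definition can be tested directly on a small relation state. The following minimal Python sketch assumes a relation given as a list of dictionaries; the function name satisfies_mvd and the example relation (employee, project, dependent) are hypothetical. For every pair of tuples agreeing on X, it builds the required tuple t3 and checks that it is present; t4 is covered when the same pair is visited in the opposite order.

```python
def satisfies_mvd(relation, x, y):
    """Check whether X ->> Y holds in a relation state (list of dict rows)."""
    if not relation:
        return True  # an empty state satisfies every dependency
    attrs = set(relation[0])
    z = attrs - set(x) - set(y)  # Z = R - (X u Y)
    rows = {frozenset(row.items()) for row in relation}
    for t1 in relation:
        for t2 in relation:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes X and Y from t1 and Z from t2.
                t3 = {a: (t2[a] if a in z else t1[a]) for a in attrs}
                if frozenset(t3.items()) not in rows:
                    return False
    return True


# Hypothetical relation: projects and dependents of an employee are independent.
emp = [
    {"Ename": "Smith", "Pname": "X", "Dname": "John"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "X", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "John"},
]

holds_full = satisfies_mvd(emp, ["Ename"], ["Pname"])
holds_partial = satisfies_mvd(emp[:3], ["Ename"], ["Pname"])  # one combination missing

print(holds_full, holds_partial)
```

With all four project/dependent combinations present, Ename ->> Pname holds; removing one combination breaks the dependency, which is the redundancy that fourth normal form addresses.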

Fourth Normal Form (4NF)


Fourth normal form eliminates independent many-to-one relationships between columns. To be in Fourth Normal Form, a relation must first be in Boyce-Codd Normal Form. In addition, a given relation may not contain more than one multi-valued attribute. 4NF is defined as a relation that is in Boyce-Codd Normal Form and contains no nontrivial multi-valued dependencies.

Example



Fifth Normal Form (5NF)
A relation decomposed into two relations must have the lossless-join property, which ensures that no spurious tuples are generated when the relations are reunited through a natural join operation. However, there are cases that require a relation to be decomposed into more than two relations. Although rare, these cases are managed by join dependency and fifth normal form (5NF).

Constraints as Assertions
General constraints: constraints that do not fit in the basic SQL categories.
Mechanism: CREATE ASSERTION
Components include: a constraint name, followed by CHECK, followed by a condition.

Assertions: An Example
The salary of an employee must not be greater than the salary of the manager of the department that the employee works for:

CREATE ASSERTION SALARY_CONSTRAINT
CHECK (NOT EXISTS (SELECT *
                   FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D
                   WHERE E.SALARY > M.SALARY AND E.DNO=D.NUMBER AND D.MGRSSN=M.SSN))

SQL Triggers
Objective: to monitor a database and initiate action when a condition occurs.
Triggers are expressed in a syntax similar to assertions and include the following:
- Event: such as an insert, delete, or update operation
- Condition
- Action: to be taken when the condition is satisfied

SQL Triggers: An Example
A trigger to compare an employee's salary to his/her supervisor's salary during insert or update operations:

CREATE TRIGGER INFORM_SUPERVISOR
BEFORE INSERT OR UPDATE OF SALARY, SUPERVISOR_SSN ON EMPLOYEE
FOR EACH ROW
WHEN (NEW.SALARY > (SELECT SALARY FROM EMPLOYEE WHERE SSN=NEW.SUPERVISOR_SSN))
INFORM_SUPERVISOR (NEW.SUPERVISOR_SSN,NEW.SSN);
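Trigger syntax varies between database systems, and few systems implement CREATE ASSERTION. As a hedged illustration of the event / condition / action structure, the sketch below uses Python's built-in sqlite3 module. It is an adaptation, not the trigger from the text: SQLite needs one trigger per event, so only INSERT is covered, and the call to INFORM_SUPERVISOR is replaced by an insert into a hypothetical SUPERVISOR_NOTICES table. The SSN and salary values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, SALARY INTEGER, SUPERVISOR_SSN TEXT);

    -- Hypothetical stand-in for the INFORM_SUPERVISOR procedure.
    CREATE TABLE SUPERVISOR_NOTICES (SUPERVISOR_SSN TEXT, EMPLOYEE_SSN TEXT);

    CREATE TRIGGER INFORM_SUPERVISOR_ON_INSERT
    AFTER INSERT ON EMPLOYEE                          -- event
    FOR EACH ROW
    WHEN NEW.SALARY > (SELECT SALARY FROM EMPLOYEE
                       WHERE SSN = NEW.SUPERVISOR_SSN) -- condition
    BEGIN                                             -- action
        INSERT INTO SUPERVISOR_NOTICES VALUES (NEW.SUPERVISOR_SSN, NEW.SSN);
    END;
    """
)

conn.execute("INSERT INTO EMPLOYEE VALUES ('111', 5000, NULL)")   # the supervisor
conn.execute("INSERT INTO EMPLOYEE VALUES ('222', 4000, '111')")  # earns less
conn.execute("INSERT INTO EMPLOYEE VALUES ('333', 6000, '111')")  # earns more

notices = conn.execute("SELECT * FROM SUPERVISOR_NOTICES").fetchall()
print(notices)
```

Only the third insert satisfies the condition, so exactly one notice is recorded. For the first row the subquery yields NULL, the comparison is therefore not true, and the action is skipped.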


DATABASE DESIGN -2
DESIGNING A SET OF RELATIONS
The Approach of Relational Synthesis (Bottom-up Design):
- Assumes that all possible functional dependencies are known.
- First constructs a minimal set of FDs.
- Then applies algorithms that construct a target set of 3NF or BCNF relations.
- Additional criteria may be needed to ensure that the set of relations in a relational database is satisfactory (see Algorithms 11.2 and 11.4).

Goals:
- Lossless join property (a must). Algorithm 11.1 tests for general losslessness.
- Dependency preservation property. Algorithm 11.3 decomposes a relation into BCNF components by sacrificing the dependency preservation.
- Additional normal forms: 4NF (based on multi-valued dependencies) and 5NF (based on join dependencies).

Properties of Relational Decompositions
Relation Decomposition and Insufficiency of Normal Forms:
Universal Relation Schema: A relation schema R = {A1, A2, ..., An} that includes all the attributes of the database.
Universal relation assumption: Every attribute name is unique.
Decomposition: The process of decomposing the universal relation schema R into a set of relation schemas D = {R1, R2, ..., Rm} that will become the relational database schema by using the functional dependencies.
Attribute preservation condition: Each attribute in R will appear in at least one relation schema Ri in the decomposition so that no attributes are lost.
Another goal of decomposition is to have each individual relation Ri in the decomposition D be in BCNF or 3NF.
Additional properties of decomposition are needed to prevent the generation of spurious tuples.

Dependency Preservation Property of a Decomposition:
Definition: Given a set of dependencies F on R, the projection of F on Ri, denoted by π_Ri(F) where Ri is a subset of R, is the set of dependencies X → Y in F+ such that the attributes in X ∪ Y are all contained in Ri. Hence, the projection of F on each relation schema Ri in the decomposition D is the set of functional dependencies in F+, the closure of F, such that all their left- and right-hand-side attributes are in Ri.
Dependency Preservation Property: A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F; that is,


(π_R1(F) ∪ ... ∪ π_Rm(F))+ = F+
(See examples in Fig 10.12a and Fig 10.11)
Claim 1: It is always possible to find a dependency-preserving decomposition D with respect to F such that each relation Ri in D is in 3NF.

Lossless (Non-additive) Join Property of a Decomposition:
Definition: Lossless join property: a decomposition D = {R1, R2, ..., Rm} of R has the lossless (nonadditive) join property with respect to the set of dependencies F on R if, for every relation state r of R that satisfies F, the following holds, where * is the natural join of all the relations in D:
*(π_R1(r), ..., π_Rm(r)) = r
Note: The word loss in lossless refers to loss of information, not to loss of tuples. In fact, for "loss of information" a better term is "addition of spurious information".

Algorithm 11.1: Testing for Lossless Join Property
Input: A universal relation R, a decomposition D = {R1, R2, ..., Rm} of R, and a set F of functional dependencies.
1. Create an initial matrix S with one row i for each relation Ri in D, and one column j for each attribute Aj in R.
2. Set S(i,j) := bij for all matrix entries. (* each bij is a distinct symbol associated with indices (i,j) *)
3. For each row i representing relation schema Ri
   {for each column j representing attribute Aj
      {if (relation Ri includes attribute Aj) then set S(i,j) := aj;};};
   (* each aj is a distinct symbol associated with index (j) *)
4. Repeat the following loop until a complete loop execution results in no changes to S
   {for each functional dependency X → Y in F
      {for all rows in S which have the same symbols in the columns corresponding to attributes in X
         {make the symbols in each column that correspond to an attribute in Y be the same in all these rows as follows: If any of the rows has an "a" symbol for the column, set the other rows to that same "a" symbol in the column. If no "a" symbol exists for the attribute in any of the rows, choose one of the "b" symbols that appear in one of the rows for the attribute and set the other rows to that same "b" symbol in the column;}; }; };
5. If a row is made up entirely of "a" symbols, then the decomposition has the lossless join property; otherwise it does not.

Lossless (nonadditive) join test for n-ary decompositions. (a) Case 1: Decomposition of EMP_PROJ into EMP_PROJ1 and EMP_LOCS fails test. (b) A decomposition of EMP_PROJ that has the lossless join property.
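Algorithm 11.1 can be sketched in Python as follows. Two things are assumptions rather than part of the notes: the attribute lists and FDs for EMP_PROJ are the usual textbook ones (the figure itself is not reproduced here), and when two symbols are equated the sketch replaces the dropped symbol everywhere in that column, which is how this matrix test is commonly implemented.

```python
def is_lossless_join(attributes, decomposition, fds):
    """Matrix test for the lossless (non-additive) join property.

    attributes:    list of attribute names of the universal relation R
    decomposition: list of sets of attribute names (R1, ..., Rm)
    fds:           list of (lhs, rhs) pairs, each a set of attribute names
    """
    # Steps 1-3: row i gets symbol a_j where Ri contains Aj, else b_ij.
    S = [
        {a: ("a", a) if a in Ri else ("b", i, a) for a in attributes}
        for i, Ri in enumerate(decomposition)
    ]
    # Step 4: apply the FDs until a full pass changes nothing.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in S:
                for r2 in S:
                    if r1 is r2 or any(r1[c] != r2[c] for c in lhs):
                        continue
                    for col in rhs:
                        s1, s2 = r1[col], r2[col]
                        if s1 == s2:
                            continue
                        # Prefer the "a" symbol; otherwise keep one of the b's.
                        keep, drop = (s1, s2) if s1[0] == "a" else (s2, s1)
                        for row in S:
                            if row[col] == drop:
                                row[col] = keep
                        changed = True
    # Step 5: lossless iff some row consists of "a" symbols only.
    return any(all(row[a][0] == "a" for a in attributes) for row in S)


# Assumed textbook schema for EMP_PROJ and its functional dependencies.
R = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
case1 = [{"ENAME", "PLOCATION"},                               # EMP_LOCS
         {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]    # EMP_PROJ1
case2 = [{"SSN", "ENAME"},                                     # EMP
         {"PNUMBER", "PNAME", "PLOCATION"},                    # PROJECT
         {"SSN", "PNUMBER", "HOURS"}]                          # WORKS_ON

result_case1 = is_lossless_join(R, case1, F)
result_case2 = is_lossless_join(R, case2, F)
print(result_case1, result_case2)
```

Under these assumptions the first decomposition fails the test and the second passes, matching Case 1 and Case 2 of the figure captions.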


Lossless (nonadditive) join test for n-ary decompositions. (c) Case 2: Decomposition of EMP_PROJ into EMP, PROJECT, and WORKS_ON satisfies test



Testing Binary Decompositions for Lossless Join Property
Binary Decomposition: Decomposition of a relation R into two relations.
PROPERTY LJ1 (lossless join test for binary decompositions): A decomposition D = {R1, R2} of R has the lossless join property with respect to a set of functional dependencies F on R if and only if either
- the f.d. ((R1 ∩ R2) → (R1 - R2)) is in F+, or
- the f.d. ((R1 ∩ R2) → (R2 - R1)) is in F+.

Successive Lossless Join Decomposition:
Claim 2 (Preservation of non-additivity in successive decompositions): If a decomposition D = {R1, R2, ..., Rm} of R has the lossless (non-additive) join property with respect to a set of functional dependencies F on R, and if a decomposition Di = {Q1, Q2, ..., Qk} of Ri has the lossless (non-additive) join property with respect to the projection of F on Ri, then the decomposition D2 = {R1, R2, ..., Ri-1, Q1, Q2, ..., Qk, Ri+1, ..., Rm} of R has the non-additive join property with respect to F.

2. Algorithms for Relational Database Schema Design
Algorithm 11.2: Relational Synthesis into 3NF with Dependency Preservation (Relational Synthesis Algorithm)
Input: A universal relation R and a set of functional dependencies F on the attributes of R.
1. Find a minimal cover G for F (use Algorithm 10.2);
2. For each left-hand-side X of a functional dependency that appears in G, create a relation schema in D with attributes {X ∪ {A1} ∪ {A2} ∪ ... ∪ {Ak}}, where X → A1, X → A2, ..., X → Ak are the only dependencies in G with X as left-hand-side (X is the key of this relation);
3. Place any remaining attributes (that have not been placed in any relation) in a single relation schema to ensure the attribute preservation property.
Claim 3: Every relation schema created by Algorithm 11.2 is in 3NF.

Algorithm 11.3: Relational Decomposition into BCNF with Lossless (non-additive) join property
Input: A universal relation R and a set of functional dependencies F on the attributes of R.
1. Set D := {R};
2. While there is a relation schema Q in D that is not in BCNF do
   { choose a relation schema Q in D that is not in BCNF;
     find a functional dependency X → Y in Q that violates BCNF;
     replace Q in D by two relation schemas (Q - Y) and (X ∪ Y); };
Assumption: No null values are allowed for the join attributes.

Algorithm 11.4: Relational Synthesis into 3NF with Dependency Preservation and Lossless (Non-Additive) Join Property
Input: A universal relation R and a set of functional dependencies F on the attributes of R.
1. Find a minimal cover G for F (use Algorithm 10.2).


2. For each left-hand-side X of a functional dependency that appears in G, create a relation schema in D with attributes {X ∪ {A1} ∪ {A2} ∪ ... ∪ {Ak}}, where X → A1, X → A2, ..., X → Ak are the only dependencies in G with X as left-hand-side (X is the key of this relation).
3. If none of the relation schemas in D contains a key of R, then create one more relation schema in D that contains attributes that form a key of R. (Use Algorithm 11.4a to find the key of R.)

Algorithm 11.4a: Finding a Key K for R Given a Set F of Functional Dependencies
Input: A universal relation R and a set of functional dependencies F on the attributes of R.
1. Set K := R;
2. For each attribute A in K
   { Compute (K - A)+ with respect to F;
     If (K - A)+ contains all the attributes in R, then set K := K - {A}; };
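Algorithm 11.4a relies on computing an attribute closure. The following is a minimal sketch; the function names closure and find_key are chosen here for illustration, and the EMP_PROJ attributes and FDs are the usual textbook ones, assumed for the example.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already derivable, add the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_key(R, fds):
    """Algorithm 11.4a: shrink K = R one attribute at a time."""
    K = set(R)
    for A in sorted(R):          # fixed order so the result is reproducible
        if closure(K - {A}, fds) >= set(R):
            K.remove(A)          # A is not needed to determine all of R
    return K


R = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
key = find_key(R, F)
print(sorted(key))
```

The algorithm returns one key only; which key is found depends on the order in which attributes are tried when R has several candidate keys.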



Discussion of Normalization Algorithms: Problems: The database designer must first specify all the relevant functional dependencies among the database attributes. These algorithms are not deterministic in general. It is not always possible to find a decomposition into relation schemas that preserves dependencies and allows each relation schema in the decomposition to be in BCNF (instead of 3NF as in Algorithm 11.4).



Multi-valued Dependencies and Fourth Normal Form (a) The EMP relation with two MVDs: ENAME →→ PNAME and ENAME →→ DNAME. (b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3). (d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.


Multi-valued Dependencies and Fourth Normal Form
Definition: A multi-valued dependency (MVD) X →→ Y specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R: If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in r with the following properties, where we use Z to denote (R - (X ∪ Y)):
- t3[X] = t4[X] = t1[X] = t2[X].
- t3[Y] = t1[Y] and t4[Y] = t2[Y].
- t3[Z] = t2[Z] and t4[Z] = t1[Z].
An MVD X →→ Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R.

Inference Rules for Functional and Multi-valued Dependencies:
IR1 (reflexive rule for FDs): If X ⊇ Y, then X → Y.
IR2 (augmentation rule for FDs): {X → Y} |= XZ → YZ.
IR3 (transitive rule for FDs): {X → Y, Y → Z} |= X → Z.
IR4 (complementation rule for MVDs): {X →→ Y} |= X →→ (R - (X ∪ Y)).
IR5 (augmentation rule for MVDs): If X →→ Y and W ⊇ Z, then WX →→ YZ.
IR6 (transitive rule for MVDs): {X →→ Y, Y →→ Z} |= X →→ (Z - Y).
IR7 (replication rule for FD to MVD): {X → Y} |= X →→ Y.
IR8 (coalescence rule for FDs and MVDs): If X →→ Y and there exists W with the properties that (a) W ∩ Y is empty, (b) W → Z, and (c) Y ⊇ Z, then X → Z.

Definition: A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X →→ Y in F+, X is a superkey for R.
Note: F+ is the (complete) set of all dependencies (functional or multivalued) that will hold in every relation state r of R that satisfies F. It is also called the closure of F.

Decomposing a relation state of EMP that is not in 4NF: (a) EMP relation with additional tuples. (b) Two corresponding 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
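The MVD definition above can be checked mechanically on a given relation state. The sketch below tests only one state (an MVD is a constraint on all legal states), and the sample EMP tuples are illustrative values, not data taken from the figure.

```python
def satisfies_mvd(rows, attributes, X, Y):
    """Check whether the relation state rows satisfies the MVD X ->> Y.

    rows:       list of dicts mapping attribute name -> value
    attributes: list of all attribute names of R
    """
    X = list(X)
    Y = [a for a in Y if a not in X]                  # overlap with X adds nothing
    Z = [a for a in attributes if a not in X and a not in Y]   # Z = R - (X u Y)

    def proj(row, cols):
        return tuple(row[c] for c in cols)

    triples = {(proj(r, X), proj(r, Y), proj(r, Z)) for r in rows}
    # For every t1, t2 agreeing on X, the tuple t3 with t1's Y-values and
    # t2's Z-values must exist; t4 is covered when the roles are swapped.
    return all(
        (x1, y1, z2) in triples
        for (x1, y1, _z1) in triples
        for (x2, _y2, z2) in triples
        if x1 == x2
    )


attrs = ["ENAME", "PNAME", "DNAME"]
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
full_state_ok = satisfies_mvd(emp, attrs, ["ENAME"], ["PNAME"])
partial_state_ok = satisfies_mvd(emp[:2], attrs, ["ENAME"], ["PNAME"])
print(full_state_ok, partial_state_ok)
```

With only the first two tuples, the combinations (X, Anna) and (Y, John) are missing, so the MVD does not hold in that state.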


Lossless (Non-additive) Join Decomposition into 4NF Relations:
PROPERTY LJ1: The relation schemas R1 and R2 form a lossless (non-additive) join decomposition of R with respect to a set F of functional and multi-valued dependencies if and only if (R1 ∩ R2) →→ (R1 - R2) or, by symmetry, if and only if (R1 ∩ R2) →→ (R2 - R1).

Algorithm 11.5: Relational decomposition into 4NF relations with non-additive join property
Input: A universal relation R and a set of functional and multi-valued dependencies F.
1. Set D := { R };
2. While there is a relation schema Q in D that is not in 4NF do
   { choose a relation schema Q in D that is not in 4NF;
     find a nontrivial MVD X →→ Y in Q that violates 4NF;
     replace Q in D by two relation schemas (Q - Y) and (X ∪ Y); };

Join Dependencies and Fifth Normal Form
Definition: A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a non-additive join decomposition into R1, R2, ..., Rn; that is, for every such r we have
*(π_R1(r), π_R2(r), ..., π_Rn(r)) = r
Note: an MVD is a special case of a JD where n = 2.
A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.
Definition: A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
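Whether a particular relation state satisfies a join dependency can be tested by projecting it onto each Ri and joining the projections back. The sketch below does exactly that; the SNAME/PARTNAME/PROJNAME values are invented for illustration and are not the tuples of the SUPPLY figure.

```python
def project(rows, attrs):
    """Projection as a set of tuples of (attribute, value) pairs."""
    return {tuple(sorted((a, r[a]) for a in attrs)) for r in rows}


def natural_join(left, right):
    """Natural join of two sets produced by project()."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # Tuples combine only if they agree on all shared attributes.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


def satisfies_jd(rows, components):
    """True if the state rows equals the join of its projections.

    Assumes rows is non-empty and the components together cover all attributes.
    """
    result = project(rows, components[0])
    for comp in components[1:]:
        result = natural_join(result, project(rows, comp))
    return result == project(rows, sorted(rows[0]))


def t(s, p, j):
    return {"SNAME": s, "PARTNAME": p, "PROJNAME": j}


jd = [["SNAME", "PARTNAME"], ["SNAME", "PROJNAME"], ["PARTNAME", "PROJNAME"]]
supply = [t("S1", "P1", "J2"), t("S1", "P2", "J1"), t("S2", "P1", "J1")]

violates = satisfies_jd(supply, jd)                          # join adds (S1, P1, J1)
holds = satisfies_jd(supply + [t("S1", "P1", "J1")], jd)     # tuple now present
print(violates, holds)
```

The three-tuple state fails because the join of its projections produces the spurious tuple (S1, P1, J1); once that tuple is part of the state, the join reproduces the state exactly.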



Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

Inclusion Dependencies
Definition: An inclusion dependency R.X < S.Y between two sets of attributes, X of relation schema R and Y of relation schema S, specifies the constraint that, at any specific time when r is a relation state of R and s a relation state of S, we must have
π_X(r(R)) ⊆ π_Y(s(S))
Note: The ⊆ (subset) relationship does not necessarily have to be a proper subset. The sets of attributes on which the inclusion dependency is specified, X of R and Y of S, must have the same number of attributes. In addition, the domains for each pair of corresponding attributes should be compatible.

Objective of Inclusion Dependencies: To formalize two types of interrelational constraints which cannot be expressed using F.D.s or MVDs:
- Referential integrity constraints
- Class/subclass relationships

Inclusion dependency inference rules
IDIR1 (reflexivity): R.X < R.X.
IDIR2 (attribute correspondence): If R.X < S.Y where X = {A1, A2, ..., An} and Y = {B1, B2, ..., Bn} and Ai Corresponds-to Bi, then R.Ai < S.Bi for 1 ≤ i ≤ n.
IDIR3 (transitivity): If R.X < S.Y and S.Y < T.Z, then R.X < T.Z.

Other Dependencies and Normal Forms
Template Dependencies: Template dependencies provide a technique for representing constraints in relations that typically have no easy and formal definitions. The idea is to specify a template (or example) that defines each constraint or dependency. There are two types of templates:
- tuple-generating templates
- constraint-generating templates
A template consists of a number of hypothesis tuples that are meant to show an example of the tuples that may appear in one or more relations. The other part of the template is the template conclusion.

Domain-Key Normal Form (DKNF): Definition: A relation schema is said to be in DKNF if all constraints and dependencies that should hold on the valid relation states can be enforced simply by enforcing the domain constraints and key constraints on the relation.


The idea is to specify (theoretically, at least) the ultimate normal form that takes into account all possible types of dependencies and constraints. For a relation in DKNF, it becomes very straightforward to enforce all database constraints by simply checking that each attribute value in a tuple is of the appropriate domain and that every key constraint is enforced. The practical utility of DKNF is limited.



TRANSACTION MANAGEMENT

Transaction
A transaction is a unit of program execution that accesses and possibly modifies various data objects (tuples, relations). A transaction is a logical unit of database processing that includes one or more access operations (read: retrieval; write: insert or update; delete). A transaction (set of operations) may be stand-alone, specified in a high-level language like SQL and submitted interactively, or may be embedded within a program. A transaction (collection of actions) makes transformations of system states while preserving the database consistency. A user's program may carry out many operations on the data retrieved from the database, but the DBMS is only concerned about what data is read/written from/to the database. A transaction is the DBMS's abstract view of a user program: a sequence of reads and writes.

PROPERTIES OF TRANSACTION
The DBMS needs to ensure the following properties of transactions:

1. Atomicity: Transactions are either done or not done. They are never left partially executed. An executing transaction completes in its entirety or it is aborted altogether. E.g., Transfer_Money (Amount, X, Y) means i) DEBIT (Amount, X); ii) CREDIT (Amount, Y). Either both take place or neither does.

2. Consistency: Transactions should leave the database in a consistent state. If each transaction is consistent, and the DB starts consistent, then the database ends up consistent. If a transaction violates the database's consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules.



3. Isolation: Transactions must behave as if they were executed in isolation. An executing transaction cannot reveal its (incomplete) results before it commits. Consequently, the net effect is identical to executing all transactions one after the other in some serial order.

4. Durability: Effects of completed transactions are resilient against failures. Once a transaction commits, the system must guarantee that the results of its operations will never be lost, in spite of subsequent failures.

SIMPLE MODEL OF A DATABASE
A database is a collection of named data items. Granularity of data: a field, a record, or a whole disk block (concepts are independent of granularity). Basic operations are read and write:
read_item(X): Reads a database item named X into a program variable. To simplify our notation, we assume that the program variable is also named X.
write_item(X): Writes the value of program variable X into the database item named X.

READ AND WRITE OPERATIONS: The basic unit of data transfer from the disk to the computer main memory is one block. In general, a data item (what is read or written) will be the field of some record in the database, although it may be a larger unit such as a record or even a whole block.

read_item(X) command includes the following steps:

1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable named X.

write_item(X) command includes the following steps:

1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated block from the buffer back to disk (either immediately or at some later point in time).
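The read_item and write_item steps can be mimicked with a toy buffer manager. Everything here is a simplification made for illustration: the class name TinyDB, the dictionary-based "disk", and the explicit flush method (standing in for "at some later point in time") are assumptions, not part of any real DBMS.

```python
class TinyDB:
    """Toy model: disk blocks are dicts, the buffer pool caches copies."""

    def __init__(self, disk, location):
        self.disk = disk            # block id -> {item name: value}
        self.location = location    # item name -> block id
        self.buffers = {}           # block id -> in-memory copy of the block
        self.dirty = set()          # buffered blocks not yet written back

    def _fetch(self, item):
        block_id = self.location[item]                 # step 1: find the block
        if block_id not in self.buffers:               # step 2: load if absent
            self.buffers[block_id] = dict(self.disk[block_id])
        return block_id

    def read_item(self, item):
        block_id = self._fetch(item)
        return self.buffers[block_id][item]            # step 3: buffer -> variable

    def write_item(self, item, value):
        block_id = self._fetch(item)
        self.buffers[block_id][item] = value           # step 3: variable -> buffer
        self.dirty.add(block_id)                       # step 4 is deferred

    def flush(self):
        for block_id in self.dirty:                    # step 4: buffer -> disk
            self.disk[block_id] = dict(self.buffers[block_id])
        self.dirty.clear()


disk = {0: {"X": 80, "Y": 10}}
db = TinyDB(disk, {"X": 0, "Y": 0})
X = db.read_item("X")
db.write_item("X", X - 5)
on_disk_before_flush = disk[0]["X"]
db.flush()
on_disk_after_flush = disk[0]["X"]
print(on_disk_before_flush, on_disk_after_flush)
```

The gap between the buffered value and the on-disk value before the flush is exactly why recovery needs a log: a crash at that moment would lose the update.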

Two sample transactions

Transaction Example in MySQL
START TRANSACTION;
SELECT @A := SUM(salary) FROM table1 WHERE type = 1;
UPDATE table2 SET summary = @A WHERE type = 1;
COMMIT;
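The atomicity example Transfer_Money (Amount, X, Y) can be reproduced with Python's sqlite3 module, where a "with conn:" block commits on success and rolls back on an exception. The account table, its CHECK constraint, and the balances are assumptions made for this sketch; the credit is deliberately issued before the debit so that a failing debit leaves partial work to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"   # consistency rule: no overdraft
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 100), ("Y", 50)])
conn.commit()


def transfer_money(conn, amount, src, dst):
    """Move amount from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )  # CREDIT
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )  # DEBIT, fails the CHECK constraint if src would go negative
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok_small = transfer_money(conn, 30, "X", "Y")
after_small = balances(conn)
ok_large = transfer_money(conn, 500, "X", "Y")
after_large = balances(conn)
print(ok_small, after_small, ok_large, after_large)
```

The second transfer fails at the debit, and the credit that had already been applied to Y is rolled back with it, so the balances are unchanged.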

Transaction Example in Oracle (same with SQL Server)
When you connect to the database with sqlplus (the Oracle command-line utility that runs SQL and PL/SQL commands interactively or from a script), a transaction begins. Once the transaction begins, every SQL DML (Data Manipulation Language) statement you issue subsequently becomes a part of this transaction.


TRANSACTION STATES 1. Active state 2. Partially committed state 3. Committed state 4. Failed state 5. Terminated State State transition diagram illustrating the states for transaction execution:



Transaction Processing System


CONCURRENCY CONTROL

Concurrency in a DBMS Concurrent execution of user programs is essential for good DBMS performance. Because disk accesses are frequent, and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.

Users submit transactions, and can think of each transaction as executing by itself.

Concurrency is achieved by the DBMS, which interleaves actions (reads/writes of DB objects) of various transactions. Each transaction must leave the database in a consistent state if the DB is consistent when the transaction begins. DBMS will enforce some ICs, depending on the ICs declared in CREATE TABLE statements. Beyond this, the DBMS does not really understand the semantics of the data. (e.g., it does not understand how the interest on a bank account is computed).

Things get even more complicated if we have several DBMS programs (transactions) executed concurrently.

"Synchronization" of transactions: allowing concurrency (instead of insisting on a strict serial transaction execution, i.e., process complete T1, then T2, then T3 etc.) serves to
- increase the throughput of the system,
- minimize response time for each transaction.

Why do we need concurrent executions? It is essential for good DBMS performance! Disk accesses are frequent, and relatively slow. Overlapping I/O with CPU activity increases throughput and reduces response time.

What is the problem with concurrent transactions? Interleaving transactions might lead the system to an inconsistent state (like previous example):
Scenario: A transaction (Xact) prints the monthly bank account statement for a user U (one bank transaction at a time). Before the report is finalized, another Xact withdraws $X from user U.
Result: Although the report contains an updated final balance, it nowhere shows the bank transaction that caused the decrease (unrepeatable read problem, explained next).
A DBMS guarantees that these problems will not arise. Users are given the impression that the transactions are executed sequentially, one after the other.

Why Concurrency Control is needed? Problems that can occur for certain transaction schedules without appropriate concurrency control mechanisms:

The Lost Update Problem This occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database item incorrect.

The Temporary Update (or Dirty Read) Problem This occurs when one transaction updates a database item and then the transaction fails for some reason. The updated item is accessed by another transaction before it is changed back to its original value.

The Incorrect Summary Problem If one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated.

a) The Lost Update Problem

The update performed by T1 gets lost; possible solution: T1 locks/unlocks database object X, so that T2 cannot read X while X is modified by T1.
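The lost update can be replayed step by step. In this sketch T1 subtracts N from X and T2 adds M to X; the starting value 80 and the amounts N = 5 and M = 4 are illustrative assumptions, as is the encoding of a schedule as a list of (transaction, operation) pairs.

```python
def run_schedule(schedule, x0, n=5, m=4):
    """Replay a schedule of read/compute/write steps on a single item X."""
    db = {"X": x0}
    local = {}                       # each transaction's private copy of X
    for txn, op in schedule:
        if op == "read":
            local[txn] = db["X"]                     # read_item(X)
        elif op == "compute":
            local[txn] += -n if txn == "T1" else m   # T1: X - N, T2: X + M
        elif op == "write":
            db["X"] = local[txn]                     # write_item(X)
    return db["X"]


serial = [
    ("T1", "read"), ("T1", "compute"), ("T1", "write"),
    ("T2", "read"), ("T2", "compute"), ("T2", "write"),
]
interleaved = [
    ("T1", "read"), ("T1", "compute"),
    ("T2", "read"), ("T2", "compute"),   # T2 reads X before T1 has written it
    ("T1", "write"), ("T2", "write"),    # T2 overwrites T1's result
]

serial_result = run_schedule(serial, 80)
interleaved_result = run_schedule(interleaved, 80)
print(serial_result, interleaved_result)
```

The serial schedule yields 80 - 5 + 4 = 79, while the interleaved one ends at 84 because T2's write is based on the value of X from before T1's update.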


b) The temporary update problem

T1 modifies a db object, and then transaction T1 fails for some reason. Meanwhile, however, the modified db object has been accessed by another transaction T2. Thus T2 has read data that never existed.



c) The incorrect summary problem

In this schedule, the total computed by T1 is wrong. Consequently, T1 must lock/unlock several db objects.



The following are the Problems that arise when interleaving Transactions (and they are already discussed above but the terminology is different):

Problem 1: Reading Uncommitted Data (WR Conflicts) Reading the value of an uncommitted object might yield an inconsistency: Dirty Reads or Write-then-Read (WR) Conflicts.

Problem 2: Unrepeatable Reads (RW Conflicts) Reading the same object twice might yield an inconsistency: Read-then-Write (RW) Conflicts (Write-After-Read).

Problem 3: Overwriting Uncommitted Data (WW Conflicts) Overwriting an uncommitted object might yield an inconsistency: Lost Update or Write-After-Write (WW) Conflicts.

Remark: There is no notion of RR-Conflicts, since no object is changed.



1. Reading Uncommitted Data (WR Conflicts) To illustrate the WR-conflict, consider the following problem:
T1: Transfer $100 from Account A to Account B.
T2: Add the annual interest of 6% to both A and B.

Problem caused by the WR-Conflict? Account B was credited with the interest on a smaller amount (i.e., $100 less), thus the result is not equivalent to the serial schedule.
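The effect of this WR conflict can be checked with a few lines of arithmetic. The opening balances of 1000 in each account and the exact interleaving (T2 runs completely between T1's debit and T1's credit) are assumptions chosen for illustration; integer arithmetic keeps the numbers exact.

```python
def debit_a(db):
    db["A"] -= 100                        # T1, first half of the transfer


def credit_b(db):
    db["B"] += 100                        # T1, second half of the transfer


def interest(db, acct):
    db[acct] = db[acct] * 106 // 100      # T2: add 6% interest


def run(steps, a=1000, b=1000):
    db = {"A": a, "B": b}
    for step in steps:
        step(db)
    return db


t2 = [lambda db: interest(db, "A"), lambda db: interest(db, "B")]

serial_t1_t2 = run([debit_a, credit_b] + t2)
serial_t2_t1 = run(t2 + [debit_a, credit_b])
interleaved = run([debit_a] + t2 + [credit_b])   # T2 reads A written by uncommitted T1

print(serial_t1_t2, serial_t2_t1, interleaved)
```

Both serial orders end with a total of 2120, whereas the interleaved schedule ends with 2114: B received interest on 1000 instead of 1100, so the result matches neither serial schedule.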

2. Unrepeatable Reads (RW Conflicts) To illustrate the RW-conflict, consider the following problem:

T1: Print Value of A T2: Decrease Global counter A by 1.


Problem caused by the RW-Conflict? Although the A counter is read twice within T1 (without any intermediate change made by T1), it has two different values (unrepeatable read)! What happens if T2 aborts? T1 has shown an incorrect result.

3. Overwriting Uncommitted Data (WW Conflicts) To illustrate the WW-conflict, consider the following problem: the salaries of employees A and B must be kept equal.
T1: Set Salary to 1000.
T2: Set Salary equal to 2000.

Problem caused by the WW-Conflict? Employee A gets a salary of 2000 while employee B gets a salary of 1000, thus result is not equivalent to the serial schedule!



Summary of Conflicts
1. WR Conflict (dirty read): A transaction T2 could read a database object A that has been modified by another transaction T1, which has not yet committed.

2. RW Conflict (unrepeatable read): A transaction T2 could change the value of an object A that has been read by a transaction T1, while T1 is still in progress.

3. WW Conflict (lost update): A transaction T2 could overwrite the value of an object A, which has already been modified by a transaction T1, while T1 is still in progress.

Why recovery is needed: (What causes a Transaction to fail)

1. A computer failure (system crash): A hardware or software error occurs in the computer system during transaction execution. If the hardware crashes, the contents of the computer's internal memory may be lost.

2. A transaction or system error: Some operation in the transaction may cause it to fail, such as integer overflow or division by zero. Transaction failure may also occur because of erroneous parameter values or because of a logical programming error. In addition, the user may interrupt the transaction during its execution.

3. Local errors or exception conditions detected by the transaction: Certain conditions necessitate cancellation of the transaction. For example, data for the transaction may not be found. A condition, such as insufficient account balance in a banking database, may cause a transaction, such as a fund withdrawal from that account, to be canceled. A programmed abort in the transaction causes it to fail.

4. Concurrency control enforcement: The concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or because several transactions are in a state of deadlock.

5. Disk failure: Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. This may happen during a read or a write operation of the transaction.

6. Physical problems and catastrophes: This refers to an endless list of problems that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator.

Recovery manager keeps track of the following operations:

begin_transaction: This marks the beginning of transaction execution.

read or write: These specify read or write operations on the database items that are executed as part of a transaction.

end_transaction: This specifies that read and write transaction operations have ended and marks the end limit of transaction execution.

At this point it may be necessary to check whether the changes introduced by the transaction can be permanently applied to the database or whether the transaction has to be aborted because it violates concurrency control or for some other reason.



commit_transaction: This signals a successful end of the transaction so that any changes (updates) executed by the transaction can be safely committed to the database and will not be undone.

rollback (or abort): This signals that the transaction has ended unsuccessfully, so that any changes or effects that the transaction may have applied to the database must be undone.

Recovery techniques use the following operators: undo: Similar to rollback except that it applies to a single operation rather than to a whole transaction.

redo: This specifies that certain transaction operations must be redone to ensure that all the operations of a committed transaction have been applied successfully to the database.

The System Log Log or Journal: The log keeps track of all transaction operations that affect the values of database items. This information may be needed to permit recovery from transaction failures. The log is kept on disk, so it is not affected by any type of failure except for disk or catastrophic failure. In addition, the log is periodically backed up to archival storage (tape) to guard against such catastrophic failures. T in the following discussion refers to a unique transaction-id that is generated automatically by the system and is used to identify each transaction:

The following actions are recorded in the log:
- Ti writes an object: the old value and the new value. The log record must go to disk before the changed page!
- Ti commits/aborts: a log record indicating this action.

Log records are chained together by Xact id, so it's easy to undo a specific Xact. The log is often duplexed and archived on stable storage. All log-related activities (and in fact, all CC-related activities such as lock/unlock, dealing with deadlocks etc.) are handled transparently by the DBMS.

Types of log record:
- [start_transaction,T]: Records that transaction T has started execution.
- [write_item,T,X,old_value,new_value]: Records that transaction T has changed the value of database item X from old_value to new_value.
- [read_item,T,X]: Records that transaction T has read the value of database item X.
- [commit,T]: Records that transaction T has completed successfully, and affirms that its effect can be committed (recorded permanently) to the database.
- [abort,T]: Records that transaction T has been aborted.

Protocols for recovery that avoid cascading rollbacks do not require that read operations be written to the system log, whereas other protocols require these entries for recovery. Strict protocols require simpler write entries that do not include new_value.

Recovery using log records: If the system crashes, we can recover to a consistent database state by examining the log.
1. Because the log contains a record of every write operation that changes the value of some database item, it is possible to undo the effect of these write operations of a transaction T by tracing backward through the log and resetting all items changed by a write operation of T to their old_values.
2. We can also redo the effect of the write operations of a transaction T by tracing forward through the log and setting all items changed by a write operation of T (that did not get done permanently) to their new_values.
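The undo and redo passes described above can be sketched over a log held as a Python list. This is a deliberately simplified model: there are no checkpoints, the log is assumed to have survived the crash intact, and the record layout simply mirrors the [write_item,T,X,old_value,new_value] entries; the sample values are invented.

```python
def recover(db, log):
    """Undo uncommitted writes (backward), then redo committed writes (forward)."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo pass: trace backward, restoring old_value for uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "write_item" and rec[1] not in committed:
            _, _txn, item, old_value, _new_value = rec
            db[item] = old_value

    # Redo pass: trace forward, reapplying new_value for committed transactions.
    for rec in log:
        if rec[0] == "write_item" and rec[1] in committed:
            _, _txn, item, _old_value, new_value = rec
            db[item] = new_value
    return db


log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 80, 75),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 10, 20),
    ("commit", "T1"),
    # crash happens here: T2 never committed
]

# Assumed disk state at the crash: T1's update of X had not reached disk,
# while T2's uncommitted update of Y had.
disk_at_crash = {"X": 80, "Y": 20}
recovered = recover(dict(disk_at_crash), log)
print(recovered)
```

After recovery, committed T1's write to X is in place and uncommitted T2's write to Y has been reversed.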

Commit Point of a Transaction: Definition of a Commit Point:



A transaction T reaches its commit point when all its operations that access the database have been executed successfully and the effect of all the transaction operations on the database has been recorded in the log.

Beyond the commit point, the transaction is said to be committed, and its effect is assumed to be permanently recorded in the database.

The transaction then writes an entry [commit,T] into the log. Roll Back of transactions: Needed for transactions that have a [start_transaction,T] entry into the log but no commit entry [commit,T] into the log.

Redoing transactions: Transactions that have written their commit entry in the log must also have recorded all their write operations in the log; otherwise they would not be committed, so their effect on the database can be redone from the log entries. (Notice that the log file must be kept on disk. At the time of a system crash, only the log entries that have been written back to disk are considered in the recovery process because the contents of main memory may be lost.)

Force writing a log: Before a transaction reaches its commit point, any portion of the log that has not been written to the disk yet must now be written to the disk. This process is called force-writing the log file before committing a transaction.

Characterizing Schedules based on Recoverability

Transaction schedule or history:

When transactions are executing concurrently in an interleaved fashion, the order of execution of operations from the various transactions forms what is known as a transaction schedule (or history).

A schedule (or history) S of n transactions T1, T2, ..., Tn: It is an ordering of the operations of the transactions subject to the constraint that, for each transaction Ti that participates in S, the operations of Ti in S must appear in the same order in which they occur in Ti. Note, however, that operations from other transactions Tj can be interleaved with the operations of Ti in S.
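The ordering constraint in this definition can be checked mechanically. The following sketch assumes a schedule is a list of (transaction, operation) pairs and that each transaction is given as its list of operations in program order; the function name is_valid_schedule is made up for the example.

```python
def is_valid_schedule(schedule, transactions):
    """Return True if every transaction's operations keep their own order in the schedule.

    schedule:     list of (txn, op) pairs, e.g. ("T1", "r(X)")
    transactions: dict mapping txn -> list of its operations in program order
    """
    # Every operation in the schedule must belong to a participating transaction.
    if any(txn not in transactions for txn, _ in schedule):
        return False
    for txn, ops in transactions.items():
        # Project the schedule onto this transaction and compare with its own order.
        projected = [op for (t, op) in schedule if t == txn]
        if projected != ops:
            return False
    return True

transactions = {"T1": ["r(X)", "w(X)"], "T2": ["r(X)", "w(X)"]}

interleaved = [("T1", "r(X)"), ("T2", "r(X)"), ("T1", "w(X)"), ("T2", "w(X)")]
reordered = [("T1", "w(X)"), ("T1", "r(X)"), ("T2", "r(X)"), ("T2", "w(X)")]

print(is_valid_schedule(interleaved, transactions))  # operations interleaved, order kept
print(is_valid_schedule(reordered, transactions))    # T1's own order is violated
```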

Serializability: DBMS must control concurrent execution of transactions to ensure read consistency, i.e., to avoid dirty reads etc. A (possibly concurrent) schedule S is serializable if it is equivalent to a serial schedule S0, i.e., S has the same result database state as S0.

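How to ensure serializability of concurrent transactions? Conflicts between operations of two transactions: Two operations conflict if they belong to different transactions, access the same data item, and at least one of them is a write (read-write, write-read, or write-write).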

A schedule S is serializable with regard to the above conflicts iff S can be transformed into a serial schedule S' by a series of swaps of non-conflicting operations.

Checks for serializability are based on a precedence graph that describes dependencies among concurrent transactions; if the graph has no cycle, then the transactions are serializable, i.e., they can be executed concurrently without affecting each other's results.

Atomicity of Transactions
A transaction might commit after completing all its actions, or it could abort (or be aborted by the DBMS) after executing some actions. A very important property guaranteed by the DBMS for all transactions is that they are atomic. That is, a user can think of a transaction (Xact) as always executing all its actions in one step, or not executing any actions at all.
- The DBMS logs all actions so that it can undo the actions of aborted transactions.

Example
Consider two transactions (Xacts):
T1: BEGIN A=A+100, B=B-100 END
T2: BEGIN A=1.06*A, B=1.06*B END

Intuitively, the first transaction is transferring $100 from B's account to A's account. The second is crediting both accounts with a 6% interest payment. There is no guarantee that T1 will execute before T2 or vice versa, if both are submitted together. However, the net effect must be equivalent to these two transactions running serially in some order.
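To see what "equivalent to running serially in some order" means for T1 and T2, the following sketch runs both serial orders and two interleavings. The starting balances (A = B = 1000) and the use of whole-dollar integer arithmetic are assumptions made for the example.

```python
# Each function is one action of T1 or T2 acting on a shared state dictionary.
def t1_a(s): s["A"] += 100                      # T1: A = A + 100
def t1_b(s): s["B"] -= 100                      # T1: B = B - 100
def t2_a(s): s["A"] = s["A"] * 106 // 100       # T2: A = 1.06 * A
def t2_b(s): s["B"] = s["B"] * 106 // 100       # T2: B = 1.06 * B

def run(steps, a=1000, b=1000):
    """Execute the actions in the given order and return the final (A, B)."""
    state = {"A": a, "B": b}
    for step in steps:
        step(state)
    return state["A"], state["B"]

serial_t1_t2 = run([t1_a, t1_b, t2_a, t2_b])    # (1166, 954)
serial_t2_t1 = run([t2_a, t2_b, t1_a, t1_b])    # (1160, 960)
good = run([t1_a, t2_a, t1_b, t2_b])            # (1166, 954), same as T1 then T2
bad = run([t1_a, t2_a, t2_b, t1_b])             # (1166, 960), matches neither order

print(serial_t1_t2, serial_t2_t1, good, bad)
```

The last interleaving pays interest on A after the transfer but on B before it, so the total is 2126 instead of the 2120 that either serial order produces.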

Scheduling Transactions

Serial schedule: Schedule that does not interleave the actions of different transactions.

Equivalent schedules: For any database state, the effect (on the set of objects in the database) of executing the first schedule is identical to the effect of executing the second schedule.

Serializable schedule: A schedule that is equivalent to some serial execution of the transactions.

(Note: If each transaction preserves consistency, every serializable schedule preserves consistency.)

Schedules classified on recoverability:
Recoverable schedule: One where no committed transaction needs to be rolled back. A schedule S is recoverable if no transaction T in S commits until all transactions T' that have written an item that T reads have committed.
Cascadeless schedule: One where every transaction reads only the items that are written by committed transactions.
Schedules requiring cascaded rollback: A schedule in which uncommitted transactions that read an item from a failed transaction must be rolled back.

Strict Schedules: A schedule in which a transaction can neither read nor write an item X until the last transaction that wrote X has committed.
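The three classes can be told apart by scanning a schedule once. The sketch below is a simplification under stated assumptions: a schedule is a list of (op, txn, item) triples with op one of 'r', 'w', 'c' (commit), aborts are not modeled, and the function name classify is hypothetical.

```python
def classify(schedule):
    """Classify a schedule of (op, txn, item) triples; item is None for commits."""
    last_writer = {}     # item -> transaction that most recently wrote it
    committed = set()
    reads_from = {}      # txn -> set of other transactions whose writes it has read
    recoverable = cascadeless = strict = True

    for op, t, x in schedule:
        if op == "c":
            # T may commit only after every transaction it read from has committed.
            if not reads_from.get(t, set()) <= committed:
                recoverable = False
            committed.add(t)
            continue
        w = last_writer.get(x)
        uncommitted_writer = w is not None and w != t and w not in committed
        if op == "r":
            if w is not None and w != t:
                reads_from.setdefault(t, set()).add(w)
            if uncommitted_writer:       # read of an uncommitted write
                cascadeless = False
                strict = False
        elif op == "w":
            if uncommitted_writer:       # overwrite of an uncommitted write
                strict = False
            last_writer[x] = t

    return {"recoverable": recoverable, "cascadeless": cascadeless, "strict": strict}

s1 = [("w", "T1", "X"), ("r", "T2", "X"), ("c", "T2", None), ("c", "T1", None)]
s2 = [("w", "T1", "X"), ("r", "T2", "X"), ("c", "T1", None), ("c", "T2", None)]
s3 = [("w", "T1", "X"), ("w", "T2", "X"), ("c", "T1", None), ("c", "T2", None)]
s4 = [("w", "T1", "X"), ("c", "T1", None), ("r", "T2", "X"), ("c", "T2", None)]

for s in (s1, s2, s3, s4):
    print(classify(s))
```

In these examples s1 is not recoverable (T2 commits before T1), s2 is recoverable but not cascadeless, s3 is cascadeless but not strict, and s4 is strict.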

Serial schedule: A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule. Otherwise, the schedule is called nonserial schedule.

Serializable schedule: A schedule S is serializable if it is equivalent to some serial schedule of the same n transactions

Result equivalent: Two schedules are called result equivalent if they produce the same final state of the database.

Conflict equivalent: Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules.

Conflict serializable: A schedule S is said to be conflict serializable if it is conflict equivalent to some serial schedule S'.

Being serializable is not the same as being serial. Being serializable implies that the schedule is a correct schedule. It will leave the database in a consistent state. The interleaving is appropriate and will result in a state as if the transactions were serially executed, yet will achieve efficiency due to concurrent execution.

Serializability is hard to check. Interleaving of operations occurs in an operating system through some scheduler, and it is difficult to determine beforehand how the operations in a schedule will be interleaved.

Practical approach: Come up with methods (protocols) to ensure serializability.


It is not possible to determine when a schedule begins and when it ends. Hence, we reduce the problem of checking the whole schedule to checking only a committed projection of the schedule (i.e., operations from only the committed transactions).

Current approach used in most DBMSs: Use of locks with two phase locking

View equivalence: A less restrictive definition of equivalence of schedules

View serializability: Definition of serializability based on view equivalence. A schedule is view serializable if it is view equivalent to a serial schedule.

Two schedules S and S' are said to be view equivalent if the following three conditions hold:
1. The same set of transactions participates in S and S', and S and S' include the same operations of those transactions.
2. For any operation Ri(X) of Ti in S, if the value of X read by the operation has been written by an operation Wj(X) of Tj (or if it is the original value of X before the schedule started), the same condition must hold for the value of X read by operation Ri(X) of Ti in S'.
3. If the operation Wk(Y) of Tk is the last operation to write item Y in S, then Wk(Y) of Tk must also be the last operation to write item Y in S'.

The premise behind view equivalence: As long as each read operation of a transaction reads the result of the same write operation in both schedules, the write operations of each transaction must produce the same results. The view: the read operations are said to see the same view in both schedules.

Relationship between view and conflict equivalence: The two are the same under the constrained write assumption, which assumes that if T writes X, it is constrained by the value of X it read; i.e., new X = f(old X).

Conflict serializability is stricter than view serializability. With unconstrained write (or blind write), a schedule that is view serializable is not necessarily conflict serializable. Any conflict serializable schedule is also view serializable, but not vice versa.

Relationship between view and conflict equivalence
Consider the following schedule of three transactions T1: r1(X), w1(X); T2: w2(X); and T3: w3(X):
Schedule Sa: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;
In Sa, the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read the value of X. Sa is view serializable, since it is view equivalent to the serial schedule T1, T2, T3. However, Sa is not conflict serializable, since it is not conflict equivalent to any serial schedule.
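The claim that Sa is view serializable can be checked with a brute-force sketch. It assumes a schedule is a list of (op, txn, item) triples with commits left out, compares the "reads-from" relation and the final writer of each item, and simply tries every serial order, which is only practical for a handful of transactions. The names view_signature and view_serializable are made up for the example.

```python
from itertools import permutations

def view_signature(schedule):
    """Return what each read sees and which transaction writes each item last."""
    last_writer = {}     # item -> most recent writer so far (absent = initial value)
    read_counts = {}
    reads = set()
    for op, t, x in schedule:
        if op == "r":
            n = read_counts.get((t, x), 0)       # n-th read of x by t
            read_counts[(t, x)] = n + 1
            reads.add((t, x, n, last_writer.get(x)))
        elif op == "w":
            last_writer[x] = t
    return frozenset(reads), last_writer

def view_serializable(schedule):
    """Return a view-equivalent serial order of the transactions, or None."""
    txns = sorted({t for _, t, _ in schedule})
    target = view_signature(schedule)
    for order in permutations(txns):
        # Build the serial schedule: all operations of each transaction in turn.
        serial = [step for t in order for step in schedule if step[1] == t]
        if view_signature(serial) == target:
            return order
    return None

# Schedule Sa from the text: r1(X); w2(X); w1(X); w3(X)
sa = [("r", "T1", "X"), ("w", "T2", "X"), ("w", "T1", "X"), ("w", "T3", "X")]
print(view_serializable(sa))
```

For Sa the search finds the serial order T1, T2, T3: in both schedules r1(X) reads the initial value of X and T3 writes the final value.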

Testing for conflict serializability: Algorithm 17.1:
1. Looks at only read_Item (X) and write_Item (X) operations.
2. Constructs a precedence graph (serialization graph), a graph with directed edges.
3. An edge is created from Ti to Tj if one of the operations in Ti appears before a conflicting operation in Tj.
4. The schedule is serializable if and only if the precedence graph has no cycles.
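The four steps can be sketched as follows. The representation of a schedule as (op, txn, item) triples and the function names are assumptions for this illustration; the cycle test is an ordinary depth-first search.

```python
def precedence_graph(schedule):
    """Build the set of edges (Ti, Tj) from a list of (op, txn, item) triples."""
    edges = set()
    for i, (op1, t1, x1) in enumerate(schedule):
        for op2, t2, x2 in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as a set of edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)

def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

# r1(X); w2(X); w1(X); w3(X): edges T1->T2 and T2->T1 form a cycle.
sa = [("r", "T1", "X"), ("w", "T2", "X"), ("w", "T1", "X"), ("w", "T3", "X")]
# r1(X); w1(X); r2(X); w2(X): the only edge is T1->T2.
sb = [("r", "T1", "X"), ("w", "T1", "X"), ("r", "T2", "X"), ("w", "T2", "X")]

print(conflict_serializable(sa), conflict_serializable(sb))
```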

Lock-Based Concurrency Control

Strict Two-phase Locking (Strict 2PL) Protocol:
- Each Xact must obtain an S (shared) lock on an object before reading, and an X (exclusive) lock on an object before writing.
- All locks held by a transaction are released when the transaction completes.
- If an Xact holds an X lock on an object, no other Xact can get a lock (S or X) on that object.
Strict 2PL allows only serializable schedules.

Aborting a Transaction
If a transaction Ti is aborted, all its actions have to be undone. Not only that, if Tj reads an object last written by Ti, Tj must be aborted as well.
Most systems avoid such cascading aborts by releasing a transaction's locks only at commit time.
- If Ti writes an object, Tj can read this only after Ti commits.
In order to undo the actions of an aborted transaction, the DBMS maintains a log in which every write is recorded. This mechanism is also used to recover from system crashes: all active Xacts at the time of the crash are aborted when the system comes back up.

Recovering From a Crash
There are 3 phases in the ARIES recovery algorithm:
- Analysis: Scan the log forward (from the most recent checkpoint) to identify all Xacts that were active, and all dirty pages in the buffer pool at the time of the crash.
- Redo: Redoes all updates to dirty pages in the buffer pool, as needed, to ensure that all logged updates are in fact carried out and written to disk.
- Undo: The writes of all Xacts that were active at the crash are undone (by restoring the before value of the update, which is in the log record for the update), working backwards in the log. (Some care must be taken to handle the case of a crash occurring during the recovery process!)

Conflict Serializable Schedules
Two schedules are conflict equivalent if they:
- Involve the same actions of the same transactions
- Order every pair of conflicting actions the same way
Schedule S is conflict serializable if S is conflict equivalent to some serial schedule.

Example A schedule that is not conflict serializable:

The cycle in the graph reveals the problem. The output of T1 depends on T2, and vice versa.

Dependency Graph
Dependency graph: One node per Xact; edge from Ti to Tj if Tj reads/writes an object last written by Ti.
Theorem: A schedule is conflict serializable if and only if its dependency graph is acyclic.

Review: Strict 2PL
Strict Two-phase Locking (Strict 2PL) Protocol:
- Each Xact must obtain an S (shared) lock on an object before reading, and an X (exclusive) lock on an object before writing.
- All locks held by a transaction are released when the transaction completes.
- If an Xact holds an X lock on an object, no other Xact can get a lock (S or X) on that object.
Strict 2PL allows only schedules whose precedence graph is acyclic.

Two-Phase Locking (2PL)
Two-Phase Locking Protocol:
- Each Xact must obtain an S (shared) lock on an object before reading, and an X (exclusive) lock on an object before writing.
- A transaction cannot request additional locks once it releases any locks.
- If an Xact holds an X lock on an object, no other Xact can get a lock (S or X) on that object.
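The rule that no lock may be requested after the first release splits a transaction into a growing phase and a shrinking phase. A small sketch, assuming the lock and unlock requests of one transaction are given as a list of (kind, item) pairs; the function name is_two_phase is hypothetical.

```python
def is_two_phase(actions):
    """Check that no 'lock' request follows an 'unlock' in one transaction's actions."""
    shrinking = False
    for kind, _item in actions:
        if kind == "unlock":
            shrinking = True              # the shrinking phase has begun
        elif kind == "lock" and shrinking:
            return False                  # acquiring after releasing violates 2PL
    return True

two_phase = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
not_two_phase = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]

print(is_two_phase(two_phase), is_two_phase(not_two_phase))
```

Under Strict 2PL the check is trivially satisfied, because all unlocks happen together when the transaction completes.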

View Serializability
Schedules S1 and S2 are view equivalent if:
- If Ti reads initial value of A in S1, then Ti also reads initial value of A in S2
- If Ti reads value of A written by Tj in S1, then Ti also reads value of A written by Tj in S2
- If Ti writes final value of A in S1, then Ti also writes final value of A in S2

Lock Management
Lock and unlock requests are handled by the lock manager.
Lock table entry:
- Number of transactions currently holding a lock
- Type of lock held (shared or exclusive)
- Pointer to queue of lock requests
Locking and unlocking have to be atomic operations.
Lock upgrade: a transaction that holds a shared lock can be upgraded to hold an exclusive lock.
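A lock table along these lines might look as follows. This is a single-threaded sketch under several assumptions: the class LockManager and its methods are invented for the example, the set of holders stands in for the "number of transactions" field, waiting requests are granted in FIFO order, and the latch that would make lock and unlock atomic in a real system is omitted.

```python
from collections import deque

class LockManager:
    def __init__(self):
        # item -> {"mode": "S", "X" or None, "holders": set of txns, "queue": deque}
        self.table = {}

    def _entry(self, item):
        return self.table.setdefault(
            item, {"mode": None, "holders": set(), "queue": deque()})

    def lock(self, txn, item, mode):
        """Return True if the lock is granted, False if the request is queued."""
        e = self._entry(item)
        if not e["holders"]:
            e["mode"], e["holders"] = mode, {txn}
            return True
        if mode == "S" and e["mode"] == "S" and not e["queue"]:
            e["holders"].add(txn)         # shared locks are compatible
            return True
        if mode == "X" and e["holders"] == {txn}:
            e["mode"] = "X"               # lock upgrade by the sole holder
            return True
        e["queue"].append((txn, mode))
        return False

    def unlock(self, txn, item):
        """Release txn's lock and return the transactions granted a lock as a result."""
        e = self._entry(item)
        e["holders"].discard(txn)
        granted = []
        if e["holders"]:
            return granted
        e["mode"] = None
        while e["queue"]:
            t, m = e["queue"][0]
            # Stop once the next request is incompatible with the locks just granted.
            if e["holders"] and "X" in (m, e["mode"]):
                break
            e["queue"].popleft()
            e["mode"] = m
            e["holders"].add(t)
            granted.append(t)
        return granted

lm = LockManager()
print(lm.lock("T1", "A", "S"))    # granted
print(lm.lock("T2", "A", "S"))    # granted, shared with T1
print(lm.lock("T3", "A", "X"))    # queued behind the shared holders
print(lm.unlock("T1", "A"))       # T2 still holds the lock
print(lm.unlock("T2", "A"))       # T3 now receives its exclusive lock
```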

Deadlocks
Deadlock: Cycle of transactions waiting for locks to be released by each other.
Two ways of dealing with deadlocks:
- Deadlock prevention
- Deadlock detection

Deadlock Prevention
Assign priorities based on timestamps. Assume Ti wants a lock that Tj holds. Two policies are possible:
- Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti aborts
- Wound-Wait: If Ti has higher priority, Tj aborts; otherwise Ti waits
If a transaction restarts, make sure it has its original timestamp.
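The two policies reduce to a comparison of timestamps. The sketch below assumes that a smaller timestamp means an older transaction and therefore a higher priority; the function names and the returned strings are chosen for the example only.

```python
def wait_die(ts_requester, ts_holder):
    """Ti (requester) wants a lock held by Tj (holder); smaller timestamp = higher priority."""
    return "Ti waits" if ts_requester < ts_holder else "Ti aborts"

def wound_wait(ts_requester, ts_holder):
    """Same situation under Wound-Wait: a higher-priority requester aborts the holder."""
    return "Tj aborts" if ts_requester < ts_holder else "Ti waits"

# An older transaction (timestamp 5) requests a lock held by a younger one (timestamp 9).
print(wait_die(5, 9), "/", wound_wait(5, 9))
# A younger transaction (timestamp 9) requests a lock held by an older one (timestamp 5).
print(wait_die(9, 5), "/", wound_wait(9, 5))
```

In both policies it is always the lower-priority transaction that is aborted, so no cycle of waiting transactions can form. Keeping the original timestamp on restart lets an aborted transaction eventually become the oldest one, so it is not aborted forever.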

Deadlock Detection
Create a waits-for graph:
- Nodes are transactions
- There is an edge from Ti to Tj if Ti is waiting for Tj to release a lock
Periodically check for cycles in the waits-for graph.
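The periodic check amounts to cycle detection in a directed graph. A minimal sketch, assuming the waits-for graph is kept as a dictionary that maps each transaction to the set of transactions it is waiting for; the function name find_deadlock is hypothetical.

```python
def find_deadlock(waits_for):
    """Return one cycle of transactions as a list, or None if there is no deadlock."""
    visited = set()

    def dfs(node, path):
        if node in path:
            return path[path.index(node):]    # back edge: the cycle starts at `node`
        if node in visited:
            return None                       # already explored, no cycle through it
        visited.add(node)
        for nxt in waits_for.get(node, ()):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in waits_for:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

# T1 waits for T2, T2 for T3, T3 for T1; T4 waits for T1 but is not part of the cycle.
deadlocked = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}, "T4": {"T1"}}
no_deadlock = {"T1": {"T2"}, "T2": {"T3"}}

print(find_deadlock(deadlocked))
print(find_deadlock(no_deadlock))
```

Once a cycle is found, one of its transactions has to be aborted to break the deadlock; which one to choose is a policy decision not covered by this sketch.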


Multiple-Granularity Locks Hard to decide what granularity to lock (tuples vs. pages vs. tables).
