
WHAT IS A DATABASE?

A database is a collection of data related to a particular topic or purpose that can be processed to produce information. Data mostly represents recordable facts, and it aids in producing information, which is based on those facts. For example, if we have data about the marks obtained by all students, we can then draw conclusions about toppers and average marks.
A database management system (DBMS) is a system that stores and retrieves information in a database. It helps you organize your data by subject, so that it is easy to track and verify, and it lets you store information about how different subjects are related, so that related data can easily be brought together. A DBMS stores data in such a way that it becomes easier to retrieve, manipulate, and produce information.
Characteristics
A modern DBMS has the following characteristics:
Real-world entity A modern DBMS is more realistic and uses real-world entities to design its architecture. It uses their behavior and attributes too. For example, a school database may use students as an entity and their age as an attribute.
Relation-based tables A DBMS allows entities and the relations among them to form tables. A user can understand the architecture of a database just by looking at the table names.
Less redundancy A DBMS follows the rules of normalization, which split a relation when any of its attributes has redundant values.
Consistency Consistency is a state where every relation in a database remains consistent. There exist methods and techniques that can detect an attempt to leave the database in an inconsistent state.
Query Language A DBMS is equipped with a query language, which makes it more efficient to retrieve and manipulate data. A user can apply as many and as varied filtering options as required to retrieve a set of data. This was traditionally not possible with file-processing systems.
Multiuser and Concurrent Access A DBMS supports a multi-user environment and allows users to access and manipulate data in parallel.
ACID PROPERTY
ACID properties are an important concept for databases. The acronym stands for Atomicity, Consistency,
Isolation, and Durability.
The ACID properties of a DBMS allow safe sharing of data. Without these properties, everyday occurrences such as using computer systems to buy products would be difficult and the potential for inaccuracy would be huge. Imagine more than one person trying to buy the same size and color of a sweater at the same time -- a regular occurrence. The ACID properties make it possible for the merchant to keep these sweater-purchasing transactions from overlapping each other -- saving the merchant from erroneous inventory and account balances.
A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability, commonly known as the ACID properties, in order to ensure accuracy, completeness, and data integrity.

Atomicity This property states that a transaction must be treated as an atomic unit, that
is, either all of its operations are executed or none. There must be no state in a database
where a transaction is left partially completed. States should be defined either before the
execution of the transaction or after the execution/abortion/failure of the transaction.

Consistency The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the

database was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well.

Isolation In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if each were the only transaction in the system. No transaction will affect the existence of any other transaction.

Durability The database should be durable enough to hold all its latest updates even if
the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction commits but the
system fails before the data could be written on to the disk, then that data will be updated
once the system springs back into action.
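As a minimal illustration of these properties, consider a funds transfer written as a single transaction. The accounts table and its columns below are assumptions for the sketch, not part of the examples used elsewhere in this text:

BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
-- Atomicity: if either UPDATE fails, a ROLLBACK restores the state before the transaction.
-- Consistency: the total balance across the two accounts is unchanged after COMMIT.
-- Isolation: concurrent transfers behave as if they ran one after another.
-- Durability: once COMMIT returns, the changes survive a crash or restart.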

Advantages of DBMS
The database management system has a number of advantages as compared to the traditional file-based processing approach. The DBA must keep these benefits and capabilities in mind when designing databases and while monitoring the DBMS.
The main advantages of DBMS are described below.
Controlling Data Redundancy
In non-database systems, each application program has its own private files. In this case, duplicated copies of the same data are created in many places. In a DBMS, all data of an organization is integrated into a single database. The data is recorded in only one place in the database and is not duplicated.
Sharing of Data
In DBMS, data can be shared by authorized users of the organization. The database administrator
manages the data and gives rights to users to access the data. Many users can be authorized to access
the same piece of information simultaneously. Remote users can also share the same data. Similarly, the data of the same database can be shared between different application programs.
Data Consistency
By controlling data redundancy, data consistency is obtained. If a data item appears only once, any update to its value has to be performed only once, and the updated value is immediately available to all users. When the DBMS has controlled redundancy, the database system enforces consistency.
Integration of Data
In a database management system, data is stored in tables. A single database contains multiple tables, and relationships can be created between tables (or associated data entities). This makes it easy to retrieve and update data.
Data Security
The DBA controls access to the database by granting different rights to different users, so data is protected from unauthorized access and modification.
Forms
A form is a very important object of a DBMS. You can create forms very easily and quickly in a DBMS. Once a form is created, it can be used many times and it can be modified very easily. Created forms are saved along with the database and behave like a software component. A form provides a very easy, user-friendly way to enter data into the database, edit data, and display data from the database. Non-technical users can also perform various operations on the database through forms without going into the technical details of the database.
Users

A typical DBMS has users with different rights and permissions who use it for different purposes. Some
users retrieve data and some back it up. The users of a DBMS can be broadly categorized as follows

Administrators Administrators maintain the DBMS and are responsible for administering the database. They are responsible for overseeing its usage and deciding by whom it should be used. They create access profiles for users and apply limitations to maintain isolation and enforce security. Administrators also look after DBMS resources like the system license, required tools, and other software- and hardware-related maintenance.

Designers Designers are the group of people who actually work on the designing part
of the database. They keep a close watch on what data should be kept and in what
format. They identify and design the whole set of entities, relations, constraints, and
views.

End Users End users are those who actually reap the benefits of having a DBMS. End
users can range from simple viewers who pay attention to the logs or market rates to
sophisticated users such as business analysts.

DBMS ARCHITECTURE
The design of a DBMS depends on its architecture. It can be centralized or decentralized or hierarchical.
The architecture of a DBMS can be seen as either single tier or multi-tier. An n-tier architecture divides
the whole system into related but independent n modules, which can be independently modified, altered,
changed, or replaced.
In 1-tier architecture, the DBMS is the only entity where the user directly sits on the DBMS and uses it. Any changes done here will directly be done on the DBMS itself. It does not provide handy tools for end-users. Database designers and programmers normally prefer to use single-tier architecture.
If the architecture of DBMS is 2-tier, then it must have an application through which the DBMS can be
accessed. Programmers use 2-tier architecture where they access the DBMS by means of an
application. Here the application tier is entirely independent of the database in terms of operation, design,
and programming.
3-tier Architecture

A 3-tier architecture separates its tiers from each other based on the complexity of the users and how
they use the data present in the database. It is the most widely used architecture to design a DBMS.

Database (Data) Tier At this tier, the database resides along with its query processing languages. We also have the relations that define the data and their constraints at this level.
Application (Middle) Tier At this tier reside the application server and the programs that access the database. For a user, this application tier presents an abstracted view of the database. End-users are unaware of any existence of the database beyond the application.
At the other end, the database tier is not aware of any other user beyond the application tier. Hence, the
application layer sits in the middle and acts as a mediator between the end-user and the database.

User (Presentation) Tier End-users operate on this tier and they know nothing about
any existence of the database beyond this layer. At this layer, multiple views of the
database can be provided by the application. All views are generated by applications that
reside in the application tier.

Multiple-tier database architecture is highly modifiable, as almost all its components are independent and
can be changed independently.

Components of the Database System Environment

There are five major components in the database system environment, and their interrelationships are described below.
Hardware
Software
Data
Users
Procedures

1. Hardware: The hardware is the actual computer system used for keeping and accessing the database. Conventional DBMS hardware consists of secondary storage devices, usually hard disks, on which the database physically resides, together with the associated input-output devices, device controllers and so forth. Databases run on a range of machines, from microcomputers to large mainframes.
2. Software: The software is the actual DBMS. Between the physical database itself (i.e. the data as actually stored) and the users of the system is a layer of software, usually called the Database Management System or DBMS. All requests from users for access to the database are handled by the DBMS. One general function provided by the DBMS is thus the shielding of database users from complex hardware-level detail. The DBMS allows the users to communicate with the database. In a sense, it is the mediator between the database and the users. The DBMS controls the access and helps to maintain the consistency of the data. Utilities are usually included as part of the DBMS.
3. Data: It is the most important component of the DBMS environment from the end user's point of view. Data acts as a bridge between the machine components and the user components. The database contains the operational data and the meta-data, the 'data about data'. The database should contain all the data needed by the organization.
4. Users : There are a number of users who can access
or retrieve data on demand using the applications and
interfaces provided by the DBMS. Each type of user
needs different software capabilities. The users of a
database system can be classified in the following
groups, depending on their degrees of expertise or the
mode of their interactions with the DBMS.

5. Procedures: Procedures refer to the instructions and rules that govern the design and use of the
database. The users of the system and the staff that manage the database require documented procedures
on how to use or run the system.
These may consist of instructions on how to:
Log on to the DBMS.
Use a particular DBMS facility or application program.

Start and stop the DBMS.


Make backup copies of the database.
Handle hardware or software failures.
Change the structure of a table, reorganize the database across multiple disks, improve performance, or
archive data to secondary storage.
Database Schema

A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the
data is organized and how the relations among them are associated. It formulates all the constraints that are to be
applied on the data.
A database schema defines its entities and the relationships among them. It contains a descriptive detail of the database, which can be depicted by means of schema diagrams. It's the database designers who design the schema to help programmers understand the database and make it useful.

A database schema can be divided broadly into two categories

Physical Database Schema This schema pertains to the actual storage of data and its form of storage, like files, indices, etc. It defines how the data will be stored in secondary storage.
Logical Database Schema This schema defines all the logical constraints that need to be applied on the data stored. It defines tables, views, and integrity constraints.

Database Instance

It is important that we distinguish these two terms individually. A database schema is the skeleton of the database. It is designed when the database doesn't exist at all. Once the database is operational, it is very difficult to make any changes to the schema. A database schema does not contain any data or information.
A database instance is a state of an operational database with data at any given time. It contains a snapshot of the database. Database instances tend to change with time. A DBMS ensures that every instance (state) is valid by diligently following all the validations, constraints, and conditions that the database designers have imposed.
ER Model
The ER model defines the conceptual view of a database. It works around real-world entities and the
associations among them. At view level, the ER model is considered a good option for designing
databases.
Entity

An entity can be a real-world object, either animate or inanimate, that is easily identifiable. For example, in a school database, students, teachers, classes, and courses offered can be considered as entities. All these entities have some attributes or properties that give them their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities with attributes sharing similar values. For example, a Students set may contain all the students of a school; likewise, a Teachers set may contain all the teachers of a school from all faculties.
Attributes
Entities are represented by means of their properties, called attributes. All attributes have values. For
example, a student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a student's name
cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.
Types of Attributes

Simple attribute Simple attributes are atomic values, which cannot be divided further. For
example, a student's phone number is an atomic value of 10 digits.

Composite attribute Composite attributes are made of more than one simple attribute.
For example, a student's complete name may have first_name and last_name.

Derived attribute Derived attributes are attributes that do not exist in the physical database, but their values are derived from other attributes present in the database. For example, average_salary in a department should not be saved directly in the database; instead, it can be derived. As another example, age can be derived from date_of_birth.

Single-value attribute Single-value attributes contain a single value. For example, Social_Security_Number.

Multi-value attribute Multi-value attributes may contain more than one value. For example, a person can have more than one phone number, email_address, etc.

These attribute types can come together in the following ways:
1. simple single-valued attributes
2. simple multi-valued attributes
3. composite single-valued attributes
4. composite multi-valued attributes
Entity-Set and Keys

A key is an attribute or a set of attributes that identifies a tuple in a relation. Keys are defined in a table to access or sequence the stored data quickly and smoothly. They are also used to create relationships between different tables.
Types of Keys
Following are the different types of keys.

1. Super key.
2. Candidate key.
3. Primary key.
4. Alternate key.
5. Composite key.
6. Foreign key.
Super Key

A super key is an attribute or combination of attributes in a relation that identifies a tuple uniquely within the relation. A super key is the most general type of key. For example, a relation STUDENT consists of different attributes like RegistrationNo, Name, FatherName, Class and Address. The only attribute that can uniquely identify a tuple in the relation is RegistrationNo. The Name attribute cannot identify a tuple because two or more students may have the same name. Similarly, FatherName, Class and Address cannot be used to identify a tuple. It means that RegistrationNo is a super key for the relation. Any combination of attributes that includes the super key is also a super key. It means any attribute or set of attributes combined with the super key RegistrationNo will also be a super key. A combination of two attributes {RegistrationNo, Name} is also a super key. This combination can also be used to identify a tuple in the relation. Similarly, {RegistrationNo, Class} or {RegistrationNo, Name, Class} are super keys.
Candidate Key
A candidate key is a super key that contains no extra attribute; it consists of the minimum possible attributes. A super key like {RegistrationNo, Name} contains an extra field, Name. It can be used to identify a tuple uniquely in the relation, but it does not consist of the minimum possible attributes, since RegistrationNo alone can be used to identify a tuple. It means that {RegistrationNo, Name} is a super key but not a candidate key, because it contains an extra field. On the other hand, RegistrationNo is a super key as well as a candidate key.
Primary Key
A primary key is a candidate key that is selected by the database designer to identify tuples uniquely in a
relation. A relation may contain many candidate keys. When the designer selects one of them to identify a
tuple in the relation, it becomes a primary key. It means that if there is only one candidate key, it will be
automatically selected as primary key.
The most important points about a primary key are:
1. A relation can have only one primary key.
2. Each value in the primary key attribute must be unique.
3. A primary key cannot contain null values.
Suppose a relation STUDENT contains different attributes such as RegNo, Name and Class. The attribute RegNo uniquely identifies each student in the table, so it can be used as the primary key for this table. The attribute Name cannot uniquely identify each row because two students can have the same name, so it cannot be used as a primary key.
Alternate Key
The candidate keys that are not selected as the primary key are known as alternate keys. Suppose the STUDENT relation contains different attributes such as RegNo, RollNo, Name and Class. The attributes RegNo and RollNo can each be used to identify a student in the table. If RegNo is selected as the primary key, then the RollNo attribute is known as an alternate key.

Composite Key
A primary key that consists of two or more attributes is known as composite key.
Foreign Key
A foreign key is an attribute or set of attributes in a relation whose values match a primary key in another relation. The relation in which the foreign key is created is known as the dependent table or child table. The relation to which the foreign key refers is known as the parent table. The key connects one relation to another when a relationship is established between two relations. A relation may contain more than one foreign key.
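The sketch below illustrates these key types using the STUDENT attributes mentioned above; the MARKS table, its columns, and all data types are assumptions added only for illustration:

CREATE TABLE STUDENT
(
RegNo int PRIMARY KEY,             -- primary key (the chosen candidate key)
RollNo int UNIQUE,                 -- alternate key (a candidate key not chosen as primary)
Name varchar(255),
Class varchar(50)
);

CREATE TABLE MARKS
(
RegNo int,
Subject varchar(50),
Score int,
PRIMARY KEY (RegNo, Subject),                    -- composite key
FOREIGN KEY (RegNo) REFERENCES STUDENT(RegNo)    -- foreign key to the parent table
);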

Relationship
The association among entities is called a relationship. For example, an employee works_at a department,
a student enrolls in a course. Here, Works_at and Enrolls are called relationships.
Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a relationship too can have
attributes. These attributes are called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.

Binary = degree 2
Ternary = degree 3
n-ary = degree n

Mapping Cardinalities

Cardinality defines the number of entities in one entity set that can be associated with entities of another set via a relationship set.

One-to-one One entity from entity set A can be associated with at most one entity of entity set B, and vice versa.
One-to-many One entity from entity set A can be associated with more than one entity of entity set B; however, an entity from entity set B can be associated with at most one entity of entity set A.
Many-to-one More than one entity from entity set A can be associated with at most one entity of entity set B; however, an entity from entity set B can be associated with more than one entity from entity set A.
Many-to-many One entity from A can be associated with more than one entity from B, and vice versa.
ER Diagram Representation

Let us now learn how the ER Model is represented by means of an ER diagram. Any object, for example,
entities, attributes of an entity, relationship sets, and attributes of relationship sets, can be represented with
the help of an ER diagram.
Entity
Entities are represented by means of rectangles. Rectangles are named with the entity set they represent.

Attributes:- Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse represents one attribute and is directly connected to its entity (rectangle).
If the attributes are composite, they are further divided in a tree-like structure, and every node is connected to its attribute. That is, composite attributes are represented by ellipses that are connected with an ellipse.
Multivalued attributes are depicted by a double ellipse.
Derived attributes are depicted by a dashed ellipse.

Relationship
Relationships are represented by a diamond-shaped box. The name of the relationship is written inside the diamond box. All the entities (rectangles) participating in a relationship are connected to it by a line.
Binary Relationship and Cardinality
A relationship where two entities are participating is called a binary relationship. Cardinality is the number of instances of an entity that can be associated with the relationship.

One-to-one When only one instance of an entity is


associated with the relationship, it is marked as '1:1'. The
following image reflects that only one instance of each
entity should be associated with the relationship. It depicts
one-to-one relationship.

One-to-many When more than one instance of an


entity is associated with a relationship, it is marked
as '1:N'. The following image reflects that only one
instance of entity on the left and more than one
instance of an entity on the right can be associated
with the relationship. It depicts one-to-many
relationship

Many-to-one When more than one instance of entity is


associated with the relationship, it is marked as 'N:1'. The
following image reflects that more than one instance of an
entity on the left and only one instance of an entity on the
right can be associated with the relationship. It depicts
many-to-one relationship.

Many-to-many The following image reflects that more than one instance of an entity on the left and more than one instance of an entity on the right can be associated with the relationship. It depicts a many-to-many relationship.

Participation Constraints

Total Participation Each entity is involved in the relationship. Total participation is


represented by double lines.

Partial participation Not all entities are involved in the relationship. Partial participation is
represented by single lines.

Disadvantages of DBMS
The disadvantages of the database approach are summarized as follows:
1. Complexity : The provision of the functionality that is expected of a good DBMS makes the DBMS an extremely
complex piece of software. Failure to understand the system can lead to bad design decisions, which can have serious
consequences for an organization.
2. Size : The complexity and breadth of functionality makes the DBMS an extremely large piece of software,
occupying many megabytes of disk space and requiring substantial amounts of memory to run efficiently.
3. Performance: Typically, a file-based system is written for a specific application, such as invoicing. As a result, performance is generally very good. However, the DBMS is written to be more general, to cater for many applications rather than just one. The effect is that some applications may not run as fast as they used to.
4. Higher impact of a failure: The centralization of resources increases the vulnerability of the system. Since all users and applications rely on the availability of the DBMS, the failure of any component can bring operations to a halt.
5. Cost of DBMS: The cost of DBMS varies significantly, depending on the environment and functionality provided.
There is also the recurrent annual maintenance cost.
6. Additional Hardware costs: The disk storage requirements for the DBMS and the database may necessitate the
purchase of additional storage space. Furthermore, to achieve the required performance it may be necessary to
purchase a larger machine, perhaps even a machine dedicated to running the DBMS. The procurement of additional
hardware results in further expenditure.
Database Model

A database model is a type of data model that determines the logical structure of a database and fundamentally
determines in which manner data can be stored, organized, and manipulated. The most popular example of a database
model is the relational model, which uses a table-based format.
Common logical data models for databases include:
Hierarchical database model

Network model

Relational model

Entity-relationship model

Hierarchical database Model


A hierarchical database model is a data model in which the data is organized into a tree-like structure. The data is
stored as records which are connected to one another through links. A record is a collection of fields, with each field
containing only one value. The entity type of a record defines which fields the record contains.

The hierarchical database model mandates that each child record


has only one parent, whereas each parent record can have one or
more child records. In order to retrieve data from a hierarchical
database the whole tree needs to be traversed starting from the
root node.

Network model

The network model is a database model conceived as a flexible


way of representing objects and their relationships. While the
hierarchical database model structures data as a tree of records,
with each record having one parent record and many children, the
network model allows each record to have multiple parent and
child records, forming a generalized graph structure.

Relational Model

The most recent and popular model of database design is the relational database
model. This model was developed to
overcome the problems of complexity and
inflexibility of the earlier two models in
handling databases with many-to-many
relationships between entities.
This model is not only simple but also powerful. In a relational database, each file is perceived as a flat file (a two-dimensional table) consisting of many lines (records), each record having key and non-key data item(s). The key item(s) is the data element(s) that identifies the record.
The figure shows the files, and the fields that each record has, in a customer invoicing system. In these files, the key data items are customer id, invoice no, and product code. Each of the files can be used separately to generate reports. However, data can also be obtained from any combination of files, as all these files are related to each other with the help of the key data items specified above.

Generalization- Aggregation- Specialization


The ER Model has the power of expressing database entities in a conceptual hierarchical manner. As the hierarchy
goes up, it generalizes the view of entities, and as we go deep in the hierarchy, it gives us the detail of every entity
included.
Going up in this structure is called generalization, where entities are clubbed together to represent a more generalized
view. For example, a particular student named Mira can be generalized along with all the students. The entity shall be
a student, and further, the student is a person. The reverse is called specialization where a person is a student, and that
student is Mira.

Generalization
As mentioned above, the process of generalizing entities, where the generalized entity contains the common properties of the entities it generalizes, is called generalization. In generalization, a number of entities are brought together into one generalized entity based on their similar characteristics. For example, pigeon, house sparrow, crow and dove can all be generalized as Birds.

Specialization
Specialization is the opposite of generalization. In
specialization, a group of entities is divided into sub-groups
based on their characteristics. Take a group Person for
example. A person has name, date of birth, gender, etc.
These properties are common in all persons, human beings.
But in a company, persons can be identified as employee,
employer, customer, or vendor, based on what role they play
in the company.
Similarly, in a school database, persons can be specialized as a teacher, a student, or a staff member, based on the role they play in the school as entities.
Aggregation
Aggregation is a process in which the relationship between two entities is treated as a single entity. Here, the relationship between Center and Course acts as an entity in a relationship with Visitor.

Codd's 12 Rules
Dr Edgar F. Codd, after his extensive research on the Relational Model of database systems, came up with twelve
rules of his own, which according to him, a database must obey in order to be regarded as a true relational database.
These rules can be applied on any database system that manages stored data using only its relational capabilities.
The foundation rule, which acts as a base for all the other rules, states that the system must manage the database entirely through its relational capabilities.
Rule 1: Information Rule
The data stored in a database, be it user data or metadata, must be a value of some table cell. Everything in a database must be stored in a table format.
Rule 2: Guaranteed Access Rule

Every single data element (value) is guaranteed to be accessible logically with a combination of table-name,
primary-key (row value), and attribute-name (column value).
Rule 3: Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is a very important rule because a NULL can be interpreted as one of the following: data is missing, data is not known, or data is not applicable.
Rule 4: Active Online Catalog
The structure description of the entire database must be stored in an online catalog, known as data dictionary,
which can be accessed by authorized users. Users can use the same query language to access the catalog which they
use to access the database itself.
Rule 5: Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that supports data definition, data
manipulation, and transaction management operations. This language can be used directly or by means of some
application. If the database allows access to data without any help of this language, then it is considered as a
violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by the system.
Rule 7: High-Level Insert, Update, and Delete Rule
A database must support high-level insert, update, and delete operations. These must not be limited to a single row; that is, the database must also support union, intersection and minus operations to yield sets of data records.
Rule 8: Physical Data Independence
The data stored in a database must be independent of the applications that access the database. Any change in the
physical structure of a database must not have any impact on how the data is being accessed by external applications.
Rule 9: Logical Data Independence
The logical data in a database must be independent of its user's view (application). Any change in logical data must not affect the applications using it. For example, if two tables are merged or one is split into two different tables, there should be no impact or change on the user application. This is one of the most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can be independently
modified without the need of any change in the application. This rule makes a database independent of the front-end
application and its interface.
Rule 11: Distribution Independence

The end-user must not be able to see that the data is distributed over various locations. Users should always get the
impression that the data is located at one site only. This rule has been regarded as the foundation of distributed
database systems.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the interface must not be able to subvert
the system and bypass security and integrity constraints.
Relational data model
Relational data model is the primary data model, which is used widely around the world for data storage and
processing. This model is simple and it has all the properties and capabilities required to process data with storage
efficiency.
Concepts
Tables In the relational data model, relations are saved in the form of tables. This format stores the relation among entities. A table has rows and columns, where rows represent records and columns represent attributes.
Tuple A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.
Relation schema A relation schema describes the relation name (table name), attributes, and their names.
Relation key Each row has one or more attributes, known as the relation key, which can identify the row in the relation (table) uniquely.
Attribute domain Every attribute has some pre-defined value scope, known as attribute domain.
Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are called
Relational Integrity Constraints. There are three main integrity constraints:
Key constraints
Domain constraints
Referential integrity constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation which can identify a tuple uniquely. This minimal subset of attributes is called the key for that relation. If there is more than one such minimal subset, these are called candidate keys.
Key constraints force that:
in a relation with a key attribute, no two tuples can have identical values for the key attributes.
a key attribute cannot have NULL values.
Key constraints are also referred to as entity constraints.


Domain Constraints
Attributes have specific values in real-world scenarios. For example, age can only be a positive integer. The same kinds of constraints are applied to the attributes of a relation. Every attribute is bound to have a specific range of values. For example, age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.
Referential integrity Constraints
Referential integrity constraints work on the concept of foreign keys. A foreign key is a key attribute of a relation that can be referred to in another relation.
The referential integrity constraint states that if a relation refers to a key attribute of a different or the same relation, then that key element must exist.
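The sketch below shows all three constraint types in one place; the table and column names, and the data types, are assumptions for illustration only:

CREATE TABLE Person
(
person_id int PRIMARY KEY,       -- key constraint: unique and not NULL
age int CHECK (age >= 0),        -- domain constraint on the allowed range of values
phone varchar(15)
);

CREATE TABLE PhoneCall
(
call_id int PRIMARY KEY,
caller_id int,
FOREIGN KEY (caller_id) REFERENCES Person(person_id)   -- referential integrity constraint
);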

ER Model to Relational Model


ER Model, when conceptualized into diagrams, gives a good overview of entity-relationship, which is
easier to understand. ER diagrams can be mapped to relational schema, that is, it is possible to create
relational schema using ER diagram. We cannot import all the ER constraints into relational model, but an
approximate schema can be generated.
There are several processes and algorithms available to convert ER Diagrams into Relational Schema.
Some of them are automated and some of them are manual. We may focus here on the mapping diagram
contents to relational basics.
ER diagrams mainly comprise
Entity and its attributes
Relationship, which is association among entities.
Mapping Entity
An entity is a real-world object with some attributes.
Mapping Process (Algorithm)
Create table for each entity.
Entity's attributes should become fields of tables with their
respective data types.
Declare primary key.

Mapping Relationship
A relationship is an association among
entities.

Mapping Process
Create table for a relationship.
Add the primary keys of all participating Entities as fields of table with their respective data
types.
If relationship has any attribute, add each attribute as field of table.
Declare a primary key composing all the primary keys of participating entities.
Declare all foreign key constraints.
Mapping Weak Entity Sets
A weak entity set is one which does not have
any primary key associated with it.
Mapping Process
Create table for weak entity set.
Add all its attributes to table as field.
Add the primary key of identifying entity set.
Declare all foreign key constraints.
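As a concrete illustration of the mapping processes above, a hypothetical Student entity, Course entity, and Enrolls relationship could be mapped to relational tables as follows (all names and data types here are assumptions for the sketch):

CREATE TABLE Student
(
student_id int PRIMARY KEY,      -- the entity's key becomes the primary key
name varchar(100)                -- an entity attribute becomes a field
);

CREATE TABLE Course
(
course_id int PRIMARY KEY,
title varchar(100)
);

-- The Enrolls relationship becomes its own table whose primary key is composed
-- of the primary keys of the participating entities.
CREATE TABLE Enrolls
(
student_id int,
course_id int,
enroll_date date,                -- a descriptive attribute of the relationship
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES Student(student_id),
FOREIGN KEY (course_id) REFERENCES Course(course_id)
);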
SQL
SQL stands for Structured Query Language. SQL is a programming language for Relational Databases. It
is designed over relational algebra and tuple relational calculus. SQL comes as a package with all major
distributions of RDBMS.
SQL comprises both data definition and data manipulation languages. Using the data definition properties of SQL, one can design and modify database schemas, whereas the data manipulation properties allow SQL to store and retrieve data from the database.
SQL commands can be divided into three subgroups, DDL, DML and DCL
DDL
DDL is short for Data Definition Language, which deals with database schemas and descriptions of how the data should reside in the database.
CREATE creates a database and its objects (table, index, etc.)
ALTER alters the structure of an existing database object
DROP deletes objects from the database
TRUNCATE removes all records from a table, including all space allocated for the records
COMMENT adds comments to the data dictionary
RENAME renames an object
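A short sketch of these DDL commands in use; the Persons table here is a reduced version of the one created later in this text, and the added City column is an assumption for illustration:

CREATE TABLE Persons (PersonID int, LastName varchar(255));
ALTER TABLE Persons ADD City varchar(255);   -- change the table's structure
TRUNCATE TABLE Persons;                      -- remove all rows but keep the table
DROP TABLE Persons;                          -- remove the table itself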


DML
DML is short for Data Manipulation Language, which deals with data manipulation and includes the most common SQL statements such as SELECT, INSERT, UPDATE and DELETE. It is used to store, modify, retrieve, delete and update data in a database.
SELECT retrieves data from a database
INSERT inserts data into a table
UPDATE updates existing data within a table
DELETE deletes records from a table
DCL
DCL is short for Data Control Language, which includes commands such as GRANT and is mostly concerned with rights, permissions and other controls of the database system.
GRANT gives users access privileges to the database
REVOKE withdraws users' access privileges given by using the GRANT command
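A minimal sketch of DCL in use; the user name some_user is an assumption, and the Customers table is the one used in the examples that follow:

GRANT SELECT, INSERT ON Customers TO some_user;    -- allow the user to read and add rows
REVOKE INSERT ON Customers FROM some_user;         -- later withdraw the INSERT privilege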
The SQL CREATE DATABASE Statement
The CREATE DATABASE statement is used to create a database.
SQL CREATE DATABASE Syntax
CREATE DATABASE dbname;
SQL CREATE DATABASE Example
The following SQL statement creates a database called "my_db":
CREATE DATABASE my_db;
The SQL CREATE TABLE Statement
The CREATE TABLE statement is used to create a table in a database.
Tables are organized into rows and columns; and each table must have a name.
SQL CREATE TABLE Syntax
CREATE TABLE table_name
(
column_name1 data_type(size),
column_name2 data_type(size),
....
);
The column_name parameters specify the names of the columns of the table.
The data_type parameter specifies what type of data the column can hold (e.g. varchar, integer, decimal,
date, etc.).
The size parameter specifies the maximum length of the column of the table.
SQL CREATE TABLE Example

Now we want to create a table called "Persons" that contains five columns: PersonID, LastName,
FirstName, Address, and City. We use the following CREATE TABLE statement:
Example

CREATE TABLE Persons


(
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);

The empty "Persons" table will now look like this:

The DROP TABLE Statement


The DROP TABLE statement is used to delete a table.
DROP TABLE table_name
The DROP DATABASE Statement
The DROP DATABASE statement is used to delete a database.
DROP DATABASE database_name
The TRUNCATE TABLE Statement
What if we only want to delete the data inside the table, and not the table itself?
Then, use the TRUNCATE TABLE statement:
TRUNCATE TABLE table_name

Database Tables
A database most often contains one or more tables. Each table is identified by a name (e.g. "Customers" or
"Orders"). Tables contain records (rows) with data.
Below is a selection from the "Customers" table (not reproduced here). The table contains five records (one for each customer) and seven columns (CustomerID, CustomerName, ContactName, Address, City, PostalCode, and Country).
The SQL SELECT Statement
The SELECT statement is used to select data from a database.
The result is stored in a result table, called the result-set.
SQL SELECT Syntax
SELECT column_name,column_name
FROM table_name;
and
SELECT * FROM table_name;
SELECT Column Example
The following SQL statement selects the "CustomerName" and "City" columns from the "Customers" table:
Example
SELECT CustomerName,City
FROM Customers;

SELECT * Example
The following SQL statement selects all the columns from the "Customers" table:
Example
SELECT * FROM Customers;

SQL WHERE Clause


The WHERE clause is used to extract only those records that fulfill a specified criterion.
SQL WHERE Syntax
SELECT column_name,column_name
FROM table_name
WHERE column_name operator value;
WHERE Clause Example

The following SQL statement selects all the


customers from the country "Mexico", in
the "Customers" table:
Example
SELECT * FROM Customers
WHERE Country='Mexico';
Text Fields vs. Numeric Fields
SQL requires single quotes around text values (most database systems will also allow double quotes). However,
numeric fields should not be enclosed in quotes:
Example
SELECT * FROM Customers
WHERE CustomerID=1;

Operators in The WHERE Clause


The following operators can be used in the WHERE clause:

Operator   Description
=          Equal
<>         Not equal (in some versions of SQL this operator may be written as !=)
>          Greater than
<          Less than
>=         Greater than or equal
<=         Less than or equal
BETWEEN    Between an inclusive range
LIKE       Search for a pattern
IN         To specify multiple possible values for a column
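A few short examples of these operators against the Customers table used throughout this text; the specific values are illustrative assumptions:

SELECT * FROM Customers
WHERE CustomerID BETWEEN 1 AND 10;           -- inclusive range

SELECT * FROM Customers
WHERE CustomerName LIKE 'A%';                -- names starting with the letter A

SELECT * FROM Customers
WHERE Country IN ('Germany', 'Mexico');      -- any of several possible values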

The SQL AND & OR Operators


The AND operator displays a record if both the first condition AND the second condition are true.
The OR operator displays a record if either the first condition OR the second condition is true.

AND Operator Example


The following SQL statement selects all customers from the country "Germany" AND the city "Berlin", in the
"Customers" table:
Example
SELECT * FROM Customers
WHERE Country='Germany'
AND City='Berlin';
OR Operator Example
The following SQL statement selects all customers from the city "Berlin" OR "München", in the "Customers" table:
Example
SELECT * FROM Customers
WHERE City='Berlin'
OR City='München';
Combining AND & OR
You can also combine AND and OR (use parentheses to form complex expressions).
The following SQL statement selects all customers from the country "Germany" AND the city "Berlin" OR "München", in the "Customers" table:
Example
SELECT * FROM Customers
WHERE Country='Germany'
AND (City='Berlin' OR City='München');
SQL ORDER BY Keyword
The ORDER BY keyword is used to sort the result-set by one or more columns.
The ORDER BY keyword sorts the records in ascending order by default. To sort the records in a descending order,
you can use the DESC keyword.
SQL ORDER BY Syntax
SELECT column_name, column_name
FROM table_name
ORDER BY column_name ASC|DESC, column_name ASC|DESC;
ORDER BY Example
The following SQL statement selects
all customers from the "Customers"
table, sorted by the "Country"
column:
Example
SELECT * FROM Customers

ORDER BY Country;
ORDER BY Several Columns Example
The following SQL statement selects all customers from the "Customers" table, sorted by the "Country" and the "CustomerName" columns:
Example
SELECT * FROM Customers
ORDER BY Country, CustomerName;

The SQL INSERT INTO Statement


The INSERT INTO statement is used to insert new records in a table.
SQL INSERT INTO Syntax
It is possible to write the INSERT INTO statement in two forms.
The first form does not specify the column names where the data will be inserted, only their values:
INSERT INTO table_name
VALUES (value1,value2,value3,...);
The second form specifies both the column names and the values to be inserted:
INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);

INSERT INTO Example


Assume we wish to insert a new row in the
"Customers" table.
We can use the following SQL statement:
Example
INSERT INTO Customers (CustomerName,
ContactName, Address, City, PostalCode,
Country)
VALUES ('Cardinal','Tom B. Erichsen','Skagen
21','Stavanger','4006','Norway');
Insert Data Only in Specified Columns

It is also possible to only insert data in specific columns.


The following SQL statement will insert a new row, but only insert data in the "CustomerName", "City", and "Country"
columns (and the CustomerID field will of course also be updated automatically):
Example
INSERT INTO Customers (CustomerName,
City, Country)
VALUES ('Cardinal', 'Stavanger', 'Norway');

The SQL UPDATE Statement


The UPDATE statement is used to update existing records in a table.
SQL UPDATE Syntax
UPDATE table_name
SET column1=value1,column2=value2,...
WHERE some_column=some_value;
SQL UPDATE Example
Assume we wish to update the customer "Alfreds
Futterkiste" with a new contact person and city.
We use the following SQL statement:

UPDATE Customers
SET ContactName='Alfred Schmidt', City='Hamburg'
WHERE CustomerName='Alfreds Futterkiste';

The SQL DELETE Statement


The DELETE statement is used to delete rows in a table.
SQL DELETE Syntax
DELETE FROM table_name
WHERE some_column=some_value;
SQL DELETE Example
Assume we wish to delete the customer "Alfreds
Futterkiste" from the "Customers" table. We use the
following SQL statement:
Example
DELETE FROM Customers
WHERE CustomerName='Alfreds Futterkiste' AND

ContactName='Maria Anders';

Delete All Data


It is possible to delete all rows in a table without deleting the table. This means that the table structure, attributes, and
indexes will be intact:
DELETE FROM table_name;
or, in some database systems,
DELETE * FROM table_name;
SQL - Using Joins
The SQL Joins clause is used to combine records from two or more tables in a database. A JOIN is a means for
combining fields from two tables by using values common to each.
Consider the following two tables: (a) the CUSTOMERS table and (b) the ORDERS table, as shown in the source figures.
Now, let us join these two tables in our SELECT statement as follows:
SELECT ID, NAME, AGE, AMOUNT
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result:
Here, it is noticeable that the join is performed in the WHERE clause. Several operators can be used to join tables,
such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT; they can all be used to join tables. However, the most
common operator is the equal symbol.
SQL Join Types:
There are different types of joins available in SQL:

INNER JOIN: returns rows when there is a match in both tables.

LEFT JOIN: returns all rows from the left table, even if there are no matches in the right table.

RIGHT JOIN: returns all rows from the right table, even if there are no matches in the left table.

FULL JOIN: returns rows when there is a match in one of the tables.

SELF JOIN: is used to join a table to itself as if the table were two tables, temporarily renaming at
least one table in the SQL statement.

CARTESIAN JOIN: returns the Cartesian product of the sets of records from the two or more joined
tables.

SQL - INNER JOINS


The most frequently used and important of the joins is the INNER JOIN. It is also referred to as an EQUIJOIN when the join condition is an equality comparison.
The INNER JOIN creates a new result table by combining column values of two tables (table1 and table2) based
upon the join-predicate. The query compares each row of table1 with each row of table2 to find all pairs of rows
which satisfy the join-predicate. When the join-predicate is satisfied, column values for each matched pair of rows
of A and B are combined into a result row.
Syntax:
The basic syntax of INNER JOIN is as follows:
SELECT table1.column1, table2.column2...
FROM table1
INNER JOIN table2
ON table1.common_field = table2.common_field;
Example:
Consider the two tables given earlier: (a) the CUSTOMERS table and (b) the ORDERS table.
Now, let us join these two tables using INNER JOIN as follows:
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
INNER JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result:

SQL - LEFT JOINS


The SQL LEFT JOIN returns all rows from the left table, even if there are no matches in the right table. This means
that if the ON clause matches 0 (zero) records in right table, the join will still return a row in the result, but with
NULL in each column from right table.
This means that a left join returns all the values from the left table, plus matched values from the right table or NULL
in case of no matching join predicate.
Syntax:
The basic syntax of LEFT JOIN is as follows:
SELECT table1.column1, table2.column2...
FROM table1
LEFT JOIN table2
ON table1.common_field = table2.common_field;

Here, the given condition could be any expression based on your requirement.
Example:
Consider the two tables given earlier: (a) the CUSTOMERS table and (b) the ORDERS table.
Now, let us join these two tables using LEFT JOIN as follows:
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
LEFT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result:

SQL - RIGHT JOINS


The SQL RIGHT JOIN returns all rows from the right table, even if there are no matches in the left table. This
means that if the ON clause matches 0 (zero) records in left table, the join will still return a row in the result, but with
NULL in each column from left table.
This means that a right join returns all the values from the right table, plus matched values from the left table or
NULL in case of no matching join predicate.
Syntax:
The basic syntax of RIGHT JOIN is as follows:
SELECT table1.column1, table2.column2...
FROM table1
RIGHT JOIN table2
ON table1.common_field = table2.common_field;
Example:
Consider the two tables given earlier: (a) the CUSTOMERS table and (b) the ORDERS table.
Now, let us join these two tables using RIGHT JOIN as follows:
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
RIGHT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result:

SQL - FULL JOINS


The SQL FULL JOIN combines the results of both left and right outer joins.
The joined table will contain all records from both tables, and fill in NULLs for missing matches on either side.
Syntax:

The basic syntax of FULL JOIN is as follows:


SELECT table1.column1, table2.column2...
FROM table1
FULL JOIN table2
ON table1.common_field = table2.common_field;
Here, the given condition could be any expression based on your requirement.
Example:
Consider the two tables given earlier: (a) the CUSTOMERS table and (b) the ORDERS table.
Now, let us join these two tables using FULL JOIN as
follows:
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
FULL JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

This would produce the following result:


SQL - SELF JOINS
The SQL SELF JOIN is used to join a table to itself as if the table were two tables, temporarily renaming at least
one table in the SQL statement.
Syntax:
The basic syntax of SELF JOIN is as follows:
SELECT a.column_name, b.column_name...
FROM table1 a, table1 b
WHERE a.common_field = b.common_field;
Here, the WHERE clause could be any expression based on your requirement.
Example:
Consider the CUSTOMERS table given earlier.

Now, let us join this table using SELF JOIN as follows:


SELECT a.ID, b.NAME, a.SALARY
FROM CUSTOMERS a, CUSTOMERS b
WHERE a.SALARY < b.SALARY;

This would produce the following result:

SQL - CARTESIAN or CROSS JOINS


The CARTESIAN JOIN or CROSS JOIN returns the Cartesian product of the sets of records from the two or more joined tables. Thus, it equates to an inner join where the join condition always evaluates to True or where the join condition is absent from the statement.
Syntax:
The basic syntax of CARTESIAN JOIN or CROSS JOIN is as follows:
SELECT table1.column1, table2.column2...
FROM table1, table2 [, table3 ]
Example:
Consider the two tables given earlier: (a) the CUSTOMERS table and (b) the ORDERS table.
Now, let us join these two tables using a CARTESIAN (CROSS) JOIN as follows:
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS, ORDERS;

Normalization is a process of organizing the data in a database to avoid data redundancy, insertion anomalies, update anomalies and deletion anomalies. Let's discuss anomalies first; then we will discuss the normal forms with examples.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized: insertion, update and deletion anomalies. Let's take an example to understand this.

Example: Suppose a manufacturing company stores its employee details in a table named employee that has four attributes: emp_id for storing the employee's id, emp_name for storing the employee's name, emp_address for storing the employee's address, and emp_dept for storing the details of the department in which the employee works. At some point of time the table looks like this:
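The original sample table is shown as a figure in the source; the sketch below recreates its shape with illustrative values only (the specific names, addresses and department codes other than D890 are assumptions consistent with the scenario described below):

CREATE TABLE employee
(
emp_id int,
emp_name varchar(50),
emp_address varchar(50),
emp_dept varchar(10)
);

INSERT INTO employee VALUES (101, 'Rick',   'Delhi', 'D001');
INSERT INTO employee VALUES (101, 'Rick',   'Delhi', 'D002');   -- Rick belongs to two departments
INSERT INTO employee VALUES (102, 'Maggie', 'Agra',  'D890');   -- Maggie is assigned only to D890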
The above table is not normalized. We will see the
problems that we face when a table is not
normalized.
Update anomaly: In the above table we have two rows for the employee Rick, as he belongs to two departments of the company. If we want to update Rick's address, we have to update it in both rows or the data will become inconsistent. If the correct address somehow gets updated in one row but not in the other, then as per the database Rick would have two different addresses, which is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company who is under training and is currently not assigned to any department. We would not be able to insert this data into the table if the emp_dept field doesn't allow nulls.
Delete anomaly: Suppose that at some point the company closes department D890. Deleting the rows that have emp_dept as D890 would also delete the information of the employee Maggie, since she is assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will discuss normalization.
Normalization
Here are the most commonly used normal forms:

First normal form(1NF)

Second normal form(2NF)

Third normal form(3NF)

Boyce & Codd normal form (BCNF)

First normal form (1NF)


As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only
atomic values.
Example: Suppose a company wants to store the names and contact details of its employees. It creates a table that looks like this (table shown in the original figure).
Two employees (Jon & Lester) have two mobile numbers each, so the company stored them in the same field, as you can see in the table above.
This table is not in 1NF: the rule says each attribute of a table must have atomic (single) values, and the emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF we should have the data like this (table shown in the original figure):
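
As a rough sketch of what the 1NF-compliant design could look like (table and column names follow the example above; the data types, the composite key, and the sample values are assumptions used only for illustration):

CREATE TABLE employee (
    emp_id     INT          NOT NULL,
    emp_name   VARCHAR(50)  NOT NULL,
    emp_mobile VARCHAR(15)  NOT NULL,  -- exactly one mobile number per row (atomic value)
    PRIMARY KEY (emp_id, emp_mobile)
);

-- An employee with two mobile numbers now occupies two rows instead of one multi-valued field:
INSERT INTO employee (emp_id, emp_name, emp_mobile) VALUES (101, 'Jon', '9990000001');
INSERT INTO employee (emp_id, emp_name, emp_mobile) VALUES (101, 'Jon', '9990000002');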

Second normal form (2NF)


A table is said to be in 2NF if both the following conditions hold:

Table is in 1NF (First normal form)

No non-prime attribute is dependent on a proper subset of any candidate key of the table.

An attribute that is not part of any candidate key is known as non-prime attribute.
Example: Suppose a school wants to store the data of teachers and the subjects they teach. They create a table that looks like this (table shown in the original figure). Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.
Candidate key: {teacher_id, subject}
Non-prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, as the rule says no non-prime attribute may depend on a proper subset of any candidate key of the table.
To make the table comply with 2NF we can break it into two tables like this:
Teacher_details table
Teacher_subject table

Now the tables comply with Second normal form (2NF).
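
A possible table definition for this decomposition, using the names from the example (the data types and the foreign key are assumptions):

CREATE TABLE teacher_details (
    teacher_id  INT PRIMARY KEY,
    teacher_age INT
);

CREATE TABLE teacher_subject (
    teacher_id INT         NOT NULL,
    subject    VARCHAR(50) NOT NULL,
    PRIMARY KEY (teacher_id, subject),
    FOREIGN KEY (teacher_id) REFERENCES teacher_details (teacher_id)
);

In this sketch teacher_age depends on the whole key of teacher_details, while teacher_subject keeps only the key attributes, so no non-prime attribute depends on a proper subset of a candidate key.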


Third Normal form (3NF)
A table design is said to be in 3NF if both the following conditions hold:

Table must be in 2NF

Transitive functional dependency of non-prime attribute on any super key should be removed.

An attribute that is not part of any candidate key is known as non-prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and for each functional
dependency X -> Y at least one of the following conditions holds:

X is a super key of table

Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.

Example: Suppose a company wants to store the complete address of each employee. They create a table named employee_details that looks like this (table shown in the original figure).
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip} and so on.
Candidate key: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any candidate key.
Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That makes the non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on the super key (emp_id). This violates the rule of 3NF.
To make this table comply with 3NF we have to break it into two tables to remove the transitive dependency:
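
One possible decomposition is sketched below; the table names employee and employee_zip as well as the data types are assumptions, while the columns come from the example:

CREATE TABLE employee_zip (
    emp_zip      VARCHAR(10) PRIMARY KEY,
    emp_state    VARCHAR(50),
    emp_city     VARCHAR(50),
    emp_district VARCHAR(50)
);

CREATE TABLE employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(50),
    emp_zip  VARCHAR(10),
    FOREIGN KEY (emp_zip) REFERENCES employee_zip (emp_zip)
);

With this split, emp_state, emp_city and emp_district depend directly on the key of employee_zip, so the transitive dependency on emp_id disappears.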

Boyce Codd normal form (BCNF)


It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one department. They store the data like
this:
emp_id  emp_nationality  emp_dept                       dept_type  dept_no_of_emp
1001    Austrian         Production and planning        D001       200
1001    Austrian         stores                         D001       250
1002    American         design and technical support   D134       100
1002    American         Purchasing department          D134       600
Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF because neither emp_id nor emp_dept alone is a super key of the table.
To make the table comply with BCNF we can break it into three tables like this:
emp_nationality table:
emp_id  emp_nationality
1001    Austrian
1002    American

emp_dept table:
emp_dept                       dept_type  dept_no_of_emp
Production and planning        D001       200
stores                         D001       250
design and technical support   D134       100
Purchasing department          D134       600

emp_dept_mapping table:
emp_id  emp_dept
1001    Production and planning
1001    stores
1002    design and technical support
1002    Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF, as in each functional dependency the left-hand side is a super key of its table.
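
A possible set of table definitions for this decomposition (table and column names are taken from the example; the data types and foreign keys are assumptions):

CREATE TABLE emp_nationality (
    emp_id          INT PRIMARY KEY,
    emp_nationality VARCHAR(30)
);

CREATE TABLE emp_dept (
    emp_dept       VARCHAR(60) PRIMARY KEY,
    dept_type      CHAR(4),
    dept_no_of_emp INT
);

CREATE TABLE emp_dept_mapping (
    emp_id   INT         NOT NULL,
    emp_dept VARCHAR(60) NOT NULL,
    PRIMARY KEY (emp_id, emp_dept),
    FOREIGN KEY (emp_id)   REFERENCES emp_nationality (emp_id),
    FOREIGN KEY (emp_dept) REFERENCES emp_dept (emp_dept)
);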

Backup and Recovery


Backups of the database are copies of the database that can be used in the event of an emergency. Without
adequate backups of the database, there can be no recovery of that database.
Backup determines how a copy of the database(s) and/or transaction logs is made and which media are used for
this process. All of these precautions have to be taken to prevent data loss. You can lose data as a result of
different hardware or software errors.
Software and Hardware Failures
The reasons for data loss can be divided into five main groups:
Program errors
Administrator (human) errors
Computer failures (system crash)
Disk failures
Catastrophes (fire, earthquake) or theft
During execution of a program, conditions may arise that abnormally terminate the program. Such program
errors concern only the database application and usually have no impact on the entire database system. As
these errors are based on faulty program logic, the database system cannot recover in such situations. The
recovery should therefore be done by the programmer, who has to handle such exceptions using the COMMIT
and ROLLBACK statements. (Both of these Transact-SQL statements are described in Chapter 14.)
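As a minimal Transact-SQL sketch of this pattern (the accounts table and its columns are hypothetical; TRY/CATCH is used here only to show where COMMIT and ROLLBACK belong):

BEGIN TRY
    BEGIN TRANSACTION;
    -- two statements that must succeed or fail together
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT TRANSACTION;       -- make both changes permanent
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION; -- undo the partial work if any statement failed
END CATCH;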
Another source of data loss is human error. Users with sufficient permissions, or the database administrator,
may accidentally lose or corrupt data (people have been known to drop the wrong table, update or delete data
incorrectly, and so on). Of course, we would prefer that this never happen, and we can establish practices that
make it unlikely that production data is compromised in this way, but we have to recognize that people make
mistakes, and data can be affected. The best we can do is try to avoid it, and be prepared to recover when it
happens.
A computer failure covers various hardware or software errors. A hardware crash is an example of a
system failure; in this case, the contents of the computer's main memory may be lost. A disk failure occurs either
when a read/write head of the disk crashes or when the I/O system discovers corrupted disk blocks during I/O
operations.
In the case of catastrophes or theft, the system must keep enough information to recover from the failure. This
is normally done by means of media that offer the needed recovery information on a piece of hardware that has
not been damaged by the failure.
Transaction Log
SQL Server keeps an image of the old contents of the record of each row that has been changed during a
transaction. (A transaction specifies a sequence of Transact-SQL statements that build a logical unit.) This is
necessary in case an error occurs later during the execution of the transaction and all executed statements

inside the transaction have to be rolled back. As soon as SQL Server detects such a situation, it uses the stored
records to bring the database back to the consistent state it was in before the transaction was started.
SQL Server keeps all those records in one or more system files, called the transaction log. In particular, this
log contains the before and after values of each changed column during transactions. The transaction log can
then be used to perform automatic recovery or a restore process. After a failure, SQL Server uses stored values
from the transaction log (called before images) to restore all pages on the disk to their previous consistent state.
In the case of a restore, the transaction log is always used together with a database backup copy to recover the
database. The transaction log is generally needed to prevent a loss of all changes that have been executed
since the last database backup.
Backup
SQL Server provides static as well as dynamic backup. (Dynamic backup means that a database backup can be
performed while users are working on data.) In contrast to some other DBMSs, which back up all databases
together, SQL Server does the backup of each database separately. This method increases security when it
comes time to restore each database, because a restoration of each database separately is more secure than
restoring all databases at once.
SQL Server provides four different backup methods:
Full database backup
Differential database backup
Transaction log backup
Database file (or filegroup) backup
Full Database Backup
A full database backup captures the state of the database at the time the backup is started. During the full
database backup, the system copies the data as well as the schema of all tables of the database and the
corresponding file structures. If the full database backup is executed dynamically, SQL Server records any activity
that took place during the backup. Therefore, even all uncommitted transactions in the transaction log are written
to the backup media.
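A minimal sketch of a full database backup in Transact-SQL (the database name sample and the target path are placeholders):

BACKUP DATABASE sample
TO DISK = 'C:\Backup\sample_full.bak'
WITH INIT, NAME = 'sample full database backup';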
Differential Backup
Using differential backup, only the parts of the database that have changed since the last full database backup
are read and then written to the copy. (As in the full database backup, any activity that took place during the
differential backup is backed up, too.) The advantage of a differential backup is speed. It minimizes the time
required to back up a database, because the amount of data to be backed up is considerably smaller than in the
case of the full backup. (Remember that a full database backup includes a copy of all database pages.)
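A corresponding differential backup could look like this (same placeholder names as above):

BACKUP DATABASE sample
TO DISK = 'C:\Backup\sample_diff.bak'
WITH DIFFERENTIAL;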
Transaction Log Backup
A transaction log backup considers only the changes recorded in the log. This form of backup is therefore not
based on physical parts (pages) of the database, but on logical operations, that is, changes executed using the
DML statements INSERT, UPDATE, and DELETE. Again, because the amount of data is smaller, this process can
be performed significantly quicker than the full database backup and quicker than a differential backup.
There are two main reasons to perform the transaction log backup: first, to store the data that has changed since
the last transaction log backup or database backup on a secure medium; second (and more importantly), to
properly close the transaction log up to the beginning of the active portion of it. (The active portion of the
transaction log contains all uncommitted transactions.)
Using the full database backup and the valid chain of all closed transaction logs, it is possible to propagate a
database copy on a different computer. This database copy can then be used to replace the original database in
case of a failure. (The same scenario can be established using a full database backup and the last differential
backup.)
SQL Server does not allow you to store the transaction log in the same file in which the database is stored. One
reason for this is that if the file is damaged, the use of the transaction log to restore all changes since the last
backup will not be possible.
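A transaction log backup is written with the BACKUP LOG statement; a minimal sketch with placeholder names:

BACKUP LOG sample
TO DISK = 'C:\Backup\sample_log.trn';

Each successful log backup closes the inactive portion of the log and makes that space reusable.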

Using a transaction log to record changes in the database is a common feature used by nearly all existing
relational DBMSs. Nevertheless, situations may arise when it becomes helpful to switch this feature off. For
example, the execution of a heavy load can last for hours. Such a program runs much faster when the logging is
switched off. On the other hand, switching off the logging process is dangerous, as it destroys the valid chain of
transaction logs. To ensure the database recovery, it is strongly recommended that you perform full database
backup after the successful end of the load.
One of the most common system failures occurs because the transaction log is filled up. Be aware that the use of
a transaction log in itself may cause a complete standstill of the system. If the storage used for the transaction log
fills up to 100 percent, SQL Server must stop all running transactions until the transaction log storage is freed
again. This problem can only be avoided by making frequent backups of the transaction log: each time you close
a portion of the actual transaction log and store it to a different storage media, this portion of the log becomes
reusable, and SQL Server thus regains disk space.
Some differences between transaction log backups and differential backups are worth noting. The benefit of
differential backups is that you save time in the restore process, because to recover a database completely, you
need a full database backup and only the latest differential backup. If you use transaction logs for the same
scenario, you have to apply the full database backup and all existing transaction logs to bring the database to a
consistent state. A disadvantage of differential backups is that you cannot use them to recover data to a specific
point in time because they do not store intermediate changes to the database.
Database File Backup
Database file backup allows you to back up specific database files (or filegroups) instead of the entire database.
In this case, SQL Server backs up only files you specify. Individual files (or filegroups) can be restored from
database backup, allowing recovery from a failure that affects only a small subset of the database files. Individual
files or filegroups can be restored from either a database backup or a filegroup backup. This means that you can
use database and transaction log backups as your backup procedure and still be able to restore individual files (or
filegroups) from the database backup.
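A file or filegroup backup names the parts to copy explicitly; a minimal sketch (the filegroup name PRIMARY and the paths are placeholders):

BACKUP DATABASE sample
FILEGROUP = 'PRIMARY'
TO DISK = 'C:\Backup\sample_primary_fg.bak';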
Recovery
Whenever a transaction is submitted for execution, SQL Server is responsible either for executing the transaction
completely and recording its changes permanently in the database or for guaranteeing that the transaction has no
effect at all on the database. This approach ensures that the database is consistent in case of a failure, because
failures do not damage the database itself, but instead affect transactions that are in progress at the time of the
failure. SQL Server supports both automatic and manual recovery.
Automatic Recovery
Automatic recovery is a fault-tolerant feature that SQL Server executes every time it is restarted after a failure or
shutdown. The automatic recovery process checks to see if the restoration of databases is necessary. If it is, each
database is returned to its last consistent state using the transaction log.
SQL Server examines the transaction log from the last checkpoint to the point at which the system failed or was
shut down. (A checkpoint is the most recent point at which all data changes are written permanently to the
database from memory. Therefore, a checkpoint ensures the physical consistency of the data.) The transaction
log contains committed transactions (transactions that are successfully executed, but their changes have not yet
been written to the database) and uncommitted transactions (transactions that are not successfully executed
before a shutdown or failure occurred). SQL Server rolls forward all committed transactions, thus making
permanent changes to the database, and undoes the part of the uncommitted transactions that occurred before
the checkpoint.
SQL Server first performs the automatic recovery of the master database, followed by the recovery of all other
system databases. Then, all user-defined databases are recovered.
Manual Recovery

A manual recovery of a database specifies the application of the backup of your database and subsequent
application of all transaction logs in the sequence of their creation. After this, the database is in the same
(consistent) state as it was at the point when the transaction log was backed up for the last time.
When you recover a database using a full database backup, SQL Server first re-creates all database files and
places them in the corresponding physical locations. After that, the system re-creates all database objects.
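A minimal sketch of a manual recovery sequence with placeholder names: WITH NORECOVERY keeps the database in a restoring state so that further backups can be applied, and the final restore uses WITH RECOVERY.

RESTORE DATABASE sample
FROM DISK = 'C:\Backup\sample_full.bak'
WITH NORECOVERY;

RESTORE LOG sample
FROM DISK = 'C:\Backup\sample_log.trn'
WITH RECOVERY;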
Recovery Models
Microsoft introduced the database recovery model feature in SQL Server 2000; it allows you to
control to what extent you are ready to risk losing committed transactions if a database is damaged. Additionally,
the choice of a recovery model has an impact on the size of the transaction log and therefore on the time period
needed to back up the log. SQL Server supports three recovery models:
Full
Bulk-logged
Simple
The following sections describe the three recovery models.
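The recovery model is set per database with ALTER DATABASE; a minimal sketch (the database name sample is a placeholder):

ALTER DATABASE sample SET RECOVERY FULL;
-- alternatives: SET RECOVERY BULK_LOGGED;  SET RECOVERY SIMPLE;

-- inspect the current setting
SELECT DATABASEPROPERTYEX('sample', 'Recovery');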
Full Recovery
During full recovery, all operations are written to the transaction log. Therefore, this model provides complete
protection against media failure. This means that you can restore your database up to the last committed
transaction that is stored in the log file. Additionally, data can be recovered to any point in time (prior to the point
of failure). To guarantee this, such operations as SELECT INTO and the execution of the bcp utility are fully
logged, too.
Besides point-in-time recovery, the full recovery model allows you also to recover to a log mark. Log marks
correspond to a specific transaction and are inserted only if the transaction commits. (For more information on log
marks, see the corresponding section later in this chapter.)
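A marked transaction and a later restore to that mark might look like this (the transaction name, the products table, and the file paths are placeholders; STOPATMARK refers to the transaction name used with WITH MARK):

BEGIN TRANSACTION price_update WITH MARK 'nightly price update';
    UPDATE products SET price = price * 1.05;  -- hypothetical table
COMMIT TRANSACTION price_update;

-- later, during a restore:
RESTORE LOG sample
FROM DISK = 'C:\Backup\sample_log.trn'
WITH STOPATMARK = 'price_update', RECOVERY;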
The full recovery model also logs all operations concerning the CREATE INDEX statement, implying that the
process of data recovery now includes the restoration of index creations. That way, the re-creation of the indices
is faster, because you do not have to rebuild them separately.
The disadvantage of this recovery model is that the corresponding transaction log may be very voluminous and
the files on the disk containing the log will be filled up very quickly. Also, for such a voluminous log you will need
significantly more time for backup.
Bulk-Logged Recovery
Bulk-logged recovery supports log backups by using minimal space in the transaction log for certain large-scale
or bulk operations. The logging of the following operations is minimal and cannot be controlled on an operation-by-operation basis:
SELECT INTO
CREATE INDEX (including indexed views)
bcp utility and BULK INSERT
WRITETEXT and UPDATETEXT
Although bulk operations are not fully logged, you do not have to perform a full database backup after the
completion of such an operation. During bulk-logged recovery, transaction log backups contain both the log as
well as the results of a bulk operation. This simplifies the transition between full and bulk-logged recovery models.
The bulk-logged recovery model allows you to recover a database to the end of a transaction log backup (i.e., up
to the last committed transaction). In contrast to the full recovery model, bulk-logged recovery does not generally support point-in-time recovery. (You can use bulk-logged recovery for point-in-time recovery if no bulk operations have been performed.)
The advantage of the bulk-logged recovery model is that bulk operations are performed much faster than under
the full recovery model, because they are not fully logged.
Simple Recovery

In the simple recovery model, the transaction log is not used to protect your database against any media failure.
Therefore, you can recover a damaged database only using full database or differential backup. Backup strategy
for this model is very simple: Restore the database using existing database backups and, if differential backups
exist, apply the most recent one.
The advantages of the simple recovery model are that the performance of all bulk operations is very high and the requirements for log space are very small. On the other hand, this model requires the most manual work, because
all changes since the most recent database (or differential) backup must be redone. Point-in-time as well as page
restore are not allowed with this recovery model. Also, file restore is available only for read-only secondary
filegroups.
Recovery with Concurrent Transactions
When more than one transaction is being executed in parallel, the logs are interleaved. At the time of recovery, it would become hard for the recovery system to backtrack all logs and then start recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in a real environment may fill up all the memory space available in the system. As time passes, the log file may grow too big to be handled at all. A checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed.
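In SQL Server, checkpoints are issued automatically, but one can also be forced manually with the CHECKPOINT statement; a minimal sketch (the database name sample is a placeholder):

USE sample;
CHECKPOINT;  -- writes all dirty pages of the current database to disk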
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following manner:

The recovery system reads the logs backwards from the end to the last checkpoint.

It maintains two lists, an undo-list and a redo-list.

If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it puts the transaction in the redo-list.

If the recovery system sees a log with <Tn, Start> but no commit or abort log, it puts the transaction in the undo-list.

All the transactions in the undo-list are then undone and their logs are removed. All the transactions in the redo-list and their previous logs are removed and then redone before saving their logs.
