A database is a collection of data related to a particular topic or purpose that can be processed to
produce information. Data mostly represents recordable facts, and information is produced from those
facts. For example, if we have data about the marks obtained by all students, we can then conclude who
the toppers are and what the average marks are.
A database management system (DBMS) is a system that stores and retrieves information in a
database. It helps you organize your data by subject, so that the data is easy to track and verify, and it
lets you record how different subjects are related, which makes it easy to bring related data together. A
database management system stores data in such a way that it becomes easier to retrieve, manipulate,
and produce information.
Characteristics
A modern DBMS has the following characteristics:
Real-world entity A modern DBMS is more realistic and uses real-world entities to
design its architecture, along with their behavior and attributes. For example, a school
database may use students as an entity and their age as an attribute.
Relation-based tables A DBMS allows entities and the relations among them to form
tables. A user can understand the architecture of a database just by looking at the table names.
Less redundancy A DBMS follows the rules of normalization, which split a relation when
any of its attributes has redundant values.
Consistency Consistency is a state in which every relation in a database remains
consistent. There exist methods and techniques that can detect attempts to leave the
database in an inconsistent state.
Query Language A DBMS is equipped with a query language, which makes it more
efficient to retrieve and manipulate data. A user can apply as many different filtering
options as required to retrieve a set of data. This was traditionally not possible with
file-processing systems.
Multiuser and Concurrent Access A DBMS supports a multi-user environment and
allows users to access and manipulate data in parallel.
ACID PROPERTIES
ACID properties are an important concept for databases. The acronym stands for Atomicity, Consistency,
Isolation, and Durability.
The ACID properties of a DBMS allow safe sharing of data. Without them, everyday
activities such as using computer systems to buy products would be difficult, and the potential for
inaccuracy would be huge. Imagine more than one person trying to buy the same size and color of a
sweater at the same time, a regular occurrence. The ACID properties make it possible for the merchant
to keep these sweater purchases from overlapping each other, saving the merchant from
erroneous inventory and account balances.
A transaction is a very small unit of a program, and it may contain several low-level tasks. A transaction in
a database system must maintain Atomicity, Consistency, Isolation, and Durability, commonly known as
the ACID properties, in order to ensure accuracy, completeness, and data integrity.
Atomicity This property states that a transaction must be treated as an atomic unit;
either all of its operations are executed or none of them are. There must be no state in the
database where a transaction is left partially completed. The database should be in a
defined state either before the transaction executes or after it completes, aborts, or fails.
Consistency The database must remain in a consistent state after any transaction. No
transaction should have an adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a transaction, it must remain
consistent after the transaction executes as well.
Isolation In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that every transaction is
carried out as if it were the only transaction in the system. No transaction affects the
existence of any other transaction.
Durability The database should be durable enough to hold all of its latest updates even if
the system fails or restarts. If a transaction updates a chunk of data in the database and
commits, the database will hold the modified data. If a transaction commits but the
system fails before the data can be written to disk, that data will be updated
once the system springs back into action.
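The behavior described above can be sketched in SQL. The accounts table and the amounts here are hypothetical, used only to illustrate how a transfer is wrapped in one atomic transaction:

```sql
-- Hypothetical accounts table, for illustration only.
CREATE TABLE accounts (
    acc_no  INT PRIMARY KEY,
    balance DECIMAL(10, 2) NOT NULL
);

INSERT INTO accounts VALUES (1, 500.00), (2, 100.00);

-- A funds transfer as one atomic unit: either both updates
-- are made durable by COMMIT, or neither takes effect.
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 50 WHERE acc_no = 1;
UPDATE accounts SET balance = balance + 50 WHERE acc_no = 2;
COMMIT;

-- If anything goes wrong midway, ROLLBACK (instead of COMMIT)
-- undoes the partial work, restoring the last consistent state.
```

If the system crashes after the COMMIT, durability guarantees the transfer survives the restart; if it crashes before, atomicity guarantees neither update remains.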
Advantages of DBMS
A database management system has a number of advantages compared to the traditional computer
file-based processing approach. The DBA must keep these benefits in mind while designing
databases and monitoring the DBMS.
The main advantages of DBMS are described below.
Controlling Data Redundancy
In non-database systems, each application program has its own private files, so duplicated
copies of the same data are created in many places. In a DBMS, all the data of an organization is integrated into
a single database. Each piece of data is recorded in only one place in the database and is not duplicated.
Sharing of Data
In a DBMS, data can be shared by authorized users of the organization. The database administrator
manages the data and grants users the rights to access it. Many users can be authorized to access
the same piece of information simultaneously, and remote users can share the same data. Similarly, the
data of the same database can be shared between different application programs.
Data Consistency
By controlling data redundancy, data consistency is obtained. If a data item appears only once,
any update to its value has to be performed only once, and the updated value is immediately available to
all users. When the DBMS controls redundancy in this way, the database system enforces consistency.
Integration of Data
In a database management system, data is stored in tables. A single database contains
multiple tables, and relationships can be created between tables (or associated data entities). This makes
it easy to retrieve and update data.
Forms
The form is a very important object of a DBMS. You can create forms very easily and quickly in a DBMS. Once a
form is created, it can be used many times and modified very easily. The created forms are also
saved along with the database and behave like a software component. A form provides a very easy, user-friendly
way to enter data into the database, edit it, and display it. Non-technical users
can also perform various operations on the database through forms without going into the technical details of the
database.
Users
A typical DBMS has users with different rights and permissions who use it for different purposes. Some
users retrieve data and some back it up. The users of a DBMS can be broadly categorized as follows:
Designers Designers are the group of people who actually work on the designing part
of the database. They keep a close watch on what data should be kept and in what
format. They identify and design the whole set of entities, relations, constraints, and
views.
End Users End users are those who actually reap the benefits of having a DBMS. End
users can range from simple viewers, who pay attention to the logs or market rates, to
sophisticated users such as business analysts.
DBMS ARCHITECTURE
The design of a DBMS depends on its architecture, which can be centralized, decentralized, or hierarchical.
The architecture of a DBMS can be seen as either single-tier or multi-tier. An n-tier architecture divides
the whole system into n related but independent modules, each of which can be independently modified,
altered, or replaced.
In 1-tier architecture, the DBMS is the only entity: the user sits directly on the DBMS and uses it, and
any changes made here are done on the DBMS itself. It does not provide handy tools for end users, so
database designers and programmers normally prefer to use single-tier architecture.
If the architecture of the DBMS is 2-tier, then there must be an application through which the DBMS can be
accessed. Programmers use 2-tier architecture where they access the DBMS by means of an
application. Here the application tier is entirely independent of the database in terms of operation, design,
and programming.
3-tier Architecture
A 3-tier architecture separates its tiers from each other based on the complexity of the users and how
they use the data present in the database. It is the most widely used architecture to design a DBMS.
Database (Data) Tier At this tier, the database resides along with its query processing
languages. The relations that define the data, and their constraints, also live at this level.
Application (Middle) Tier At this tier reside the application server and the programs
that access the database. For an end user, this tier presents an abstracted view of the
database; for the database tier, the application tier acts as the user of the data.
User (Presentation) Tier End users operate on this tier and know nothing about
any existence of the database beyond this layer. At this layer, multiple views of the
database can be provided by the application. All views are generated by applications that
reside in the application tier.
Multiple-tier database architecture is highly modifiable, as almost all of its components are independent and
can be changed independently.
There are five major components in the database system environment:
Hardware
Software
Data
Users
Procedures
5. Procedures: Procedures refer to the instructions and rules that govern the design and use of the
database. The users of the system and the staff that manage the database require documented procedures
on how to use or run the system.
These may consist of instructions on how to:
Log on to the DBMS.
Use a particular DBMS facility or application program.
Database Schema
A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the
data is organized and how the relations among the data are associated. It formulates all the constraints that are to be
applied on the data.
A database schema defines its entities and the relationships among them. It contains a descriptive detail of the database,
which can be depicted by means of schema diagrams. It is the database designers who design the schema to help
programmers understand the database and make it useful.
Database Instance
It is important that we distinguish these two terms individually. The database schema is the skeleton of the database,
designed when the database does not yet exist. Once the database is operational, it is very difficult to make any
changes to the schema. A database schema does not contain any data or information.
A database instance is a state of operational database with data at any given time. It contains a snapshot of the
database. Database instances tend to change with time. A DBMS ensures that its every instance (state) is in a valid
state, by diligently following all the validations, constraints, and conditions that the database designers have imposed.
ER Model
The ER model defines the conceptual view of a database. It works around real-world entities and the
associations among them. At view level, the ER model is considered a good option for designing
databases.
Entity
An entity can be a real-world object, either animate or inanimate, that can be easily identified. For
example, in a school database, students, teachers, classes, and courses offered can be considered
entities. All these entities have some attributes or properties that give them their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities whose attributes
share similar values. For example, a Students set may contain all the students of a school; likewise, a
Teachers set may contain all the teachers of a school from all faculties.
Attributes
Entities are represented by means of their properties, called attributes. All attributes have values. For
example, a student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a student's name
cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.
Types of Attributes
Simple attribute Simple attributes have atomic values, which cannot be divided further. For
example, a student's phone number is an atomic value of 10 digits.
Composite attribute Composite attributes are made of more than one simple attribute.
For example, a student's complete name may have first_name and last_name.
Derived attribute Derived attributes do not exist in the physical
database; their values are derived from other attributes present in the database. For
example, average_salary in a department should not be saved directly in the database;
instead, it can be derived. As another example, age can be derived from date_of_birth.
Multi-valued attribute Multi-valued attributes may contain more than one value. For
example, a person can have more than one phone number, email address, etc.
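These attribute types shape how a table is designed. The sketch below uses hypothetical table and column names to show one common way of handling each type in SQL:

```sql
-- Hypothetical STUDENT table illustrating the attribute types above.
CREATE TABLE student (
    reg_no        INT PRIMARY KEY,   -- simple (atomic) attribute
    first_name    VARCHAR(30),       -- composite attribute "name" stored
    last_name     VARCHAR(30),       --   as its simple components
    date_of_birth DATE               -- age is derived from this column,
                                     --   so age itself is not stored
);

-- A multi-valued attribute (phone numbers) goes in its own table,
-- since a column should hold only one value per row.
CREATE TABLE student_phone (
    reg_no INT REFERENCES student (reg_no),
    phone  VARCHAR(15)
);
```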
A key is an attribute or a set of attributes in a relation that identifies a tuple in the relation. Keys are
defined in a table to access or sequence the stored data quickly and smoothly. They are also used to create
relationships between different tables.
Types of Keys
Following are the different types of keys.
1. Super key.
2. Candidate key.
3. Primary key.
4. Alternate key.
5. Composite key.
6. Foreign key.
Super Key
A super key is an attribute or combination of attributes in a relation that identifies a tuple uniquely within the
relation. A super key is the most general type of key. For example, suppose a relation STUDENT consists of the
attributes RegistrationNo, Name, FatherName, Class, and Address. The only single attribute that can
uniquely identify a tuple in the relation is RegistrationNo. The Name attribute cannot identify a tuple because
two or more students may have the same name; similarly, FatherName, Class, and Address cannot be
used to identify a tuple. It means that RegistrationNo is a super key for the relation. Any attribute or set of
attributes combined with the super key RegistrationNo also forms a super key. For instance, the combination of two
attributes {RegistrationNo, Name} is a super key, since it can also be used to identify a tuple in the relation. Similarly,
{RegistrationNo, Class} and {RegistrationNo, Name, Class} are super keys.
Candidate Key
A candidate key is a super key that contains no extra attribute; it consists of the minimum possible attributes. A
super key like {RegistrationNo, Name} contains an extra attribute, Name. It can be used to identify a tuple
uniquely in the relation, but it does not consist of the minimum possible attributes, since RegistrationNo alone can
identify a tuple in the relation. This means that {RegistrationNo, Name} is a super key but not a
candidate key, because it contains an extra attribute. On the other hand, RegistrationNo is both a super key
and a candidate key.
Primary Key
A primary key is a candidate key selected by the database designer to identify tuples uniquely in a
relation. A relation may contain many candidate keys; when the designer selects one of them to identify
tuples in the relation, it becomes the primary key. If there is only one candidate key, it is
automatically selected as the primary key.
Some of the most important points about a primary key are:
1. A relation can have only one primary key.
2. Each value in the primary key attribute must be unique.
3. A primary key cannot contain null values.
Suppose a relation STUDENT contains attributes such as RegNo, Name, and Class. The attribute
RegNo uniquely identifies each student in the table, so it can be used as the primary key. The attribute
Name cannot uniquely identify each row because two students can have the same name, so it cannot be used
as a primary key.
Alternate Key
The candidate keys that are not selected as the primary key are known as alternate keys. Suppose the STUDENT
relation contains attributes such as RegNo, RollNo, Name, and Class. The attributes RegNo and
RollNo can each be used to identify a student in the table. If RegNo is selected as the primary key, then the
RollNo attribute is known as an alternate key.
Composite Key
A primary key that consists of two or more attributes is known as a composite key.
Foreign Key
A foreign key is an attribute or set of attributes in a relation whose values match a primary key in another
relation. The relation in which the foreign key is created is known as the dependent table or child table; the
relation to which the foreign key refers is known as the parent table. The foreign key connects the two relations
when a relationship is established between them. A relation may contain more than one foreign key.
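A short SQL sketch of the parent/child arrangement described above. The table and column names here are hypothetical:

```sql
-- CLASS is the parent table; STUDENT is the child (dependent) table.
CREATE TABLE class (
    class_id   INT PRIMARY KEY,
    class_name VARCHAR(30)
);

CREATE TABLE student (
    reg_no   INT PRIMARY KEY,     -- primary key of STUDENT
    name     VARCHAR(50),
    class_id INT,                 -- foreign key referring to CLASS
    FOREIGN KEY (class_id) REFERENCES class (class_id)
);
```

Every student.class_id value must match an existing class.class_id (or be NULL); the DBMS rejects rows that would break this rule.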
Relationship
The association among entities is called a relationship. For example, an employee works_at a department,
a student enrolls in a course. Here, Works_at and Enrolls are called relationships.
Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a relationship too can have
attributes. These attributes are called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Cardinalities
Cardinality defines the number of entities in one entity set which can be associated with entities of the
other set via the relationship set.
Let us now learn how the ER Model is represented by means of an ER diagram. Any object, for example,
entities, attributes of an entity, relationship sets, and attributes of relationship sets, can be represented with
the help of an ER diagram.
Entity
Entities are represented by means of rectangles. Rectangles are named with the entity set they represent.
Relationship
Relationships are represented by a diamond-shaped box. The name of the relationship is written inside the
diamond box. All the entities (rectangles) participating in a relationship are connected to it by lines.
Binary Relationship and Cardinality
A relationship in which two entities participate is called a binary relationship. Cardinality is the number
of instances of an entity from one entity set that can be associated with instances of the other set through the
relationship.
Participation Constraints
Total participation Each entity in the entity set is involved in the relationship. Total
participation is represented by double lines.
Partial participation Not all entities are involved in the relationship. Partial participation is
represented by single lines.
Disadvantages of DBMS
The disadvantages of the database approach are summarized as follows:
1. Complexity: The provision of the functionality expected of a good DBMS makes the DBMS an extremely
complex piece of software. Failure to understand the system can lead to bad design decisions, which can have serious
consequences for an organization.
2. Size: The complexity and breadth of functionality makes the DBMS an extremely large piece of software,
occupying many megabytes of disk space and requiring substantial amounts of memory to run efficiently.
3. Performance: Typically, a file-based system is written for a specific application, such as invoicing. As a result,
performance is generally very good. However, the DBMS is written to be more general, to cater for many applications
rather than just one. The effect is that some applications may not run as fast as they used to.
4. Higher impact of a failure: The centralization of resources increases the vulnerability of the system. Since all users
and applications rely on the availability of the DBMS, the failure of any component can bring operations to a halt.
5. Cost of DBMS: The cost of a DBMS varies significantly, depending on the environment and functionality provided.
There is also the recurrent annual maintenance cost.
6. Additional hardware costs: The disk storage requirements for the DBMS and the database may necessitate the
purchase of additional storage space. Furthermore, to achieve the required performance it may be necessary to
purchase a larger machine, perhaps even a machine dedicated to running the DBMS. The procurement of additional
hardware results in further expenditure.
Database Model
A database model is a type of data model that determines the logical structure of a database and fundamentally
determines in which manner data can be stored, organized, and manipulated. The most popular example of a database
model is the relational model, which uses a table-based format.
Common logical data models for databases include:
Hierarchical database model
Network model
Relational model
Entity-relationship model
Relational Model
The most recent and popular model of database design is the relational database model. This model was
developed to overcome the problems of complexity and inflexibility of the earlier two models in handling
databases with many-to-many relationships between entities.
The relational model is not only simple but also powerful. In a relational database, each file is perceived as
a flat file (a two-dimensional table) consisting of many lines (records), each record having key and non-key
data item(s). The key item(s) is the data element(s) that identifies the record.
The figure shows the files, and the fields that each record shall have, in a customer invoicing system.
In these files, the key data items are customer id, invoice no, and product code. Each of the files can be used separately
to generate reports. However, data can also be obtained from any combination of files, as all these files are related to
each other with the help of the key data items specified above.
Generalization
The process of combining entities, where the generalized entity contains the properties of all the
specialized entities, is called generalization. In generalization, a number of entities are brought together
into one generalized entity based on their similar characteristics. For example, pigeon, house sparrow, crow,
and dove can all be generalized as Birds.
Specialization
Specialization is the opposite of generalization. In specialization, a group of entities is divided into sub-groups
based on their characteristics. Take a group Person, for example. A person has a name, date of birth, gender,
and so on. These properties are common to all persons, human beings. But in a company, persons can be
identified as employee, employer, customer, or vendor, based on the role they play in the company.
Similarly, in a school database, persons can be specialized as teacher, student, or staff, based on the role
they play in school as entities.
Aggregation
Aggregation is a process in which the relationship between two entities is treated as a single entity.
Here, the relationship between Center and Course acts as an entity in a relationship with Visitor.
Codd's 12 Rules
Dr. Edgar F. Codd, after his extensive research on the relational model of database systems, came up with twelve
rules of his own which, according to him, a database must obey in order to be regarded as a true relational database.
These rules can be applied to any database system that manages stored data using only its relational capabilities.
They are preceded by a foundation rule (Rule 0), which acts as a base for all the other rules.
Rule 1: Information Rule
The data stored in a database, be it user data or metadata, must be a value of some table cell. Everything in a
database must be stored in a table format.
Rule 2: Guaranteed Access Rule
Every single data element (value) is guaranteed to be accessible logically with a combination of table-name,
primary-key (row value), and attribute-name (column value).
Rule 3: Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is a very important rule
because a NULL can be interpreted as one of the following: data is missing, data is not known, or data is not
applicable.
Rule 4: Active Online Catalog
The structure description of the entire database must be stored in an online catalog, known as the data dictionary,
which can be accessed by authorized users. Users can use the same query language to access the catalog that they
use to access the database itself.
Rule 5: Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that supports data definition, data
manipulation, and transaction management operations. This language can be used directly or by means of some
application. If the database allows access to data without the help of this language, it is considered a
violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by the system.
Rule 7: High-Level Insert, Update, and Delete Rule
A database must support high-level insert, update, and delete operations. This must not be limited to a single row;
it must also support union, intersection, and minus operations to yield sets of data records.
Rule 8: Physical Data Independence
The data stored in a database must be independent of the applications that access the database. Any change in the
physical structure of a database must not have any impact on how the data is being accessed by external applications.
Rule 9: Logical Data Independence
The logical data in a database must be independent of its users' view (application). Any change in logical data must
not affect the applications using it. For example, if two tables are merged or one is split into two different tables,
there should be no impact or change on the user application. This is one of the most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can be independently
modified without the need of any change in the application. This rule makes a database independent of the front-end
application and its interface.
Rule 11: Distribution Independence
The end-user must not be able to see that the data is distributed over various locations. Users should always get the
impression that the data is located at one site only. This rule has been regarded as the foundation of distributed
database systems.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level records, then that interface must not be able to subvert
the system and bypass security and integrity constraints.
Relational data model
The relational data model is the primary data model, used widely around the world for data storage and
processing. This model is simple, and it has all the properties and capabilities required to process data with storage
efficiency.
Concepts
Tables In the relational data model, relations are saved in the format of tables. This format stores the relations
among entities. A table has rows and columns, where rows represent records and columns represent attributes.
Tuple A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance A finite set of tuples in the relational database system represents a relation instance.
Relation instances do not have duplicate tuples.
Relation schema A relation schema describes the relation name (table name), its attributes, and their names.
Relation key Each row has one or more attributes, known as the relation key, which can identify the row in the
relation (table) uniquely.
Attribute domain Every attribute has some pre-defined value scope, known as its attribute domain.
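The concepts above map directly onto a table definition. The employee table below is a hypothetical example used only to label where each concept appears:

```sql
-- Relation schema: the relation name plus its attributes.
-- Attribute domains are the declared data types; the relation
-- key is declared as the PRIMARY KEY.
CREATE TABLE employee (           -- relation (table) name
    emp_id   INT PRIMARY KEY,     -- relation key
    emp_name VARCHAR(50),         -- domain: strings up to 50 chars
    salary   DECIMAL(8, 2)        -- domain: numeric values
);
-- Each row inserted into this table is a tuple; the set of rows
-- present at any moment is a relation instance.
```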
Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are called
Relational Integrity Constraints. There are three main integrity constraints:
Key constraints
Domain constraints
Referential integrity constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation which can identify a tuple uniquely. This
minimal subset of attributes is called the key for that relation. If there is more than one such minimal subset, these
are called candidate keys.
Key constraints force that, in a relation with a key attribute, no two tuples can have identical values for the key
attributes.
Mapping Relationship
A relationship is an association among entities.
Mapping Process
Create a table for the relationship.
Add the primary keys of all participating entities as fields of the table, with their respective data types.
If the relationship has any attribute, add each attribute as a field of the table.
Declare a primary key composed of all the primary keys of the participating entities.
Declare all foreign key constraints.
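The steps above can be sketched in SQL. The entities STUDENT and COURSE and the "enrolls" relationship here are hypothetical examples:

```sql
-- Mapping an "enrolls" relationship between STUDENT and COURSE.
CREATE TABLE enrolls (
    reg_no      INT,        -- primary key of the STUDENT entity
    course_id   INT,        -- primary key of the COURSE entity
    enroll_date DATE,       -- an attribute of the relationship itself
    PRIMARY KEY (reg_no, course_id),            -- composite key from
                                                --   both participants
    FOREIGN KEY (reg_no)    REFERENCES student (reg_no),
    FOREIGN KEY (course_id) REFERENCES course (course_id)
);
```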
Mapping Weak Entity Sets
A weak entity set is one that does not have any primary key associated with it.
Mapping Process
Create a table for the weak entity set.
Add all its attributes to the table as fields.
Add the primary key of the identifying entity set.
Declare all foreign key constraints.
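As a sketch of this process, consider a hypothetical weak entity set DEPENDENT identified by an EMPLOYEE entity:

```sql
-- DEPENDENT has no key of its own, so the identifying entity's
-- primary key (emp_id) is included and made part of the key.
CREATE TABLE dependent (
    emp_id   INT,            -- primary key of identifying EMPLOYEE
    dep_name VARCHAR(50),    -- partial key of the weak entity
    relation VARCHAR(20),
    PRIMARY KEY (emp_id, dep_name),
    FOREIGN KEY (emp_id) REFERENCES employee (emp_id)
);
```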
SQL
SQL stands for Structured Query Language. SQL is a programming language for relational databases. It
is designed over relational algebra and tuple relational calculus. SQL comes as a package with all major
distributions of RDBMS.
SQL comprises both a data definition and a data manipulation language. Using the data definition properties
of SQL, one can design and modify database schemas, whereas the data manipulation properties allow SQL
to store and retrieve data from the database.
SQL commands can be divided into three subgroups: DDL, DML, and DCL.
DDL
DDL is the short name for Data Definition Language, which deals with database schemas and descriptions of
how the data should reside in the database.
CREATE - creates the database and its objects (such as tables and indexes)
ALTER - alters the structure of an existing database object
DROP - deletes objects from the database
TRUNCATE - removes all records from a table, including all space allocated for the records
COMMENT - adds comments to the data dictionary
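A brief sketch of these DDL commands in use. The object names are hypothetical, and exact syntax varies somewhat between RDBMS products:

```sql
CREATE DATABASE school;                         -- CREATE a database

CREATE TABLE marks (reg_no INT, score INT);     -- CREATE a table

ALTER TABLE marks ADD subject VARCHAR(30);      -- ALTER its structure

TRUNCATE TABLE marks;   -- TRUNCATE: remove all rows, keep the table

DROP TABLE marks;       -- DROP: remove the table itself
```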
Now we want to create a table called "Persons" that contains five columns: PersonID, LastName,
FirstName, Address, and City. We use the following CREATE TABLE statement:
Example
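A CREATE TABLE statement matching the five columns named above could look like this (the column types shown are typical choices, not mandated):

```sql
CREATE TABLE Persons (
    PersonID  INT,
    LastName  VARCHAR(255),
    FirstName VARCHAR(255),
    Address   VARCHAR(255),
    City      VARCHAR(255)
);
```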
Database Tables
A database most often contains one or more tables. Each table is identified by a name (e.g. "Customers" or
"Orders"). Tables contain records (rows) with data.
Below is a selection from the "Customers" table:
The table above contains five records (one for each customer) and seven columns (CustomerID,
CustomerName, ContactName, Address, City, PostalCode, and Country).
The SQL SELECT Statement
The SELECT statement is used to select data from a database.
The result is stored in a result table, called the result-set.
SQL SELECT Syntax
SELECT column_name,column_name
FROM table_name;
and
SELECT * FROM table_name;
SELECT Column Example
The following SQL statement selects the "CustomerName" and "City" columns from the "Customers" table:
Example
SELECT CustomerName,City
FROM Customers;
SELECT * Example
The following SQL statement selects all the columns from the "Customers" table:
Example
SELECT * FROM Customers;
The following operators can be used in the WHERE clause:

Operator   Description
=          Equal
<>         Not equal
>          Greater than
<          Less than
>=         Greater than or equal
<=         Less than or equal
BETWEEN    Between an inclusive range
LIKE       Search for a pattern
IN         Specify multiple possible values for a column
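These operators appear in the WHERE clause of a SELECT statement. The sketches below use columns from the "Customers" table described earlier; the specific values are illustrative:

```sql
-- Customers with a postal code in an inclusive range
SELECT * FROM Customers
WHERE PostalCode BETWEEN '01000' AND '05000';

-- Customers whose name starts with 'A'
SELECT * FROM Customers
WHERE CustomerName LIKE 'A%';

-- Customers from a fixed set of countries
SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK');
```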
ORDER BY Example
SELECT * FROM Customers
ORDER BY Country;
ORDER BY Several Columns Example
SELECT * FROM Customers
ORDER BY Country, CustomerName;
UPDATE Example
UPDATE Customers
SET ContactName='Alfred Schmidt', City='Hamburg'
WHERE CustomerName='Alfreds Futterkiste';
LEFT JOIN: returns all rows from the left table, even if there are no matches in the right table.
RIGHT JOIN: returns all rows from the right table, even if there are no matches in the left table.
FULL JOIN: returns rows when there is a match in one of the tables.
SELF JOIN: is used to join a table to itself as if the table were two tables, temporarily renaming at
least one table in the SQL statement.
CARTESIAN JOIN: returns the Cartesian product of the sets of records from the two or more joined
tables.
Here, the given condition can be any expression based on your requirements.
Example:
Consider the following two tables: (a) the CUSTOMERS table and (b) the ORDERS table (given above).
Now, let us join these two tables using LEFT JOIN as follows:
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
LEFT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result:
Normalization is a process of organizing the data in a database to avoid data redundancy and insertion,
update, and deletion anomalies. Let's discuss anomalies first, and then we will discuss the normal forms with examples.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized: insertion, update, and
deletion anomalies. Let's take an example to understand this.
Second Normal Form (2NF)
A table is in 2NF if no non-prime attribute is dependent on a proper subset of any candidate key of the table.
An attribute that is not part of any candidate key is known as a non-prime attribute.
Example: Suppose a school wants to store the data of teachers and the subjects they teach. They create a single table holding teacher_id, subject, and teacher_age. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.
Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, which says that no non-prime attribute may depend on a proper subset of any candidate key of the table.
To make the table comply with 2NF we can break it into two tables like this:
Teacher_details table
Teacher_subject table
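The decomposition above can be sketched with sqlite3. The teacher ids, subjects, and ages below are hypothetical sample data matching the columns named in the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized table: teacher_age repeats for every subject a teacher teaches
conn.execute("CREATE TABLE teacher (teacher_id INT, subject TEXT, teacher_age INT)")
conn.executemany("INSERT INTO teacher VALUES (?, ?, ?)",
                 [(111, "Maths", 38), (111, "Physics", 38), (222, "Biology", 38)])

# 2NF decomposition: teacher_age depends on teacher_id alone,
# so it moves to its own table keyed by teacher_id
conn.execute("""CREATE TABLE teacher_details AS
                SELECT DISTINCT teacher_id, teacher_age FROM teacher""")
conn.execute("""CREATE TABLE teacher_subject AS
                SELECT teacher_id, subject FROM teacher""")

details = conn.execute(
    "SELECT * FROM teacher_details ORDER BY teacher_id").fetchall()
print(details)  # each teacher's age is now stored exactly once
```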
Third Normal Form (3NF)
A table is in 3NF if it is in 2NF and every transitive functional dependency of a non-prime attribute on any super key has been removed.
An attribute that is not part of any candidate key is known as a non-prime attribute; an attribute that is part of one of the candidate keys is known as a prime attribute.
In other words, a table is in 3NF if it is in 2NF and, for each functional dependency X -> Y, at least one of the following conditions holds:
X is a super key of the table
Y is a prime attribute of the table
Boyce-Codd Normal Form (BCNF)
A table is in BCNF if, for every functional dependency X -> Y, X is a super key of the table.
Example: Consider the following table of employees and their departments:

emp_id | emp_nationality | emp_dept                     | dept_type | dept_no_of_emp
1001   | Austrian        | Production and planning      | D001      | 200
1001   | Austrian        | stores                       | D001      | 250
1002   | American        | design and technical support | D134      | 100
1002   | American        | Purchasing department        | D134      | 600
Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF as neither emp_id nor emp_dept alone are keys.
To make the table comply with BCNF we can break the table into three tables like this:

emp_nationality table:

emp_id | emp_nationality
1001   | Austrian
1002   | American

emp_dept table:

emp_dept                     | dept_type | dept_no_of_emp
Production and planning      | D001      | 200
stores                       | D001      | 250
design and technical support | D134      | 100
Purchasing department        | D134      | 600

emp_dept_mapping table:

emp_id | emp_dept
1001   | Production and planning
1001   | stores
1002   | design and technical support
1002   | Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF because, in both functional dependencies, the left-hand side is a key.
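The decomposition can be checked with sqlite3: splitting the table along the two functional dependencies and joining the pieces back together reproduces the original rows, showing the decomposition is lossless. This is a sketch using the sample data from the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE emp (emp_id INT, emp_nationality TEXT,
                emp_dept TEXT, dept_type TEXT, dept_no_of_emp INT)""")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", [
    (1001, "Austrian", "Production and planning", "D001", 200),
    (1001, "Austrian", "stores", "D001", 250),
    (1002, "American", "design and technical support", "D134", 100),
    (1002, "American", "Purchasing department", "D134", 600)])

# BCNF decomposition: one table per functional dependency, plus a mapping
conn.execute("""CREATE TABLE emp_nationality AS
                SELECT DISTINCT emp_id, emp_nationality FROM emp""")
conn.execute("""CREATE TABLE emp_dept AS
                SELECT DISTINCT emp_dept, dept_type, dept_no_of_emp FROM emp""")
conn.execute("""CREATE TABLE emp_dept_mapping AS
                SELECT emp_id, emp_dept FROM emp""")

# Joining the three pieces gives back exactly the original rows
rejoined = conn.execute("""
    SELECT n.emp_id, n.emp_nationality, d.emp_dept, d.dept_type, d.dept_no_of_emp
    FROM emp_dept_mapping m
    JOIN emp_nationality n ON n.emp_id = m.emp_id
    JOIN emp_dept d ON d.emp_dept = m.emp_dept""").fetchall()
print(len(rejoined))  # 4 rows, same as the original table
```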
inside the transaction have to be rolled back. As soon as SQL Server detects such a situation, it uses the stored
records to bring the database back to the consistent state it was in before the transaction was started.
SQL Server keeps all those records in one or more system files, called the transaction log. In particular, this
log contains the before and after values of each changed column during transactions. The transaction log can
then be used to perform automatic recovery or a restore process. After a failure, SQL Server uses stored values
from the transaction log (called before images) to restore all pages on the disk to their previous consistent state.
In the case of a restore, the transaction log is always used together with a database backup copy to recover the
database. The transaction log is generally needed to prevent a loss of all changes that have been executed
since the last database backup.
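The rollback behavior described above can be observed in miniature with sqlite3, which keeps its own rollback journal, a mechanism analogous to (but much simpler than) SQL Server's transaction log. The account table and the simulated failure below are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INT, balance INT)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()  # committed state: balance is 100

try:
    # An implicit transaction starts with this change...
    conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
    # ...but a failure occurs before the transaction commits
    raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    # The journal restores the pre-transaction (before-image) state
    conn.rollback()

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 100: the uncommitted change was undone
```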
Backup
SQL Server provides static as well as dynamic backup. (Dynamic backup means that a database backup can be
performed while users are working on data.) In contrast to some other DBMSs, which back up all databases
together, SQL Server does the backup of each database separately. This method increases security when it
comes time to restore each database, because a restoration of each database separately is more secure than
restoring all databases at once.
SQL Server provides four different backup methods:
Full database backup
Differential database backup
Transaction log backup
Database file (or filegroup) backup
Full Database Backup
A full database backup captures the state of the database at the time the backup is started. During the full
database backup, the system copies the data as well as the schema of all tables of the database and the
corresponding file structures. If the full database backup is executed dynamically, SQL Server records any activity
that took place during the backup. Therefore, even all uncommitted transactions in the transaction log are written
to the backup media.
Differential Backup
Using differential backup, only the parts of the database that have changed since the last full database backup
are read and then written to the copy. (As in the full database backup, any activity that took place during the
differential backup is backed up, too.) The advantage of a differential backup is speed. It minimizes the time
required to back up a database, because the amount of data to be backed up is considerably smaller than in the
case of the full backup. (Remember that a full database backup includes a copy of all database pages.)
Transaction Log Backup
A transaction log backup considers only the changes recorded in the log. This form of backup is therefore not
based on physical parts (pages) of the database, but on logical operations, that is, changes executed using the
DML statements INSERT, UPDATE, and DELETE. Again, because the amount of data is smaller, this process can
be performed significantly quicker than the full database backup and quicker than a differential backup.
There are two main reasons to perform the transaction log backup: first, to store the data that has changed since
the last transaction log backup or database backup on a secure medium; second (and more importantly), to
properly close the transaction log up to the beginning of the active portion of it. (The active portion of the
transaction log contains all uncommitted transactions.)
Using the full database backup and the valid chain of all closed transaction logs, it is possible to propagate a
database copy on a different computer. This database copy can then be used to replace the original database in
case of a failure. (The same scenario can be established using a full database backup and the last differential
backup.)
SQL Server does not allow you to store the transaction log in the same file in which the database is stored. One
reason for this is that if the file is damaged, the use of the transaction log to restore all changes since the last
backup will not be possible.
Using a transaction log to record changes in the database is a common feature used by nearly all existing
relational DBMSs. Nevertheless, situations may arise when it becomes helpful to switch this feature off. For
example, the execution of a heavy load can last for hours. Such a program runs much faster when the logging is
switched off. On the other hand, switching off the logging process is dangerous, as it destroys the valid chain of
transaction logs. To ensure the database recovery, it is strongly recommended that you perform full database
backup after the successful end of the load.
One of the most common system failures occurs because the transaction log is filled up. Be aware that the use of
a transaction log in itself may cause a complete standstill of the system. If the storage used for the transaction log
fills up to 100 percent, SQL Server must stop all running transactions until the transaction log storage is freed
again. This problem can only be avoided by making frequent backups of the transaction log: each time you close
a portion of the actual transaction log and store it to a different storage media, this portion of the log becomes
reusable, and SQL Server thus regains disk space.
Some differences between transaction log backups and differential backups are worth noting. The benefit of
differential backups is that you save time in the restore process, because to recover a database completely, you
need a full database backup and only the latest differential backup. If you use transaction logs for the same
scenario, you have to apply the full database backup and all existing transaction logs to bring the database to a
consistent state. A disadvantage of differential backups is that you cannot use them to recover data to a specific
point in time because they do not store intermediate changes to the database.
Database File Backup
Database file backup allows you to back up specific database files (or filegroups) instead of the entire database.
In this case, SQL Server backs up only files you specify. Individual files (or filegroups) can be restored from
database backup, allowing recovery from a failure that affects only a small subset of the database files. Individual
files or filegroups can be restored from either a database backup or a filegroup backup. This means that you can
use database and transaction log backups as your backup procedure and still be able to restore individual files (or
filegroups) from the database backup.
Recovery
Whenever a transaction is submitted for execution, SQL Server is responsible either for executing the transaction
completely and recording its changes permanently in the database or for guaranteeing that the transaction has no
effect at all on the database. This approach ensures that the database is consistent in case of a failure, because
failures do not damage the database itself, but instead affect transactions that are in progress at the time of the
failure. SQL Server supports both automatic and manual recovery.
Automatic Recovery
Automatic recovery is a fault-tolerant feature that SQL Server executes every time it is restarted after a failure or
shutdown. The automatic recovery process checks to see if the restoration of databases is necessary. If it is, each
database is returned to its last consistent state using the transaction log.
SQL Server examines the transaction log from the last checkpoint to the point at which the system failed or was
shut down. (A checkpoint is the most recent point at which all data changes are written permanently to the
database from memory. Therefore, a checkpoint ensures the physical consistency of the data.) The transaction
log contains committed transactions (transactions that are successfully executed, but their changes have not yet
been written to the database) and uncommitted transactions (transactions that are not successfully executed
before a shutdown or failure occurred). SQL Server rolls forward all committed transactions, thus making
permanent changes to the database, and undoes the part of the uncommitted transactions that occurred before
the checkpoint.
SQL Server first performs the automatic recovery of the master database, followed by the recovery of all other
system databases. Then, all user-defined databases are recovered.
Manual Recovery
A manual recovery of a database specifies the application of the backup of your database and subsequent
application of all transaction logs in the sequence of their creation. After this, the database is in the same
(consistent) state as it was at the point when the transaction log was backed up for the last time.
When you recover a database using a full database backup, SQL Server first re-creates all database files and
places them in the corresponding physical locations. After that, the system re-creates all database objects.
Recovery Models
Microsoft introduced in SQL Server 2000 a new feature called the database recovery model, which allows you to
control to what extent you are ready to risk losing committed transactions if a database is damaged. Additionally,
the choice of a recovery model has an impact on the size of the transaction log and therefore on the time period
needed to back up the log. SQL Server supports three recovery models:
Full
Bulk-logged
Simple
The following sections describe the three recovery models.
Full Recovery
During full recovery, all operations are written to the transaction log. Therefore, this model provides complete
protection against media failure. This means that you can restore your database up to the last committed
transaction that is stored in the log file. Additionally, data can be recovered to any point in time (prior to the point
of failure). To guarantee this, such operations as SELECT INTO and the execution of the bcp utility are fully
logged, too.
Besides point-in-time recovery, the full recovery model allows you also to recover to a log mark. Log marks
correspond to a specific transaction and are inserted only if the transaction commits. (For more information on log
marks, see the corresponding section later in this chapter.)
The full recovery model also logs all operations concerning the CREATE INDEX statement, implying that the
process of data recovery now includes the restoration of index creations. That way, the re-creation of the indices
is faster, because you do not have to rebuild them separately.
The disadvantage of this recovery model is that the corresponding transaction log may be very voluminous and
the files on the disk containing the log will be filled up very quickly. Also, for such a voluminous log you will need
significantly more time for backup.
Bulk-Logged Recovery
Bulk-logged recovery supports log backups by using minimal space in the transaction log for certain large-scale
or bulk operations. The logging of the following operations is minimal and cannot be controlled on an operation-by-operation basis:
SELECT INTO
CREATE INDEX (including indexed views)
bcp utility and BULK INSERT
WRITETEXT and UPDATETEXT
Although bulk operations are not fully logged, you do not have to perform a full database backup after the
completion of such an operation. During bulk-logged recovery, transaction log backups contain both the log as
well as the results of a bulk operation. This simplifies the transition between full and bulk-logged recovery models.
The bulk-logged recovery model allows you to recover a database to the end of a transaction log backup (i.e., up
to the last committed transaction). In contrast to the full recovery model, bulk-logged recovery does not support
generally point-in-time recovery. (You can use bulk-logged recovery for the point-in-time recovery if no bulk
operations have been performed.)
The advantage of the bulk-logged recovery model is that bulk operations are performed much faster than under
the full recovery model, because they are not fully logged.
Simple Recovery
In the simple recovery model, the transaction log is not used to protect your database against any media failure.
Therefore, you can recover a damaged database only using full database or differential backup. Backup strategy
for this model is very simple: Restore the database using existing database backups and, if differential backups
exist, apply the most recent one.
The advantages of the simple recovery model are that the performance of all bulk operations is very high and
requirements for the log space very small. On the other hand, this model requires the most manual work because
all changes since the most recent database (or differential) backup must be redone. Point-in-time as well as page
restore are not allowed with this recovery model. Also, file restore is available only for read-only secondary
filegroups.
Recovery with Concurrent Transactions
When more than one transaction is being executed in parallel, the logs are interleaved. At the time of recovery, it
would become hard for the recovery system to backtrack all the logs and then start recovering. To ease this situation,
most modern DBMSs use the concept of checkpoints.
Checkpoint
Keeping and maintaining logs in real time and in a real environment may fill up all the memory space available in
the system. As time passes, the log file may grow too big to be handled at all. A checkpoint is a mechanism by which
all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares
a point before which the DBMS was in a consistent state and all the transactions were committed.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following manner:
If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, it puts the transaction in the
redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort log is found, it puts the
transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the transactions in the
redo-list and their previous logs are removed and then redone before saving their logs.
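The classification step above can be sketched in a few lines of Python. This is a minimal illustration of building the undo-list and redo-list from a log, not SQL Server's actual recovery algorithm; the log format (a list of `(operation, transaction)` tuples) is a made-up simplification:

```python
def classify(log):
    """Split transactions in a crash log into an undo-list and a redo-list.

    log: list of (op, txn) tuples where op is 'start', 'commit', or 'abort'.
    """
    started = {t for op, t in log if op == "start"}
    committed = {t for op, t in log if op == "commit"}
    aborted = {t for op, t in log if op == "abort"}

    # Committed transactions must be redone (rolled forward)
    redo_list = started & committed
    # Transactions with a start record but no commit/abort must be undone
    undo_list = started - committed - aborted
    return undo_list, redo_list


# T1 committed, T2 was in flight at the crash, T3 aborted before it
undo, redo = classify([("start", "T1"), ("commit", "T1"),
                       ("start", "T2"),
                       ("start", "T3"), ("abort", "T3")])
print(undo, redo)  # {'T2'} {'T1'}
```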